SEQanswers

02-01-2012, 12:34 PM   #1
njh_TO
Junior Member
 
Location: Toronto

Join Date: Nov 2011
Posts: 4
Perl: get specific base from FASTA file.

I've written a piece of Perl code that retrieves a base from a FASTA file, given the contig label (e.g. chr1) and base position. However, I think it runs slowly, and I am sure it is not optimal, mainly because requests are usually made in order, which my code does not take advantage of.

Is anyone aware of a function that has already been written to do this well?
02-01-2012, 12:36 PM   #2
Dameon
Member
 
Location: St. Louis, MO - USA

Join Date: Dec 2011
Posts: 14

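samtools faidx /path/to/your/genome.fa chr1:base_position-base_position

(Give the position as both start and end; a bare start position returns everything from there to the end of the contig.)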
02-01-2012, 12:47 PM   #3
njh_TO
Junior Member
 
Location: Toronto

Join Date: Nov 2011
Posts: 4

Fast response, thanks.
02-01-2012, 12:59 PM   #4
Dameon
Member
 
Location: St. Louis, MO - USA

Join Date: Dec 2011
Posts: 14

No problem. I was just incorporating that little bit into one of my Perl scripts yesterday. Glad it could help.
02-01-2012, 01:09 PM   #5
njh_TO
Junior Member
 
Location: Toronto

Join Date: Nov 2011
Posts: 4

FYI, using Benchmark:

           Rate     mine samtools
mine      111/s       --     -92%
samtools 1449/s    1201%       --
02-02-2012, 02:18 AM   #6
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

Are you using Bio::DB::Fasta?

For your benchmark, are you calling samtools repeatedly or once? If you have multiple requests, it is better to batch them up and call samtools once: there is overhead in process creation and probably in starting up samtools.
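
A minimal sketch of the batching idea, not a tested implementation. It assumes samtools is on the PATH, that `samtools faidx` accepts several regions in one call, and that each output header echoes the region string it was given. The names build_regions, parse_faidx_output and get_fasta_bases are made up for illustration.

```perl
use strict;
use warnings;

# Turn [contig, position] pairs into region strings such as "chr1:100-100".
sub build_regions {
    my @queries = @_;
    return map { my ($chr, $pos) = @$_; "${chr}:${pos}-${pos}" } @queries;
}

# Parse FASTA lines (as printed by samtools faidx) into region => sequence,
# upper-casing the sequence and joining wrapped lines.
sub parse_faidx_output {
    my @lines = @_;
    my (%seq_for, $region);
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            $region = $1;
            $seq_for{$region} = '';
        }
        elsif (defined $region) {
            $seq_for{$region} .= uc $line;
        }
    }
    return %seq_for;
}

# Start samtools once for all queries and return the bases in query order.
# Assumption: the header for each record matches the requested region string.
sub get_fasta_bases {
    my ($fastapath, @queries) = @_;
    my @regions = build_regions(@queries);
    # List-form pipe open bypasses the shell, so odd characters in the path are safe.
    open(my $fh, '-|', 'samtools', 'faidx', $fastapath, @regions)
        or die "Cannot run samtools: $!";
    my %seq_for = parse_faidx_output(<$fh>);
    close($fh) or die "samtools faidx failed (exit status $?)";
    return @seq_for{@regions};
}

# Hypothetical usage:
# my @bases = get_fasta_bases('/path/to/your/genome.fa', ['chr1', 1000], ['chr1', 1001]);
```

This pays the process start-up cost once per batch instead of once per base. The operating system limits the length of a command line, so a very large batch would need to be split into chunks.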
02-02-2012, 05:34 AM   #7
njh_TO
Junior Member
 
Location: Toronto

Join Date: Nov 2011
Posts: 4

Thanks. No, I'm not, but I'll take a look.

I'm not sure how you would go about bundling these requests. Is it possible to pass samtools an array of positions and get back a corresponding array of bases?

The benchmark calls the following function X times.

It currently looks something like:

sub getFASTABaseSamtools
{
    my $chrID     = shift;
    my $basepos   = shift;
    my $fastapath = shift;
    # samtools prints a FASTA header line, then the sequence line
    my @result = `samtools faidx $fastapath $chrID:$basepos-$basepos`;
    chomp($result[1]);
    return uc($result[1]);    # the single base, upper-cased
}

Tags
fasta, perl
