![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Geneid can predict differnet genes when the length of input sequence are different? | bio_tt | Bioinformatics | 0 | 01-06-2012 04:47 PM |
Illumina adaptor sequence filter | slny | Bioinformatics | 1 | 04-21-2011 12:25 PM |
Roche gsMapper output exon contigs rather than full-length sequence? | sulicon | Bioinformatics | 0 | 02-28-2011 04:51 PM |
samtools view error: CIGAR and sequence length are inconsistent (tophat/bowtie) | glacierbird | Bioinformatics | 2 | 06-29-2010 01:58 AM |
Help:primer and low complexity sequence filter | alvin1982 | Illumina/Solexa | 0 | 04-21-2010 07:05 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
not just another member
Location: Belgium Join Date: Aug 2010
Posts: 264
|
![]()
Hi,
To filter sequencesby length from a fasta file, I'm using now a bioperl script. But is there an another method (faster) ? here's my little script Code:
#!/usr/bin/perl use warnings; use strict; use Bio::SeqIO; my $file = $ARGV[0]; my $min = $ARGV[1]; my $max = $ARGV[2]; my $out = $ARGV[3]; open (FILE, ">>$out") or die ("Error : Cannot open file $out for writing..!\n"); my $seq_in = Bio::SeqIO->new( -format => 'fasta',-file => $file); while( my $seq1 = $seq_in->next_seq() ) { my $id = $seq1->primary_id; chomp $id; my $seq = $seq1->seq; chomp $seq; my $lseq = length($seq); if($lseq>=$min && $lseq <=$max){ print FILE ">",$id,"\n",$seq,"\n"; } } |
![]() |
![]() |
![]() |
#2 |
Member
Location: ITALY Join Date: Oct 2010
Posts: 89
|
![]()
You could try Galaxy:
http://main.g2.bx.psu.edu/root?tool_...lter_by_length It's very fast and you can do different type of analysis. |
![]() |
![]() |
![]() |
#3 |
not just another member
Location: Belgium Join Date: Aug 2010
Posts: 264
|
![]()
Thanks,
is it possible to have this tool in command line ? |
![]() |
![]() |
![]() |
#4 |
Member
Location: ITALY Join Date: Oct 2010
Posts: 89
|
![]()
From Galaxy i don't think is possible. But probably if you search you may find a similar script.
|
![]() |
![]() |
![]() |
#5 |
Senior Member
Location: Denmark Join Date: Apr 2009
Posts: 153
|
![]()
With Biopieces (www.biopieces.org) you can do:
Code:
read_fasta -i test.fna | grab -e 'SEQ_LEN > 10' | grab -e 'SEQ_LEN <= 100' | write_fasta -x Martin |
![]() |
![]() |
![]() |
#6 | |
Member
Location: Oregon Join Date: Jan 2011
Posts: 53
|
![]() Quote:
To get any tool that is included with Galaxy, follow the instructions at http://getgalaxy.org/. Tools are defined in the tools directory. Many will have dependencies (although the one below does not). Code:
#!/usr/bin/env python """ Input: fasta, minimal length, maximal length Output: fasta Return sequences whose lengths are within the range. """ import sys, os assert sys.version_info[:2] >= ( 2, 4 ) def stop_err( msg ): sys.stderr.write( msg ) sys.exit() def __main__(): input_filename = sys.argv[1] try: min_length = int( sys.argv[2] ) except: stop_err( "Minimal length of the return sequence requires a numerical value." ) try: max_length = int( sys.argv[3] ) except: stop_err( "Maximum length of the return sequence requires a numerical value." ) output_filename = sys.argv[4] output_handle = open( output_filename, 'w' ) tmp_size = 0 #-1 tmp_buf = '' at_least_one = 0 for line in file(input_filename): if not line or line.startswith('#'): continue if line[0] == '>': if min_length <= tmp_size <= max_length or (min_length <= tmp_size and max_length == 0): output_handle.write(tmp_buf) at_least_one = 1 tmp_buf = line tmp_size = 0 else: if max_length == 0 or tmp_size < max_length: tmp_size += len(line.rstrip('\r\n')) tmp_buf += line # final flush of buffer if min_length <= tmp_size <= max_length or (min_length <= tmp_size and max_length == 0): output_handle.write(tmp_buf.rstrip('\r\n')) at_least_one = 1 output_handle.close() if at_least_one == 0: print "There is no sequence that falls within your range." if __name__ == "__main__" : __main__() |
|
![]() |
![]() |
![]() |
#7 |
Member
Location: ITALY Join Date: Oct 2010
Posts: 89
|
![]()
Very useful. Thanks
|
![]() |
![]() |
![]() |
Thread Tools | |
|
|