  • NicoBxl
    not just another member
    • Aug 2010
    • 264

    filter sequence by length

    Hi,

    To filter sequences by length from a FASTA file, I'm currently using a BioPerl script. Is there another (faster) method?

    Here's my little script:


    Code:
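    #!/usr/bin/perl

    use warnings;
    use strict;
    use Bio::SeqIO;

    # Expect exactly four arguments: input FASTA, min length, max length, output FASTA.
    die "Usage: $0 <in.fasta> <min_len> <max_len> <out.fasta>\n" unless @ARGV == 4;
    my ($file, $min, $max, $out) = @ARGV;

    # Note: '>>' appends, so rerunning with the same output file adds to it.
    open(my $out_fh, '>>', $out) or die "Error: cannot open file $out for writing: $!\n";

    my $seq_in = Bio::SeqIO->new(-format => 'fasta', -file => $file);

    while (my $seq1 = $seq_in->next_seq()) {
        my $id   = $seq1->primary_id;
        my $seq  = $seq1->seq;
        my $lseq = length($seq);
        # Keep only sequences whose length is within [min, max].
        if ($lseq >= $min && $lseq <= $max) {
            print $out_fh ">", $id, "\n", $seq, "\n";
        }
    }

    close($out_fh) or die "Error: cannot close file $out: $!\n";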
  • Giorgio C
    Member
    • Oct 2010
    • 89

    #2
    You could try Galaxy:

    Galaxy is a community-driven web-based analysis platform for life science research.


    It's very fast and you can do different types of analysis.

    • NicoBxl
      not just another member
      • Aug 2010
      • 264

      #3
      Thanks,

      Is it possible to use this tool from the command line?

      • Giorgio C
        Member
        • Oct 2010
        • 89

        #4
        I don't think that's possible from Galaxy. But if you search, you may find a similar script.

        • maasha
          Senior Member
          • Apr 2009
          • 153

          #5
          With Biopieces (www.biopieces.org) you can do:

          Code:
          read_fasta -i test.fna | grab -e 'SEQ_LEN > 10' | grab -e 'SEQ_LEN <= 100' | write_fasta -x
          Cheers,


          Martin

          • tnabtaf
            Member
            • Jan 2011
            • 53

            #6
            Originally posted by Giorgio C:
            "I don't think that's possible from Galaxy. But if you search, you may find a similar script."

            This is a command-line tool that is included in the Galaxy distribution. I've included the code below.

            To get any tool that is included with Galaxy, follow the instructions at http://getgalaxy.org/. Tools are defined in the tools directory. Many will have dependencies (although the one below does not).

            Code:
            #!/usr/bin/env python
            """
            Input: fasta, minimal length, maximal length
            Output: fasta
            Return sequences whose lengths are within the range.
            """
            
            import sys, os
            
            assert sys.version_info[:2] >= ( 2, 4 )
            
            def stop_err( msg ):
                sys.stderr.write( msg )
                sys.exit()
            
            def __main__():
                input_filename = sys.argv[1]
                try:
                    min_length = int( sys.argv[2] )
                except:
                    stop_err( "Minimal length of the return sequence requires a numerical value." )
                try:
                    max_length = int( sys.argv[3] )
                except:
                    stop_err( "Maximum length of the return sequence requires a numerical value." )
                output_filename = sys.argv[4]
                output_handle = open( output_filename, 'w' )
                tmp_size = 0 #-1
                tmp_buf = ''
                at_least_one = 0
                for line in file(input_filename):
                    if not line or line.startswith('#'):
                        continue
                    if line[0] == '>':
                        if min_length <= tmp_size <= max_length or (min_length <= tmp_size and max_length == 0):
                            output_handle.write(tmp_buf)
                            at_least_one = 1
                        tmp_buf = line
                        tmp_size = 0
                    else:
                        if max_length == 0 or tmp_size < max_length:
                            tmp_size += len(line.rstrip('\r\n'))
                            tmp_buf += line
                # final flush of buffer
                if min_length <= tmp_size <= max_length or (min_length <= tmp_size and max_length == 0):
                    output_handle.write(tmp_buf.rstrip('\r\n'))
                    at_least_one = 1
                output_handle.close()
                if at_least_one == 0:
                    print "There is no sequence that falls within your range."
            
            if __name__ == "__main__" : __main__()
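
The script above targets Python 2: `file()` and the `print` statement no longer exist in Python 3. For anyone who needs the same filter on a current interpreter, here is a minimal sketch of the idea, not the Galaxy tool itself. The function names `read_fasta`, `filter_by_length` and `filter_file` are my own and are not part of Galaxy or any library. It assumes, like the script above, that lines starting with `#` are comments and that a maximum of 0 means "no upper bound". Unlike the script above, it measures each sequence in full before deciding, and it writes each kept sequence on a single line.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    # flush the last record
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length, max_length=0):
    """Yield records with min_length <= len(seq) <= max_length.

    max_length == 0 is treated as "no upper bound".
    """
    for header, seq in records:
        n = len(seq)
        if n >= min_length and (max_length == 0 or n <= max_length):
            yield header, seq


def filter_file(in_name, out_name, min_length, max_length=0):
    """Filter in_name into out_name and return the number of records kept."""
    kept = 0
    with open(in_name) as src, open(out_name, "w") as dst:
        for header, seq in filter_by_length(read_fasta(src), min_length, max_length):
            dst.write(f"{header}\n{seq}\n")
            kept += 1
    return kept
```

Calling `filter_file("in.fasta", "out.fasta", 10, 100)` (file names are placeholders) would keep sequences of 10 to 100 bases and return how many were written. A return value of 0 corresponds to the "no sequence that falls within your range" message in the script above.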

            • Giorgio C
              Member
              • Oct 2010
              • 89

              #7
              Very useful. Thanks
