**Giorgio C** · 09-09-2011, 12:22 AM

You could try Galaxy:

http://main.g2.bx.psu.edu/root?tool_id=fasta_filter_by_length

Galaxy is a community-driven web-based analysis platform for life science research.

It's very fast, and you can run many different types of analysis.

**NicoBxl** · 09-09-2011, 12:23 AM

Thanks,

Is it possible to run this tool from the command line?

**Giorgio C** · 09-09-2011, 12:57 AM

I don't think that's possible from Galaxy, but if you search you will probably find a similar script.

**maasha** · 09-09-2011, 01:27 AM

With Biopieces (www.biopieces.org) you can do:

Code:

read_fasta -i test.fna | grab -e 'SEQ_LEN > 10' | grab -e 'SEQ_LEN <= 100' | write_fasta -x

Cheers,

Martin
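
For anyone without Biopieces installed, the same length test (strictly greater than 10, at most 100) can be sketched in Python. This is only an illustration: `filter_by_length` and the sample records are made up here, and the records are assumed to be already parsed into (header, sequence) pairs.

```python
def filter_by_length(records, min_exclusive=10, max_inclusive=100):
    """Yield (header, sequence) pairs with min_exclusive < len(sequence) <= max_inclusive."""
    for header, sequence in records:
        if min_exclusive < len(sequence) <= max_inclusive:
            yield header, sequence


# Hypothetical in-memory records standing in for the contents of test.fna.
records = [
    ("short", "ACGT"),      # length 4: dropped
    ("ok", "ACGT" * 5),     # length 20: kept
    ("long", "ACGT" * 30),  # length 120: dropped
]
kept = [header for header, _ in filter_by_length(records)]
print(kept)
```

The bounds mirror the two `grab` steps: `SEQ_LEN > 10` excludes length 10 itself, while `SEQ_LEN <= 100` includes length 100.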

**tnabtaf** · 09-09-2011, 09:08 AM

Originally posted by Giorgio C:

I don't think that's possible from Galaxy, but if you search you will probably find a similar script.

This is a command line tool that is included in the Galaxy distribution. I've included the code below.

To get any tool that is included with Galaxy, follow the instructions at http://getgalaxy.org/. Tools are defined in the tools directory. Many will have dependencies (although the one below does not).

Code:

#!/usr/bin/env python
"""
Input: fasta, minimal length, maximal length
Output: fasta
Return sequences whose lengths are within the range.
"""

import sys

assert sys.version_info[:2] >= ( 2, 4 )

def stop_err( msg ):
    sys.stderr.write( msg )
    sys.exit()

def __main__():
    input_filename = sys.argv[1]
    try:
        min_length = int( sys.argv[2] )
    except:
        stop_err( "Minimal length of the return sequence requires a numerical value." )
    try:
        max_length = int( sys.argv[3] )
    except:
        stop_err( "Maximum length of the return sequence requires a numerical value." )
    output_filename = sys.argv[4]
    output_handle = open( output_filename, 'w' )
    tmp_size = 0 #-1
    tmp_buf = ''
    at_least_one = 0
    for line in open(input_filename):
        if not line or line.startswith('#'):
            continue
        if line[0] == '>':
            if min_length <= tmp_size <= max_length or (min_length <= tmp_size and max_length == 0):
                output_handle.write(tmp_buf)
                at_least_one = 1
            tmp_buf = line
            tmp_size = 0
        else:
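            # Use <= so a record that reaches max_length exactly keeps counting;
            # with < its remaining lines were skipped and it was written truncated.
            if max_length == 0 or tmp_size <= max_length: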
                tmp_size += len(line.rstrip('\r\n'))
                tmp_buf += line
    # final flush of buffer
    if min_length <= tmp_size <= max_length or (min_length <= tmp_size and max_length == 0):
        output_handle.write(tmp_buf.rstrip('\r\n'))
        at_least_one = 1
    output_handle.close()
    if at_least_one == 0:
        print("There is no sequence that falls within your range.")

if __name__ == "__main__" : __main__()
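
For readers on Python 3, here is a minimal sketch of the same idea written as generators. It is not the Galaxy code: `parse_fasta` and `filter_fasta` are names invented for this example, and the input is a made-up in-memory FASTA. It keeps the script's conventions that both bounds are inclusive and that a `max_length` of 0 means no upper limit.

```python
import io


def parse_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # flush the last record


def filter_fasta(handle, min_length, max_length):
    """Yield records with min_length <= length <= max_length (0 = no upper limit)."""
    for header, sequence in parse_fasta(handle):
        size = len(sequence)
        if size >= min_length and (max_length == 0 or size <= max_length):
            yield header, sequence


# Hypothetical input: lengths are 4, 12 (split over two lines) and 2.
example = io.StringIO(">a\nACGT\n>b\nACGTACGT\nACGT\n>c\nAC\n")
result = list(filter_fasta(example, 3, 8))
print(result)
```

Because each record is read in full before its length is tested, a sequence is either written whole or skipped, at the cost of holding one record in memory at a time.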

**Giorgio C** · 09-09-2011, 11:00 AM

Very useful. Thanks

filter sequence by length
