SEQanswers

Old 04-22-2013, 08:55 AM   #1
ssully
Member
 
Location: NYC

Join Date: Aug 2010
Posts: 48
Change order of FASTA seqs, based on ID list

Is there any tool for doing this -- change the order of sequences in a FASTA file according to the order of a list of sequence IDs in another file?

Seaview will reorder sequences based on sequence order in a tree, but that is a very specific case. I'd like something more general.

I know I could make a BLAST database from the FASTA set and use something like a batch NCBI blastdbcmd query to extract the sequences in the order I want, but I'm hoping something less time-consuming exists.
Old 04-22-2013, 10:03 AM   #2
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 380

Without being a code monkey, I would have to do it a slightly long-winded way:
1. Use the Galaxy web portal to convert the FASTA file to tabular format
2. Copy and paste the resulting table into Excel
3. Use the INDEX and MATCH functions in Excel to create the reordered list
4. Save as .txt, and use Galaxy to convert it back to FASTA
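
For readers who would rather script it, the same four steps can be sketched in Python. This is only an illustration of the idea, not what Galaxy or Excel do internally: the function names `fasta_to_rows`, `reorder_rows` and `rows_to_fasta` are made up for this example, and a plain dict lookup stands in for the Excel lookup step.

```python
def fasta_to_rows(fasta_text):
    """Step 1: flatten FASTA text into (name, sequence) rows, one per record."""
    rows = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            rows.append([line[1:], ""])
        elif rows:
            rows[-1][1] += line  # join wrapped sequence lines into one cell
    return [tuple(row) for row in rows]


def reorder_rows(rows, wanted_names):
    """Steps 2-3: look up each wanted name in the table, in the wanted order."""
    lookup = dict(rows)
    return [(name, lookup[name]) for name in wanted_names]


def rows_to_fasta(rows):
    """Step 4: turn the reordered rows back into FASTA text."""
    return "".join(">%s\n%s\n" % (name, seq) for name, seq in rows)


fasta = ">second_seq\nAAAA\nAA\n>first_seq\nTTTT\n"
rows = fasta_to_rows(fasta)
print(rows_to_fasta(reorder_rows(rows, ["first_seq", "second_seq"])), end="")
```

As with a tabular round trip, each sequence ends up on a single line, so the original line wrapping is not preserved.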
Old 04-23-2013, 12:22 AM   #3
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 311

Quote:
Originally Posted by ssully View Post
Is there any tool for doing this -- change the order of sequences in a FASTA file according to the order of a list of sequence IDs in another file?
Hi, the script below should do what you want. Save it as reorder_fasta.py (or whatever you like) and run it as
Code:
reorder_fasta.py seq.fasta ref.txt
See the help in the script itself for more detail and an example.

Hope this helps!
Dario

Code for reorder_fasta.py
Code:
#!/usr/bin/env python

import sys

docstring= """DESCRIPTION
    Reorder the sequences in a FASTA file according to the order given in a reference
    file. The reference file has one sequence name per line.
USAGE
    reorder_fasta.py <file.fasta> <file.reference>

----------- EXAMPLE -------------
## fasta file
echo '>second_seq
AAAAAAAAAAAA
AAAAAAAAAAAA
AAAA
>first_seq
TTTTTTTTTTTTTT
TTTTTTTTTTTTTT
TTTTTTTTT
>third_seq
CCCCCCCCCCCCCCCCCC' > seq.fasta

## reference file
echo 'first_seq
second_seq
third_seq' > ref.txt

## Reorder fasta according to reference:
reorder_fasta.py seq.fasta ref.txt
>first_seq
TTTTTTTTTTTTTT
TTTTTTTTTTTTTT
TTTTTTTTT
>second_seq
AAAAAAAAAAAA
AAAAAAAAAAAA
AAAA
>third_seq
CCCCCCCCCCCCCCCCCC
"""

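if len(sys.argv) != 3:
    sys.exit(docstring)

# Read the FASTA file into a dict: sequence name -> list of sequence lines.
# The name is the whole header line without the leading '>'.
seq_dict= {}
seq_name= None
with open(sys.argv[1]) as fasta:
    for line in fasta:
        line= line.strip()
        if line == '':
            continue  # Skip blank lines
        if line.startswith('>'):
            seq_name= line[1:]
            seq_dict[seq_name]= []
        elif seq_name is None:
            sys.exit('Sequence data found before the first ">" header line')
        else:
            seq_dict[seq_name].append(line)

# Print the sequences in the order given by the reference file.
with open(sys.argv[2]) as ref:
    for seq_name in ref:
        seq_name= seq_name.strip()
        if seq_name == '':
            continue  # Skip blank lines
        if seq_name not in seq_dict:
            sys.exit('Sequence "%s" not found in %s' % (seq_name, sys.argv[1]))
        print('>' + seq_name)
        print('\n'.join(seq_dict[seq_name]))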
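
One caveat worth sketching: the script above uses the whole header line as the sequence name, so an ID list that contains only the IDs will not match headers that also carry a description. The variant below is a minimal sketch, assuming the ID is the first whitespace-delimited word of the header. `read_fasta` and `reorder` are hypothetical helper names for this example, not part of the script above. It keeps the original header text and collects any IDs it cannot find rather than stopping at the first one.

```python
import io


def read_fasta(handle):
    """Return {id: (full_header, [sequence lines])}, keyed on the first word of the header."""
    records = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            # Assumption: the ID is the first whitespace-delimited word.
            words = header.split()
            seq_id = words[0] if words else ""
            current = (header, [])
            records[seq_id] = current
        elif current is not None:
            current[1].append(line)
    return records


def reorder(records, ids):
    """Return (FASTA text in the order of ids, list of ids that were not found)."""
    out = []
    missing = []
    for seq_id in ids:
        if seq_id in records:
            header, seq_lines = records[seq_id]
            out.append(">" + header)
            out.extend(seq_lines)
        else:
            missing.append(seq_id)
    text = "\n".join(out) + "\n" if out else ""
    return text, missing


fasta = io.StringIO(">seq2 some description\nAAAA\nAA\n>seq1\nTTTT\n")
text, missing = reorder(read_fasta(fasta), ["seq1", "seq2", "seq9"])
print(text, end="")
print("missing:", missing)
```

If two records share the same first word, the later one silently replaces the earlier one in this sketch, so it is only suitable for files with unique IDs.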