08-13-2013, 12:49 PM   #6
Originally Posted by kmcarr
Here is a script I wrote a while back that does almost what you want. It takes as input a FASTA file, a text file with a list of sequence IDs (one per line), and a mode argument that either includes or excludes the IDs in your list from the output. You could simply run the script twice, once in each mode, to get the two complementary outputs or, if you feel like it, modify the code to generate two output files. As it works now, output is written to STDOUT, so you can capture only one output per run by redirecting STDOUT to a file.


% -f <fastaFileName> -l <listFileName> -m [i or e]


% -f mySeqs.fasta -l myList.txt -m i > inList.fasta
% -f mySeqs.fasta -l myList.txt -m e > notInList.fasta
If you do not specify the -m (mode) argument, the script defaults to 'include' mode.

A note about ID matching: the script bases a match on the first whitespace-delimited token on the defline. If your defline is:

>sequenceID sequence description follows
The script will only attempt to match 'sequenceID', so make sure that is the text in your list file.
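
For anyone who wants to see the idea without the script to hand, the behaviour described above can be sketched in a few lines of Python. This is not kmcarr's script, only an illustration: the names `defline_id` and `filter_fasta` are made up here, and the 'i' and 'e' mode values are assumed to mean include and exclude as in the usage lines.

```python
import io

def defline_id(defline):
    """Return the first whitespace-delimited token after the leading '>'."""
    fields = defline[1:].split()
    return fields[0] if fields else ""

def filter_fasta(lines, wanted_ids, mode="i"):
    """Yield the lines of each record whose ID is in wanted_ids (mode 'i')
    or not in wanted_ids (mode 'e'). Hypothetical helper, not the original script."""
    if mode not in ("i", "e"):
        raise ValueError("mode must be 'i' or 'e'")
    keep = False  # lines before the first defline are dropped
    for line in lines:
        if line.startswith(">"):
            in_list = defline_id(line) in wanted_ids
            keep = in_list if mode == "i" else not in_list
        if keep:
            yield line

# Small in-memory example standing in for mySeqs.fasta and myList.txt.
fasta = io.StringIO(
    ">seq1 first record\nACGT\n>seq2 second record\nGGCC\nTTAA\n>seq3\nAAAA\n"
)
ids = {"seq1", "seq3"}

included = "".join(filter_fasta(fasta, ids, "i"))
fasta.seek(0)  # rewind so the same input can be read in the other mode
excluded = "".join(filter_fasta(fasta, ids, "e"))
print(included, end="")
```

Because only the first token of the defline is compared, a list entry such as 'seq1 first record' would not match anything. Running the filter once in each mode, as above, gives the two complementary outputs.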
Thanks very much. The script works perfectly!
lran2008