#1 |
Member
Location: quebec Join Date: Apr 2013
Posts: 35
|
Hi all,
I have a FASTA file and I want to split it into two FASTA files according to a list of sequence names in a text file (one sequence name per line). Sequences whose names match the list should go to one FASTA file and the rest to another. Could anybody provide a script or a program to do this? There are some online tools, but uploading my file would take a long time. Thanks.
#2 |
Senior Member
Location: sub-surface moon base Join Date: Apr 2013
Posts: 372
|
If your sequences aren't split over multiple lines, you can do this with grep. I think:
grep -A 1 -f yourSeqIDFile.txt yourFastaFile.fasta > SeqsFromIDList.fasta
grep -A 1 -v -f yourSeqIDFile.txt yourFastaFile.fasta > TheOtherSeqs.fasta
I might be remembering this wrong. If you have QIIME, you can also do this with filter_fasta.py.
Last edited by rhinoceros; 08-13-2013 at 09:56 AM.
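A few caveats on the grep approach are worth spelling out. Patterns given with `-f` match anywhere in a line, so an ID such as `seq1` also hits `>seq10`. GNU grep prints a `--` separator between non-adjacent context groups. The `-v` form also selects the sequence lines of listed records, so it does not produce a clean complement. Below is a minimal Python sketch that avoids these issues and handles wrapped sequences, assuming IDs are compared exactly against the first whitespace-delimited word of each header. The function names (`read_ids`, `split_fasta`, `split_fasta_files`) are made up for illustration and do not come from any tool mentioned in this thread.

```python
import io


def read_ids(handle):
    """Return the set of non-empty, stripped lines (one sequence ID per line)."""
    return {line.strip() for line in handle if line.strip()}


def split_fasta(fasta, wanted_ids, matched_out, other_out):
    """Write each record to matched_out or other_out; return (n_matched, n_other)."""
    n_matched = n_other = 0
    out = None  # no record seen yet; lines before the first '>' are skipped
    for line in fasta:
        if line.startswith(">"):
            # Assumption: the ID is the first whitespace-delimited word after '>'.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if seq_id in wanted_ids:
                out = matched_out
                n_matched += 1
            else:
                out = other_out
                n_other += 1
        if out is not None:
            out.write(line)  # header and all following sequence lines
    return n_matched, n_other


def split_fasta_files(fasta_path, id_path, matched_path, other_path):
    """File-based wrapper; all four paths are supplied by the caller."""
    with open(id_path) as ids:
        wanted = read_ids(ids)
    with open(fasta_path) as fasta, \
            open(matched_path, "w") as matched_fh, \
            open(other_path, "w") as other_fh:
        return split_fasta(fasta, wanted, matched_fh, other_fh)


# Small in-memory demonstration with a wrapped (multi-line) sequence.
demo_fasta = io.StringIO(
    ">seq1 first\nACGT\nACGT\n>seq10 second\nGGGG\n>seq2\nTTTT\n"
)
demo_ids = io.StringIO("seq1\nseq2\n")
matched, other = io.StringIO(), io.StringIO()
counts = split_fasta(demo_fasta, read_ids(demo_ids), matched, other)
```

With real files, something like `split_fasta_files("yourFastaFile.fasta", "yourSeqIDFile.txt", "SeqsFromIDList.fasta", "TheOtherSeqs.fasta")` would write both outputs in a single pass over the input.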
#3 |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
|
Here is a script I wrote a while back that does almost what you want. It takes as input a FASTA file, a text file with a list of sequence IDs (one per line), and a mode argument to include or exclude the IDs in your list from the output. You could simply run the script twice, once in each mode, to get the two complementary outputs, or, if you feel like it, modify the code to generate two output files. As it stands, output is written to STDOUT, so you can only capture one output per run by redirecting STDOUT to a file.
Code:
Usage:
% subSetFasta.pl -f <fastaFileName> -l <listFileName> -m [i or e]
Example:
% subSetFasta.pl -f mySeqs.fasta -l myList.txt -m i > inList.fasta
% subSetFasta.pl -f mySeqs.fasta -l myList.txt -m e > notInList.fasta
A note about ID matching: the script bases a match on the first whitespace-delimited token on the defline. If your defline is:
Code:
>sequenceID sequence description follows
Last edited by kmcarr; 08-13-2013 at 11:16 AM. Reason: Add note about default mode.
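The script itself is not reproduced in this thread, so the following is only a rough Python sketch of the behaviour described above, not the Perl code. It assumes the mode letters `i` and `e` from the usage line mean include and exclude, and that the ID is the first whitespace-delimited token after `>`. The names `defline_id` and `subset_fasta` are hypothetical.

```python
import io
import sys


def defline_id(defline):
    """Return the first whitespace-delimited token after '>' on a defline."""
    fields = defline[1:].split()
    return fields[0] if fields else ""


def subset_fasta(fasta, ids, mode, out=None):
    """Write records whose ID is in ids (mode 'i') or not in ids (mode 'e')."""
    if mode not in ("i", "e"):
        raise ValueError("mode must be 'i' (include) or 'e' (exclude)")
    if out is None:
        out = sys.stdout  # mirrors the described write-to-STDOUT behaviour
    keep = False
    for line in fasta:
        if line.startswith(">"):
            listed = defline_id(line) in ids
            keep = listed if mode == "i" else not listed
        if keep:
            out.write(line)


# Run once per mode to get the two complementary outputs.
records = ">sequenceID sequence description follows\nACGT\n>otherID\nTTTT\n"
included, excluded = io.StringIO(), io.StringIO()
subset_fasta(io.StringIO(records), {"sequenceID"}, "i", included)
subset_fasta(io.StringIO(records), {"sequenceID"}, "e", excluded)
```

Under this matching rule, the description text after the ID on the defline plays no part in the comparison.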
#4 | |
Member
Location: Toronto Join Date: Jan 2011
Posts: 30
|
Hopefully it will do the job you need.
J
Last edited by JohnN; 08-13-2013 at 11:19 AM. Reason: Wrong URL
#5 | |
Member
Location: quebec Join Date: Apr 2013
Posts: 35
|
#6 | |
Member
Location: quebec Join Date: Apr 2013
Posts: 35
|
#7 |
@jamimmunology
Location: London Join Date: Nov 2012
Posts: 96
|
In case anyone needs more alternatives, you can also use fastq_select.tcl, which is bundled with MIRA. This was also discussed in an earlier thread, which might be useful.
#8 |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
|
If you want a Galaxy solution, try this:
http://toolshed.g2.bx.psu.edu/view/p...q_filter_by_id
Or this related but subtly different tool, which pulls out the reads in the ID order given:
http://toolshed.g2.bx.psu.edu/view/p...q_select_by_id
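To illustrate the difference between filtering by ID and selecting in the ID order given, here is a small Python sketch of the second idea. It is not the Galaxy tool's code. It assumes the whole FASTA file fits in memory, that IDs are the first word of each header, and that a later record with a duplicate ID replaces the earlier one. The names `index_fasta` and `select_in_order` are invented for this example.

```python
import io


def index_fasta(fasta):
    """Map each record ID to its full text (defline plus sequence lines)."""
    index = {}
    current = None
    for line in fasta:
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            index[current] = []  # a repeated ID overwrites the earlier record
        if current is not None:
            index[current].append(line)
    return {seq_id: "".join(lines) for seq_id, lines in index.items()}


def select_in_order(index, ordered_ids):
    """Yield record text for each ID in ordered_ids, skipping unknown IDs."""
    for seq_id in ordered_ids:
        if seq_id in index:
            yield index[seq_id]


# Records come out in the order of the ID list, not the order in the file.
fasta_text = ">a\nAAAA\n>b\nCCCC\nCCCC\n>c\nGGGG\n"
ordered = "".join(
    select_in_order(index_fasta(io.StringIO(fasta_text)), ["c", "a", "missing"])
)
```

A plain filter keeps the file's original record order, whereas this version follows the list, which matters if downstream steps expect the sequences in a particular order.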
#9 | |
Member
Location: quebec Join Date: Apr 2013
Posts: 35
|
#10 | |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
|
http://toolshed.g2.bx.psu.edu/view/p...q_filter_by_id
There is a preview/mockup of the tool available to view within the Tool Shed, which should help explain this.
Tags |
fasta file |