SEQanswers

Old 08-14-2014, 12:39 PM   #1
caputcastellae
Junior Member
 
Location: canada

Join Date: Apr 2010
Posts: 1
Extract sequences from a FASTQ file based on another file

I have two files:

file 1: FASTQ file (XX GB)
file 2: list of a few FASTQ ID headers formatted as follows:

@M01032:192:000000000-AAEFT:1:1101:10100:1004 2:N:0:1
@M01032:192:000000000-AAEFT:1:1101:10100:1004 1:N:0:1

Is there an easy way (Perl?) to generate a third file containing the full records, including the ID headers, of the sequences listed in file 2?

Thanks!
Old 08-14-2014, 01:03 PM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,079

There are probably more elegant ways of doing this, but you can try the following for now (in tcsh, assuming your sequences do not wrap, i.e. each record is exactly four lines):

Code:
$ foreach i ("`cat ./file_of_ID_you_need`")
foreach? grep -A 3 "^$i" your_sequence_file >> new_sequence_file
foreach? end
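
For readers not using tcsh, the same per-ID loop can be sketched in plain Bourne shell. This is only an illustration: the function name `extract_by_loop` is made up here, and it assumes unwrapped four-line records and a grep that supports `-A` (GNU and BSD grep both do). Records come out in the order of the ID list, and the sequence file is rescanned once per ID, so this is only practical for a short list.

```shell
# Hypothetical helper: Bourne-shell version of the per-ID loop.
# Assumes each FASTQ record is exactly four lines (no wrapped sequences).
extract_by_loop() {
    ids_file=$1
    fastq_file=$2
    # "|| [ -n ... ]" also handles a last ID line that lacks a trailing newline.
    while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue   # skip blank lines in the ID list
        # -F: literal string, -x: whole line must match, -A 3: plus the 3 lines after
        grep -A 3 -Fx -e "$id" "$fastq_file"
    done < "$ids_file"
}
```

Usage would look like `extract_by_loop file_of_ID_you_need your_sequence_file > new_sequence_file`.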
Old 08-14-2014, 02:10 PM   #3
gandalf886
Member
 
Location: Ipswich, MA

Join Date: Feb 2013
Posts: 11

perhaps this?
Code:
grep --no-group-separator -A 3 -Fxf file_2 file_1 > output
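
If `--no-group-separator` is not available in your grep, a single-pass awk filter is one possible alternative. The sketch below is not from the thread: the function name `extract_fastq_by_id` is hypothetical, and it assumes unwrapped four-line records and an ID file with one complete header line per row. It only compares header lines (every fourth line, starting at the first), so a quality string can never be mistaken for an ID, and records are written in the order they occur in the FASTQ file.

```shell
# Hypothetical helper: print the 4-line FASTQ records whose header line
# appears verbatim in the ID list. Assumes unwrapped (4-line) records.
extract_fastq_by_id() {
    ids_file=$1
    fastq_file=$2
    awk '
        # Lines from the first operand (the ID list) go into a lookup table.
        FILENAME == ARGV[1] { wanted[$0] = 1; next }
        # Header line of each record: decide whether to keep the record.
        FNR % 4 == 1 { keep = ($0 in wanted) }
        # Print every line that belongs to a kept record.
        keep
    ' "$ids_file" "$fastq_file"
}
```

Usage would look like `extract_fastq_by_id file_2 file_1 > output`.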
Old 08-14-2014, 02:39 PM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,079

If the sequence file is compressed, gandalf886's suggestion can be extended:
Code:
$ zmore sample.fastq.gz | grep --no-group-separator -A 3 -Fxf file_of_IDs > new_sequence_file
or

Code:
$ zcat sample.fastq.gz | grep --no-group-separator -A 3 -Fxf file_of_IDs > new_sequence_file
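
As a rough sketch of the same idea with the output kept compressed, the filter below reads FASTQ from standard input so it can sit in the middle of a pipe. The function name `filter_fastq_stdin` is made up for illustration, and it again assumes unwrapped four-line records. It matches header lines only, which avoids any dependence on grep's context-separator behaviour.

```shell
# Hypothetical helper: read FASTQ on stdin and print the 4-line records
# whose header line appears verbatim in the ID list given as $1.
filter_fastq_stdin() {
    ids_file=$1
    awk -v ids="$ids_file" '
        BEGIN {
            # Load every ID line into a lookup table before reading stdin.
            while ((getline line < ids) > 0) wanted[line] = 1
            close(ids)
        }
        # Header line of each record: decide whether to keep the record.
        NR % 4 == 1 { keep = ($0 in wanted) }
        # Print every line that belongs to a kept record.
        keep
    '
}
```

Usage would look like `gzip -dc sample.fastq.gz | filter_fastq_stdin file_of_IDs | gzip > new_sequence_file.gz`. `gzip -dc` is used here rather than `zcat` because on some systems `zcat` only handles `.Z` files.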