Thread | Thread Starter | Forum | Replies | Last Post |
extract reads matching barcodes from fastq file? | odoyle81 | Bioinformatics | 7 | 12-03-2014 12:52 PM |
Extract gene sequences from gff3 file and reference fasta | JonB | Bioinformatics | 1 | 07-15-2014 01:13 AM |
How to use coordinates in order to extract sequences in FASTA file? | prs321 | Bioinformatics | 1 | 09-14-2013 10:07 AM |
Extract subset of Fastq sequences based on a list of IDs | pepperoni | Bioinformatics | 36 | 05-06-2013 02:38 AM |
Extract index reads from raw Fastq file | ostrakon | Bioinformatics | 6 | 02-13-2013 01:54 PM |
#1
Junior Member
Location: canada Join Date: Apr 2010
Posts: 1
I have two files:
file 1: a FASTQ file (XX GB)
file 2: a list of a few FASTQ ID headers, formatted as follows:
@M01032:192:000000000-AAEFT:1:1101:10100:1004 2:N:0:1
@M01032:192:000000000-AAEFT:1:1101:10100:1004 1:N:0:1
Is there an easy way (Perl?) to generate a third file containing the matching records, including their ID header lines, for the sequences listed in file 2? Thanks!
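A single-pass alternative, sketched here in awk rather than Perl: read the ID list into a lookup table, then walk the FASTQ four lines at a time and keep a record whenever its header line exactly matches a listed ID. This assumes unwrapped four-line records; the toy records and the file names (`demo_reads.fastq`, `demo_ids.txt`, `demo_subset.fastq`) are hypothetical stand-ins for file 1, file 2 and the output.

```shell
#!/bin/sh
# Sketch only: assumes unwrapped 4-line FASTQ records.
# File names and the second toy record are hypothetical placeholders.

# Tiny made-up input so the sketch runs on its own.
printf '%s\n' \
  '@M01032:192:000000000-AAEFT:1:1101:10100:1004 1:N:0:1' ACGT + IIII \
  '@M01032:192:000000000-AAEFT:1:1101:10200:1005 1:N:0:1' TTTT + IIII \
  > demo_reads.fastq
printf '%s\n' '@M01032:192:000000000-AAEFT:1:1101:10100:1004 1:N:0:1' > demo_ids.txt

# First file (NR == FNR): store each wanted header line as an array key.
# Second file: on lines 1, 5, 9, ... (the headers) decide whether to keep
# the record; the bare "keep" pattern then prints the header and the
# three lines that follow it while keep is 1.
awk 'NR == FNR { want[$0] = 1; next }
     FNR % 4 == 1 { keep = ($0 in want) }
     keep' demo_ids.txt demo_reads.fastq > demo_subset.fastq

cat demo_subset.fastq
```

To match on the read name alone (so that an ID listed with `1:N:0:1` also pulls out a record whose header ends in `2:N:0:1`), one could compare the first whitespace-separated field instead, i.e. use `$1` in place of `$0` in both places. Because only lines 1, 5, 9, ... are ever tested, a quality line can never be mistaken for a header.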
#2
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
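There are probably more elegant ways of doing this, but you can try the following now (in tcsh, assuming that your sequences do not wrap across multiple lines):
Code:
$ foreach i ("`cat ./file_of_ID_you_need`")
foreach? grep -A 3 "^$i" your_sequence_file >> new_sequence_file
foreach? end
Quoting the backquoted cat makes tcsh split the ID list at newlines only, so each full header (including the part after the space) is used as a single pattern.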
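For readers using bash or another POSIX shell rather than tcsh, here is a sketch of the same loop. `while IFS= read -r` reads each line of the ID list whole, so the space before `1:N:0:1` does not split a header into two patterns, and `grep -F -x` then matches that exact header line. As with the tcsh version, the FASTQ file is rescanned once per ID, which gets slow for long ID lists; the file names and the second toy record are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes unwrapped 4-line FASTQ records and that each
# header occurs once (so grep emits no "--" group separators).
# File names and the second toy record are hypothetical.
printf '%s\n' \
  '@M01032:192:000000000-AAEFT:1:1101:10100:1004 1:N:0:1' ACGT + IIII \
  '@M01032:192:000000000-AAEFT:1:1101:10200:1005 1:N:0:1' TTTT + IIII \
  > loop_reads.fastq
printf '%s\n' '@M01032:192:000000000-AAEFT:1:1101:10100:1004 1:N:0:1' > loop_ids.txt

: > loop_subset.fastq   # start from an empty output file
while IFS= read -r id; do
  # -F: fixed string, -x: whole-line match, -A 3: also print the
  # sequence, "+" and quality lines that follow the header.
  grep -F -x -A 3 -e "$id" loop_reads.fastq >> loop_subset.fastq
done < loop_ids.txt

cat loop_subset.fastq
```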
#3
Member
Location: Ipswich, MA Join Date: Feb 2013
Posts: 11
Perhaps this?
Code:
grep --no-group-separator -A 3 -Fxf file_2 file_1 > output
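A toy check of what `-Fxf` does (GNU grep is assumed, for `--no-group-separator`): `-f` reads one pattern per line from the ID file, `-F` treats each pattern as a fixed string, and `-x` requires the whole line to match. The third ID below omits the ` 1:N:0:1` suffix, so under `-x` it matches nothing; the records and file names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: made-up records, hypothetical file names, GNU grep assumed.
printf '%s\n' \
  '@M01032:192:000000000-AAEFT:1:1101:10100:1004 1:N:0:1' ACGT + IIII \
  '@M01032:192:000000000-AAEFT:1:1101:10200:1005 1:N:0:1' TTTT + IIII \
  '@M01032:192:000000000-AAEFT:1:1101:10300:1006 1:N:0:1' GGGG + IIII \
  > grep_reads.fastq
printf '%s\n' \
  '@M01032:192:000000000-AAEFT:1:1101:10100:1004 1:N:0:1' \
  '@M01032:192:000000000-AAEFT:1:1101:10300:1006 1:N:0:1' \
  '@M01032:192:000000000-AAEFT:1:1101:10200:1005' \
  > grep_ids.txt

# Records 1 and 3 are kept. They are not adjacent, so without
# --no-group-separator grep would insert a "--" line between them,
# which would break the FASTQ format.
grep --no-group-separator -A 3 -Fxf grep_ids.txt grep_reads.fastq > grep_subset.fastq

cat grep_subset.fastq
```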
#4
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
If the sequence file is compressed, you can extend gandalf886's suggestion:
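Code:
$ zmore sample.fastq.gz | grep --no-group-separator -A 3 -Fxf file_of_IDs > new_sequence_file
or, more conventionally in a pipeline (zmore is primarily an interactive pager):
Code:
$ zcat sample.fastq.gz | grep --no-group-separator -A 3 -Fxf file_of_IDs > new_sequence_file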
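A sketch that keeps both ends compressed: decompress with `gzip -dc` (with GNU gzip, `zcat` is the same as `gunzip -c`), filter with the same grep command, and recompress the result with `gzip -c`. GNU grep is assumed for `--no-group-separator`; the toy records and file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: made-up records, hypothetical file names, GNU grep assumed.
printf '%s\n' \
  '@M01032:192:000000000-AAEFT:1:1101:10100:1004 2:N:0:1' ACGT + IIII \
  '@M01032:192:000000000-AAEFT:1:1101:10200:1005 2:N:0:1' TTTT + IIII \
  | gzip -c > gz_sample.fastq.gz
printf '%s\n' '@M01032:192:000000000-AAEFT:1:1101:10100:1004 2:N:0:1' > gz_ids.txt

# Decompress to stdout, keep matching records, recompress the output.
gzip -dc gz_sample.fastq.gz \
  | grep --no-group-separator -A 3 -Fxf gz_ids.txt \
  | gzip -c > gz_subset.fastq.gz

gzip -dc gz_subset.fastq.gz
```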