SEQanswers

02-13-2013, 09:58 AM   #1
ostrakon
Junior Member
Location: CA
Join Date: Jan 2011
Posts: 8
Extract index reads from raw Fastq file

We sequenced some 16S rRNA gene amplicons on our collaborator's MiSeq and received a raw FASTQ file that looks like the sequences below.
The file seems to contain only read 1 and read 2; the index reads are missing.
In addition, the 1:N:0 or 2:N:0 field in each header is not followed by the sample number seen in standard MiSeq FASTQ files.
Since I need the index reads to feed the data to QIIME for further analysis, I have the following questions:
1. Is the index read information still in this FASTQ file?
2. If yes, is there a way to extract the index reads?
3. If no, besides asking our collaborator for the index reads (which I have done, but I haven't heard back), is there a program similar to QIIME that does not require an index read file as input?
Thank you for any feedback.
@MISEQ04:37:000000000-A2G8E:1:1101:14157:1957 1:N:0:TCCACAGGAGT
TACAGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGTGGTTTGTTAAGTTGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTCAAAACTGACTGACTAGAGTATGGTAGAGGGTGGTGGAATTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAACACCAGTGGCGAAGGCGACCACCTGGACTGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACA
+
?????BB?DDDDDEDDEEEEFFHIHECFFHHHIIIFHHHHIIIEHHHHEHHHAEFEHHEHHEGHHHHHHHHHHHFFFHHHHHHFFEFEFFFFFFFEEEFFFFFFFFFEFFFFFFFFEFFEFFEEFFFFFFFEEFEEFDEEEEEFFFFFFECEEEFEDDED?AACEEEEEDAEEEFEFEFEEEEEEEEEFECEE>?>?8A>;???EEFEEEFFCEE?*1::A:A0CCECA*14)48AEEEE>;;8;88:AC#
@MISEQ04:37:000000000-A2G8E:1:1101:14157:1957 2:N:0:TCCACAGGAGT
ACGGACTACCCGGGTTTCTAATCCTGTTTGCTCCCCACGCTTTCGCACCTCAGTGTCAGTATCAGTCCAGGTGGTCGCCTTCGCCACTGGTGTTCCTTCCTATATCTACGCATTTCACCGCTACACAGGAAATTCCACCACCCTCTACCATACTCTAGTCAGTCAGTTTTGAATGCAGTTCCCAGGTTGAGCCCGGGGATTTCACATCCAACTTAACAAACCACCAACCCGCGCTTTACGCCCAGCAATTC
+
?????@@BDDDDDDDDEFFFFFCFFHHHHHHGFHHHHHCDDHHDEDEHHHHHHHFEHFHGGGHHHHHHFCFHHHHF=EEEDCDEHEHHFCFHHEFHFHHFFFFFFFFFFFEEDEEDDDDEE<6@EBCEEFFFFECEEEEECEEE8:CEFFAECECEFAEFE?CEEEECAAAEAEEEFFCEFFFFEE?CEEAEFE'.8?88:?*:AE:CE?*1*:?C*?A?EAEE###########################
@MISEQ04:37:000000000-A2G8E:1:1101:14713:1991 1:N:0:TCCACAGGAGT
AACGGAGGGGGCAAGTGTTTCTCGCAATGACTGGGCCTAAAGGGCACGCAGGTGGTTTTCGACAACAGGTATTTCGGTTAAACACTGCAGGCTAACAACAGGTCTGGAATATCTACTAGGAAACTAAGAGTAGTGCTCAGGTCTTTAGAATTGCTAGCGGAGGGGTGGAATCCGGCGAGGCTAGTAGGAATGCTTATGAGTGAAGGCAATTTTCTGGAGCTGACTGACGCTCAGGTGCGCAAGCATGGGGA
+
9?????@@DDDDDDDDFFFFFFIEHHHHHHHIIHHIIHHHIHHHIHHEHHHHAEFEHIIIHHHH=FHHHC=DFHFFHEHHFFFFFFFFFDEEEFFFFFFFFFBEEFE=BEEEFFFFEEEEAECEFFFFFFCCEEFFFF?AECAEFFEEEEFFFFFEEEDD8<>DD)8>AEECEA?D?D?D>C?C??:E1?CEEAE?:CAECEAEFFFE8AEEF:?:A:8?*?*:?CAEEEEADCC*0??DD8<?ECEEEE#
@MISEQ04:37:000000000-A2G8E:1:1101:14713:1991 2:N:0:TCCACAGGAGT
ACGGACTACTGGGGTATCTAATCCTATTTGATCCCCATGCTTGCGCACCTGAGCGTCAGTCAGCTCCAGAAAATTGCCTTCACTCATAAGCATTCCTACTAGCCTCGCCGGATTCCACCCCTCCGCTAGCAATTCTAAAGACCTGAGCACTACTCTTAGTTTCCTAGTAGATATTCCAGACCTGTTGTTAGCCTGCAGTGTTTAACCGAAATACCTGTTGTCGCAAACCACCTGCGTGCCCTTTAGGCCCA
+
AAA?AABBDDDDDD<AFFFGFGHIHFFHHIIHHHHIHIIIIIHHHH@HHIIFHHHHHHHIIHHIIHHGHHFHIIIHHHFCECGHHFHIIHHHHHHHHHHHHHFHDHHHHGGGGGDEEGDEGCGGEEGGGGGGEEGGGEGEEEGCGGCGCEGGGGGGGGGEGGGGEGGEGG?CGGGGGEGGGGGGGGCGEGGCEEGGGGEECEG?C:?828<CCE?EGGGCCCC*.).CC?CEECE8CEC*11CEEE#####
@MISEQ04:37:000000000-A2G8E:1:1101:13997:2108 1:N:0:TCCACAGGAGT
TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTAATTAAACCAGTTGTGAAATCCCCGGGGTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGGATTCAGCGGGTAGCCGGGAAAAGCGTAGATATGCCGAGGAAACACGGAGGCGAAGGGAATTCTCTGGAACTGGACTTGCGCTCCTGCACGAAAAGCTGGGGAGGAAACA
+
?????BB?BDDDBBBDDDEEFFHIHHHHHHHIHHHIHHHHIHHHEHECEHECEHH<<<,,,,5,,44+4C,@D,CF,,@FF);@))34AAC################################################################################################################################################################
@MISEQ04:37:000000000-A2G8E:1:1101:13997:2108 2:N:0:TCCACAGGAGT
ACGGACTACAAGGGTTTCTAATCCTGTTTGCTCCCCACGCTTTCGTGCATGAGCGTCAGTACAGGTCCAGAGGATTGCCTTCGCCATCGGTGTTCCTCCGCATATCTACGCATTTCACTGCTACACGCGGAATTCCATCCCCCTCTACCGTACTCTAGCTATACAGTCACAGATGCAATTCCCAGGTTGAGCCCGGGGATTTCACAACTGTCTTATATAACCGCCTGCGCACGCTTTACGCCCAGCAATTC
+
?????@@BDDDBDD?BEFFFFFFHIIHHHHHIIHHHIC=DDFFGHHFHHIIIHFCCEEHGHIHHH-AEFHDDFFHHHFGGFFHHHHHFHECDEEDHHDFCDEDDFFDFFF@DDED=DEED=,ACFFAEDEDDAEFFFFE?C??8EEEF:8).:AAAEF?CEAECEA?:::CC:?EEEFFE?CCECE*?*:?ADDD84)*1:?EEEECA*00::*::CE:?>'.A?EDD;''')08*AEAD48?######
02-13-2013, 10:01 AM   #2
ostrakon

BTW, another question:
1. Do the sequences look like they have already been de-multiplexed?
Also, I suspect the index reads are missing because I only see long reads like the ones shown above, not the shorter ones seen elsewhere.
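One quick way to check this is to tally the tag at the end of every header line: a single tag suggests the file holds one sample, while several tags suggest a pooled run. The snippet below is only a minimal sketch, assuming 4-line FASTQ records (no wrapped sequences) and headers shaped like the ones above. `tally_index_tags` is a hypothetical helper name, not part of any existing tool.

```python
from collections import Counter
from typing import Iterable


def tally_index_tags(lines: Iterable[str]) -> Counter:
    """Count the tag at the end of each FASTQ header line."""
    tags: Counter = Counter()
    for i, line in enumerate(lines):
        # Use the line position, not a leading '@', to find headers:
        # quality strings may also start with '@'.
        if i % 4 != 0:
            continue
        header = line.rstrip("\n")
        # The tag is the last colon-separated field, e.g. "1:N:0:TAG".
        tags[header.rsplit(":", 1)[-1]] += 1
    return tags


# Tiny in-memory demo; sequences and qualities are shortened placeholders.
demo = [
    "@MISEQ04:37:000000000-A2G8E:1:1101:14157:1957 1:N:0:TCCACAGGAGT",
    "ACGT", "+", "IIII",
    "@MISEQ04:37:000000000-A2G8E:1:1101:14157:1957 2:N:0:TCCACAGGAGT",
    "TGCA", "+", "@@@@",
]
print(tally_index_tags(demo))
```

For a real file, the same function can be given an open file handle, since it only iterates over lines.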
02-13-2013, 10:22 AM   #3
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 7,049

The reads are already de-multiplexed. The tag for the particular sample is appended to the end of the ID line (example below, from your data):

@MISEQ04:37:000000000-A2G8E:1:1101:13997:2108 2:N:0:TCCACAGGAGT
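An ID line like this can be taken apart programmatically. The sketch below assumes the Illumina CASAVA 1.8-style layout, in which the part after the space reads `read:is_filtered:control_number:index`. The names `ReadHeader` and `parse_header` are hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple


class ReadHeader(NamedTuple):
    read_id: str        # instrument:run:flowcell:lane:tile:x:y
    read_number: int    # 1 or 2
    is_filtered: bool   # True when the flag is "Y"
    control_bits: int   # 0 when no control bits are set
    index_tag: str      # trailing tag, e.g. TCCACAGGAGT


# Assumed layout: "@<read_id> <read>:<Y|N>:<control>:<tag>"
_HEADER_RE = re.compile(
    r"^@(?P<read_id>\S+) (?P<read>[12]):(?P<filt>[YN]):(?P<ctrl>\d+):(?P<tag>\S*)$"
)


def parse_header(line: str) -> ReadHeader:
    """Split one FASTQ ID line into its fields, or raise ValueError."""
    m = _HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized FASTQ header: {line!r}")
    return ReadHeader(
        read_id=m["read_id"],
        read_number=int(m["read"]),
        is_filtered=m["filt"] == "Y",
        control_bits=int(m["ctrl"]),
        index_tag=m["tag"],
    )


header = parse_header(
    "@MISEQ04:37:000000000-A2G8E:1:1101:13997:2108 2:N:0:TCCACAGGAGT"
)
print(header.read_number, header.index_tag)
```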

The two reads (pairs R1--> <--R2) also appear to have been already interlaced.
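If separate R1 and R2 files are needed, an interlaced file can be split by taking records two at a time. This is a rough sketch, assuming 4-line records, strict R1/R2 alternation, and no trailing blank lines. `read_records` and `deinterleave` are hypothetical helper names.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Record = List[str]  # [header, sequence, plus line, quality]


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an iterable of lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def deinterleave(lines: Iterable[str]) -> Iterator[Tuple[Record, Record]]:
    """Yield (R1, R2) record pairs from an interlaced FASTQ stream."""
    records = read_records(lines)
    for r1 in records:
        r2 = next(records, None)
        if r2 is None:
            raise ValueError("odd number of records; R2 is missing")
        # Mates share everything before the space in the ID line.
        if r1[0].split()[0] != r2[0].split()[0]:
            raise ValueError(f"mates do not match: {r1[0]!r} / {r2[0]!r}")
        yield r1, r2


# Shortened placeholder sequences; only the headers come from the thread.
demo = [
    "@MISEQ04:37:000000000-A2G8E:1:1101:14157:1957 1:N:0:TCCACAGGAGT",
    "ACGT", "+", "IIII",
    "@MISEQ04:37:000000000-A2G8E:1:1101:14157:1957 2:N:0:TCCACAGGAGT",
    "TGCA", "+", "IIII",
]
for r1, r2 in deinterleave(demo):
    print(r1[1], r2[1])
```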

Last edited by GenoMax; 02-13-2013 at 10:32 AM.
02-13-2013, 10:30 AM   #4
GenoMax

I assume you have multiple sample files? Each file should have the sample_ID (somewhere in the file name).
02-13-2013, 11:34 AM   #5
ostrakon

Quote:
Originally Posted by GenoMax
I assume you have multiple sample files? Each file should have the sample_ID (somewhere in the file name).
Thanks, GenoMax. The raw FASTQ file these sequences came from is a pooled sample of multiple indexed samples.
02-13-2013, 11:47 AM   #6
GenoMax

Quote:
Originally Posted by ostrakon
Thanks, GenoMax. The raw FASTQ file these sequences came from is a pooled sample of multiple indexed samples.
So hopefully you have the mapping of "tags" <--> "sample_ID" and can split the original file, if needed.
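As a rough illustration of such a split, the sketch below groups records by the tag at the end of each ID line and looks the tag up in a tag-to-sample mapping. It assumes 4-line records. The sample names and the second tag in the demo are made up, and `split_by_sample` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def split_by_sample(
    lines: Iterable[str],
    tag_to_sample: Dict[str, str],
    unknown: str = "unassigned",
) -> Dict[str, List[str]]:
    """Group FASTQ lines by sample, using the tag in each header."""
    it = iter(lines)
    by_sample: Dict[str, List[str]] = defaultdict(list)
    # zip over the same iterator four times yields one record per step;
    # a trailing incomplete record is silently dropped.
    for record in zip(it, it, it, it):
        header = record[0].rstrip("\n")
        tag = header.rsplit(":", 1)[-1]
        sample = tag_to_sample.get(tag, unknown)
        by_sample[sample].extend(line.rstrip("\n") for line in record)
    return dict(by_sample)


# Hypothetical mapping; real tags and sample IDs come from the mapping file.
mapping = {"TCCACAGGAGT": "sample_A"}
demo = [
    "@MISEQ04:37:000000000-A2G8E:1:1101:14157:1957 1:N:0:TCCACAGGAGT",
    "ACGT", "+", "IIII",
    "@MISEQ04:37:000000000-A2G8E:1:1101:99999:9999 1:N:0:ACGTACGTACG",
    "TTTT", "+", "IIII",
]
for sample, sample_lines in split_by_sample(demo, mapping).items():
    print(sample, len(sample_lines) // 4, "record(s)")
```

Holding every record in memory keeps the example short; for a full MiSeq run, writing each record straight to a per-sample file would probably be the better choice.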
02-13-2013, 12:54 PM   #7
ostrakon

Quote:
Originally Posted by GenoMax
So hopefully you have the mapping of "tags" <--> "sample_ID" and can split the original file, if needed.
Thank you again, GenoMax. I have the mapping info. I am very glad that the QIIME guys have resolved this issue: they wrote a Python script that parses the raw FASTQ file, extracts the index, and adds fake quality scores to make an index read file. I am now happily splitting libraries using the mapping info and the index read file generated this way.
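The general idea can be sketched as follows. This is not the QIIME script itself, only a minimal illustration under stated assumptions: 4-line records, the tag as the last colon-separated field of the ID line, one index record emitted per input record in the same order, and an arbitrary placeholder quality character (`I` here). `make_index_records` is a hypothetical name.

```python
from typing import Iterable, Iterator


def make_index_records(lines: Iterable[str], fake_qual: str = "I") -> Iterator[str]:
    """Yield the lines of an index-read FASTQ built from the header tags."""
    it = iter(lines)
    # One step per 4-line record; a trailing incomplete record is dropped.
    for header, _seq, _plus, _qual in zip(it, it, it, it):
        header = header.rstrip("\n")
        tag = header.rsplit(":", 1)[-1]
        # Same ID line, tag as the sequence, placeholder quality string
        # of matching length (an assumption, not a measured quality).
        yield from (header, tag, "+", fake_qual * len(tag))


demo = [
    "@MISEQ04:37:000000000-A2G8E:1:1101:14157:1957 1:N:0:TCCACAGGAGT",
    "ACGT", "+", "IIII",
]
print("\n".join(make_index_records(demo)))
```

Whether the downstream tool expects one index record per read or per read pair depends on how the read files are laid out, so the output order should be checked against the read file before splitting libraries.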

Here are the references in case anyone has a FASTQ file similar to mine:
https://groups.google.com/forum/?fro...um/0O2pMsmORCQ
https://groups.google.com/forum/?fromgroups=#!topicsearchin/qiime-forum/mariana$20illumina$20sequencing/qiime-forum/TJOtt13jlPs