08-20-2013, 03:02 PM   #6

This is the important part of the three files that you need to look at. If you check the description of the FASTQ format (Illumina sequence identifiers), you will see that this string uniquely identifies a cluster. The /1, /2 and /3 on the end signify that these are R1 = forward read, R2 = tag read and R3 = reverse read (as you have already figured out).

So for the following tag read:


The two corresponding real reads are in the /1 and /3 files. In the Illumina pipeline the tag read is automatically taken into account and its sequence is added to the ID lines of R1 and R2 (the reverse read takes the R2 designation), like so:

@HWUSI-EAS100R:6:73:941:1973#NNNNN/1 (NNNNN = tag)
When you split the files (either with your own script or with QIIME), make sure that you add the tag sequence to the ID; otherwise it may be difficult to keep track of it later on.
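
If you go the own-script route, something along these lines could work. This is only a rough sketch, not what the Illumina pipeline or QIIME actually does: the function names (read_fastq, cluster_id, tag_reads) are made up for illustration, and it assumes the three files list the clusters in the same order and that the IDs end in /1, /2 and /3 as described above.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = record
        yield header, seq, qual


def cluster_id(header):
    """Drop the leading '@' and the trailing /1, /2 or /3 read suffix."""
    name = header[1:] if header.startswith("@") else header
    return name.rsplit("/", 1)[0]


def tag_reads(forward, tags, reverse):
    """Pair forward/reverse reads and write the tag sequence into their IDs.

    Assumption: all three inputs list the clusters in the same order.
    Note that zip() stops at the shortest input without complaining.
    """
    for (h1, s1, q1), (h2, tag, _), (h3, s3, q3) in zip(forward, tags, reverse):
        cid = cluster_id(h1)
        if not (cid == cluster_id(h2) == cluster_id(h3)):
            raise ValueError(f"reads out of sync at {h1}")
        base = cid.split("#", 1)[0]  # discard whatever followed '#' before
        # The reverse read takes the /2 designation once the tag read is gone.
        yield (f"@{base}#{tag}/1", s1, q1), (f"@{base}#{tag}/2", s3, q3)
```

In practice you would pass open file handles for the three files to read_fastq() and write each returned record back out as four lines.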

You should also format the files so they are in the correct FASTQ format:

@Sequence identifier goes on this line
Sequence goes on this line
+
Quality values for corresponding bases on this line
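
To make that layout concrete, here is a small sketch that writes one record in the four-line form and sanity-checks a record that has been read back in. The helper names (format_fastq, is_valid_record) are hypothetical, and the check is deliberately minimal; it does not validate the quality encoding.

```python
def format_fastq(identifier, sequence, quality):
    """Return one FASTQ record as a four-line string."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must be the same length")
    return f"@{identifier}\n{sequence}\n+\n{quality}\n"


def is_valid_record(lines):
    """Check the basic shape of a four-line FASTQ record."""
    return (
        len(lines) == 4
        and lines[0].startswith("@")         # ID line
        and lines[2].startswith("+")         # separator line
        and len(lines[1]) == len(lines[3])   # one quality value per base
    )


record = format_fastq("HWUSI-EAS100R:6:73:941:1973#NNNNN/1", "ACGTA", "IIIII")
print(record, end="")
```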
GenoMax