Old 08-20-2013, 03:02 PM   #6
GenoMax

Quote:
@IPAR1:2:1:4029:1196:1#0/1
This is the important part of the three files to look at. According to the description of the fastq format (Illumina sequence identifiers), that string uniquely identifies a cluster. The /1, /2 and /3 on the end signify that these are R1 = forward read, R2 = tag read and R3 = reverse read (as you have already figured out).
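As a rough illustration of how that identifier breaks down, here is a minimal Python sketch. The function name split_read_id and the ROLES table are made up for this example, and it assumes every header ends in a /N suffix and carries a #index part, as in the line quoted above.

```python
def split_read_id(header):
    """Split '@cluster#index/read' into (cluster, index, read_number).

    Hypothetical helper: assumes the header ends in '/N' and contains
    a '#index' part, like the identifiers shown in this thread.
    """
    name = header.strip().lstrip("@")
    # Everything after the last '/' is the read number (1, 2 or 3).
    stem, _, read_number = name.rpartition("/")
    # Everything before '#' is the part that identifies the cluster.
    cluster, _, index = stem.partition("#")
    return cluster, index, int(read_number)


# Meaning of the suffixes as described in this post.
ROLES = {1: "forward read", 2: "tag read", 3: "reverse read"}

cluster, index, read_number = split_read_id("@IPAR1:2:1:4029:1196:1#0/2")
print(cluster, index, ROLES[read_number])
```

Records from the three files that return the same cluster string belong together.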

So for the following tag read:

@IPAR1:2:1:4029:1196:1#0/2
TGACCTTGATCTCGT
+
HIHIIGIIIH8CCDC

The two corresponding real reads are the /1 and /3 records. In the Illumina pipeline the tag read is automatically taken into account and added to the ID lines of R1 and R2 (the reverse read takes the R2 designation), like so:

Quote:
@HWUSI-EAS100R:6:73:941:1973#NNNNN/1 (NNNNN = tag)
When you split the files (either with your own script or with QIIME), make sure that you add the tag sequence to the ID; otherwise it may be difficult to keep track of it later on.
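If you go the own-script route, the relabelling step could look something like the sketch below. This is only one way to do it, not what the Illumina pipeline or QIIME does internally. The names relabel and tag_pair are hypothetical, and the sketch assumes all three headers end in /N and that the index after # can simply be replaced by the tag sequence.

```python
def relabel(header, tag, new_read_number):
    """Return the header with the tag in the '#index' slot and a new '/N'."""
    stem = header.strip().rpartition("/")[0]   # '@cluster#index'
    cluster = stem.partition("#")[0]           # '@cluster'
    return f"{cluster}#{tag}/{new_read_number}"


def tag_pair(r1_header, tag_header, tag_seq, r3_header):
    """Build new ID lines for the forward and reverse read of one cluster.

    The forward read stays /1; the reverse read (/3) becomes /2.
    """
    # All three records must come from the same cluster.
    stems = {h.strip().rpartition("/")[0]
             for h in (r1_header, tag_header, r3_header)}
    if len(stems) != 1:
        raise ValueError("records do not belong to the same cluster")
    return relabel(r1_header, tag_seq, 1), relabel(r3_header, tag_seq, 2)


new_r1, new_r2 = tag_pair(
    "@IPAR1:2:1:4029:1196:1#0/1",
    "@IPAR1:2:1:4029:1196:1#0/2",
    "TGACCTTGATCTCGT",
    "@IPAR1:2:1:4029:1196:1#0/3",
)
print(new_r1)
print(new_r2)
```

The cluster check is there because the three files are only useful together if their records stay in the same order; a mismatch is a sign that one file was filtered or sorted separately.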

You should also make sure the files are in the correct fastq format:

Quote:
@ID
Sequence goes on this line
+
Quality values for corresponding bases on this line
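
To check that a file really follows this four-line layout, a small reader along these lines may help. It is a sketch only: read_fastq is a hypothetical name, and it assumes each record takes exactly four lines with no wrapped sequence or quality lines.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record fastq."""
    while True:
        header = handle.readline()
        if not header:
            return                      # end of file
        if not header.strip():
            continue                    # tolerate stray blank lines
        header = header.rstrip("\n")
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"ID line does not start with '@': {header!r}")
        if not plus.startswith("+"):
            raise ValueError(f"third line does not start with '+': {plus!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header, seq, qual


# The tag read from this post, used as in-memory test input.
example = io.StringIO(
    "@IPAR1:2:1:4029:1196:1#0/2\n"
    "TGACCTTGATCTCGT\n"
    "+\n"
    "HIHIIGIIIH8CCDC\n"
)
records = list(read_fastq(example))
print(records)
```

For a real file, pass an open file handle instead of the StringIO object.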