09-10-2010, 04:15 AM
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542

Quote:
Originally Posted by sandhya
Dear all,

I have recently started working on sequencing data. We have paired-end Illumina reads in FASTQ format, and I have some questions about them.

1. In the FASTQ format, what do the numbers in the first line mean?
@0:1:1:34:429
GAAGNAAAAATAAAAGCATTAGNAGAAATTTGTACA
+
IIII$IIIII&IIIIIIIIIII$IIIIIIIIIIIII
In general, FASTQ identifiers, like FASTA identifiers, can mean anything. In this case, they tell you where on the flow cell (the "slide") this read came from; see:
http://en.wikipedia.org/wiki/FASTQ_f...ce_identifiers
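If you want to pull those numbers apart programmatically, here is a minimal Python sketch. It assumes your headers follow the older Illumina layout described on that Wikipedia page (instrument:lane:tile:x:y, optionally followed by #index and /pair). The field names and the function name parse_old_illumina_header are my own labels, and whether the leading 0 in your file really sits in the instrument-name slot is an assumption you should check against your pipeline's documentation.

```python
import re
from typing import Dict, Optional

# Assumed (older Illumina style) layout: @instrument:lane:tile:x:y
# optionally followed by "#index" and "/pair". Field names are assumptions.
_HEADER_RE = re.compile(
    r"^@(?P<instrument>[^:]+):(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"(?:#(?P<index>[^/]+))?(?:/(?P<pair>[12]))?$"
)


def parse_old_illumina_header(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Split an old-style Illumina FASTQ title line into named fields.

    Returns None if the line does not match the assumed layout.
    Missing optional parts (index, pair) come back as None.
    """
    match = _HEADER_RE.match(line.strip())
    return match.groupdict() if match else None


print(parse_old_illumina_header("@0:1:1:34:429"))
# {'instrument': '0', 'lane': '1', 'tile': '1', 'x': '34', 'y': '429', 'index': None, 'pair': None}
print(parse_old_illumina_header("@HWUSI-EAS100R:6:73:941:1973#0/1"))
# {'instrument': 'HWUSI-EAS100R', 'lane': '6', 'tile': '73', 'x': '941', 'y': '1973', 'index': '0', 'pair': '1'}
```

The regular expression is deliberately strict, so a header from a newer (CASAVA 1.8 style) pipeline will simply return None rather than being mis-parsed.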

Quote:
Originally Posted by sandhya
2. I see that these numbers (i.e. the first lines) always have a one-to-one mapping between the two paired datasets (i.e. the left and right reads). So is it right to say that the first entry in dataset 1 (left reads) is paired with the first entry in dataset 2 (right reads), and so on?

Please advise.
Yes, there should be a one-to-one mapping between the forward reads file and the reverse reads file, i.e. the same fragments in the same order.

P.S. It is also common for Illumina forward reads to have a /1 suffix and reverse reads to have a /2 suffix. Yours don't, for some reason.
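To check the one-to-one pairing yourself, here is a rough Python sketch that walks the two files in lockstep and compares the read identifiers, ignoring any /1 or /2 suffix so it works whether or not your files have them. It assumes plain four-line FASTQ records (no line wrapping) and compares only the first word of each title line; read_fastq, pair_key and check_pairs are hypothetical helper names, not a library API. For real work, Biopython's Bio.SeqIO.parse(handle, "fastq") would be a more robust parser.

```python
import io
from itertools import zip_longest
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (title, sequence, quality) from simple four-line FASTQ records.

    Assumes no line wrapping: exactly one sequence line and one quality line.
    """
    while True:
        title = handle.readline()
        if not title:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {title.strip()!r}")
        yield title[1:].rstrip("\n"), seq, qual


def pair_key(title: str) -> str:
    """First word of the title, minus any trailing /1 or /2 pair suffix."""
    words = title.split()
    first = words[0] if words else ""
    if first.endswith(("/1", "/2")):
        first = first[:-2]
    return first


def check_pairs(fwd: TextIO, rev: TextIO) -> int:
    """Walk both files in lockstep and return the number of pairs checked.

    Raises ValueError if the files hold different numbers of reads or if
    the n-th forward and n-th reverse identifiers disagree.
    """
    count = 0
    for f, r in zip_longest(read_fastq(fwd), read_fastq(rev)):
        if f is None or r is None:
            raise ValueError(f"Read counts differ after {count} pairs")
        if pair_key(f[0]) != pair_key(r[0]):
            raise ValueError(f"Pair {count + 1}: {f[0]!r} vs {r[0]!r}")
        count += 1
    return count


# Usage with in-memory data; in practice pass open(...) handles for your
# forward and reverse files (file names are up to you).
fwd = io.StringIO("@0:1:1:34:429\nGAAGN\n+\nIIII$\n")
rev = io.StringIO("@0:1:1:34:429\nTGTAC\n+\nIIIII\n")
print(check_pairs(fwd, rev))  # 1
```

Using zip_longest rather than plain zip matters here: zip would silently stop at the shorter file, hiding exactly the kind of truncation you want to catch.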