06-19-2011, 11:06 PM   #6
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628

Quote:
Originally Posted by atgc
Initially, the paired-end read data given to me came as two files per sample: a forward and a reverse read. However, this new data set includes four or more files per sample. I don't understand why this is.
If this data was generated with CASAVA 1.8, the reason is that every FASTQ file it writes holds a fixed number of sequences (at most 16 million), except for the last one, which holds the remainder. So one lane of HiSeq data is almost always split into more than one FASTQ file per read direction. Files that share the same sample, barcode, lane and read number (R1 or R2) belong together; the trailing 001, 002, ... is just the chunk index.

E.g.
Code:
sample2_CGATGT_L003_R1_001.fastq.gz
sample2_CGATGT_L003_R2_001.fastq.gz
sample2_CGATGT_L003_R1_002.fastq.gz
sample2_CGATGT_L003_R2_002.fastq.gz
sample2_CGATGT_L003_R1_003.fastq.gz
sample2_CGATGT_L003_R2_003.fastq.gz
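
If you would rather work with one file per read direction again, the chunks can be grouped by name and joined in chunk order. Below is a minimal Python sketch, not an official tool. It assumes the naming pattern seen in the listing above (sample, barcode, lane, R1/R2, three-digit chunk index); the names `group_chunks` and `merge_chunks` are made up for illustration. It relies on the fact that gzip files concatenated byte-wise still form a valid gzip file. Merging R1 and R2 in the same chunk order keeps the read pairs in sync.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming pattern: <sample>_<barcode>_L<lane>_R<read>_<chunk>.fastq.gz
NAME_RE = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq\.gz$"
)


def group_chunks(filenames):
    """Map (sample, barcode, lane, read) to its chunk files, sorted by chunk index."""
    groups = defaultdict(list)
    for name in filenames:
        m = NAME_RE.match(Path(name).name)
        if m is None:
            continue  # does not follow the assumed pattern; skip it
        key = (m["sample"], m["barcode"], m["lane"], m["read"])
        groups[key].append((int(m["chunk"]), name))
    return {key: [n for _, n in sorted(chunks)] for key, chunks in groups.items()}


def merge_chunks(chunk_files, out_path):
    """Concatenate gzip chunks byte-wise into one multi-member gzip file."""
    with open(out_path, "wb") as out:
        for name in chunk_files:
            with open(name, "rb") as src:
                shutil.copyfileobj(src, out)
```

Usage would be something like calling `group_chunks` on the directory listing, then `merge_chunks(files, "sample2_L003_R1.fastq.gz")` for each group (the output name is only an example). Check that R1 and R2 end up with the same number of chunks before merging.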
hth, Sven