**TiborNagy** · 03-25-2014, 05:57 AM

It depends on the experiment. If your samples come from different conditions, you cannot combine them. If you keep the original files, you can use them as replicates, which gives more statistical power.

**GenoMax** · 03-25-2014, 06:06 AM

As far as the data files for a single sample (on a flowcell) are concerned, having them in many small pieces or in a single large file is equivalent. One can set up the Illumina CASAVA pipeline to generate a single file (instead of the ~2 M sequence file chunks that are produced by default).
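To illustrate the equivalence: the gzip format allows several compressed members to sit back to back in one file, and a reader decompresses them as one continuous stream. A minimal Python sketch, using two made-up records rather than real data:

```python
import gzip

# Two made-up FASTQ chunks (hypothetical records, not real data).
chunk1 = b"@read1\nACGT\n+\nIIII\n"
chunk2 = b"@read2\nTTGA\n+\nIIII\n"

# Compress each chunk on its own, as if they were two separate chunk files.
part1 = gzip.compress(chunk1)
part2 = gzip.compress(chunk2)

# Byte-wise concatenation of gzip members is itself a valid gzip stream.
combined = part1 + part2

# Decompressing the combined stream returns the chunks in their original order.
merged = gzip.decompress(combined)
```

So joining the pieces loses nothing, as long as they are joined in a consistent order.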

**shirley0818** · 03-25-2014, 06:39 AM

Thank you both for your quick replies.
GenoMax, the protocol used in our project runs a mixture of 7 samples in each lane, lanes 1-7, on each flowcell. For each sample, data are generated from lanes 1-7, and within each lane there are multiple small (~300 MB) sequence file chunks, as shown below. Can one still set up the Illumina CASAVA pipeline to generate a single file that is equivalent to the following small pieces? Thanks a lot!

300103469 Mar 19 9:48 2894_CCTTCA_L001_R1_001.fastq.gz
299267851 Mar 19 9:47 2894_CCTTCA_L001_R1_002.fastq.gz
296812322 Mar 19 9:53 2894_CCTTCA_L001_R1_003.fastq.gz
298068175 Mar 19 9:56 2894_CCTTCA_L001_R1_004.fastq.gz
298941666 Mar 19 9:59 2894_CCTTCA_L001_R1_005.fastq.gz
297368542 Mar 19 10:00 2894_CCTTCA_L001_R1_006.fastq.gz
295074828 Mar 19 10:02 2894_CCTTCA_L001_R1_007.fastq.gz
27339550 Mar 19 10:02 2894_CCTTCA_L001_R1_008.fastq.gz
299788150 Mar 19 9:48 2894_CCTTCA_L002_R1_001.fastq.gz
297005199 Mar 19 9:49 2894_CCTTCA_L002_R1_002.fastq.gz
299336456 Mar 19 9:51 2894_CCTTCA_L002_R1_003.fastq.gz
298957127 Mar 19 9:55 2894_CCTTCA_L002_R1_004.fastq.gz
298370958 Mar 19 9:57 2894_CCTTCA_L002_R1_005.fastq.gz
296303213 Mar 19 10:00 2894_CCTTCA_L002_R1_006.fastq.gz
297318084 Mar 19 10:01 2894_CCTTCA_L002_R1_007.fastq.gz
56309336 Mar 19 9:48 2894_CCTTCA_L002_R1_008.fastq.gz
299490670 Mar 19 10:02 2894_CCTTCA_L003_R1_001.fastq.gz
298204197 Mar 19 9:48 2894_CCTTCA_L003_R1_002.fastq.gz
298381878 Mar 19 9:52 2894_CCTTCA_L003_R1_003.fastq.gz
298207558 Mar 19 9:54 2894_CCTTCA_L003_R1_004.fastq.gz
297211698 Mar 19 9:57 2894_CCTTCA_L003_R1_005.fastq.gz
296272949 Mar 19 10:00 2894_CCTTCA_L003_R1_006.fastq.gz
295333326 Mar 19 10:01 2894_CCTTCA_L003_R1_007.fastq.gz
25252928 Mar 19 9:47 2894_CCTTCA_L003_R1_008.fastq.gz
298636337 Mar 19 9:46 2894_CCTTCA_L004_R1_001.fastq.gz
298401494 Mar 19 9:49 2894_CCTTCA_L004_R1_002.fastq.gz
298056832 Mar 19 9:52 2894_CCTTCA_L004_R1_003.fastq.gz
297487782 Mar 19 9:55 2894_CCTTCA_L004_R1_004.fastq.gz
296972912 Mar 19 9:58 2894_CCTTCA_L004_R1_005.fastq.gz
296600770 Mar 19 9:59 2894_CCTTCA_L004_R1_006.fastq.gz
296969650 Mar 19 10:01 2894_CCTTCA_L004_R1_007.fastq.gz
6172325 Mar 19 10:02 2894_CCTTCA_L004_R1_008.fastq.gz
299219937 Mar 19 9:47 2894_CCTTCA_L005_R1_001.fastq.gz
299250792 Mar 19 9:51 2894_CCTTCA_L005_R1_002.fastq.gz
299132778 Mar 19 9:53 2894_CCTTCA_L005_R1_003.fastq.gz
298451004 Mar 19 9:56 2894_CCTTCA_L005_R1_004.fastq.gz
297911999 Mar 19 9:58 2894_CCTTCA_L005_R1_005.fastq.gz
297310880 Mar 19 10:00 2894_CCTTCA_L005_R1_006.fastq.gz
295327365 Mar 19 10:01 2894_CCTTCA_L005_R1_007.fastq.gz
59057213 Mar 19 10:02 2894_CCTTCA_L005_R1_008.fastq.gz
297818921 Mar 19 9:46 2894_CCTTCA_L006_R1_001.fastq.gz
299471365 Mar 19 9:49 2894_CCTTCA_L006_R1_002.fastq.gz
299352842 Mar 19 9:53 2894_CCTTCA_L006_R1_003.fastq.gz
297294165 Mar 19 9:56 2894_CCTTCA_L006_R1_004.fastq.gz
296796918 Mar 19 9:59 2894_CCTTCA_L006_R1_005.fastq.gz
297483409 Mar 19 10:00 2894_CCTTCA_L006_R1_006.fastq.gz
295547701 Mar 19 10:02 2894_CCTTCA_L006_R1_007.fastq.gz
45004013 Mar 19 10:02 2894_CCTTCA_L006_R1_008.fastq.gz

**GenoMax** · 03-25-2014, 07:10 AM

See page 23 of the CASAVA manual for possible ways to organize the sample files based on the concept of "projects". http://supportres.illumina.com/docum..._15011196d.pdf

I am inclined to keep sample files organized on a per-lane basis, which is what CASAVA will do. It is probably faster to feed them into an aligner in parallel than as a single huge file. That said, you could cat them across lanes into one big file (you should still be able to figure out which lane a sequence came from by looking at the FASTQ ID header), but the file may become too unwieldy to handle.
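As a rough sketch of both options in Python: the helpers below group chunk files by sample, lane, and read, join one group into a single .gz file, and read the lane back out of a FASTQ header. The function names are hypothetical. The file-name pattern is inferred from the listing above, and the header layout is assumed to be the CASAVA 1.8 style (`@instrument:run:flowcell:lane:tile:x:y read:filter:control:index`), so check both against your own data.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed chunk naming, inferred from the listing:
# <sample>_<index>_L<lane>_R<read>_<chunk>.fastq.gz
CHUNK_RE = re.compile(
    r"^(?P<sample>.+)_(?P<index>[ACGTN-]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>\d)_(?P<chunk>\d{3})\.fastq\.gz$"
)


def group_chunks(names):
    """Map (sample, lane, read) to its chunk files, sorted by chunk number."""
    groups = defaultdict(list)
    for name in names:
        m = CHUNK_RE.match(Path(name).name)
        if m is None:
            continue  # skip files that do not follow the assumed pattern
        key = (m["sample"], m["lane"], m["read"])
        groups[key].append((int(m["chunk"]), name))
    return {key: [n for _, n in sorted(chunks)] for key, chunks in groups.items()}


def concat_gz(parts, out_path):
    """Join gzip files byte-wise, in the given order; the result is valid gzip."""
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def lane_from_header(header):
    """Return the lane number from a CASAVA 1.8 style FASTQ ID line (assumed layout)."""
    return int(header.split(" ")[0].split(":")[3])
```

For example, `group_chunks(os.listdir("."))` would give one list of chunks per sample, lane, and read, and each list can be passed to `concat_gz` to get per-lane files. Passing the lists for all lanes in turn to one output gives the single big file. For paired-end data, R1 and R2 must be joined in the same chunk and lane order so the mates stay in sync.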

**shirley0818** · 03-25-2014, 09:15 AM

Got it. Thank you!

How to keep the raw .fastq.gz files for RNASeq data
