Hello. Here is the problem setup.
I have received fastq.gz files that were run in "four different pools". We have 10 samples per pool, and each sample was sequenced paired-end, so we have two read files (R1 and R2) per sample. Each sample was also run on two lanes, so in actuality there are 4 read files for each sample.
I am interested in creating the BAM files for each pool, and would just like to make sure I understand what is going on.
So first I am concatenating the same read from each lane into a single file, i.e. SameID_Lane1_R1.fastq.gz is concatenated with SameID_Lane2_R1.fastq.gz, and likewise for R2.
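A minimal sketch of this concatenation step in Python, assuming the lane files are ordinary gzip files. Gzip streams can be appended byte-for-byte and still decompress as one stream, which is what `cat` relies on. The function name `concat_lanes` and the merged file name in the usage comment are hypothetical, chosen only for illustration.

```python
import shutil
from pathlib import Path


def concat_lanes(lane_files, merged_path):
    """Append gzip files byte-for-byte, in the given order, into merged_path.

    Hypothetical helper: equivalent in effect to `cat lane1 lane2 > merged`.
    """
    with open(merged_path, "wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as src:
                # Copy raw compressed bytes; no decompression is needed.
                shutil.copyfileobj(src, out)
    return Path(merged_path)


# Example usage (the merged file name is an assumption):
# concat_lanes(
#     ["SameID_Lane1_R1.fastq.gz", "SameID_Lane2_R1.fastq.gz"],
#     "SameID_R1.merged.fastq.gz",
# )
```

The lane order should be the same for R1 and R2, so that the mates stay in matching order across the two merged files.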
After concatenating these files with cat in Linux, is there an error check I should perform to make sure the concatenation was done correctly?
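One possible check, sketched here in Python under the assumption that every FASTQ record is exactly four lines: count the reads in each lane file and confirm the merged file holds their sum. Reading the whole merged file also exercises gzip decompression, so a truncated or corrupt file would raise an error. The function names `count_fastq_records` and `check_merge` are hypothetical.

```python
import gzip


def count_fastq_records(path):
    """Count reads in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_merge(lane_files, merged_file):
    """Return True if the merged file has as many reads as the lanes combined."""
    expected = sum(count_fastq_records(f) for f in lane_files)
    return count_fastq_records(merged_file) == expected
```

For paired-end data it may also be worth confirming that the merged R1 and R2 files report the same read count, since aligners generally expect the mates in matching order.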
And secondly, how do I align the four different pools? Should I align each pool independently of the others? For pool 1, say, would I just run TopHat on those ten sample IDs and produce their BAM files? Or should I align all 4 pools together?
Thank you.
To my understanding, a "pool" is just a batch of samples that were processed together in one run.