Hi
Even after reading several posts, I'm still confused about merging data from multiple lanes...
I've got a single library that was run on four lanes (each lane also contained several other libraries, but the paired-end fastq files are already separated by library). My goal is to generate a single consensus sequence of the mitochondrial genome for each library.
My overall pipeline for trimmed reads consists of 1) alignment to the reference with BWA mem, 2) converting SAM to BAM, 3) sorting with Picard tools, 4) removing duplicates with Picard tools, 5) removing ambiguous reads with samtools, and 6) splitting the BAM file into separate nuclear and mitochondrial BAMs (using samtools).
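For concreteness, the per-lane steps above could be sketched roughly as follows. This is only an illustration written as a small Python helper that builds the command strings without running anything. The file names, the `picard.jar` path, the contig name `chrM`, the MAPQ cutoff of 20 for "ambiguous" reads, and the read-group fields are all assumed placeholders, not necessarily the exact commands used. Steps 1 and 2 are combined into one pipe, and only the mitochondrial half of step 6 is shown.

```python
import shlex


def read_group(library, lane, sample):
    """Build an @RG line for `bwa mem -R`, with tabs written as a literal backslash-t."""
    fields = [
        "@RG",
        f"ID:{library}.L{lane}",  # assumed convention: one read group per library/lane
        f"SM:{sample}",
        f"LB:{library}",
        "PL:ILLUMINA",            # assumed platform
    ]
    return "\\t".join(fields)


def lane_commands(ref, fq1, fq2, library, lane, sample, mito="chrM", min_mapq=20):
    """Return the shell commands for one lane as strings (nothing is executed)."""
    p = f"{library}.L{lane}"  # hypothetical output prefix
    rg = shlex.quote(read_group(library, lane, sample))
    return [
        # 1-2) align and convert SAM to BAM
        f"bwa mem -R {rg} {ref} {fq1} {fq2} | samtools view -b -o {p}.bam -",
        # 3) coordinate sort with Picard
        f"java -jar picard.jar SortSam I={p}.bam O={p}.sorted.bam SORT_ORDER=coordinate",
        # 4) remove duplicates with Picard
        f"java -jar picard.jar MarkDuplicates I={p}.sorted.bam O={p}.dedup.bam "
        f"M={p}.metrics.txt REMOVE_DUPLICATES=true",
        # 5) drop low mapping-quality reads (assumed meaning of "ambiguous")
        f"samtools view -b -q {min_mapq} -o {p}.filtered.bam {p}.dedup.bam",
        # 6) index, then extract the mitochondrial contig
        f"samtools index {p}.filtered.bam",
        f"samtools view -b -o {p}.mito.bam {p}.filtered.bam {mito}",
    ]


if __name__ == "__main__":
    for lane in (1, 2, 3, 4):
        for cmd in lane_commands("ref.fa", f"libA_L{lane}_R1.fq", f"libA_L{lane}_R2.fq",
                                 "libA", lane, "sample1"):
            print(cmd)
```

Running this once per lane would give four `*.mito.bam` files, each carrying its own read group, which are the files I'm asking about merging.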
I'm specifically wondering whether there are any problems with merging the resulting mitochondrial BAMs after running this pipeline separately for each lane. Or should I be merging the data from the four lanes at an earlier step?
Also, I'm really confused about the concept of @RG and @SQ... Is an RG simply the BAM version of SQ? I was thinking of using samtools merge with the -r parameter specified... does this replace the original RGs somehow? How does the RG affect things downstream (for me, I'm eventually generating an mpileup file and then a consensus sequence)?
Thanks in advance for the help!