Hi Folks,
I was hoping that someone could shed some light on an issue that is driving me crazy. I am aligning some trimmed reads (from cutadapt 1.9.1, PE mode) using bowtie 2 v2.3.2. I aligned my biological replicates and then aligned all replicates together to generate a concatenated file. When peak calling with macs2 I noticed that the concatenated file did not have the expected number of peaks. Since I thought this could be an issue with piping to samtools v1.5 to make bam files, I instead kept things simple and realigned sending my output to a sam file instead (line breaks added for clarity):
#rep1
bowtie2 -p 4 -q -x hg19 \
-1 rep1_1_1.fastq.gz,rep1_2_1.fastq.gz,rep1_3_1.fastq.gz \
-2 rep1_1_2.fastq.gz,rep1_2_2.fastq.gz,rep1_3_2.fastq.gz \
-S rep1.sam
#rep2
bowtie2 -p 4 -q -x hg19 \
-1 rep2_1_1.fastq.gz,rep2_2_1.fastq.gz,rep2_3_1.fastq.gz \
-2 rep2_1_2.fastq.gz,rep2_2_2.fastq.gz,rep2_3_2.fastq.gz \
-S rep2.sam
#rep3
bowtie2 -p 4 -q -x hg19 \
-1 rep3_1_1.fastq.gz,rep3_2_1.fastq.gz,rep3_3_1.fastq.gz \
-2 rep3_1_2.fastq.gz,rep3_2_2.fastq.gz,rep3_3_2.fastq.gz \
-S rep3.sam
To make the concatenated alignment, I ran with all fastq files:
#rep cat
bowtie2 -p 4 -q -x hg19 \
-1 rep1_1_1.fastq.gz,rep1_2_1.fastq.gz,rep1_3_1.fastq.gz,rep2_1_1.fastq.gz,rep2_2_1.fastq.gz,rep2_3_1.fastq.gz,rep3_1_1.fastq.gz,rep3_2_1.fastq.gz,rep3_3_1.fastq.gz \
-2 rep1_1_2.fastq.gz,rep1_2_2.fastq.gz,rep1_3_2.fastq.gz,rep2_1_2.fastq.gz,rep2_2_2.fastq.gz,rep2_3_2.fastq.gz,rep3_1_2.fastq.gz,rep3_2_2.fastq.gz,rep3_3_2.fastq.gz \
-S cat.sam
However when I check the file sizes, I observe this:
wc -l *.sam
44436462 rep1.sam
45941920 rep2.sam
46204354 rep3.sam
44436462 cat.sam <-!!!
Shouldn't the concatenated file be the sum of the replicates 44436462 + 45941920 + 46204354 = 136582736 (minus a few lines for duplicated headers)? It's also noteworthy that it is giving the number of lines as the first replicate. I checked if there were any errant spaces in my comma separated fastq lists, and there aren't any. It's almost like bowtie2 stops after reading so many fastq files.
Yes, I know that I could use samtools merge on my sorted bams to get a concatenated file. However, I have always been leery about putting on a new header. In theory aligning to all files should overcome this, so any ideas why it is not?
Thanks
I was hoping that someone could shed some light on an issue that is driving me crazy. I am aligning some trimmed reads (from cutadapt 1.9.1, PE mode) using bowtie 2 v2.3.2. I aligned my biological replicates and then aligned all replicates together to generate a concatenated file. When peak calling with macs2 I noticed that the concatenated file did not have the expected number of peaks. Since I thought this could be an issue with piping to samtools v1.5 to make bam files, I instead kept things simple and realigned sending my output to a sam file instead (line breaks added for clarity):
#rep1
bowtie2 -p 4 -q -x hg19 \
-1 rep1_1_1.fastq.gz,rep1_2_1.fastq.gz,rep1_3_1.fastq.gz \
-2 rep1_1_2.fastq.gz,rep1_2_2.fastq.gz,rep1_3_2.fastq.gz \
-S rep1.sam
#rep2
bowtie2 -p 4 -q -x hg19 \
-1 rep2_1_1.fastq.gz,rep2_2_1.fastq.gz,rep2_3_1.fastq.gz \
-2 rep2_1_2.fastq.gz,rep2_2_2.fastq.gz,rep2_3_2.fastq.gz \
-S rep2.sam
#rep3
bowtie2 -p 4 -q -x hg19 \
-1 rep3_1_1.fastq.gz,rep3_2_1.fastq.gz,rep3_3_1.fastq.gz \
-2 rep3_1_2.fastq.gz,rep3_2_2.fastq.gz,rep3_3_2.fastq.gz \
-S rep3.sam
To make the concatenated alignment, I ran with all fastq files:
#rep cat
bowtie2 -p 4 -q -x hg19 \
-1 rep1_1_1.fastq.gz,rep1_2_1.fastq.gz,rep1_3_1.fastq.gz,rep2_1_1.fastq.gz,rep2_2_1.fastq.gz,rep2_3_1.fastq.gz,rep3_1_1.fastq.gz,rep3_2_1.fastq.gz,rep3_3_1.fastq.gz \
-2 rep1_1_2.fastq.gz,rep1_2_2.fastq.gz,rep1_3_2.fastq.gz,rep2_1_2.fastq.gz,rep2_2_2.fastq.gz,rep2_3_2.fastq.gz,rep3_1_2.fastq.gz,rep3_2_2.fastq.gz,rep3_3_2.fastq.gz \
-S cat.sam
However when I check the file sizes, I observe this:
wc -l *.sam
44436462 rep1.sam
45941920 rep2.sam
46204354 rep3.sam
44436462 cat.sam <-!!!
Shouldn't the concatenated file be the sum of the replicates 44436462 + 45941920 + 46204354 = 136582736 (minus a few lines for duplicated headers)? It's also noteworthy that it is giving the number of lines as the first replicate. I checked if there were any errant spaces in my comma separated fastq lists, and there aren't any. It's almost like bowtie2 stops after reading so many fastq files.
Yes, I know that I could use samtools merge on my sorted bams to get a concatenated file. However, I have always been leery about putting on a new header. In theory aligning to all files should overcome this, so any ideas why it is not?
Thanks
Comment