I am using bowtie2 (default settings) to map ~1.95 million reads against a small reference. The input queries I fed to bowtie2 were all fastq. There were 4 total files:
2 paired fastq (1523516 total reads ... 761758 pairs)
2 'orphan' files (not paired ... file1: 232282 reads, file2: 193747 reads)
In total my query files contain 1,949,545 reads. I ran all this in a single bowtie2 command using -1, -2 & -U with the 2 orphan files being comma-delimited list for -U.
Bowtie2 finished without complaint, and it reports these mapping stats:
+++++++++++++++++++++
1187787 reads; of these:
761758 (64.13%) were paired; of these:
471690 (61.92%) aligned concordantly 0 times
290068 (38.08%) aligned concordantly exactly 1 time
0 (0.00%) aligned concordantly >1 times
----
471690 pairs aligned concordantly 0 times; of these:
1089 (0.23%) aligned discordantly 1 time
----
470601 pairs aligned 0 times concordantly or discordantly; of these:
941202 mates make up the pairs; of these:
746446 (79.31%) aligned 0 times
194756 (20.69%) aligned exactly 1 time
0 (0.00%) aligned >1 times
426029 (35.87%) were unpaired; of these:
290590 (68.21%) aligned 0 times
135439 (31.79%) aligned exactly 1 time
0 (0.00%) aligned >1 times
46.81% overall alignment rate
+++++++++++++++++++++
Those numbers are wrong. My input had exactly 1,949,545 reads. Bowtie2 is reporting 1187787 total reads.
Then I took the output sam file and ran samtools flagstats, getting these numbers:
1252499 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
215463 + 0 mapped (17.20%:-nan%)
961790 + 0 paired in sequencing
519797 + 0 read1
441993 + 0 read2
212926 + 0 properly paired (22.14%:-nan%)
215046 + 0 with itself and mate mapped
298 + 0 singletons (0.03%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
this is also wrong. Then I tried running samstats, and when I add up all the MAPQ category numbers (including the unmapped), it sums up to:
1252499 reads
So, then I tried a simple samtools view command to try and count the total # reads reported:
samtools view <bam> | cut -f1 | sort | uniq | wc -l
and I get:
868833 reads
None of these methods agree with any other method. And none come close to my actual input read number. My understanding is that bowtie2 should (by default) only be reporting 1 alignment per query, so the total numbers reported shouldn't be confused by issues of supplemental & non-primary alignments.
Can anyone tell me what is going on? I see no evidence of truncation or failure in any of my outputs. And even if there had been, I would have expected these methods to report the same number of total reads. Help?
2 paired fastq (1523516 total reads ... 761758 pairs)
2 'orphan' files (not paired ... file1: 232282 reads, file2: 193747 reads)
In total my query files contain 1,949,545 reads. I ran all this in a single bowtie2 command using -1, -2 & -U with the 2 orphan files being comma-delimited list for -U.
Bowtie2 finished without complaint, and it reports these mapping stats:
+++++++++++++++++++++
1187787 reads; of these:
761758 (64.13%) were paired; of these:
471690 (61.92%) aligned concordantly 0 times
290068 (38.08%) aligned concordantly exactly 1 time
0 (0.00%) aligned concordantly >1 times
----
471690 pairs aligned concordantly 0 times; of these:
1089 (0.23%) aligned discordantly 1 time
----
470601 pairs aligned 0 times concordantly or discordantly; of these:
941202 mates make up the pairs; of these:
746446 (79.31%) aligned 0 times
194756 (20.69%) aligned exactly 1 time
0 (0.00%) aligned >1 times
426029 (35.87%) were unpaired; of these:
290590 (68.21%) aligned 0 times
135439 (31.79%) aligned exactly 1 time
0 (0.00%) aligned >1 times
46.81% overall alignment rate
+++++++++++++++++++++
Those numbers are wrong. My input had exactly 1,949,545 reads. Bowtie2 is reporting 1187787 total reads.
Then I took the output sam file and ran samtools flagstats, getting these numbers:
1252499 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
215463 + 0 mapped (17.20%:-nan%)
961790 + 0 paired in sequencing
519797 + 0 read1
441993 + 0 read2
212926 + 0 properly paired (22.14%:-nan%)
215046 + 0 with itself and mate mapped
298 + 0 singletons (0.03%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
this is also wrong. Then I tried running samstats, and when I add up all the MAPQ category numbers (including the unmapped), it sums up to:
1252499 reads
So, then I tried a simple samtools view command to try and count the total # reads reported:
samtools view <bam> | cut -f1 | sort | uniq | wc -l
and I get:
868833 reads
None of these methods agree with any other method. And none come close to my actual input read number. My understanding is that bowtie2 should (by default) only be reporting 1 alignment per query, so the total numbers reported shouldn't be confused by issues of supplemental & non-primary alignments.
Can anyone tell me what is going on? I see no evidence of truncation or failure in any of my outputs. And even if there had been, I would have expected these methods to report the same number of total reads. Help?
Comment