Hello,
I am trying to understand a discrepancy between alignment numbers reported by flagstat and ht-seq. I have a BAM file from tophat2 which I filtered (in Galaxy) to get uniquely mapped (MAPQ>=50) and properly paired reads (isProperPair:true). Running flagstat on the filtered file gives me:
14783250 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
14783250 + 0 mapped (100.00%:-nan%)
14783250 + 0 paired in sequencing
7391625 + 0 read1
7391625 + 0 read2
14783250 + 0 properly paired (100.00%:-nan%)
14783250 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
I then count features with ht-seq using the following:
htseq-count --format=bam --order=pos --stranded=no --mode=union filtered file.bam annotation.gtf>myfile.counts
which works fine. The thing I don't understand is that I get this warning:
Warning: Mate records missing for 42366 records
and the total number of SAM alignment pairs processed is reported as 7412808. I notice that 7412808= 7391625 + (42366/2)
It would make sense to me that these are records whose mates were multiply mapped and therefore filtered out, leaving them without their pair, but I can't understand why they are not showing up in the flagstat output.
I'd be really grateful if anybody can shed some light on this!
I am trying to understand a discrepancy between alignment numbers reported by flagstat and ht-seq. I have a BAM file from tophat2 which I filtered (in Galaxy) to get uniquely mapped (MAPQ>=50) and properly paired reads (isProperPair:true). Running flagstat on the filtered file gives me:
14783250 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
14783250 + 0 mapped (100.00%:-nan%)
14783250 + 0 paired in sequencing
7391625 + 0 read1
7391625 + 0 read2
14783250 + 0 properly paired (100.00%:-nan%)
14783250 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
I then count features with ht-seq using the following:
htseq-count --format=bam --order=pos --stranded=no --mode=union filtered file.bam annotation.gtf>myfile.counts
which works fine. The thing I don't understand is that I get this warning:
Warning: Mate records missing for 42366 records
and the total number of SAM alignment pairs processed is reported as 7412808. I notice that 7412808= 7391625 + (42366/2)
It would make sense to me that these are records whose mates were multiply mapped and therefore filtered out, leaving them without their pair, but I can't understand why they are not showing up in the flagstat output.
I'd be really grateful if anybody can shed some light on this!