02-21-2017, 07:51 PM   #1
rajesh1989
Junior Member
 
Location: india

Join Date: Feb 2015
Posts: 7
What is wrong with samtools flagstat or read mapping with TopHat?

I have 6,673,385 reads (about 6.7 million) in each paired-end file after quality filtering. But when I map them with TopHat and run samtools flagstat on the BAM file, I get the following output:
1343686 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
1343686 + 0 mapped (100.00%:-nan%)
1343686 + 0 paired in sequencing
670808 + 0 read1
672878 + 0 read2
1203600 + 0 properly paired (89.57%:-nan%)
1311198 + 0 with itself and mate mapped
32488 + 0 singletons (2.42%:-nan%)
15874 + 0 with mate mapped to a different chr
452 + 0 with mate mapped to a different chr (mapQ>=5)
I am not sure how to interpret the samtools flagstat output. As I understand it, only 670,808 reads (about 0.67 million) from pair 1 and 672,878 reads (about 0.67 million) from pair 2 are mapped. Is that correct? That is about one tenth of the total input reads. Where are the rest of my reads?
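To make the arithmetic behind this question explicit, here is a minimal Python sketch that recomputes the flagstat percentages and compares the read1/read2 counts with the number of input reads. It uses only the numbers quoted in this post; the dictionary keys and the `pct` helper are illustrative names, not samtools output fields. Note that flagstat counts alignment records in the BAM file, so these figures are not necessarily counts of distinct reads.

```python
# Counts copied from the flagstat output in this post.
INPUT_READS_PER_FILE = 6_673_385  # reads per FASTQ file after quality filtering

FLAGSTAT = {
    "total": 1_343_686,
    "read1": 670_808,
    "read2": 672_878,
    "properly_paired": 1_203_600,
    "both_mapped": 1_311_198,   # "with itself and mate mapped"
    "singletons": 32_488,
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# read1 + read2 accounts for every record in the BAM file.
assert FLAGSTAT["read1"] + FLAGSTAT["read2"] == FLAGSTAT["total"]
# Records with both mates mapped plus singletons also add up to the total.
assert FLAGSTAT["both_mapped"] + FLAGSTAT["singletons"] == FLAGSTAT["total"]

# These two reproduce the percentages printed by flagstat.
print("properly paired %:", pct(FLAGSTAT["properly_paired"], FLAGSTAT["total"]))
print("singletons %:", pct(FLAGSTAT["singletons"], FLAGSTAT["total"]))

# These two compare the BAM records with the filtered input reads.
print("read1 vs input %:", pct(FLAGSTAT["read1"], INPUT_READS_PER_FILE))
print("read2 vs input %:", pct(FLAGSTAT["read2"], INPUT_READS_PER_FILE))
```

The last two lines give roughly 10%, which is the "one tenth" mentioned above.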

The report produced by TopHat shows different statistics:

Left reads:
Input : 468668
Mapped : 443344 (94.6% of input)
of these: 216780 (48.9%) have multiple alignments (1 have >20)
Right reads:
Input : 468668
Mapped : 444468 (94.8% of input)
of these: 217699 (49.0%) have multiple alignments (1 have >20)
94.7% overall read mapping rate.
Aligned pairs: 433356
of these: 211726 (48.9%) have multiple alignments
5512 ( 1.3%) are discordant alignments
91.3% concordant pair alignment rate.
Why does TopHat say it mapped around 94% of the reads when there were around 6.7 million reads to begin with?
How should I interpret all these numbers? Thank you.
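One way to see what the TopHat percentages refer to is to recompute them from the counts in the report. The sketch below is only a sanity check under the assumption that the numbers are copied correctly from this post; the variable names and the `pct1` helper are illustrative, not part of TopHat.

```python
# Counts copied from the TopHat report in this post.
FASTQ_READS_PER_FILE = 6_673_385   # reads per file after quality filtering
LEFT_INPUT, LEFT_MAPPED = 468_668, 443_344
RIGHT_INPUT, RIGHT_MAPPED = 468_668, 444_468
ALIGNED_PAIRS, DISCORDANT = 433_356, 5_512


def pct1(part, whole):
    """Return part/whole as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Per-side mapping rates, relative to the "Input" line of the report.
left_rate = pct1(LEFT_MAPPED, LEFT_INPUT)
right_rate = pct1(RIGHT_MAPPED, RIGHT_INPUT)

# Overall rate: all mapped reads over all input reads (left + right).
overall_rate = pct1(LEFT_MAPPED + RIGHT_MAPPED, LEFT_INPUT + RIGHT_INPUT)

# Concordant pair rate: aligned pairs minus discordant ones, over input pairs.
concordant_rate = pct1(ALIGNED_PAIRS - DISCORDANT, LEFT_INPUT)

# How the reported "Input" compares with the filtered FASTQ read count.
input_vs_fastq = pct1(LEFT_INPUT, FASTQ_READS_PER_FILE)

print(left_rate, right_rate, overall_rate, concordant_rate, input_vs_fastq)
```

The recomputed rates match the report only when the denominator is the report's own "Input" count (468,668), which is itself about 7% of the 6,673,385 filtered reads per file.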