Hi, I'm using the TopHat-Cufflinks pipeline for RNA-seq data analysis. I have ~26 million high-quality paired-end reads in each of pair 1 and pair 2. I mapped them to the reference genome with TopHat, which generated the output file accepted_hits.bam. I then checked it with samtools, and samtools flagstat gives the result below:
81357362 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
81357362 + 0 mapped (100.00%:nan%)
81357362 + 0 paired in sequencing
40660320 + 0 read1
40697042 + 0 read2
74893468 + 0 properly paired (92.05%:nan%)
79296710 + 0 with itself and mate mapped
2060652 + 0 singletons (2.53%:nan%)
4103348 + 0 with mate mapped to a different chr
78008 + 0 with mate mapped to a different chr (mapQ>=5)
It shows more reads in read1 and read2 than the number of reads I used for mapping, and 92.05% of them properly paired. Can anyone explain how this is possible or why it happened? I'm stuck on this problem when mapping with TopHat. Can anybody tell me the reason? How can I find the true mapping percentage? Thanks.
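To make the discrepancy concrete, here is a minimal sketch that redoes the arithmetic on the flagstat counts above. The counts are copied from the output. The only assumption is that "~26 million" means about 26 million pairs, i.e. about 52 million input reads. One possible explanation, not confirmed in this post, is that flagstat counts alignment records rather than distinct reads, so a read aligned to several locations would be counted several times.

```python
# Counts copied from the samtools flagstat output above.
total = 81357362
read1 = 40660320
read2 = 40697042
properly_paired = 74893468
both_mapped = 79296710
singletons = 2060652

# Assumption: "~26 million" means about 26 million pairs,
# so about 52 million individual input reads.
input_pairs = 26_000_000
input_reads = 2 * input_pairs

# Internal consistency of the flagstat lines.
assert read1 + read2 == total
assert both_mapped + singletons == total

# Percentages are relative to records in the BAM, not to input reads.
pct_properly_paired = 100 * properly_paired / total
pct_singletons = 100 * singletons / total

# Average number of BAM records per input read (under the assumption above).
records_per_input_read = total / input_reads

print(f"properly paired: {pct_properly_paired:.2f}% of BAM records")
print(f"singletons:      {pct_singletons:.2f}% of BAM records")
print(f"BAM records per input read: {records_per_input_read:.2f}")
```

The 100% "mapped" and 92.05% "properly paired" figures are both fractions of the 81357362 records in accepted_hits.bam, so on their own they do not give the fraction of the original reads that mapped.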