Hi,
There is a very large difference between the concordant pair alignment rate reported by TopHat2 and the properly paired percentage reported by samtools flagstat.
In the example below (50-base paired-end RNA-Seq), TopHat2 reports a concordant pair alignment rate of 95.1%, while samtools flagstat reports that only 46.13% of the reads are properly paired.
These large differences are present in all my analyses, not just this one.
I trust the TopHat2 numbers more, but I need to be able to explain why samtools flagstat should not be used for quality control.
My guess is that TopHat2 simply doesn't set the SAM flags correctly, which would skew the samtools flagstat counts, but I would like to confirm this.
TopHat v2.0.10
samtools v0.1.19
RNA-Seq: 50-base paired-end reads (library-type fr-firststrand)
Average fragment size: 150 bases
Thank you for your help.
Code:
[blancha@lg-1r17-n02 tophat]$ more tophat_R2.sh
tophat \
  --no-novel-juncs -p 3 \
  --library-type fr-firststrand \
  -G genomes/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/Mus_musculus.GRCm38.75.gtf \
  -o ../results/tophat/R2 \
  genomes//Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/genome \
  ../data/FASTQ_files/untrimmed/HI.1774.007.Index_4.MAG_R2_R1.fastq.gz \
  ../data/FASTQ_files/untrimmed/HI.1774.007.Index_4.MAG_R2_R2.fastq.gz

[blancha@lg-1r17-n02 tophat]$ more ../results/tophat/R2/align_summary.txt
Left reads:
          Input : 39626765
         Mapped : 39103386 (98.7% of input)
       of these :  2042138 ( 5.2%) have multiple alignments (117551 have >20)
Right reads:
          Input : 39626765
         Mapped : 38922972 (98.2% of input)
       of these :  2035395 ( 5.2%) have multiple alignments (117438 have >20)
98.5% overall read mapping rate.

Aligned pairs: 38557982
     of these:  2008047 ( 5.2%) have multiple alignments
                 856546 ( 2.2%) are discordant alignments
95.1% concordant pair alignment rate.

[blancha@lg-1r17-n02 tophat]$ samtools flagstat ../results/tophat/R2/accepted_hits.bam
93634895 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
93634895 + 0 mapped (100.00%:-nan%)
93634895 + 0 paired in sequencing
46920556 + 0 read1
46714339 + 0 read2
43190782 + 0 properly paired (46.13%:-nan%)
92474116 + 0 with itself and mate mapped
1160779 + 0 singletons (1.24%:-nan%)
9792980 + 0 with mate mapped to a different chr
237120 + 0 with mate mapped to a different chr (mapQ>=5)
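As a rough sanity check, both percentages can be recomputed from the counts in the output above. The sketch below (plain sh + awk; the helper name `pct` is made up for illustration) suggests the two numbers use different denominators: TopHat2 divides concordant pairs (aligned pairs minus discordant pairs) by input read pairs, whereas flagstat divides properly paired records by the total number of records in accepted_hits.bam. The BAM holds 93,634,895 records but only 78,026,358 reads were mapped (39,103,386 left + 38,922,972 right). My assumption, not verified here, is that the surplus consists of secondary alignments of multi-mapped reads, which samtools 0.1.19 flagstat counts like any other record.

```shell
# pct NUMERATOR DENOMINATOR -> percentage printed with two decimals
# (pct is a hypothetical helper, not part of TopHat or samtools)
pct() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", 100 * n / d }'
}

# TopHat2 (align_summary.txt): concordant pairs / input read pairs
input_pairs=39626765
aligned_pairs=38557982
discordant_pairs=856546
concordant_pairs=$((aligned_pairs - discordant_pairs))
echo "TopHat2 concordant pair rate: $(pct "$concordant_pairs" "$input_pairs")%"

# samtools flagstat: properly paired records / all records in the BAM
total_records=93634895
properly_paired_records=43190782
echo "flagstat properly paired: $(pct "$properly_paired_records" "$total_records")%"

# Records in the BAM beyond the number of mapped reads (left + right).
# Assumption: these are secondary alignments of multi-mapped reads.
mapped_reads=$((39103386 + 38922972))
echo "records beyond mapped reads: $((total_records - mapped_reads))"
```

If the secondary-alignment explanation holds, restricting the count to primary records should narrow the gap; for example, `samtools view -c -f 2 -F 256 accepted_hits.bam` counts records that have the proper-pair flag (0x2) set and the secondary flag (0x100) unset (untested on this data).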