
**aslihan** · 07-20-2014, 08:14 PM

Same problem here. Can anyone explain these differences?

**aslihan** · 07-20-2014, 08:18 PM

I think we need to trust the TopHat align_summary file to determine the number of reads, since samtools flagstat counts every alignment record, so a read with multiple alignments is counted more than once.
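To illustrate the point about multi-mapped reads, here is a minimal Python sketch that counts mapped alignment records versus distinct mapped reads in SAM text. The function name `count_records_and_reads` and the toy records are made up for illustration, and exactly which records flagstat includes in each line depends on the samtools version, so treat this as a rough model rather than a reimplementation of flagstat.

```python
def count_records_and_reads(sam_lines):
    """Return (mapped_records, distinct_mapped_reads) for SAM text lines."""
    records = 0
    reads = set()
    for line in sam_lines:
        if line.startswith("@"):      # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:                # 0x4 = read unmapped
            continue
        records += 1
        # 0x40 = first in pair, 0x80 = second in pair
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        reads.add((name, mate))       # one entry per read, however many alignments
    return records, len(reads)

# Hypothetical toy data: readA (mate 1) aligned at two loci, readB (mate 1) at one.
toy = [
    "@HD\tVN:1.0",
    "readA\t65\tchr1\t100\t1\t50M\t*\t0\t0\t*\t*",
    "readA\t321\tchr2\t500\t1\t50M\t*\t0\t0\t*\t*",   # 321 includes 0x100 (secondary)
    "readB\t65\tchr1\t900\t50\t50M\t*\t0\t0\t*\t*",
]
records, reads = count_records_and_reads(toy)
print(records, reads)
```

In the toy data there are three mapped records but only two distinct reads, which is the kind of gap that can appear between a per-record count and a per-read summary.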

**bpb9** · 11-16-2014, 06:53 PM

Discrepancy between Tophat align_summary and samtools flagstat paired end

I also used TopHat2 to align 50 bp paired-end reads. I ran TopHat with the --no-mixed option and then compared align_summary.txt to samtools flagstat. The numbers are not identical, but they are not as far off as the ones you report. (Note: the number of mapped reads in my data is so low because I was mapping to a bacterial genome to check for contamination; thankfully, there does not seem to be much.)

This is what I got:

TopHat:
Left reads:
Input: 5442643
Mapped: 18683 ( 0.3% of input)
of these: 41 ( 0.2%) have multiple alignments (0 have >20)
Right reads:
Input: 5438826
Mapped: 14866 ( 0.3% of input)
of these: 685 ( 4.6%) have multiple alignments (0 have >20)
0.3% overall read alignment rate.

Aligned pairs: 4026
of these: 0 ( 0.0%) have multiple alignments
and: 0 ( 0.0%) are discordant alignments
0.1% concordant pair alignment rate.

SAMtools:
8052 + 0 mapped (100.00%:-nan%)
8052 + 0 paired in sequencing
4026 + 0 read1
4026 + 0 read2
7602 + 0 properly paired (94.41%:-nan%)
8052 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

So perhaps try running TopHat with the --no-mixed option.

You could also try --no-discordant, which tells TopHat to report only concordant pairs, but in my experience this causes the program to crash on versions earlier than 2.0.9.
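The figures quoted in this post can be cross-checked with a little arithmetic. The sketch below uses only numbers from the two reports above; the variable and function names are made up, and the reading that each aligned pair contributes two flagstat records is an assumption that happens to fit these numbers.

```python
aligned_pairs = 4026      # TopHat: "Aligned pairs"
flagstat_mapped = 8052    # samtools flagstat: "mapped"
properly_paired = 7602    # samtools flagstat: "properly paired"

def percent(part, whole):
    """Percentage of `part` in `whole`."""
    return 100 * part / whole

# Assumption: with --no-mixed, each aligned pair yields two mapped records.
records_from_pairs = aligned_pairs * 2
pct_proper = percent(properly_paired, flagstat_mapped)
not_proper = flagstat_mapped - properly_paired

print(records_from_pairs == flagstat_mapped)
print(f"{pct_proper:.2f}%")
print(not_proper)
```

So the 4026 pairs account for all 8052 mapped records, and the remaining difference is the 450 records that are mapped with their mate but not flagged as properly paired.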

**blancha** · 11-17-2014, 06:26 AM

I've figured out the issue.
TopHat includes the intron in the insert size.
Hence, when samtools flagstat reads the insert size, more often than not, the insert size is far greater than what would be expected from properly paired reads.

So, the numbers are a bit misleading but not incorrect.

TopHat includes the intron in the insert size reported.
Properly paired reads can therefore have very large insert sizes.
Samtools flagstat does not expect the intron to be included in the insert size.
It therefore reports reads with large insert sizes as improperly paired, even when they simply span an intron.
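To make the intron effect concrete, here is a small Python sketch that computes the genomic span of a read pair from POS and CIGAR, counting `N` operations (skipped reference regions, which is how spliced aligners represent introns). The helper names `reference_span` and `template_length` and the coordinates are made up for illustration; this is a simplified view of the SAM template-length convention, not TopHat's or samtools' actual code. In SAM, the proper-pair status itself is stored in the FLAG field (bit 0x2).

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def reference_span(cigar):
    """Reference bases covered by an alignment; M, D, N, = and X consume reference."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")

def template_length(pos1, cigar1, pos2, cigar2):
    """Distance from the leftmost to the rightmost aligned base of a pair."""
    start = min(pos1, pos2)
    end = max(pos1 + reference_span(cigar1), pos2 + reference_span(cigar2))
    return end - start

# Unspliced pair: two 50 bp mates, 150 bp apart on the genome.
unspliced = template_length(1000, "50M", 1150, "50M")

# Same layout on the transcript, but mate 1 is split by a 5000 bp intron.
spliced = template_length(1000, "30M5000N20M", 6100, "50M")

print(unspliced, spliced)
```

The spliced pair spans 5150 bp on the genome even though the underlying fragment is about as long as the unspliced one, which is why a threshold on genomic insert size can misjudge pairs that cross an intron.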

TopHat2 align_summary.txt vs samtools flagstat
