SEQanswers

Old 02-21-2017, 07:51 PM   #1
rajesh1989
Junior Member
 
Location: india

Join Date: Feb 2015
Posts: 7
What is wrong with samtools flagstat or read mapping with TopHat?

I have 6,673,385 reads (around 6.7 million) in each paired-end file after quality filtering. But when I map them using TopHat and run samtools flagstat on the BAM file, it gives the following output:
1343686 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
1343686 + 0 mapped (100.00%:-nan%)
1343686 + 0 paired in sequencing
670808 + 0 read1
672878 + 0 read2
1203600 + 0 properly paired (89.57%:-nan%)
1311198 + 0 with itself and mate mapped
32488 + 0 singletons (2.42%:-nan%)
15874 + 0 with mate mapped to a different chr
452 + 0 with mate mapped to a different chr (mapQ>=5)
I am not sure how to interpret the samtools flagstat output, but I assume it means that only 670,808 reads (around 0.67 million) from pair 1 and 672,878 reads (around 0.67 million) from pair 2 were mapped. Is that correct? That is about 1/10th of the total input reads. Where are the rest of my reads?

The report produced by TopHat shows some other statistics:

Left reads:
Input : 468668
Mapped : 443344 (94.6% of input)
of these: 216780 (48.9%) have multiple alignments (1 have >20)
Right reads:
Input : 468668
Mapped : 444468 (94.8% of input)
of these: 217699 (49.0%) have multiple alignments (1 have >20)
94.7% overall read mapping rate.
Aligned pairs: 433356
of these: 211726 (48.9%) have multiple alignments
5512 ( 1.3%) are discordant alignments
91.3% concordant pair alignment rate.
Why does TopHat say it mapped around 94% of the reads when there were around 6.7 million reads at the beginning?
How should I interpret all these numbers? Thank you.
Old 02-22-2017, 02:26 AM   #2
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 794

samtools flagstat can only report what's in the file, so if there are no unmapped reads in the BAM file then the calculated mapping rate will be 100% (with some reduction in that due to unpaired and low-quality mappings, if included).
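To make that point concrete, here is a minimal Python sketch that parses flagstat text output and reports the mapped percentage two ways: relative to the records in the BAM file (what flagstat prints) and relative to a read count you supply yourself. The function names and the `input_reads` parameter are made up for illustration and are not part of samtools. It also assumes the older flagstat text layout shown in the first post. Keep in mind that flagstat counts alignment records, so if the aligner writes several records for a multi-mapped read, the second figure is only a rough upper bound.

```python
import re


def parse_flagstat(text):
    """Map each flagstat label to its QC-passed count (the first number)."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+) \+ (\d+) (.+)", line)
        if m is None:
            continue
        # Drop a trailing "(QC-passed ...)" or "(89.57%:-nan%)" annotation,
        # but keep qualifiers such as "(mapQ>=5)".
        label = re.sub(r"\s*\((?:QC-passed.*|[-\w.]+%.*)\)\s*$", "", m.group(3))
        counts[label] = int(m.group(1))
    return counts


def mapped_percentages(flagstat_text, input_reads=None):
    """Mapped percentage relative to the file and, optionally, to the input.

    input_reads is a hypothetical parameter: the number of reads given to
    the aligner (both mates counted), taken from somewhere other than the BAM.
    """
    counts = parse_flagstat(flagstat_text)
    total, mapped = counts["in total"], counts["mapped"]
    result = {"of_records_in_file": 100.0 * mapped / total}
    if input_reads is not None:
        # Records are not reads: multi-mapped reads can inflate this figure.
        result["of_input_reads"] = 100.0 * mapped / input_reads
    return result
```

With the numbers from the first post, `of_records_in_file` is 100% because the file holds no unmapped records, while dividing by the 2 x 6,673,385 reads from the FASTQ files gives roughly 10%.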
Old 02-22-2017, 06:21 AM   #3
rajesh1989
Junior Member
 
Location: india

Join Date: Feb 2015
Posts: 7

Thank you for the reply.
This is the output of TopHat's prep_reads.info:

left_min_read_len=25
left_max_read_len=101
left_reads_in =6673385
left_reads_out=6667431
right_min_read_len=25
right_max_read_len=101
right_reads_in =6673385
right_reads_out=6673220

Where are the rest of the reads if TopHat didn't map them? I also checked unmapped.bam; its size is very small.
Old 02-22-2017, 07:12 AM   #4
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 196

Quote:
Originally Posted by rajesh1989 View Post
Report produced by tophat shows some other statistics

Left reads:
Input : 468668
Mapped : 443344 (94.6% of input)
of these: 216780 (48.9%) have multiple alignments (1 have >20)
Right reads:
Input : 468668
Mapped : 444468 (94.8% of input)
of these: 217699 (49.0%) have multiple alignments (1 have >20)
94.7% overall read mapping rate.
Aligned pairs: 433356
of these: 211726 (48.9%) have multiple alignments
5512 ( 1.3%) are discordant alignments
91.3% concordant pair alignment rate.
This says your input to tophat is only ~460k read pairs. This directly contradicts what you posted in the prep_reads.info. Are you sure you don't have mismatched files?
Old 02-22-2017, 07:21 AM   #5
rajesh1989
Junior Member
 
Location: india

Join Date: Feb 2015
Posts: 7

Hello,

Everything I have written here is correct; I just copied the details and pasted them here.

What do you mean by mismatched files?

That is my actual question: why is TopHat taking only ~460k read pairs?
Old 02-22-2017, 09:17 AM   #6
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 196

For example, could the prep_reads.info be from one sample and the TopHat align_summary from another?
Old 02-22-2017, 09:44 PM   #7
rajesh1989
Junior Member
 
Location: india

Join Date: Feb 2015
Posts: 7

No, they are not from different samples; I have those two files in the very same folder.
Old 02-23-2017, 07:16 AM   #8
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 196

Perhaps you have mixed up files in your script. You may want to check the logs in your tophat output directory.

As an example, here's what my align_summary.txt looks like:
Code:
Left reads:
          Input     :   6551998
           Mapped   :   5980941 (91.3% of input)
            of these:    199516 ( 3.3%) have multiple alignments (10560 have >10)
Right reads:
          Input     :   6551998
           Mapped   :   5574400 (85.1% of input)
            of these:    184354 ( 3.3%) have multiple alignments (10346 have >10)
88.2% overall read mapping rate.

Aligned pairs:   5394272
     of these:    177939 ( 3.3%) have multiple alignments
                  148603 ( 2.8%) are discordant alignments
80.1% concordant pair alignment rate.
and the corresponding prep_reads.info:
Code:
left_min_read_len=75
left_max_read_len=75
left_reads_in =6551998
left_reads_out=6544622
right_min_read_len=75
right_max_read_len=75
right_reads_in =6551998
right_reads_out=6495499
Note that both files refer to 6551998 as the number of read pairs input.
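If you want to run this check over many samples, something like the following Python sketch could do it. It reads the text of a prep_reads.info and an align_summary.txt and compares the input counts for the left and right reads. The function names are my own and not part of TopHat, and the parsing only assumes the two layouts pasted in this thread.

```python
import re


def parse_prep_reads(text):
    """Parse 'key=value' lines such as 'left_reads_in =6551998'."""
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            info[key.strip()] = int(value)
    return info


def parse_align_summary(text):
    """Return the 'Input' count for the left and right reads."""
    inputs = {}
    side = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Left reads"):
            side = "left"
        elif stripped.startswith("Right reads"):
            side = "right"
        else:
            m = re.match(r"Input\s*:\s*(\d+)", stripped)
            if m and side:
                inputs[side] = int(m.group(1))
    return inputs


def compare_inputs(prep_text, summary_text):
    """For each side, return (prep count, summary count, whether they agree)."""
    prep = parse_prep_reads(prep_text)
    summary = parse_align_summary(summary_text)
    report = {}
    for side in ("left", "right"):
        prep_in = prep[side + "_reads_in"]
        report[side] = (prep_in, summary[side], prep_in == summary[side])
    return report
```

For my files above, both sides agree on 6551998. For the files in the first post, the comparison reports 6673385 against 468668, which is the mismatch in question.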
Old 02-23-2017, 06:18 PM   #9
rajesh1989
Junior Member
 
Location: india

Join Date: Feb 2015
Posts: 7

I got the answer. I think this is an issue with multithreading: when I run TopHat on a single core, I get correct results. Other people have also reported this issue.
All times are GMT -8.