SEQanswers

Old 06-28-2013, 03:44 AM   #1
lucyyang1991
Junior Member
 
Location: china

Join Date: Dec 2011
Posts: 5
How can I determine the mapping rate of TopHat output such as accepted_hits.bam?

I ran TopHat on the same RNA-Seq data with different -r/--mate-inner-dist and --mate-std-dev settings.

Here are the parameters:
1. -r 160, --mate-std-dev (default) 20
2. -r (default) 50, --mate-std-dev (default) 20
3. -r 0, --mate-std-dev 60

After TopHat finished, I used samtools flagstat to evaluate the results.

The results are listed below in order:
1. -r 160, --mate-std-dev (default) 20
Code:
27139030 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
27139030 + 0 mapped (100.00%:-nan%)
27139030 + 0 paired in sequencing
14171642 + 0 read1
12967388 + 0 read2
22063409 + 0 properly paired (81.30%:-nan%)
24154960 + 0 with itself and mate mapped
2984070 + 0 singletons (11.00%:-nan%)
516422 + 0 with mate mapped to a different chr
217580 + 0 with mate mapped to a different chr (mapQ>=5)
4141901 + 2091 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
0 + 0 mapped (0.00%:0.00%)
4141901 + 2091 paired in sequencing
1533088 + 997 read1
2608813 + 1094 read2
0 + 0 properly paired (0.00%:0.00%)
0 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:0.00%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
2. -r (default) 50, --mate-std-dev (default) 20
Code:
27639199 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
27639199 + 0 mapped (100.00%:-nan%)
27639199 + 0 paired in sequencing
14422450 + 0 read1
13216749 + 0 read2
21085751 + 0 properly paired (76.29%:-nan%)
24654856 + 0 with itself and mate mapped
2984343 + 0 singletons (10.80%:-nan%)
706460 + 0 with mate mapped to a different chr
215918 + 0 with mate mapped to a different chr (mapQ>=5)
4142842 + 2091 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
0 + 0 mapped (0.00%:0.00%)
4142842 + 2091 paired in sequencing
1533869 + 997 read1
2608973 + 1094 read2
0 + 0 properly paired (0.00%:0.00%)
0 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:0.00%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
3. -r 0, --mate-std-dev 60
Code:
41145664 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
41145664 + 0 mapped (100.00%:-nan%)
41145664 + 0 paired in sequencing
21422982 + 0 read1
19722682 + 0 read2
22975306 + 0 properly paired (55.84%:-nan%)
37774543 + 0 with itself and mate mapped
3371121 + 0 singletons (8.19%:-nan%)
10967682 + 0 with mate mapped to a different chr
207758 + 0 with mate mapped to a different chr (mapQ>=5)
2934463 + 2091 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
0 + 0 mapped (0.00%:0.00%)
2934463 + 2091 paired in sequencing
906826 + 997 read1
2027637 + 1094 read2
0 + 0 properly paired (0.00%:0.00%)
0 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:0.00%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
The sample had 31387112 input reads in total, so at first I was confused by result 3: accepted_hits.bam contained far more records than there were input reads.

After checking the BAM file, I found that many reads appear more than once because of multi-hits (reads aligned to several locations).

So the numbers I got from samtools flagstat are not an accurate measure of the mapping rate.
Is there any way to estimate the mapping rate, the unique mapping rate, or similar statistics?

Hoping for your help!
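
To make the mismatch in result 3 concrete, here is a minimal sketch that pulls the "in total" lines out of flagstat text and compares them with the input read count. It assumes the second report in each listing above comes from TopHat's unmapped-reads BAM (the post does not say so explicitly), and `flagstat_totals` is a hypothetical helper name, not part of samtools.

```python
import re

# Matches the first line of a samtools flagstat report, e.g.
# "41145664 + 0 in total (QC-passed reads + QC-failed reads)"
FLAGSTAT_TOTAL = re.compile(r"^(\d+) \+ (\d+) in total")


def flagstat_totals(text):
    """Return QC-passed + QC-failed totals, one per flagstat report found in text."""
    totals = []
    for line in text.splitlines():
        match = FLAGSTAT_TOTAL.match(line.strip())
        if match:
            totals.append(int(match.group(1)) + int(match.group(2)))
    return totals


# First lines of the two reports posted for run 3 (-r 0, --mate-std-dev 60).
# Assumption: the first report is accepted_hits.bam, the second the unmapped reads.
report = """\
41145664 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
41145664 + 0 mapped (100.00%:-nan%)
2934463 + 2091 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
0 + 0 mapped (0.00%:0.00%)
"""

TOTAL_INPUT_READS = 31387112  # figure quoted in the post

accepted_records, unmapped_reads = flagstat_totals(report)

# Records beyond what the input could supply if every read appeared only once.
excess = accepted_records + unmapped_reads - TOTAL_INPUT_READS

print("alignment records:", accepted_records)
print("unmapped reads:   ", unmapped_reads)
print("excess records:   ", excess)
```

A positive excess means some reads must be represented by more than one alignment record, which is why flagstat's "mapped" line counts records rather than distinct reads.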
Old 06-28-2013, 05:26 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,060

Try these to see if they suit your needs.

BAMStats: http://bamstats.sourceforge.net/

Bam_utils: http://genome.sph.umich.edu/wiki/BamUtil:_stats
Old 06-28-2013, 07:35 AM   #3
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 311

Quote:
Originally Posted by lucyyang1991 View Post
After checking the BAM file, I found that many reads appear more than once because of multi-hits (reads aligned to several locations).

So the numbers I got from samtools flagstat are not an accurate measure of the mapping rate.
Is there any way to estimate the mapping rate, the unique mapping rate, or similar statistics?
Hi, a quick shortcut to get the mapping rate is to count the reads in the BAM file where TopHat puts the unmapped reads, called unmapped.bam or something like that. Your mapping rate would then be (total reads - reads in unmapped.bam) / total reads. For uniquely mapped reads, you could use the MAPQ score, provided TopHat sets it correctly to reflect uniqueness of mapping.

Dario
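
The shortcut above can be sketched with the numbers from the first post. This is only an illustration under two assumptions: that the second flagstat report in each listing is the unmapped-reads BAM, and that the 31387112 input reads are counted in the same units (individual mates) as the flagstat totals. `mapping_rate` is a hypothetical helper name.

```python
def mapping_rate(total_input_reads, unmapped_reads):
    """(total reads - reads in unmapped.bam) / total reads, as a fraction."""
    if total_input_reads <= 0:
        raise ValueError("total_input_reads must be positive")
    if not 0 <= unmapped_reads <= total_input_reads:
        raise ValueError("unmapped_reads must be between 0 and total_input_reads")
    return (total_input_reads - unmapped_reads) / total_input_reads


TOTAL_INPUT_READS = 31387112  # figure quoted in the first post

# QC-passed + QC-failed totals from the second flagstat report of each run
# (assumed here to describe the unmapped reads).
UNMAPPED = {
    "-r 160, --mate-std-dev 20": 4141901 + 2091,
    "-r 50, --mate-std-dev 20": 4142842 + 2091,
    "-r 0, --mate-std-dev 60": 2934463 + 2091,
}

rates = {
    label: mapping_rate(TOTAL_INPUT_READS, count)
    for label, count in UNMAPPED.items()
}

for label, rate in rates.items():
    print(f"{label}: {rate:.2%}")
```

Because the rate depends only on the unmapped count, multi-hit records in accepted_hits.bam do not inflate it. It does assume every input read ends up in exactly one of the two files, which is worth checking for your TopHat version.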
Old 06-29-2013, 11:33 PM   #4
lucyyang1991
Junior Member
 
Location: china

Join Date: Dec 2011
Posts: 5

Quote:
Originally Posted by GenoMax View Post
Try these to see if they suit your needs.

BAMStats: http://bamstats.sourceforge.net/
Thanks a lot for your help!
I've downloaded BAMStats. After unzipping 'BAMStats-1.25-src.zip', I couldn't find 'BAMStats-GUI-1.25.jar', and I don't know how to use the program even though I was told to run:
Code:
java -Xmx4g -jar BAMStats-1.25.jar -i <bam file>
Old 06-30-2013, 04:33 AM   #5
lucyyang1991
Junior Member
 
Location: china

Join Date: Dec 2011
Posts: 5

Quote:
Originally Posted by GenoMax View Post
Try these to see if they suit your needs.

BAMStats: http://bamstats.sourceforge.net/

Bam_utils: http://genome.sph.umich.edu/wiki/BamUtil:_stats
Hi,
I've tried all the methods and found that BamUtil gives the same result as samtools flagstat. So I still can't tell which parameter set is better, because neither tool excludes the repeated records caused by multi-hits.
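
One way to discount the repeated records is to count each read once, keyed by read name plus mate number, instead of counting alignment records. The sketch below works on SAM-format text lines and uses only the standard FLAG bits (0x4 unmapped, 0x40 first mate, 0x80 second mate). It treats a read as uniquely mapped when it has exactly one record in the input, which assumes the aligner wrote one record per reported hit. `summarize_alignments` and `_rec` are hypothetical names, and the demo records are made up.

```python
from collections import Counter


def summarize_alignments(sam_lines):
    """Count alignment records, distinct mapped reads, and single-hit reads."""
    per_read = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # read is flagged as unmapped
        # Distinguish the two mates of a pair, which share one read name.
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        per_read[(name, mate)] += 1
    return {
        "alignment_records": sum(per_read.values()),
        "distinct_mapped_reads": len(per_read),
        "uniquely_mapped_reads": sum(1 for n in per_read.values() if n == 1),
    }


def _rec(name, flag, chrom, pos):
    """Build a minimal 11-column SAM record for the demo (made-up data)."""
    return "\t".join(
        [name, str(flag), chrom, str(pos), "50", "4M", "*", "0", "0", "ACGT", "IIII"]
    )


demo = [
    "@HD\tVN:1.0\tSO:coordinate",
    _rec("readA", 65, "chr1", 100),   # mate 1, single hit
    _rec("readA", 129, "chr1", 300),  # mate 2, single hit
    _rec("readB", 65, "chr1", 500),   # mate 1, first of two hits
    _rec("readB", 321, "chr2", 900),  # mate 1, second hit (secondary flag set)
    _rec("readC", 69, "*", 0),        # mate 1, flagged unmapped
]

summary = summarize_alignments(demo)
print(summary)
```

On real data the lines could come from `samtools view accepted_hits.bam` piped into a script that passes `sys.stdin` to the function. Dividing the distinct or single-hit count by the number of input reads then gives rates that are comparable across parameter sets. Note that the counter keeps one entry per mapped read in memory, so tens of millions of reads will need a fair amount of RAM.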
Old 06-30-2013, 04:32 PM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,060

If you are only interested in uniquely mapped reads then see post #14 in this thread: http://seqanswers.com/forums/showthread.php?t=25096

Here is one more option for summarizing read mappings: http://bioinf.wehi.edu.au/featureCounts/

Tags
accepted_hits.bam, mapping statistics, samtools flagstat, tophat
