SEQanswers

09-19-2012, 09:01 PM   #1
mrfox
Senior Member
 
Location: USA

Join Date: Aug 2010
Posts: 103
Count the uniquely mapped and duplicate reads in a BWA BAM file

Hi all,

If this topic has already been solved in a previous post, I apologize; please direct me to the solution.

I used BWA to align exome sequencing data, and I want to count the number of uniquely mapped reads and the number of duplicate reads (not multi-hit reads) in the resulting BAM files. Samtools flagstat is supposed to work: unique = mapped - duplicates. Unfortunately, samtools flagstat always reports "0+0 duplicate" for BWA-aligned BAM files.

I wonder if anybody has an easy solution for this.

Thank you all.
09-19-2012, 09:52 PM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Samtools flagstat just reads the flags in the BAM records. If you haven't run software that sets the "duplicate" flag (0x400) in your BAM, flagstat has no way of knowing that you have duplicates.

Software like Picard's MarkDuplicates will flag reads as duplicates where appropriate. samtools rmdup will just get rid of duplicates.
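
To illustrate the point about flags, here is a minimal Python sketch of a flag-based tally. It is not how samtools is implemented; `tally_flags` is a hypothetical helper and the SAM records are made up. The only facts it relies on are that FLAG is the second SAM column, 0x4 means unmapped, and 0x400 means duplicate.

```python
UNMAPPED = 0x4     # SAM FLAG bit: read is unmapped
DUPLICATE = 0x400  # SAM FLAG bit: read is a PCR or optical duplicate


def tally_flags(sam_lines):
    """Return (mapped, duplicates) counted purely from the FLAG column."""
    mapped = duplicates = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & UNMAPPED:
            continue
        mapped += 1
        if flag & DUPLICATE:
            duplicates += 1
    return mapped, duplicates


# Made-up records: two reads at the same position, plus one unmapped read.
unmarked = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
# The same records after a duplicate-marking step has set 0x400 on r2.
marked = list(unmarked)
marked[2] = "r2\t1024\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"

mapped, dups = tally_flags(unmarked)
print(mapped, dups, mapped - dups)  # duplicate bit never set, so dups is 0

mapped, dups = tally_flags(marked)
print(mapped, dups, mapped - dups)  # now one read is counted as a duplicate
```

On a real BAM that has been through a duplicate-marking step, `samtools view -c -f 1024 file.bam` should report the same kind of count directly.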
09-20-2012, 06:59 AM   #3
mrfox
Senior Member
 
Location: USA

Join Date: Aug 2010
Posts: 103

Hi swbarnes2,

Thanks for your reply. I seldom use Picard's MarkDuplicates, but it seems to me that if we apply it to a BAM file, the duplicates will be flagged and we can then count them in the new BAM. I will try that. Please let me know if you have a better idea.

Thank you.
09-20-2012, 12:31 PM   #4
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Yeah, that's fine. Or use samtools rmdup and count how many reads are missing from the de-duplicated file.

Those tools should yield almost exactly the same result. One known exception: rmdup won't touch pairs whose two reads map to different chromosomes, whereas MarkDuplicates should handle those properly.
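
The "count what's missing" approach comes down to a subtraction. A rough Python sketch, assuming both files have been converted to SAM text; `count_records` is a hypothetical helper and the records are made up:

```python
def count_records(sam_lines):
    """Count alignment records, skipping '@' header lines and blank lines."""
    return sum(
        1 for line in sam_lines
        if line.strip() and not line.startswith("@")
    )


# Made-up example: r2 sits at the same position as r1 and was dropped
# by the de-duplication step.
before = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t250\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
after = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t250\t60\t4M\t*\t0\t0\tACGT\tIIII",
]

# Reads missing from the de-duplicated file = duplicates that were removed.
removed = count_records(before) - count_records(after)
print(removed)
```

With real BAMs, `samtools view -c` on each file should give the two record counts without any conversion.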