![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
inconsistent between samtools mpileup output and bam tview | Rainbird | Bioinformatics | 0 | 05-31-2013 11:08 PM |
bam file statistics | adrian | Bioinformatics | 3 | 10-09-2012 02:02 AM |
convert bam file to bedgraph format with inconsistent results | zhiqiang | Bioinformatics | 0 | 07-27-2012 12:48 AM |
How to get statistics data from sam/bam file generated by trinity packages | taoxiang180 | Bioinformatics | 1 | 07-05-2012 01:46 AM |
BAM with inconsistent flag and mapq | dariober | Bioinformatics | 2 | 07-18-2011 12:27 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Senior Member
Location: 63130 Join Date: Apr 2012
Posts: 125
|
![]()
I used tophat2 to map my RNAseq reads to the reference genome. When I tried to get the statistics for the resultant mapped bam file, I found the inconsistency between the different tools.
'samstat' gave me a total number of 33.7M aligned including all MAPQ scores, however 'samtools flagstat' gave me 37.3M aligned reads. Here I see over 10% off. Could anybody explain why and what should I follow? |
![]() |
![]() |
![]() |
#2 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
|
![]()
I would guess that samstat doesn't count secondary alignments (samtools flagstat does).
|
![]() |
![]() |
![]() |
#3 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
|
![]()
I noticed something like that. I had the same fastq aligned with top hat and bwa. samtools flagstat on the bwa file gave me exactly as many lines as were in the fastq (I counted the fastq to be sure). samtools flagstat on the tophat file gave me more mapped reads than there were fastq entries!
I think the answer is that bowtie is reporting more than one line for some reads, and bwa isn't. So maybe samstat is accounting for that, and flagstat isn't. |
![]() |
![]() |
![]() |
#4 |
Senior Member
Location: 63130 Join Date: Apr 2012
Posts: 125
|
![]()
My situation is: the same aligned bam file and different stat results.
|
![]() |
![]() |
![]() |
#5 | |
Member
Location: India Join Date: Jun 2010
Posts: 20
|
![]() Quote:
Here is the flagstat output on sample reads aligned using bowtie 1 --------------------------------------------------------------- 73321912 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 49773724 + 0 mapped (67.88%:nan%) 73321912 + 0 paired in sequencing 36660956 + 0 read1 36660956 + 0 read2 49773724 + 0 properly paired (67.88%:nan%) 49773724 + 0 with itself and mate mapped 0 + 0 singletons (0.00%:nan%) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5) Here is the flagstat output on the same sample reads aligned using Tophat2 ------------------------------------------------------------------------ 82281727 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 82281727 + 0 mapped (100.00%:nan%) 82281727 + 0 paired in sequencing 41392697 + 0 read1 40889030 + 0 read2 51432316 + 0 properly paired (62.51%:nan%) 80601394 + 0 with itself and mate mapped 1680333 + 0 singletons (2.04%:nan%) 7719052 + 0 with mate mapped to a different chr 591122 + 0 with mate mapped to a different chr (mapQ>=5) Observations: ![]() 1. Difference in total input reads 2. 'rmdup' command is working well on the sorted bowtie1 generated bam whereas on tophat2 generated bam (already coordinate sorted) I am getting error/warnings like.. ------------- [bam_rmdup_core] inconsistent BAM file for pair 'HWI-1KL155:43:H088BADXX:2:2108:10331:62994'. Continue anyway. [bam_rmdup_core] inconsistent BAM file for pair 'HWI-1KL155:43:H088BADXX:1:2102:8980:94393'. Continue anyway. ----------------------------- Could someone share their experience or suggest whats going wrong and how to handle this disparity. Regards Last edited by Brajbio; 12-05-2013 at 12:04 AM. |
|
![]() |
![]() |
![]() |
#6 |
Member
Location: Germany Join Date: Jan 2013
Posts: 42
|
![]()
The difference in total input reads should come from the fact that you probably used accepted_hits.bam as TopHat output file. It does not include the unmapped reads, while the Bowtie output does. You can see that the Tophat results have 100% mapped reads.
You might have run bowtie with the option that only accepts concordant pairs. This means, there will be no reads mapped as singletons or with the mate mapped to a different chromosome, because then the mapped read would count as unmapped, as it is not part of a concordant pair. This also already explained the difference in mapping percentage. Considering that you mapped RNAseq reads, your problem with rmdup is more that duplicate removal does not do good for RNAseq reads. Okay, I think this is an ongoing debate. But it will definitely mess up your quantitative results. But there is also another thread about this problem on here that might help you. http://seqanswers.com/forums/showthread.php?t=19856 |
![]() |
![]() |
![]() |
#7 |
Member
Location: India Join Date: Jun 2010
Posts: 20
|
![]()
Thanks sBeier, for the response.
No, I have not applied the options to accept concordant reads in bowtie1 run (is this option available in bowtie-0.12.3? Not sure if it is taken care by default on using -X option), but yes I have used only the "protein coding genes" isoforms sequences as my reference while alignment with bowtie 1 whereas with the tophat2 I have used the hg19 complete seq with gtf including all 'NM' and 'NR' entries. Difference in the references (with and without NR sequences) might be one reason I can think of and as you said unmapped reads counts are included in the bowtie1 alignment stats could led to difference in mapping %. I admit But still the input reads files used is same in both the cases but I can see the difference in their count too. 73321912 + 0 in total (QC-passed reads + QC-failed reads) ------------------------------------- 82281727 + 0 in total (QC-passed reads + QC-failed reads) Regarding rmdup don't see much convincing answers in the blogs I think i should explore the code here http://samtools.sourcearchive.com/do...8c-source.html |
![]() |
![]() |
![]() |
#8 | |
Member
Location: NZ Join Date: May 2012
Posts: 14
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
Thread Tools | |
|
|