Hi,
I'm trying to check the alignment accuracy of BAM files generated by TopHat using Picard tools.
But the resulting mismatch rates ('PF_MISMATCH_RATE') are unreasonably high.
While trying to figure out why, I found something strange.
Here are the entries in the sample BAM file:
Code:
@HD VN:1.0 SO:coordinate
@SQ SN:chr1 LN:249250621
@PG ID:TopHat VN:1.4.0
1925_202_1426 0 chr1 1102502 255 23M * 0 0 GCCATCTTACTGGGCAGCATTGG ___abbb_[NRb``[[`Y[_^&& NM:i:0 NH:i:1
49_971_1359 0 chr1 1102503 255 23M * 0 0 CCATCTTACTGGGCAGCATTGGA QQIFIQ?@NW`VP@E[EGUVVEE NM:i:0 NH:i:1
76_239_469 0 chr1 1102503 255 22M * 0 0 CCATCTTACTGGGCAGCATTGG WWVTUXRS[^^`[CGL<N8;KK NM:i:0 NH:i:1
All 3 reads are single-end, originate from chr1.fa, and align to chr1 perfectly.
First I ran the 'CollectAlignmentSummaryMetrics' module using only the reference sequence for chr1, and got a mismatch rate of 0.
Code:
java -jar ~/picard-tools-1.67/CollectAlignmentSummaryMetrics.jar INPUT=hh.sam R=chr1.fa OUTPUT=tmp
But when I ran the same command with a multi-FASTA file containing every chromosome as the reference, I got a mismatch rate of 0.77!
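As a rough cross-check that does not involve the reference file at all, the NM tags of the records listed above can be summed and divided by the number of aligned bases. This is only a sketch in Python: NM is an edit distance (it also counts indels), and it is an assumption that this approximates what Picard reports as PF_MISMATCH_RATE, which Picard computes against the reference itself. The function names are made up for illustration.

```python
import re

# The three alignment records from the post, fields separated by whitespace.
SAM_RECORDS = """\
1925_202_1426 0 chr1 1102502 255 23M * 0 0 GCCATCTTACTGGGCAGCATTGG ___abbb_[NRb``[[`Y[_^&& NM:i:0 NH:i:1
49_971_1359 0 chr1 1102503 255 23M * 0 0 CCATCTTACTGGGCAGCATTGGA QQIFIQ?@NW`VP@E[EGUVVEE NM:i:0 NH:i:1
76_239_469 0 chr1 1102503 255 22M * 0 0 CCATCTTACTGGGCAGCATTGG WWVTUXRS[^^`[CGL<N8;KK NM:i:0 NH:i:1
"""

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_bases(cigar):
    """Count read bases aligned to the reference (M, = and X operations)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=X")


def nm_mismatch_rate(sam_text):
    """Sum of NM tags divided by aligned bases, over mapped records only."""
    edits = 0
    bases = 0
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split()
        if int(fields[1]) & 0x4:
            continue  # FLAG bit 0x4 means the read is unmapped
        bases += aligned_bases(fields[5])
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                edits += int(tag[5:])
    return edits / bases if bases else 0.0


print(nm_mismatch_rate(SAM_RECORDS))
```

For the three reads this gives 0 edits over 68 aligned bases, which agrees with the chr1-only run.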
Can anybody explain what is wrong?
My first hunch is that this tool might not support a multi-FASTA file as the reference. But if so, how can I get the mismatch rate of a BAM file?
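One thing that might be worth ruling out before blaming the multi-FASTA format: the header above lists only chr1, while a multi-FASTA file may list its sequences in a different order. Whether Picard 1.67 looks reference sequences up by name or by position is not something this post establishes, so the Python sketch below is just a way to make any disagreement visible. It compares the @SQ lines with the FASTA records position by position. All function names and the toy data are hypothetical.

```python
def sam_header_sequences(lines):
    """Return [(name, length), ...] from @SQ header lines, in file order."""
    seqs = []
    for line in lines:
        if not line.startswith("@SQ"):
            continue
        tags = dict(field.split(":", 1) for field in line.split()[1:])
        seqs.append((tags["SN"], int(tags["LN"])))
    return seqs


def fasta_sequences(lines):
    """Return [(name, length), ...] for each FASTA record, in file order."""
    seqs = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs.append((name, length))
            # The sequence name is the first word after '>'.
            name, length = (line[1:].split() or [""])[0], 0
        elif name is not None:
            length += len(line)
    if name is not None:
        seqs.append((name, length))
    return seqs


def dictionary_mismatches(header, fasta):
    """List positions where the header and the FASTA disagree."""
    problems = []
    for i, (name, length) in enumerate(header):
        if i >= len(fasta):
            problems.append(f"index {i}: {name} has no counterpart in the FASTA")
        elif fasta[i] != (name, length):
            problems.append(
                f"index {i}: header has {name} ({length}), "
                f"FASTA has {fasta[i][0]} ({fasta[i][1]})"
            )
    return problems


# Toy data (not from the post): chr1 is first in the header, second in the FASTA.
toy_header = ["@HD VN:1.0 SO:coordinate", "@SQ SN:chr1 LN:8"]
toy_fasta = [">chr10", "ACGTAC", ">chr1", "ACGTACGT"]

for problem in dictionary_mismatches(
    sam_header_sequences(toy_header), fasta_sequences(toy_fasta)
):
    print(problem)
```

With real files, the same functions could be fed `open("hh.sam")` and an open handle on the multi-FASTA file; an empty result would suggest the cause lies elsewhere.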