Hi everyone,
I've run bowtie and bwa (mostly with default parameters) on the same dataset and provisionally called SNPs with samtools pileup on each alignment (without removing duplicates, doing local realignment, etc.).
The results surprised me, so I'm wondering whether this is what should be expected.
Basically, the SNPs called from the bowtie and bwa alignments overlap by only ~50%. There are fewer bwa-only SNPs than bowtie-only ones, but still a substantial number. In terms of samtools consensus scores (restricted to a reasonable coverage range of 5-25), SNPs detected with both aligners tend to have the highest scores, followed rather closely by bwa-only and then bowtie-only SNPs.
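For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how the overlap could be computed. It assumes the legacy `samtools pileup -c` consensus format (tab-separated: chromosome, position, reference base, consensus base, consensus quality, SNP quality, max mapping quality, coverage, ...), and it takes "overlap" to mean shared sites divided by the union of sites, which is only one possible definition. The function names `parse_pileup` and `compare` are made up for illustration.

```python
def parse_pileup(lines, min_cov=5, max_cov=25):
    """Collect candidate SNP sites from pileup-style lines.

    Returns {(chrom, pos): consensus_quality} for sites where the
    consensus base differs from the reference and coverage lies
    within [min_cov, max_cov]. Column layout is an assumption (see above).
    """
    snps = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and indel lines (reference column is "*").
        if len(fields) < 8 or fields[2] == "*":
            continue
        chrom, pos = fields[0], int(fields[1])
        ref, cons = fields[2].upper(), fields[3].upper()
        cons_qual, coverage = int(fields[4]), int(fields[7])
        if cons != ref and min_cov <= coverage <= max_cov:
            snps[(chrom, pos)] = cons_qual
    return snps


def compare(calls_a, calls_b):
    """Count shared and aligner-specific sites between two call sets."""
    shared = calls_a.keys() & calls_b.keys()
    union = calls_a.keys() | calls_b.keys()
    return {
        "shared": len(shared),
        "only_a": len(calls_a.keys() - calls_b.keys()),
        "only_b": len(calls_b.keys() - calls_a.keys()),
        # Shared sites as a fraction of all sites called by either aligner.
        "overlap": len(shared) / len(union) if union else 0.0,
    }
```

Each pileup file would be passed in as an open file handle, e.g. `compare(parse_pileup(open(path_a)), parse_pileup(open(path_b)))`, where the two paths are placeholders. Sites are matched on position only, so two aligners calling different alleles at the same position would still count as shared.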
I'm quite surprised that the two alignments produced by essentially similar algorithms differ so much. Is this to be expected? What would be the best strategy for dealing with this discrepancy? Just trust bwa since it can detect indels and forget about bowtie? Or focus only on SNPs that are found in both bowtie and bwa alignments? Or maybe this discrepancy indicates that there's some problem with the initial dataset in the first place?
I'd greatly appreciate your thoughts and experience on this.