I find that the major short-read aligners map reads poorly for highly polymorphic genes like HLA or cytochrome P450. This is expected: for alleles that differ a lot from the reference allele, reads carry too many mismatches to align.
I think we can solve this problem by reading a dbSNP VCF that contains the possible alleles and their allele frequencies during alignment, then treating bases that differ from the reference as matches instead of mismatches if the alternate allele exceeds a certain allele frequency, e.g. 1%. I think this feature could greatly improve the percentage of reads mapped.
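To make the idea concrete, here is a rough Python sketch of the scoring change I have in mind. It is only an illustration under simplifying assumptions: ungapped placement, SNVs only, and the known alleles already loaded into a dictionary rather than parsed from a real VCF. The names `count_mismatches`, `known_alts`, and `min_af` are made up for this example and are not taken from any existing aligner.

```python
def count_mismatches(read, ref, offset, known_alts, min_af=0.01):
    """Count mismatches for an ungapped read placed at ref[offset:].

    A base that differs from the reference is NOT counted when it equals a
    known alternate allele at that position with frequency >= min_af.

    known_alts (hypothetical structure): maps a 0-based reference position
    to a dict of {alt_base: allele_frequency}.
    """
    mismatches = 0
    for i, base in enumerate(read):
        pos = offset + i
        if base == ref[pos]:
            continue  # plain match against the reference
        # Look up the frequency of this base as a known ALT; 0.0 if unknown.
        af = known_alts.get(pos, {}).get(base, 0.0)
        if af < min_af:
            mismatches += 1  # unknown or too-rare allele: real mismatch
    return mismatches


# Toy data, invented for illustration only.
ref = "ACGTACGTAC"
known_alts = {
    2: {"T": 0.15},   # common ALT, forgiven at the 1% threshold
    5: {"A": 0.001},  # rare ALT, still counted as a mismatch
}
read = "ACTTAAGT"     # differs from ref at positions 2 and 5

print(count_mismatches(read, ref, 0, known_alts))  # 1
print(count_mismatches(read, ref, 0, {}))          # 2
```

With the variant table, the read is charged one mismatch instead of two, so it is more likely to stay under an aligner's mismatch limit.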
Can any short read aligner authors add this feature? Or is it already available in some aligners?
Thanks!