Hi Everyone,
Has anyone else noticed that the BWA mapping quality score is not deterministic? If you map the exact same read or pair multiple times, you sometimes get different mapping quality scores. I am not talking about pure repeats, but about reads that have reasonable mapping quality scores (around 30) and that map to the same position each time.
I ran an experiment where I simulated 100 error-free paired-end reads from each position in dm3 chrM (read length=100, outer distance=300) and mapped them back to all of dm3 to compute the mapping quality score at each position. For about 1/3 of the starting positions, I got 2 different mapping quality scores: about half the reads at a position would get a score of X and the other half a score of X+7. For example, the reads simulated starting at position 500 get a mapping quality score of either 29 or 36.
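For anyone who wants to check their own data, here is a minimal sketch of how the per-position tally could be done from BWA's SAM output. It assumes (hypothetically) that each simulated read name has the form chrM_<start>_<replicate>; the function names and the naming scheme are illustrative only, so adjust the parsing to whatever your simulator emits.

```python
from collections import Counter, defaultdict


def mapq_by_start(sam_lines):
    """Tally MAPQ values per simulated start position.

    Assumption: read names look like 'chrM_<start>_<replicate>'
    (a hypothetical naming scheme, not something BWA produces).
    """
    tallies = defaultdict(Counter)
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        qname, flag, mapq = fields[0], int(fields[1]), int(fields[4])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        start = int(qname.split("_")[1])
        tallies[start][mapq] += 1
    return tallies


def positions_with_mixed_mapq(tallies):
    """Return only the start positions that received more than one MAPQ value."""
    return {start: dict(counts) for start, counts in tallies.items() if len(counts) > 1}
```

Passing an open SAM file (or any iterable of lines) to mapq_by_start and then calling positions_with_mixed_mapq on the result lists the start positions whose replicates did not all get the same score.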
Does anyone have a good explanation for this? My understanding is that the mapping quality score computation should be completely deterministic (except perhaps if the pair distance is re-estimated), but the results look as if there is a random component: the split between the 2 values is not always exactly 50-50, but it forms a tight distribution around 50-50.
Thank you,
Mike