Hi,
Can someone explain the basic idea behind quality recalibration and how it is corrected?
I realize that the sequencing platform has difficulty calling bases in the late cycles (due to phasing or cluster overlap). A naive way to correct for this would be to compute the mean drop in base quality between adjacent cycles and add this differential back to each read. This approach is flawed because the drop in quality could be due to causes I am unaware of that are independent of the detection problem mentioned above.
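To make sure I'm describing the naive approach precisely, here is a rough sketch of what I mean (the function names and data are my own invention, just for illustration):

```python
def naive_recalibrate(read_quals, mean_qual_per_cycle):
    """Naive per-cycle correction: estimate the mean quality at each
    cycle across many reads, then add back each cycle's drop relative
    to the first cycle. (The flawed approach described above.)"""
    baseline = mean_qual_per_cycle[0]
    return [q + (baseline - mean_qual_per_cycle[i])
            for i, q in enumerate(read_quals)]

# e.g. mean quality per cycle [30, 28, 25] and one read's qualities:
print(naive_recalibrate([30, 27, 24], [30, 28, 25]))  # [30, 29, 29]
```

This is what I mean by "adding the differential": every read at cycle i gets the same upward shift, regardless of whether that particular base was actually affected.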
So the idea is that one can use information from (local re)alignment to improve the quality of the base calls. I don't really understand this latter strategy. My guess is that if a read aligns very well to the reference, but for some reason the quality score of some base is much lower than the average quality of the overall read, then it might be justified to increase the confidence in that position.
I saw a plot in the GATK documentation showing that after recalibration the expected and the observed qualities align much better (along the diagonal), but I don't understand how the expected quality information was obtained.
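My current guess at how such an "observed" (empirical) quality could be derived from alignments is something like this. This is just my understanding, not anything I've confirmed from the GATK source; it assumes mismatches against the reference (excluding known polymorphic sites) are sequencing errors:

```python
import math

def empirical_quality(mismatches, total_bases):
    """Phred-scaled quality from an observed mismatch rate:
    Q = -10 * log10(error_rate). Assumes mismatches at sites that are
    not known SNPs are sequencing errors."""
    error_rate = max(mismatches, 1) / total_bases  # avoid log(0)
    return -10 * math.log10(error_rate)

# e.g. 100 mismatches among 1,000,000 aligned bases -> Q40
print(round(empirical_quality(100, 1_000_000)))  # 40
```

Is this roughly right, i.e. the machine-reported quality is compared against this alignment-derived error rate, binned by covariates like cycle and dinucleotide context?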
Thanks in advance
Christoph