Hi,
Can someone explain to me the logic behind quality recalibration and how it is done? I realize that the sequencing platform has difficulty calling bases in the late cycles (say, due to phasing or cluster overlap). A naive way to correct for this is to find the mean drop in base quality between adjacent cycles and then add this differential back to each read. This approach is flawed, because the drop in quality could have causes that I am unaware of and that are independent of the detection problem mentioned above.
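To make the naive approach concrete, here is a minimal sketch of how I picture it. The function name and the input layout (one list of Phred scores per read, all reads the same length) are my own assumptions, and I am reading "add this differential" as adding back the cumulative mean drop relative to the first cycle.

```python
def naive_cycle_correction(quals):
    """Add back the mean per-cycle quality drop (hypothetical helper).

    quals: list of equal-length lists of Phred scores, one list per read.
    Returns new lists with the cumulative mean drop added to each cycle.
    """
    n_reads = len(quals)
    n_cycles = len(quals[0])

    # Mean reported quality at each cycle, across all reads.
    mean_q = [sum(read[c] for read in quals) / n_reads for c in range(n_cycles)]

    # Mean drop between adjacent cycles (positive when quality falls).
    drops = [0.0] + [mean_q[c - 1] - mean_q[c] for c in range(1, n_cycles)]

    # Cumulative drop relative to the first cycle.
    offsets = []
    total = 0.0
    for d in drops:
        total += d
        offsets.append(total)

    return [[q + off for q, off in zip(read, offsets)] for read in quals]


reads = [[30, 28, 20], [30, 26, 22]]
corrected = naive_cycle_correction(reads)
```

Written this way, the correction forces the mean quality of every cycle to equal that of the first cycle, whether or not the late-cycle bases are really less reliable, which is the flaw I am worried about.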
So the idea is that one can use information from the alignment to improve the quality of the base call. I actually don't understand this latter strategy. My guess is that if the read aligns very well to the reference, but for some reason the quality score of one base is much lower than that of the subsequent bases, then it might be justified to raise its quality to the average quality of the subsequent bases. This is only a guess, and I would appreciate it if someone could explain this to me.
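For concreteness, here is a minimal sketch of one way alignment information could feed into recalibration, as far as I understand it: group the aligned bases by reported quality and cycle, count how often each group mismatches the reference, and convert the observed mismatch rate back to a Phred score. The function name, the choice of covariates, and the +1/+2 pseudocounts are my own assumptions; I am not claiming this is exactly what GATK or novoalign do.

```python
import math
from collections import defaultdict


def empirical_quality_table(observations):
    """Build a (reported_q, cycle) -> empirical Phred table (hypothetical helper).

    observations: iterable of (reported_q, cycle, is_mismatch) tuples, one per
    aligned base. Positions that are known variants are assumed to have been
    excluded already, so that real variants are not counted as errors.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [mismatches, total]
    for reported_q, cycle, is_mismatch in observations:
        entry = counts[(reported_q, cycle)]
        entry[0] += int(is_mismatch)
        entry[1] += 1

    table = {}
    for key, (mismatches, total) in counts.items():
        # Pseudocounts keep the rate away from 0 in small or error-free bins.
        error_rate = (mismatches + 1) / (total + 2)
        table[key] = -10 * math.log10(error_rate)
    return table


# Toy data: bases reported as Q30 at an early cycle and at a late cycle.
obs = [(30, 1, False)] * 998
obs += [(30, 90, True)] * 8 + [(30, 90, False)] * 90
table = empirical_quality_table(obs)
```

In this toy data, the Q30 bases at cycle 1 keep an empirical quality of about 30, while the Q30 bases at cycle 90 drop to roughly Q10, because they mismatch the reference far more often than a Q30 score implies. Note that the adjustment can go down as well as up, unlike my guess above.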
Also, a colleague of mine tried using GATK but found it to be extremely slow (despite running it on a fast computer with 100+ GB of RAM). Is there other software you can recommend for recalibration? A Google search indicates that novoalign has this feature (http://www.novocraft.com/wiki/tiki-i...0Calibration). Has anyone used this software, and are you confident that it works well?
Thanks in advance,
Christoph