Quote:
Originally Posted by Buzz0r
In that case how can you compare the Qscores at all? Clearly, it makes a huge difference if you look at an Illumina Q30 run, a Proton Q17 run or a PacBio <Q10 run.
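Evaluating (and comparing) the correctness of Q scores for a well-known dataset is fairly easy -- PhiX is a common one, but everyone has their own pet. All you need to do is compare the base calls with what is expected, then check whether the observed error rate matches what the quality values claim (Q30 claims about 1 error in 1,000 bases, Q17 about 1 in 50, Q10 1 in 10). The expectation comes from what you already know about the sample ("This is a 17-generation inbred strain with SNPs here, here and here, and a gene translocation from here to here. If the sequence is showing a SNP anywhere else, then that's probably an incorrect base").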
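To make that concrete, here's a rough sketch of the bookkeeping in Python. It's only an illustration under some assumptions: the reads are already aligned to the known reference, the known real variants have been excluded, and each base has been reduced to a (reported Q, mismatch or not) pair. The function names `phred_to_error_prob` and `empirical_quality` are made up for this example and don't come from any particular tool.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q):
    """Error probability claimed by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def empirical_quality(observations):
    """Group bases by reported Q and compute the Q actually observed.

    observations: iterable of (reported_q, is_mismatch) pairs, one per
    aligned base (hypothetical input format, assumed for this sketch).
    Returns {reported_q: empirical_q}; empirical_q is None when a bin
    has no mismatches, because the error rate can't be estimated then.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [total, errors]
    for q, is_mismatch in observations:
        counts[q][0] += 1
        counts[q][1] += int(is_mismatch)

    result = {}
    for q, (total, errors) in sorted(counts.items()):
        if errors == 0:
            result[q] = None
        else:
            result[q] = -10 * math.log10(errors / total)
    return result


if __name__ == "__main__":
    # Toy data: 1 mismatch in 1000 bases reported as Q30,
    # and 1 mismatch in 10 bases reported as Q10.
    toy = [(30, False)] * 999 + [(30, True)] + [(10, False)] * 9 + [(10, True)]
    for q, emp in empirical_quality(toy).items():
        print(f"reported Q{q} (claims P={phred_to_error_prob(q):.4f}): empirical Q = {emp}")
```

If the empirical value in a bin sits well below the reported one, the platform is overestimating its quality for that dataset. Doing it this way puts Illumina, Proton and PacBio runs on the same scale, since everything gets converted back into an observed error rate.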
The generation of Q scores is difficult (think P vs NP if you're a maths person), and that leads to data-specific biases that in most cases cannot be predicted prior to carrying out a run on a new sample.