Hi All,
I have been analyzing the quality score (QUAL) distribution reported by samtools for both variant and reference sites in exome and genome data. Despite reading the samtools mathematical notes, I still have the following questions:
1) Are the quality scores of reference and variant sites comparable? Or are they calculated differently? Or, more precisely, are they reported differently?
2) In Heng Li's paper describing MAQ (2009), two different formulas are given for calculating Qg: a) (Main text) Qg = -10*log10[1 - P(g^|D)], b) (Supplementary) Qg = min over g != g^ of {q_g} - q^, where q^ corresponds to the genotype g^ that maximizes the posterior probability.
No reference is given in the paper or in the mathematical notes of 2010.
Does anybody know exactly which QUAL number is reported?
3) Why is the variant quality score not correlated with the variant fraction at a particular site?
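To make question 2 concrete, here is a minimal Python sketch of how I read the two definitions. It is not samtools or MAQ code. It assumes that q_g in the supplementary formula means -10*log10 P(g|D), which is my interpretation and is not stated explicitly; the genotype labels and posterior values are made up for illustration.

```python
import math

def phred(p):
    """Phred-scale a probability; returns inf for p <= 0."""
    return float("inf") if p <= 0 else -10.0 * math.log10(p)

def qg_main_text(posteriors):
    """Formula (a): Qg = -10*log10[1 - P(g^|D)]."""
    best = max(posteriors.values())
    return phred(1.0 - best)

def qg_supplementary(posteriors):
    """Formula (b): Qg = min_{g != g^} q_g - q_g^.

    ASSUMPTION: q_g = -10*log10 P(g|D). This is my reading, not confirmed
    by the paper or the notes.
    """
    best_g = max(posteriors, key=posteriors.get)
    q_best = phred(posteriors[best_g])
    q_others = [phred(p) for g, p in posteriors.items() if g != best_g]
    return min(q_others) - q_best

# Hypothetical posteriors for the three diploid genotypes at one site.
example = {"AA": 0.90, "AB": 0.05, "BB": 0.05}
print(qg_main_text(example))       # based on the total error probability
print(qg_supplementary(example))   # based on the second-best genotype only
```

Under this reading, (a) depends on the summed probability of all other genotypes, while (b) depends only on the ratio between the best and the second-best genotype, so the two agree only when a single alternative genotype carries essentially all of the remaining probability.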