 04-14-2011, 04:56 AM #1 quicksand21 Junior Member   Location: San Francisco, CA Join Date: May 2010 Posts: 6 Complete Genomics Variant Calls Hello, I was wondering if anyone could shed some light on the totalScore column in the VAR files produced by Complete Genomics? Specifically what do these scores mean? Is there a best practice in terms of thresholding for high confidence variants? Thank you in advance for your advice!
 04-22-2011, 06:13 AM #2 jason.laramie Junior Member   Location: Boston Join Date: Feb 2011 Posts: 3
Hi, The totalScore is a likelihood ratio between the most likely hypothesis (e.g. genotype) and the next most likely, and we express this score in decibels (dB). Bioinformaticians will recognize dB as the basis of the Phred scale: 10 dB means the likelihood ratio is 10:1, 20 dB means 100:1, 30 dB means 1000:1, etc. The variant scores factor in quantity of evidence (read depth), quality of evidence (base call quality values), and mapping probabilities. The score therefore measures our confidence in calling the variant.
Likewise, we produce a "refScore" value that is calculated in a similar fashion, but with the numerator of the likelihood ratio set to the homozygous reference hypothesis. So the refScore can be used to ask how confident we are that the position is homozygous reference (high scores = high confidence); if the position is not homozygous reference, the totalScore then tells how confident we are in the genotype we called.
Scores for variants are not calibrated on an absolute scale to error rate. A score of 30 dB does not necessarily indicate that P(error)=0.001. 20 dB is presently the minimum score for calling a homozygous variant and 40 dB is the minimum for a heterozygous variant. These thresholds were chosen, based on empirical testing, to balance call rate and accuracy.
Additionally, we add another layer of calls into our assembly process, which is the "no-call". A call can therefore be homozygous ref, something else, or no-call. The no-call results when one hypothesis is not well separated from the other hypotheses (>20dB), so we cannot be sure what the correct answer is.
As for best practices, since we have thresholded the calls as mentioned above and generated "no-calls" when the hypotheses are not well separated, most of our customers take the genotype calls "as is" without applying another filter.
Jason Laramie, PhD Principal Field Application Scientist Complete Genomics, Inc
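For readers who want to check the dB arithmetic from post #2, here is a minimal Python sketch. It only restates the scale and the minimum scores quoted in that post; it is not Complete Genomics code. The function names, the `MIN_SCORE_DB` table, and the choice to treat the minimum as inclusive (`>=`) are all assumptions made for illustration.

```python
import math

def db_to_ratio(score_db: float) -> float:
    """Convert a score in dB to a likelihood ratio (10 dB -> 10:1)."""
    return 10.0 ** (score_db / 10.0)

def ratio_to_db(ratio: float) -> float:
    """Convert a likelihood ratio back to dB (1000:1 -> 30 dB)."""
    return 10.0 * math.log10(ratio)

# Minimum scores quoted in post #2 (hypothetical lookup table).
MIN_SCORE_DB = {"hom": 20.0, "het": 40.0}

def meets_minimum(total_score_db: float, zygosity: str) -> bool:
    """Check a score against the quoted minimum for 'hom' or 'het'.

    Assumption: the minimum is inclusive. The post does not say.
    """
    return total_score_db >= MIN_SCORE_DB[zygosity]

if __name__ == "__main__":
    for db in (10, 20, 30, 40):
        print(f"{db} dB -> {db_to_ratio(db):g}:1")
    print(meets_minimum(25, "hom"), meets_minimum(25, "het"))
```

As the post notes, these ratios are not calibrated error rates, so a 30 dB score should not be read as P(error)=0.001.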
 10-11-2011, 08:21 AM #3 karenliu Junior Member   Location: Seattle, WA Join Date: Oct 2011 Posts: 1 Hi Jason, A follow-up question to your answer. You said: "20 dB is presently the minimum score for calling a homozygous variant and 40dB is for a heterozygous variant". I see that each allele in a diploid locus is called separately. For example, I can have a genotype AN or GN or NN. That is, no-calls are determined on a per-allele basis. If this is the case, what do homozygous and heterozygous variant mean in your definition above? Thanks. Karen Liu

