Thread | Thread Starter | Forum | Replies | Last Post |
going from RNA seq TopHat output to variant calls | efoss | Bioinformatics | 12 | 11-11-2013 02:15 AM |
Where can I get GM19240 HapMap cell line variant calls as VCF or a BED file | adrian | Bioinformatics | 2 | 09-13-2012 02:27 AM |
Variant calls with a low fraction of alt reads | Jeremy37 | Bioinformatics | 9 | 04-17-2012 07:18 PM |
convert CASAVA variant calls to VCF? | krish | Bioinformatics | 0 | 12-01-2011 09:44 PM |
merging and de-duplicating structural variant calls (bedpe) | splaisan | Bioinformatics | 0 | 06-27-2011 08:29 AM |
#1 |
Junior Member
Location: San Francisco, CA Join Date: May 2010
Posts: 6
Hello,
I was wondering if anyone could shed some light on the totalScore column in the VAR files produced by Complete Genomics. Specifically, what do these scores mean? Is there a best practice for thresholding to obtain high-confidence variants? Thank you in advance for your advice!
#2 |
Junior Member
Location: Boston Join Date: Feb 2011
Posts: 3
Hi,
The totalScore is a likelihood ratio test between the most likely hypothesis (e.g. genotype) and the next most likely, expressed in decibels (dB). Bioinformaticians will recognize dB as the basis of the Phred scale: 10 dB means the likelihood ratio is 10:1, 20 dB means 100:1, 30 dB means 1000:1, and so on. The variant scores factor in quantity of evidence (read depth), quality of evidence (base call quality values), and mapping probabilities. The score therefore measures our confidence in calling the variant.

Likewise, we produce a "refScore" value that is calculated in a similar fashion, but with the numerator of the likelihood ratio set to homozygous reference. The refScore can be used to ask how confident we are that the position is homozygous reference (high scores = high confidence). If the position is not homozygous reference, the totalScore then asks how confident we are in the genotype we called.

Scores for variants are not calibrated on an absolute scale to error rate: a score of 30 dB does not necessarily indicate that P(error) = 0.001. At present, 20 dB is the minimum score for calling a homozygous variant and 40 dB is the minimum for a heterozygous variant. These thresholds were chosen, based on empirical testing, to balance call rate and accuracy.

We also add another layer of calls to our assembly process: the "no-call". A call can therefore be homozygous reference, something else, or no-call. A no-call results when one hypothesis is not well separated (>20 dB) from the other hypotheses, so we cannot be sure what the correct answer is.

As for best practices, since we have already applied the thresholds mentioned above and generated no-calls where the hypotheses are not well separated, most of our customers take the genotype calls "as is" without applying another filter.

Jason Laramie, PhD
Principal Field Application Scientist
Complete Genomics, Inc
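To make the dB arithmetic above concrete, here is a minimal Python sketch. It only illustrates the conversion and the two minimum scores quoted in this post; it is not Complete Genomics' actual calling code. The function names and the `"hom"` / `"het"` labels are hypothetical, chosen for this example.

```python
# Illustrative only: converts a dB score to the likelihood ratio it implies
# and checks it against the minimum scores quoted in the post above.
# Function names and the "hom"/"het" labels are assumptions for this sketch.

MIN_SCORE_DB = {
    "hom": 20,  # minimum quoted for calling a homozygous variant
    "het": 40,  # minimum quoted for calling a heterozygous variant
}


def db_to_likelihood_ratio(score_db: float) -> float:
    """Return the likelihood ratio implied by a score in decibels."""
    return 10 ** (score_db / 10)


def meets_minimum(score_db: float, zygosity: str) -> bool:
    """Return True if the score reaches the quoted minimum for this zygosity."""
    return score_db >= MIN_SCORE_DB[zygosity]


if __name__ == "__main__":
    for score in (10, 20, 30, 40):
        ratio = db_to_likelihood_ratio(score)
        print(f"{score} dB -> {ratio:.0f}:1, "
              f"hom ok: {meets_minimum(score, 'hom')}, "
              f"het ok: {meets_minimum(score, 'het')}")
```

Keep in mind that the ratio is a relative measure between the top two hypotheses. As noted above, it is not calibrated to an absolute error rate.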
#3 |
Junior Member
Location: Seattle, WA Join Date: Oct 2011
Posts: 1
Hi Jason,
A follow-up question to your answer. You said:

"20 dB is presently the minimum score for calling a homozygous variant and 40dB is for a heterozygous variant"

I see that each allele at a diploid locus is called separately. For example, I can have a genotype AN or GN or NN. In other words, no-calls are determined on a per-allele basis. If this is the case, what do "homozygous variant" and "heterozygous variant" mean in your definition above?

Thanks.
Karen Liu
Tags |
cgi, variant analysis |