  • Complete Genomics Variant Calls

    Hello,

    I was wondering if anyone could shed some light on the totalScore column in the VAR files produced by Complete Genomics. Specifically, what do these scores mean? Is there a best practice for thresholding to obtain high-confidence variants?

    Thank you in advance for your advice!

  • #2
    Hi,
    The totalScore comes from a likelihood ratio test between the most likely hypothesis (e.g., a genotype) and the next most likely one, and we express this score in decibels (dB). Bioinformaticians will recognize dB as the basis of the Phred scale: 10 dB means the likelihood ratio is 10:1, 20 dB means 100:1, 30 dB means 1000:1, and so on. The variant scores factor in the quantity of evidence (read depth), the quality of evidence (base call quality values), and mapping probabilities, so the score measures our confidence in the variant call. Likewise, we produce a "refScore" value that is calculated in a similar fashion but with the numerator of the likelihood ratio set to the homozygous reference hypothesis. The refScore therefore tells you how confident we are that a position is homozygous reference (high scores = high confidence); if the position is not homozygous reference, the totalScore tells you how confident we are in the genotype we called.
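
As a rough illustration of the decibel relationship described above, here is a minimal Python sketch that converts between a dB score and a likelihood ratio. The function names are hypothetical and are not part of any Complete Genomics tool; the only thing taken from the post is the mapping 10 dB = 10:1, 20 dB = 100:1, 30 dB = 1000:1.

```python
import math


def db_to_likelihood_ratio(score_db: float) -> float:
    """Convert a score in decibels to a likelihood ratio (hypothetical helper)."""
    return 10 ** (score_db / 10)


def likelihood_ratio_to_db(ratio: float) -> float:
    """Convert a likelihood ratio back to decibels (hypothetical helper)."""
    if ratio <= 0:
        raise ValueError("likelihood ratio must be positive")
    return 10 * math.log10(ratio)


if __name__ == "__main__":
    # Print the ratio implied by a few example scores.
    for score in (10, 20, 30):
        print(f"{score} dB -> {db_to_likelihood_ratio(score):.0f}:1")
```

Note that this only converts the ratio between the top two hypotheses; it does not turn a score into an error probability.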

    Variant scores are not calibrated on an absolute scale to error rate: a score of 30 dB does not necessarily mean that P(error) = 0.001.

    20 dB is presently the minimum score for calling a homozygous variant, and 40 dB is the minimum for a heterozygous variant. These thresholds were chosen, based on empirical testing, to balance call rate against accuracy. Our assembly process also includes another kind of call, the "no-call", so a call can be homozygous reference, something else, or no-call. A no-call results when one hypothesis is not well separated from the other hypotheses (>20dB), so we cannot be sure what the correct answer is.
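
To make the thresholds above concrete, here is a small Python sketch of how one might re-check a call against them after the fact. This is an illustration only, not the assembler's actual logic: the function names, the "hom"/"het" labels, and the choice to treat the minimum as inclusive (>=) are all assumptions. Only the 20 dB and 40 dB values come from the post.

```python
# Minimum scores quoted in the post (in dB).
MIN_SCORE_HOM_DB = 20
MIN_SCORE_HET_DB = 40


def passes_threshold(total_score_db: float, zygosity: str) -> bool:
    """Return True if the score meets the minimum for the given zygosity.

    `zygosity` is a hypothetical label, either "hom" or "het".
    The comparison is assumed to be inclusive.
    """
    thresholds = {"hom": MIN_SCORE_HOM_DB, "het": MIN_SCORE_HET_DB}
    if zygosity not in thresholds:
        raise ValueError(f"unknown zygosity: {zygosity!r}")
    return total_score_db >= thresholds[zygosity]


def call_or_no_call(genotype: str, total_score_db: float, zygosity: str) -> str:
    """Keep the genotype if it passes its threshold, otherwise report a no-call."""
    if passes_threshold(total_score_db, zygosity):
        return genotype
    return "no-call"
```

For example, under these assumptions a heterozygous call scored at 25 dB would be reported as a no-call, while a homozygous call with the same score would be kept.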

    As for best practices: because we have already applied the thresholds mentioned above and generated "no-calls" wherever the hypotheses are not well separated, most of our customers take the genotype calls "as is" without applying another filter.

    Jason Laramie, PhD
    Principal Field Application Scientist
    Complete Genomics, Inc

    • #3
      Hi Jason,

      A follow-up question to your answer. You said:
      "20 dB is presently the minimum score for calling a homozygous variant and 40dB is for a heterozygous variant"
      I see that each allele at a diploid locus is called separately. For example, I can have a genotype AN or GN or NN. In other words, no-calls are determined on a per-allele basis. If this is the case, what do "homozygous variant" and "heterozygous variant" mean in your definition above?

      Thanks.
      Karen Liu
