  • Varscan frequency question

    I've been trying out VarScan for indel calling recently, but I have an issue with the information in the output. The field named Freq doesn't seem to agree with the Reads1 and Reads2 fields. Here are a couple of examples (I've formatted the output so that it is a bit more reader-friendly):

    I'd have thought the frequency would be Reads2/(Reads1+Reads2), i.e. the reads supporting the variant over all the reads. This is not the case for most of my entries. They are all in the right ballpark, but few are spot on (I've included an entry that seems correct, the last one).

    For the first row of my example: 13/14 = 0.9286 != 0.8667
    However, 13/15 does give 0.8667.
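
    The mismatch can be checked by working backwards from the reported Freq to the read depth it implies. This is only a sketch: the helper name `implied_depth` is made up for illustration, and the numbers are the first-row values quoted above (Reads1 = 1, Reads2 = 13, Freq = 86.67%).

    ```python
    def implied_depth(reads2: int, reported_freq: float) -> int:
        """Return the integer read depth that best reproduces the reported Freq.

        Hypothetical helper for illustration; it is not part of VarScan.
        """
        return round(reads2 / reported_freq)


    # First example row from the post: Reads1 = 1, Reads2 = 13, Freq = 86.67%
    reads1, reads2, reported = 1, 13, 0.8667

    naive = reads2 / (reads1 + reads2)        # what I expected Freq to be
    depth = implied_depth(reads2, reported)   # depth the reported Freq implies

    print(f"naive frequency: {naive:.4f}")                      # 0.9286
    print(f"implied depth: {depth}")                            # 15
    print(f"reads not accounted for: {depth - reads1 - reads2}")  # 1
    ```

    So the reported value is consistent with one read at that position that is in neither Reads1 nor Reads2.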

    Are the reads wrong? Are the frequencies wrong? Are they calculated with different views on what counts as a supporting read? Something is going on that I can't figure out, and I haven't found an answer by searching forums etc.

    I'm using VarScan v2.3.2, mpileup2indel with a few parameters:
    --min-var-freq 0.001
    --min-avg-qual 30
    --min-reads2 10
    --strand-filter 1
    --p-value 0.9

    I'm sure there's a simple answer. Does anyone have it?
    Cheers
    //Adrian

  • #2
    Hi Adrian,
    I'm also new to VarScan. I wonder if these figures are affected by the flag:
    "--min-avg-qual Minimum base quality at a position to count a read [15]"
    The Freq of 94.12% given in your second example can be derived from 32/34, so perhaps there are 34 reads over that position, but only 33 that have base qualities of 15 or above. You could eyeball the pileup to see if this is true, or try the VarScan readcounts tool with these parameters:
    --min-coverage 0
    --min-base-qual 0

    I'd be interested to see how you get on.
    Cheers,
    Graham



    • #3
      Thank you Graham, for your input!

      I ran readcounts with parameters:
      --min-coverage 0
      --min-base-qual 30 (since this is what I ran mpileup2indel with)

      And I think I found the answer to my question.

      The entries with "deviating" frequencies had additional variants at the same position, in the example cases supported by one read each. Adding this read to the total gives the same frequency as printed in the output file (13/15 and 32/34). Some of my positions had several additional variants in the pileup file, which makes the total look even more "wrong" when Reads1 and Reads2 are simply added together from the mpileup2indel output file. Now that I understand it, there is no problem anymore, and I can trust these values a bit more.
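
      As a rough sketch of that arithmetic (the function name `variant_freq` is made up for illustration, and this only mirrors the counts discussed in this thread, not necessarily how VarScan computes Freq internally):

      ```python
      def variant_freq(reads1: int, reads2: int, other_variant_reads: int = 0) -> float:
          """Frequency of the reported variant over all reads at the position.

          reads1: reads supporting the reference
          reads2: reads supporting the reported variant
          other_variant_reads: reads supporting any other variant at the position
                               (assumed to count toward the total depth)
          """
          depth = reads1 + reads2 + other_variant_reads
          if depth == 0:
              raise ValueError("no reads at this position")
          return reads2 / depth


      # First example: Reads1 = 1, Reads2 = 13, plus one read with another variant
      print(f"{variant_freq(1, 13):.4f}")     # 0.9286, what I expected
      print(f"{variant_freq(1, 13, 1):.4f}")  # 0.8667, what the output file shows

      # Second example: Reads2 = 32; Reads1 = 1 is inferred from the 33 reads
      # Graham mentioned, plus one read with another variant
      print(f"{variant_freq(1, 32, 1):.4f}")  # 0.9412
      ```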

      So in the end the answer was quite simple, I guess.

      Thanks again
      //Adrian


      Edit: I may have broken the forum boundaries with my huge table...



      • #4
        Hello Adrian and Graham,

        Thank you for bringing this up and looking into the issue. Adrian, would you mind sending the raw pileup for the two positions that you mentioned?

        Feel free to use VarScan's support forum if you have other issues or questions.

        Note that correctly counting reads (supporting or refuting) for indels is difficult using the first-pass alignments in a BAM file. Optimally, you would use VarScan to discover the indels, and then use realignment (GATK) or indel haplotype remapping (DINDEL) to obtain more accurate read counts and variant allele frequencies.

        Yours,

        Dan Koboldt



        • #5
          Originally posted by adrianl
          Thank you Graham, for your input!

          I ran readcounts with parameters:
          --min-coverage 0
          --min-base-qual 30 (since this is what I ran mpileup2indel with)

          And I think I found the answer to my question.

          The entries with "deviating" frequencies had additional variants at the same position, in the example cases supported by one read each. Adding this read to the total gives the same frequency as printed in the output file (13/15 and 32/34). Some of my positions had several additional variants in the pileup file, which makes the total look even more "wrong" when Reads1 and Reads2 are simply added together from the mpileup2indel output file. Now that I understand it, there is no problem anymore, and I can trust these values a bit more.

          So in the end the answer was quite simple, I guess.

          Thanks again
          //Adrian


          Edit: I may have broken the forum boundaries with my huge table...
          Can you tell me how you got this result? Which program and parameters did you use?
