
  • Samtools variant calling questions

    Hi,

    In our group we are using samtools for variant calling. As a basic guide we use the example given at http://samtools.sourceforge.net/mpileup.shtml. samtools seems to be a nice tool for getting from BAM to a useful variant call format that can be annotated using other resources. Yet we have some difficulty understanding certain parts and putting them to proper use.

    Instead of what is shown in the example we want to apply variant calling on a single sample. The first question is if it's safe to use mpileup on a single sample in a similar way as is shown in the example, or should I use normal pileup for this? (And does this still apply BAQ?)

    Then the data is converted to a raw bcf file using bcftools. The second question is if this output contains every possible variant disregarding quality, depth, and the number of variant supporting calls? I assume this is the case and further polishing is done using vcfutils but please correct me if I'm wrong.
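
    One way to check what the raw calls actually contain is to convert the BCF to VCF text and look at QUAL, depth, and the number of variant-supporting reads per record. The following is only a minimal sketch, not part of samtools: it assumes the INFO column carries `DP` and `DP4` tags (DP4 being ref-forward, ref-reverse, alt-forward, alt-reverse counts), and the function names and the example record are made up for illustration.

```python
# Illustrative sketch: the DP and DP4 tag names are assumptions about the VCF at hand.

def parse_info(info):
    """Turn a VCF INFO string such as 'DP=14;DP4=9,4,1,0' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flag-style keys carry no value
    return fields


def summarize_record(line):
    """Extract position, QUAL, depth and alt-supporting read count from one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, _filter, info = cols[:8]
    fields = parse_info(info)
    summary = {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "qual": None if qual == "." else float(qual),
        "depth": int(fields["DP"]) if "DP" in fields else None,
        "alt_reads": None,
    }
    if "DP4" in fields:
        ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in fields["DP4"].split(","))
        summary["alt_reads"] = alt_fwd + alt_rev
    return summary


# Hypothetical record, invented for illustration (not real data).
example = "chr1\t12345\t.\tA\tG\t7.8\t.\tDP=14;DP4=9,4,1,0;MQ=37"
print(summarize_record(example))
```

    Running this over the whole raw file and tabulating the lowest QUAL, depth, and alt-read counts that occur would show directly whether low-quality or weakly supported sites are present before varfilter is applied.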

    Finally, vcfutils' varfilter is applied for filtering. In the example only a depth filter is shown. Besides depth, there are some other thresholds we would like to set. We would like to apply a (base) quality cutoff, a strand-bias filter for reference and variant calls, and a minimum number of variant-supporting calls.

    A close inspection of the varfilter help shows a couple of possibilities. I'll briefly describe how we think they should be used, or what our difficulties are.
    -Using the -a flag, can we set the minimum number of variant-supporting calls?
    -The -1 flag seems to be a p-value cutoff for strand bias. Yet I'm unable to find any explanation of which values are useful, or of how the filter behaves in the conditions we are interested in (i.e. both reference and variant calls found on both strands).
    -Then there are the -2, -3, and -4 flags, which imply several p-value settings. Default values are given. However, here too an explanation of how to alter them for different practical conditions would be very welcome.
    -The default value for mapQ bias is 0. Why?
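
    To get a feel for how a strand-bias p-value cutoff behaves, one option is to compute a two-sided Fisher exact test on the 2x2 table of allele (reference/variant) by strand (forward/reverse). This is a minimal sketch under assumptions: whether it reproduces exactly the p-value that the -1 flag acts on is not confirmed here, the function names are hypothetical, and the thresholds are arbitrary placeholders rather than recommendations.

```python
from math import comb


def strand_bias_pvalue(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Two-sided Fisher exact test on the allele-by-strand 2x2 table."""
    row_ref = ref_fwd + ref_rev
    row_alt = alt_fwd + alt_rev
    col_fwd = ref_fwd + alt_fwd
    total = row_ref + row_alt
    if total == 0:
        return 1.0
    denom = comb(total, col_fwd)

    def prob(k):
        # Hypergeometric probability of k reference reads on the forward
        # strand, with all row and column totals held fixed.
        return comb(row_ref, k) * comb(row_alt, col_fwd - k) / denom

    observed = prob(ref_fwd)
    lo = max(0, col_fwd - row_alt)
    hi = min(row_ref, col_fwd)
    # Sum every table that is no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= observed * (1 + 1e-9))
    return min(1.0, p)


def passes_filters(counts, min_alt_reads=2, min_strand_p=1e-4):
    """Apply placeholder thresholds to (ref_fwd, ref_rev, alt_fwd, alt_rev) counts."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = counts
    if alt_fwd + alt_rev < min_alt_reads:
        return False
    return strand_bias_pvalue(ref_fwd, ref_rev, alt_fwd, alt_rev) >= min_strand_p


# Made-up counts: balanced strands versus variant reads on one strand only.
print(strand_bias_pvalue(10, 10, 10, 10))
print(strand_bias_pvalue(10, 0, 0, 10))
print(passes_filters((5, 5, 4, 4)), passes_filters((10, 0, 0, 10)))
```

    With reference and variant calls spread evenly over both strands the p-value stays close to 1, so such a site survives any small cutoff; the p-value only becomes small when one allele is concentrated on a single strand and the counts are large enough to show it.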

    We couldn't find much information on these issues in the literature or other resources. Nevertheless, some of these settings are crucial in variant calling, and I would expect better descriptions than what we could find so far, especially when a clinical setting comes into play. It would be greatly appreciated if anyone could give some answers. Thanks.

  • #2
    In reply to the original post by Chiel:
    I have many of the same questions and cannot find answers. Can someone give some guidance or point us towards resources that explain this in more detail?

    • #3
      I would also like to know how to filter strand bias using GATK Unified Genotyper. What is the ideal SB (Strand Bias) threshold value?

      Thanks,
      Sérgio
