  • Samtools snp calling -- old version seems to be better for my dataset?

    Hi,

    I used samtools pileup (0.1.7) and mpileup/bcftools (0.1.14) to call SNPs in my datasets. We also ran a TaqMan assay for one SNP on the same 28 samples. (One SNP is a small sample, but I still compared the samtools results against the TaqMan assay.)

    The old version with pileup matched TaqMan in 27/28 samples; the new version with mpileup matched in only 7/28 samples. So the old version seems to be better for my dataset. Has anybody done a similar validation before? What was your conclusion?

    Thanks!
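
A comparison like the one described above can be scripted. The following is a minimal sketch only: the sample names and genotypes are made-up toy data, and `normalise` and `concordance` are hypothetical helper names, not part of samtools.

```python
def normalise(genotype):
    # Treat "C/T" and "T/C" as the same unphased genotype.
    return tuple(sorted(genotype.upper().replace("|", "/").split("/")))


def concordance(calls, reference):
    """Return (matches, compared) over the samples present in both dicts."""
    shared = sorted(set(calls) & set(reference))
    matches = sum(
        1 for sample in shared
        if normalise(calls[sample]) == normalise(reference[sample])
    )
    return matches, len(shared)


# Toy data for illustration only (not the poster's results).
taqman = {"sample1": "C/T", "sample2": "T/T", "sample3": "C/C"}
pileup_calls = {"sample1": "T/C", "sample2": "T/T", "sample3": "C/C"}
mpileup_calls = {"sample1": "C/C", "sample2": "T/T", "sample3": "C/C"}

print(concordance(pileup_calls, taqman))   # (3, 3)
print(concordance(mpileup_calls, taqman))  # (2, 3)
```

Listing the discordant samples as well as counting them would show whether the mismatches are missed calls or wrong genotypes.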

  • #2
    Can you post the command lines you are using for the pileup and mpileup?



    • #3
      Originally posted by oiiio
      Can you post the command lines you are using for the pileup and mpileup?
      I used the example command lines from the samtools manual.
      After obtaining the sorted BAM files:

      samtools pileup -vcf hg18_genome.fna sample1-sorted.nosoft.bam | tee sample1.raw.txt | samtools.pl varFilter -D6000 > sample1.flt.txt
      awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' sample1.flt.txt > sample1.final.txt

      samtools mpileup -uf hg18_genome.fna sample1-sorted.bam | bcftools view -p 0.99 -bvcg - > sample1-var.raw.bcf
      bcftools view sample1-var.raw.bcf | awk '$6>=3' | vcfutils.pl varFilter -d8 -D10000 -1 1e-5 -2 0 -4 1e-7 > sample1-var.flt.vcf
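
For readers unfamiliar with the awk filter in the pileup command, here is a rough Python equivalent. It is a sketch that assumes the `pileup -c` consensus layout, where column 3 is the reference base (`*` on indel rows) and column 6 is the SNP quality. The function name and the example rows are hypothetical, and the indel row is simplified.

```python
def keep_pileup_row(line, min_snp_q=20, min_indel_q=50):
    """Mimic: awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)'."""
    fields = line.split()          # awk also splits on runs of whitespace
    ref = fields[2]                # "*" marks an indel row
    snp_quality = int(fields[5])   # column 6: SNP quality
    threshold = min_indel_q if ref == "*" else min_snp_q
    return snp_quality >= threshold


# Illustrative rows only (not taken from the poster's data).
rows = [
    "chr1\t1000\tC\tY\t30\t25\t60\t4\t.T,t\tIIII",  # SNP row, quality 25
    "chr1\t2000\tA\tR\t10\t12\t60\t3\t.G,\tIII",    # SNP row, quality 12
    "chr1\t3000\t*\t*/+A\t40\t45\t60\t5\t*\t+A",    # indel row, quality 45
]

print([keep_pileup_row(row) for row in rows])  # [True, False, False]
```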



      • #4
        I've had too many experiences where the BAQ calculations ate real SNPs, I always turn it off now. So you might be losing lots of real SNPs in mpileup, which is why the ones you do get aren't as likely to be real.



        • #5
          Originally posted by swbarnes2
          I've had too many experiences where the BAQ calculations ate real SNPs, I always turn it off now. So you might be losing lots of real SNPs in mpileup, which is why the ones you do get aren't as likely to be real.
          Thanks! May I ask which one is BAQ? What should I remove from my command line?



          • #6
            samtools mpileup -Buf

            Will disengage the BAQ calculations.

            Granted, I do this because I align with bwa, which will find indels; I think the point of the BAQ calculations is to correct for aligners that don't, so if you are using something that doesn't find indels, then maybe you should keep the BAQ calculation in.

            But either way, it's worth trying.
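
To try both settings side by side, the two command variants can be generated from one place. This is a small sketch: `mpileup_command` is a hypothetical helper, the file names are the ones quoted earlier in the thread, and the flags are the legacy samtools 0.1.x ones discussed here. It only builds the command line and does not run samtools.

```python
import shlex


def mpileup_command(reference, bam, disable_baq=False):
    """Build the legacy mpileup argument list, optionally adding -B."""
    flags = "-Buf" if disable_baq else "-uf"
    return ["samtools", "mpileup", flags, reference, bam]


for disable in (False, True):
    cmd = mpileup_command("hg18_genome.fna", "sample1-sorted.bam", disable)
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` and piping it into the same downstream filtering would give two call sets that differ only in the BAQ setting.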



            • #7
              Originally posted by swbarnes2
              samtools mpileup -Buf

              Will disengage the BAQ calculations.

              Granted, I do this because I align with bwa, which will find indels; I think the point of the BAQ calculations is to correct for aligners that don't, so if you are using something that doesn't find indels, then maybe you should keep the BAQ calculation in.

              But either way, it's worth trying.

              Thanks a lot! I will give it a try and report my results here again.



              • #8
                Originally posted by swbarnes2
                samtools mpileup -Buf

                Will disengage the BAQ calculations.

                Granted, I do this because I align with bwa, which will find indels; I think the point of the BAQ calculations is to correct for aligners that don't, so if you are using something that doesn't find indels, then maybe you should keep the BAQ calculation in.

                But either way, it's worth trying.
                You are right! The results look good after I added -B to the command line. Many thanks to you!



                • #9
                  Originally posted by swbarnes2
                  I've had too many experiences where the BAQ calculations ate real SNPs, I always turn it off now. So you might be losing lots of real SNPs in mpileup, which is why the ones you do get aren't as likely to be real.
                  Hi,

                  I have checked the difference between the mpileup results with and without -B. The following SNP is filtered out without -B but retained with -B:
                  Code:
                  scaffold1454    124026  .       C       T       74      .       DP=8;VDB=0.0000;AF1=1;AC1=2;DP4=0,0,0,8;MQ=25;FQ=-51    GT:PL:DP:SP:GQ  1/1:107,24,0:8:0:45
                  But when I check the read qualities in the cns file generated by pileup:
                  Code:
                  scaffold1454    124026  C       N       0       0       0       8       ^:t^:t^:t^:t^:t^:t^:t^:t        !!!!!!!!
                  It seems this position is located at the end of the reads and the base qualities are very low (every base has quality "!", i.e. Phred 0), so the filtering is reasonable.
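
The pileup row above can be decoded mechanically. The sketch below assumes the standard pileup conventions: `^` marks the first aligned base of a read and is followed by one character encoding that read's mapping quality, and both mapping and base qualities are ASCII characters offset by 33. `decode_pileup` is a hypothetical helper name, and the row is re-typed with tabs.

```python
def decode_pileup(bases, quals, offset=33):
    """Return (read-start mapping qualities, base qualities) for one row."""
    mapqs = []
    i = 0
    while i < len(bases):
        if bases[i] == "^" and i + 1 < len(bases):
            # The character after '^' encodes the read's mapping quality.
            mapqs.append(ord(bases[i + 1]) - offset)
            i += 2
        else:
            i += 1
    phred = [ord(ch) - offset for ch in quals]
    return mapqs, phred


row = "scaffold1454\t124026\tC\tN\t0\t0\t0\t8\t^:t^:t^:t^:t^:t^:t^:t^:t\t!!!!!!!!"
fields = row.split("\t")
mapqs, phred = decode_pileup(fields[8], fields[9])

print(mapqs)  # [25, 25, 25, 25, 25, 25, 25, 25]
print(phred)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

All eight bases are the first aligned base of a read with mapping quality 25, which agrees with MQ=25 in the VCF record, and every base quality decodes to 0.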



                  • #10
                    I've also found BAQ correction to be unhelpful when comparing triplicates of closely related bacteria.

                    Without BAQ, about 90-95 % of reads covering SNPs are listed as "high quality" in Samtools VCF output format.

                    With BAQ, frequently only about 5-10 % of reads covering SNPs are high quality.

                    I am not sure whether BAQ has been extensively tested on bacteria, which in this example have higher rates of variation (i.e. more SNPs in any given region) than humans.
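
One way to put a number on this is to compare the DP4 and DP fields of each VCF record. This is a sketch under an assumption: that "high quality" refers to the samtools DP4 field (counts of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases) relative to the raw depth DP. `high_quality_fraction` is a hypothetical helper, and the INFO string is the record quoted earlier in the thread.

```python
def high_quality_fraction(info):
    """Fraction of raw depth (DP) that is counted as high quality in DP4."""
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    depth = int(fields["DP"])
    dp4 = [int(count) for count in fields["DP4"].split(",")]
    return sum(dp4) / depth if depth else 0.0


info = "DP=8;VDB=0.0000;AF1=1;AC1=2;DP4=0,0,0,8;MQ=25;FQ=-51"
print(high_quality_fraction(info))  # 1.0
```

Averaging this value over all SNP records, once with and once without -B, would give a comparison of the kind described in this post.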
