  • Samtools snp calling -- old version seems to be better for my dataset?

    Hi,

    I used samtools pileup (0.1.7) and mpileup/bcftools (0.1.14) to call SNPs for my datasets. We also ran a TaqMan assay for one SNP on the same 28 samples. (One SNP is a small sample, but I still compared the samtools results against the TaqMan assay.)

    We found that the old version with pileup matches TaqMan in 27/28 samples, while the new version with mpileup matches in only 7/28 samples. So the old version seems to work better for my dataset. Has anybody done a similar validation before? What was your conclusion?

    Thanks!
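
The comparison in this post comes down to counting, for each caller, how many samples have the same genotype as the TaqMan assay. A minimal sketch of that tally, assuming a hypothetical tab-separated table with one row per sample (sample name, TaqMan genotype, caller genotype). The table layout and the function name `concordance` are made up for illustration and are not from the thread:

```shell
#!/bin/sh
# Count how many samples have identical genotypes in columns 2 and 3 of a
# tab-separated table: sample <TAB> taqman_genotype <TAB> caller_genotype.
# The table layout is an assumption made for this sketch.
concordance() {
    awk -F'\t' '
        NF >= 3 { total++; if ($2 == $3) match_n++ }
        END {
            if (total == 0) { print "0/0"; exit }
            printf "%d/%d (%.1f%%)\n", match_n, total, 100 * match_n / total
        }
    ' "$1"
}

# Tiny made-up example with three samples, one of them discordant.
tmp=$(mktemp)
printf 'S1\tC/T\tC/T\nS2\tT/T\tT/T\nS3\tC/C\tC/T\n' > "$tmp"
concordance "$tmp"
rm -f "$tmp"
```

Running it once per caller (one table for pileup, one for mpileup) would give figures directly comparable to the 27/28 and 7/28 reported here.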

  • #2
    Can you post the command lines you are using for the pileup and mpileup?


    • #3
      Originally posted by oiiio:
      Can you post the command lines you are using for the pileup and mpileup?
      I used the example command lines from the samtools manual.
      After obtaining the sorted BAM files:

      samtools pileup -vcf hg18_genome.fna sample1-sorted.nosoft.bam | tee sample1.raw.txt | samtools.pl varFilter -D6000 > sample1.flt.txt
      awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' sample1.flt.txt > sample1.final.txt

      samtools mpileup -uf hg18_genome.fna sample1-sorted.bam | bcftools view -p 0.99 -bvcg - > sample1-var.raw.bcf
      bcftools view sample1-var.raw.bcf | awk '$6>=3' | vcfutils.pl varFilter -d8 -D10000 -1 1e-5 -2 0 -4 1e-7 > sample1-var.flt.vcf


      • #4
        I've had too many experiences where the BAQ calculations ate real SNPs, so I always turn it off now. You might be losing lots of real SNPs in mpileup, which is why the ones you do get aren't as likely to be real.


        • #5
          Originally posted by swbarnes2:
          I've had too many experiences where the BAQ calculations ate real SNPs, so I always turn it off now. You might be losing lots of real SNPs in mpileup, which is why the ones you do get aren't as likely to be real.
          Thanks! May I ask which option controls BAQ? What should I change in my command line?


          • #6
            samtools mpileup -Buf

            Will disengage the BAQ calculations.

            Granted, I do this because I align with bwa, which will find indels; I think the point of the BAQ calculations is to correct for aligners that don't, so if you are using something that doesn't find indels, then maybe you should keep the BAQ calculation in.

            But either way, it's worth trying.
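
Applied to the mpileup command posted in #3, the suggestion amounts to adding `-B` to the existing `-uf` options. A sketch that only prints the two variants so they can be compared side by side before running anything. The pipeline text is taken from post #3; the function name `mpileup_cmd` and its mode argument are made up for illustration:

```shell
#!/bin/sh
# Print the mpileup | bcftools pipeline from post #3, with or without BAQ.
# Usage: mpileup_cmd <baq|nobaq> <reference.fa> <sorted.bam>
# Nothing is executed here; the command is only echoed for inspection.
mpileup_cmd() {
    mode=$1; ref=$2; bam=$3
    case "$mode" in
        nobaq) opts="-Buf" ;;   # -B turns the BAQ calculation off
        baq)   opts="-uf"  ;;   # default behaviour: BAQ stays on
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "samtools mpileup $opts $ref $bam | bcftools view -p 0.99 -bvcg -"
}

mpileup_cmd baq   hg18_genome.fna sample1-sorted.bam
mpileup_cmd nobaq hg18_genome.fna sample1-sorted.bam
```

This follows the samtools 0.1.x syntax used in the thread. Later releases moved variant calling into `bcftools mpileup` and `bcftools call`, so the surrounding options differ there.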


            • #7
              Originally posted by swbarnes2:
              samtools mpileup -Buf

              Will disengage the BAQ calculations.

              Granted, I do this because I align with bwa, which will find indels; I think the point of the BAQ calculations is to correct for aligners that don't, so if you are using something that doesn't find indels, then maybe you should keep the BAQ calculation in.

              But either way, it's worth trying.

              Thanks a lot! I will give it a try and report my results here again.


              • #8
                Originally posted by swbarnes2:
                samtools mpileup -Buf

                Will disengage the BAQ calculations.

                Granted, I do this because I align with bwa, which will find indels; I think the point of the BAQ calculations is to correct for aligners that don't, so if you are using something that doesn't find indels, then maybe you should keep the BAQ calculation in.

                But either way, it's worth trying.
                You are right! The results look good after I added -B to the command line. Many thanks!


                • #9
                  Originally posted by swbarnes2:
                  I've had too many experiences where the BAQ calculations ate real SNPs, so I always turn it off now. You might be losing lots of real SNPs in mpileup, which is why the ones you do get aren't as likely to be real.
                  Hi,

                  I compared the mpileup results with and without -B. The following SNP is filtered out without -B but retained with -B:
                  Code:
                  scaffold1454    124026  .       C       T       74      .       DP=8;VDB=0.0000;AF1=1;AC1=2;DP4=0,0,0,8;MQ=25;FQ=-51    GT:PL:DP:SP:GQ  1/1:107,24,0:8:0:45
                  But when I check the read qualities in the cns file generated by pileup:
                  Code:
                  scaffold1454    124026  C       N       0       0       0       8       ^:t^:t^:t^:t^:t^:t^:t^:t        !!!!!!!!
                  It seems this position is located at the end of every read (each base carries a ^ marker) and all eight base qualities are "!", the lowest value. So the filtering seems reasonable.
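
The quality columns in the pileup line can be checked by hand. In pileup output, base qualities are printed as ASCII characters with an offset of 33, and the character right after each `^` encodes the mapping quality of that read in the same way. A small sketch that decodes such strings (the function name `phred33` is made up for illustration):

```shell
#!/bin/sh
# Decode a Phred+33 quality string into space-separated integer scores.
phred33() {
    printf '%s' "$1" | od -An -tu1 | awk '
        { for (i = 1; i <= NF; i++) { printf "%s%d", sep, $i - 33; sep = " " } }
        END { print "" }
    '
}

phred33 '!!!!!!!!'   # base qualities of the eight reads in the pileup line
phred33 ':'          # the character following each ^ in the base column
```

For the line posted here, the base qualities decode to eight zeros and the `:` after each `^` decodes to 25, which agrees with MQ=25 in the VCF record.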


                  • #10
                    I've also found BAQ correction to be unhelpful when comparing triplicates of closely related bacteria.

                    Without BAQ, about 90-95% of reads covering SNPs are listed as "high quality" in the samtools VCF output.

                    With BAQ, often only about 5-10% of reads covering SNPs are high quality.

                    I am not sure if BAQ has been extensively tested on bacteria, which in this example have higher rates of variation, i.e. more SNPs in any given region, than in humans.
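
One way to put a number on this per site, assuming "high quality" here refers to the `DP4` field: in the samtools/bcftools VCF output, `DP` is the raw read depth and `DP4` holds the counts of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases, so the sum of `DP4` divided by `DP` gives the high-quality fraction. A sketch that reads tab-separated VCF on stdin (the function name `hq_fraction` is made up for illustration):

```shell
#!/bin/sh
# For each VCF record, print CHROM, POS and sum(DP4)/DP.
# Records without both DP and DP4 in the INFO column are skipped.
hq_fraction() {
    awk -F'\t' '
        /^#/ { next }                        # skip header lines
        {
            dp = ""; dp4 = ""
            n = split($8, info, ";")         # column 8 is INFO
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^DP=/)  dp  = substr(info[i], 4)
                if (info[i] ~ /^DP4=/) dp4 = substr(info[i], 5)
            }
            if (dp == "" || dp4 == "" || dp + 0 == 0) next
            split(dp4, c, ",")
            printf "%s\t%s\t%.2f\n", $1, $2, (c[1] + c[2] + c[3] + c[4]) / dp
        }
    '
}

# The record from post #9, rebuilt with tab separators.
printf 'scaffold1454\t124026\t.\tC\tT\t74\t.\tDP=8;VDB=0.0000;AF1=1;AC1=2;DP4=0,0,0,8;MQ=25;FQ=-51\tGT:PL:DP:SP:GQ\t1/1:107,24,0:8:0:45\n' | hq_fraction
```

Running the same function over VCFs produced with and without `-B` would show whether the high-quality fraction shifts in the way described in this post.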
