Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • SamH
    Member
    • Sep 2010
    • 15

    samtools/bcftools missing obvious SNPs?

    Hi, I am having some trouble with what appears to be missed calls for SNPs while using the samtools/bcftools package.
    I am using the latest release samtools 0.1.18 (r982:295), and bcftools 0.1.17-dev (r973:277). My reads are SE Illumina, which have been trimmed of low-quality ends and discarded if they contain low-quality internal bases. Reads are ~55bp long, and coverage is 300-500x per sample.

    I first aligned against a reference genome using bwa and sorted using samtools:
    bwa aln -t 3 ./ST3-index/NC_015433_mod.fna ../03-Trimmed/s_3_trimmed.fastq > s_3.aln.sai
    bwa samse ./ST3-index/NC_015433_mod.fna s_3.aln.sai ../03-Trimmed/s_3_trimmed.fastq | gzip > s_3.sam.gz
    samtools view -uS s_3.sam.gz | samtools sort - s_3

    I then calculated coverage:
    samtools mpileup -BQ0 -d10000000 -f ./ST3-index/NC_015433_mod.fna s_1.bam s_2.bam s_3.bam > SSIM_1-2-3_coverage.mpileup

    As well as unfiltered variants:
    samtools mpileup -uf ./ST3-index/NC_015433_mod.fna s_1.bam s_2.bam s_3.bam | bcftools view -bvcg - > SSIM_1-2-3_variants.bcf
    bcftools view SSIM_1-2-3_variants.bcf > SSIM_1-2-3_variants.vcf

    I then started spot checking some of the more interesting variants using a bam viewer (tablet) when I noticed something odd. One of the variants which showed up in the bam file, wasn't listed at all in the vcf file. I checked this position in the mpileup file, and it appears as if it should be an obvious SNP with very deep coverage. I am hoping that someone can please help me understand why this SNP wouldn't be called by bcftools.

    Relevant mpileup line for the missing SNP (position 569984 in ref) is:
    NC_015433.1_Ss_ST3 569984 A 376 C$C$C$c$c$CCcCCCccCcccCccccCCcccccCccCCcccCCCCCccccccccCcCCCcccCCCCCCCcCCcCCccccCCCccccCccCCCccCCCcCcCCCcCCccCCCCCccccccccCCccccCccccCccccccCCccccCCcCCCCcccccCCCcCCCCCcCCCCCcccCCCcCcCCcCcCcCcCcccCCCccCCccCCcCcCCCCCcCCCCCcccCCccccccccCCCCcCCCcccCCCcccCCcCCCCccCccCCCCCCCccccCCCccCCCcccCCCCcCCCcccCCCCCCcccCCCCccCcCcCCCCCCCCCcCccCCCCCCCcccccCCCCCCCccCCcCCCccCccCCCCCccccCCcCcccc^FC^FC^Fc^FC^Fc hhhghgghgghfhhfhggehdhhhhhfhhhffgfchYhhghhfhhhghghcfgghhhhcghhhggchgfchhhffhggh_ffhhggghffcghghfafhhhhchhhhhhchgghcgghfhhffffa_chdhhghhhhhfhhfgfghhgghhhghhfghfhghhchgfghhhhhghhfhhfghhhgdffgghhghfhhchhhhfgfhgggghfgghhfhghfhfhdhhhhg^ghchhfeghhhhhhhghgggghghhhhgdghhhhhfhhhghhhchhgcgghhgghhcghehhfh]hhgfhhhhhhhhhghhgghgafehcgfhhhhghhhfhhhhhhdghhhhhhhfhhhhfhhhhhhcghgh[ghhgh_hh`hh 505 C$C$C$C$C$C$c$c$C$C$CCcccccctccccCCCCCcccCccccCCcccccCCCcCcccCCCCcccccCCCcccccCCcCccCCCcCccccCCcCcCccCCCCcccccCcCcCccccccccccccccccCCCCCCcCcccCCCccccccCCccCcCCCCCCcccCCCCCCCCcccCCCCCCccCcCccccCcccCccCCCCcccCCCCcccCCCCCcCCcccCCCCCCCcccccCCCCCCCcCccCcccccCCCCCcccCCCCCCCCcCcCcCcccCCCCCCcCcCCCCCCcCcCcCCCccccCCCCCCccCCcCCCCCccCCCCCcCCCCCCccCCCccCCCCCCCccccccCccCCcccCCccccCcccccCccCcccCcCCCcccCCCCCCCcccCcCCccccccCCcccccCccCCCCCccCccccCCCCcccccCCcCCCcCCCCCCCccccccCcCCCcccCcCcccCCCCCCCccccCCCccCccCCCCCcccCc^FC^FC^FC^FC^Fc^Fc^Fc^Fc^Fc^FC^Fc fhhhggghhgghdhhfhhhhhghhcchfhhhfhhhhhhhhhh_hghhhhhhhhhfgaffhhhheghdfdghdhhgggfhfhhhgefhdhf_ghghfh_ghghghcfhhghchhghgggfgchghghhhhfYdfghfagggehafggfdhhhgde_hghhhhhhhhfefhfhhhhhfggahhdhhhhgffhhhhfhfhhhhfhhhhghfahhgghfhhhhhhgehhhedhhfhdehghafghhhhhghhhfhhhhhdhehghghhhhhhgghhhghhhhhfhhhhhhhhhhhhghhhhhhhhhhhhhfhfghhhggghhhhfhghggehaghhgh_hhhhhhchgfhhchhfhhdfhfhhhhhhhbhhghhahhghhhhaghghhhhhhh\hhfhhhhghhhhghhfhhhghhfh_cchhhhghghhffhhhahhgcgehhfhghhhdghhhhhhhfhhehhhhghhchhfhchfhfhhhhhhhfchhhfggdhghgfhhhghhhf 447 C$C$C$C$C$C$c$CCCCCCcccCCCccCCCCCCccccCcCccccCcCCccccCcccCCCCcccCCcCCCCCcCcccCCcccCCCCCCCCCCCcCccccCccCCcccCcCCcCccccCCcccCcCcccCCCccCccCCccccCccCcCcccCCccCCcccccCCCCCcCccCCccccCCCCCcccccCCCcCcCCCCCccCcCCCcCCCccccCCCCCcCcCccCCCCCCccCCCCCCccCcccccCCCCCCCCCCccccCCcCCCCCCCccCCCCCCCCcccCcCCCCCCcCCCCCccccccCccCCCCcccccCCcCcCCCCccCCccCCCccCCccccCCCCCCccCCCCcCCCCCCCCCCcCCccCccCccCCCccccccCcccCCCCCCCCccccCCCCcccCcCcccCCCCCCCCcccccCCCCCCccCcCCCCcc^FC^FC^FC^FC^FC^FC^FC^FC^Fc^FC^Fc^Fc hfhhghhhhfhhfhhhfhghhhhhchegfhaghhhhhhhhhgghfgcfhhhhghghhhhghhgfchfhghhgghdhgfhhhhghggghhhhfhg\gfhcehhghfh^Whfdhggghfgchhhhfhahgfchgfgghgghghhgeehdhdggdfffhhfhghggfgdfghhfhghhhhehfghhhhhehghhhdhhghdgghggdchhhghhhhhhhhhhhhhggghgdhhfghfgahfhghfhhhehhhghfhfhhhhhhcghhhghgghhhhahhhffgdghhghhgghhghghhchhhhhhhhfhgcfhh^fhghhhhhhfhhghfhhfhchhgghghhhhhhffahhg^hgghhhhhfghhggdch\hhhhhcffhghhghhhhhgfhgghdghhhhehhhhhhhhhgffahghhhhfahchgghgfhhghhhfhhhchfghgd

    A nearby SNP which did show up in the vcf file and is very similar:
    NC_015433.1_Ss_ST3 542551 A 237 g$GGGgGGgGgGGGggggGGGGGGGggGGGGGggGgGgGGGGGGgggGgggGGGGGGgggGggGGGGGGGggGgGGgGgGGGGGGgggGGGGggGGGGgGGGggGGggGggGGggggGggggGggggGGGGGgggGgGGGgggGgGGGGGggGGGgggGGGGgggGgGgGggGggggGgGgggGGGGGGggGGGggGgGGGGgGGGGGgGGgGgGGGGgGGGGGgGGGGGGGggGggg hhfhhfghhhfdghhghcfgfffbhhfgchhhhghhhhfhhhahgfdhhdfhfeffgggfhghhhhghhfhfhhhhfhggf]gfhghhhhhgh_fdhhfgghaggagghhfghhhhdhghgghhhghgggchhhggggghhgddhgafgahhhgfhgfggfghhfhfhfhhchghhg\fhhhcffgcgghfdchgchggfghhfhdhhgghfhhfhghfhhffhfhghgfghhdhhh 270 G$G$G$g$g$GGGGGGgGgGGGGGGggGGGgggGGgggGGGgGgGgGGGGGGgGGgGGggGGGGGGggggGGggGGGGggGGGGGGgGgGGggggggGGgGGGGGGgggGGgGggGgGggggGggGgGGGGggGgggGGGggggGGGGGGgGGGGGGGGGgGggggGGGGGgGGGGGggGggGGGGGGgggGgGGGGgGgGGGGGGGggGggggGGGGGGGGgggGgGGGGgGGGGgggggGGGGGGGggGGGGGGGGGGGGGGGGGGGGggGGg dhhhgfgehffhghdffccghh\chhhgfchghfhahhhhgchehgdhcghhfghgeffdhhfhhhgghhhhfhhcfhdfhhfffcchghhhfghhhfhffhehdchhcgdhhhhhgghhcgfhfhhhcghhhffhhhhfgggghhghdeggggchchhhhhfggf\egfcgghghhhhffffhhhfh\dfdhfhfgffgfahg\hhhhdfcc_affhfhgfhffhhfhhhhhhhgdffhdffhhghhhghhghhfafghchfhhfgfXh 285 G$G$G$g$g$GGggGGGgggggggGgGggggGggggGGgggggggggGGGggGGgGgGGggGGGggGGGGGGggGGgggGGGGGGggGggGGGGGGGGgGggGgGGGgGGGGgGgGgGGGgggggggGggggGGGGGGgggGggGGGggGGGGGGgggggggggGggGggGGGGGGGGGGggggGGGGggGgGggGGgGGggGggggGGGGGggGGGGGGGGGGggGGGGGgggGGGGGGgggGGGGggggGGGGggggGgGgggGGGGgGGGGGGGGGgGGGGGGGGgg gchhffhgh^gfhghhhhgdecghhfdhhhhdfhhffhhhhgbfhhfehhfhc^hhfffhhcfdhchhhaahfhggghfdghfhegfhfdefffehhghghYhdgahhfhhhfffhhhhdhhfhhchcgffffhhggfhhhdgcdfgfgfhhghahghhghcahdgghhfffgdfhhhhefgfhhghchhhfgfchhghhggfgdffhhfdffffffafhhgfffchhhfafddfhfhcgcfhhhfffhfchggfgfhhhhhfhhhhhhbhhhhhhhhf\fhdfh


    And the two lines from the vcf file which should be flanking the missing SNP:

    NC_015433.1_Ss_ST3 543182 0 T G 999 0 DP=1575;VDB=0.0742;AF1=1;AC1=6;DP4=1,0,597,717;MQ=37;FQ=-260;PV4=0.45,7.8e-131,0.46,0.4 GT:PL:GQ 1/1:244,255,0:99 1/1:235,255,0:99 1/1:217,255,0:99 2 2 2
    NC_015433.1_Ss_ST3 570005 0 C T 999 0 DP=1403;VDB=0.0710;AF1=0.6667;AC1=4;DP4=239,177,498,334;MQ=37;FQ=999;PV4=0.43,0,0.24,1 GT:PL:GQ 0/0:0,255,255:99 1/1:255,255,0:99 1/1:255,255,0:99 0 2 2

    Although I have only discovered one of these cases, I am concerned that I might be missing other variants as well.

    Thanks for taking a look.

    Sam
  • colindaven
    Senior Member
    • Oct 2008
    • 417

    #2
    Try running Samtools mpileup with the BAQ option turned off (-B) and checking the same position. The SNP may have been masked.

    Comment

    • rand
      Junior Member
      • Jul 2009
      • 4

      #3
      Thanks, it helped me.

      Comment

      • SamH
        Member
        • Sep 2010
        • 15

        #4
        Yeah, adding -B made that SNP appear:

        NC_015433.1_Ss_ST3 569984 . A C 999 . DP=1328;VDB=0.0735;AF1=1;AC1=6;DP4=0,0,657,577;MQ=37;FQ=-288 GT:PL:GQ 1/1:255,255,0:99 1/1:255,255,0:99 1/1:255,255,0:99


        thanks for the tip!

        Sam

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Pathogen Surveillance with Advanced Genomic Tools
          by seqadmin




          The COVID-19 pandemic highlighted the need for proactive pathogen surveillance systems. As ongoing threats like avian influenza and newly emerging infections continue to pose risks, researchers are working to improve how quickly and accurately pathogens can be identified and tracked. In a recent SEQanswers webinar, two experts discussed how next-generation sequencing (NGS) and machine learning are shaping efforts to monitor viral variation and trace the origins of infectious...
          03-24-2025, 11:48 AM
        • seqadmin
          New Genomics Tools and Methods Shared at AGBT 2025
          by seqadmin


          This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

          The Headliner
          The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
          03-03-2025, 01:39 PM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 03-20-2025, 05:03 AM
        0 responses
        49 views
        0 reactions
        Last Post seqadmin  
        Started by seqadmin, 03-19-2025, 07:27 AM
        0 responses
        57 views
        0 reactions
        Last Post seqadmin  
        Started by seqadmin, 03-18-2025, 12:50 PM
        0 responses
        50 views
        0 reactions
        Last Post seqadmin  
        Started by seqadmin, 03-03-2025, 01:15 PM
        0 responses
        201 views
        0 reactions
        Last Post seqadmin  
        Working...