Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • samtools/bcftools missing obvious SNPs?

    Hi, I am having some trouble with what appears to be missed calls for SNPs while using the samtools/bcftools package.
    I am using the latest release samtools 0.1.18 (r982:295), and bcftools 0.1.17-dev (r973:277). My reads are SE Illumina, which have been trimmed of low-quality ends and discarded if they contain low-quality internal bases. Reads are ~55bp long, and coverage is 300-500x per sample.

    I first aligned against a reference genome using bwa and sorted using samtools:
    bwa aln -t 3 ./ST3-index/NC_015433_mod.fna ../03-Trimmed/s_3_trimmed.fastq > s_3.aln.sai
    bwa samse ./ST3-index/NC_015433_mod.fna s_3.aln.sai ../03-Trimmed/s_3_trimmed.fastq | gzip > s_3.sam.gz
    samtools view -uS s_3.sam.gz | samtools sort - s_3

    I then calculated coverage:
    samtools mpileup -BQ0 -d10000000 -f ./ST3-index/NC_015433_mod.fna s_1.bam s_2.bam s_3.bam > SSIM_1-2-3_coverage.mpileup

    As well as unfiltered variants:
    samtools mpileup -uf ./ST3-index/NC_015433_mod.fna s_1.bam s_2.bam s_3.bam | bcftools view -bvcg - > SSIM_1-2-3_variants.bcf
    bcftools view SSIM_1-2-3_variants.bcf > SSIM_1-2-3_variants.vcf

    I then started spot checking some of the more interesting variants using a bam viewer (tablet) when I noticed something odd. One of the variants which showed up in the bam file, wasn't listed at all in the vcf file. I checked this position in the mpileup file, and it appears as if it should be an obvious SNP with very deep coverage. I am hoping that someone can please help me understand why this SNP wouldn't be called by bcftools.

    Relevant mpileup line for the missing SNP (position 569984 in ref) is:
    NC_015433.1_Ss_ST3 569984 A 376 C$C$C$c$c$CCcCCCccCcccCccccCCcccccCccCCcccCCCCCccccccccCcCCCcccCCCCCCCcCCcCCccccCCCccccCccCCCccCCCcCcCCCcCCccCCCCCccccccccCCccccCccccCccccccCCccccCCcCCCCcccccCCCcCCCCCcCCCCCcccCCCcCcCCcCcCcCcCcccCCCccCCccCCcCcCCCCCcCCCCCcccCCccccccccCCCCcCCCcccCCCcccCCcCCCCccCccCCCCCCCccccCCCccCCCcccCCCCcCCCcccCCCCCCcccCCCCccCcCcCCCCCCCCCcCccCCCCCCCcccccCCCCCCCccCCcCCCccCccCCCCCccccCCcCcccc^FC^FC^Fc^FC^Fc hhhghgghgghfhhfhggehdhhhhhfhhhffgfchYhhghhfhhhghghcfgghhhhcghhhggchgfchhhffhggh_ffhhggghffcghghfafhhhhchhhhhhchgghcgghfhhffffa_chdhhghhhhhfhhfgfghhgghhhghhfghfhghhchgfghhhhhghhfhhfghhhgdffgghhghfhhchhhhfgfhgggghfgghhfhghfhfhdhhhhg^ghchhfeghhhhhhhghgggghghhhhgdghhhhhfhhhghhhchhgcgghhgghhcghehhfh]hhgfhhhhhhhhhghhgghgafehcgfhhhhghhhfhhhhhhdghhhhhhhfhhhhfhhhhhhcghgh[ghhgh_hh`hh 505 C$C$C$C$C$C$c$c$C$C$CCcccccctccccCCCCCcccCccccCCcccccCCCcCcccCCCCcccccCCCcccccCCcCccCCCcCccccCCcCcCccCCCCcccccCcCcCccccccccccccccccCCCCCCcCcccCCCccccccCCccCcCCCCCCcccCCCCCCCCcccCCCCCCccCcCccccCcccCccCCCCcccCCCCcccCCCCCcCCcccCCCCCCCcccccCCCCCCCcCccCcccccCCCCCcccCCCCCCCCcCcCcCcccCCCCCCcCcCCCCCCcCcCcCCCccccCCCCCCccCCcCCCCCccCCCCCcCCCCCCccCCCccCCCCCCCccccccCccCCcccCCccccCcccccCccCcccCcCCCcccCCCCCCCcccCcCCccccccCCcccccCccCCCCCccCccccCCCCcccccCCcCCCcCCCCCCCccccccCcCCCcccCcCcccCCCCCCCccccCCCccCccCCCCCcccCc^FC^FC^FC^FC^Fc^Fc^Fc^Fc^Fc^FC^Fc fhhhggghhgghdhhfhhhhhghhcchfhhhfhhhhhhhhhh_hghhhhhhhhhfgaffhhhheghdfdghdhhgggfhfhhhgefhdhf_ghghfh_ghghghcfhhghchhghgggfgchghghhhhfYdfghfagggehafggfdhhhgde_hghhhhhhhhfefhfhhhhhfggahhdhhhhgffhhhhfhfhhhhfhhhhghfahhgghfhhhhhhgehhhedhhfhdehghafghhhhhghhhfhhhhhdhehghghhhhhhgghhhghhhhhfhhhhhhhhhhhhghhhhhhhhhhhhhfhfghhhggghhhhfhghggehaghhgh_hhhhhhchgfhhchhfhhdfhfhhhhhhhbhhghhahhghhhhaghghhhhhhh\hhfhhhhghhhhghhfhhhghhfh_cchhhhghghhffhhhahhgcgehhfhghhhdghhhhhhhfhhehhhhghhchhfhchfhfhhhhhhhfchhhfggdhghgfhhhghhhf 447 C$C$C$C$C$C$c$CCCCCCcccCCCccCCCCCCccccCcCccccCcCCccccCcccCCCCcccCCcCCCCCcCcccCCcccCCCCCCCCCCCcCccccCccCCcccCcCCcCccccCCcccCcCcccCCCccCccCCccccCccCcCcccCCccCCcccccCCCCCcCccCCccccCCCCCcccccCCCcCcCCCCCccCcCCCcCCCccccCCCCCcCcCccCCCCCCccCCCCCCccCcccccCCCCCCCCCCccccCCcCCCCCCCccCCCCCCCCcccCcCCCCCCcCCCCCccccccCccCCCCcccccCCcCcCCCCccCCccCCCccCCccccCCCCCCccCCCCcCCCCCCCCCCcCCccCccCccCCCccccccCcccCCCCCCCCccccCCCCcccCcCcccCCCCCCCCcccccCCCCCCccCcCCCCcc^FC^FC^FC^FC^FC^FC^FC^FC^Fc^FC^Fc^Fc hfhhghhhhfhhfhhhfhghhhhhchegfhaghhhhhhhhhgghfgcfhhhhghghhhhghhgfchfhghhgghdhgfhhhhghggghhhhfhg\gfhcehhghfh^Whfdhggghfgchhhhfhahgfchgfgghgghghhgeehdhdggdfffhhfhghggfgdfghhfhghhhhehfghhhhhehghhhdhhghdgghggdchhhghhhhhhhhhhhhhggghgdhhfghfgahfhghfhhhehhhghfhfhhhhhhcghhhghgghhhhahhhffgdghhghhgghhghghhchhhhhhhhfhgcfhh^fhghhhhhhfhhghfhhfhchhgghghhhhhhffahhg^hgghhhhhfghhggdch\hhhhhcffhghhghhhhhgfhgghdghhhhehhhhhhhhhgffahghhhhfahchgghgfhhghhhfhhhchfghgd

    A nearby SNP which did show up in the vcf file and is very similar:
    NC_015433.1_Ss_ST3 542551 A 237 g$GGGgGGgGgGGGggggGGGGGGGggGGGGGggGgGgGGGGGGgggGgggGGGGGGgggGggGGGGGGGggGgGGgGgGGGGGGgggGGGGggGGGGgGGGggGGggGggGGggggGggggGggggGGGGGgggGgGGGgggGgGGGGGggGGGgggGGGGgggGgGgGggGggggGgGgggGGGGGGggGGGggGgGGGGgGGGGGgGGgGgGGGGgGGGGGgGGGGGGGggGggg hhfhhfghhhfdghhghcfgfffbhhfgchhhhghhhhfhhhahgfdhhdfhfeffgggfhghhhhghhfhfhhhhfhggf]gfhghhhhhgh_fdhhfgghaggagghhfghhhhdhghgghhhghgggchhhggggghhgddhgafgahhhgfhgfggfghhfhfhfhhchghhg\fhhhcffgcgghfdchgchggfghhfhdhhgghfhhfhghfhhffhfhghgfghhdhhh 270 G$G$G$g$g$GGGGGGgGgGGGGGGggGGGgggGGgggGGGgGgGgGGGGGGgGGgGGggGGGGGGggggGGggGGGGggGGGGGGgGgGGggggggGGgGGGGGGgggGGgGggGgGggggGggGgGGGGggGgggGGGggggGGGGGGgGGGGGGGGGgGggggGGGGGgGGGGGggGggGGGGGGgggGgGGGGgGgGGGGGGGggGggggGGGGGGGGgggGgGGGGgGGGGgggggGGGGGGGggGGGGGGGGGGGGGGGGGGGGggGGg dhhhgfgehffhghdffccghh\chhhgfchghfhahhhhgchehgdhcghhfghgeffdhhfhhhgghhhhfhhcfhdfhhfffcchghhhfghhhfhffhehdchhcgdhhhhhgghhcgfhfhhhcghhhffhhhhfgggghhghdeggggchchhhhhfggf\egfcgghghhhhffffhhhfh\dfdhfhfgffgfahg\hhhhdfcc_affhfhgfhffhhfhhhhhhhgdffhdffhhghhhghhghhfafghchfhhfgfXh 285 G$G$G$g$g$GGggGGGgggggggGgGggggGggggGGgggggggggGGGggGGgGgGGggGGGggGGGGGGggGGgggGGGGGGggGggGGGGGGGGgGggGgGGGgGGGGgGgGgGGGgggggggGggggGGGGGGgggGggGGGggGGGGGGgggggggggGggGggGGGGGGGGGGggggGGGGggGgGggGGgGGggGggggGGGGGggGGGGGGGGGGggGGGGGgggGGGGGGgggGGGGggggGGGGggggGgGgggGGGGgGGGGGGGGGgGGGGGGGGgg gchhffhgh^gfhghhhhgdecghhfdhhhhdfhhffhhhhgbfhhfehhfhc^hhfffhhcfdhchhhaahfhggghfdghfhegfhfdefffehhghghYhdgahhfhhhfffhhhhdhhfhhchcgffffhhggfhhhdgcdfgfgfhhghahghhghcahdgghhfffgdfhhhhefgfhhghchhhfgfchhghhggfgdffhhfdffffffafhhgfffchhhfafddfhfhcgcfhhhfffhfchggfgfhhhhhfhhhhhhbhhhhhhhhf\fhdfh


    And the two lines from the vcf file which should be flanking the missing SNP:

    NC_015433.1_Ss_ST3 543182 0 T G 999 0 DP=1575;VDB=0.0742;AF1=1;AC1=6;DP4=1,0,597,717;MQ=37;FQ=-260;PV4=0.45,7.8e-131,0.46,0.4 GT:PL:GQ 1/1:244,255,0:99 1/1:235,255,0:99 1/1:217,255,0:99 2 2 2
    NC_015433.1_Ss_ST3 570005 0 C T 999 0 DP=1403;VDB=0.0710;AF1=0.6667;AC1=4;DP4=239,177,498,334;MQ=37;FQ=999;PV4=0.43,0,0.24,1 GT:PL:GQ 0/0:0,255,255:99 1/1:255,255,0:99 1/1:255,255,0:99 0 2 2

    Although I have only discovered one of these cases, I am concerned that I might be missing other variants as well.

    Thanks for taking a look.

    Sam

  • #2
    Try running Samtools mpileup with the BAQ option turned off (-B) and checking the same position. The SNP may have been masked.

    Comment


    • #3
      Thanks, it helped me.

      Comment


      • #4
        Yeah, adding -B made that SNP appear:

        NC_015433.1_Ss_ST3 569984 . A C 999 . DP=1328;VDB=0.0735;AF1=1;AC1=6;DP4=0,0,657,577;MQ=37;FQ=-288 GT:PL:GQ 1/1:255,255,0:99 1/1:255,255,0:99 1/1:255,255,0:99


        thanks for the tip!

        Sam

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Advancing Precision Medicine for Rare Diseases in Children
          by seqadmin




          Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
          12-16-2024, 07:57 AM
        • seqadmin
          Recent Advances in Sequencing Technologies
          by seqadmin



          Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

          Long-Read Sequencing
          Long-read sequencing has seen remarkable advancements,...
          12-02-2024, 01:49 PM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 12-17-2024, 10:28 AM
        0 responses
        33 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 12-13-2024, 08:24 AM
        0 responses
        49 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 12-12-2024, 07:41 AM
        0 responses
        34 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 12-11-2024, 07:45 AM
        0 responses
        46 views
        0 likes
        Last Post seqadmin  
        Working...
        X