Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • indel calling problems with mpileup/bcftools

    Hello,

    I'm using samtools and bcftools to call snps from a targeted resequencing project. Nucleotide polymorphisms seem to be fine, but indels are proving to be a problem. I can see the indels in, say, samtools tview, with high coverage. However, my samtools/bcftools output find only a few reads at the relevant position.

    My commands (following local realignment) are:

    samtools mpileup -uf concatenated.fasta -L 1000000 -d 1000000 Ec_A10.realigned.bam >Ec_A10.mpileup
    bcftools view -bvc Ec_A10.mpileup >Ec_A10.var.raw.bcf
    bcftools view Ec_A10.var.raw.bcf >Ec_A10_name.vars.tsv

    The relevant line from the output is:

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Ec_A10.test
    concatenated 2000 . CATCCTCGATGT C 185 . INDEL;DP=9;VDB=0.0074;AF1=0.5;AC1=1;DP4=2,2,1,4;MQ=60;FQ=159;PV4=0.52,0.00017,1,1 PL 223,0,194

    So, what is strange is that the reported depth is 9 reads, but there are clearly many more - I've also attached a screen shot from tview. Any suggestions on how to clear this up?

    Thanks!
    Attached Files

  • #2
    The results are wrong sometimes

    More importantly , it even shows up wrong SNPs which otherwise is not a SNP as per the bam file loaded on IGV.


    0 4009334 . A C 125 . DP=5;VDB=3.277706e-02;RPB=-1.291774e+00;AF1=0.5;AC1=1;DP4=1,1,2,1;MQ=59;FQ=74;PV4=1,0,1,0.38 GT:PL:GQ 0/1:155,0,101:99
    0 4009336 . T A 17.1 . DP=5;VDB=6.080000e-02;RPB=1.291774e+00;AF1=0.5;AC1=1;DP4=2,1,1,1;MQ=59;FQ=20.1;PV4=1,1,0.14,1 GT:PL:GQ 0/1:47,0,154:50

    As above 4009334 is a SNP but not as per IGV picture below:
    Attached Files
    Last edited by abi; 05-21-2013, 10:59 AM.

    Comment


    • #3
      Originally posted by alexwong.carleton View Post
      Hello,

      I'm using samtools and bcftools to call snps from a targeted resequencing project. Nucleotide polymorphisms seem to be fine, but indels are proving to be a problem. I can see the indels in, say, samtools tview, with high coverage. However, my samtools/bcftools output find only a few reads at the relevant position.

      My commands (following local realignment) are:

      samtools mpileup -uf concatenated.fasta -L 1000000 -d 1000000 Ec_A10.realigned.bam >Ec_A10.mpileup
      bcftools view -bvc Ec_A10.mpileup >Ec_A10.var.raw.bcf
      bcftools view Ec_A10.var.raw.bcf >Ec_A10_name.vars.tsv

      The relevant line from the output is:

      #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Ec_A10.test
      concatenated 2000 . CATCCTCGATGT C 185 . INDEL;DP=9;VDB=0.0074;AF1=0.5;AC1=1;DP4=2,2,1,4;MQ=60;FQ=159;PV4=0.52,0.00017,1,1 PL 223,0,194

      So, what is strange is that the reported depth is 9 reads, but there are clearly many more - I've also attached a screen shot from tview. Any suggestions on how to clear this up?

      Thanks!
      I only count 5 reads covering base 2000 in your picture.

      Why don't you try counting the fastq directly by grepping a little bit of sequence from each allele, or by adding the putative indel allele to your reference, and realigning?

      Comment


      • #4
        swbarnes2 - Sorry, I should have made clear - the image captures only a small proportion of the reads covering the region. The mutation I'm looking for is an 11-bp deletion.

        I'll try out both of your suggestions - thanks! I should think that one or both should work, although ideally it would be nice to not have to have solutions tailored for specific mutations.

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Essential Discoveries and Tools in Epitranscriptomics
          by seqadmin


          The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist on Modified Bases...
          Today, 07:01 AM
        • seqadmin
          Current Approaches to Protein Sequencing
          by seqadmin


          Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
          04-04-2024, 04:25 PM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 04-11-2024, 12:08 PM
        0 responses
        37 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-10-2024, 10:19 PM
        0 responses
        41 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-10-2024, 09:21 AM
        0 responses
        35 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-04-2024, 09:00 AM
        0 responses
        54 views
        0 likes
        Last Post seqadmin  
        Working...
        X