Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Odd behavior using samtools mpileup, bcftools

    I've tried this a few times and I'm getting very occassional / odd behavior in the output from mpileup / bcftools / vcfutils varFilter. The output reports an indel, but upon examination of the reads (with IGV), a single read contains the indel, while all the others do not, yet the vcf output file for my data reports:

    Chr1-gr2-PU5 662026 . ATGGAATGGTGGATTGG ATGGAATGGTGGATTGGAATGGTGGATTGG 99 . INDEL;DP=398;AF1=1;CI95=1,1;DP4=0,0,173,125;MQ=60 PL:GT:GQ 255,255,0:1/1:99

    One problem is that this output appears to state that there are 173, and 125 reads supporting the indel, when in fact there is 1. Other than manual inspection, I can't tell this apart from other true indels.

    I'm curious if this is due to a parameter in the command line? or if this is some kind of bug? (probably the former).

    The versions and command line parameters are as follows:
    This is a bacterial genome resequencing project with read depth > 300.
    Paired-end, 100 nt reads were aligned with default BWA parameters to a reference genome (BWA version 0.5.9-r16. (The genome contains some IUPAC codes, but none in the region of the problem I'm reporting).

    The resulting (sorted, indexed) bam file was used for input into samtools mpileup (0.1.12a)

    >samtools mpileup -uf ref.fasta aln.sorted.bam |bcftools view -bvcg - > aln.raw.bcf

    >bcftools view aln.raw.bcf | vcfutils.pl varFilter > aln.flt.vcf


    Thanks for any help !

    Jim

  • #2
    in the varfilter there is an option of specifying the minimum depth of reads that are aligned to be called as a SNP/InDel . I guess its the -D parameter.

    Comment


    • #3
      Between mpileup, and eyeballing IGV, I'd be more inclined to think that IGV is for some reason refusing to display the relevant reads, than that samtools is just making up a bunch of reads that really aren't there.

      Look at the .sam entries for that region. That's the sure way.
      Last edited by swbarnes2; 05-15-2013, 03:51 PM.

      Comment


      • #4
        I havea a similar problems with calling InDels

        I tried to see if I can get InDels by using some contigs from E.Coli that was generated by SGA assembler. In a way the contigs would serve as a single end NGS data with the sequences being much longer.

        I first aligned the contigs to reference e.coli using bwa-sw to obtain SAM/BAM file. The BAM file was then sorted and indexed.

        Then, here is what I did for pileup for processing for a small region of interest:

        samtools mpileup -r 0:4,000,679-5,000,894 -uf e.coli.fa contigalignsorted.bam > contigalignsorted.region.mpileup

        Then to obtain SNPs and InDels from mpileup file, here is what I did:

        bcftools view -bvcg contigalignsorted.region.mpileup > contigalignsorted.region.bcf

        bcftools view contigalignsorted.region.bcf > contigalignsorted.region.vcf


        However, when i see my alignment of BAM file on IGV for this region, I notice that the final VCF file obtained through commands above, has a lot of SNPs and InDels missing!


        What went wrong in the calls ?

        Comment


        • #5
          More importantly , it even shows up wrong SNPs which otherwise is not a SNP as per the bam file loaded on IGV.


          0 4009334 . A C 125 . DP=5;VDB=3.277706e-02;RPB=-1.291774e+00;AF1=0.5;AC1=1;DP4=1,1,2,1;MQ=59;FQ=74;PV4=1,0,1,0.38 GT:PL:GQ 0/1:155,0,101:99
          0 4009336 . T A 17.1 . DP=5;VDB=6.080000e-02;RPB=1.291774e+00;AF1=0.5;AC1=1;DP4=2,1,1,1;MQ=59;FQ=20.1;PV4=1,1,0.14,1 GT:PL:GQ 0/1:47,0,154:50

          As above 4009334 is a SNP but not as per IGV picture below:
          Attached Files

          Comment


          • #6
            IGV is not designed to visualize exactly what samtools mpileup is doing. They are different pieces of software, and they filter things differently.

            Likely, IGV is for some reason filtering away the relevant reads, or something else is different between the two pieces of software.

            Your vcf says there is a mixed C/A SNP, and your picture shows a mixed C/A SNP. Maybe there is a discrepancy in the references between the reference you aligned to for the vcf, and the reference you are looking at in IGV.

            Examine the raw read themselves, in the .bam. That's the way to be sure.
            Last edited by swbarnes2; 05-21-2013, 11:22 AM.

            Comment


            • #7
              @swbarnes2

              I did the indexing of my genome file which was used for alignment. I used the same genome to be uploaded to IGV and not ones already provided by IGV drop-down options. You can clearly see in the picture that IGV does not point out a SNP at location 4009334 while SAMtools/mpileup/bcftools extract it out! One got to be wrong, possibly the latter.

              Comment


              • #8
                Thanks @swbarnes2 for pointing out the difference between IGV and SAMtools. I am now using TVIEW and am able to see the SNP from my BAM file.

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Strategies for Sequencing Challenging Samples
                  by seqadmin


                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                  03-22-2024, 06:39 AM
                • seqadmin
                  Techniques and Challenges in Conservation Genomics
                  by seqadmin



                  The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                  Avian Conservation
                  Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                  03-08-2024, 10:41 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, Yesterday, 06:37 PM
                0 responses
                10 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, Yesterday, 06:07 PM
                0 responses
                9 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 03-22-2024, 10:03 AM
                0 responses
                51 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 03-21-2024, 07:32 AM
                0 responses
                67 views
                0 likes
                Last Post seqadmin  
                Working...
                X