Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Unstable output from mpileup/samtools

    I have observed seemingly unstable outputs from the mpileup using default settings (identical to those suggested in the bowtie and samtools tutorials) and seek suggestions on how to resolve the issue. A few lines of the problematic output below. Note that allele frequencies (AF1, AFE) are nonzero, but readcounts (DP4; and other columns) are missing information. am running samtools from a 32Gb RAM system with 12x2.67Gb cores using CentOS5.5, and I am reference mapping a few million reads against a genome of approx 400Mb with >2k scaffolds for a few samples. Has anyone observed this, too, and if so, what solutions would there be to generate complete, healthy output?
    thanx a lot in advance!!

    scaffold_8 18567961 . C T 21.8 . AF1=1.000;AFE=0.689;DP4=0,0,0,0;MQ=0
    scaffold_8 18579230 . C A 36.2 . AF1=0.500;AFE=0.499;DP4=0,0,0,0;MQ=0
    scaffold_9 246255 . A T 4.77 . AF1=0.997;AFE=0.417;DP4=0,0,0,0;MQ=0
    scaffold_9 246263 . A G 4.77 . AF1=0.997;AFE=0.417;DP4=0,0,0,0;MQ=0
    scaffold_9 338143 . A G 99 . AF1=1.000;AFE=1.000;DP4=0,0,0,0;MQ=0
    scaffold_9 338163 . A G 43.8 . AF1=1.000;AFE=0.694;DP4=0,0,0,0;MQ=0
    scaffold_9 744547 . C G 99 . AF1=1.000;AFE=0.997;DP4=0,0,0,0;MQ=0
    scaffold_9 753409 . C G 99 . AF1=1.000;AFE=1.000;DP4=0,0,0,0;MQ=0
    scaffold_9 793195 . A G 61 . AF1=1.000;AFE=0.833;DP4=0,0,0,0;MQ=0

  • #2
    Can you paste in the command you used to generate this file. There is a good tutorial at http://samtools.sourceforge.net/mpileup.shtml and you could do something like

    samtools mpileup -gf ref.fa aln1.bam aln2.bam | bcftools view -vcG - > snps.vcf

    You can validate your VCF file with vcftools' vcf-validator

    Comment


    • #3
      thanks for replying! The mpileup commands were as follows (and this pretty much maches what you suggested, and exactly what is said on the tutorial page):

      samtools mpileup -gf ref.fa aln1.bam aln2.bam > data.bcf
      bcftools view -vcG data.bcf > SNP-candidates.vcf

      Comment


      • #4
        I have been having some problems with VCF produced by samtools mpileup and I get errors when I run the validator. It's probably something that needs to be brought to attention of samtools developers or maybe we are not doing something right

        As a suggestion you could do what I did and perhaps switch to GATK to produce candidate variants

        Code:
        java -Xmx4g -jar /gatk_path/GenomeAnalysisTK.jar \
                -T UnifiedGenotyper \
                -l INFO \
                -R genome.fa \
                -I file.bam -I file2.bam \
                -pl solexa \
                -o snps.vcf
                -stand_call_conf 30 \
                -stand_emit_conf 30.0 \
                -mmq 20

        Comment


        • #5
          Thanks, didn't know about GATK but now I will definitely give it a try. I am mostly interested in the actual number of reads per allele (which I could parse out from the sam files manually, but then, why reinvent the wheel when others suffered already). Maybe lh3 will come across this page .

          Comment


          • #6
            This is true. You can add more annotation to your VCF file and what you're probably needing is VariantAnnotator once you've run the SNP caller.

            DISCLAIMER: I am in no way affiliated with the GATK team or Broad Institute, just trying to help out


            Originally posted by KNS View Post
            Thanks, didn't know about GATK but now I will definitely give it a try. I am mostly interested in the actual number of reads per allele (which I could parse out from the sam files manually, but then, why reinvent the wheel when others suffered already). Maybe lh3 will come across this page .

            Comment


            • #7
              @KNS

              Try the latest version. Yours is an older version. In addition, do you have the alignment generated by:

              samtools view -b aln1.bam scaffold_9:246255
              samtools view -b aln2.bam scaffold_9:246255

              It is possible that DP4 is miscomputed, although I was not aware of this before.

              @zee

              If you are not using "samtools calmd -Abr" for other callers, I am confident that samtools is better than all the rest. Even if you use "calmd -Abr" with UG, samtools will be better than GATK's UG given many low-coverage samples. GATK's UGv2 is coming out, which will be equivalent to samtools but with better filtering.

              The method implemented in "calmd -r" is a major improvement to SNP calling. The last time I talked about "major improvement" to my own works was when I released bwa.
              Last edited by lh3; 11-23-2010, 11:13 AM.

              Comment


              • #8
                This is very helpful information lh3. I will have a look at comparing the two SNP callers.

                Comment


                • #9
                  Thanks for these suggestions, lh3!
                  I will get samtools 0.1.11(2010-11-22). I generated the alignment strictly following the code on the bowtie tutorials:
                  Index building
                  bowtie-build /path/to/genome/fasta/file /path/to/index/file
                  mapping:
                  bowtie -S /path/to/index/file //path/to/individual/fastq/file /path/to/individual/sam/file (this step shows me that there are actually reads which map at SNP candidate loci where mpileup reports DP4=0)
                  conversion to bam:
                  samtools view -bS -o /path/to/individual/bam/file /path/to/individual/sam/file
                  sorting of the bams:
                  samtools sort /path/to/individual/bam/file /path/to/individual/bam/file_sorted
                  followed by the two already mentioned mpileup steps for a maximum of less than 20 individual files to be combined.
                  samtools mpileup -gf ref.fa aln1.bam aln2.bam > data.bcf
                  bcftools view -vcG data.bcf > SNP-candidates.vcf
                  Again, lets see what happens with the new version of samtools. Thanks again!

                  Comment


                  • #10
                    Unfortunately, the DP4-issue persists with samtools 0.1.11(and the exact same commands as in the tutorial):

                    scaffold_19 15319408 . A G 97.3 . DP=29;AF1=0.7493;CI95=0.75,0.75;DP4=0,0,0,0;MQ=0 PL:GT:GQ 54,0,255:0/1:62 81,24,0:1/1:26
                    scaffold_19 15319440 . C T 22 . DP=29;AF1=0.2507;CI95=0.25,0.25;DP4=0,0,0,0;MQ=0 PL:GT:GQ 55,0,248:0/1:53 0,24,255:0/0:26
                    scaffold_19 15331181 . A G 81 . DP=44;AF1=1;CI95=1,1;DP4=0,0,0,0;MQ=0 PL:GT:GQ 56,48,0:1/1:99 61,84,0:1/1:99
                    I noticed that these loci are not in the samfiles produced along the way. Instead, the locus is reported in the samfile with its first base like this : "scaffold_19 15319389". Can this play a role in getting correct read-counts? Is there another way of getting me correct readcounts for each allele using the mpileup?

                    For the sake of completeness I have also rerun the analysis with the "samtools view -b" commands instead of the -bS I have used before - this did not change the outcome for me. So, alternative suggestions anyone?
                    Last edited by KNS; 11-24-2010, 03:21 AM.

                    Comment


                    • #11
                      Running samtools mpileup as suggested on mpileup helpfile, however, produces seemingly intact output, which breaks off prematurely. Here, only sequences up to scaffold 8 (of 2518) are processed.

                      scaffold_8 15860509 . A G 99 . DP=341;AF1=1;CI95=1,1;DP4=0,0,341,0;MQ=60 PL:GT:GQ 118,255,0:1/1:99 84,255,0:1/1:99
                      scaffold_8 15861163 . T C 99 . DP=348;AF1=1;CI95=1,1;DP4=0,0,0,345;MQ=60 PL:GT:GQ 109,255,0:1/1:99 93,255,0:1/1:99

                      samtools mpileup -P ILLUMINA -ugf ref.fa *.bam | bcftools view -bcvg - > var.raw.bcf
                      bcftools view var.raw.bcf | vcfutils.pl varFilter -D 2000 > var.flt.v

                      Comment


                      • #12
                        No further suggestions?

                        Comment


                        • #13
                          Please try the latest SVN:

                          svn co https://samtools.svn.sourceforge.net...trunk/samtools

                          I believe the bug has fixed. Thank you for raising this issue.

                          EDIT: 0.1.12 is released for this bug fix.
                          Last edited by lh3; 12-02-2010, 08:53 AM.

                          Comment


                          • #14
                            mpileup multiple samples - individual DP4/AF1 values

                            Hi,

                            I am calling SNP from multiple individuals using mpileup. I was wondering whether it is possible to get DP4 and AF1 values for each individual? I currently have DP4 and AF1 values for the combined dataset, but would like them for each individual as well.

                            I used mpileup -uf
                            and then bcvg -D100

                            How can I count the actual number of reads per allele?????

                            Please help me!!!

                            Comment


                            • #15
                              I would not call SNPs on a combined dataset, but redo the analysis for each individual.

                              That would make more sense particularly for visualisation.

                              You can speed this up to do SNP calling for all individuals with the following code
                              Code:
                              for i in `ls *.bam`
                              
                              	do
                              	samtools (options) $i
                              
                              done
                              Last edited by colindaven; 10-25-2011, 01:39 AM. Reason: code tags

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 03-27-2024, 06:37 PM
                              0 responses
                              12 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-27-2024, 06:07 PM
                              0 responses
                              11 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              53 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              68 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X