Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • MPileup not considering all reads when making calls?

    Hey,

    I've been testing mPileup with parameters -B, -E, or neither, and I noticed something weird.

    Neither:
    chr1 165656243 G T 3.01 DP=12;VDB=0.0287;AF1=0.4997;AC1=1;DP4=6,2,3,0;MQ=60;FQ=4.77;PV4=1,1.7e-10,1,1

    With -B:
    chr1 165656243 G T 45 DP=12;VDB=0.0287;AF1=0.5;AC1=1;DP4=6,2,3,1;MQ=60;FQ=48;PV4=1,1,1,0.46

    With -E:
    chr1 165656243 G T 18.1 DP=12;VDB=0.0287;AF1=0.5;AC1=1;DP4=6,2,3,0;MQ=60;FQ=21;PV4=1,1,1,1

    It gets the same call for all 3 with different quality scores. However, the weird part is that when looking at this in IGV, or with GATK DepthOfCoverage, there are 17 reads that covers this spot, not just 12. 12 reads are G and 5 reads are T (by IGV). All of the reads have a mapping quality of 150 except one has a mapping quality of 82. All of the bases have a Phred quality score of 29 or greater. My commands (after aligning/sorting/de-duping/sorting/realigning around indels/sorting) were (3 separate for -E, -B, or neither):

    samtools-0.1.18/samtools mpileup -E -ugf hg18_reference_seq.fa aligned_file.bam | /samtools-0.1.18/bcftools/bcftools view -bvcg -> Intermediate_File

    samtools-0.1.18/bcftools/bcftools view Intermediate_File | samtools-0.1.18/bcftools/vcfutils.pl varFilter -D99999 > called_variants

    Anybody have any ideas?
    Last edited by Heisman; 09-23-2011, 11:26 AM.

  • #2
    Ok, I think I figured it out. When looking at IGV and putting the cursor over a read, some of my reads have values for AM, SM, and PQ at the bottom, while some of them do not. Mpileup is only looking at the reads that have values for AM, SM, and PQ.

    I'm new to all of this bioinformatics so I'm not sure yet why some reads have those, but glancing through I think paired-end reads with bigger inserts are less likely to have those values. I also need to figure out how to get mpileup to consider those reads, so if anybody has input that would be great, although I'll keep looking.

    Comment


    • #3
      Alright, it works when including the -A parameter with the mpileup command.

      Does anybody have a good definition for what anomalous read is? I would have guessed it was when one end of a pair maps and the other doesn't, but it looks like it also occurs if the insert size is greater than say 450 bp or so.

      Comment


      • #4
        Originally posted by Heisman View Post
        Alright, it works when including the -A parameter with the mpileup command.

        Does anybody have a good definition for what anomalous read is? I would have guessed it was when one end of a pair maps and the other doesn't, but it looks like it also occurs if the insert size is greater than say 450 bp or so.
        I always think so. And the insert size is anomalous or not should depend on the mapping option you set.

        Such as: bwa sampe -a

        This is my personal guess.

        Comment


        • #5
          Originally posted by Heisman View Post
          Alright, it works when including the -A parameter with the mpileup command.

          Does anybody have a good definition for what anomalous read is? I would have guessed it was when one end of a pair maps and the other doesn't, but it looks like it also occurs if the insert size is greater than say 450 bp or so.
          You just saved my day.
          I couldn't understand the difference between the aggressive GATK call and the mpileup count until i read this post!!!

          Comment


          • #6
            I'm also using mpileup to summaryze the allele counting of each position of a bam file, but what i am getting is not what I would expect. On detail, I want to ignore those nucleotides with low quality, thus I use the '-Q' argument. However, it does not seem to work since all the nucleotides are outputted in the pileup summary regardless of its quality.

            For instance, the following command (note that I use -A to include all reads and -B to avoid BAQ 'recalibration' for more clarity in the output) :

            Code:
            /samtools1.18/samtools mpileup -f ref.fa -A -q 0 -B -Q 100 sample.bam > sample.pileup
            Obtains output lines as the following:

            Code:
            chr1	173824	T	21	GG...*..*.G.......GG.        ##9!!!!;!<#;<<=<<<!#<
            Note that there are bases with low quality (which are the same than the ones indicated in the bam file), so the -Q does not seem to work. It is strange, because the '-q' argument indeed works, and it filters out those reads with a mapping quality < q.

            Maybe any of you know what I am missing? I'm pretty jammed in it...

            thanks!
            david

            Comment


            • #7
              Interesting. Maybe try a lower number for "Q"? 100 isn't realistic, I don't think, as quality scores don't tend to go above 42 or so. Maybe try 15? I'll play around with this too when I have time.

              Comment


              • #8
                Yes, I put '100' just for being dramatic, but my initial attempts were by using several values between 1 and 30. Never works, mpileup output is always the same regardless of the base quality filtering.

                (thanks!)

                Comment


                • #9
                  Originally posted by david.tamborero View Post
                  Yes, I put '100' just for being dramatic, but my initial attempts were by using several values between 1 and 30. Never works, mpileup output is always the same regardless of the base quality filtering.

                  (thanks!)
                  Ok, I just tested it and it works for me. My commands:

                  Code:
                  samtools-0.1.18/samtools mpileup -q 5 -Q 0 -l [test_intervals] -AB -ugf [reference_sequence.fa] [aligned_file.bam] | samtools-0.1.18/bcftools/bcftools view -bvcg -> Q_0_intermediateFile
                  
                  samtools-0.1.18/bcftools/bcftools view -e Q_0_intermediateFile | samtools-0.1.18/bcftools/vcfutils.pl varFilter -D 99999 > Q_0_variants

                  Comment


                  • #10
                    Thank you for your answer, Heisman.

                    If I'm not wrong, the mpileup command you're using is for getting the bcf file. In my case, I want the pileup file (to use as input for the Varscan tool), so the mpileup command has no -u argument, and I don't need the remaining commands to get the vcf. Doing so, do you have success with the -Q argument?

                    I mean, with the following command:

                    Code:
                    /samtools1.18/samtools mpileup -f ref.fa -A -q 0 -B -Q 100 sample.bam > sample.pileup
                    I have the same output regardless of the Q value. Could yo tell me if it works for you? Maybe is some problem with my bam file, but everything seems OK and actually the -q argument (filters out based on mapping quality) is working fine.

                    I'm becoming a bit crazy!

                    thanks again for your time,

                    cheers
                    david

                    Comment


                    • #11
                      You're right, I am seeing the same thing. I would make a post at the following link and hope somebody can fix the bug: http://sourceforge.net/mailarchive/f...=samtools-help

                      Comment


                      • #12
                        I'll try, I will post if I find the solution somehow.

                        Comment

                        Latest Articles

                        Collapse

                        • seqadmin
                          Strategies for Sequencing Challenging Samples
                          by seqadmin


                          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                          03-22-2024, 06:39 AM
                        • seqadmin
                          Techniques and Challenges in Conservation Genomics
                          by seqadmin



                          The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                          Avian Conservation
                          Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                          03-08-2024, 10:41 AM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by seqadmin, Yesterday, 06:37 PM
                        0 responses
                        11 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, Yesterday, 06:07 PM
                        0 responses
                        10 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 03-22-2024, 10:03 AM
                        0 responses
                        51 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 03-21-2024, 07:32 AM
                        0 responses
                        67 views
                        0 likes
                        Last Post seqadmin  
                        Working...
                        X