Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • samtools mpileup

    Hello All,

    I have a bam file generated by MiSeq + GATK (Realignment/recalibration), I use samtools (v0.1.18) / GATK for variant calling. For a site, I find a big difference about the depth between these two methods.

    bcftools view sample.bcf | awk '$1=="chr7" && $2==150648198'
    chr7 150648198 . A G,X 0 . DP=8;I16=1,0,0,1,24,576,25,625,60,3600,60,3600,25,625,25,625;VDB=0.0000 PLP:SP 19,0,18,22,21,40:2:0

    Here DP=8 before quality control and only two reads passed quality control, thus samtools does not call variant for this site.

    GATK with default setting calls this site as SNP with DP=516, and IGV shows there are 233A + 283G at this site with good quality (mapQ~37, baseQ~30.

    This site is a true SNP confirmed by Sanger sequencing, so any explanation will be appreciated.

    Many thanks!

  • #2
    One quick thing to try; remake bcf, using -B in the samtools mpileup stage.

    Sometimes, the BAQ calculations will count good variant reads as misalignments, and it will give them poor quality, which might explain your low coverage.

    Comment


    • #3
      Originally posted by swbarnes2 View Post
      One quick thing to try; remake bcf, using -B in the samtools mpileup stage.

      Sometimes, the BAQ calculations will count good variant reads as misalignments, and it will give them poor quality, which might explain your low coverage.
      Thanks.

      I remade bcf with -B, no change. DP=8.

      Worked on the raw bam file without GATK realign/recalibration, samtools gave the same DP with/without -B option.

      Comment


      • #4
        Hi,

        I had the same exact problem with my sample. I thought at the beginning that it was due to the soft clipping in my reads, but when I correct for that, I still have the same result. I have also played with the mapping and base qualities, still not able to recover the same depth for both programs.

        Then, I tried the option -A in mpileup and pouff! Magic, I got the same numbers. I suppose that it's little bit risky to take those reads, but at least now I know what was doing mpileup.

        Cheers

        Comment


        • #5
          jcgrenier - could you tell me how you corrected for soft clipping? I have a lot of mysterious soft clipping in my bwa sw alignment, and I'd love to get rid of it. Just asking on the off chance that you had the same problem as me…Thanks!

          See also: http://seqanswers.com/forums/showthr...119#post121119

          Comment


          • #6
            Hi petrie,

            For the moment, I'm kind of exploiting a bug in GATK to help me to solve this problem. If you go on their forum there :



            you will see why it does that. Anyways, here's the option that I used to kind of revert the softclipped bases. The problem was coming from the fact that the starting position of a softclipped read is in the sam file line not the one from the beginning of the read, but the first non-clipped base.

            java -Djava.io.tmpdir=$SCRATCH -Xmx10g -jar ~/GenomeAnalysisTK-2.4-9-g532efad/GenomeAnalysisTK.jar -T ClipReads -l INFO -I $SCRATCH/$f.bam -o $f.hardclip.bam -R ~/References/hg19/ref.fasta -CR HARDCLIP_BASES -os $f.hardclip.stats

            Cheers.

            Comment


            • #7
              Hi petrie,

              For the moment, I'm kind of exploiting a bug in GATK to help me to solve this problem. If you go on their forum there :



              you will see why it does that. Anyways, here's the option that I used to kind of revert the softclipped bases. The problem was coming from the fact that the starting position of a softclipped read is in the sam file line not the one from the beginning of the read, but the first non-clipped base.

              java -Djava.io.tmpdir=$SCRATCH -Xmx10g -jar ~/GenomeAnalysisTK-2.4-9-g532efad/GenomeAnalysisTK.jar -T ClipReads -l INFO -I $SCRATCH/$f.bam -o $f.hardclip.bam -R ~/References/hg19/ref.fasta -CR HARDCLIP_BASES -os $f.hardclip.stats

              Cheers.

              Comment


              • #8
                jcgrenier - that looks like it will solve my problem (either the hardclip_bases workaround or the revert_softclipped_bases if it's been updated). Thanks so much!

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Current Approaches to Protein Sequencing
                  by seqadmin


                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                  04-04-2024, 04:25 PM
                • seqadmin
                  Strategies for Sequencing Challenging Samples
                  by seqadmin


                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                  03-22-2024, 06:39 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, 04-11-2024, 12:08 PM
                0 responses
                30 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-10-2024, 10:19 PM
                0 responses
                32 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-10-2024, 09:21 AM
                0 responses
                28 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-04-2024, 09:00 AM
                0 responses
                52 views
                0 likes
                Last Post seqadmin  
                Working...
                X