  • GATK Haplotype Caller calls Indels in SOLID reads that IGV does not Display

    Hello everyone,

    I am Daniele and I'm a Junior Researcher in a private foundation in Rome.

    Before explaining my problem, let me say that I am pretty new to the SOLID technology and to variant calling in general, so forgive me if the question sounds dumb, but I couldn't find an answer anywhere!

    That said, I am analyzing SOLID reads for a targeted resequencing experiment. The files were given to me as BAM, already aligned to my reference genome using the LifeScope suite provided by AB.

    I used a classical approach for variant calling: I preprocessed the reads, marked duplicates with Picard, and ran the GATK pipeline following the Best Practices for Variant Calling (so I recalibrated the base quality scores and realigned around indels).

    I then used HaplotypeCaller for the variant calling and wrote out the VCF files for my experiments.

    The thing is that HaplotypeCaller calls several InDels that are not present when I check my final BAM file (the one I give to HaplotypeCaller for calling variants).

    Specifically, none of the InDels in my VCF are visible in IGV, but some of them appear as single nucleotide variants when I uncheck "Quality weight allele fraction" in the Alignment panel of the IGV preferences. I thought this was an IGV issue and played a little with the options, but I found no solution. Notably, the Genotype Quality of these positions is always around 99. I checked around the web, but I cannot find any explanation for this behavior.

    Can someone provide some help?


    Thanks in advance!

    Daniele

  • #2
    Hello Daniele,

    To at least partly answer your question: HaplotypeCaller performs local realignment during the run, so the alignments the calls are based on are not the ones in the BAM you gave to HaplotypeCaller but an internal version you don't see. So HC might realign a region and find an indel, while in your input BAM you see nothing or one or more SNPs (which are likely caused by mismatches in an alignment that was wrong anyway).

    If you want to see what it looks like AFTER HaplotypeCaller has realigned the reads, you can rerun it using the flag:
    --bamOutput newbamfile.bam
    (takes quite a while to run).

    You can also make it print out all possible haplotypes to the bam with:
    --bamWriterType ALL_POSSIBLE_HAPLOTYPES
    and then see them in IGV (choose "color alignment by: tag" and then write "HC" in the box).
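
For reference, a minimal sketch of how the two options above could be combined in a single GATK 3.x HaplotypeCaller call, assembled here in Python. The file names (`ref.fasta`, `input.bam`, `calls.vcf`, `hc_realigned.bam`) and the jar location are placeholders, not values from this thread, and `build_hc_command` is a hypothetical helper. Other arguments your pipeline needs (intervals, etc.) are left out.

```python
import shlex


def build_hc_command(reference, input_bam, output_vcf, bam_output,
                     all_haplotypes=False, gatk_jar="GenomeAnalysisTK.jar"):
    """Assemble a GATK 3.x HaplotypeCaller command that also writes the realigned reads.

    All paths are placeholders supplied by the caller (assumption: GATK 3.x syntax).
    """
    cmd = [
        "java", "-jar", gatk_jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", input_bam,
        "-o", output_vcf,
        "--bamOutput", bam_output,  # BAM with the reads as HC realigned them
    ]
    if all_haplotypes:
        # Also write every candidate haplotype, not only the called ones
        cmd += ["--bamWriterType", "ALL_POSSIBLE_HAPLOTYPES"]
    return cmd


cmd = build_hc_command("ref.fasta", "input.bam", "calls.vcf",
                       "hc_realigned.bam", all_haplotypes=True)
# Print a copy-pasteable command line; nothing is executed here.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.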

    Hope this helps at least a bit,
    Linnéa

    • #3
      Hi Linnea, and thanks for the very quick reply!

      I actually found a similar answer on the GATK forum after I posted, but yours was concise and very clear, so thank you again!

      I'm now running the --bamOutput option on my samples in order to check how HaplotypeCaller realigned the reads.

      However, something still does not add up: why is the Genotype Quality (GQ) of these InDels always 99 (checked multiple times on multiple samples)?


      Thanks in advance!

      Daniele
      Last edited by wariobrega; 07-01-2015, 01:30 AM.

      • #4
        No sorry, I can't explain that part, maybe someone else has an idea?

        But actually, why can't they just be real indels with a very high quality? (99 seems to be the highest quality you can get: "Because the most likely PL is always 0, GQ = second highest PL - 0. If the second most likely PL is greater than 99, we still assign a GQ of 99, so the highest value of GQ is 99." -from the GATK webpage). Maybe it will be clear after the realignment? (And sorry if I misunderstood something, I am really no indel expert..)
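
The rule quoted above can be sketched in a few lines of Python. The PL values are invented for illustration and `gq_from_pl` is a hypothetical helper, not a GATK function; it only restates "GQ = second lowest PL minus lowest PL, capped at 99".

```python
def gq_from_pl(pls, cap=99):
    """Return GQ as the gap between the two smallest PL values, capped at `cap`."""
    if len(pls) < 2:
        raise ValueError("need PLs for at least two genotypes")
    best, second = sorted(pls)[:2]
    return min(second - best, cap)


# PLs in genotype order 0/0, 0/1, 1/1 (made-up numbers).
print(gq_from_pl([45, 0, 300]))    # 45: the second-best genotype is 45 Phred units behind
print(gq_from_pl([1200, 0, 350]))  # 99: the gap is 350, but GQ is capped at 99
```

So any call whose second-best genotype is at least 99 Phred units behind the best one ends up with GQ 99, which is why the value shows up so often.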

        • #5
          GQ does not tell you anything about the variant quality. GQ tells you about how certain the HC is about the zygosity.

          About the deletions: do they disappear if you use this option in the variant calling step?

          --dontUseSoftClippedBases

          • #6
            Hi Zaag, and apologies for the late reply (I happened to be on holiday these last couple of weeks!).

            Part of the deletions disappeared after I used your option, although new ones appeared. Cool thing though: many of the FP InDels were among the ones that were removed.

            I am now also trying to use Picard CleanSam to filter them BEFORE the GATK pipeline and compare the differences (also to see why these new InDels appear). Thanks a lot for your reply, it was very helpful!
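
One way to make the "compare the differences" step concrete is to diff the indel records of two VCFs, for example the calls made with and without --dontUseSoftClippedBases. This is only a sketch: the VCF lines below are invented, `indel_sites` and `compare_indels` are hypothetical helpers, and the comparison is by exact chrom/pos/REF/ALT match, so the same indel represented differently in the two files would be reported as lost plus gained.

```python
import io


def indel_sites(vcf_lines):
    """Collect (chrom, pos, ref, alt) for alleles whose REF and ALT lengths differ."""
    sites = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            # Skip symbolic or missing alleles such as <NON_REF>, "*" or "."
            if alt.startswith("<") or alt in ("*", "."):
                continue
            if len(alt) != len(ref):
                sites.add((chrom, int(pos), ref, alt))
    return sites


def compare_indels(before_lines, after_lines):
    """Split indel calls into those lost, gained and shared between two call sets."""
    before, after = indel_sites(before_lines), indel_sites(after_lines)
    return {"lost": before - after, "gained": after - before, "shared": before & after}


# Made-up records standing in for the two VCF files.
before = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "chr1\t100\t.\tAT\tA\t50\t.\t.\n"
    "chr1\t200\t.\tG\tGCC\t60\t.\t.\n"
    "chr1\t300\t.\tC\tT\t70\t.\t.\n"
)
after = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "chr1\t200\t.\tG\tGCC\t60\t.\t.\n"
    "chr1\t250\t.\tTAA\tT\t40\t.\t.\n"
)
result = compare_indels(before, after)
for label in ("lost", "gained", "shared"):
    print(label, sorted(result[label]))
```

With real data, open file handles (e.g. `open("with_softclips.vcf")`, a placeholder name) can be passed in place of the `io.StringIO` objects.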

            Hi Linnea! Again, apologies for the late reply.

            I am quite confident these InDels were not real because the same regions were validated with Sanger before the experiment. Also, seeing a lot of InDels very close to each other (3-4 bp at most), and considering the nature of the disease as well as the conservation of these regions, makes me think these are FPs. Again, I'm not an InDel expert either, so we're in the same boat! Thanks a lot for your contribution though, it was really helpful!

            Daniele
            Last edited by wariobrega; 08-12-2015, 06:50 AM. Reason: Forgot to reply to Linnea!
