SEQanswers

wariobrega 06-24-2015 08:28 AM

GATK Haplotype Caller calls Indels in SOLID reads that IGV does not Display
 
Hello everyone,

I am Daniele and I'm a Junior Researcher in a private foundation in Rome.

Before explaining my problem, let me say that I am pretty new to the SOLiD technology and to variant calling in general, so forgive me if the question sounds dumb, but I couldn't find an answer anywhere!

That said, I am analyzing SOLiD reads for a targeted resequencing experiment. The files were given to me as BAM, already aligned to my reference genome using the LifeScope suite provided by AB.

I used a classical approach for variant calling: I preprocessed the reads, marked duplicates with Picard, and ran the GATK pipeline following the Best Practices for Variant Calling (so I recalibrated the base quality scores and realigned around indels).

I then used HaplotypeCaller for the variant calling and output the VCF files for my experiments.

The thing is that HaplotypeCaller calls several InDels that are not present when I check my final BAM file (the one I give to HaplotypeCaller for calling variants).

Specifically, none of the InDels in my VCF are visible in IGV, but some of them appear as single nucleotide variants when I uncheck "Quality weight allele fraction" in the Alignment panel of the IGV preferences. I thought this was an IGV issue and played a little with the options, but I found no solution. Notably, the Genotype Quality of these positions is always around 99. I searched the web, but I cannot find any explanation for this behavior.

Can someone provide some help?


Thanks in advance!

Daniele

Linnea 06-29-2015 10:39 PM

Hello Daniele,

To at least partly answer your question: HaplotypeCaller performs local realignment during the run, so the alignment it actually calls variants from is not the one in your input BAM but an internal one you don't see. So HC might realign a region and find an indel, while in your input BAM you see nothing, or one or more SNPs (which are likely caused by mismatches in an alignment that was wrong anyway).

If you want to see what it looks like AFTER HaplotypeCaller has realigned the reads, you can rerun it with the flag:
--bamOutput newbamfile.bam
(takes quite a while to run).

You can also make it print out all possible haplotypes to the bam with:
--bamWriterType ALL_POSSIBLE_HAPLOTYPES
and then see them in IGV (choose "color alignment by: tag" and then write "HC" in the box).
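
To make the rerun concrete, here is a rough sketch of how the full command could be assembled. It assumes a GATK 3.x-style invocation (`java -jar GenomeAnalysisTK.jar -T HaplotypeCaller ...`); the jar path, all file names, and the helper function itself are placeholders made up for illustration. The optional `-L` interval is an addition not mentioned above, meant to limit the rerun to the region around one suspicious indel so it finishes faster.

```python
import shlex


def haplotype_caller_cmd(ref, in_bam, out_vcf, out_bam,
                         all_haplotypes=False, interval=None,
                         gatk_jar="GenomeAnalysisTK.jar"):
    """Build a GATK 3.x-style HaplotypeCaller command as a list of arguments.

    Hypothetical helper: it only assembles the arguments, it does not run GATK.
    """
    cmd = [
        "java", "-jar", gatk_jar,
        "-T", "HaplotypeCaller",
        "-R", ref,
        "-I", in_bam,
        "-o", out_vcf,
        "--bamOutput", out_bam,   # write the reads as realigned by HC
    ]
    if all_haplotypes:
        # also emit every candidate haplotype, not only the called ones
        cmd += ["--bamWriterType", "ALL_POSSIBLE_HAPLOTYPES"]
    if interval:
        # assumption: restrict the run to one region, e.g. "chr1:1000-2000"
        cmd += ["-L", interval]
    return cmd


# Placeholder file names; replace them with your own paths.
cmd = haplotype_caller_cmd("reference.fasta", "input.bam", "recall.vcf",
                           "newbamfile.bam", all_haplotypes=True,
                           interval="chr1:1000-2000")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` directly.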

Hope this helps at least a bit,
Linnéa

wariobrega 07-01-2015 01:24 AM

Hi Linnea, and thanks for the very quick reply!

I actually found out a similar answer on the GATK forum after I posted it, but yours was concise and very explanatory, so thank you again!

I'm now running HaplotypeCaller with the --bamOutput option on my samples to check how it realigned the reads.

However, something still does not add up: why is the Genotype Quality (GQ) of these InDels always 99 (checked multiple times on multiple samples)?


Thanks in advance!

Daniele

Linnea 07-01-2015 02:52 AM

No sorry, I can't explain that part, maybe someone else has an idea?

But actually, why can't they just be real indels with a very high quality? (99 seems to be the highest quality you can get: "Because the most likely PL is always 0, GQ = second highest PL - 0. If the second most likely PL is greater than 99, we still assign a GQ of 99, so the highest value of GQ is 99." -from the GATK webpage). Maybe it will be clear after the realignment? (And sorry if I misunderstood something, I am really no indel expert..)
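
As a small illustration of the quoted rule, GQ can be derived from the PL values like this. PLs are phred-scaled, so the most likely genotype has PL 0 and the second most likely genotype has the second lowest PL. This is only a sketch of the rule as quoted, not GATK's actual code; the function name and the example PL values are made up.

```python
def genotype_quality(pls, cap=99):
    """Return GQ from the phred-scaled genotype likelihoods (PL) of one sample.

    GQ is the gap between the best PL (normalized to 0) and the
    second-best PL, capped at 99 as described in the quoted text.
    """
    if len(pls) < 2:
        raise ValueError("need at least two PL values")
    lowest = min(pls)
    # normalize so that the most likely genotype has PL 0, then sort
    normalized = sorted(pl - lowest for pl in pls)
    return min(normalized[1], cap)


# Made-up PL triplets in the order hom-ref, het, hom-var.
print(genotype_quality([0, 45, 600]))     # 45
print(genotype_quality([1200, 0, 850]))   # 99 (capped)
```

So any call whose second-best genotype is more than 99 phred units worse than the best one ends up with GQ 99, whether or not the site itself is a true variant.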

Zaag 07-01-2015 03:46 AM

GQ does not tell you anything about the variant quality. GQ tells you about how certain the HC is about the zygosity.

About the deletions: do they disappear if you use this option in the variant calling step?

--dontUseSoftClippedBases

wariobrega 08-12-2015 06:46 AM

Quote:

Originally Posted by Zaag (Post 176642)
GQ does not tell you anything about the variant quality. GQ tells you about how certain the HC is about the zygosity.

About the deletions: do they disappear if you use this option in the variant calling step?

--dontUseSoftClippedBases

Hi Zaag, and apologies for the late reply (I was on holiday these last couple of weeks!),

Some of the deletions disappeared after I used your option, although new ones appeared. Cool thing though: many of the FP InDels were among the ones that were removed.

I am now also trying to use Picard CleanSam to filter them BEFORE the GATK pipeline and compare the differences (also to see why these new InDels appear). Thanks a lot for your reply, it was very helpful!
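
A comparison like this can be sketched in a few lines of Python: collect the indel records from each VCF and look at which ones were lost, gained, or kept between the two runs. This is a minimal sketch that assumes plain-text, tab-separated VCF lines and treats any ALT allele whose length differs from REF as an indel; the function names and the toy records are invented for the example.

```python
def indel_sites(vcf_lines):
    """Collect (chrom, pos, ref, alt) for every indel allele in VCF text lines."""
    sites = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            if allele.startswith("<") or allele in ("*", "."):
                continue  # symbolic or missing alleles are ignored here
            if len(allele) != len(ref):
                sites.add((chrom, int(pos), ref, allele))
    return sites


def compare_indels(before_lines, after_lines):
    """Split indel calls into lost, gained and shared between two call sets."""
    before, after = indel_sites(before_lines), indel_sites(after_lines)
    return {"lost": before - after,
            "gained": after - before,
            "shared": before & after}


# Toy records, invented for illustration only.
default_run = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tAT\tA\t50\t.\t.",
    "chr1\t200\t.\tG\tGCC\t60\t.\t.",
    "chr1\t300\t.\tC\tT\t70\t.\t.",
]
no_softclip_run = [
    "chr1\t200\t.\tG\tGCC\t60\t.\t.",
    "chr1\t250\t.\tTAA\tT\t40\t.\t.",
    "chr1\t300\t.\tC\tT\t70\t.\t.",
]

result = compare_indels(default_run, no_softclip_run)
for label in ("lost", "gained", "shared"):
    print(label, sorted(result[label]))
```

With real files, the lines could come from `open("calls.vcf")` for each run; the "gained" set is the one to inspect in the --bamOutput BAM to see where the new InDels come from.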

Quote:

Originally Posted by Linnea (Post 176639)
No sorry, I can't explain that part, maybe someone else has an idea?

But actually, why can't they just be real indels with a very high quality? (99 seems to be the highest quality you can get: "Because the most likely PL is always 0, GQ = second highest PL - 0. If the second most likely PL is greater than 99, we still assign a GQ of 99, so the highest value of GQ is 99." -from the GATK webpage). Maybe it will be clear after the realignment? (And sorry if I misunderstood something, I am really no indel expert..)

Hi Linnea! Again, apologies for the late reply.

I am quite confident these InDels are not real because the same regions were validated with Sanger sequencing before the experiment :( Also, seeing a lot of InDels very close to each other (3-4 bp apart at most), and considering the nature of the disease as well as the conservation of these regions, makes me think these are FP. Again, I'm not an InDel expert either, so we're in the same boat! Thanks a lot for your contribution though, it was really helpful!

Daniele

