SEQanswers

Old 06-24-2015, 08:28 AM   #1
wariobrega
Member
 
Location: Quadraro, Rome

Join Date: Jul 2012
Posts: 11
GATK HaplotypeCaller calls indels in SOLiD reads that IGV does not display

Hello everyone,

I am Daniele and I'm a Junior Researcher in a private foundation in Rome.

Before explaining my problem, let me say that I am pretty new to SOLiD technology and to variant calling in general, so forgive me if the question sounds dumb, but I couldn't find an answer anywhere!

That said, I am analyzing SOLiD reads from a targeted resequencing experiment. The files were given to me as BAM, already aligned to my reference genome with the LifeScope suite provided by AB.

I used a classical approach for variant calling: I preprocessed the reads, marked duplicates with Picard, and ran the GATK pipeline following the Best Practices for variant calling (so I recalibrated the base quality scores and realigned around indels).

I then used HaplotypeCaller for the variant calling and output the VCF files for my experiments.

The thing is that HaplotypeCaller calls several indels that are not present when I check my final BAM file (the one I give to HaplotypeCaller for calling variants).

Specifically, none of the indels in my VCF are visible in IGV, but some of them appear as single nucleotide variants when I uncheck "Quality weight allele fraction" in the Alignment Panel inside the IGV preferences. I thought this was an IGV issue and played a little with the options, but I found no solution. Notably, the Genotype Quality of these positions is always around 99. I searched the web, but I could not find any explanation for this behavior.

Can someone provide some help?


Thanks in advance!

Daniele
Old 06-29-2015, 10:39 PM   #2
Linnea
Member
 
Location: Uppsala, Sweden

Join Date: Mar 2010
Posts: 23

Hello Daniele,

To at least partly answer your question: HaplotypeCaller performs local realignment within the run, so the alignment it actually calls variants from is not your input "final bam file" but something you don't see. So HC might realign a region and find an indel, while in your input BAM you see nothing, or one or more SNPs (which are likely caused by mismatches in an alignment that was wrong anyway).

If you want to see what it looks like AFTER HaplotypeCaller has realigned the reads, you can rerun it using the flag:
--bamOutput newbamfile.bam
(takes quite a while to run).

You can also make it print out all possible haplotypes to the bam with:
--bamWriterType ALL_POSSIBLE_HAPLOTYPES
and then see them in IGV (choose "color alignment by: tag" and then write "HC" in the box).
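
Since a full rerun is slow, one option is to rerun only the region around a suspicious indel. Below is a minimal Python sketch that assembles such a command, assuming the GATK 3.x command-line syntax (`-T HaplotypeCaller`, `-R`, `-I`, `-o`, `-L`). The file names, the jar path and the interval are hypothetical placeholders, and the helper `haplotypecaller_command` is just for illustration, not part of GATK.

```python
import shlex


def haplotypecaller_command(reference, input_bam, output_vcf, bam_output,
                            interval=None, all_haplotypes=False,
                            gatk_jar="GenomeAnalysisTK.jar"):
    """Build a GATK 3.x style HaplotypeCaller command as a list of arguments.

    All paths are placeholders chosen for this example.
    """
    cmd = [
        "java", "-jar", gatk_jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", input_bam,
        "-o", output_vcf,
        "--bamOutput", bam_output,   # write the reads as HC realigned them
    ]
    if interval:
        # Restrict the run to one region so the rerun stays short.
        cmd += ["-L", interval]
    if all_haplotypes:
        # Also write every haplotype HC considered, not only the called ones.
        cmd += ["--bamWriterType", "ALL_POSSIBLE_HAPLOTYPES"]
    return cmd


cmd = haplotypecaller_command(
    "ref.fasta", "sample.final.bam", "sample.region.vcf",
    "sample.region.hc.bam",
    interval="chr1:100000-101000",   # hypothetical region around one indel
    all_haplotypes=True,
)
# Print the command for inspection; run it yourself (e.g. with subprocess.run).
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Loading both the input BAM and the `--bamOutput` BAM in IGV for the same region should then show the original and the realigned view side by side.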

Hope this helps at least a bit,
Linnéa
Old 07-01-2015, 01:24 AM   #3
wariobrega
Member
 
Location: Quadraro, Rome

Join Date: Jul 2012
Posts: 11

Hi Linnea, and thanks for the very quick reply!

I actually found a similar answer on the GATK forum after I posted, but yours was concise and very clear, so thank you again!

I'm now rerunning my samples with the --bamOutput option to check how HaplotypeCaller realigned the reads.

However, something still does not add up: why is the Genotype Quality (GQ) of these indels always 99 (checked multiple times on multiple samples)?


Thanks in advance!

Daniele

Last edited by wariobrega; 07-01-2015 at 01:30 AM.
Old 07-01-2015, 02:52 AM   #4
Linnea
Member
 
Location: Uppsala, Sweden

Join Date: Mar 2010
Posts: 23

No sorry, I can't explain that part, maybe someone else has an idea?

But actually, why can't they just be real indels with a very high quality? (99 seems to be the highest quality you can get: "Because the most likely PL is always 0, GQ = second highest PL - 0. If the second most likely PL is greater than 99, we still assign a GQ of 99, so the highest value of GQ is 99." -from the GATK webpage). Maybe it will be clear after the realignment? (And sorry if I misunderstood something, I am really no indel expert..)
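
To make the quoted rule concrete, here is a small Python sketch of it. The function name `gq_from_pl` and the PL values are made up for illustration. It takes the PL of the second most likely genotype (the second-smallest PL value after normalizing the best one to 0) and caps it at 99.

```python
def gq_from_pl(pl, cap=99):
    """Genotype quality from phred-scaled genotype likelihoods (PL).

    The most likely genotype has the smallest PL (0 after normalization);
    GQ is the gap to the second most likely genotype, capped at `cap`.
    """
    best = min(pl)
    normalized = sorted(p - best for p in pl)
    return min(normalized[1], cap)


# Hypothetical PL triplets for a diploid, biallelic site (0/0, 0/1, 1/1).
print(gq_from_pl([0, 45, 600]))     # 45
print(gq_from_pl([1200, 0, 350]))   # 99 (350 is capped)
```

So a GQ of 99 only says that the second-best genotype is at least 99 phred units less likely than the best one; many confidently genotyped sites will sit exactly at the cap.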
Old 07-01-2015, 03:46 AM   #5
Zaag
Senior Member
 
Location: Amsterdam

Join Date: Nov 2009
Posts: 112

GQ does not tell you anything about the variant quality. GQ tells you about how certain the HC is about the zygosity.
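
As a rough illustration of the difference, the sketch below pulls both numbers out of a single VCF record: QUAL (column 6, confidence that some variant exists at the site) and the per-sample GQ (confidence in the assigned genotype). The record is invented for the example and the helper `qual_and_gq` is hypothetical.

```python
def qual_and_gq(vcf_line, sample_index=0):
    """Return (QUAL, GQ) for one sample of a tab-separated VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual = None if fields[5] == "." else float(fields[5])
    keys = fields[8].split(":")                  # FORMAT column
    values = fields[9 + sample_index].split(":")  # sample column
    sample = dict(zip(keys, values))
    gq = sample.get("GQ")
    return qual, (int(gq) if gq not in (None, ".") else None)


# Invented record: a 2 bp deletion with a low site QUAL but a GQ of 99.
record = "\t".join([
    "chr1", "12345", ".", "CAT", "C", "35.7", ".", "DP=40",
    "GT:AD:DP:GQ:PL", "0/1:30,10:40:99:120,0,900",
])
print(qual_and_gq(record))   # (35.7, 99)
```

When judging suspicious indels it is therefore worth looking at QUAL (and the site-level annotations) rather than at GQ alone.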

About the deletions: do they disappear if you use this option in the variant calling step?

--dontUseSoftClippedBases
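
To get a feel for how much this flag could matter for a given BAM, one rough check is to measure how much of each read is soft-clipped. The Python sketch below works on CIGAR strings (column 6 of the SAM records, e.g. as printed by `samtools view`). The example CIGARs are made up and `soft_clipped_fraction` is a hypothetical helper.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_fraction(cigar):
    """Fraction of a read's bases that are soft-clipped (S) in its CIGAR."""
    ops = CIGAR_OP.findall(cigar)
    # M, I, S, = and X consume read bases; D, N, H and P do not.
    read_len = sum(int(n) for n, op in ops if op in "MIS=X")
    clipped = sum(int(n) for n, op in ops if op == "S")
    return clipped / read_len if read_len else 0.0


for cigar in ["50M", "40M10S", "5S30M2D15M"]:
    print(cigar, soft_clipped_fraction(cigar))
```

If a large share of the reads covering a false-positive indel carry long soft clips, that would be consistent with the clipped bases feeding into the call.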
Old 08-12-2015, 06:46 AM   #6
wariobrega
Member
 
Location: Quadraro, Rome

Join Date: Jul 2012
Posts: 11

Quote:
Originally Posted by Zaag
GQ does not tell you anything about the variant quality. GQ tells you about how certain the HC is about the zygosity.

About the deletions: do they disappear if you use this option in the variant calling step?

--dontUseSoftClippedBases
Hi Zaag, and apologies for the late reply (I happened to be on holiday these last couple of weeks!),

Some of the deletions disappeared after I used your option, although new ones appeared. Cool thing though: many of the false-positive (FP) indels were among the ones that were removed.
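
For anyone wanting to make the same comparison, here is a minimal Python sketch that splits the indel calls of two runs into removed, kept and new. It assumes plain-text VCF content and matches indels by exact (chromosome, position, REF, ALT), so the same indel written at a different position would count as different. The function names and the toy records are invented for the example.

```python
def indel_keys(vcf_text):
    """Collect (chrom, pos, ref, alt) for every indel allele in VCF text."""
    keys = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        for allele in alt.split(","):
            if allele.startswith("<") or allele == "*":
                continue  # symbolic or spanning-deletion alleles
            if len(allele) != len(ref):
                keys.add((chrom, int(pos), ref, allele))
    return keys


def compare_indels(before_text, after_text):
    """Split indel calls into removed, kept and new between two runs."""
    before, after = indel_keys(before_text), indel_keys(after_text)
    return {"removed": before - after,
            "kept": before & after,
            "new": after - before}


# Toy records standing in for the runs without and with the flag.
default_run = "chr1\t100\t.\tCA\tC\nchr1\t250\t.\tG\tGTT\nchr1\t300\t.\tA\tT\n"
no_softclip_run = "chr1\t250\t.\tG\tGTT\nchr1\t410\t.\tTAG\tT\n"
result = compare_indels(default_run, no_softclip_run)
for label in ("removed", "kept", "new"):
    print(label, sorted(result[label]))
```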

I am now also trying to use Picard CleanSam to filter them BEFORE the GATK pipeline and compare the differences (also to see why these new InDels appear). Thanks a lot for your reply, was very helpful!

Quote:
Originally Posted by Linnea
No sorry, I can't explain that part, maybe someone else has an idea?

But actually, why can't they just be real indels with a very high quality? (99 seems to be the highest quality you can get: "Because the most likely PL is always 0, GQ = second highest PL - 0. If the second most likely PL is greater than 99, we still assign a GQ of 99, so the highest value of GQ is 99." -from the GATK webpage). Maybe it will be clear after the realignment? (And sorry if I misunderstood something, I am really no indel expert..)
Hi Linnea! Again, apologies for the late reply.

I am quite confident these indels are not real because the same regions were validated with Sanger sequencing before the experiment. Also, seeing a lot of indels very close to each other (3-4 bp apart at most), together with the nature of the disease and the conservation of these regions, makes me think these are FPs. Again, I'm not an indel expert either, so we're in the same boat! Thanks a lot for your contribution though, it was really helpful!

Daniele

Last edited by wariobrega; 08-12-2015 at 06:50 AM. Reason: Forgot to reply to Linnea!

Tags
bam, gatk, solid, variantcalling, vcf
