SEQanswers

11-16-2013, 09:58 PM   #1
drmaly
Junior Member
 
Location: Tokyo

Join Date: Apr 2013
Posts: 2
Too many Variants called by HaplotypeCaller GATK

Hi everybody,

I am doing exome analysis on 8 individuals (a family) and I just got a result that looks weird to me. My pipeline follows the GATK Best Practices, but when I checked my last VCF file after calling with the HaplotypeCaller walker, I found that the file contains 2.1 million lines (I am assuming these are all variants). My VCF file is a merged file for all 8 samples together. As far as I know, an exome pipeline should usually give you around 20 thousand variants, so I am really confused about what is wrong in my pipeline. Any ideas would be really appreciated.
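
A quick way to test the assumption that every line is a variant is to count header lines and records separately. The following is only a minimal sketch in Python: the function names `count_vcf_records` and `open_vcf` and the tiny example data are made up for illustration, and it assumes a standard tab-delimited VCF where header lines start with `#`.

```python
import gzip
import io


def count_vcf_records(handle):
    """Count header lines, variant records, and multi-allelic records
    in an open VCF text stream."""
    counts = {"header": 0, "records": 0, "multiallelic": 0}
    for line in handle:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("#"):
            counts["header"] += 1  # "##" meta lines and the "#CHROM" line
            continue
        counts["records"] += 1
        fields = line.rstrip("\n").split("\t")
        # Column 5 (ALT) lists several alleles separated by commas
        if len(fields) > 4 and "," in fields[4]:
            counts["multiallelic"] += 1
    return counts


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Made-up example data, not taken from the thread
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\t.\tQD=12.0",
    "1\t200\t.\tC\tT,G\t30\t.\tQD=1.5",
]) + "\n"

example_counts = count_vcf_records(io.StringIO(EXAMPLE_VCF))
print(example_counts)
```

For a real file you would call `count_vcf_records(open_vcf(path))` with your own path. If the record count is still in the millions, it is worth checking whether calling was restricted to the exome target intervals.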
11-19-2013, 07:16 AM   #2
vdauwera
Member
 
Location: Boston, MA

Join Date: Apr 2012
Posts: 42

Quote:
Originally Posted by drmaly
when I checked my last vcf file after calling with HaplotypeCaller walker, I found the file contains 2.1 million lines
Hi there,

If you consider the HaplotypeCaller output to be your final file, you're not really following Best Practices... As indicated in the GATK documentation, the output of HaplotypeCaller is a set of raw variants that is likely to include a lot of false positives. You have to filter them (either with VQSR or hard filters) to generate a callset with the desired level of sensitivity and specificity.
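
To illustrate what hard filtering does to a raw callset, here is a rough Python sketch that marks each record as PASS or failing based on its INFO annotations. It is not the GATK implementation: in a real pipeline you would use GATK's own filtering tools. The filter names (`LowQD`, `HighFS`, `LowMQ`), the function names, and the thresholds below are assumptions chosen for illustration, so check the current GATK documentation for recommended values.

```python
import operator

# Illustrative thresholds only (assumptions, not authoritative values):
# (filter name, INFO key, comparison that means "fail", threshold)
ILLUSTRATIVE_FILTERS = [
    ("LowQD", "QD", operator.lt, 2.0),
    ("HighFS", "FS", operator.gt, 60.0),
    ("LowMQ", "MQ", operator.lt, 40.0),
]


def parse_info(info_field):
    """Turn a VCF INFO string such as 'QD=12.0;FS=1.2;DB' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
        elif item and item != ".":
            info[item] = True  # flag without a value
    return info


def failed_filters(info_field):
    """Return the names of the illustrative filters that a record fails."""
    info = parse_info(info_field)
    failed = []
    for name, key, fails, threshold in ILLUSTRATIVE_FILTERS:
        value = info.get(key)
        if value is None or value is True:
            continue  # annotation missing: this sketch skips the check
        if fails(float(value), threshold):
            failed.append(name)
    return failed


def apply_hard_filters(record_line):
    """Rewrite the FILTER column (column 7) of one VCF record line."""
    fields = record_line.rstrip("\n").split("\t")
    failed = failed_filters(fields[7])
    fields[6] = ";".join(failed) if failed else "PASS"
    return "\t".join(fields)


# Made-up records, not taken from the thread
good = "1\t100\t.\tA\tG\t50\t.\tQD=12.0;FS=1.2;MQ=60.0"
bad = "1\t200\t.\tC\tT\t30\t.\tQD=1.5;FS=75.0;MQ=60.0"
print(apply_hard_filters(good))
print(apply_hard_filters(bad))
```

Counting only the records that end up with PASS gives a number that is more comparable to the expected size of an exome callset than the raw line count.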
12-05-2013, 08:56 PM   #3
jp.
Senior Member
 
Location: NikoNarita.jp

Join Date: Jul 2013
Posts: 142

vdauwera is right.
The VCF output may differ if your commands are not correct from the beginning (BWA to HaplotypeCaller to V). Could you post your commands and the messages you got before each job finished, to help us understand what happened there?

jp.
Tags
gatk, haplotypecaller, variants