SEQanswers

05-09-2019, 04:27 AM   #1
micknudsen
Junior Member
 
Location: Denmark

Join Date: May 2019
Posts: 3
The N+1 problem in GATK 4

At our place we use GATK (3-series) for germline SNV/INDEL calling in a clinical setup, and we are now considering how to make the move to the new GATK 4.

One of the benefits of GATK (often emphasized as a selling point by the Broad) is its solution to the N+1 problem: when a new sample arrives, one can run GenotypeGVCFs on that sample together with a huge GVCF catalogue of previous samples, thus improving the accuracy of calling.

However, with GATK 4 this functionality has changed tremendously. It is now recommended to use a GenomicsDB object instead of a combined GVCF file and use that as input to GenotypeGVCFs. In itself this is not a problem, but GenotypeGVCFs now only accepts one "-V" input. Thus, one cannot use both the large GenomicsDB and the GVCF file from a new sample.

Our first thought was to add the new GVCF file to the GenomicsDB, but that is not supported by the GenomicsDBImport tool. The only solution appears to be to create a new GenomicsDB object from scratch each time a new sample arrives, but that takes days (if not weeks) of computing and is just not feasible. It all seems very odd.
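
To make the workflow concrete, here is a rough sketch of the rebuild-from-scratch approach described above. It is only an illustration, not an official recipe: the file names, the workspace path, and the interval are hypothetical placeholders, and only the commonly documented options are used (`-V`, `--genomicsdb-workspace-path`, `-L`, `-R`, `-O`, and the `gendb://` prefix). The Python helpers just build and print the two command lines; nothing is executed.

```python
from typing import List


def import_cmd(gvcfs: List[str], workspace: str, interval: str) -> List[str]:
    """Build a GenomicsDBImport command that creates a NEW workspace.

    Every GVCF (the old catalogue plus the new sample) is passed with its
    own -V, which is why the whole import is repeated for each new sample.
    """
    cmd = ["gatk", "GenomicsDBImport",
           "--genomicsdb-workspace-path", workspace,
           "-L", interval]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd


def genotype_cmd(reference: str, workspace: str, output: str) -> List[str]:
    """Build a GenotypeGVCFs command that reads from a GenomicsDB workspace.

    Only a single -V is given: the workspace itself, via the gendb:// prefix.
    """
    return ["gatk", "GenotypeGVCFs",
            "-R", reference,
            "-V", f"gendb://{workspace}",
            "-O", output]


# Hypothetical inputs: an existing catalogue plus one newly arrived sample.
catalogue = ["sample_0001.g.vcf.gz", "sample_0002.g.vcf.gz"]
new_sample = "sample_0003.g.vcf.gz"

# Rebuild the workspace from scratch with N+1 samples, then genotype it.
rebuild = import_cmd(catalogue + [new_sample], "cohort_db", "chr20")
genotype = genotype_cmd("reference.fasta", "cohort_db", "cohort.vcf.gz")

print(" ".join(rebuild))
print(" ".join(genotype))
```

The expensive part is the first command: its cost grows with the size of the catalogue, and it is repeated in full for every new sample.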

Has anybody here found a way of solving the N+1 problem in GATK 4?
05-09-2019, 04:46 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,941

I assume you have checked the GATK support forums? You can't be the first person to have run into this. You may want to post there to get an official response. If you do, please post the relevant link here so that anyone finding this thread in the future will know the answer.
05-09-2019, 04:50 AM   #3
micknudsen
Junior Member
 
Location: Denmark

Join Date: May 2019
Posts: 3

Yes, that was my first attempt, but the discussion seems to have died. I hoped to find more users here.

Last edited by micknudsen; 05-10-2019 at 05:06 AM.
05-10-2019, 12:18 AM   #4
vivek_
PhD Student
 
Location: Denmark

Join Date: Jul 2012
Posts: 164

I'd suggest raising an issue on the GATK support forums. One of the other issues I have been encountering with GATK4 is a discordance between the command-line options in the online documentation and what the tool actually expects. I've often had to search through their forums for similar reports in order to fix my commands. The rollout of GATK4 has been far from smooth.
05-10-2019, 12:50 AM   #5
micknudsen
Junior Member
 
Location: Denmark

Join Date: May 2019
Posts: 3

There is no discordance in this case. In fact, they have admitted that the situation really is as described (see here), but they haven't suggested a solution. I was wondering if anybody else had figured something out on their own.
All times are GMT -8.