SEQanswers


micknudsen 05-09-2019 04:27 AM

The N+1 problem in GATK 4
 
At our place we use GATK (3-series) for germline SNV/INDEL calling in a clinical setup, and we are now considering how to make the move to the new GATK 4.

One of the benefits of GATK (often emphasized as a selling point by the Broad) is its solution to the N+1 problem: when a new sample arrives, one can run GenotypeGVCFs on that sample together with a huge GVCF catalogue of previous samples, thus improving calling accuracy.

However, with GATK 4 this functionality has changed tremendously. It is now recommended to use a GenomicsDB object instead of a combined GVCF file as input to GenotypeGVCFs. In itself this is not a problem, but GenotypeGVCFs now accepts only one "-V" input. Thus, one cannot supply both the large GenomicsDB and the GVCF file from a new sample.

Our first thought was to add the new GVCF file to the existing GenomicsDB, but that is not supported by the GenomicsDBImport tool. The only solution appears to be to create a new GenomicsDB object from scratch each time a new sample arrives, but that takes days (if not weeks) of computing and is just not feasible. It all seems very odd.
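
For reference, the rebuild-from-scratch workflow described above can be sketched roughly as follows. This is only an illustration, not a recommended pipeline: it builds the two command lines without running them, the helper names (import_command, genotype_command, n_plus_one_commands) and all file names are hypothetical, and it assumes a single interval passed with -L and a "gatk" launcher on the PATH.

```python
from typing import List


def import_command(gvcfs: List[str], workspace: str, interval: str) -> List[str]:
    """Build a GenomicsDBImport command that creates a new workspace from all GVCFs."""
    cmd = [
        "gatk", "GenomicsDBImport",
        "--genomicsdb-workspace-path", workspace,
        "-L", interval,
    ]
    # GenomicsDBImport takes one -V per input GVCF.
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd


def genotype_command(reference: str, workspace: str, output: str) -> List[str]:
    """Build a GenotypeGVCFs command; note the single -V pointing at the workspace."""
    return [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", f"gendb://{workspace}",
        "-O", output,
    ]


def n_plus_one_commands(catalogue: List[str], new_gvcf: str, reference: str,
                        workspace: str, interval: str, output: str) -> List[List[str]]:
    """Re-import the whole catalogue plus the new sample, then genotype jointly."""
    return [
        import_command(catalogue + [new_gvcf], workspace, interval),
        genotype_command(reference, workspace, output),
    ]


# Hypothetical example: two previous samples plus one new arrival.
commands = n_plus_one_commands(
    ["sample1.g.vcf.gz", "sample2.g.vcf.gz"], "sample3.g.vcf.gz",
    "reference.fasta", "cohort_db", "chr20", "cohort.vcf.gz",
)
for command in commands:
    print(" ".join(command))
```

The point of the sketch is that the import step always receives the full catalogue, so its cost grows with every new sample, while the genotyping step only ever sees the one gendb:// input.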

Has anybody here found a way of solving the N+1 problem in GATK 4?

GenoMax 05-09-2019 04:46 AM

I assume you have checked the GATK support forums? You can't be the first person to have run into this. You may want to post there to get an official response. If you do, please post the relevant link here so anyone finding this thread in the future will know the answer.

micknudsen 05-09-2019 04:50 AM

Yes, that was my first attempt, but the discussion seems to have died. I hoped to find more users here.

vivek_ 05-10-2019 12:18 AM

I'd suggest raising an issue on the GATK support forums. One of the other issues I have been encountering with GATK4 is a discordance between the command line options in the online documentation and what the tool actually expects. I've often had to search through their forums to find other such issues in order to fix my commands. The rollout of GATK4 has been far from smooth.

micknudsen 05-10-2019 12:50 AM

There is no discordance in this case. In fact, they have admitted that the situation really is as described (see here), but they haven't suggested a solution. I was wondering if anybody else had figured something out on their own.

