Hello!
UPDATE: solved. The problem was not the missing genotypes but triallelic SNPs, which the tool cannot process.
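For anyone who hits the same error, here is a rough sketch of how sites with more than one ALT allele could be listed before running the tool. It is not part of GATK; the function name `multiallelic_sites` is made up for illustration, and it assumes an uncompressed, tab-delimited VCF in which multiple ALT alleles are comma-separated in column 5.

```python
def multiallelic_sites(lines):
    """Yield (chrom, pos, alt) for VCF records with more than one ALT allele.

    Hypothetical helper: assumes tab-delimited VCF text lines.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        if "," in alt:  # e.g. ALT "C,G" is a triallelic site
            yield chrom, pos, alt
```

Passing an open file handle for all_sorted_withHeader.vcf to this function and printing the results should give a list of positions that could then be filtered out or inspected by hand.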
I'm trying to phase my genotypes using data from 8 trios of a sequenced organism with PhaseByTransmission, but the run fails with the messages listed below.
$ java -Xmx20g -jar ~/n/GenomeAnalysisTK-1.5-31-gadad76b/GenomeAnalysisTK.jar -R mygenome_sorted.fa -T PhaseByTransmission -V all_sorted_withHeader.vcf -ped Sample_info.ped -o all_phased_by_transmission.vcf
INFO 20:54:02,919 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 20:54:02,920 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO 20:54:02,920 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO 20:54:02,920 HelpFormatter - Program Args: -R mygenome_sorted.fa -T PhaseByTransmission -V all_sorted_withHeader.vcf -ped Sample_info.ped -o all_phased_by_transmission.vcf
INFO 20:54:02,921 HelpFormatter - Date/Time: 2012/04/22 20:54:02
INFO 20:54:02,921 HelpFormatter - ---------------------------------------------------------------------------------
INFO 20:54:02,921 HelpFormatter - ---------------------------------------------------------------------------------
INFO 20:54:02,995 RodBindingArgumentTypeDescriptor - Dynamically determined type of all_sorted_withHeader.vcf to be VCF
INFO 20:54:03,002 GenomeAnalysisEngine - Strictness is SILENT
INFO 20:54:04,551 RMDTrackBuilder - Creating Tribble index in memory for file all_sorted_withHeader.vcf
INFO 20:56:55,532 RMDTrackBuilder - Writing Tribble index to disk for file all_sorted_withHeader.vcf.idx
INFO 20:57:00,377 PedReader - Reading PED file Sample_info.ped with missing fields: []
INFO 20:57:00,489 PedReader - Phenotype is other? false
INFO 20:57:01,371 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
INFO 20:57:01,371 TraversalEngine - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 20:57:04,604 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.NumberFormatException: For input string: "."
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:449)
at java.lang.Integer.parseInt(Integer.java:499)
at org.broadinstitute.sting.utils.variantcontext.GenotypeLikelihoods.parsePLsIntoLikelihoods(GenotypeLikelihoods.java:153)
at org.broadinstitute.sting.utils.variantcontext.GenotypeLikelihoods.getAsVector(GenotypeLikelihoods.java:80)
at org.broadinstitute.sting.utils.variantcontext.GenotypeLikelihoods.getAsMap(GenotypeLikelihoods.java:105)
at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.getLikelihoodsAsMapSafeNull(PhaseByTransmission.java:519)
at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.phaseTrioGenotypes(PhaseByTransmission.java:562)
at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:762)
at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:74)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:78)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:63)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:246)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:128)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:92)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 1.5-31-gadad76b):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: For input string: "."
##### ERROR ------------------------------------------------------------------------------------------
I suspect this is because of missing genotypes in lines like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT ERS074168 ERS074167 ERS074166 ERS074171 ERS074170 ERS074169 ERS074174 ERS074173 ERS074172 ERS074177 ERS074176 ERS074175 ERS074180 ERS074179 ERS074178 ERS074183 ERS074182 ERS074181 ERS074186 ERS074185 ERS074184 ERS074189 ERS074188 ERS074187
1 160248 . T C 40.00 . AC1=6;AC=6;AF1=1;AN=6;DP4=0,0,3,0;DP=6;FQ=-28.1;MQ=29;SF=2;VDB=0.0046 GT:GQ:PL . . . . 1/1:3:0,0,0 1/1:3:0,0,0 1/1:10:72,9,0 . . . . . . . . . . . . . . .
Is there a different way I should format missing genotypes?
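To see which samples have no call at a given site, something like the following sketch could be used. The helper name `missing_samples` is hypothetical, and it assumes tab-delimited VCF lines in which a missing sample is written as a single "." and the sample columns start at column 10.

```python
def missing_samples(header_line, record_line):
    """Return the names of samples whose whole genotype column is '.'.

    Hypothetical helper: assumes tab-delimited VCF lines.
    """
    # Columns 1-9 are the fixed VCF fields; sample columns start at column 10.
    names = header_line.rstrip("\n").split("\t")[9:]
    values = record_line.rstrip("\n").split("\t")[9:]
    return [name for name, value in zip(names, values) if value == "."]
```

Called with the #CHROM header line and one record line, it returns the sample IDs that are missing at that site.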
Many thanks!