GATK error with a VCF including missing genotypes

    Hello!

    UPDATE: Solved. The problem had nothing to do with the missing genotypes; it was caused by triallelic SNPs, which the tool cannot process.

    I'm trying to phase my genotypes with PhaseByTransmission, using data from 8 trios of a sequenced organism, but the run fails with the output below.

    $ java -Xmx20g -jar ~/n/GenomeAnalysisTK-1.5-31-gadad76b/GenomeAnalysisTK.jar -R mygenome_sorted.fa -T PhaseByTransmission -V all_sorted_withHeader.vcf -ped Sample_info.ped -o all_phased_by_transmission.vcf

    INFO 20:54:02,919 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 20:54:02,920 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
    INFO 20:54:02,920 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
    INFO 20:54:02,920 HelpFormatter - Program Args: -R mygenome_sorted.fa -T PhaseByTransmission -V all_sorted_withHeader.vcf -ped Sample_info.ped -o all_phased_by_transmission.vcf
    INFO 20:54:02,921 HelpFormatter - Date/Time: 2012/04/22 20:54:02
    INFO 20:54:02,921 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:54:02,921 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:54:02,995 RodBindingArgumentTypeDescriptor - Dynamically determined type of all_sorted_withHeader.vcf to be VCF
    INFO 20:54:03,002 GenomeAnalysisEngine - Strictness is SILENT
    INFO 20:54:04,551 RMDTrackBuilder - Creating Tribble index in memory for file all_sorted_withHeader.vcf
    INFO 20:56:55,532 RMDTrackBuilder - Writing Tribble index to disk for file all_sorted_withHeader.vcf.idx
    INFO 20:57:00,377 PedReader - Reading PED file Sample_info.ped with missing fields: []
    INFO 20:57:00,489 PedReader - Phenotype is other? false
    INFO 20:57:01,371 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
    INFO 20:57:01,371 TraversalEngine - Location processed.sites runtime per.1M.sites completed total.runtime remaining
    INFO 20:57:04,604 GATKRunReport - Uploaded run statistics report to AWS S3
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace
    java.lang.NumberFormatException: For input string: "."
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:449)
    at java.lang.Integer.parseInt(Integer.java:499)
    at org.broadinstitute.sting.utils.variantcontext.GenotypeLikelihoods.parsePLsIntoLikelihoods(GenotypeLikelihoods.java:153)
    at org.broadinstitute.sting.utils.variantcontext.GenotypeLikelihoods.getAsVector(GenotypeLikelihoods.java:80)
    at org.broadinstitute.sting.utils.variantcontext.GenotypeLikelihoods.getAsMap(GenotypeLikelihoods.java:105)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.getLikelihoodsAsMapSafeNull(PhaseByTransmission.java:519)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.phaseTrioGenotypes(PhaseByTransmission.java:562)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:762)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:74)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:78)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:63)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:246)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:128)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:92)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 1.5-31-gadad76b):
    ##### ERROR
    ##### ERROR Please visit the wiki to see if this is a known problem
    ##### ERROR If not, please post the error, with stack trace, to the GATK forum
    ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ##### ERROR
    ##### ERROR MESSAGE: For input string: "."
    ##### ERROR ------------------------------------------------------------------------------------------

    I suspect this is caused by the missing genotypes in lines like this one:

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT ERS074168 ERS074167 ERS074166 ERS074171 ERS074170 ERS074169 ERS074174 ERS074173 ERS074172 ERS074177 ERS074176 ERS074175 ERS074180 ERS074179 ERS074178 ERS074183 ERS074182 ERS074181 ERS074186 ERS074185 ERS074184 ERS074189 ERS074188 ERS074187
    1 160248 . T C 40.00 . AC1=6;AC=6;AF1=1;AN=6;DP4=0,0,3,0;DP=6;FQ=-28.1;MQ=29;SF=2;VDB=0.0046 GT:GQ:PL . . . . 1/1:3:0,0,0 1/1:3:0,0,0 1/1:10:72,9,0 . . . . . . . . . . . . . . .
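The stack trace shows the failure inside `GenotypeLikelihoods.parsePLsIntoLikelihoods`, i.e. while parsing the PL field. For a diploid sample, PL holds one value per possible genotype, so a site with n alleles (REF plus ALTs) should carry n(n+1)/2 values: 3 for a biallelic SNP like the one above, 6 for a triallelic one. The minimal Python sketch below flags samples whose PL entry contains "." or has an unexpected number of values. It assumes a tab-separated, diploid VCF (the forum rendering shows spaces instead of tabs), and the function names are hypothetical, not part of GATK.

```python
from math import comb


def expected_pl_count(n_alleles: int, ploidy: int = 2) -> int:
    # One PL value per unordered genotype: C(n_alleles + ploidy - 1, ploidy).
    # For diploids this equals n(n+1)/2: 3 for biallelic, 6 for triallelic sites.
    return comb(n_alleles + ploidy - 1, ploidy)


def suspicious_pl_samples(vcf_line: str) -> list[int]:
    """Return 0-based indices of samples whose PL field looks unparseable.

    Hypothetical helper: assumes a tab-separated VCF data line and diploid calls.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alt, fmt = fields[4], fields[8].split(":")
    if "PL" not in fmt:
        return []
    # ALT "." means no alternate allele, so it does not count as an allele.
    alts = [a for a in alt.split(",") if a != "."]
    expected = expected_pl_count(1 + len(alts))
    pl_idx = fmt.index("PL")
    flagged = []
    for i, sample in enumerate(fields[9:]):
        parts = sample.split(":")
        if len(parts) <= pl_idx:
            # e.g. a bare "." for a missing genotype: there is no PL to parse.
            continue
        pl = parts[pl_idx].split(",")
        if "." in pl or len(pl) != expected:
            flagged.append(i)
    return flagged
```

Running it over each non-header line of the VCF would point at the records worth inspecting before calling PhaseByTransmission.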

    Is there a different way I should format missing genotypes?

    Many thanks!
    Last edited by a11msp; 04-23-2012, 07:12 AM.
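Given the update, one workaround is to drop multiallelic records before running PhaseByTransmission. Below is a minimal Python sketch, assuming a tab-separated VCF in which a comma in the ALT column marks a site with more than one alternate allele. `keep_biallelic` and `filter_vcf` are hypothetical helpers, and simply discarding these sites (rather than handling them some other way) is an assumption about what is acceptable for the analysis.

```python
from typing import Iterable, Iterator


def keep_biallelic(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and only records with a single ALT allele."""
    for line in lines:
        if line.startswith("#"):
            # Keep meta-information (##) and the #CHROM header line.
            yield line
            continue
        alt = line.rstrip("\n").split("\t")[4]
        if "," not in alt:
            yield line


def filter_vcf(in_path: str, out_path: str) -> None:
    # Hypothetical convenience wrapper: stream one file into another.
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(keep_biallelic(src))
```

For example, `filter_vcf("all_sorted_withHeader.vcf", "all_biallelic.vcf")` would write a copy without the triallelic sites (the output file name is only illustrative), which could then be passed to `-V` instead of the original file.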
