  • reeso123
    Junior Member
    • Nov 2011
    • 6

    GATK variant recalibrator input files

    Dear all,

    I am new to NGS and have been trying to run the GATK variant calling pipeline on exome sequencing data. I'm currently having an issue with the variant quality score recalibrator, which gives the following error message:

    ERROR MESSAGE: Bad input: Values for HaplotypeScore annotation not detected for ANY training variant in the input callset.

    I've tried running VariantAnnotator on my UnifiedGenotyper VCF file, but that does not seem to fix the problem. I am also unsure whether it is my UnifiedGenotyper VCF file or my hapmap, dbSNP and omni 1000G resource files that are missing the annotations. Any help or advice on this issue would be much appreciated.

    Thanks,
    Elliott
  • reeso123
    Junior Member
    • Nov 2011
    • 6

    #2
    Also, do people usually obtain the resource files from the Broad resource bundle? If so, I assume these are already annotated appropriately?

    • Boel
      Member
      • Oct 2009
      • 62

      #3
      Hey reeso123,

      Could you give us the command used in GATK to produce the given error?

      It might be that HaplotypeScore is not among the default annotations, in which case you need to tell UnifiedGenotyper explicitly to add it.

      • reeso123
        Junior Member
        • Nov 2011
        • 6

        #4
        Hi Boel,

        The command I used for UnifiedGenotyper was:

        java -jar GenomeAnalysisTK-1.2-64-gf62af02/GenomeAnalysisTK.jar \
        -glm BOTH \
        -R reference_genome/HGC/Homo_sapiens_GRCh37_53.fasta \
        -T UnifiedGenotyper \
        -I ./test_trio/reads.10462.recal.bam \
        -D DBsnp/b37/dbsnp_132_b37_sanger.vcf \
        -o ./test_trio/SNP/chr22_snps.vcf \
        -metrics ./test_trio/SNP/chr22metrics.metrics \
        -stand_call_conf 50.0 \
        -stand_emit_conf 10.0 \
        -L ./test_trio/Target_Intervals/chr22_target_interval.bed

        and the command for variant recalibration was:

        java -jar GenomeAnalysisTK-1.2-64-gf62af02/GenomeAnalysisTK.jar \
        -T VariantRecalibrator \
        -R reference_genome/HGC/Homo_sapiens_GRCh37_53.fasta \
        -input ./test_trio/SNP/chr22_snps.vcf \
        -resource:hapmap,known=false,training=true,truth=true,prior=15.0 ./hapMap/hapmap_3.3.b37.sites_sanger.vcf \
        -resource:omni,known=false,training=true,truth=false,prior=12.0 ./omni/1000G_omni2.5.b37.sites_sanger.vcf \
        -resource:dbsnp,known=true,training=false,truth=false,prior=8.0 ./DBsnp/b37/dbsnp_132_b37_sanger.vcf \
        -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ \
        -recalFile ./test_trio/SNP/output.recal \
        -tranchesFile ./test_trio/SNP/output.tranches \
        -rscriptFile ./test_trio/SNP/output.plots.R


        As far as I'm aware, the VCF file created by UnifiedGenotyper contains the annotations requested in the VariantRecalibrator command. I've also run GATK VariantAnnotator to try to add them in case they were not present!

        I've attached a subset of the VCF file in case it helps to identify the problem.

        Your help is much appreciated,
        Elliott
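
One way to double-check which annotations a callset really carries is to count the INFO keys directly. Below is a minimal Python sketch, assuming an uncompressed, tab-separated VCF. The function name `count_info_keys` and the sample records are made up for illustration and are not part of GATK.

```python
from collections import Counter

def count_info_keys(vcf_lines, keys):
    """Return (n_records, Counter) of how many records carry each INFO key."""
    counts = Counter()
    n_records = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        n_records += 1
        # INFO is column 8: semicolon-separated KEY=VALUE pairs or bare flags
        present = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
        for key in keys:
            if key in present:
                counts[key] += 1
    return n_records, counts

# Invented records, for illustration only
sample = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "22\t100\t.\tA\tG\t50\t.\tDP=10;QD=12.5;HaplotypeScore=0.0;MQ=60\n",
    "22\t200\t.\tC\tT\t60\t.\tDP=8;QD=9.1;MQ=59\n",
]
wanted = ["QD", "HaplotypeScore", "MQRankSum", "ReadPosRankSum", "FS", "MQ"]
n, counts = count_info_keys(sample, wanted)
for key in wanted:
    print(f"{key}: {counts[key]} of {n} records")
```

For a real file, an open file handle can be passed instead of the list, for example `count_info_keys(open("chr22_snps.vcf"), wanted)`.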

        • RockChalkJayhawk
          Senior Member
          • Mar 2009
          • 192

          #5
          Originally posted by reeso123
          Also, do people usually obtain the resource files from the broad resource bundle, and if so I guess these should be annotated appropriately?
          I would recommend doing this. It will relieve a lot of stress. If your own annotation files are the slightest bit incorrect, GATK will likely throw errors.

          • Boel
            Member
            • Oct 2009
            • 62

            #6
            Hi Elliott,

            ERROR MESSAGE: Bad input: Values for HaplotypeScore annotation not detected for ANY training variant in the input callset.
            I am not sure what is going on, but the error might indicate that none of the known variants (hapmap, 1000g or dbsnp) are present in your VCF file. Could that be the case?

            • reeso123
              Junior Member
              • Nov 2011
              • 6

              #7
              Hi all,

              Thanks so much for your input. I think I have found the problem: my hapmap, 1000G and dbSNP files were wrong, in that a SNP that should have been on chr22 was instead listed on chr2chr2! This was my own mistake, caused by a bug in a Perl script I wrote to make the contig names in the resource files match those in the BAM. In general it seems to be a bit of a nightmare to obtain a reference, hapmap, 1000G etc. that match the BAM when the data I received has already been processed elsewhere.

              Elliott
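
A quick way to catch this kind of mismatch is to compare the contig names used in the callset with those in each resource file. Here is a rough Python sketch, assuming plain-text VCFs. `contig_names` and `report_mismatch` are hypothetical helper names, and the sample lines are invented to mimic the chr22 versus chr2chr2 case described above.

```python
def contig_names(vcf_lines):
    """Collect the distinct CHROM values (column 1) from VCF records."""
    names = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        names.add(line.split("\t", 1)[0])
    return names

def report_mismatch(callset_lines, resource_lines):
    """Return (shared, only_in_callset, only_in_resource) contig name sets."""
    calls = contig_names(callset_lines)
    resource = contig_names(resource_lines)
    return calls & resource, calls - resource, resource - calls

# Invented records, for illustration only
callset = ["#CHROM\tPOS\n", "chr22\t100\t.\tA\tG\n", "chr22\t200\t.\tC\tT\n"]
broken_resource = ["#CHROM\tPOS\n", "chr2chr2\t100\trs1\tA\tG\n"]

shared, only_calls, only_resource = report_mismatch(callset, broken_resource)
print("shared:", sorted(shared))
print("only in callset:", sorted(only_calls))
print("only in resource:", sorted(only_resource))
```

If the shared set comes back empty, no training site can match any record in the callset, which would be consistent with the "not detected for ANY training variant" error.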

              • Boel
                Member
                • Oct 2009
                • 62

                #8
                Glad you solved it!
                And to answer an earlier question: I also use much of the data from the Broad resource bundle.

                • Robby
                  Member
                  • Mar 2011
                  • 68

                  #9
                  Hi all,

                  I am trying to use GATK as well, but when I start the VariantRecalibrator I get the following error message: "Argument with name '--cluster_file' (-clusterFile) is missing."

                  My command is similar to the ones posted above:
                  java -Xmx4g -jar GenomeAnalysisTK.jar \
                  -T VariantRecalibrator \
                  -R hg19.fasta \
                  -mode SNP \
                  --maxGaussians 6 \
                  -B:input,VCF snps.raw.vcf \
                  -B:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap.vcf \
                  -B:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.vcf \
                  -B:dbsnp,known=true,training=false,truth=false,prior=8.0 dbsnp.vcf \
                  -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ \
                  -recalFile out.recal \
                  -tranchesFile out.tranches \
                  -rscriptFile out.plots.R


                  Can anyone see the mistake? Does anyone else need to use the clusterFile argument, and what exactly is it? I would be really grateful for any help or recommendations.

                  • Carlos Borroto
                    Member
                    • Mar 2011
                    • 19

                    #10
                    Hi all,

                    I'm also getting a similar error, in my case:
                    Code:
                    MESSAGE: Bad input: Values for FisherStrand annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.
                    I'm using the resource files from the Broad GATK bundle. My VCF file to be recalibrated does have these annotations, which I added with the VariantAnnotator tool. Do I have to add them to the bundle files as well? I can see they don't have them.

                    hapmap_3.3.hg19.sites.vcf:
                    Code:
                    #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
                    chr1	566875	rs2185539	C	T	.	PASS	AC=66;AF=0.02369;AN=2786;set=MKK-YRI
                    chr1	567753	rs11510103	A	G	.	PASS	AC=11;AF=0.00404;AN=2724;set=TSI-GIH-CHD-CEU-JPT
                    chr1	728951	rs11240767	C	T	.	PASS	AC=139;AF=0.05044;AN=2756;set=MKK-YRI-LWK-MEX-ASW
                    chr1	752721	rs3131972	A	G	.	PASS	AC=1660;AF=0.59456;AN=2792;set=Intersection
                    1000G_omni2.5.hg19.sites.vcf:
                    Code:
                    #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
                    chr1	534247	SNP1-524110	C	T	.	PASS	CR=99.93414;GentrainScore=0.7423;HW=1.0
                    chr1	565286	SNP1-555149	C	T	.	PASS	CR=98.8266;GentrainScore=0.7029;HW=1.0
                    chr1	569624	SNP1-559487	T	C	.	PASS	CR=97.8022;GentrainScore=0.8070;HW=1.0
                    chr1	689186	rs4000335	G	A	.	NOT_POLY_IN_1000G	CR=99.86885;GentrainScore=0.7934;HW=1.0
                    dbsnp_132.hg19.vcf:
                    Code:
                    #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	
                    chrM	64	rs3883917	C	T	.	PASS	ASP;RSPOS=64;SAO=0;SCS=0;SLO;SSR=0;VC=SNP;VP=050100000005000000000100;WGT=1;dbSNPBuildID=108
                    chrM	146	rs72619361	T	C	.	PASS	ASP;G5;G5A;GNO;RSPOS=146;SAO=0;SCS=0;SSR=0;VC=SNP;VP=050000000005030100000100;WGT=1;dbSNPBuildID=130
                    chrM	152	rs117135796	T	C	.	PASS	ASP;GNO;RSPOS=152;SAO=0;SCS=0;SSR=0;VC=SNP;VP=050000000005000100000100;WGT=1;dbSNPBuildID=132
                    Thanks,
                    Carlos
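
Going by the wording of the error ("in the input callset"), the annotation values seem to be looked up in the callset at positions that overlap the training resources, so the sites-only bundle files would not need to carry them. That reading is an assumption and is not confirmed anywhere in this thread. The Python sketch below counts how many callset records overlap a training file and how many of those carry a given INFO key. `site_keys` and `training_overlap` are made-up helpers, the training lines are copied from the hapmap excerpt above, and the callset records are invented.

```python
def site_keys(vcf_lines):
    """Yield ((chrom, pos), info_keys) for each complete VCF record."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        info_keys = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
        yield (fields[0], int(fields[1])), info_keys

def training_overlap(callset_lines, training_lines, annotation):
    """Return (n_overlapping, n_overlapping_with_annotation)."""
    training_sites = {site for site, _ in site_keys(training_lines)}
    overlapping = 0
    annotated = 0
    for site, info_keys in site_keys(callset_lines):
        if site in training_sites:
            overlapping += 1
            if annotation in info_keys:
                annotated += 1
    return overlapping, annotated

training = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t566875\trs2185539\tC\tT\t.\tPASS\tAC=66;AF=0.02369;AN=2786;set=MKK-YRI\n",
    "chr1\t752721\trs3131972\tA\tG\t.\tPASS\tAC=1660;AF=0.59456;AN=2792;set=Intersection\n",
]
# Invented callset records, for illustration only
callset = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t566875\t.\tC\tT\t80\t.\tDP=20;FS=1.2;QD=15.0\n",
    "chr1\t752721\t.\tA\tG\t90\t.\tDP=25;QD=18.0\n",
    "chr1\t800000\t.\tG\tA\t40\t.\tDP=9;FS=0.0;QD=7.0\n",
]
print(training_overlap(callset, training, "FS"))
```

It may also be worth checking that the name given to `-an` matches the INFO key exactly as it is written in the VCF. The commands earlier in this thread use `-an FS`, whereas this error mentions FisherStrand, so a search for the key `FisherStrand` would find nothing in a file that stores the value under `FS`.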

                    • neha
                      Member
                      • Oct 2011
                      • 32

                      #11
                      Problem in running VariantRecalibrator

                      Hello Everyone,

                      I am trying to run the GATK VariantRecalibrator but am getting an error message.
                      The command I am using to run it is:


                      java -jar GenomeAnalysisTK.jar -R results/test_human.fasta -T VariantRecalibrator -input results/exome_snp.vcf -resource:hapmap, known=false,training=true,truth=true,prior=15.0 results/hapmap_3.3.hg19.vcf -resource:omni, known=false,training=true,truth=false,prior=12.0 results/1000G_omni2.5.hg19.sites.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=8.0 results/00-All.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -recalFile results/exome_variantscore.recal -tranchesFile exomeoutput.tranches -rscriptFile exomeoutput.plots.R

                      The error message is:
                      ERROR MESSAGE: Invalid argument value 'results/hapmap_3.3.hg19.vcf' at position 8.
                      ##### ERROR Invalid argument value 'results/1000G_omni2.5.hg19.sites.vcf' at position 11.

                      I have downloaded both the hapmap and 1000 Genomes VCF files from the GATK resource bundle.

                      Any help would be appreciated.

                      Thanks in advance
                      Neha
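
The positions in the error line up with what a shell does to the space after the comma in `-resource:hapmap, known=false,...`: the tag is cut in two, and the file path is left over as a stray argument. The small Python sketch below uses the standard `shlex` module to mimic the shell's word splitting. The command is shortened to the first resource, and counting positions from zero after the jar name is an assumption about how GATK numbers them.

```python
import shlex

cmd = (
    "java -jar GenomeAnalysisTK.jar -R results/test_human.fasta "
    "-T VariantRecalibrator -input results/exome_snp.vcf "
    "-resource:hapmap, known=false,training=true,truth=true,prior=15.0 "
    "results/hapmap_3.3.hg19.vcf"
)

# Drop "java -jar GenomeAnalysisTK.jar"; the rest is what the tool receives
tool_args = shlex.split(cmd)[3:]
for position, arg in enumerate(tool_args):
    print(position, arg)

# Same command with the space after the comma removed
fixed = cmd.replace("-resource:hapmap, known", "-resource:hapmap,known")
fixed_args = shlex.split(fixed)[3:]
print(len(tool_args), "arguments before the fix,", len(fixed_args), "after")
```

With the space in place, the hapmap path ends up at index 8, matching "at position 8" in the error. With the space removed, the tag and its properties stay together as one argument, followed by the file path as its value.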

                      • raonyguimaraes
                        Member
                        • Jun 2010
                        • 38

                        #12
                        Can you post the first 20 lines of your VCF files?

                        results/1000G_omni2.5.hg19.sites.vcf

                        results/hapmap_3.3.hg19.vcf

                        • neha
                          Member
                          • Oct 2011
                          • 32

                          #13
                          Originally posted by raonyguimaraes
                          Can you post the first 20 lines of your VCF file ?

                          results/1000G_omni2.5.hg19.sites.vcf

                          results/hapmap_3.3.hg19.vcf
                          I am attaching a doc file with the start of the hapmap 3.3 and 1000 Genomes files. Looking at the hapmap 3.3 file, I am guessing there is something wrong with it, or maybe I am just confused and this is how the file is supposed to look.

                          Neha

                          • neha
                            Member
                            • Oct 2011
                            • 32

                            #14
                            Originally posted by neha
                            I am attaching a doc file with the start of the hapmap 3.3 and 1000 Genomes files. Looking at the hapmap 3.3 file, I am guessing there is something wrong with it, or maybe I am just confused and this is how the file is supposed to look.

                            Neha
                            Hey, did you get a chance to look at the files?

                            Any help would be appreciated.

                            Neha

                            • neha
                              Member
                              • Oct 2011
                              • 32

                              #15
                              Hello Everyone,

                              I am facing a small problem when using the VariantRecalibrator walker of GATK.

                              I am using the following command

                              java -jar ./../GenomeAnalysisTK.jar -T VariantRecalibrator -R test_human.fasta -input exome_snp.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.vcf -resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.hg19.sites.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=8.0 00-All.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -recalFile exome_variantscore.recal -tranchesFile exomeoutput.tranches -rscriptFile exomeoutput.plots

                              With this I get the following warning message:

                              Rscript not found in environment path. exomeoutput.plots will be generated but PDF plots will not.

                              Can anyone please tell me how to add Rscript to the path? I am getting a bit confused about it.

                              Thanks in advance.
                              Neha
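
The warning suggests that the `Rscript` executable is not on the PATH of the shell that launches Java. Below is a small Python check using only the standard library. `find_rscript` is a hypothetical helper name, and its optional `extra_dir` argument stands in for wherever R happens to be installed on a given machine.

```python
import os
import shutil

def find_rscript(extra_dir=None):
    """Return the path to Rscript if it can be found, else None."""
    path = os.environ.get("PATH", "")
    if extra_dir:
        # Search the extra directory too, without changing the environment
        path = extra_dir + os.pathsep + path
    return shutil.which("Rscript", path=path)

location = find_rscript()
if location:
    print("Rscript found at", location)
else:
    print("Rscript not on PATH; add the R bin directory to PATH before running GATK")
```

If it is not found, adding the directory that contains `Rscript` to PATH in the same shell session before running the `java` command (for example `export PATH=/path/to/R/bin:$PATH` in bash, where the path is a placeholder) should make it visible.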
