SEQanswers

Old 11-15-2011, 03:05 AM   #1
reeso123
Junior Member
 
Location: UK

Join Date: Nov 2011
Posts: 6
Default GATK variant recalibrator input files

Dear all,

I am new to NGS and have been trying to run through the GATK variant calling pipeline on exome sequencing data. I'm currently having an issue with the variant quality score recalibrator, which gives the following error message:

ERROR MESSAGE: Bad input: Values for HaplotypeScore annotation not detected for ANY training variant in the input callset.

I've tried using VariantAnnotator on my UnifiedGenotyper VCF file, but that does not seem to correct the problem. I am also unsure whether it is my UnifiedGenotyper VCF file or my hapmap, dbSNP and omni1000g resource files that are missing the annotations. Any help/advice on this issue would be much appreciated.

Thanks,
Elliott
Old 11-15-2011, 06:10 AM   #2
reeso123
Junior Member
 
Location: UK

Join Date: Nov 2011
Posts: 6
Default

Also, do people usually obtain the resource files from the Broad resource bundle? If so, I guess these should already be annotated appropriately?
Old 11-15-2011, 06:47 AM   #3
Boel
Member
 
Location: Stockholm, Sweden

Join Date: Oct 2009
Posts: 62
Default

Hey reeso123,

Could you give us the command used in GATK to produce the given error?

It might be that HaplotypeScore is not part of the default annotations, in which case you need to tell UnifiedGenotyper explicitly to add that annotation.
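
One quick way to test this is to count how many records in the callset actually carry the annotation in their INFO column. The sketch below is only an illustration: `count_annotation` is a hypothetical helper name, and it assumes an uncompressed, tab-separated VCF with INFO in column 8.

```shell
# count_annotation KEY FILE
# Prints "<records carrying KEY>/<total records>" for an uncompressed VCF.
# Hypothetical helper, not part of GATK.
count_annotation() {
    awk -F'\t' -v key="$1" '
        /^#/ { next }                          # skip header lines
        {
            total++
            n = split($8, fields, ";")         # INFO entries are separated by ";"
            for (i = 1; i <= n; i++) {
                split(fields[i], kv, "=")      # kv[1] is the key, with or without a value
                if (kv[1] == key) { hits++; break }
            }
        }
        END { printf "%d/%d\n", hits, total }
    ' "$2"
}
```

For example, `count_annotation HaplotypeScore ./test_trio/SNP/chr22_snps.vcf` would show whether any record has the key at all. A result of zero hits would point at the callset rather than the resource files.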
Old 11-15-2011, 07:12 AM   #4
reeso123
Junior Member
 
Location: UK

Join Date: Nov 2011
Posts: 6
Default

Hi Boel,

The commands I used for the UnifiedGenotyper function were

java -jar GenomeAnalysisTK-1.2-64-gf62af02/GenomeAnalysisTK.jar \
-glm BOTH \
-R reference_genome/HGC/Homo_sapiens_GRCh37_53.fasta \
-T UnifiedGenotyper \
-I ./test_trio/reads.10462.recal.bam \
-D DBsnp/b37/dbsnp_132_b37_sanger.vcf \
-o ./test_trio/SNP/chr22_snps.vcf \
-metrics ./test_trio/SNP/chr22metrics.metrics \
-stand_call_conf 50.0 \
-stand_emit_conf 10.0 \
-L ./test_trio/Target_Intervals/chr22_target_interval.bed

and the commands for variant recalibration were

java -jar GenomeAnalysisTK-1.2-64-gf62af02/GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R reference_genome/HGC/Homo_sapiens_GRCh37_53.fasta \
-input ./test_trio/SNP/chr22_snps.vcf \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 ./hapMap/hapmap_3.3.b37.sites_sanger.vcf \
-resource:omni,known=false,training=true,truth=false,prior=12.0 ./omni/1000G_omni2.5.b37.sites_sanger.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=8.0 ./DBsnp/b37/dbsnp_132_b37_sanger.vcf \
-an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ \
-recalFile ./test_trio/SNP/output.recal \
-tranchesFile ./test_trio/SNP/output.tranches \
-rscriptFile ./test_trio/SNP/output.plots.R


As far as I'm aware, the VCF file created by UnifiedGenotyper contains the annotations requested in the VariantRecalibrator command. I've also used GATK VariantAnnotator to try to add them in case they were not present!

I've attached a subset of the VCF file in case it helps to identify the problem.

Your help is much appreciated,
Elliott
Attached Files
File Type: txt chr22_snps.txt (15.2 KB, 33 views)
Old 11-16-2011, 12:11 AM   #5
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
Default

Quote:
Originally Posted by reeso123 View Post
Also, do people usually obtain the resource files from the broad resource bundle, and if so I guess these should be annotated appropriately?
I would recommend doing this. It will relieve a lot of stress. If your own annotation files are the slightest bit incorrect, GATK will likely throw errors.
Old 11-16-2011, 07:22 AM   #6
Boel
Member
 
Location: Stockholm, Sweden

Join Date: Oct 2009
Posts: 62
Default

Hi Elliott,

Quote:
ERROR MESSAGE: Bad input: Values for HaplotypeScore annotation not detected for ANY training variant in the input callset.
I am not sure what is going on, but the error might indicate that none of the known variants (hapmap, 1000g or dbsnp) are present in your VCF file. Could that be the case?
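
A rough way to check this is to count the sites (contig plus position) that the callset shares with each resource file. This is only a sketch: `shared_sites` is a hypothetical helper name, it assumes uncompressed tab-separated VCFs, and it holds the first file's sites in memory, so pass the smaller file first.

```shell
# shared_sites CALLSET RESOURCE
# Prints the number of distinct CHROM:POS sites present in both VCF files.
# Hypothetical helper, not part of GATK.
shared_sites() {
    awk -F'\t' '
        /^#/ { next }                                        # skip header lines
        FILENAME == ARGV[1] { seen[$1 ":" $2] = 1; next }    # first file: remember its sites
        ($1 ":" $2) in seen { n++; delete seen[$1 ":" $2] }  # second file: count each shared site once
        END { print n + 0 }
    ' "$1" "$2"
}
```

If something like `shared_sites chr22_snps.vcf hapmap_3.3.b37.sites_sanger.vcf` prints 0, the recalibrator has no training variants to work with, whatever annotations the callset contains.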
Old 11-16-2011, 07:53 AM   #7
reeso123
Junior Member
 
Location: UK

Join Date: Nov 2011
Posts: 6
Default

Hi all,

Thanks so much for your input. I think I may have corrected the problem: my hapmap, 1000g and dbSNP files were incorrect, in that a SNP that should have been located on chr22 was instead listed on chr2chr2! This was an error on my part, caused by a bug in a Perl script I wrote to make the contig names in the resource files match those in the BAM. It generally seems to be a bit of a nightmare obtaining the appropriate reference, hapmap, 1000g etc. to match the BAM when the data I received has already been processed elsewhere.

Elliott
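
A contig-name mix-up like the one described above can be caught early by comparing the names used in a VCF against the reference index. The following is a minimal sketch, assuming the reference has a `.fai` index (as produced by `samtools faidx`) whose first column is the contig name. `unknown_contigs` is a hypothetical helper name.

```shell
# unknown_contigs VCF FAI
# Prints each contig name used in the VCF records that is absent from the FASTA index.
# Hypothetical helper, not part of GATK.
unknown_contigs() {
    awk -F'\t' '
        FILENAME == ARGV[1] { known[$1] = 1; next }   # .fai: column 1 is the contig name
        /^#/ { next }                                 # skip VCF header lines
        !($1 in known) && !reported[$1]++ { print $1 }  # report each unknown name once
    ' "$2" "$1"
}
```

Running it on each resource file after any renaming step should print nothing. A name such as `chr2chr2` in the output would expose the renaming bug immediately.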
Old 11-16-2011, 07:58 AM   #8
Boel
Member
 
Location: Stockholm, Sweden

Join Date: Oct 2009
Posts: 62
Default

Glad you solved it!
And to answer an earlier question: I also use much of the data from the Broad resource bundle.
Old 11-25-2011, 02:53 AM   #9
Robby
Member
 
Location: Germany

Join Date: Mar 2011
Posts: 68
Default

Hi all,

I am trying to use GATK as well, but I get the following error message when I start the VariantRecalibrator: "Argument with name '--cluster_file' (-clusterFile) is missing."

My command is similar to the previously mentioned ones:
java -Xmx4g -jar GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R hg19.fasta \
-mode SNP \
--maxGaussians 6 \
-B:input,VCF snps.raw.vcf \
-B:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap.vcf \
-B:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.vcf \
-B:dbsnp,known=true,training=false,truth=false,prior=8.0 dbsnp.vcf \
-an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ \
-recalFile out.recal \
-tranchesFile out.tranches \
-rscriptFile out.plots.R


Does anyone see the mistake? Does anyone else need to use the clusterFile argument? What exactly is it? I would be really grateful for any help or recommendations.
Old 11-28-2011, 07:26 AM   #10
Carlos Borroto
Member
 
Location: Baltimore, MD

Join Date: Mar 2011
Posts: 19
Default

Hi all,

I'm also getting a similar error, in my case:
Code:
MESSAGE: Bad input: Values for FisherStrand annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.
I'm using the resource files from the Broad GATK bundle. The VCF file to be recalibrated does have these annotations, which I added with the VariantAnnotator tool. Do I have to add them to the bundle files as well? I can see they don't have them.

hapmap_3.3.hg19.sites.vcf:
Code:
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
chr1	566875	rs2185539	C	T	.	PASS	AC=66;AF=0.02369;AN=2786;set=MKK-YRI
chr1	567753	rs11510103	A	G	.	PASS	AC=11;AF=0.00404;AN=2724;set=TSI-GIH-CHD-CEU-JPT
chr1	728951	rs11240767	C	T	.	PASS	AC=139;AF=0.05044;AN=2756;set=MKK-YRI-LWK-MEX-ASW
chr1	752721	rs3131972	A	G	.	PASS	AC=1660;AF=0.59456;AN=2792;set=Intersection
1000G_omni2.5.hg19.sites.vcf:
Code:
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
chr1	534247	SNP1-524110	C	T	.	PASS	CR=99.93414;GentrainScore=0.7423;HW=1.0
chr1	565286	SNP1-555149	C	T	.	PASS	CR=98.8266;GentrainScore=0.7029;HW=1.0
chr1	569624	SNP1-559487	T	C	.	PASS	CR=97.8022;GentrainScore=0.8070;HW=1.0
chr1	689186	rs4000335	G	A	.	NOT_POLY_IN_1000G	CR=99.86885;GentrainScore=0.7934;HW=1.0
dbsnp_132.hg19.vcf:
Code:
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	
chrM	64	rs3883917	C	T	.	PASS	ASP;RSPOS=64;SAO=0;SCS=0;SLO;SSR=0;VC=SNP;VP=050100000005000000000100;WGT=1;dbSNPBuildID=108
chrM	146	rs72619361	T	C	.	PASS	ASP;G5;G5A;GNO;RSPOS=146;SAO=0;SCS=0;SSR=0;VC=SNP;VP=050000000005030100000100;WGT=1;dbSNPBuildID=130
chrM	152	rs117135796	T	C	.	PASS	ASP;GNO;RSPOS=152;SAO=0;SCS=0;SSR=0;VC=SNP;VP=050000000005000100000100;WGT=1;dbSNPBuildID=132
Thanks,
Carlos
Old 03-19-2012, 06:07 AM   #11
neha
Member
 
Location: kolkata

Join Date: Oct 2011
Posts: 32
Default Problem in running VariantRecalibrator

Hello Everyone,

I am trying to run the GATK VariantRecalibrator but I am getting an error message.
The command I am using to run it is:


java -jar GenomeAnalysisTK.jar -R results/test_human.fasta -T VariantRecalibrator -input results/exome_snp.vcf -resource:hapmap, known=false,training=true,truth=true,prior=15.0 results/hapmap_3.3.hg19.vcf -resource:omni, known=false,training=true,truth=false,prior=12.0 results/1000G_omni2.5.hg19.sites.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=8.0 results/00-All.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -recalFile results/exome_variantscore.recal -tranchesFile exomeoutput.tranches -rscriptFile exomeoutput.plots.R

The error message is: ERROR MESSAGE: Invalid argument value 'results/hapmap_3.3.hg19.vcf' at position 8.
##### ERROR Invalid argument value 'results/1000G_omni2.5.hg19.sites.vcf' at position 11.

I have downloaded both the hapmap and 1000 Genomes VCF files from the GATK resource bundle.

Any help would be appreciated.

Thanks in advance
Neha
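
The positions reported in the error above can be reproduced by looking at how the shell splits the command into arguments. The space after `-resource:hapmap,` makes the shell pass the tag list as a separate word, so the VCF path is no longer attached to the `-resource` option. The sketch below only illustrates shell word splitting. `show_args` is a hypothetical helper, and the assumption that GATK numbers arguments from zero after the jar name is an inference from the reported positions, not something confirmed in this thread.

```shell
# show_args ARG...
# Prints every argument with its zero-based position, exactly as the shell split it.
show_args() {
    i=0
    for arg in "$@"; do
        printf '%d: %s\n' "$i" "$arg"
        i=$((i + 1))
    done
}

# Same leading arguments as the failing command, including the space after the comma.
show_args -R results/test_human.fasta -T VariantRecalibrator -input results/exome_snp.vcf \
    -resource:hapmap, known=false,training=true,truth=true,prior=15.0 results/hapmap_3.3.hg19.vcf
```

In this listing `-resource:hapmap,` is argument 6, the tag list is argument 7, and `results/hapmap_3.3.hg19.vcf` is argument 8, which matches the position in the first error line. Removing the space after each comma, as in the dbsnp argument of the same command, keeps each resource specification in one word.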
Old 03-19-2012, 06:50 AM   #12
raonyguimaraes
Member
 
Location: Belo Horizonte - Brazil

Join Date: Jun 2010
Posts: 38
Default

Can you post the first 20 lines of your VCF files?

results/1000G_omni2.5.hg19.sites.vcf

results/hapmap_3.3.hg19.vcf
Old 03-19-2012, 09:57 PM   #13
neha
Member
 
Location: kolkata

Join Date: Oct 2011
Posts: 32
Default

Quote:
Originally Posted by raonyguimaraes View Post
Can you post the first 20 lines of your VCF files?

results/1000G_omni2.5.hg19.sites.vcf

results/hapmap_3.3.hg19.vcf
I am attaching text files with the first lines of the hapmap 3.3 and 1000 Genomes files. Looking at the hapmap 3.3 file, I am guessing there is something wrong with it, or maybe I am confused and the file is supposed to look like this.

Neha
Attached Files
File Type: txt 1000_genome.txt (2.0 KB, 22 views)
File Type: txt Hapmap_3.txt (2.3 KB, 8 views)
Old 03-21-2012, 09:17 PM   #14
neha
Member
 
Location: kolkata

Join Date: Oct 2011
Posts: 32
Default

Quote:
Originally Posted by neha View Post
I am attaching text files with the first lines of the hapmap 3.3 and 1000 Genomes files. Looking at the hapmap 3.3 file, I am guessing there is something wrong with it, or maybe I am confused and the file is supposed to look like this.

Neha
Hey, did you get a chance to look at the files?

Any help would be appreciated.

Neha
Old 05-30-2012, 02:56 AM   #15
neha
Member
 
Location: kolkata

Join Date: Oct 2011
Posts: 32
Default

Hello Everyone,

I am facing a small problem when using the VariantRecalibrator walker of GATK.

I am using the following command

java -jar ./../GenomeAnalysisTK.jar -T VariantRecalibrator -R test_human.fasta -input exome_snp.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.vcf -resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.hg19.sites.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=8.0 00-All.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -recalFile exome_variantscore.recal -tranchesFile exomeoutput.tranches -rscriptFile exomeoutput.plots

With this I get the following warning message:

Rscript not found in environment path. exomeoutput.plots will be generated but PDF plots will not.

Can anyone please tell me how to add Rscript to the path? I am getting a bit confused about it.

Thanks in advance.
Neha
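
The warning above says that the `Rscript` executable cannot be found on the `PATH` of the shell that launches GATK. A minimal sketch of how to check this follows. `have_cmd` is a hypothetical helper name, and the directory in the commented `export` line is a made-up example location, not a known install path.

```shell
# have_cmd NAME
# Succeeds if NAME resolves to an executable on the current PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd Rscript; then
    echo "Rscript found: $(command -v Rscript)"
else
    echo "Rscript is not on PATH"
    # Hypothetical location: replace with the directory that really contains Rscript.
    # export PATH="/path/to/R/bin:$PATH"
fi
```

The `export` has to take effect in the same shell session that runs the `java -jar` command, for example by adding it to the shell start-up file, otherwise GATK still will not see `Rscript`.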
Tags
annotation, gatk, variant recalibrator

All times are GMT -8.