#1
Junior Member
Location: UK
Join Date: Nov 2011
Posts: 6
Dear all,
I am new to NGS and have been trying to run the GATK variant calling pipeline on exome sequencing data. I'm currently having an issue with the VariantRecalibrator, which gives the following error message:
ERROR MESSAGE: Bad input: Values for HaplotypeScore annotation not detected for ANY training variant in the input callset.
I've tried running VariantAnnotator on my UnifiedGenotyper VCF file, but that does not seem to correct the problem. I am also unsure whether it is my UnifiedGenotyper VCF file, or my hapmap, dbSNP and omni1000g resource files, that are missing the annotations.
Any help/advice on this issue would be much appreciated.
Thanks,
Elliott

#2
Junior Member
Location: UK
Join Date: Nov 2011
Posts: 6
Also, do people usually obtain the resource files from the Broad resource bundle? If so, I guess these should already be annotated appropriately.

#3
Member
Location: Stockholm, Sweden
Join Date: Oct 2009
Posts: 62
Hey reeso123,
Could you give us the commands you used in GATK to produce the error? It might be that HaplotypeScore is not part of the default annotations, in which case you need to tell UnifiedGenotyper to add that annotation.
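One quick way to check whether the callset actually carries a given annotation is to count the INFO keys directly. The following is a minimal Python sketch, not part of GATK: `count_info_keys` is a hypothetical helper name, and it assumes an uncompressed, tab-delimited VCF.

```python
from collections import Counter


def count_info_keys(vcf_lines):
    """Count how many variant records carry each INFO key.

    Hypothetical helper for illustration; assumes tab-delimited VCF lines.
    """
    counts = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record (INFO is column 8)
        for entry in fields[7].split(";"):
            key = entry.split("=", 1)[0]  # flags such as "DB" have no "="
            if key and key != ".":
                counts[key] += 1
    return counts


# Usage (the file name is only an example):
# with open("chr22_snps.vcf") as handle:
#     print(count_info_keys(handle)["HaplotypeScore"])
```

If the count for `HaplotypeScore` is zero, the annotation really is missing from the callset. If it is non-zero, the problem more likely lies in how the callset and the training resources line up.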

#4
Junior Member
Location: UK
Join Date: Nov 2011
Posts: 6
Hi Boel,
The command I used for UnifiedGenotyper was:
java -jar GenomeAnalysisTK-1.2-64-gf62af02/GenomeAnalysisTK.jar \
-glm BOTH \
-R reference_genome/HGC/Homo_sapiens_GRCh37_53.fasta \
-T UnifiedGenotyper \
-I ./test_trio/reads.10462.recal.bam \
-D DBsnp/b37/dbsnp_132_b37_sanger.vcf \
-o ./test_trio/SNP/chr22_snps.vcf \
-metrics ./test_trio/SNP/chr22metrics.metrics \
-stand_call_conf 50.0 \
-stand_emit_conf 10.0 \
-L ./test_trio/Target_Intervals/chr22_target_interval.bed
and the command for variant recalibration was:
java -jar GenomeAnalysisTK-1.2-64-gf62af02/GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R reference_genome/HGC/Homo_sapiens_GRCh37_53.fasta \
-input ./test_trio/SNP/chr22_snps.vcf \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 ./hapMap/hapmap_3.3.b37.sites_sanger.vcf \
-resource [...] \
-resource:dbsnp,known=true,training=false,truth=false,prior=8.0 ./DBsnp/b37/dbsnp_132_b37_sanger.vcf \
-an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ \
-recalFile ./test_trio/SNP/output.recal \
-tranchesFile ./test_trio/SNP/output.tranches \
-rscriptFile ./test_trio/SNP/output.plots.R
As far as I'm aware, the VCF file created by UnifiedGenotyper contains the annotations requested in the VariantRecalibrator command. I've also used GATK VariantAnnotator to try to add them in case they were not present! I've attached a subset of the VCF file in case it helps to identify the problem.
Your help is much appreciated,
Elliott

#5
Senior Member
Location: Rochester, MN
Join Date: Mar 2009
Posts: 191
I would recommend doing this. It will relieve a lot of stress. If your own annotation files are the slightest bit incorrect, GATK will likely throw errors.

#6
Member
Location: Stockholm, Sweden
Join Date: Oct 2009
Posts: 62
Hi Elliott,
Quote:

#7
Junior Member
Location: UK
Join Date: Nov 2011
Posts: 6
Hi all,
Thanks so much for your input. I think I may have corrected the problem: my hapmap, 1000g and dbSNP files were incorrect, in that a SNP that should have been located on chr22 was instead listed as chr2chr2! This was my own error, caused by a bug in a Perl script I wrote to match the SNP contig names in the resource files to the contig names in the BAM. It generally seems to be a bit of a nightmare obtaining the appropriate reference, hapmap, 1000g etc. to match the BAM when the data I received has already been processed elsewhere.
Elliott
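For anyone hitting the same thing: a string like chr2chr2 is what a blanket substitution produces when it matches every "2" in "22". The actual Perl bug was not posted, so the following is only a hedged Python sketch of a safer renaming step; `add_chr_prefix` and `rename_vcf_line` are hypothetical names, and the sketch assumes tab-delimited VCF records where only a leading "chr" needs to be added.

```python
def add_chr_prefix(contig):
    """Return the contig name with exactly one leading 'chr'."""
    if contig.startswith("chr"):
        return contig  # already prefixed, leave untouched
    return "chr" + contig


def rename_vcf_line(line):
    """Rename only the CHROM column of a tab-delimited VCF record."""
    if line.startswith("#"):
        return line  # header lines are passed through unchanged
    chrom, sep, rest = line.partition("\t")
    return add_chr_prefix(chrom) + sep + rest


# A blanket replace shows how the doubled name can arise:
naive = "22".replace("2", "chr2")  # every "2" is replaced, giving "chr2chr2"
fixed = add_chr_prefix("22")       # gives "chr22"
```

Note that this sketch leaves `##contig` header lines alone and does not check that the renamed contigs match the reference sequence, so the result still needs to be validated against the reference dictionary.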

#8
Member
Location: Stockholm, Sweden
Join Date: Oct 2009
Posts: 62
Glad you solved it!
And to answer an earlier question: I also use much of the data from the Broad resource bundle.

#9
Member
Location: Germany
Join Date: Mar 2011
Posts: 68
Hi all,
I am trying to use GATK as well, but I receive the following error message when I start the VariantRecalibrator: "Argument with name '--cluster_file' (-clusterFile) is missing." My command is similar to the ones mentioned previously:
java -Xmx4g -jar GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R hg19.fasta \
-mode SNP \
--maxGaussians 6 \
-B:input,VCF snps.raw.vcf \
-B:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap.vcf \
-B [...] \
-B:dbsnp,known=true,training=false,truth=false,prior=8.0 dbsnp.vcf \
-an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ \
-recalFile out.recal \
-tranchesFile out.tranches \
-rscriptFile out.plots.R
Does anyone see the mistake? Does anyone else need to use the clusterFile argument? What is it exactly? I would be really grateful for any help or recommendations.

#10
Member
Location: Baltimore, MD
Join Date: Mar 2011
Posts: 19
Hi all,
I'm also getting a similar error, in my case:
Code:
MESSAGE: Bad input: Values for FisherStrand annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.
hapmap_3.3.hg19.sites.vcf:
Code:
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 566875 rs2185539 C T . PASS AC=66;AF=0.02369;AN=2786;set=MKK-YRI
chr1 567753 rs11510103 A G . PASS AC=11;AF=0.00404;AN=2724;set=TSI-GIH-CHD-CEU-JPT
chr1 728951 rs11240767 C T . PASS AC=139;AF=0.05044;AN=2756;set=MKK-YRI-LWK-MEX-ASW
chr1 752721 rs3131972 A G . PASS AC=1660;AF=0.59456;AN=2792;set=Intersection
Code:
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 534247 SNP1-524110 C T . PASS CR=99.93414;GentrainScore=0.7423;HW=1.0
chr1 565286 SNP1-555149 C T . PASS CR=98.8266;GentrainScore=0.7029;HW=1.0
chr1 569624 SNP1-559487 T C . PASS CR=97.8022;GentrainScore=0.8070;HW=1.0
chr1 689186 rs4000335 G A . NOT_POLY_IN_1000G CR=99.86885;GentrainScore=0.7934;HW=1.0
Code:
#CHROM POS ID REF ALT QUAL FILTER INFO
chrM 64 rs3883917 C T . PASS ASP;RSPOS=64;SAO=0;SCS=0;SLO;SSR=0;VC=SNP;VP=050100000005000000000100;WGT=1;dbSNPBuildID=108
chrM 146 rs72619361 T C . PASS ASP;G5;G5A;GNO;RSPOS=146;SAO=0;SCS=0;SSR=0;VC=SNP;VP=050000000005030100000100;WGT=1;dbSNPBuildID=130
chrM 152 rs117135796 T C . PASS ASP;GNO;RSPOS=152;SAO=0;SCS=0;SSR=0;VC=SNP;VP=050000000005000100000100;WGT=1;dbSNPBuildID=132
Carlos
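Since the error talks about training variants, which as far as I understand are the callset sites that also appear in a training resource, it may be worth checking that the callset and a resource share any sites at all (for example, that both use the same contig naming). A rough Python sketch under that assumption; `site_keys` and `shared_sites` are hypothetical helper names, and sites are compared on CHROM and POS only.

```python
def site_keys(vcf_lines):
    """Collect (CHROM, POS) pairs from VCF records, skipping headers."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.split()  # CHROM and POS never contain whitespace
        if len(fields) >= 2:
            keys.add((fields[0], fields[1]))
    return keys


def shared_sites(callset_lines, resource_lines):
    """Return the sites present in both the callset and the resource."""
    return site_keys(callset_lines) & site_keys(resource_lines)


# Usage (file names are only examples):
# with open("snps.raw.vcf") as calls, open("hapmap_3.3.hg19.sites.vcf") as res:
#     print(len(shared_sites(calls, res)))
```

An empty result would point towards a naming or coordinate mismatch. A non-empty result suggests looking at whether the overlapping callset records carry the annotation named in the error.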

#11
Member
Location: kolkata
Join Date: Oct 2011
Posts: 32
Hello Everyone,
I am trying to run the GATK VariantRecalibrator but am getting an error message. The command I am using is:
java -jar GenomeAnalysisTK.jar -R results/test_human.fasta -T VariantRecalibrator -input results/exome_snp.vcf -resource:hapmap, known=false,training=true,truth=true,prior=15.0 results/hapmap_3.3.hg19.vcf -resource [...]
The error message is:
ERROR MESSAGE: Invalid argument value 'results/hapmap_3.3.hg19.vcf' at position 8.
##### ERROR Invalid argument value 'results/1000G_omni2.5.hg19.sites.vcf' at position 11.
I have downloaded both the hapmap and 1000 Genomes VCF files from the GATK resource bundle. Any help would be appreciated.
Thanks in advance,
Neha
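One thing worth ruling out, as a guess from the command as pasted rather than a confirmed diagnosis, is the space after `-resource:hapmap,`. A shell splits arguments on whitespace, so the tag list and the rest of the argument would reach the program as separate tokens. The Python sketch below uses the standard `shlex` module to show the split; `find_split_tags` is a hypothetical helper name.

```python
import shlex


def find_split_tags(command):
    """Return (index, token, next_token) for every token ending in a comma.

    A trailing comma suggests that a comma-separated tag list was broken
    in two by a stray space.
    """
    tokens = shlex.split(command)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.endswith(","):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            hits.append((i, tok, nxt))
    return hits


# The start of the command from the post, copied with its spacing intact.
command = (
    "java -jar GenomeAnalysisTK.jar -R results/test_human.fasta "
    "-T VariantRecalibrator -input results/exome_snp.vcf "
    "-resource:hapmap, known=false,training=true,truth=true,prior=15.0 "
    "results/hapmap_3.3.hg19.vcf"
)
for index, token, following in find_split_tags(command):
    print(index, repr(token), "is followed by", repr(following))
```

If that is the cause, removing the space so that the tags read `-resource:hapmap,known=false,...` as in the earlier posts should keep the argument in one piece.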

#12
Member
Location: Belo Horizonte - Brazil
Join Date: Jun 2010
Posts: 38
Can you post the first 20 lines of your VCF files?
results/1000G_omni2.5.hg19.sites.vcf
results/hapmap_3.3.hg19.vcf

#13
Member
Location: kolkata
Join Date: Oct 2011
Posts: 32
Quote:
Neha |

#14
Member
Location: kolkata
Join Date: Oct 2011
Posts: 32
Quote:
Any help would be appreciated. Neha |

#15
Member
Location: kolkata
Join Date: Oct 2011
Posts: 32
Hello Everyone,
I am facing a small problem when using the VariantRecalibrator walker of GATK. I am using the following command:
java -jar ./../GenomeAnalysisTK.jar -T VariantRecalibrator -R test_human.fasta -input exome_snp.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.vcf -resource [...]
With this I get a warning that Rscript was not found in the environment path: exomeoutput.plots will be generated, but the PDF plots will not. Can anyone please guide me on how to include the Rscript path? I am getting a bit confused about it.
Thanks in advance.
Neha
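The warning suggests that the `Rscript` executable (installed with R) is not in any directory listed in the PATH environment variable of the shell that launches GATK. Before changing anything, it can help to confirm what is and is not visible. This is a small Python sketch using the standard `shutil.which`; `find_rscript` is a hypothetical helper name and `/opt/R/bin` is a made-up example directory.

```python
import os
import shutil


def find_rscript(extra_dirs=()):
    """Look for Rscript on PATH, optionally also in extra directories.

    Returns the full path to the executable, or None if it is not found.
    """
    parts = [os.environ.get("PATH", "")] + list(extra_dirs)
    search_path = os.pathsep.join(p for p in parts if p)
    return shutil.which("Rscript", path=search_path)


# Usage ("/opt/R/bin" is a hypothetical install location):
# print(find_rscript())                # None means Rscript is not on PATH
# print(find_rscript(["/opt/R/bin"]))  # check a candidate directory
```

Once the directory that contains `Rscript` is known, adding it to PATH in the same shell session before starting the java command should let the tool find it.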

Tags: annotation, gatk, variant recalibrator