![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Problem with GATK ReadBackedPhasing | mschizas | Bioinformatics | 1 | 05-26-2012 06:41 AM |
Problem GATK with CountCovariates | amathieu | Bioinformatics | 7 | 02-27-2012 01:28 AM |
problem about GATK indel VQSR | wanguan2000 | Bioinformatics | 2 | 11-07-2011 07:15 AM |
GATK - Problem | amathieu | Bioinformatics | 2 | 09-26-2011 05:23 AM |
GATK VariantFiltration undefined variable AB | wanguan2000 | Bioinformatics | 0 | 08-03-2011 06:52 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: Planet Earth Join Date: Jan 2011
Posts: 26
|
![]()
I used GATK UnifiedGenotyper (latest build) to call variants.
Now I have a vcf-file which I want to filter. Although the corresponding fields are correctly defined in the header: ##fileformat=VCFv4.1 ##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities"> ##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias"> I always get an "undefined variable" error when I filter for MQRankSum or ReadPosRankSum . Any suggestions are very welcome! |
![]() |
![]() |
![]() |
#2 |
Member
Location: Planet Earth Join Date: Jan 2011
Posts: 26
|
![]()
Okay, I've come a bit further. The error is due to some lines in the vcf where the annotation for MQRankSum or ReadPosRankSum is missing. When I delete those, it is running fine.
Anyone an idea why UnifiedGenotyper makes these 'mistakes' ? Is there a way of correcting this ? Letting the VariantFilter run although displaying this error gives me a filtered file. Does it mean that these specific annotations are skipped for all positions or only for those which are missing in it ? |
![]() |
![]() |
![]() |
#3 |
Senior Member
Location: UK Join Date: Jan 2010
Posts: 390
|
![]()
ReadPosRankSum isn't generated for every variant (at least, I see the same behaviour as you). About 43% of my variants get annotated with ReadPosRankSum and MQRankSum (indeed the same variants are annotated with both, they do not appear independently of each other in my data).
Consequently what I get when filtering variants is : WARN 23:03:05,494 Interpreter - org.broadinstitute.sting.utils.variantcontext.VariantContextUtils.initializeMatchExps@310![0,14]: 'ReadPosRankSum < -8.0;' undefined variable ReadPosRankSum This seems to be expected behaviour, if you're using 'complex' JEXL expressions as discussed in http://www.broadinstitute.org/gsa/wi...XL_expressions I was a bit concerned about this as well, but it's because we have split filtering into separate expressions rather than using the old "one filter || another filter || a third filter". From that page "It is very important to note that JEXL cannot correctly handle missing annotations and throws an exception when it encounters this scenario. The default behavior of the GATK is to handle this by having the entire expression evaluate to FALSE in such cases" |
![]() |
![]() |
![]() |
#4 |
Member
Location: Planet Earth Join Date: Jan 2011
Posts: 26
|
![]()
Just to be sure I correctly understand the recommendations....
If I use the "--missingValuesInExpressionsShouldEvaluateAsFailing" option should solve the problem ?? |
![]() |
![]() |
![]() |
#5 |
Senior Member
Location: UK Join Date: Jan 2010
Posts: 390
|
![]()
Yes, that will almost certainly exclude the variants missing ReadPosRankSum and MQRankSum entries, but I think you need to be clear if this is desirable or not. What is being spat as warnings are just that - warnings, not errors.
If I look at a VCF file from our pipeline, I can pick out any number of SNPs that have failed the ReadPosRankSum or MQRankSum JEXL expressions, but *passed* one of the other expressions, that are sufficiently well covered and sufficiently different to the reference for me to be happy to include them in downstream analysis. This is all IMHO, but I'd rather report variants and deal with working out if they're pathogenic later rather than throwing away 40% of my dataset at this stage. It's your call though. I do agree that the wording of this sentence: "When evaluating the JEXL expressions, missing values should be considered failing the expression. By default, if JEXL cannot evaluate your expression for a particular record because one of the annotations is not present, the whole expression evaluates as PASSing. Use this argument to have it evaluate as failing filters instead for these cases." Makes it sound like missing values should be failed, not passed based on other expressions. I think I'm going to have a closer look at this today and see how it works in our context. Last edited by Bukowski; 01-12-2012 at 02:48 AM. |
![]() |
![]() |
![]() |
#6 |
Member
Location: Planet Earth Join Date: Jan 2011
Posts: 26
|
![]()
Hm - good point !
So to the get the filtering done correctly like suggested here and here , one would have to set one filter for every expression ? Like: --filterExpression "QD <2.0" --filterName "QDfilter" --filterExpression "MQ <40.0" --filterName "MQfilter" rather than combining them with || ? because including "ReadPosRankSum" would omit the whole entry although the other criteria are fine ?? This would mean that different combinations of filter criteria like for SNPs and INDELs would have to run separately ? |
![]() |
![]() |
![]() |
#7 | |
Senior Member
Location: UK Join Date: Jan 2010
Posts: 390
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#8 |
Member
Location: Planet Earth Join Date: Jan 2011
Posts: 26
|
![]() |
![]() |
![]() |
![]() |
#9 | |
Junior Member
Location: Germany Join Date: Feb 2012
Posts: 1
|
![]()
I also have the problem that for some variants the ReadPosRankSum values are missing.
An explanation why this occurs only in some lines, can be found here: http://www.broadinstitute.org/gsa/ga...nkSumTest.html Quote:
If the JEXL expressions support lazy evaluation, something like this could be helpful: Code:
--filterExpression "isHet == 1 && ReadPosRankSum < -20.0" |
|
![]() |
![]() |
![]() |
#10 |
Member
Location: Hungary Join Date: Aug 2009
Posts: 16
|
![]()
You can select calls MQRankSum and ReadPosRankSum by
java GenomeAnalysisTK.jar -l INFO -R ref.fasta -T SelectVariants --variant input.vcf -o with_mq_rp.vcf -select 'vc.hasAttribute("MQRankSum") && vc.hasAttribute("ReadPosRankSum")' and its complementer by -select '!vc.hasAttribute("MQRankSum") || !vc.hasAttribute("ReadPosRankSum")' to filter out calls with the needed attributes. Still suboptimal, but can help. |
![]() |
![]() |
![]() |
Thread Tools | |
|
|