SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Problem with GATK ReadBackedPhasing mschizas Bioinformatics 1 05-26-2012 06:41 AM
Problem GATK with CountCovariates amathieu Bioinformatics 7 02-27-2012 01:28 AM
problem about GATK indel VQSR wanguan2000 Bioinformatics 2 11-07-2011 07:15 AM
GATK - Problem amathieu Bioinformatics 2 09-26-2011 05:23 AM
GATK VariantFiltration undefined variable AB wanguan2000 Bioinformatics 0 08-03-2011 06:52 PM

Reply
 
Thread Tools
Old 01-02-2012, 06:27 AM   #1
doc.ramses
Member
 
Location: Planet Earth

Join Date: Jan 2011
Posts: 26
Default GATK undefined variable problem

I used GATK UnifiedGenotyper (latest build) to call variants.
Now I have a vcf-file which I want to filter. Although the corresponding fields are correctly defined in the header:

##fileformat=VCFv4.1
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">

I always get an "undefined variable" error when I filter for MQRankSum or ReadPosRankSum .

Any suggestions are very welcome!
doc.ramses is offline   Reply With Quote
Old 01-04-2012, 01:22 AM   #2
doc.ramses
Member
 
Location: Planet Earth

Join Date: Jan 2011
Posts: 26
Default

Okay, I've come a bit further. The error is due to some lines in the vcf where the annotation for MQRankSum or ReadPosRankSum is missing. When I delete those, it is running fine.
Anyone an idea why UnifiedGenotyper makes these 'mistakes' ?
Is there a way of correcting this ?
Letting the VariantFilter run although displaying this error gives me a filtered file. Does it mean that these specific annotations are skipped for all positions or only for those which are missing in it ?
doc.ramses is offline   Reply With Quote
Old 01-05-2012, 01:39 AM   #3
Bukowski
Senior Member
 
Location: UK

Join Date: Jan 2010
Posts: 390
Default

ReadPosRankSum isn't generated for every variant (at least, I see the same behaviour as you). About 43% of my variants get annotated with ReadPosRankSum and MQRankSum (indeed the same variants are annotated with both, they do not appear independently of each other in my data).

Consequently what I get when filtering variants is :

WARN 23:03:05,494 Interpreter - org.broadinstitute.sting.utils.variantcontext.VariantContextUtils.initializeMatchExps@310![0,14]: 'ReadPosRankSum < -8.0;' undefined variable ReadPosRankSum

This seems to be expected behaviour, if you're using 'complex' JEXL expressions as discussed in http://www.broadinstitute.org/gsa/wi...XL_expressions

I was a bit concerned about this as well, but it's because we have split filtering into separate expressions rather than using the old "one filter || another filter || a third filter".

From that page

"It is very important to note that JEXL cannot correctly handle missing annotations and throws an exception when it encounters this scenario. The default behavior of the GATK is to handle this by having the entire expression evaluate to FALSE in such cases"
Bukowski is offline   Reply With Quote
Old 01-12-2012, 02:04 AM   #4
doc.ramses
Member
 
Location: Planet Earth

Join Date: Jan 2011
Posts: 26
Default

Just to be sure I correctly understand the recommendations....

If I use the "--missingValuesInExpressionsShouldEvaluateAsFailing" option should solve the problem ??
doc.ramses is offline   Reply With Quote
Old 01-12-2012, 02:31 AM   #5
Bukowski
Senior Member
 
Location: UK

Join Date: Jan 2010
Posts: 390
Default

Yes, that will almost certainly exclude the variants missing ReadPosRankSum and MQRankSum entries, but I think you need to be clear if this is desirable or not. What is being spat as warnings are just that - warnings, not errors.

If I look at a VCF file from our pipeline, I can pick out any number of SNPs that have failed the ReadPosRankSum or MQRankSum JEXL expressions, but *passed* one of the other expressions, that are sufficiently well covered and sufficiently different to the reference for me to be happy to include them in downstream analysis.

This is all IMHO, but I'd rather report variants and deal with working out if they're pathogenic later rather than throwing away 40% of my dataset at this stage. It's your call though.

I do agree that the wording of this sentence:

"When evaluating the JEXL expressions, missing values should be considered failing the expression. By default, if JEXL cannot evaluate your expression for a particular record because one of the annotations is not present, the whole expression evaluates as PASSing. Use this argument to have it evaluate as failing filters instead for these cases."

Makes it sound like missing values should be failed, not passed based on other expressions. I think I'm going to have a closer look at this today and see how it works in our context.

Last edited by Bukowski; 01-12-2012 at 02:48 AM.
Bukowski is offline   Reply With Quote
Old 01-12-2012, 02:53 AM   #6
doc.ramses
Member
 
Location: Planet Earth

Join Date: Jan 2011
Posts: 26
Default

Hm - good point !
So to the get the filtering done correctly like suggested here and here , one would have to set one filter for every expression ? Like:

--filterExpression "QD <2.0"
--filterName "QDfilter"
--filterExpression "MQ <40.0"
--filterName "MQfilter"

rather than combining them with || ?
because including "ReadPosRankSum" would omit the whole entry although the other criteria are fine ??

This would mean that different combinations of filter criteria like for SNPs and INDELs would have to run separately ?
doc.ramses is offline   Reply With Quote
Old 01-12-2012, 03:52 AM   #7
Bukowski
Senior Member
 
Location: UK

Join Date: Jan 2010
Posts: 390
Default

Quote:
Originally Posted by doc.ramses View Post
Hm - good point !
So to the get the filtering done correctly like suggested here and here , one would have to set one filter for every expression ? Like:

--filterExpression "QD <2.0"
--filterName "QDfilter"
--filterExpression "MQ <40.0"
--filterName "MQfilter"

rather than combining them with || ?
because including "ReadPosRankSum" would omit the whole entry although the other criteria are fine ??

This would mean that different combinations of filter criteria like for SNPs and INDELs would have to run separately ?
That's effectively what we do. And yes we filter SNPs and indels with different criteria, we believe we are matching the GATKv3 recommendations in this way (for non-VQSR filtering anyway).
Bukowski is offline   Reply With Quote
Old 01-12-2012, 04:44 AM   #8
doc.ramses
Member
 
Location: Planet Earth

Join Date: Jan 2011
Posts: 26
Default

Quote:
Originally Posted by Bukowski View Post
That's effectively what we do. And yes we filter SNPs and indels with different criteria, we believe we are matching the GATKv3 recommendations in this way (for non-VQSR filtering anyway).
Thanks for your contribution I will give that a try!
doc.ramses is offline   Reply With Quote
Old 05-03-2012, 01:55 AM   #9
phuseman
Junior Member
 
Location: Germany

Join Date: Feb 2012
Posts: 1
Default

I also have the problem that for some variants the ReadPosRankSum values are missing.

An explanation why this occurs only in some lines, can be found here:
http://www.broadinstitute.org/gsa/ga...nkSumTest.html
Quote:
Note that the read position rank sum test can not be calculated for homozygous sites.
The ReadPosRankSum (as well as the MQRankSum) value is only available for heterozygous sites.

If the JEXL expressions support lazy evaluation, something like this could be helpful:
Code:
--filterExpression "isHet == 1 && ReadPosRankSum < -20.0"
If the variant is not heterozygous, the expression is not evaluated further since it can not be true. Unfortunately, the above does not work and I don't know a way to check if the variant is heterozygous.
phuseman is offline   Reply With Quote
Old 07-30-2012, 04:29 AM   #10
szilva
Member
 
Location: Hungary

Join Date: Aug 2009
Posts: 16
Default

You can select calls MQRankSum and ReadPosRankSum by

java GenomeAnalysisTK.jar -l INFO -R ref.fasta -T SelectVariants --variant input.vcf -o with_mq_rp.vcf -select 'vc.hasAttribute("MQRankSum") && vc.hasAttribute("ReadPosRankSum")'

and its complementer by

-select '!vc.hasAttribute("MQRankSum") || !vc.hasAttribute("ReadPosRankSum")' to filter out calls with the needed attributes.

Still suboptimal, but can help.
szilva is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 08:10 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO