SEQanswers

Old 05-19-2012, 04:59 AM   #1
aan
Member
 
Location: India

Join Date: Mar 2012
Posts: 14
Trend of Ti/Tv ratio in different tranches in VQSR tranche file?

Hi all

I am running VQSR on exome data. The first step (recalibration and calculation of VQSLOD) also produces a tranche file, which reports Ti/Tv ratios among other parameters.
As far as I know, the Ti/Tv ratio should be around 2.3 for exome data and should decrease as the tranche increases (from 90 to 99.9 and 100). In my case the trend is reversed for novel variants (it is fine for known variants), as shown below:


targetTruthSensitivity  numKnown  numNovel  knownTiTv  novelTiTv  minVQSLod    filterName                            accessibleTruthSites  callsAtTruthSites  truthSensitivity
90                      82669     3570      2.3897     0.9188     5.703        TruthSensitivityTranche0.00to90.00    39003                 35102              0.9
99                      95796     4084      2.3625     1.0044     2.3102       TruthSensitivityTranche90.00to99.00   39003                 38612              0.99
99.9                    106287    6236      2.3037     1.0132     -4.2613      TruthSensitivityTranche99.00to99.90   39003                 38963              0.999
100                     111724    9022      2.2558     1.1301     -37928.8734  TruthSensitivityTranche99.90to100.00  39003                 39003              1


Is this a cause for concern, or is it expected? If it is expected, why?
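One way to look at the trend more closely is to compute the novel Ti/Tv ratio of only the variants that each tranche adds, rather than the running total. The sketch below does this for the numbers posted above. It assumes that numNovel and novelTiTv are cumulative (each row includes the rows above it), which the growing counts suggest but which should be checked against the GATK documentation. The function names are made up for illustration.

```python
# Rows copied from the tranche table in this post: (tranche, numNovel, novelTiTv).
# Assumption: counts and ratios are cumulative, i.e. each row includes
# every variant counted in the rows above it.
ROWS = [
    (90.0, 3570, 0.9188),
    (99.0, 4084, 1.0044),
    (99.9, 6236, 1.0132),
    (100.0, 9022, 1.1301),
]


def split_ti_tv(n, ratio):
    """Recover approximate Ti and Tv counts from a total count and a Ti/Tv ratio."""
    tv = n / (1.0 + ratio)
    return n - tv, tv


def incremental_titv(rows):
    """Return (tranche, variants added, Ti/Tv of only the added variants) per row."""
    out = []
    prev_ti = prev_tv = 0.0
    prev_n = 0
    for tranche, n, ratio in rows:
        ti, tv = split_ti_tv(n, ratio)
        d_ti, d_tv = ti - prev_ti, tv - prev_tv
        slice_ratio = d_ti / d_tv if d_tv > 0 else float("nan")
        out.append((tranche, n - prev_n, slice_ratio))
        prev_ti, prev_tv, prev_n = ti, tv, n
    return out


if __name__ == "__main__":
    for tranche, added, ratio in incremental_titv(ROWS):
        print(f"{tranche:>5}: +{added} novel variants, Ti/Tv of this slice ~ {ratio:.2f}")
```

Because the counts are reconstructed from rounded ratios, the per-slice values are only approximate.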

Thanks in advance.
Old 05-19-2012, 07:20 AM   #2
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535

Why do you think it should be around 2.3 for exomic variants? http://www.broadinstitute.org/gsa/wi...php/QC_Methods indicates it should be closer to 3.3 and novel variants should be closer to 2.8-3.0.
Old 05-20-2012, 10:29 PM   #3
aan
Member
 
Location: India

Join Date: Mar 2012
Posts: 14

Thanks for the updated information. I got the 2.3 figure from one of the posts on Biostar.

Looking at my data (the tranche file statistics that I posted), it seems that there are a lot of false positives. Is that right?

If so, is there any way to refine the data further?

Thanks.
Old 05-20-2012, 10:40 PM   #4
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535

It does imply you have a lot of false positives, if you are sure you are looking only at coding variants. Can you take a subset of SNPs with high coverage and high quality scores and see what the Ti/Tv ratio is there?
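A rough sketch of that check, assuming a standard tab-delimited VCF with the depth stored as DP in the INFO column. The thresholds (QUAL of at least 50, DP of at least 20) and the function names are arbitrary placeholders, not recommended values.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other single-base change is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def info_depth(info):
    """Return the DP value from a VCF INFO string, or None if it is missing."""
    for field in info.split(";"):
        if field.startswith("DP="):
            try:
                return int(field[3:])
            except ValueError:
                return None
    return None


def titv_high_quality(vcf_lines, min_qual=50.0, min_depth=20):
    """Count Ti and Tv among biallelic SNPs that pass the (hypothetical) thresholds."""
    ti = tv = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue
        ref, alt, qual, info = cols[3].upper(), cols[4].upper(), cols[5], cols[7]
        # Skip indels, multi-allelic sites, and symbolic or missing alleles.
        if len(ref) != 1 or len(alt) != 1 or ref not in "ACGT" or alt not in "ACGT":
            continue
        try:
            if float(qual) < min_qual:
                continue
        except ValueError:
            continue  # QUAL is "." or otherwise unparsable
        depth = info_depth(info)
        if depth is None or depth < min_depth:
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    ratio = ti / tv if tv else None
    return ti, tv, ratio


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:  # path to an uncompressed VCF
        print(titv_high_quality(handle))
```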
Old 05-20-2012, 11:12 PM   #5
aan
Member
 
Location: India

Join Date: Mar 2012
Posts: 14

What if I include only SNPs with PASS status in further analysis and ignore the rest, even those that fall in the highest tranches?
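For reference, keeping only PASS records is a small filtering step on the FILTER column (the seventh column of a VCF). This is a minimal sketch under that assumption; the function name is made up for illustration.

```python
def keep_pass(vcf_lines):
    """Yield header lines plus the records whose FILTER column is exactly PASS."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep all meta and header lines unchanged
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 6 and cols[6] == "PASS":
            yield line


if __name__ == "__main__":
    import sys
    # Reads an uncompressed VCF on stdin and writes the PASS-only VCF to stdout.
    sys.stdout.writelines(keep_pass(sys.stdin))
```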
Old 05-20-2012, 11:20 PM   #6
aan
Member
 
Location: India

Join Date: Mar 2012
Posts: 14

Another question: why does the Ti/Tv ratio decrease for known variants as we move from the 90 to the 100 tranche (as expected), whereas the trend is the opposite for novel variants?

I understand that a low value for this ratio indicates many false positives, but what does an increase in the Ti/Tv ratio indicate? The same thing, or something else?
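The link between a low Ti/Tv ratio and false positives can be put into rough numbers with a simple mixture model. The sketch below assumes that random errors have a Ti/Tv of about 0.5 (of the 12 possible single-base changes, 4 are transitions and 8 are transversions) and that true novel variants have the ratio passed in as expected_titv. Both are modelling assumptions, and the function names are made up for illustration.

```python
def ti_fraction(titv):
    """Convert a Ti/Tv ratio into the fraction of variants that are transitions."""
    return titv / (1.0 + titv)


def estimated_fp_fraction(observed_titv, expected_titv, error_titv=0.5):
    """Estimate the false positive fraction f from a two-component mixture.

    Model: observed Ti fraction = (1 - f) * true Ti fraction + f * error Ti fraction.
    The result is clamped to the range [0, 1].
    """
    t_obs = ti_fraction(observed_titv)
    t_exp = ti_fraction(expected_titv)
    t_err = ti_fraction(error_titv)
    if t_exp <= t_err:
        raise ValueError("expected_titv must be greater than error_titv")
    f = (t_exp - t_obs) / (t_exp - t_err)
    return min(1.0, max(0.0, f))


if __name__ == "__main__":
    # Novel Ti/Tv of the 90 tranche from the first post, against an expected 2.8.
    print(round(estimated_fp_fraction(0.9188, 2.8), 2))
```

Under this model, a ratio that rises toward the expected value means a smaller estimated share of random errors, and a ratio that falls toward 0.5 means a larger one. It says nothing about which individual calls are wrong.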
Old 05-20-2012, 11:26 PM   #7
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535

I've only ever used SAMtools to call SNPs so I'm not sure what the various values and tranches refer to, so I can't be too helpful there. I'm not sure how to interpret this. The reason I suggested taking your best quality SNPs and checking the ratio there is because that should give an indication of what values you should see for the entire data set.