SEQanswers

Old 04-04-2013, 12:19 PM   #1
kjaja
Member
 
Location: NY

Join Date: Aug 2011
Posts: 55
Default variant filtering

Hi All,

I have cases and controls that were exome sequenced, and I used GATK to call variants in all of the cases and controls combined. This generated a single VCF file with all the variants. After removing the low-quality variants (scores < 30), I would like to keep the variants that are present in the cases but not in my controls. What is the best way to handle this? The goal is to remove the variants found in the controls and see what is left.
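
For reference, the filter described here can be sketched with awk. This is only a minimal sketch under stated assumptions, not a replacement for a dedicated tool: it assumes GT is the first FORMAT field, it treats missing genotypes (such as ./.) as "not carrying the variant", and cases.txt and controls.txt are hypothetical files holding one sample name per line.

```shell
#!/bin/sh
# Sketch: keep records with QUAL >= 30 where at least one case carries a
# non-reference allele and no control does.
# Assumptions (not from the thread): GT is the first FORMAT field; the two
# sample lists are hypothetical files with one sample name per line.

filter_case_only() {
    # $1 = multi-sample VCF, $2 = case sample list, $3 = control sample list
    awk -F'\t' -v minq=30 '
        FILENAME == ARGV[1] { is_case[$1] = 1; next }   # read case names
        FILENAME == ARGV[2] { is_ctrl[$1] = 1; next }   # read control names
        /^##/ { print; next }                           # keep meta lines
        /^#CHROM/ {
            # map sample columns (10 onwards) to case or control
            for (i = 10; i <= NF; i++) {
                if ($i in is_case) case_col[i] = 1
                if ($i in is_ctrl) ctrl_col[i] = 1
            }
            print; next
        }
        {
            # drop records with missing or low QUAL
            if ($6 == "." || $6 + 0 < minq) next
            in_case = 0; in_ctrl = 0
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")
                alt = (f[1] ~ /[1-9]/)   # any non-reference allele in GT
                if (alt && (i in case_col)) in_case = 1
                if (alt && (i in ctrl_col)) in_ctrl = 1
            }
            if (in_case && !in_ctrl) print
        }
    ' "$2" "$3" "$1"
}
```

It could be called as `filter_case_only all_samples.vcf cases.txt controls.txt > case_only.vcf`. Because a missing genotype in a control is not counted as carrying the variant, low-coverage control sites can let a variant through; that may or may not be what is wanted.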

thanks
Old 04-04-2013, 01:27 PM   #2
rjohnp
Member
 
Location: England

Join Date: Jan 2013
Posts: 16
Default

Hi,

VCF-tools has some useful tools for this sort of thing, see http://vcftools.sourceforge.net/perl...l#vcf-contrast. Looks like exactly what you're looking for.

Hope that helps.
Old 04-06-2013, 06:59 AM   #3
kjaja
Member
 
Location: NY

Join Date: Aug 2011
Posts: 55
Default

I have tried the VCFtools script vcf-contrast from the following link http://vcftools.sourceforge.net/perl...l#vcf-contrast
but I am getting the following warning:

Argument "." isn't numeric in numeric gt (>) at vcftools_0.1.10/perl/vcf-contrast line 144, <STDIN> line 120414.


Any idea?

thanks
Old 04-08-2013, 01:15 AM   #4
rjohnp
Member
 
Location: England

Join Date: Jan 2013
Posts: 16
Default

Quote:
Originally Posted by kjaja View Post
I have tried the VCFtools script vcf-contrast from the following link http://vcftools.sourceforge.net/perl...l#vcf-contrast
but I am getting the following warning:

Argument "." isn't numeric in numeric gt (>) at vcftools_0.1.10/perl/vcf-contrast line 144, <STDIN> line 120414.


Any idea?

thanks
When I get that sort of message, it's generally caused by a trailing newline at the end of the input file. You should still get an output, though. Do a line count on your VCF file; I suspect it is 120414 lines long. If the blank line does stop you from getting an output file, remove it in vim.
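
A quick way to test that guess from the command line is sketched below. The function names and `input.vcf` are placeholders, not anything from VCFtools. If the file turns out to be longer than the line number in the warning, printing that line shows what the script was actually reading.

```shell
#!/bin/sh
# Sketch: check whether a VCF ends in a blank line, and inspect a given line.
# All names here are illustrative placeholders.

check_trailing_blank() {
    # $1 = VCF file
    total=$(wc -l < "$1" | tr -d ' ')
    last=$(tail -n 1 "$1" | tr -d '[:space:]')
    echo "lines: $total"
    if [ -z "$last" ]; then
        echo "last line is blank"
    else
        echo "last line is not blank"
    fi
}

show_line() {
    # $1 = VCF file, $2 = line number reported in the warning
    sed -n "${2}p" "$1"
}

strip_blank_lines() {
    # $1 = VCF file; writes the file without empty lines to stdout
    grep -v '^[[:space:]]*$' "$1"
}
```

For example: `check_trailing_blank input.vcf`, then `show_line input.vcf 120414`, and if needed `strip_blank_lines input.vcf > input.clean.vcf`.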
Old 04-08-2013, 03:00 PM   #5
kjaja
Member
 
Location: NY

Join Date: Aug 2011
Posts: 55
Default

Was wondering if this can be done using GATK?
Old 04-09-2013, 03:43 AM   #6
AJERYC
Member
 
Location: Spain

Join Date: Jan 2012
Posts: 26
Default

Try kggseq:

java -Xms256m -Xmx1300m -jar kggseq.jar --no-resource-check --buildver hg19 --vcf-file yourfile.vcf --ped-file pedigree.ped --o-vcf --genotype-filter 2,4

In the file pedigree.ped, set the status of each sample to case (2) or control (1).
The genotype filter options used here are 2 (homozygous variants present in both cases and controls) and 4 (heterozygous variants present in both cases and controls).
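
As a rough sketch of how such a pedigree file could be generated, the function below writes one line per sample in the common six-column linkage layout (family ID, individual ID, father, mother, sex, phenotype). That kggseq accepts exactly this layout is an assumption to check against its manual; cases.txt and controls.txt are hypothetical files with one sample name per line, and sex is written as 0 (unknown).

```shell
#!/bin/sh
# Sketch: build a six-column pedigree file from two sample lists.
# Assumption: columns are FID IID PID MID SEX PHENO, with PHENO 2 = case
# and 1 = control. The input file names are hypothetical.

make_ped() {
    # $1 = case sample list, $2 = control sample list
    awk -v pheno=2 '{ if ($1 != "") print $1, $1, 0, 0, 0, pheno }' "$1"
    awk -v pheno=1 '{ if ($1 != "") print $1, $1, 0, 0, 0, pheno }' "$2"
}
```

Usage would be `make_ped cases.txt controls.txt > pedigree.ped`. The sample names have to match the column names in the VCF header.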
Old 04-09-2013, 09:55 AM   #7
kjaja
Member
 
Location: NY

Join Date: Aug 2011
Posts: 55
Default

I have tried using vcf-contrast from VCFtools with the following command:

vcftools_0.1.10/perl/vcf-contrast +sample1,sample2 -sample3 -n allAllsamples.vcf > insample1or2NOTsample3.vcf

where I am looking for variants that could be in sample 1 OR sample 2 but not in sample 3. However, I found that some of the variants reported for sample 1 are also present in sample 3, and the same is true for sample 2.
Any suggestions will be greatly appreciated.
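
To see exactly which records are affected, the output can be scanned for sites where the excluded sample still has a non-reference genotype. The sketch below is only a diagnostic under the assumption that GT is the first FORMAT field; the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: list records in which one named sample carries a non-reference
# genotype. Assumption: GT is the first FORMAT field.

nonref_in_sample() {
    # $1 = VCF file, $2 = sample name as written in the #CHROM header line
    awk -F'\t' -v s="$2" '
        /^##/ { next }
        /^#CHROM/ {
            # find the column that belongs to the requested sample
            for (i = 10; i <= NF; i++) if ($i == s) col = i
            next
        }
        col {
            split($col, f, ":")
            # print CHROM, POS, REF, ALT and the full sample field
            if (f[1] ~ /[1-9]/) print $1, $2, $4, $5, $col
        }
    ' "$1"
}
```

Running `nonref_in_sample insample1or2NOTsample3.vcf sample3` prints the unexpected sites together with the full sample field, so depth and genotype quality at those positions can be inspected.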
Old 04-10-2013, 01:56 AM   #8
evakoe
Member
 
Location: Italia

Join Date: Jul 2012
Posts: 27
Default

I also noticed that vcf-contrast does not return the expected results. One still gets variants that are present in samples given with the minus flag. Does somebody know another program which has the same functionality? I think that GATK SelectVariants cannot be used for this purpose.
Thank you.
Old 04-11-2013, 11:30 PM   #9
krawitz
Member
 
Location: Bonn

Join Date: Feb 2010
Posts: 30
Default

You could also try the following:
Upload your multi-sample VCF file to GeneTalk (www.gene-talk.de). Use the collection tool to assign all the cases the status "affected" and all the controls the status "unaffected". Then proceed with the inheritance filtering option "dominant". This will yield variants that are unique to the cases.
Old 04-15-2013, 08:06 AM   #10
evakoe
Member
 
Location: Italia

Join Date: Jul 2012
Posts: 27
Default

Hi Krawitz,
thanks for your reply. I had already considered this; I was just hoping for a more direct solution.
Old 04-15-2013, 08:50 AM   #11
MQ-BCBB
Member
 
Location: Maryland

Join Date: May 2009
Posts: 25
Default

Since no one has mentioned SnpSift, I should say that it works very well for this:
http://snpeff.sourceforge.net/SnpSift.html#casecontrol
Old 04-15-2013, 09:56 AM   #12
aggp11
Member
 
Location: Wisconsin

Join Date: Jun 2011
Posts: 87
Default

Could you print like 5-10 lines from your "combined" vcf file here? Try to include an example of what output you would like to see.