SEQanswers

Old 06-16-2010, 07:59 PM   #1
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Variant Annotation Tools

Hi all,
I'm looking for suggestions of variant annotation tools for large data sets.
For example, I've called variants using Samtools pileup and now I want to go from a huge list of variants to a list of annotations and a simple method for filtering them.
Any thoughts on things I might try?
__________________
Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
Projects: U87MG whole genome sequence [Website] [Paper]
Old 06-16-2010, 10:17 PM   #2
Jose Blanca
Member
 
Location: Valencia, Spain

Join Date: Aug 2009
Posts: 70
Default

We've done similar things in my lab. We haven't dealt with very large datasets, just a couple of Illumina and 454 datasets together. You're welcome to take a look at our documentation. We also ran a small tutorial session on the topic.
I hope that serves as inspiration.
Old 06-16-2010, 11:01 PM   #3
Rao
Member
 
Location: India

Join Date: Oct 2008
Posts: 36
Default

You can try VarScan; it takes pileup as input.
Old 06-18-2010, 01:09 PM   #4
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

Thanks for the feedback.

I'm going to try VarScan because I've already done the variant calling and have the pileup files.

Does anyone have suggestions for annotation and filtering programs to run downstream of VarScan?

For example, going from the list of variants to coding consequences (marking whether and how variants affect coding sequences), and parsing by type of variant (indels vs SNVs) or coverage/quality?
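For the "parsing by type or coverage/quality" part, a minimal Python sketch may help frame the question. It assumes a tab-delimited table with the column names VarScan prints in its header (Chrom, Position, Ref, Var, Reads1, Reads2, ..., Qual2). The function name, the thresholds, the example rows, and the rule that indel alleles start with "+" or "-" are all assumptions for illustration, not VarScan's documented behavior.

```python
import csv
import io


def split_variants(handle, min_coverage=8, min_qual2=15):
    """Split a headered, tab-delimited variant table into SNVs and indels.

    Rows are dropped when Reads1 + Reads2 is below min_coverage or when
    Qual2 is below min_qual2.
    """
    snvs, indels = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        coverage = int(row["Reads1"]) + int(row["Reads2"])
        if coverage < min_coverage or int(row["Qual2"]) < min_qual2:
            continue
        # Assumption: indel alleles are written with a leading "+" or "-".
        if row["Var"].startswith(("+", "-")):
            indels.append(row)
        else:
            snvs.append(row)
    return snvs, indels


# Made-up rows, only to exercise the function.
example = (
    "Chrom\tPosition\tRef\tVar\tReads1\tReads2\tVarFreq\t"
    "Strands1\tStrands2\tQual1\tQual2\tPvalue\n"
    "chr21\t100\tA\tG\t10\t5\t33.33%\t2\t2\t30\t28\t0.01\n"
    "chr21\t200\tC\t+AT\t6\t4\t40%\t2\t1\t30\t20\t0.02\n"
    "chr21\t300\tT\tC\t3\t2\t40%\t1\t1\t30\t25\t0.20\n"
    "chr21\t400\tG\tA\t20\t10\t33.33%\t2\t2\t30\t10\t0.01\n"
)

snvs, indels = split_variants(io.StringIO(example))
print(len(snvs), len(indels))
```

With a real file, the same function could be called on `open("variants.txt")` instead of the in-memory example; coding-consequence annotation would still need a separate tool.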

I'm also having trouble getting VarScan to work:

I used samtools 0.1.7-5 (r528) to generate pileup using the -c -a -f hg18.fa -r 0.0000007 options.
When I run one of the "pileup2" commands in VarScan, this happens:
Quote:
java -jar /home/mclark/varScan/VarScan.v2.2.jar pileup2indel chr21.pileup
Min coverage: 8
Min reads2: 2
Min var freq: 0.01
Min avg qual: 15
P-value thresh: 0.99
Reading input from chr21.pileup
Chrom Position Ref Var Reads1 Reads2 VarFreq Strands1 Strands2 Qual1 Qual2 Pvalue
Parsing Exception on line:
chr21 9719766 N A 68 0 59 3 ^Z.^~,^~, `2/
For input string: "A"
Any ideas what's going on and how I can get around it?

I'm also wondering what the possible options are when running each command in VarScan. I don't see a list on the site (and if it's in the code, I'm afraid I may not be savvy enough to figure that out myself, so assistance is appreciated). For example, can I play with "min avg qual" and such? Thanks.

Last edited by Michael.James.Clark; 06-18-2010 at 01:45 PM.
Old 06-21-2010, 02:34 AM   #5
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101
Default

The VarScan manual site says that it cannot process pileup created with the -c option:

"Do NOT use the -c parameter. It generates consensus format, which is different from pileup format. The next release of VarScan will recognize both formats. Note, to save disk space and file I/O, you can redirect pileup output directly to VarScan with a "pipe" command. For example:

samtools pileup -f reference.fasta myData.bam | java -jar VarScan.v2.1.jar pileup2snp"

-c stands for consensus, and it looks as though the parsing exception was caused by that consensus "A". So you should run pileup without -c to use it with VarScan. Or wait for the promised next release, or for someone to do a clever hack to the code ...
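If regenerating the pileup is not convenient, one possible workaround is to strip the extra consensus columns. The sketch below is a hypothetical helper, not part of samtools or VarScan. It assumes the 10-column layout seen in the failing line (chromosome, position, reference base, consensus base, three quality columns, depth, read bases, base qualities) and that plain pileup is the same line without columns 4 to 7. Indel lines (reference base "*") use a different layout in consensus output, so this sketch simply skips them.

```python
def consensus_to_pileup(line):
    """Convert one consensus-format line to a 6-column pileup line.

    Returns None for lines this sketch does not handle (for example,
    indel lines, whose reference column is "*").
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 6:
        # Already plain pileup; pass it through unchanged.
        return "\t".join(fields)
    if len(fields) != 10 or fields[2] == "*":
        return None
    chrom, pos, ref = fields[:3]
    # Assumed layout: columns 4-7 are the consensus base and its qualities.
    depth, bases, quals = fields[7:10]
    return "\t".join([chrom, pos, ref, depth, bases, quals])


# The line from the parsing exception above, with tabs restored.
failing = "chr21\t9719766\tN\tA\t68\t0\t59\t3\t^Z.^~,^~,\t`2/"
print(consensus_to_pileup(failing))
```

Note that dropping the indel lines means a command like pileup2indel would have nothing to work with, so rerunning samtools pileup without -c remains the safer route.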
Old 06-21-2010, 06:58 AM   #6
Rao
Member
 
Location: India

Join Date: Oct 2008
Posts: 36
Default

Try samtools pileup -vcf
It gives only variants.
Old 06-21-2010, 09:22 AM   #7
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

Great, thanks guys. I think last week I was only seeing the "Documentation" not the "Manual" from the site. The Manual describes just what I wanted to know.

Rao, the -c option's consensus output appears to be the issue. I can still potentially use -v to output only variants, though.

Last edited by Michael.James.Clark; 06-21-2010 at 09:36 AM.
Old 06-21-2010, 10:37 PM   #8
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

Alright, that worked and I got output. It looks believable to me, but I've encountered some issues.

For one thing, I can't get the "filter" command to report anything. No matter what settings I use, it reports 0 variants passing the filter, yet when I delve into the variant file I can find variants that should pass. Has anyone gotten it to work?

I also tried the somatic command, and it looks like it worked, but I've got some curiosities in it as well. Example output:

Quote:
Min coverage: 8x for Normal, 6x for Tumor
Min reads2: 2
Min strands2: 1
Min var freq: 0.2
Min freq for hom: 0.75
Min avg qual: 15
P-value thresh: 0.99
Somatic p-value: 0.05
127671560 shared positions
122884470 had sufficient coverage for comparison
121991210 were called Reference
12445 were mixed SNP-indel calls and filtered
176060 were called Germline
8887 were called LOH
685647 were called Somatic
10221 were called Unknown
0 were called Variant
I'm thrown by the "0 were called Variant". Anyone know what that means?
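One thing that can be checked from the numbers alone: the call categories in the summary add up exactly to the count of positions with sufficient coverage. A quick Python check, using only the figures quoted above (the dictionary keys are my own labels):

```python
# Counts copied from the VarScan somatic summary quoted above.
summary = {
    "Reference": 121991210,
    "mixed SNP-indel (filtered)": 12445,
    "Germline": 176060,
    "LOH": 8887,
    "Somatic": 685647,
    "Unknown": 10221,
    "Variant": 0,
}
sufficient_coverage = 122884470

total = sum(summary.values())
print(total == sufficient_coverage)  # True
```

So every comparable position landed in one of the other categories. That does not explain what VarScan means by "Variant", only that nothing was left over for it in this run.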
Old 08-24-2010, 07:41 PM   #9
wuhoucdc
Member
 
Location: Nashville

Join Date: Oct 2009
Posts: 14
Default

Quote:
Originally Posted by Michael.James.Clark
Hi all,
I'm looking for suggestions of variant annotation tools for large data sets.
For example, I've called variants using Samtools pileup and now I want to go from a huge list of variants to a list of annotations and a simple method for filtering them.
Any thoughts on things I might try?
Hi,

You could try SVA from Duke (http://people.genome.duke.edu/~dg48/sva/index.php).

I think this big guy can satisfy your request if you have a big computer.

Wu
Old 08-24-2010, 08:33 PM   #10
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by wuhoucdc
Hi,

You could try SVA from Duke (http://people.genome.duke.edu/~dg48/sva/index.php).

I think this big guy can satisfy your request if you have a big computer.

Wu
My only concern is that I have heard it hard-codes dbSNP 127 or something like that (can anyone confirm? N=1). Even so, it is a great piece of software!
Old 07-12-2012, 10:26 PM   #11
krawitz
Member
 
Location: Bonn

Join Date: Feb 2010
Posts: 30
Default

You might want to try www.gene-talk.de
Old 12-28-2012, 10:35 AM   #12
jfb
Junior Member
 
Location: SF bay area

Join Date: Nov 2011
Posts: 7
Default

Is there an update to this post recommending tools for variant annotation and analysis? I'm trying to use R's VariantAnnotation package but the learning curve is frustrating me and I'm not sure it's worth the effort...
Old 12-31-2012, 01:12 PM   #13
brofallon
Member
 
Location: United States

Join Date: May 2011
Posts: 26
Default

I believe the two most commonly used tools are ANNOVAR and SnpEff. ANNOVAR handles many types of annotations and is built for filtering. SnpEff produces some nice HTML files for your web-viewing enjoyment in addition to text files.
Old 01-02-2013, 03:46 AM   #14
krawitz
Member
 
Location: Bonn

Join Date: Feb 2010
Posts: 30
Default

I agree that ANNOVAR and SnpEff seem to be the most widely used tools for variant annotation. For variant analysis there are, e.g., Ingenuity (commercial), Annotate-it and www.gene-talk.de. We are using GeneTalk at the Institute for Medical Genetics at Charité Berlin and are collaborating with the R&D. The platform seems to be rather commonly used now: currently about one hundred single exomes are analyzed per day by about 500 unique users. The annotation is based on ANNOVAR. The filtering and interpretation tools are co-developed by us, but it is generally a project open to any kind of collaboration. We just added a new filter for compound heterozygous variants, so if this is something you are interested in, just try it out.
Old 01-02-2013, 07:49 PM   #15
trackavinash
Member
 
Location: St Louis, USA

Join Date: Nov 2011
Posts: 14
Default

I agree that ANNOVAR and snpEff are widely used. I recently had a chance to use SeattleSeq Annotation when I had to calculate some Grantham scores (http://snp.gs.washington.edu/SeattleSeqAnnotation137/). You could check it out if you'd prefer a web interface for submitting jobs. Galaxy does a bit of annotation as well (I've used Galaxy for obtaining PhyloP scores).
Old 01-03-2013, 07:42 AM   #16
jfb
Junior Member
 
Location: SF bay area

Join Date: Nov 2011
Posts: 7
Default

Thanks for the suggestions.

Tags
annotation, variants
