SEQanswers


03-06-2014, 05:56 AM   #1
mmwmm
Junior Member
Location: Germany

Join Date: Mar 2014
Posts: 4
Annovar annotation with refSeqSummary and refLink

Dear all,

I am currently struggling with annotation options in Annovar.
I have paired tumor-normal exome sequencing data for which VarScan2 was used to call somatic SNPs. The VarScan2-generated VCFs were successfully converted into Annovar input files and annotated using the standard annotation command from the tutorial:

Quote:
perl table_annovar.pl myInputFile humandb/ -buildver hg19 -out myanno -remove -protocol refGene,phastConsElements46way,genomicSuperDups,esp6500si_all,1000g2012apr_all,snp137,ljb2_all -operation g,r,r,f,f,f,f -nastring NA -csvout

This works perfectly and already produces very useful data!

To better characterize the candidate genes, I would also like to annotate the list with the full gene names from "refLink" (e.g. PRRG1: transmembrane gamma-carboxyglutamic acid protein 1 isoform 1 precursor) and with the RefSeq summaries (full descriptions of gene function). This would make it much easier to prioritize candidate genes instead of going back and forth between Excel and the web browser...

On the Annovar website it is stated that
Quote:
Most of the databases that ANNOVAR uses can be directly retrieved from UCSC Genome Browser Annotation Database. In general, users can use "-downdb" in ANNOVAR to download these files. As of Feb2012, there are 6418 databases for hg19, 6443 databases for hg18, 1841 databases for mm9, etc.
Since the refLink database is already downloaded along with the refGene database, I downloaded the refSeqSummary database into my humandb folder (without errors):

Quote:
perl annotate_variation.pl -buildver hg19 -downdb -webfrom ucsc refSeqSummary humandb/
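Because refLink and refSeqSummary describe genes and transcripts rather than genomic intervals, one way to use them is a plain table join by gene symbol outside of ANNOVAR. The sketch below is a minimal illustration, assuming the downloaded files are tab-separated UCSC table dumps with the hg19 column order noted in the comments (refLink: symbol, product, mRNA accession, ...; refSeqSummary: mRNA accession, completeness, summary). The function names are hypothetical and not part of ANNOVAR.

```python
from typing import Dict, Iterable, List, Tuple

# Column positions ASSUMED from the UCSC hg19 table schemas:
#   refLink:       name, product, mrnaAcc, protAcc, geneName, prodName, locusLinkId, omimId
#   refSeqSummary: mrnaAcc, completeness, summary
REFLINK_NAME, REFLINK_PRODUCT, REFLINK_MRNA = 0, 1, 2
SUMMARY_MRNA, SUMMARY_TEXT = 0, 2


def split_tsv(lines: Iterable[str]) -> List[List[str]]:
    """Split tab-separated lines into fields (no quote handling; summaries may contain quotes)."""
    return [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]


def load_gene_descriptions(reflink_lines: Iterable[str],
                           summary_lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Hypothetical helper: map gene symbol -> (product name, RefSeq summary) via mRNA accession."""
    summaries = {row[SUMMARY_MRNA]: row[SUMMARY_TEXT]
                 for row in split_tsv(summary_lines) if len(row) > SUMMARY_TEXT}
    genes: Dict[str, Tuple[str, str]] = {}
    for row in split_tsv(reflink_lines):
        if len(row) <= REFLINK_MRNA:
            continue  # skip truncated records
        symbol, product, mrna = row[REFLINK_NAME], row[REFLINK_PRODUCT], row[REFLINK_MRNA]
        summary = summaries.get(mrna, "")
        # One symbol can have several transcripts; prefer one that has a non-empty summary.
        if symbol not in genes or (summary and not genes[symbol][1]):
            genes[symbol] = (product, summary)
    return genes
```

With the files from the command above, this would be called as load_gene_descriptions(open("humandb/hg19_refLink.txt"), open("humandb/hg19_refSeqSummary.txt")). Records with an empty summary, like the NR_036941 line in the error below, simply yield an empty string.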
However, when I run the following command to annotate my input file with refSeqSummary entries...

Quote:
perl table_annovar.pl myInputFile humandb/ -buildver hg19 -out annovar -remove -protocol refGene,refSeqSummary,cosmic67,phastConsElements46way,genomicSuperDups,esp6500si_all,1000g2012apr_all,snp137,ljb2_all -operation g,g,f,r,r,f,f,f,f -nastring NA -otherinfo
... I get this error:

Quote:
NOTICE: Reading gene annotation from humandb/hg19_refSeqSummary.txt ... Error: invalid record in humandb/hg19_refSeqSummary.txt (>=11 fields expected in refSeqSummary gene definition file): <NR_036941 FullLength >
The same happens when I try to annotate with refLink:

Quote:
Reading gene annotation from humandb/hg19_refLink.txt ... Error: invalid record in humandb/hg19_refLink.txt (>=11 fields expected in refLink gene definition file): < NR_036941 0 0 0 0>
After padding the remaining columns with dummy values so that each record has 11 fields, I got this:

Quote:
NOTICE: Reading gene annotation from humandb/hg19_refLink11.txt ... Error: invalid dbstrand information found in humandb/hg19_refLink11.txt (dbstrand has to be + or -): < NR_036941 0 0 0 0 NA NA NA>
So apparently Annovar needs chromosomal positions to perform such annotations in "--geneanno" mode?
More generally: even when UCSC databases are downloaded directly through Annovar's "-downdb" parameter, do they still have to be adjusted before Annovar can use them?
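The two error messages hint at what a gene definition file must contain: at least 11 tab-separated fields and a strand of + or - in a fixed column. The sketch below mimics those two checks plus numeric coordinates. It assumes that a file such as hg19_refGene.txt holds the UCSC genePred columns preceded by a bin column; it is an illustration of the record shape, not ANNOVAR's own validation code, and the function name is hypothetical.

```python
from typing import List, Optional

# ASSUMED layout of a gene definition file such as hg19_refGene.txt:
# bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...
STRAND_COL = 3
COORD_COLS = range(4, 9)  # txStart, txEnd, cdsStart, cdsEnd, exonCount


def gene_definition_problem(line: str) -> Optional[str]:
    """Hypothetical check: return why a line is not a genePred-style record, or None if it looks valid."""
    fields: List[str] = line.rstrip("\r\n").split("\t")
    if len(fields) < 11:
        return f"only {len(fields)} fields, >=11 expected"
    if fields[STRAND_COL] not in ("+", "-"):
        return f"strand column is {fields[STRAND_COL]!r}, expected + or -"
    if not all(fields[i].isdigit() for i in COORD_COLS):
        return "non-numeric coordinate or exon count"
    return None
```

A refSeqSummary record fails the field count, and a refLink record padded to 11 fields still fails the strand check, because neither table carries transcript coordinates.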

So my questions are:
1.) Is there a general structure that database files must follow to be suitable for gene-based annotation, and is it correct to use the --geneanno protocol?
2.) How can I modify UCSC data tables like refLink and refSeqSummary for Annovar so that they can be used to annotate VCF files?
3.) Optionally, GeneRIFs would also be interesting to annotate. Is there a way to include NCBI GeneRIFs (available via ftp://ftp.ncbi.nih.gov/gene/GeneRIF/) in VCF annotations?

Any help would be very much appreciated!!
Max
03-10-2014, 02:35 AM   #2
mmwmm
Junior Member
Location: Germany

Join Date: Mar 2014
Posts: 4

Unfortunately, I'm still stuck on this problem... any input would be highly appreciated! Thanks!
03-24-2014, 11:45 PM   #3
bb420
Member
Location: MA

Join Date: Apr 2012
Posts: 14

1.) Is there a general structure that database files must follow to be suitable for gene-based annotation, and is it correct to use the --geneanno protocol?

--> Just use refGene, knownGene, ensGene, and perhaps gencodegene/ccdsgene (gene definition tables that contain transcript coordinates); do not use anything else.

2.) How can I modify UCSC data tables like refLink and refSeqSummary for Annovar so that they can be used to annotate VCF files?

--> I would suggest you do not modify anything; these are important files.
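Instead of editing the database files, gene-level descriptions can be attached after table_annovar has run, by joining on the gene symbol column of its output. A minimal sketch, assuming the -csvout output has a header column named Gene.refGene and that a symbol-to-(full name, summary) mapping has been built separately from refLink and refSeqSummary. The new column names Gene.fullName and Gene.summary, and the handling of parenthesized details in the gene field, are hypothetical choices.

```python
import csv
import io
import re
from typing import Dict, Tuple

GENE_COLUMN = "Gene.refGene"  # ASSUMED header of the refGene gene column in the CSV output


def add_gene_columns(csv_text: str, genes: Dict[str, Tuple[str, str]]) -> str:
    """Append full-name and summary columns to table_annovar CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = list(reader.fieldnames or []) + ["Gene.fullName", "Gene.summary"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Drop parenthesized details such as "(dist=...)", then split multi-gene entries on , or ;
        value = re.sub(r"\([^)]*\)", "", row.get(GENE_COLUMN) or "")
        symbols = [s for s in re.split(r"[,;]", value) if s]
        hits = [genes[s] for s in symbols if s in genes]
        row["Gene.fullName"] = " | ".join(h[0] for h in hits) or "NA"
        row["Gene.summary"] = " | ".join(h[1] for h in hits if h[1]) or "NA"
        writer.writerow(row)
    return out.getvalue()
```

Unmatched symbols (including the "NA" placeholder produced by -nastring NA) fall back to "NA", so the output stays aligned with the original rows.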

3.) Optionally, GeneRIFs would also be interesting to annotate. Is there a way to include NCBI GeneRIFs (available via ftp://ftp.ncbi.nih.gov/gene/GeneRIF/) in VCF annotations?

--> GeneRIF annotates genes, not variants. You will have to write your own script to annotate a gene with GeneRIF.
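As a starting point for such a script, the sketch below reads the GeneRIF table into a mapping from NCBI Gene ID to GeneRIF texts. It assumes the tab-separated column order of NCBI's generifs_basic file (tax ID, Gene ID, PubMed ID list, last update timestamp, GeneRIF text) and keeps only human records (tax ID 9606). Linking Gene IDs to symbols through the locusLinkId column of refLink is likewise an assumption about how one would connect the tables; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

HUMAN_TAX_ID = "9606"


def load_generifs(lines: Iterable[str], tax_id: str = HUMAN_TAX_ID) -> Dict[str, List[str]]:
    """Hypothetical helper: map NCBI Gene ID -> list of GeneRIF texts for one species."""
    rifs: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\r\n").split("\t")
        # ASSUMED columns: tax ID, Gene ID, PubMed ID list, timestamp, GeneRIF text
        if len(fields) >= 5 and fields[0] == tax_id:
            rifs[fields[1]].append(fields[4])
    return dict(rifs)
```

For the compressed download, a text-mode handle such as gzip.open("generifs_basic.gz", "rt", encoding="utf-8") can be passed as lines (the file name is assumed).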

Hope this helps!

Tags
annotation, annovar, exome sequencing, gene summary

All times are GMT -8.