  • Annovar annotation with refSeqSummary and refLink

    Dear all,

    I am currently struggling with annotation options in Annovar.
    I have paired tumor-normal exome sequencing data, for which VarScan2 was used to call somatic SNPs. The VarScan2-generated VCFs were successfully converted into Annovar input files and annotated using the standard annotation command from the tutorial:

    perl table_annovar.pl myInputFile humandb/ -buildver hg19 -out myanno -remove -protocol refGene,phastConsElements46way,genomicSuperDups,esp6500si_all,1000g2012apr_all,snp137,ljb2_all -operation g,r,r,f,f,f,f -nastring NA -csvout

    This works perfectly and already produces very useful data!

    To better characterize the candidate genes, I would also like to annotate the list with the full gene names from "refLink" (e.g. PRRG1: transmembrane gamma-carboxyglutamic acid protein 1 isoform 1 precursor) and the refSeq summaries (full description of the gene function). This would make it much easier to prioritize candidate genes without going back and forth between Excel and a web browser.

    The Annovar website states:
    Most of the databases that ANNOVAR uses can be directly retrieved from UCSC Genome Browser Annotation Database. In general, users can use "-downdb" in ANNOVAR to download these files. As of Feb2012, there are 6418 databases for hg19, 6443 databases for hg18, 1841 databases for mm9, etc.
    Since the refLink database is already downloaded along with the refGene database, I downloaded the refSeqSummary database into my humandb folder (without errors):

    perl annotate_variation.pl -buildver hg19 -downdb -webfrom ucsc refSeqSummary humandb/
    However, when I run the following command to annotate my input file with refSeqSummary entries ...

    perl table_annovar.pl myInputFile humandb/ -buildver hg19 -out annovar -remove -protocol refGene,refSeqSummary,cosmic67,phastConsElements46way,genomicSuperDups,esp6500si_all,1000g2012apr_all,snp137,ljb2_all -operation g,g,f,r,r,f,f,f,f -nastring NA -otherinfo
    ... I encounter this error:

    NOTICE: Reading gene annotation from humandb/hg19_refSeqSummary.txt ... Error: invalid record in humandb/hg19_refSeqSummary.txt (>=11 fields expected in refSeqSummary gene definition file): <NR_036941 FullLength >
    I get the same result when trying annotation with refLink:

    Reading gene annotation from humandb/hg19_refLink.txt ... Error: invalid record in humandb/hg19_refLink.txt (>=11 fields expected in refLink gene definition file): < NR_036941 0 0 0 0>
    After padding the remaining columns with dummy values so that the file has 11 fields, I got this:

    NOTICE: Reading gene annotation from humandb/hg19_refLink11.txt ... Error: invalid dbstrand information found in humandb/hg19_refLink11.txt (dbstrand has to be + or -): < NR_036941 0 0 0 0 NA NA NA>
    So apparently Annovar needs chromosomal positions to perform such annotations in "--geneanno" mode?
    More generally, do UCSC databases have to be adjusted before Annovar can use them, even when they were downloaded directly through Annovar's "-downdb" parameter?
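    The two errors above suggest what a gene definition file is checked for: at least 11 tab-separated fields and a strand value of + or -. A minimal Python sketch of that check follows. It is not Annovar's own code; the function name, the strand position (fourth column, as in a UCSC refGene dump with a leading bin column), and the sample rows are assumptions made for illustration.

```python
def check_gene_definition_rows(lines, min_fields=11, strand_col=3):
    """Return (line_number, problem) pairs for rows that do not look like
    gene definition records.

    min_fields=11 and the "+ or -" rule are taken from the error messages
    quoted above; strand_col=3 is an ASSUMPTION (refGene-style layout with
    a leading bin column).
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < min_fields:
            problems.append(
                (number, f"only {len(fields)} fields, expected >= {min_fields}")
            )
        elif fields[strand_col] not in ("+", "-"):
            problems.append(
                (number, f"strand column holds {fields[strand_col]!r}, expected + or -")
            )
    return problems


if __name__ == "__main__":
    # All three rows are invented examples, not real database records.
    refgene_like = "\t".join(
        ["0", "NM_000001", "chr1", "+", "100", "200", "120", "180",
         "1", "100,", "200,", "0", "GENE1", "cmpl", "cmpl", "0,"]
    )
    reflink_like = "\t".join(
        ["GENE1", "some product", "NM_000001", "NP_000001", "0", "0", "0", "0"]
    )
    padded_reflink = reflink_like + "\tNA\tNA\tNA"
    for number, problem in check_gene_definition_rows(
        [refgene_like, reflink_like, padded_reflink]
    ):
        print(f"line {number}: {problem}")
```

    Under these assumptions, a refLink-style row fails first on the field count and, once padded, on the strand column, because it is a lookup table keyed by accession and carries no coordinates at all.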

    So my questions are:
    1.) Is there a general structure that database files must have to be suitable for gene-based annotation, and is it correct to use the --geneanno protocol?
    2.) How can UCSC data tables like refLink and refSeqSummary be modified for Annovar, so that they can be used to annotate VCF files?
    3.) Optionally, GeneRIFs would also be interesting to annotate. Is there a way to include NCBI GeneRIFs (obtainable via ftp://ftp.ncbi.nih.gov/gene/GeneRIF/) in VCF annotations?

    Any help would be very much appreciated!!
    Max

  • #2
    Unfortunately, I'm still stuck with that problem ... anyone's input on that would be highly appreciated! Thanks!

    • #3
      1.) Is there a general structure that database files must have to be suitable for gene-based annotation, and is it correct to use the --geneanno protocol?

      --> Just use refGene, knownGene, ensGene, and perhaps gencodegene/ccdsgene; do not use anything else.

      2.) How can UCSC data tables like refLink and refSeqSummary be modified for Annovar, so that they can be used to annotate VCF files?

      --> I would suggest you do not modify anything; these are important files.

      3.) Optionally, GeneRIFs would also be interesting to annotate. Is there a way to include NCBI GeneRIFs (obtainable via ftp://ftp.ncbi.nih.gov/gene/GeneRIF/) in VCF annotations?

      --> GeneRIF annotates genes, not variants. You will have to write your own script to annotate a gene with GeneRIF.

      Hope this helps!
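
      A script of the kind suggested in the last answer could look like the sketch below: it leaves the Annovar databases untouched and joins gene-level text onto the finished CSV output by gene symbol. Everything specific here is an assumption to adapt: the output column name "Gene.refGene", the "," and ";" separators for multi-gene entries, the lookup-table layout (symbol in the first column, description in the second, as in refLink), and the function and column names introduced for the example.

```python
import csv
import io
import re


def load_descriptions(handle, symbol_col=0, text_col=1):
    """Map gene symbol -> description from a tab-delimited lookup table.

    The column positions are ASSUMPTIONS; adjust them to the table used.
    """
    descriptions = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) <= max(symbol_col, text_col):
            continue  # skip short or empty rows
        symbol, text = row[symbol_col], row[text_col]
        if symbol and text:
            descriptions.setdefault(symbol, text)  # keep the first entry per symbol
    return descriptions


def add_descriptions(in_handle, out_handle, descriptions,
                     gene_col="Gene.refGene", na="NA"):
    """Copy an annotated CSV and append a GeneDescription column.

    gene_col is an ASSUMED header name; check the header of your own file.
    """
    reader = csv.DictReader(in_handle)
    fieldnames = list(reader.fieldnames) + ["GeneDescription"]
    writer = csv.DictWriter(out_handle, fieldnames=fieldnames,
                            lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        # A variant may be assigned to several genes; look up each symbol.
        symbols = [s for s in re.split(r"[,;]", row[gene_col]) if s]
        texts = [descriptions.get(s, na) for s in symbols]
        row["GeneDescription"] = " | ".join(texts) if texts else na
        writer.writerow(row)


if __name__ == "__main__":
    # In-memory stand-ins for the lookup table and the annotated CSV.
    lookup = io.StringIO(
        "PRRG1\ttransmembrane gamma-carboxyglutamic acid protein 1 "
        "isoform 1 precursor\n"
    )
    annotated = io.StringIO("Chr,Start,Gene.refGene\nchrX,100,PRRG1\n")
    result = io.StringIO()
    add_descriptions(annotated, result, load_descriptions(lookup))
    print(result.getvalue(), end="")
```

      The same join would work for refSeqSummary or GeneRIF text once each is reduced to a symbol-to-text table; those tables are keyed by accession or gene ID rather than symbol, so that mapping step is left out here.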

      • #4
        NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
        NOTICE: Downloading annotation database http://hgdownload.cse.ucsc.edu/golde...Summary.txt.gz ... Failed
        WARNING: Some files cannot be downloaded, including http://hgdownload.cse.ucsc.edu/golde...Summary.txt.gz
