
  • Annotating variants using Annovar question

    I have VCF files that I generated using samtools, and I wish to annotate the variants in these files using ANNOVAR. Will the following work?

    perl convert2annovar.pl in.vcf -format vcf4

    perl annotate_variation.pl -buildver hg19 in.vcf humandb

    In particular, will the above account for the fact that the humandb folder (of ANNOVAR) contains hg18-related files?

    Thanks for your help.

  • #2
    If you want to keep all alternative alleles, first do this:

    ./convert2annovar.pl -format vcf4 --allallele variant_file > variant.annovarInput

    Then I recommend doing everything in hg19 with the latest version of ANNOVAR, whose "summarize_annovar.pl" script is very fast. Pass it the converted file and the database directory:

    ./summarize_annovar.pl --buildver hg19 --remove --verdbsnp 135 --ver1000g 1000g2012feb --outfile variant.output variant.annovarInput humandb/

    This will annotate more fully than just the annotate_variation.pl script and is fast enough that it's worth using.
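To make the conversion step less opaque, here is a rough sketch of what the converted file looks like for simple SNVs: one line per variant with chromosome, start, end, reference allele and alternate allele, where start and end both equal the VCF POS. The function name vcf_snv_to_annovar is made up for this illustration, and it deliberately skips indels and multi-allelic sites. It is not a replacement for convert2annovar.pl, which handles those cases.

```shell
# Illustration only: print chr, start, end, ref, alt for single-nucleotide
# variants in a VCF. Indels, multi-allelic sites and sites with ALT "." are
# skipped. Reads the file given as argument, or stdin if none is given.
vcf_snv_to_annovar() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { next }                                  # skip header lines
        length($4) == 1 && length($5) == 1 && $5 != "." {
            print $1, $2, $2, $4, $5                   # start = end = POS
        }
    ' "$@"
}

# Tiny made-up example: one SNV and one deletion (the deletion is skipped).
printf 'chr1\t12345\t.\tA\tG\t50\t.\t.\nchr1\t20000\t.\tAT\tA\t50\t.\t.\n' |
    vcf_snv_to_annovar
# prints: chr1	12345	12345	A	G
```

Since the converted file is no longer a VCF, giving it a name such as variant.annovarInput rather than a .vcf extension helps avoid confusion later.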



    • #3
      If the VCF data are based on hg18, the variant positions will not match hg19 annotations, because coordinates differ between the hg18 and hg19 reference assemblies. In that case you'll need to convert your variant coordinates to hg19 first using liftover tables.
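Before lifting anything over, it may be worth checking which build the VCF was actually called against. The sketch below guesses the build from the chr1 length in the VCF header (247249719 in hg18, 249250621 in hg19). It assumes the header contains a ##contig line for chr1 with a length field, which not every samtools-generated VCF will have; guess_build is a made-up helper name.

```shell
# Guess the genome build from the chr1 contig length in a VCF header.
# Assumption: the header has a line like ##contig=<ID=chr1,length=...>.
# Prints hg18, hg19 or unknown. Reads the named file, or stdin if none.
guess_build() {
    len=$(grep -m1 -E '^##contig=<ID=(chr)?1,' "$@" |
          sed -E 's/.*length=([0-9]+).*/\1/')
    case "$len" in
        247249719) echo hg18 ;;
        249250621) echo hg19 ;;
        *)         echo unknown ;;   # no contig line, or a different build
    esac
}

# Made-up two-line header for illustration.
printf '##fileformat=VCFv4.1\n##contig=<ID=chr1,length=249250621>\n' | guess_build
# prints: hg19
```

If the function reports unknown, the build has to be established some other way, for example from the reference FASTA that was used for alignment.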



      • #4
        Dear all,
        The first command that I gave in my original post, as well as the first command given by Heisman, works fine on my system.

        [These are:

        perl convert2annovar.pl in.vcf -format vcf4

        and

        ./convert2annovar.pl -format vcf4 --allallele variant_file > variant.annovarInput

        ]

        But the second command given by me, and also the one given by Heisman, do not work on my system.

        For instance, I started off with a file called RVK127N.vcf.

        I then created a new file using this command:

        perl convert2annovar.pl in.vcf -format vcf4 >RVK127N_out.vcf

        I used the new file in the second command:

        perl annotate_variation.pl -buildver hg19 RVK127N_out.vcf humandb

        I then get the following message:
        NOTICE: The --geneanno operation is set to ON by default
        Error: The gene annotation database humandb/hg19_refGene.txt does not exist. Please use 'annotate_variation.pl --downdb refGene humandb -build hg19' to download the database.

        I then typed the following on the command line:

        annotate_variation.pl --downdb refGene humandb -build hg19

        But on pressing Enter I get this message:

        -bash: annotate_variation.pl: command not found

        ---
        One other question: I have the entire UCSC hg19 reference genome in a file called genome.fa (downloaded independently from UCSC). Do I use this genome.fa file in any of the ANNOVAR commands?

        ----
        Thanks for your help.



        • #5
          You do not need to use the genome.fa file anywhere.

          As for:
          Code:
          annotate_variation.pl --downdb refGene humandb -build hg19
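          You need to run it as "perl annotate_variation.pl" or "./annotate_variation.pl", because the shell only searches the directories listed in PATH and the current directory is normally not one of them.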

          Also, you want to use a command line more like this:
          Code:
          ./annotate_variation.pl -downdb -webfrom annovar -buildver hg19 refGene humandb/
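The two problems in the previous post (a script that is not on PATH, and a missing database file) can both be checked up front. This is only a sketch: how_to_run and have_refgene are made-up helper names, and the database file name hg19_refGene.txt is taken from the error message quoted above.

```shell
# Print a form of the script name that the shell can actually run:
# the bare name if it is on PATH, otherwise "perl ./name" if the file
# exists in the current directory. Returns non-zero if neither applies.
how_to_run() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1"
    elif [ -f "./$1" ]; then
        echo "perl ./$1"
    else
        echo "not found: $1" >&2
        return 1
    fi
}

# Succeed only if the refGene database for the given build is present
# in the given database directory, e.g. humandb/hg19_refGene.txt.
have_refgene() {
    [ -f "$2/$1_refGene.txt" ]
}

# Example: report what is missing before attempting any annotation.
if ! have_refgene hg19 humandb; then
    echo "humandb/hg19_refGene.txt is missing; download it first"
fi
```

Run from inside the ANNOVAR directory, `how_to_run annotate_variation.pl` should report the "perl ./..." form unless that directory has been added to PATH.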



          • #6
            Much obliged to Heisman.
