  • Problem opening ANNOVAR output file in VarSifter

    Hi all ,

    I ran ANNOVAR, which writes its output in CSV format.
    Now I need to filter the variants using VarSifter, which takes its input in VCF format.

    The problem is that even after adding the header values to the file and saving it in VCF format, it does not open in VarSifter. The error says: "Data line column count is less than required, make sure that text is tab delimited".
    VCF file: https://www.dropbox.com/s/wtns0ots93...G02023.flt.vcf
    Annovar output: https://www.dropbox.com/s/hi32mv99no...me_summary.csv

    I am attaching my vcf file and annovar output in csv format.

    If anyone can help me in this matter, I will be really thankful.


    Thanks.

  • #2
    Hi,
    This error results from having too few columns in the position (data) lines. The VCF format requires 8 columns to be present; when a column has no information, a dot "." should be used as a placeholder. The columns need to be separated by the tab character (not spaces or commas).
    How did you add the ANNOVAR output to the VCF file?
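
A quick way to find the offending lines is to count tab-separated fields on every non-header line. The following is only a minimal sketch: the function name `find_short_lines` and the example lines are made up for illustration, and it checks nothing beyond the column count.

```python
REQUIRED_COLUMNS = 8  # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO


def find_short_lines(lines, required=REQUIRED_COLUMNS):
    """Return (line_number, column_count) for data lines with too few tab-separated columns."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        count = len(line.split("\t"))
        if count < required:
            problems.append((number, count))
    return problems


if __name__ == "__main__":
    # Made-up example lines; with a real file, pass the open file handle instead.
    example = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.",
        "chr1 200 . C T 50 PASS .",  # spaces instead of tabs
        "chr1,300,.,G,A,50,PASS,.",  # comma-separated, as in a CSV
    ]
    for number, count in find_short_lines(example):
        print(f"line {number}: {count} column(s)")
```

A line that was pasted from a CSV or typed with spaces shows up as a single column, which is exactly the situation the VarSifter error message describes.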

    • #3
      Is there a follow-up to this? I also have a VCF file and an ANNOVAR output file, and I'm wondering whether I have to write my own script to stitch them together or whether some existing tool already does this. Thanks.

      • #4
        I don't know of any existing tools to insert ANNOVAR output back into a VCF file. My own scripts follow the ANNOVAR author's suggestion of carrying the VCF line as a comment in the ANNOVAR input, which can then be used to match each output record to the right VCF line. Although I would like to release my scripts at some point, they need some work and testing.
        The main challenge I have found is the conversion of coordinate systems from VCF to ANNOVAR and back to VCF.
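
To illustrate why the coordinates differ, here is a rough sketch of the conversion for simple variants. It reflects my understanding of the ANNOVAR input convention (start, end, ref, alt, with "-" standing for an empty allele) and is not taken from convert2annovar.pl; the function name `vcf_to_annovar` is hypothetical, and it assumes one alternate allele where the shorter indel allele is a prefix of the longer one. Check the results against real convert2annovar.pl output before relying on it.

```python
def vcf_to_annovar(pos, ref, alt):
    """Convert a VCF (pos, ref, alt) triple to ANNOVAR-style (start, end, ref, alt).

    Assumption: a single alternate allele, and for indels the shorter allele
    is a prefix of the longer one (the usual VCF padding-base layout).
    """
    if len(ref) == len(alt):
        # SNV or same-length substitution: coordinates carry over unchanged.
        return pos, pos + len(ref) - 1, ref, alt
    if len(ref) > len(alt) and ref.startswith(alt):
        # Deletion: drop the shared leading bases and shift the start past them.
        start = pos + len(alt)
        end = pos + len(ref) - 1
        return start, end, ref[len(alt):], "-"
    if len(alt) > len(ref) and alt.startswith(ref):
        # Insertion: anchored at the last shared base; the reference allele is empty.
        start = pos + len(ref) - 1
        return start, start, "-", alt[len(ref):]
    # Anything more complex is returned unconverted in this sketch.
    return pos, pos + len(ref) - 1, ref, alt


if __name__ == "__main__":
    print(vcf_to_annovar(100, "A", "G"))   # SNV
    print(vcf_to_annovar(100, "AT", "A"))  # deletion of T
    print(vcf_to_annovar(100, "A", "AT"))  # insertion of T
```

Because the position and alleles both change for indels, the converted values cannot be used to look the variant up in the VCF again, which is why keeping the original VCF fields as a key is useful.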

        Here is my general approach:
        1. Expand the VCF file so that each line has only one alternate allele.
        2. Run convert2annovar.pl with --includeinfo --allallele. I retain the original chromosome, position, ref_allele, and alt_allele to use as a key, so that the ANNOVAR output can be matched to the correct VCF line.
        3. Run annotate_variation.pl with the --separate flag, to get all possible transcripts.
        4. Add the ANNOVAR output back into the VCF file using the chr:pos:ref:alt key. I reformat the ANNOVAR output to be a bit more structured. I also keep track of alternate alleles, so the final file has multiple alternate alleles per line, with the ANNOVAR annotations in the matching order.
        5. I use a custom JSON file to parse the ANNOVAR outputs, so that alternate alleles are recognized and split by VarSifter, and gene names and variant types (stop, nonsyn, etc.) are pulled out from each transcript. (I can provide an example, but it completely depends on how the ANNOVAR info is formatted in the VCF file.)
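
Steps 1 and 4 above can be sketched roughly as follows. This is not the author's script, which has not been released; the function names, the `ANNOVAR` INFO tag, and the `annotations` dictionary (chr:pos:ref:alt key mapped to an annotation string) are all hypothetical. The split is deliberately naive: it does not adjust per-allele INFO fields or genotype columns.

```python
def expand_alleles(vcf_line):
    """Step 1: yield one VCF line per alternate allele (naive split of the ALT column)."""
    fields = vcf_line.rstrip("\r\n").split("\t")
    for alt in fields[4].split(","):
        yield "\t".join(fields[:4] + [alt] + fields[5:])


def variant_key(vcf_line):
    """Build the chr:pos:ref:alt key from a single-allele VCF line."""
    chrom, pos, _id, ref, alt = vcf_line.split("\t")[:5]
    return f"{chrom}:{pos}:{ref}:{alt}"


def annotate(vcf_line, annotations, tag="ANNOVAR"):
    """Step 4: add per-allele annotations to the INFO column, in ALT order.

    Alleles without an annotation get "." so the order stays aligned with ALT.
    """
    fields = vcf_line.rstrip("\r\n").split("\t")
    values = [annotations.get(variant_key(single), ".")
              for single in expand_alleles(vcf_line)]
    entry = f"{tag}={','.join(values)}"
    fields[7] = entry if fields[7] == "." else f"{fields[7]};{entry}"
    return "\t".join(fields)


if __name__ == "__main__":
    line = "chr1\t100\t.\tA\tG,T\t50\tPASS\tDP=10"
    annotations = {"chr1:100:A:G": "nonsyn"}  # made-up annotation value
    print(annotate(line, annotations))
```

Annotation values placed in INFO this way must not contain commas, semicolons, or whitespace, so real ANNOVAR output would need to be re-encoded first.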

        snpEFF might be an easier path to take, as it reads and writes VCF files. Be aware that a new version of snpEFF has been released that changed the output, so you'll have to modify the "snpEFF.vs.json" file as follows (until I release a new VarSifter version):

        Line 38:
        < "Gene_name": 5
        ---
        > "Gene_name": 6
