SEQanswers

Old 11-24-2012, 08:38 AM   #1
genomics_search
Junior Member
 
Location: USA

Join Date: Nov 2012
Posts: 2
Problem in opening ANNOVAR output file in VarSifter

Hi all,

I ran ANNOVAR and got the output from it. It gives output in CSV format.
Now I need to filter the variants using VarSifter, which takes its input in VCF format.

The problem is that even after adding the header values to the file and saving it in VCF format, it does not open in VarSifter. It says "Data line column count is less than required, make sure that text is tab delimited".
VCF file: https://www.dropbox.com/s/wtns0ots93...G02023.flt.vcf
Annovar output: https://www.dropbox.com/s/hi32mv99no...me_summary.csv

I am attaching my vcf file and annovar output in csv format.

If anyone can help me in this matter, I will be really thankful.


Thanks.
Old 11-29-2012, 01:17 PM   #2
JKTeer
Junior Member
 
Location: US

Join Date: Nov 2012
Posts: 4

Hi,
This error results from having too few columns in the position lines. The VCF format requires that 8 columns be present (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO; when no info is present, a dot "." should be used). These columns need to be separated by the tab character (not just a space).
How did you add the ANNOVAR output to the VCF file?
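
A quick way to find the offending lines is to count tab-separated columns yourself. This is only a minimal sketch, not VarSifter's own check: the function name `find_short_lines` and the example data are made up for illustration, and it only tests the column count, not whether the values are valid.

```python
import io

VCF_FIXED_COLUMNS = 8  # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO


def find_short_lines(handle, min_columns=VCF_FIXED_COLUMNS):
    """Yield (line_number, column_count, line) for data lines with too few
    tab-separated columns. Hypothetical helper, not part of any tool."""
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        column_count = len(line.split("\t"))
        if column_count < min_columns:
            yield line_number, column_count, line


if __name__ == "__main__":
    # Made-up example: the last line uses spaces instead of tabs.
    example = io.StringIO(
        "##fileformat=VCFv4.1\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "chr1\t100\t.\tA\tG\t50\tPASS\t.\n"
        "chr1 200 . C T 50 PASS .\n"
    )
    for number, count, text in find_short_lines(example):
        print(f"line {number}: {count} column(s): {text!r}")
```

For a real file, pass `open("your_file.vcf")` instead of the `io.StringIO` example. A space-delimited or comma-delimited line shows up as a single column.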
Old 12-20-2012, 09:43 PM   #3
jfb
Junior Member
 
Location: SF bay area

Join Date: Nov 2011
Posts: 7

Is there a follow-up to this? I also have a VCF file and an ANNOVAR output file, and I'm wondering whether I have to write my own script to stitch them together or whether there is an existing tool for doing this. Thanks.
Old 01-15-2013, 05:15 AM   #4
JKTeer
Junior Member
 
Location: US

Join Date: Nov 2012
Posts: 4

I don't know of any existing tools to insert ANNOVAR output back into a VCF file. My own scripts follow the ANNOVAR author's suggestion of using the VCF line as a comment in the ANNOVAR input, which can then be used to match each ANNOVAR output record to the right VCF line. Although I would like to release my scripts at some point, they need some work and testing.
The main challenge I have found is the conversion of coordinate systems from VCF to ANNOVAR and back to VCF.
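
To illustrate the coordinate issue: VCF writes indels with a leading anchor base, while ANNOVAR input uses start/end coordinates with "-" for the missing allele. The sketch below is an assumption about that convention for simple cases, not a copy of what convert2annovar.pl does; the function name `vcf_to_annovar` is made up, and it expects a single alternate allele.

```python
def vcf_to_annovar(pos, ref, alt):
    """Convert a VCF (POS, REF, ALT) triple with ONE alternate allele into an
    ANNOVAR-style (start, end, ref, alt) tuple. Illustrative sketch only."""
    if ref == alt:
        raise ValueError("REF and ALT are identical; nothing to convert")

    # Drop leading bases shared by REF and ALT (for indels, the VCF anchor base).
    shared = 0
    while shared < len(ref) and shared < len(alt) and ref[shared] == alt[shared]:
        shared += 1
    ref, alt = ref[shared:], alt[shared:]
    pos += shared

    if not ref:
        # Pure insertion: reported at the base just before the inserted sequence.
        return pos - 1, pos - 1, "-", alt

    # SNV, deletion, or block substitution: end covers the remaining REF bases.
    end = pos + len(ref) - 1
    return pos, end, ref, (alt or "-")


if __name__ == "__main__":
    print(vcf_to_annovar(100, "A", "G"))    # single-base substitution
    print(vcf_to_annovar(100, "ACT", "A"))  # deletion of CT
    print(vcf_to_annovar(100, "A", "ACT"))  # insertion of CT
```

Going back the other way needs the anchor base, which is no longer in the ANNOVAR record. That is one reason to carry the original VCF fields along as a key rather than trying to reconstruct them.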

Here is my general approach:
1. Expand VCF file so that each line has only one alternate allele.
2. Run convert2annovar.pl with --includeinfo --allallele. I retain the original chromosome, position, ref_allele, and alt_allele to use as a key (to match the ANNOVAR output to the correct VCF line).
3. Run annotate_variation.pl with --separate flag, to get all possible transcripts.
4. Add ANNOVAR output back into VCF file using chr:pos:ref:alt key. I reformat the ANNOVAR output to be a bit more structured. I also keep track of alternate alleles, so the final file has multiple alternate alleles per line, with the correctly ordered ANNOVAR annotations.
5. I use a custom JSON file to parse the ANNOVAR outputs, so that alternate alleles are recognized and split by VarSifter, and gene names and variant type (stop, nonsyn, etc.) are pulled out from each transcript. (I can provide an example, but it completely depends on how the ANNOVAR info is formatted in the VCF file.)
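
Steps 1 and 4 above could be sketched roughly as follows. These are not the scripts mentioned in this post: the function names, the `ANNOVAR` INFO tag, and the `annotations` dictionary (key to annotation string) are all hypothetical. The sketch also assumes the annotation strings are already formatted without commas, semicolons, or spaces, and it does not adjust per-allele INFO fields or genotype columns when splitting.

```python
def variant_key(chrom, pos, ref, alt):
    """Build the chr:pos:ref:alt key from the ORIGINAL VCF fields."""
    return f"{chrom}:{pos}:{ref}:{alt}"


def split_alleles(vcf_line):
    """Step 1: yield one VCF line per alternate allele."""
    fields = vcf_line.rstrip("\n").split("\t")
    for alt in fields[4].split(","):
        yield "\t".join(fields[:4] + [alt] + fields[5:])


def annotate_line(vcf_line, annotations, info_tag="ANNOVAR"):
    """Step 4: append one annotation per ALT allele, in ALT order, to INFO.
    Alleles without an annotation get "." as a placeholder."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], fields[1], fields[3]
    per_allele = [
        annotations.get(variant_key(chrom, pos, ref, alt), ".")
        for alt in fields[4].split(",")
    ]
    entry = f"{info_tag}=" + ",".join(per_allele)
    # Replace an empty INFO ("."), otherwise append with ";".
    fields[7] = entry if fields[7] in (".", "") else fields[7] + ";" + entry
    return "\t".join(fields)


if __name__ == "__main__":
    line = "chr1\t100\t.\tA\tG,T\t50\tPASS\tDP=10"
    for single in split_alleles(line):
        print(single)
    # Made-up annotations keyed by chr:pos:ref:alt.
    made_up = {
        "chr1:100:A:G": "nonsynonymous|GENE1",
        "chr1:100:A:T": "stopgain|GENE1",
    }
    print(annotate_line(line, made_up))
```

Keeping the annotations in ALT order is what lets the final file carry multiple alternate alleles per line. A real VCF would also need a matching ##INFO header line for whatever tag is used.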

SnpEff might be an easier path to take, as it reads and writes VCF files. Be aware that a new version of SnpEff has been released that changed the output, so you'll have to modify the "snpEFF.vs.json" file as follows (until I release a new VarSifter version):

Line 38:
< "Gene_name": 5
---
> "Gene_name": 6
Tags
annovar, vcf format, vcf parsing

All times are GMT -8.