Are there any ANNOVAR experts out there?
I've got some VCFs generated from Bos taurus exome sequence data, and I'd like to annotate them. Following the directions on the ANNOVAR website, I managed to get the basic functionality working. My variant_function output looks fine, but my exonic_variant_function output has only unknowns for everything.
In my log, I have a ton of lines similar to:
WARNING: Cannot identify sequence for NM_001017937 (starting from chr19:52198616)
I'm assuming this is the result of some failed downloads. The most suspicious to me are chromFa.zip and chromFa.tar.gz, which also show up as failed in the documentation on ANNOVAR's site. These files don't exist on the UCSC website, even though the documentation uses them as examples of how to download files via wget and a few other tools.
So, does anyone have suggestions for creating these files myself, or for convincing ANNOVAR to use a reference file that hasn't been split up by chromosome?
My db build script looks like this:
#!/bin/bash
VERSION=6
export PATH=/scratch/lilab/cow/annovar/annovar/:$PATH
DB=bosTauDB$VERSION
DBV=bosTau$VERSION
#annotate_variation.pl -downdb -buildver bosTau$VERSION -webfrom annovar refGene gene knownGene bosTauDB$VERSION
annotate_variation.pl -downdb -buildver $DBV gene $DB/
annotate_variation.pl -downdb -buildver $DBV -webfrom annovar refGene $DB/
annotate_variation.pl -buildver $DBV -downdb seq $DB/${DBV}_seq
retrieve_seq_from_fasta.pl $DB/${DBV}_refGene.txt -seqfile $DB/${DBV}_seq -format refGene -outfile $DB/${DBV}_refGeneMrna.fa
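For what it's worth, here is a rough sketch of the "create the files myself" route I have in mind. It assumes I can get a single whole-genome FASTA for this build (the file name bosTau6.fa below is hypothetical) with one record per chromosome and headers like ">chr19". The function name split_fasta is my own, and I have not confirmed that ANNOVAR will accept files named and laid out this way, so treat it as an untested idea rather than a fix.

```shell
#!/bin/bash
# Sketch: split a multi-record FASTA into one file per sequence.
# Assumptions (mine, not from the ANNOVAR docs): headers look like
# ">chr19 optional description", and each record should end up in
# <out_dir>/<name>.fa, e.g. chr19.fa.
split_fasta() {
    local genome_fa=$1 out_dir=$2
    mkdir -p "$out_dir"
    awk -v dir="$out_dir" '
        /^>/ {
            # New record: close the previous file and start a new one
            if (out) close(out)
            name = substr($1, 2)        # first word of header, minus ">"
            out = dir "/" name ".fa"
        }
        out { print > out }             # header and sequence lines alike
    ' "$genome_fa"
}

# Hypothetical usage, reusing the variables from my build script:
#   split_fasta bosTau6.fa "$DB/${DBV}_seq"
```

If that works, the directory it fills would stand in for the one that the "-downdb seq" step was supposed to create.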
Any tips are very welcome, even from folks using other non-human references (I suspect human annotations don't have these sorts of problems).