Seqanswers Leaderboard Ad

**Jane M** · 11-28-2011, 03:06 AM

I found some mistakes in my file: first, the positions to give in the columns 2 and 3 are the position(s) of the mutations (I gave the positions of the impacted exon). Moreover, I provided the wild and mut amino acids instead of the wild and mut nucleotides.

I fixed these problems:

1 6589165 6589165 G T

but I still have the same error message ...

**Jane M** · 11-28-2011, 03:49 AM

I finally found my (big) mistake ! I forgot to provide the "humandb"directory !

perl ./annotate_variation.pl -geneanno VICJE_nonsense211011/VICJE_nonsense humandb
NOTICE: The --buildver is set as 'hg18' by default
NOTICE: Reading gene annotation from humandb/hg18_refGene.txt ... Done with 38787 transcripts (including 6152 without coding sequence annotation) for 23270 unique genes
NOTICE: Finished gene-based annotation on 5 genetic variants in VICJE_nonsense211011/VICJE_nonsense (including 4 with invalid format written to VICJE_nonsense211011/VICJE_nonsense.invalid_input)
NOTICE: Output files were written to VICJE_nonsense211011/VICJE_nonsense.variant_function, VICJE_nonsense211011/VICJE_nonsense.exonic_variant_function

I still have one problem: by default, hg18 is used. My alignment have been done on hg19. Do you know how to change it?

**boetsie** · 11-28-2011, 04:22 AM

I just wanted to say that

To change the database variant to hg19, type
-buildver hg19
in your command. You have to download the hg19 databases for the refGene though, if you do not have this already.

Boetsie

Originally posted by Jane M View Post

I finally found my (big) mistake ! I forgot to provide the "humandb"directory !

I still have one problem: by default, hg18 is used. My alignment have been done on hg19. Do you know how to change it?

**Jane M** · 11-28-2011, 07:53 AM

Thanks ! It's working!
Now I am using my complete list and the difficulty is to create the input file automatically. But it's no more a software problem !
I will probably have more questions when getting the complete results !

**boetsie** · 11-28-2011, 07:56 AM

Good that it is working! I wrote myself a script to convert CLCBio's SNP's to Annovar's format. That's maybe the best thing to do.

Good luck!

Originally posted by Jane M View Post

Thanks ! It's working!
Now I am using my complete list and the difficulty is to create the input file automatically. But it's no more a software problem !
I will probably have more questions when getting the complete results !

**Jane M** · 11-29-2011, 03:12 AM

Hello,

I managed to run ANNOVAR on all my data set

, but I have several questions concerning the results:

1) When running:

Code:

perl ./annotate_variation.pl -geneanno $dir/$file humandb -buildver hg19

to get an annotation of the variants by functional consequences on genes, I got:

NOTICE: Reading gene annotation from humandb/hg19_refGene.txt ... Done with 38781 transcripts (including 6123 without coding sequence annotation) for 23261 unique genes
NOTICE: Reading FASTA sequences from humandb/hg19_refGeneMrna.fa ... Done with 280 sequences
WARNING: A total of 319 sequences will be ignored due to lack of correct ORF annotation

---------------------------------------------------------------------------------------
WARNING: 83 exonic SNPs have WRONG reference alleles specified in your input file!
WARNING: An example input line is <14 60950509 60950509 A G>
WARNING: ANNOVAR can still annotate exonic_variant_function for the mutation correctly!
WARNING: you may have used wrong -buildver, or specified incorrect reference allele, or used outdated mRNA FASTA file!
---------------------------------------------------------------------------------------

NOTICE: Finished gene-based annotation on 179 genetic variants in VICJE_missense211011/VICJE_missense211011_file

Is it possible that the 83 (among 179) results concerning these SNPs in the file are wrong?

2) I am interested in the mutations occurring in the well conserved regions. So I am using:

Code:

perl annotate_variation.pl -regionanno -dbtype mce46way $dir/$file humandb -build hg19

And I got file.hg19_phastConsElements46way:

Code:

mce46way	Score=243;Name=lod=13	1	6589165	6589165	G	T
mce46way	Score=504;Name=lod=149	1	36915872	36915872	C	A

I looked through the annovar website and I didn't find the description of the output file... Did I miss it? I don't know how to interpret the score and the name fields...
Can I say that each mutation presents in this file take places in a region conserved in at least 46 species?

3) I tried to identify a list of "benign" non-synonymous variants. Again, I haven't found any description of the result files, especially in the file.hg18_avsift_dropped file, which contains the "benign" non-synonymous variants.

In my case for example, I get:

Code:

avsift	0.32	1	152191890	152191890	A	G
avsift	0.82	1	158990190	158990190	T	G

Does (AV)SIFT stands for Scale-invariant feature transform?

I hope someone can help me on any of these points! Thanks!

**boetsie** · 11-29-2011, 03:21 AM

Originally posted by Jane M View Post

1) When running:

Code:

perl ./annotate_variation.pl -geneanno $dir/$file humandb -buildver hg19

to get an annotation of the variants by functional consequences on genes, I got:

Is it possible that the 83 (among 179) results concerning these SNPs in the file are wrong?

See the FAQ of Annovar:

404 Not Found

http://www.openbioinformatics.org/annovar/annovar_faq.html#wrongref

2) I am interested in the mutations occurring in the well conserved regions. So I am using:

Code:

perl annotate_variation.pl -regionanno -dbtype mce46way $dir/$file humandb -build hg19

And I got file.hg19_phastConsElements46way:

Code:

mce46way	Score=243;Name=lod=13	1	6589165	6589165	G	T
mce46way	Score=504;Name=lod=149	1	36915872	36915872	C	A

I looked through the annovar website and I didn't find the description of the output file... Did I miss it? I don't know how to interpret the score and the name fields...

See this page:
http://www.openbioinformatics.org/an...ar_region.html at the section 'Most conserved element annotation'

3) I tried to identify a list of "benign" non-synonymous variants. Again, I haven't found any description of the result files, especially in the file.hg18_avsift_dropped file, which contains the "benign" non-synonymous variants.

"SIFT is a sequence homology-based tool that sorts intolerant from tolerant amino acid substitutions and predicts whether an amino acid substitution in a protein will have a phenotypic effect. The score ranges from 0 to 1. The amino acid substitution is predicted damaging is the score is <= 0.05, and tolerated if the score is > 0.05." See the help at their website:http://sift.jcvi.org/www/SIFT_help.html

**Jane M** · 11-29-2011, 04:07 AM

Thanks for the links !
I am focusing on the point 2, the well preserved regions.
The description of the output file is

The second column is the normalized score assigned by UCSC Genome Browser, and this score range from 0 to 1000. (Note that the --score_threshold or --normscore_threshold can also be used to filter out specific variants with low conservation scores.) The second column also shows "Name=lod=x" which is used to tell the user the LOD score for the region.

In a way, the score represents the "probability" for a region to be really a preserved region... The range in my data is from 200 to 700. Is there a "common threshold" above which we can say that a region is a preserved region?

Then, the lod score is an other way to measure the "probability" for a region to be really a preserved region... Is it a tool more precise than the simple score?
I read that a "relation" is significant if the lod score is >=3. In my case, I have 115 mutations presents in the output file.
Can I conclude that 115 mutations occur in a significant way in well preserved regions?

I try to see the statistical conclusion, because there is probably a statistical test behind, to test each mutation. The null hypothesis can be "the mutation x is occurring in a conserved region" and the alternative hypothesis "the mutation x is not occurring in a conserved region"...

**Jane M** · 12-01-2011, 07:50 AM

Hello again !

I haven't managed to fix my first problem

:

perl ./annotate_variation.pl -geneanno $dir/$file humandb -build hg19
NOTICE: Reading gene annotation from humandb/hg19_refGene.txt ... Done with 38781 transcripts (including 6123 without coding sequence annotation) for 23261 unique genes
NOTICE: Reading FASTA sequences from humandb/hg19_refGeneMrna.fa ... Done with 280 sequences
WARNING: A total of 319 sequences will be ignored due to lack of correct ORF annotation
---------------------------------------------------------------------------------------
WARNING: 83 exonic SNPs have WRONG reference alleles specified in your input file!
WARNING: An example input line is <14 60950509 60950509 A G>
WARNING: ANNOVAR can still annotate exonic_variant_function for the mutation correctly!
WARNING: you may have used wrong -buildver, or specified incorrect reference allele, or used outdated mRNA FASTA file!

I checked the link pointed out by boetie and the solutions are:

Why ANNOVAR complains "exonic SNPs have WRONG reference alleles " in gene-based annotation?

This happens when ANNOVAR thinks the "reference allele" in your input does not fit the "reference allele" in the mRNA FASTA file in ANNOVAR's database. This could be due to several reason, (1) wrong -buildver, or (2) you did not specify the correct reference allele, or (3) mRNA FASTA file is outdated as the gene model gets updated pretty quickly by UCSC.

To solve this problem, first check (1) and (2) to make sure that you did have the correct input. If you cannot find an error, then update the FASTA file by retrieve_seq_from_fasta.pl command, with more details here.

(1) wrong buildver: the sequencing have not been done in the lab but we were told that hg19 was used. That's why I'm using it.
I tried to use hg18 and I got:

perl ./annotate_variation.pl -geneanno $dir/$file humandb -build hg18
NOTICE: Reading gene annotation from humandb/hg18_refGene.txt ... Done with 38787 transcripts (including 6152 without coding sequence annotation) for 23270 unique genes
NOTICE: Reading FASTA sequences from humandb/hg18_refGeneMrna.fa ... Done with 6 sequences
WARNING: A total of 373 sequences will be ignored due to lack of correct ORF annotation
---------------------------------------------------------------------------------------
WARNING: 2 exonic SNPs have WRONG reference alleles specified in your input file!
WARNING: An example input line is <5 140203159 140203159 C T>
WARNING: ANNOVAR can still annotate exonic_variant_function for the mutation correctly!
WARNING: you may have used wrong -buildver, or specified incorrect reference allele, or used outdated mRNA FASTA file!

There are only 2 SNPs with wrong alleles. Is it by chance that I have less WRONG reference alleles with hg18 than hg19, whereas hg19 is supposed to be the one used?

(2) correct input. Well, I cannot be sure about that, because we got the reference codons and the mutated codons. I've just selected the reference nucleotides and the mutated nucleotides.

(3) mRNA FASTA file is outdated: I installed ANNOVAR (the version of 11/22) few days ago so I doubt that this file is outdated...
I would like to try anyway, just in case. In the website, an example is given:

annotate_variation.pl --buildver panTro2 --downdb seq chimpdb/panTro2_seq
retrieve_seq_from_fasta.pl chimpdb/panTro2_refGene.txt -seqdir chimpdb/panTro2_seq -format refGene -outfile chimpdb/panTro2_refGeneMrna.fa

So I tried:

perl annotate_variation.pl -build hg19 --downdb seq humandb/hg19_seq
perl retrieve_seq_from_fasta.pl humandb/hg19_refGene.txt -seqdir humandb/human_seq -format refGene -outfile humandb/humandb_refGeneMrna.fa

but the first command didn't work...

perl annotate_variation.pl -build hg19 --downdb seq humandb/hg19_seq
NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
NOTICE: Downloading annotation database ftp://hgdownload.cse.ucsc.edu/golden...ps/chromFa.zip ... Failed
NOTICE: Downloading annotation database ftp://hgdownload.cse.ucsc.edu/golden...chromFa.tar.gz ... ^C

Does someone have a suggestion for my issue???

**Heisman** · 12-01-2011, 04:47 PM

Another update to ANNOVAR was put online 3 days ago that fixes the warning about exonic snps having the wrong reference, so try downloading the latest version and doing it again. You can always plug your reference coordinates into UCSC and see if the bases match the reference bases you think are correct.

For your last download command, I think it should end with humandb/ and nothing after the "/".

**Jane M** · 12-02-2011, 01:05 AM

Thanks Heisman. I am looking for the newest version (2011 Nov 28), but I cannot see a link on the website. I can download the 2011 Nov 20 only. Can you indicate where I can download it please?

I checked on UCSC some of the nucleotides which are given as my reference nucleotides and they do not agree with the UCSC database. But, they agree with the NCBI database (when checking the coding sequences only)...

My command line is still not working...

perl annotate_variation.pl -build hg19 --downdb seq humandb
NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
NOTICE: Downloading annotation database ftp://hgdownload.cse.ucsc.edu/golden...ps/chromFa.zip ... Failed
NOTICE: Downloading annotation database ftp://hgdownload.cse.ucsc.edu/golden...chromFa.tar.gz ... ^C

So I cannot run

perl retrieve_seq_from_fasta.pl humandb/hg19_refGene.txt -seqdir humandb/human_seq -format refGene -outfile humandb/humandb_refGeneMrna.fa

**Heisman** · 12-02-2011, 06:21 AM

Same download link as before. Read the first couple of paragraphs: http://www.openbioinformatics.org/an..._download.html

A lot of the annovar databases are based on UCSC, I think, so that could give you problems if there is no option to specify that correctly in annovar.

As for : [code=perl annotate_variation.pl -build hg19 --downdb seq humandb[/url]

Where on the website does it describe downloading "seq"? I cannot find that.

**Jane M** · 12-02-2011, 08:04 AM

Ok, so in fact I am using the new version of Nov 28 !
I thought it was the one of Nov 22 because of this

ANNOVAR (2011Nov20 version, ~79Mb) can be downloaded here (a mirror site here). This file was slightly updated a few days later to fix a bug in not printing out het/hom for indels for VCF files in convert2annovar.pl. Note that you can just run "annotate_variation.pl -downdb null ." to check if a new major release of ANNOVAR is available. Updated 2011Nov28: a slightly modifed version is uploaded that fixed the problem of complaining "new version is available" in the 2011Nov20 upload and that contains a slightly updated convert2annovar.pl (contributed by Dr. Hiroyuki MISHIMA) fixed issues in handling certain indels in 500SOLiD-LifeScope GFF3 files.

but I checked the modification time of the perl files.
Finally, I will try to understand why the references are different between UCSC and NCBI. Maybe I am misunderstanding something...

Thanks for your help !

**Jane M** · 12-13-2011, 02:08 AM

Hello,

I still have the problem with the warning message: "83 exonic SNPs have WRONG reference alleles specified in your input file".
I think this is due to the fact that I'm using the information obtained in reverse sense:

Code:

SNP position chr         strand    wild.codon    mut.codon
60950509	chr14         -            AAA               GAA
78177183	chr14         +           CAG               AAG

Therefore, in the database, there is no AAA, but rather TTT in this example. Is there a way to use the information obtained in reverse sense in ANNOVAR? Or do I have to turn it as "direct sense" myself ?

Thanks

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 30 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 32 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 53 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

problem for running ANNOVAR

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News