#1 |
Member
Location: China Join Date: Nov 2010
Posts: 11
Hello everyone,
I would like to determine whether my called SNPs fall in coding regions and whether they affect the protein sequence, so I am using ANNOVAR for annotation. However, my target species is maize, which does not even have a UCSC-type annotation database. I therefore think I should convert my GFF3 maize annotation file to a UCSC-type file. Could you give me any suggestions about the format of the UCSC-type file, or any other ideas for annotating maize SNPs?

The file "hg18_refGene.txt" in the ANNOVAR example database contains this row:

585 NR_028269 chr1 - 4224 7502 7502 7502 7 4224,4832,5658,6469,6719,7095,7468, 4692,4901,5810,6631,6918,7231,7502, 0 LOC100288778 unk unk -1,-1,-1,-1,-1,-1,-1,

What is the meaning of each field in this row?

Thank you in advance.
Best wishes,
Xujie
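For reference, the example row can be split into named fields. This is only a minimal Python sketch: it assumes the 16-column refGene layout given in the ANNOVAR documentation (quoted in the reply in this thread), and the helper name `label_refgene_row` is hypothetical, not part of ANNOVAR.

```python
# Column names for a refGene line, in order, as listed in the ANNOVAR documentation.
REFGENE_COLUMNS = [
    "bin", "name", "chr", "dbstrand", "txstart", "txend",
    "cdsstart", "cdsend", "exoncount", "exonstart", "exonend",
    "id", "name2", "cdsstartstat", "cdsendstat", "exonframes",
]


def label_refgene_row(line):
    """Split one whitespace-delimited refGene row into a dict keyed by column name.

    Hypothetical helper for illustration only.
    """
    fields = line.split()
    if len(fields) != len(REFGENE_COLUMNS):
        raise ValueError(f"expected 16 columns, got {len(fields)}")
    row = dict(zip(REFGENE_COLUMNS, fields))
    # Coordinates and the exon count are integers.
    for key in ("txstart", "txend", "cdsstart", "cdsend", "exoncount"):
        row[key] = int(row[key])
    # Exon starts and ends are comma-delimited lists with a trailing comma.
    for key in ("exonstart", "exonend"):
        row[key] = [int(x) for x in row[key].rstrip(",").split(",")]
    return row


example = (
    "585 NR_028269 chr1 - 4224 7502 7502 7502 7 "
    "4224,4832,5658,6469,6719,7095,7468, "
    "4692,4901,5810,6631,6918,7231,7502, "
    "0 LOC100288778 unk unk -1,-1,-1,-1,-1,-1,-1,"
)
row = label_refgene_row(example)
for column in REFGENE_COLUMNS:
    print(f"{column}: {row[column]}")
```

Read this way, the row describes transcript NR_028269 on the minus strand of chr1, spanning 4224 to 7502, with 7 exons whose start and end positions are paired by index.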
#2 |
Senior Member
Location: US Join Date: Jan 2009
Posts: 392
To get ANNOVAR to work with Arabidopsis, I had to build my own database from scratch, just as you will have to.

The fields are described on the ANNOVAR website:

For refGene file, each line has 16 tab-delimited columns: $bin, $name, $chr, $dbstrand, $txstart, $txend, $cdsstart, $cdsend, $exoncount, $exonstart, $exonend, $id, $name2, $cdsstartstat, $cdsendstat, $exonframes. The only really important fields are $name (transcript name), $chr (chromosome), $dbstrand (strand of the transcript in the reference genome), $txstart, $txend (transcription start and end), $cdsstart, $cdsend (translation start and end; remember that each transcript has 5' and 3' UTRs, so $cdsstart is not the same as $txstart), $exoncount (number of exons), and $exonstart, $exonend (comma-delimited exon start and end sites). Remember that all start sites use zero-based coordinates.

http://www.openbioinformatics.org/an...ml#othergenome

You can start by running gff3ToGenePred or gtfToGenePred (found here: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/) on your GFF3 or GTF file. The $bin, $id, $name2, $cdsstartstat, $cdsendstat, and $exonframes columns are not critical for ANNOVAR to function, but you will need something in those columns as filler for it to work.

Here is a sample of what my Arabidopsis refGene file looks like:

Code:
1 AT5G01010.4 Chr5 - 1222 5061 1387 4924 16 1222,1571,1744,1913,2104,2434,2747,2871,3302,3542,3761,3926,4101,4334,4551,4764, 1459,1646,1780,2007,2181,2509,2799,2934,3383,3659,3802,4005,4237,4467,4679,5061, name unk unk unk unk
1 AT5G01010.1 Chr5 - 1250 5043 1387 4924 15 1250,1571,1744,1913,2434,2747,2871,3302,3542,3761,3926,4101,4334,4551,4764, 1459,1646,1780,1961,2509,2799,2934,3383,3659,3802,4005,4258,4467,4679,5043, name unk unk unk unk
1 AT5G01010.2 Chr5 - 1278 4994 1387 4924 16 1278,1571,1744,1913,2104,2434,2747,2871,3302,3542,3761,3926,4101,4334,4551,4764, 1459,1646,1780,2007,2181,2509,2799,2934,3383,3659,3802,4005,4258,4467,4679,4994, name unk unk unk unk
1 AT5G01010.3 Chr5 - 1278 5043 1526 4924 14 1278,1744,1913,2434,2747,2871,3302,3542,3761,3926,4101,4334,4551,4764, 1646,1780,1961,2509,2799,2934,3383,3659,3802,4005,4258,4467,4679,5043, name unk unk unk unk
1 AT5G01015.1 Chr5 - 5255 5891 5334 5769 2 5255,5696, 5576,5891, name unk unk unk unk
1 AT5G01015.2 Chr5 - 5366 5801 5515 5769 2 5366,5686, 5576,5801, name unk unk unk unk

It's also important to know that you need to name your database files "hg18_refGene.txt" and so on. Either that, or go into annotate_variation.pl and replace every instance of hg18 with the name of your database files. In my case I replaced hg18 with TAIR10. Otherwise ANNOVAR will complain that it cannot find the right files.

Last edited by chadn737; 02-23-2012 at 09:37 PM.
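The filler step described in this post can be sketched in Python. This is a rough illustration under stated assumptions, not a tested pipeline: it assumes the converter output is tab-delimited with either 10 columns (plain genePred) or 15 columns (extended genePred), the function name `genepred_to_refgene` is hypothetical, and the filler values are simply copied from the sample rows above.

```python
# Filler values taken from the sample rows in this post.
FILLER_BIN = "1"
# Order: id, name2, cdsstartstat, cdsendstat, exonframes
FILLER_TAIL = ["name", "unk", "unk", "unk", "unk"]


def genepred_to_refgene(line):
    """Turn one tab-delimited genePred line into a 16-column refGene-style line.

    Hypothetical helper: the 10- and 15-column input layouts are assumptions.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 10:
        # Plain genePred: add the bin column in front and five filler columns at the end.
        fields = [FILLER_BIN] + fields + FILLER_TAIL
    elif len(fields) == 15:
        # Extended genePred: only the leading bin column is missing.
        fields = [FILLER_BIN] + fields
    else:
        raise ValueError(f"expected 10 or 15 columns, got {len(fields)}")

    # Basic sanity checks on the exon columns.
    exoncount = int(fields[8])
    starts = [int(x) for x in fields[9].rstrip(",").split(",")]
    ends = [int(x) for x in fields[10].rstrip(",").split(",")]
    if not (len(starts) == len(ends) == exoncount):
        raise ValueError(f"{fields[1]}: exon count does not match exon lists")
    if any(s >= e for s, e in zip(starts, ends)):
        raise ValueError(f"{fields[1]}: exon start must be smaller than exon end")
    return "\t".join(fields)


# A 10-column genePred line built from the AT5G01015.1 sample row.
genepred = "\t".join([
    "AT5G01015.1", "Chr5", "-", "5255", "5891", "5334", "5769", "2",
    "5255,5696,", "5576,5891,",
])
refgene = genepred_to_refgene(genepred)
print(refgene)
```

Applying the function to every line of the converter output and writing the results to a file named after your genome build (TAIR10 in this post) would give a file with the same shape as the sample.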
#3 | |
Member
Location: China Join Date: Nov 2010
Posts: 11