
**gaffa** · 02-10-2011, 04:13 PM

Try UniProt's online conversion service: http://www.uniprot.org -> "ID Mapping" tab

**Richard Finney** · 02-10-2011, 04:30 PM

NCBI maintains flat files of gene annotations that contain the information you're after:
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz
[ There are other interesting files in that directory ]

The tax_id (taxonomy ID) for C. elegans is 6239 [ from the Taxonomy browser http://www.ncbi.nlm.nih.gov/taxonomy ]

You can type "wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz" on the command line, or download the files via a browser.

Example using this data (after uncompressing the files with gunzip):
bash-3.00$ cat gene2refseq | awk '{if ($1==6239) print $0}' | head
6239 171590 REVIEWED NM_058260.3 193203640 NP_490660.1 17510631 NC_003279.6 193203938 4123 10231 - -
6239 171591 REVIEWED NM_058259.3 193203639 NP_490661.1 17510629 NC_003279.6 193203938 11498 16830 + -
6239 171592 REVIEWED NM_058261.3 133902001 NP_490662.1 17510633 NC_003279.6 193203938 17496 26780 - -
6239 171592 REVIEWED NM_058262.3 86561628 NP_490663.1 17510635 NC_003279.6 193203938 17496 26780 - -
6239 171593 REVIEWED NM_058263.3 115533565 NP_490664.2 115533566 NC_003279.6 193203938 27594 32481 - -
6239 171594 REVIEWED NM_058265.3 71995026 NP_490666.2 25143331 NC_003279.6 193203938 49918 54359 + -
6239 171595 REVIEWED NM_058267.4 115533567 NP_490668.4 115533568 NC_003279.6 193203938 55315 64020 - -
6239 171597 REVIEWED NM_058269.2 71995034 NP_490670.1 17510145 NC_003279.6 193203938 85044 86283 - -
6239 171599 REVIEWED NM_058271.6 212645149 NP_490672.2 25143337 NC_003279.6 193203938 93030 94880 + -
6239 171600 REVIEWED NM_058272.4 212645150 NP_490673.1 17510147 NC_003279.6 193203938 96478 100612 - -
-bash-3.00$ cat gene_info | grep 171590 | awk '{if ($1==6239) print $0}'
6239 171590 Y74C9A.3 Y74C9A.3 - WormBase:WBGene00022277 I - hypothetical protein protein-coding - - - - 20101017
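
For anyone who would rather script this lookup, here is a minimal Python sketch of the same awk filter. It assumes gene2refseq is tab-delimited with tax_id in column 1, GeneID in column 2, the RNA accession in column 4 and the protein accession in column 6, as in the output above. The function name `refseq_by_gene`, the inlined sample rows, and the placeholder row with tax_id 0 are only for illustration.

```python
import io
from collections import defaultdict

# Three rows copied from the gene2refseq output above, plus one obviously
# fake placeholder row (tax_id 0) so the filter has something to reject.
_ROWS = [
    "6239 171590 REVIEWED NM_058260.3 193203640 NP_490660.1 17510631 NC_003279.6 193203938 4123 10231 - -",
    "6239 171592 REVIEWED NM_058261.3 133902001 NP_490662.1 17510633 NC_003279.6 193203938 17496 26780 - -",
    "6239 171592 REVIEWED NM_058262.3 86561628 NP_490663.1 17510635 NC_003279.6 193203938 17496 26780 - -",
    "0 0 PLACEHOLDER - - - - - - - - - -",
]
# Assumption: the real file is tab-delimited, so rebuild the tabs here.
SAMPLE = "\n".join(row.replace(" ", "\t") for row in _ROWS) + "\n"


def refseq_by_gene(handle, tax_id):
    """Map GeneID -> list of (RNA accession, protein accession) for one tax_id."""
    result = defaultdict(list)
    for line in handle:
        if line.startswith("#"):  # skip a header line, if present
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[0] != str(tax_id):
            continue
        result[fields[1]].append((fields[3], fields[5]))
    return dict(result)


mapping = refseq_by_gene(io.StringIO(SAMPLE), 6239)
for gene_id, accessions in mapping.items():
    print(gene_id, accessions)
```

To run it on the downloaded file instead of the sample, something like `gzip.open("gene2refseq.gz", "rt")` from the standard library should work as the handle, so there is no need to gunzip first.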

**Fuad** · 02-15-2011, 11:59 AM

DAVID has a Gene ID Conversion tool:

http://david.abcc.ncifcrf.gov/home.jsp

Fuad

**rdu** · 02-15-2011, 01:14 PM

The Bioconductor package "biomaRt" can also do it.

**peachgil** · 02-16-2011, 10:34 AM

In Bioconductor, just use the following code:

> library(org.Hs.eg.db)
> library(annotate)
> lookUp('3815', 'org.Hs.eg', 'SYMBOL')
$`3815`
[1] "KIT"

> lookUp('3815', 'org.Hs.eg', 'REFSEQ')
$`3815`
[1] "NM_000222" "NM_001093772" "NP_000213" "NP_001087241"

**MDonlin** · 02-16-2011, 01:14 PM

You can also do ID conversion using BioMart at EBI.

**jmw86069** · 02-16-2011, 09:06 PM

Always a fan of the Linux one-liner, here is an example for the human ACTB gene using hg18:

mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e "select k2ll.value as entrezGeneId, kx.refseq as refseqMrna, kx.geneSymbol as entrezGeneSymbol, kx.description as entrezGeneDesc from kgXref kx, knownToLocusLink k2ll where k2ll.name=kx.kgID and kx.geneSymbol='ACTB';"

UCSC's C.elegans tables don't include the knownGene and kg% tables, but some poking around ( using "show tables like '%locus%';" ) led me to formulate this MySQL query that takes locusLinkId as input and prints the gene symbol, refseq mRNA, description, etc.

mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D ce6 -e "select rl.locusLinkId, rl.name as geneName, rl.product as geneDescription, rl.mrnaAcc as refseqMrna, rl.protAcc as refseqProt from refLink rl where rl.locusLinkId=174288;"

The bummer is that you have to tell it to use "ce6" -- it isn't generic enough to sniff out which organism and assembly version to use a priori. But you'll know which one to use, right? :-) And you can of course change the "=174288" to "IN (174288, 174289, 174290)" for more of a bulk-input experience, depending upon what you need. If you end up batch-scripting some gene ID conversions, I'd definitely use the "IN" clause instead of querying them one by one. Markedly faster.
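
If you are batch-scripting, the "IN" version of the ce6 query can be generated from a list of IDs. This is a rough Python sketch, not a tested pipeline: the function name `build_reflink_query` is made up for illustration, and it only builds the SQL string (it does not connect to the UCSC server). IDs are forced through `int()` so that nothing other than numbers ends up in the query.

```python
def build_reflink_query(gene_ids):
    """Build one refLink query covering all given Entrez (locusLink) IDs."""
    # int() rejects anything that is not a plain number; the set removes repeats.
    ids = sorted({int(g) for g in gene_ids})
    if not ids:
        raise ValueError("need at least one gene ID")
    id_list = ", ".join(str(i) for i in ids)
    return (
        "select rl.locusLinkId, rl.name as geneName, "
        "rl.product as geneDescription, rl.mrnaAcc as refseqMrna, "
        "rl.protAcc as refseqProt from refLink rl "
        "where rl.locusLinkId IN (" + id_list + ");"
    )


print(build_reflink_query([174288, 174289, 174290]))
```

The printed string can be pasted after `-e` in the mysql command above, so a whole batch goes to the server in one round trip.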

DAVID is in theory a great resource, but could be opened up to increase the API limits, or to allow direct data downloads.

**zaclown** · 03-18-2011, 10:23 AM

Thank you all, guys.

**[email protected]** · 10-11-2016, 06:21 AM

How to do the opposite?

Originally posted by peachgil:

In Bioconductor, just use the following code:

> library(org.Hs.eg.db)
> library(annotate)
> lookUp('3815', 'org.Hs.eg', 'SYMBOL')
$`3815`
[1] "KIT"

> lookUp('3815', 'org.Hs.eg', 'REFSEQ')
$`3815`
[1] "NM_000222" "NM_001093772" "NP_000213" "NP_001087241"

I have a set of HGNC gene symbols, and I want to convert them to Entrez Gene IDs.

Thanks much!
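
One possible way to go in the reverse direction (symbol to Entrez Gene ID) is to invert the gene_info file mentioned earlier in the thread. The Python sketch below assumes gene_info is tab-delimited with tax_id, GeneID and Symbol in the first three columns, and that the human tax_id is 9606 (neither is stated in the thread). The function name `symbol_to_geneid` is hypothetical, and the sample rows carry only those three columns for brevity; the real file has more.

```python
import io
from collections import defaultdict

# Minimal sample: the KIT / 3815 pair and the Y74C9A.3 / 171590 pair both
# appear earlier in the thread. Trailing gene_info columns are omitted here.
SAMPLE_INFO = "9606\t3815\tKIT\n6239\t171590\tY74C9A.3\n"


def symbol_to_geneid(handle, tax_id):
    """Map gene symbol -> list of GeneIDs for one organism."""
    # A list is used because a symbol is not guaranteed to be unique.
    result = defaultdict(list)
    for line in handle:
        if line.startswith("#"):  # skip a header line, if present
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != str(tax_id):
            continue
        result[fields[2]].append(fields[1])
    return dict(result)


lookup = symbol_to_geneid(io.StringIO(SAMPLE_INFO), 9606)
for symbol in ["KIT", "NOT_A_GENE"]:
    print(symbol, lookup.get(symbol, []))
```

Symbols that come back empty are worth checking by hand, since the symbols in gene_info may not match an HGNC list exactly (older aliases, for example).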


entrez ID conversion
