SEQanswers

Old 02-10-2011, 02:38 PM   #1
zaclown
Junior Member
 
Location: USA

Join Date: Sep 2010
Posts: 3
Entrez ID conversion

Hello,

Does anyone know how to convert Entrez Gene IDs to either RefSeq IDs or gene symbols?
I have found resources on RefSeq-to-gene-symbol conversion, but I can't find anything for Entrez IDs.
The genome I work with is C. elegans.
Thanks in advance for any suggestions.
Old 02-10-2011, 03:13 PM   #2
gaffa
Member
 
Location: Gothenburg/Uppsala, Sweden

Join Date: Oct 2010
Posts: 82

Try UniProt's online conversion service: http://www.uniprot.org -> "ID Mapping" tab
Old 02-10-2011, 03:30 PM   #3
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

NCBI maintains flat files of gene annotations that contain the information you're after:
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz
[ There are other interesting files in that directory ]


The tax_id (taxonomy ID) for C. elegans is 6239 [ from the Taxonomy browser, http://www.ncbi.nlm.nih.gov/taxonomy ]

You can run "wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz" from the command line, or download the file via a browser. Decompress it with gunzip before running the commands below.

Example using this data:
bash-3.00$ awk '$1==6239' gene2refseq | head
6239 171590 REVIEWED NM_058260.3 193203640 NP_490660.1 17510631 NC_003279.6 193203938 4123 10231 - -
6239 171591 REVIEWED NM_058259.3 193203639 NP_490661.1 17510629 NC_003279.6 193203938 11498 16830 + -
6239 171592 REVIEWED NM_058261.3 133902001 NP_490662.1 17510633 NC_003279.6 193203938 17496 26780 - -
6239 171592 REVIEWED NM_058262.3 86561628 NP_490663.1 17510635 NC_003279.6 193203938 17496 26780 - -
6239 171593 REVIEWED NM_058263.3 115533565 NP_490664.2 115533566 NC_003279.6 193203938 27594 32481 - -
6239 171594 REVIEWED NM_058265.3 71995026 NP_490666.2 25143331 NC_003279.6 193203938 49918 54359 + -
6239 171595 REVIEWED NM_058267.4 115533567 NP_490668.4 115533568 NC_003279.6 193203938 55315 64020 - -
6239 171597 REVIEWED NM_058269.2 71995034 NP_490670.1 17510145 NC_003279.6 193203938 85044 86283 - -
6239 171599 REVIEWED NM_058271.6 212645149 NP_490672.2 25143337 NC_003279.6 193203938 93030 94880 + -
6239 171600 REVIEWED NM_058272.4 212645150 NP_490673.1 17510147 NC_003279.6 193203938 96478 100612 - -
bash-3.00$ awk '$1==6239 && $2==171590' gene_info
6239 171590 Y74C9A.3 Y74C9A.3 - WormBase:WBGene00022277 I - hypothetical protein protein-coding - - - - 20101017
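
The lookup above can be scripted for a whole list of IDs. What follows is a minimal sketch, not a tested pipeline: it assumes both files have been downloaded and gunzipped, that they are tab-delimited, and that the columns sit where the example output shows them (GeneID in column 2 and symbol in column 3 of gene_info; RefSeq mRNA and protein accessions in columns 4 and 6 of gene2refseq). The function names are made up for illustration.

```shell
# Sketch: map Entrez Gene IDs to symbols / RefSeq accessions from NCBI flat files.
# Assumed (from the example output above), tab-delimited columns:
#   gene_info:   $1 tax_id, $2 GeneID, $3 Symbol
#   gene2refseq: $1 tax_id, $2 GeneID, $4 RefSeq mRNA, $6 RefSeq protein
# Function bodies run in a subshell "( ... )" so their variables stay local.

# entrez_to_symbol TAX_ID GENE_INFO_FILE ID...   (hypothetical helper)
# Prints: GeneID <TAB> Symbol
entrez_to_symbol() (
    tax_id=$1; gene_info=$2
    shift 2
    awk -F'\t' -v tax="$tax_id" -v ids="$*" '
        BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
        $1 == tax && ($2 in want) { print $2 "\t" $3 }
    ' "$gene_info"
)

# entrez_to_refseq TAX_ID GENE2REFSEQ_FILE ID...   (hypothetical helper)
# Prints: GeneID <TAB> mRNA accession <TAB> protein accession
# A gene with several transcripts yields several lines.
entrez_to_refseq() (
    tax_id=$1; gene2refseq=$2
    shift 2
    awk -F'\t' -v tax="$tax_id" -v ids="$*" '
        BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
        $1 == tax && ($2 in want) { print $2 "\t" $4 "\t" $6 }
    ' "$gene2refseq"
)

# Usage (6239 is the C. elegans tax_id given above):
#   entrez_to_symbol 6239 gene_info   171590 171592
#   entrez_to_refseq 6239 gene2refseq 171590 171592
```

Filtering on the tax_id first keeps the match restricted to the organism of interest, and comparing whole fields avoids the accidental substring hits a plain grep on the ID can produce.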
Old 02-15-2011, 10:59 AM   #4
Fuad
Junior Member
 
Location: Toronto

Join Date: Jun 2009
Posts: 2

DAVID has a Gene ID Conversion tool:

http://david.abcc.ncifcrf.gov/home.jsp

Fuad
Old 02-15-2011, 12:14 PM   #5
rdu
Member
 
Location: USA

Join Date: Aug 2010
Posts: 29

The Bioconductor package "biomaRt" can also do this.
Old 02-16-2011, 09:34 AM   #6
peachgil
Junior Member
 
Location: US

Join Date: Feb 2011
Posts: 2

In Bioconductor, just use the following code:

> library(org.Hs.eg.db)
> library(annotate)
> lookUp('3815', 'org.Hs.eg', 'SYMBOL')
$`3815`
[1] "KIT"

> lookUp('3815', 'org.Hs.eg', 'REFSEQ')
$`3815`
[1] "NM_000222" "NM_001093772" "NP_000213" "NP_001087241"
Old 02-16-2011, 12:14 PM   #7
MDonlin
Member
 
Location: St. Louis, MO

Join Date: May 2010
Posts: 14

You can also do ID conversion using BioMart at EBI.
Old 02-16-2011, 08:06 PM   #8
jmw86069
Member
 
Location: RTP, NC, USA

Join Date: Jun 2009
Posts: 28

Always a fan of the Linux one-liner, here is an example for the human ACTB gene using hg18:

Quote:
mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e "select k2ll.value as entrezGeneId, kx.refseq as refseqMrna, kx.geneSymbol as entrezGeneSymbol, kx.description as entrezGeneDesc from kgXref kx, knownToLocusLink k2ll where k2ll.name=kx.kgID and kx.geneSymbol='ACTB';"
UCSC's C. elegans tables don't include the knownGene and kg% tables, but some poking around ( using "show tables like '%locus%';" ) led me to this MySQL query, which takes a locusLinkId as input and prints the gene symbol, RefSeq mRNA, description, etc.

Quote:
mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D ce6 -e "select rl.locusLinkId, rl.name as geneName, rl.product as geneDescription, rl.mrnaAcc as refseqMrna, rl.protAcc as refseqProt from refLink rl where rl.locusLinkId=174288;"
The bummer is that you have to tell it to use "ce6" -- it isn't generic enough to sniff out which organism and version to use a priori. But you'll know which one to use, right? :-) And you can of course change the "=174288" to "IN (174288, 174289, 174290)" for more of a bulk-input experience, depending upon what you need. If you end up batch-scripting some gene ID conversions, I'd definitely use the "IN" clause instead of querying them one by one. Markedly faster.
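
For batch use, the "IN" clause can be generated from a list of IDs. This is a sketch under assumptions: the helper name build_reflink_query is hypothetical, the query text is copied from the ce6 example above, and the numeric check is an added precaution rather than anything the UCSC server requires.

```shell
# build_reflink_query ID...   (hypothetical helper)
# Prints the refLink query from the post above with an IN clause built from
# the given Entrez Gene IDs. Runs in a subshell so variables stay local.
build_reflink_query() (
    if [ "$#" -eq 0 ]; then
        echo "usage: build_reflink_query ID..." >&2
        exit 1
    fi
    # Accept only plain integers so nothing unexpected ends up in the SQL.
    for id in "$@"; do
        case $id in
            ''|*[!0-9]*) echo "not a numeric Entrez ID: $id" >&2; exit 1 ;;
        esac
    done
    ids=$(printf '%s,' "$@")   # e.g. "174288,174289,"
    ids=${ids%,}               # drop the trailing comma
    printf 'select rl.locusLinkId, rl.name as geneName, rl.product as geneDescription, rl.mrnaAcc as refseqMrna, rl.protAcc as refseqProt from refLink rl where rl.locusLinkId in (%s);\n' "$ids"
)

# Usage, with the server and database named in the post:
#   mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D ce6 \
#         -e "$(build_reflink_query 174288 174289 174290)"
```

One query with many IDs costs a single connection and round trip, which is why it tends to beat a loop of single-ID queries.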

DAVID is in theory a great resource, but could be opened up to increase the API limits, or to allow direct data downloads.
Old 03-18-2011, 10:23 AM   #9
zaclown
Junior Member
 
Location: USA

Join Date: Sep 2010
Posts: 3

Thank you all, guys.
Old 10-11-2016, 06:21 AM   #10
moushengxu@gmail.com
Junior Member
 
Location: USA

Join Date: Oct 2016
Posts: 1
How to do the opposite?

Quote:
Originally Posted by peachgil
In Bioconductor, just use the following code:

> library(org.Hs.eg.db)
> library(annotate)
> lookUp('3815', 'org.Hs.eg', 'SYMBOL')
$`3815`
[1] "KIT"

> lookUp('3815', 'org.Hs.eg', 'REFSEQ')
$`3815`
[1] "NM_000222" "NM_001093772" "NP_000213" "NP_001087241"
I have a set of HGNC gene symbols, and I want to convert them to Entrez Gene IDs.

Thanks much!
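
One way to go in the reverse direction is to read the same NCBI gene_info file mentioned earlier in the thread and match on the symbol column instead of the GeneID column. The following is only a sketch: the function name is hypothetical, the file is assumed to be gunzipped and tab-delimited with tax_id, GeneID and Symbol in columns 1 to 3, and 9606 is the NCBI taxonomy ID for human. It matches the primary symbol only, so aliases and outdated symbols are not found.

```shell
# symbol_to_entrez TAX_ID GENE_INFO_FILE SYMBOL...   (hypothetical helper)
# Prints: Symbol <TAB> GeneID for every gene_info row of the given organism
# whose Symbol column ($3) is one of the requested symbols.
# Runs in a subshell so variables stay local.
symbol_to_entrez() (
    tax_id=$1; gene_info=$2
    shift 2
    awk -F'\t' -v tax="$tax_id" -v syms="$*" '
        BEGIN { n = split(syms, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
        $1 == tax && ($3 in want) { print $3 "\t" $2 }
    ' "$gene_info"
)

# Usage (human, tax_id 9606):
#   symbol_to_entrez 9606 gene_info KIT ACTB
```

Symbols that print nothing are worth checking by hand, since they may be synonyms rather than current symbols.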
Tags
entrez id, gene symbol, refseq id
