  • zaclown
    Junior Member
    • Sep 2010
    • 3

    Entrez ID conversion

    Hello,

    Does anyone know how to convert Entrez Gene IDs to either RefSeq IDs or gene symbols?
    I have found resources for RefSeq-to-gene-symbol conversion, but I can't find anything for Entrez Gene IDs.
    The genome I work with is C. elegans.
    Thanks in advance for any suggestions.
  • gaffa
    Member
    • Oct 2010
    • 82

    #2
    Try UniProt's online conversion service: http://www.uniprot.org -> "ID Mapping" tab


    • Richard Finney
      Senior Member
      • Feb 2009
      • 701

      #3
      NCBI maintains flat files of gene annotations that contain the information you're after:
      ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
      ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
      ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz
      [ There are other interesting files in that directory ]


      The tax_id (taxonomy ID) for C. elegans is 6239 (from the Taxonomy browser: http://www.ncbi.nlm.nih.gov/taxonomy).

      You can download each file with "wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz" from the command line (or via a browser), then decompress it with gunzip before running the commands below.
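Because gene_info.gz and gene2refseq.gz cover every organism in Entrez Gene, it can help to cut them down to a single tax_id once, right after downloading, so later lookups scan a much smaller file. Below is a minimal bash sketch, assuming the files keep NCBI's tab-separated layout with "#"-prefixed header lines. The function names subset_taxon and fetch_and_subset and the output file names (e.g. gene_info_6239) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fetch NCBI gene files and keep only the rows for one tax_id.
# Assumes tab-separated files whose header line(s) start with "#".
# Function names and output file names are hypothetical.
set -euo pipefail

# subset_taxon TAXID INPUT.gz OUTPUT
# Writes the "#" header line(s) plus every row whose first column equals TAXID.
subset_taxon() {
  local taxid=$1 in_gz=$2 out=$3
  gzip -dc "$in_gz" | awk -F'\t' -v taxid="$taxid" '/^#/ || $1 == taxid' > "$out"
}

# fetch_and_subset TAXID
# Downloads gene_info.gz and gene2refseq.gz (skipping files already present,
# the same idea as "wget -nc"), then writes gene_info_TAXID and gene2refseq_TAXID.
fetch_and_subset() {
  local taxid=$1 base=ftp://ftp.ncbi.nlm.nih.gov/gene/DATA f
  for f in gene_info gene2refseq; do
    [ -e "$f.gz" ] || wget "$base/$f.gz"
    subset_taxon "$taxid" "$f.gz" "${f}_${taxid}"
  done
}
```

For C. elegans this would be run as "fetch_and_subset 6239"; the resulting gene_info_6239 and gene2refseq_6239 can then be used in place of the full files in the examples below.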

      Example using this data :
      -bash-3.00$ awk -F'\t' '$1 == 6239' gene2refseq | head
      6239 171590 REVIEWED NM_058260.3 193203640 NP_490660.1 17510631 NC_003279.6 193203938 4123 10231 - -
      6239 171591 REVIEWED NM_058259.3 193203639 NP_490661.1 17510629 NC_003279.6 193203938 11498 16830 + -
      6239 171592 REVIEWED NM_058261.3 133902001 NP_490662.1 17510633 NC_003279.6 193203938 17496 26780 - -
      6239 171592 REVIEWED NM_058262.3 86561628 NP_490663.1 17510635 NC_003279.6 193203938 17496 26780 - -
      6239 171593 REVIEWED NM_058263.3 115533565 NP_490664.2 115533566 NC_003279.6 193203938 27594 32481 - -
      6239 171594 REVIEWED NM_058265.3 71995026 NP_490666.2 25143331 NC_003279.6 193203938 49918 54359 + -
      6239 171595 REVIEWED NM_058267.4 115533567 NP_490668.4 115533568 NC_003279.6 193203938 55315 64020 - -
      6239 171597 REVIEWED NM_058269.2 71995034 NP_490670.1 17510145 NC_003279.6 193203938 85044 86283 - -
      6239 171599 REVIEWED NM_058271.6 212645149 NP_490672.2 25143337 NC_003279.6 193203938 93030 94880 + -
      6239 171600 REVIEWED NM_058272.4 212645150 NP_490673.1 17510147 NC_003279.6 193203938 96478 100612 - -
      -bash-3.00$ awk -F'\t' '$1 == 6239 && $2 == 171590' gene_info
      6239 171590 Y74C9A.3 Y74C9A.3 - WormBase:WBGene00022277 I - hypothetical protein protein-coding - - - - 20101017
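To go from a whole list of Entrez Gene IDs straight to symbols and RefSeq mRNAs, the two files can be joined on GeneID (column 2 in both, as in the output above). Below is a minimal bash/awk sketch, assuming decompressed, tab-separated gene_info and gene2refseq files. entrez_map is a hypothetical helper name, and the RNA accession is taken from column 4 of gene2refseq (the NM_ column in the listing above).

```shell
#!/usr/bin/env bash
# Sketch: Entrez Gene ID -> symbol + RefSeq RNA accession(s) for one tax_id.
# Assumes decompressed, tab-separated NCBI gene_info and gene2refseq files.
# entrez_map is a hypothetical helper name.
set -euo pipefail

# entrez_map TAXID IDS_FILE GENE_INFO GENE2REFSEQ
# IDS_FILE: one Entrez Gene ID per line. Output (tab-separated):
#   GeneID  Symbol  RefSeq_RNA    one line per accession; "-" when missing.
entrez_map() {
  local taxid=$1 ids=$2 gene_info=$3 gene2refseq=$4
  awk -F'\t' -v OFS='\t' -v taxid="$taxid" '
    # File 1: requested IDs, deduplicated, input order preserved.
    FILENAME == ARGV[1] {
      if ($1 != "" && !($1 in want)) { want[$1] = 1; order[++n] = $1 }
      next
    }
    # File 2: gene_info, column 2 = GeneID, column 3 = Symbol.
    FILENAME == ARGV[2] {
      if ($1 == taxid && ($2 in want)) sym[$2] = $3
      next
    }
    # File 3: gene2refseq, column 4 = RNA accession.version ("-" if none).
    # The same accession can appear on several rows, so keep it only once.
    $1 == taxid && ($2 in want) && $4 != "-" && !seen[$2 SUBSEP $4]++ {
      rna[$2] = ($2 in rna) ? rna[$2] " " $4 : $4
    }
    END {
      for (i = 1; i <= n; i++) {
        id = order[i]
        s = (id in sym) ? sym[id] : "-"
        if (!(id in rna)) { print id, s, "-"; continue }
        k = split(rna[id], acc, " ")
        for (j = 1; j <= k; j++) print id, s, acc[j]
      }
    }
  ' "$ids" "$gene_info" "$gene2refseq"
}
```

A call would look like "entrez_map 6239 my_ids.txt gene_info gene2refseq", where my_ids.txt (hypothetical) holds one ID per line. An ID with several RefSeq transcripts, like 171592 above, gets one output line per accession.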


      • Fuad
        Junior Member
        • Jun 2009
        • 2

        #4
        DAVID has a Gene ID Conversion tool:



        Fuad


        • rdu
          Member
          • Aug 2010
          • 29

          #5
          The Bioconductor package "biomaRt" can also do this.


          • peachgil
            Junior Member
            • Feb 2011
            • 2

            #6
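            In Bioconductor, just use the following code (this example uses the human annotation package, org.Hs.eg.db; for C. elegans the equivalent package is org.Ce.eg.db):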

            > library(org.Hs.eg.db)
            > library(annotate)
            > lookUp('3815', 'org.Hs.eg', 'SYMBOL')
            $`3815`
            [1] "KIT"

            > lookUp('3815', 'org.Hs.eg', 'REFSEQ')
            $`3815`
            [1] "NM_000222" "NM_001093772" "NP_000213" "NP_001087241"


            • MDonlin
              Member
              • May 2010
              • 14

              #7
              You can also do ID conversion using BioMart at EBI.


              • jmw86069
                Member
                • Jun 2009
                • 31

                #8
                Always a fan of the Linux one-liner; here is an example for the human ACTB gene using hg18:

                mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e "select k2ll.value as entrezGeneId, kx.refseq as refseqMrna, kx.geneSymbol as entrezGeneSymbol, kx.description as entrezGeneDesc from kgXref kx, knownToLocusLink k2ll where k2ll.name=kx.kgID and kx.geneSymbol='ACTB';"
                UCSC's C. elegans tables don't include the knownGene and kg% tables, but some poking around (using "show tables like '%locus%';") led me to this MySQL query, which takes a locusLinkId (LocusLink was the predecessor of Entrez Gene, and its IDs carried over) as input and prints the gene symbol, RefSeq mRNA, description, etc.

                mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D ce6 -e "select rl.locusLinkId, rl.name as geneName, rl.product as geneDescription, rl.mrnaAcc as refseqMrna, rl.protAcc as refseqProt from refLink rl where rl.locusLinkId=174288;"
                The drawback is that you have to tell it to use "ce6"; it isn't generic enough to work out on its own which organism and assembly version to use. But you'll know which one to use, right? :-) You can, of course, change "=174288" to "IN (174288, 174289, 174290)" for more of a bulk-input experience. If you end up batch-scripting gene ID conversions, I'd definitely use the "IN" clause instead of querying IDs one by one; it's markedly faster.
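The batched "IN" approach can be scripted so that a file of IDs becomes a single query. Below is a sketch assuming bash and the mysql command-line client, reusing the refLink table and the columns from the ce6 query above. build_reflink_query and run_reflink_query are hypothetical names, and whether a given assembly (such as ce6) is still served by UCSC's public MySQL server is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: one "IN (...)" query for many locusLinkId (Entrez Gene) values.
# Table and column names come from the ce6 refLink query above;
# the function names are hypothetical.
set -euo pipefail

# build_reflink_query IDS_FILE -> prints the SQL on stdout.
# Only bare numeric lines are accepted, since the IDs are pasted into the SQL.
build_reflink_query() {
  local ids_file=$1 list
  if [ ! -s "$ids_file" ] || grep -qvE '^[0-9]+$' "$ids_file"; then
    echo "expected one numeric ID per line in $ids_file" >&2
    return 1
  fi
  # Join all lines with commas, e.g. 174288,174289,174290
  list=$(paste -s -d, "$ids_file")
  printf 'select rl.locusLinkId, rl.name as geneName, rl.product as geneDescription, rl.mrnaAcc as refseqMrna, rl.protAcc as refseqProt from refLink rl where rl.locusLinkId IN (%s);\n' "$list"
}

# run_reflink_query DB IDS_FILE -> sends the batched query to UCSC's public server.
run_reflink_query() {
  local db=$1 ids_file=$2 sql
  sql=$(build_reflink_query "$ids_file") || return 1
  mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D "$db" -e "$sql"
}
```

Usage would be "run_reflink_query ce6 ids.txt", with one numeric ID per line in ids.txt. Keeping the query builder separate from the mysql call makes it easy to print and inspect the SQL before sending it.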

                DAVID is in theory a great resource, but it could be opened up with higher API limits or direct data downloads.


                • zaclown
                  Junior Member
                  • Sep 2010
                  • 3

                  #9
                  Thank you all, guys!


                  • moushengxu@gmail.com
                    Junior Member
                    • Oct 2016
                    • 1

                    #10
                    How do you do the opposite?

                    (Replying to peachgil's Bioconductor lookUp() example in #6.)
                    I have a set of HGNC gene symbols, and I want to convert them to Entrez Gene IDs.

                    Thanks much!
