  • sheen_yh
    Junior Member
    • Feb 2013
    • 2

    UCSC kgID string format

    Hello all, this is my first post in SEQanswers.

    kgIDs look like uc011mwm.1, uc011mwn.1, uc001adr.2, uc001ahe.3, etc. What do the .1, .2, .3 suffixes mean? When I tried to map kgIDs to gene symbols, I found that my kgIDs do not always find a match in the current kgXref table. Could I simply ignore the digit after the period?
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Generally (at least at NCBI) the ".n" refers to version number. Larger numbers indicate a newer version.

    You can email UCSC genome support ([email protected]) for confirmation, since I couldn't find an explicit link that confirms the above.

    • sheen_yh
      Junior Member
      • Feb 2013
      • 2

      #3
      Thank you GenoMax!
      I took your advice and emailed UCSC, and I just received a formal reply:
      #######################################
      For an ID like uc011mwn.1, the .1 represents the revision number of the transcript. When a new version of UCSC Genes is released, a transcript like uc011mwn.1 could possibly remain the same, it could become uc011mwn.2, it could receive a new transcript ID entirely or it could disappear altogether from the new version of UCSC Genes.

      On hg38, the current version of UCSC Genes is version 9. The current version of UCSC Genes is always contained in the table knownGene. When version 8 was replaced by version 9, the old version 8 tables were renamed knownGeneOld8 and kgXrefOld8. The differences in transcripts between version 8 and version 9 are tracked with the table kg8ToKg9. This same schema is repeated every time UCSC Genes is updated, so on hg19, you will find several knownGeneOld# tables, several kgXrefOld# tables and several kg#ToKg# tables.
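The naming pattern described above can be sketched as a small helper. This is only an illustration of the pattern (knownGeneOld#, kgXrefOld#, kg#ToKg#); the function name `previous_version_tables` and the dictionary keys are hypothetical, and the helper does not check whether a given table actually exists on a given assembly.

```python
def previous_version_tables(current_version: int) -> dict:
    """Return the table names that would hold the previous UCSC Genes release.

    Follows the naming pattern from the reply above. Nothing here queries
    UCSC, so a returned name is not guaranteed to exist on every assembly.
    """
    if current_version < 2:
        raise ValueError("there is no release before version 1")
    old = current_version - 1
    return {
        "genes": f"knownGeneOld{old}",                   # renamed knownGene
        "xref": f"kgXrefOld{old}",                       # renamed kgXref
        "id_changes": f"kg{old}ToKg{current_version}",   # old -> new transcript IDs
    }


print(previous_version_tables(9))
```

For version 9 this yields knownGeneOld8, kgXrefOld8 and kg8ToKg9, the three names mentioned in the reply.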

      You could possibly ignore the revision number and still get a match, but that will only work if a transcript retained the same transcript ID with a new revision number (e.g., uc011mwn.1 to uc011mwn.2). For transcript IDs that changed or disappeared entirely, this will not work. Note the following:

      mysql> select oldId,newId from kg8ToKg9 limit 5;
      +------------+------------+
      | oldId      | newId      |
      +------------+------------+
      | uc001aaa.3 |            |
      | uc001aab.3 |            |
      | uc010nxq.1 |            |
      | uc001aae.4 |            |
      | uc009vit.3 | uc031tla.1 |
      +------------+------------+
      5 rows in set (0.00 sec)

      Note that the first 4 IDs in the list disappeared entirely from version 8 to version 9 and the last changed from uc009vit.3 to uc031tla.1.
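The "ignore the revision number" fallback discussed above might look roughly like the following. It is a minimal sketch, assuming the kgID to gene symbol pairs have already been loaded into a plain dict (here called `xref`); the helper names and the gene symbol `GENE_A` are placeholders, not real UCSC data.

```python
import re

# IDs such as "uc011mwn.1": a base ID, a period, then a numeric revision.
KG_ID = re.compile(r"^(?P<base>[^.]+)\.(?P<rev>\d+)$")


def split_kg_id(kg_id):
    """Split 'uc011mwn.1' into ('uc011mwn', 1); revision is None if absent."""
    m = KG_ID.match(kg_id)
    if m is None:
        return kg_id, None
    return m.group("base"), int(m.group("rev"))


def lookup_symbol(kg_id, xref):
    """Return (symbol, matched_id), trying an exact match first.

    If the exact ID is missing, fall back to the highest revision of the
    same base ID. Returns (None, None) when the base ID is not present,
    which is what happens for IDs that changed or disappeared entirely.
    """
    if kg_id in xref:
        return xref[kg_id], kg_id
    base, _ = split_kg_id(kg_id)
    # Linear scan for clarity; index by base ID first for large tables.
    candidates = [k for k in xref if split_kg_id(k)[0] == base]
    if not candidates:
        return None, None
    best = max(candidates, key=lambda k: split_kg_id(k)[1] or 0)
    return xref[best], best


# Placeholder data: only the newer revision is present in the current table.
xref = {"uc011mwn.2": "GENE_A"}
print(lookup_symbol("uc011mwn.1", xref))  # matched via the base ID
print(lookup_symbol("uc001aaa.3", xref))  # no match at all
```

Returning the matched ID alongside the symbol makes it easy to flag which lookups relied on the fallback, since a revision change means the transcript itself was revised.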

      For IDs you are having problems mapping, you can try querying kgXrefOld# or you can track the ID changes from version to version through the kg#ToKg# tables. You can download the tables in their entirety from http://hgdownload.cse.ucsc.edu/golde...hg38/database/ or http://hgdownload.cse.ucsc.edu/golde...hg19/database/. You can also query the tables on our public MySQL server: https://genome.ucsc.edu/goldenPath/help/mysql.html
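Tracking an ID from version to version could be sketched as below. This assumes each kg#ToKg# table has been loaded into a dict of oldId to newId (an empty string meaning no successor, as in the query output above) and that the dicts are supplied oldest transition first. The function name `trace_id` is hypothetical, and treating an ID that is absent from a table the same as a disappeared one is an assumption of this sketch, not documented UCSC behaviour.

```python
def trace_id(kg_id, mappings):
    """Follow kg_id through successive oldId -> newId mappings.

    mappings: list of dicts, oldest transition first (e.g. kg7ToKg8, then
    kg8ToKg9). Returns the ID in the newest version, or None if the
    transcript has no successor at some step.
    """
    current = kg_id
    for step in mappings:
        new_id = step.get(current)
        if not new_id:  # absent from the table, or empty newId
            return None
        current = new_id
    return current


# The five rows shown in the kg8ToKg9 query output above.
kg8_to_kg9 = {
    "uc001aaa.3": "",
    "uc001aab.3": "",
    "uc010nxq.1": "",
    "uc001aae.4": "",
    "uc009vit.3": "uc031tla.1",
}

print(trace_id("uc009vit.3", [kg8_to_kg9]))  # uc031tla.1
print(trace_id("uc001aaa.3", [kg8_to_kg9]))  # None
```

The ID returned by `trace_id` can then be looked up in the current kgXref table, while a `None` result suggests querying the matching kgXrefOld# table instead.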
