  • UCSC kgID string format

    Hello all, this is my first post in SEQanswers.

    You see, kgIDs look like uc011mwm.1, uc011mwn.1, uc001adr.2, uc001ahe.3, etc. What do the .1, .2, .3 suffixes mean? When I tried to map kgIDs to gene symbols, I found that my kgIDs do not always find a match in the current kgXref table. Could I simply ignore the digit after the period?

  • #2
    Generally (at least at NCBI) the ".n" refers to version number. Larger numbers indicate a newer version.

    You can email UCSC genome support ([email protected]) for confirmation, since I couldn't find an explicit link that confirms the above.


    • #3
      Thank you GenoMax!
      I just received a formal reply from UCSC after taking your advice and emailing them:
      #######################################
      For an ID like uc011mwn.1, the .1 represents the revision number of the transcript. When a new version of UCSC Genes is released, a transcript like uc011mwn.1 could possibly remain the same, it could become uc011mwn.2, it could receive a new transcript ID entirely or it could disappear altogether from the new version of UCSC Genes.

      On hg38, the current version of UCSC Genes is version 9. The current version of UCSC Genes is always contained in the table knownGene. When version 8 was replaced by version 9, the old version 8 tables were renamed knownGeneOld8 and kgXrefOld8. The differences in transcripts between version 8 and version 9 are tracked with the table kg8ToKg9. This same schema is repeated every time UCSC Genes is updated, so on hg19, you will find several knownGeneOld# tables, several kgXrefOld# tables and several kg#ToKg# tables.

      You could possibly ignore the revision number and still get a match, but that will only work if a transcript retained the same transcript ID with a new revision number (e.g., uc011mwn.1 to uc011mwn.2). For transcript IDs that changed or disappeared entirely, this will not work. Note the following:

      mysql> select oldId,newId from kg8ToKg9 limit 5;
      +------------+------------+
      | oldId      | newId      |
      +------------+------------+
      | uc001aaa.3 |            |
      | uc001aab.3 |            |
      | uc010nxq.1 |            |
      | uc001aae.4 |            |
      | uc009vit.3 | uc031tla.1 |
      +------------+------------+
      5 rows in set (0.00 sec)

      Note that the first 4 IDs in the list disappeared entirely from version 8 to version 9 and the last changed from uc009vit.3 to uc031tla.1.
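
As a rough illustration of the "ignore the revision number" idea and its limits, here is a minimal Python sketch. The function names and the `current_ids` list are hypothetical, made up for this example; they are not real table contents or anything provided by UCSC.

```python
def split_kg_id(kg_id):
    """Split a kgID such as 'uc011mwn.1' into ('uc011mwn', 1)."""
    base, _, revision = kg_id.partition(".")
    return base, int(revision) if revision else None


def match_ignoring_revision(kg_id, current_ids):
    """Return the IDs in current_ids that share kg_id's base, whatever their revision."""
    base, _ = split_kg_id(kg_id)
    return [cid for cid in current_ids if split_kg_id(cid)[0] == base]


# Hypothetical list of IDs in the current release, for illustration only.
current_ids = ["uc011mwn.2", "uc031tla.1"]

# Same transcript ID with a new revision: the match succeeds.
print(match_ignoring_revision("uc011mwn.1", current_ids))
# Transcript ID changed entirely: stripping the revision finds nothing.
print(match_ignoring_revision("uc009vit.3", current_ids))
```

The second call comes back empty, which is the case where a mapping table is needed instead.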

      For IDs you are having problems mapping, you can try querying kgXrefOld# or you can track the ID changes from version to version through the kg#ToKg# tables. You can download the tables in their entirety from http://hgdownload.cse.ucsc.edu/golde...hg38/database/ or http://hgdownload.cse.ucsc.edu/golde...hg19/database/. You can also query the tables on our public MySql server: https://genome.ucsc.edu/goldenPath/help/mysql.html
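
Tracking an ID from version to version could look something like the sketch below. It uses only the five kg8ToKg9 rows shown in the query output above, loaded into a plain dict. The `track_id` helper is hypothetical, and it assumes that an ID missing from a mapping table should be treated the same as a dropped one, which may not match how the real tables behave.

```python
# Rows copied from the kg8ToKg9 query output above; an empty newId means
# the transcript has no counterpart in the newer version.
KG8_TO_KG9 = {
    "uc001aaa.3": "",
    "uc001aab.3": "",
    "uc010nxq.1": "",
    "uc001aae.4": "",
    "uc009vit.3": "uc031tla.1",
}


def track_id(kg_id, mapping_chain):
    """Follow kg_id through successive old-to-new mappings.

    Returns the final ID, or None if the ID was dropped (empty newId)
    or is absent from a mapping (assumption: treated as untracked).
    """
    current = kg_id
    for mapping in mapping_chain:
        current = mapping.get(current) or None
        if current is None:
            return None
    return current


print(track_id("uc009vit.3", [KG8_TO_KG9]))  # uc031tla.1
print(track_id("uc001aaa.3", [KG8_TO_KG9]))  # None
```

To cross several releases, pass one dict per kg#ToKg# table in order, oldest first.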
