  • how to change XLOC ID to Ensembl ID from Cuffdiff

    When I ran my RNA-Seq analysis, I used a GTF file from Ensembl. The cuffdiff output replaced the Ensembl IDs with XLOC IDs, although it still reports gene names (e.g. MX2). The Ensembl IDs are no longer there.
    Is there any way to convert the XLOC IDs back to Ensembl IDs, or simply to keep the Ensembl IDs from my GTF file? How do you go about this?
    Interestingly, if I skip new gene discovery (i.e. leave out the cuffmerge step), the Ensembl IDs are kept.
    Last edited by super0925; 03-16-2015, 06:41 AM.

  • #2
    Hi super0925,

    I ran into the same problem and realized that the merged.gtf file produced by cuffmerge does not use the Ensembl ID as the transcript ID (shown below). Also, in the downstream analysis with cuffdiff, the Ensembl ID stored in "oId" is not carried over into the SQLite database.

    1 Cufflinks exon 11869 12227 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "1"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
    1 Cufflinks exon 12613 12721 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "2"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
    1 Cufflinks exon 13221 14409 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "3"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
    I ended up writing a Python script that replaces the 'transcript_id' value with the 'oId' value in order to keep the Ensembl IDs (below). I then ran cuffdiff with the rewritten GTF file (merged2.gtf in the script), and that solved my problem.

    #!/usr/bin/env python3
    import re

    gtf_path = "/PATH/TO/merged.gtf"

    # Read merged.gtf and write a copy (merged2.gtf) in which each Cufflinks
    # transcript ID (TCONS_*) is replaced by the original ID stored in "oId".
    with open(gtf_path) as fh, open("merged2.gtf", "w") as out:
        for line in fh:
            line = line.rstrip("\n")  # drop the trailing newline
            # Use regular expressions to pull the IDs out of the attribute column
            cuff_tx = re.findall(r'transcript_id "([\w.]+)"', line)
            ensembl_tx = re.findall(r'oId "([\w.]+)"', line)
            # Replace only when both IDs are present; otherwise keep the line as is
            if cuff_tx and ensembl_tx:
                line = line.replace(cuff_tx[0], ensembl_tx[0])
            print(line)
            out.write(line + "\n")  # write the line to the new .gtf file

    The first lines of the resulting merged2.gtf:

    1 Cufflinks exon 11869 12227 . + . gene_id "XLOC_000001"; transcript_id "ENST00000456328"; exon_number "1"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
    1 Cufflinks exon 12613 12721 . + . gene_id "XLOC_000001"; transcript_id "ENST00000456328"; exon_number "2"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
    1 Cufflinks exon 13221 14409 . + . gene_id "XLOC_000001"; transcript_id "ENST00000456328"; exon_number "3"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
    Thanks
    Last edited by Seq-Rue; 06-25-2015, 10:54 AM.
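
If you would rather leave merged.gtf untouched, another option is to build a lookup table from XLOC IDs to the reference IDs and join it onto the cuffdiff output afterwards. The following is only a rough sketch, not part of the script above. The function name `build_xloc_table` is made up for illustration, and it assumes that "oId" holds the Ensembl transcript ID on lines whose class_code is "=", as in the GTF lines quoted in this thread. Note that these are transcript IDs (ENST...), so getting Ensembl gene IDs would still need a lookup in the original Ensembl GTF.

```python
import re

# Matches key "value" pairs in the GTF attribute column.
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def build_xloc_table(gtf_lines):
    """Map each Cufflinks gene_id (XLOC_*) to its gene names and oId values.

    Hypothetical helper: only lines with class_code "=" contribute an oId,
    on the assumption that those are the transcripts matching the reference.
    """
    table = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        attrs = dict(ATTR_RE.findall(line))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        entry = table.setdefault(
            gene_id, {"gene_names": set(), "ref_transcripts": set()}
        )
        if "gene_name" in attrs:
            entry["gene_names"].add(attrs["gene_name"])
        if attrs.get("class_code") == "=" and "oId" in attrs:
            entry["ref_transcripts"].add(attrs["oId"])
    return table


if __name__ == "__main__":
    # One of the merged.gtf lines quoted earlier in the thread.
    demo = [
        '1\tCufflinks\texon\t11869\t12227\t.\t+\t.\t'
        'gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; '
        'exon_number "1"; gene_name "DDX11L1"; oId "ENST00000456328"; '
        'nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";'
    ]
    for xloc, entry in sorted(build_xloc_table(demo).items()):
        print(xloc, sorted(entry["gene_names"]), sorted(entry["ref_transcripts"]))
```

To run it on a real file, pass an open file handle, e.g. `build_xloc_table(open("/PATH/TO/merged.gtf"))`. Sets are used because one XLOC locus can contain several transcripts.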
