SEQanswers

03-16-2015, 06:37 AM   #1
super0925
Senior Member
 
Location: UK

Join Date: Feb 2014
Posts: 206
How to change XLOC IDs to Ensembl IDs from Cuffdiff

When I ran my RNA-Seq analysis, the GTF file I used was from Ensembl. The cuffdiff output replaced the Ensembl IDs with XLOC IDs. It still reports gene names (e.g. MX2), but the Ensembl IDs are no longer there.
Is there any way to convert the XLOC IDs back to Ensembl IDs, or simply to keep the Ensembl IDs from my GTF file? How do you guys go about this?
Interestingly enough, if I don't run new gene discovery (i.e. I skip the cuffmerge step), I get to keep the Ensembl IDs.

Last edited by super0925; 03-16-2015 at 06:41 AM.
06-24-2015, 04:35 PM   #2
Seq-Rue
Junior Member
 
Location: USA

Join Date: Jun 2015
Posts: 1
Hi super0925,

I ran into the same problem and realized that the merged.gtf file produced by cuffmerge does not use the Ensembl ID as the transcript ID (shown below); the Ensembl ID survives only in the "oId" attribute. In the downstream analysis with cuffdiff, that "oId" Ensembl ID is also not carried over into the SQLite database.

Quote:
1 Cufflinks exon 11869 12227 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "1"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
1 Cufflinks exon 12613 12721 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "2"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
1 Cufflinks exon 13221 14409 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "3"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
I ended up writing a Python script that replaces the 'transcript_id' value with the 'oId' value in order to keep the Ensembl IDs (below). I used the new GTF file (merged2.gtf) for cuffdiff, and that solved my problem.

Quote:
#!/usr/bin/env python3
# Replace each Cufflinks transcript_id (TCONS_...) in merged.gtf with the
# original Ensembl transcript ID stored in the oId attribute.

import re

gtf_path = "/PATH/TO/merged.gtf"

# Regexes that capture the quoted values of transcript_id and oId
tx_re = re.compile(r'transcript_id "([^"]+)"')
oid_re = re.compile(r'oId "([^"]+)"')

with open(gtf_path) as fh, open("merged2.gtf", "w") as out:
    for line in fh:
        line = line.rstrip("\n")  # drop the trailing newline only
        oid = oid_re.search(line)
        if oid:
            # Rewrite only the transcript_id attribute; other fields stay untouched.
            # Lines without an oId attribute are written out unchanged.
            line = tx_re.sub(lambda m: 'transcript_id "%s"' % oid.group(1), line, count=1)
        out.write(line + "\n")  # write the line to the new .gtf file
Quote:
1 Cufflinks exon 11869 12227 . + . gene_id "XLOC_000001"; transcript_id "ENST00000456328"; exon_number "1"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
1 Cufflinks exon 12613 12721 . + . gene_id "XLOC_000001"; transcript_id "ENST00000456328"; exon_number "2"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
1 Cufflinks exon 13221 14409 . + . gene_id "XLOC_000001"; transcript_id "ENST00000456328"; exon_number "3"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
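
If you would rather leave the GTF untouched, another option is to build a lookup table from the same merged.gtf attributes and join it to the cuffdiff results afterwards. The following is only a rough sketch, not something cuffmerge or cuffdiff provides: the function name build_xloc_table and the dictionary keys are made up for illustration, and the "ENS" prefix filter is my assumption for skipping oId values that are not Ensembl IDs. Note that the lines quoted above only carry transcript-level IDs (ENST...), so this gives XLOC to Ensembl transcript IDs, not Ensembl gene IDs.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def build_xloc_table(gtf_lines, prefix="ENS"):
    """Map each gene_id (XLOC_...) to its gene names and original transcript IDs.

    gtf_lines: any iterable of GTF lines, e.g. an open file handle.
    prefix: hypothetical filter; only oId values starting with it are kept.
    """
    table = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        attrs = dict(ATTR_RE.findall(line))
        xloc = attrs.get("gene_id")
        if xloc is None:
            continue
        entry = table.setdefault(
            xloc, {"gene_names": set(), "ensembl_transcripts": set()}
        )
        if "gene_name" in attrs:
            entry["gene_names"].add(attrs["gene_name"])
        oid = attrs.get("oId", "")
        if oid.startswith(prefix):
            entry["ensembl_transcripts"].add(oid)
    return table
```

Usage would be along the lines of `with open("/PATH/TO/merged.gtf") as fh: table = build_xloc_table(fh)`, after which `table["XLOC_000001"]` holds the gene names and Ensembl transcript IDs seen for that locus.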
Thanks

Last edited by Seq-Rue; 06-25-2015 at 10:54 AM.