**Seq-Rue** · 06-24-2015, 04:35 PM

Hi super0925,

I ran into the same problem and realized that the merged.gtf file produced by cuffmerge does not use the Ensembl ID as the transcript ID (shown below): it assigns its own TCONS_ identifier and keeps the Ensembl ID only in the "oId" attribute. Also, in the downstream analysis with cuffdiff, the "oId" Ensembl ID is not carried over into the SQLite database.

1 Cufflinks exon 11869 12227 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "1"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
1 Cufflinks exon 12613 12721 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "2"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
1 Cufflinks exon 13221 14409 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "3"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
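To see which attribute holds which ID, it can help to split the attribute column into key/value pairs. This is only a minimal sketch: `parse_attributes` is a hypothetical helper name (not part of Cufflinks), and it assumes every attribute is written as `key "value";` as in the lines above.

```python
import re

# Hypothetical helper, not part of Cufflinks.
# Assumes attributes are written as: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')

def parse_attributes(gtf_line):
    """Return the key/value attribute pairs of one GTF line as a dict."""
    return dict(ATTR_RE.findall(gtf_line))

# First example line from merged.gtf (tab-separated columns).
line = ('1\tCufflinks\texon\t11869\t12227\t.\t+\t.\t'
        'gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; '
        'exon_number "1"; gene_name "DDX11L1"; oId "ENST00000456328"; '
        'nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";')

attrs = parse_attributes(line)
print(attrs["transcript_id"], attrs["oId"])
```

For this line, `transcript_id` is the cuffmerge-assigned ID and `oId` is the original Ensembl transcript ID.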

I ended up writing a Python script that substitutes the 'oId' value for the 'transcript_id' value in order to keep the Ensembl IDs (below). I then used the new GTF file (merged2.gtf) for cuffdiff, and that solved my problem.

#!/usr/bin/env python3

import re

gtf_path = "/PATH/TO/merged.gtf"

## use RE to pull the cufflinks transcript ID and the original (Ensembl) ID out of each line
tx_re = re.compile(r'transcript_id "([\w.]+)"')
oid_re = re.compile(r'oId "([\w.]+)"')

with open(gtf_path) as fh, open("merged2.gtf", "w") as out:
    for line in fh:
        line = line.rstrip("\n")  ## strip the trailing newline
        cuff_tx = tx_re.search(line)
        ensembl_tx = oid_re.search(line)
        ## replace the cufflinks ID with the Ensembl ID in the transcript_id attribute;
        ## lines that lack either attribute are written out unchanged
        if cuff_tx and ensembl_tx:
            new_attr = 'transcript_id "%s"' % ensembl_tx.group(1)
            line = line.replace(cuff_tx.group(0), new_attr)
        out.write("%s\n" % line)  ## write the line out to the new .gtf file

The rewritten file (merged2.gtf) then contains:

1 Cufflinks exon 11869 12227 . + . gene_id "XLOC_000001"; transcript_id "ENST00000456328"; exon_number "1"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
1 Cufflinks exon 12613 12721 . + . gene_id "XLOC_000001"; transcript_id "ENST00000456328"; exon_number "2"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
1 Cufflinks exon 13221 14409 . + . gene_id "XLOC_000001"; transcript_id "ENST00000456328"; exon_number "3"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
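If you would rather leave merged.gtf untouched, another option might be to build a lookup table from the cuffmerge transcript ID to the "oId" value and translate IDs after the fact. The sketch below is an illustration only: `build_id_map` is a hypothetical name, and I have not checked it against every cuffmerge version.

```python
import re

TX_RE = re.compile(r'transcript_id "([^"]+)"')
OID_RE = re.compile(r'oId "([^"]+)"')

def build_id_map(gtf_lines):
    """Map each cuffmerge transcript_id to its oId (hypothetical helper)."""
    id_map = {}
    for line in gtf_lines:
        tx = TX_RE.search(line)
        oid = OID_RE.search(line)
        # Skip lines that lack either attribute.
        if tx and oid:
            id_map[tx.group(1)] = oid.group(1)
    return id_map

# Two shortened example lines; a real run would pass the open file instead.
example_lines = [
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "1"; oId "ENST00000456328";',
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "2"; oId "ENST00000456328";',
]

id_map = build_id_map(example_lines)
print(id_map)
```

With the real file you would call it as `build_id_map(open("/PATH/TO/merged.gtf"))` and use the resulting dict to look up the Ensembl ID for any TCONS_ identifier.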

Thanks

how to change XLOC ID to Ensembl ID from Cuffdiff
