inconsistent gene names in genes.expr - Cufflinks

Boel

Member

Join Date: Oct 2009

Posts: 62
#1

inconsistent gene names in genes.expr - Cufflinks

04-13-2010, 11:09 AM

Hi,

I have discovered some inconsistencies when browsing through the Cufflinks files transcripts.expr, transcripts.tmap and genes.expr. Multiple transcripts belonging to the same gene are not named consistently across the different files. Here are two examples to illustrate:

In transcripts.expr we have these two transcripts:

CUFF.8799.1 170650 chr1 47611249 47613205 107.266 1 0.749757 71.0992 143.433 60.5141 155
CUFF.8800.2 170650 chr1 47611334 47613500 46.5777 0.434227 0.250243 0 112.334 26.2769 77

But the gene "CUFF.8800" does not exist in the genes.expr file, only "CUFF.8799".

In the transcripts.gtf file however both "genes" are present:

chr1 Cufflinks transcript 47611250 47613205 1000 + . gene_id "CUFF.8799"; transcript_id "CUFF.8799.1"; FPKM "107.2658604057"; frac "0.749757"; conf_lo "71.099184"; conf_hi "143.432537"; cov "60.514078";
chr1 Cufflinks exon 47611250 47611366 1000 + . gene_id "CUFF.8799"; transcript_id "CUFF.8799.1"; exon_number "1"; FPKM "107.2658604057"; frac "0.749757"; conf_lo "71.099184"; conf_hi "143.432537"; cov "60.514078";
chr1 Cufflinks exon 47613168 47613205 1000 + . gene_id "CUFF.8799"; transcript_id "CUFF.8799.1"; exon_number "2"; FPKM "107.2658604057"; frac "0.749757"; conf_lo "71.099184"; conf_hi "143.432537"; cov "60.514078";
chr1 Cufflinks transcript 47611335 47613500 434 + . gene_id "CUFF.8800"; transcript_id "CUFF.8800.2"; FPKM "46.5777479715"; frac "0.250243"; conf_lo "0.000000"; conf_hi "112.333534"; cov "26.276855";
chr1 Cufflinks exon 47611335 47611366 434 + . gene_id "CUFF.8800"; transcript_id "CUFF.8800.2"; exon_number "1"; FPKM "46.5777479715"; frac "0.250243"; conf_lo "0.000000"; conf_hi "112.333534"; cov "26.276855";
chr1 Cufflinks exon 47613456 47613500 434 + . gene_id "CUFF.8800"; transcript_id "CUFF.8800.2"; exon_number "2"; FPKM "46.5777479715"; frac "0.250243"; conf_lo "0.000000"; conf_hi "112.333534"; cov "26.276855";

A picture from the UCSC browser with my data is attached ("locus.8799.8800.gif"); these transcripts were called in the first sample, "Y1". It clearly shows that the two transcripts come from the same locus, which the FMI and frac values in the transcripts.expr file also indicate.
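The frac argument can be checked directly on the two rows quoted above. This is only a minimal sketch: it assumes that the 8th whitespace-separated field is frac (that matches the frac values in the GTF attributes), and `frac_total` is a made-up helper name, not part of Cufflinks.

```python
# Rows copied from the transcripts.expr excerpt in this post.
# Assumption: field index 7 is "frac" (it matches the frac attribute in transcripts.gtf).
rows = """\
CUFF.8799.1 170650 chr1 47611249 47613205 107.266 1 0.749757 71.0992 143.433 60.5141 155
CUFF.8800.2 170650 chr1 47611334 47613500 46.5777 0.434227 0.250243 0 112.334 26.2769 77
"""

def frac_total(text):
    """Sum the assumed frac column over all non-empty rows."""
    return sum(float(line.split()[7]) for line in text.splitlines() if line.strip())

total = frac_total(rows)
# If the two transcripts share one locus, their fractions should add up to about 1.
print(round(total, 6))
```

If the fractions of two differently named "genes" add up to 1, that is a good hint they were quantified together as isoforms of one locus.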

A somewhat more complicated example is a locus with three different transcripts deemed to be present; see the attached picture "loci30602.30603.30604.gif". Analogously, the three transcripts have different gene names:

CUFF.30602.1 198908 chr10 69761823 69768943 92.9518 0.424581 0.286601 50.5627 135.341 54.5443 491
CUFF.30603.2 198908 chr10 69761823 69769315 218.926 1 0.596452 174.379 263.473 128.466 510
CUFF.30604.3 198908 chr10 69768551 69769315 75.6472 0.345538 0.116947 60.7094 90.5849 44.3899 551

Only one of them, CUFF.30602, is present in the genes.expr file, yet in the tmap file the three transcripts are annotated as belonging to three different genes: CUFF.30604, CUFF.30603 and CUFF.30602.

The FPKM value in the genes.expr file seems to be the total of all isoforms, but the naming and referencing is confusing.
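For anyone who wants to find such cases in their own files, here is a rough sketch of how I would look for them. It assumes that the 2nd field of transcripts.expr (170650 and 198908 in my rows) is an identifier shared by transcripts of the same locus, and that the 6th field is FPKM; both are read off the rows above, not taken from the Cufflinks documentation. The `summarize` function is hypothetical.

```python
from collections import defaultdict

# Rows copied from the two transcripts.expr excerpts in this post.
rows = """\
CUFF.8799.1 170650 chr1 47611249 47613205 107.266 1 0.749757 71.0992 143.433 60.5141 155
CUFF.8800.2 170650 chr1 47611334 47613500 46.5777 0.434227 0.250243 0 112.334 26.2769 77
CUFF.30602.1 198908 chr10 69761823 69768943 92.9518 0.424581 0.286601 50.5627 135.341 54.5443 491
CUFF.30603.2 198908 chr10 69761823 69769315 218.926 1 0.596452 174.379 263.473 128.466 510
CUFF.30604.3 198908 chr10 69768551 69769315 75.6472 0.345538 0.116947 60.7094 90.5849 44.3899 551
"""

def summarize(text):
    """Group rows by the assumed shared locus field (index 1).

    Returns two dicts keyed by that field: the set of gene names implied by
    the transcript ids, and the summed FPKM (assumed to be field index 5).
    """
    genes = defaultdict(set)
    fpkm = defaultdict(float)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        transcript_id, locus = fields[0], fields[1]
        # "CUFF.8800.2" -> "CUFF.8800": drop the trailing isoform number.
        genes[locus].add(transcript_id.rsplit(".", 1)[0])
        fpkm[locus] += float(fields[5])
    return genes, fpkm

genes, fpkm = summarize(rows)
for locus in genes:
    if len(genes[locus]) > 1:
        # More than one gene name under the same shared identifier.
        print(locus, sorted(genes[locus]), round(fpkm[locus], 4))
```

The summed FPKM per group can then be compared by hand with the single entry that does appear in genes.expr.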

Now you know.

Boel

Last edited by Boel; 04-14-2010, 05:06 AM.
thinkRNA

Member

Join Date: Jan 2010

Posts: 94
#2

04-13-2010, 04:28 PM

Your figures are really hard to see! Can you put up a better quality image?
Boel

Member

Join Date: Oct 2009

Posts: 62
#3

04-14-2010, 05:16 AM

link to the illustrations

Yup, they got tiny when I attached them, apparently. They can be viewed here:

http://picasaweb.google.se/108683670714644907845/ScientificIllustrations#