cuffdiff 2.0: figure this one out...

sdriscoll


05-08-2012, 12:20 PM

I've been running the new release of cuffdiff for the last day or two to see how it does. It certainly produces more conservative gene lists, and it seems to be better at throwing out genes that might be DE on average across replicates but whose expression is erratic. I love the new output files that let you "dig in" a little more than before to see what's going on.

So that's what I'm doing and I found something strange.

I've run a 3 vs 3 wt vs mutant test with cuffdiff. In the scatter plot of condition 2 vs condition 1, the overall plot looks good. However, several genes stand out pretty far from the main body of the scatter, with FPKMs > 100 in one condition or the other, yet they are not called significant. So I wanted to take a look at those. The first one was "Snora64" (I'm working with mouse).

I tracked it down in several of the cuffdiff outputs:

gene_exp.diff:

Code:

XLOC_009930 XLOC_009930 Snora64 chr17:24857007-24858872 wt ko OK 52.6625 172.027 1.70779 -0.171531 0.863806 0.999999 no

isoform_exp.diff:

Code:

uc008ayc.1 XLOC_009930 Snora64 chr17:24857007-24858872 wt ko OK 52.6625 172.027 1.70779 -0.171531 0.863806 0.999999 no

tss_group_exp.diff:

Code:

TSS13588 XLOC_009930 Snora64 chr17:24857007-24858872 wt ko OK 52.6625 172.027 1.70779 -0.171531 0.863806 0.999999 no

Each of these files reports that this gene has an FPKM of 52.6625 in the wt condition and 172.027 in the ko condition. I figured there must be some wacky variance in this gene across the replicates in each condition, so I checked the new file genes.read_groups_tracking to see how the gene is expressed and how many reads it received in each replicate. This is where I get a little confused.
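As a quick sanity check on that row, here is a minimal sketch. It assumes the two numbers after the status field are the wt and ko FPKMs and that the next field is the log2 fold change; that layout is read off the row above rather than taken from the cuffdiff documentation.

```python
import math

# Row copied from gene_exp.diff in this post. The field positions used
# below are an assumption based on this row, not on the cuffdiff manual.
row = ("XLOC_009930 XLOC_009930 Snora64 chr17:24857007-24858872 wt ko OK "
       "52.6625 172.027 1.70779 -0.171531 0.863806 0.999999 no")
fields = row.split()

fpkm_wt = float(fields[7])          # FPKM reported for wt
fpkm_ko = float(fields[8])          # FPKM reported for ko
reported_log2fc = float(fields[9])  # log2 fold change as reported

# Recompute the log2 fold change (ko over wt) from the two FPKMs.
computed_log2fc = math.log2(fpkm_ko / fpkm_wt)

print(f"reported log2 fold change: {reported_log2fc}")
print(f"computed log2 fold change: {computed_log2fc:.5f}")
```

If the two numbers agree, the row is at least internally consistent, so whatever inflated the FPKMs happened before the fold change was calculated.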

genes.read_groups_tracking:

Code:

XLOC_009930 ko 0 2.00145 1.45042 1.50285 0.822565 - OK Snora64
XLOC_009930 ko 1 0 0 0 0 - OK Snora64
XLOC_009930 ko 2 1.00112 0.95579 0.99034 0.542051 - OK Snora64
XLOC_009930 wt 0 0 0 0 0 - OK Snora64
XLOC_009930 wt 1 1 1.14625 1.11053 0.607837 - OK Snora64
XLOC_009930 wt 2 0 0 0 0 - OK Snora64

This file shows the FPKM of this gene for each replicate in both conditions in the 7th column (one left of the '-' column). Those values are all less than 1. So why is the expression reported to be so high in every other file? Other genes with comparable expression in gene_exp.diff or genes.fpkm_tracking match up pretty well when I look them up in this file. Based on what the coverage looks like across this gene's locus, I'd believe the information in this file over what is reported in the other files.
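To put a number on the mismatch, here is a rough sketch that averages the replicate FPKMs per condition and compares them with the values in gene_exp.diff. The plain arithmetic mean is only a yardstick chosen for illustration; it is not claimed to be how cuffdiff derives its condition-level FPKM. The column position (7th field) is taken from the rows in this post.

```python
from collections import defaultdict
from statistics import mean

# Replicate rows copied from genes.read_groups_tracking in this post.
# Field 2 is the condition and field 7 is assumed to be the replicate FPKM.
rows = """\
XLOC_009930 ko 0 2.00145 1.45042 1.50285 0.822565 - OK Snora64
XLOC_009930 ko 1 0 0 0 0 - OK Snora64
XLOC_009930 ko 2 1.00112 0.95579 0.99034 0.542051 - OK Snora64
XLOC_009930 wt 0 0 0 0 0 - OK Snora64
XLOC_009930 wt 1 1 1.14625 1.11053 0.607837 - OK Snora64
XLOC_009930 wt 2 0 0 0 0 - OK Snora64
""".splitlines()

# Condition-level FPKMs as reported in gene_exp.diff.
reported = {"wt": 52.6625, "ko": 172.027}

# Collect the replicate FPKMs for each condition.
replicate_fpkm = defaultdict(list)
for line in rows:
    fields = line.split()
    replicate_fpkm[fields[1]].append(float(fields[6]))

# Arithmetic mean per condition, used here only as a rough comparison.
mean_fpkm = {cond: mean(vals) for cond, vals in replicate_fpkm.items()}

for cond in ("wt", "ko"):
    ratio = reported[cond] / mean_fpkm[cond]
    print(f"{cond}: replicate mean {mean_fpkm[cond]:.4f}, "
          f"reported {reported[cond]}, ratio {ratio:.0f}x")
```

For this gene the reported values come out a couple of hundred times larger than the replicate means, which is far more than replicate variance alone would explain.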

There are actually several of these mismatched expression values in my output, and most of them belong to the same type of gene (short, single-exon genes in intergenic regions of other genes). It's distracting to get odd expression values in the output like this. So why does it happen, and why is the more "correct" expression reported in genes.read_groups_tracking while a different, much higher expression level is reported in the differential expression output files? I'm sure nobody can answer that one except Cole, but I think it's good to report odd findings like this.

/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
Tags: cuffdiff, rna-seq data analysis
