I've been running the new release of cuffdiff for the last day or two to see how it does. It certainly produces more conservative gene lists, and it seems to be better at throwing out genes that might be DE on average across replicates but whose expression seems erratic. I love the new output files that let you "dig in" a little more than before to see what's going on.
So that's what I'm doing, and I found something strange.
I've run a 3 vs 3 wt vs mutant test with cuffdiff. In the scatter plot of condition 2 vs condition 1, the overall plot looks good. There are several genes that stand out pretty far from the main body of the scatter, with FPKMs > 100 in one condition or the other, but they are not called significant, so I wanted to take a look at those. The first one was "Snora64" (I'm working with mouse).
I tracked it down in several of the cuffdiff outputs:
gene_exp.diff:
Code:
XLOC_009930 XLOC_009930 Snora64 chr17:24857007-24858872 wt ko OK 52.6625 172.027 1.70779 -0.171531 0.863806 0.999999 no
isoform_exp.diff:
Code:
uc008ayc.1 XLOC_009930 Snora64 chr17:24857007-24858872 wt ko OK 52.6625 172.027 1.70779 -0.171531 0.863806 0.999999 no
tss_group_exp.diff:
Code:
TSS13588 XLOC_009930 Snora64 chr17:24857007-24858872 wt ko OK 52.6625 172.027 1.70779 -0.171531 0.863806 0.999999 no
Each of these files reports that this gene has an FPKM of 52.6625 in the wt condition and 172.027 in the ko condition. I figured there must be some wacky variance in this gene across replicates within each condition, so I checked the new file genes.read_groups_tracking to see how the gene is expressed and how many reads it received in each replicate. This is where I get a little confused.
genes.read_groups_tracking:
Code:
XLOC_009930 ko 0 2.00145 1.45042 1.50285 0.822565 - OK Snora64
XLOC_009930 ko 1 0 0 0 0 - OK Snora64
XLOC_009930 ko 2 1.00112 0.95579 0.99034 0.542051 - OK Snora64
XLOC_009930 wt 0 0 0 0 0 - OK Snora64
XLOC_009930 wt 1 1 1.14625 1.11053 0.607837 - OK Snora64
XLOC_009930 wt 2 0 0 0 0 - OK Snora64
This file shows the FPKM of this gene for each replicate in both conditions in the 7th column (one left of the '-' column). Those values are all less than 1. So why is the expression reported to be so high in every other file? Other genes with comparable expression in gene_exp.diff or genes.fpkm_tracking match up pretty well when looked up in this file. Based on what the coverage looks like across this gene's locus, I'd believe genes.read_groups_tracking over the other files.
There are actually several of these mismatched expression values in my output, and most of them are this same type of gene (short, single-exon genes in intergenic regions of other genes). It's distracting to get odd expression values in the output like this. So why does it happen, and why is the more "correct" expression reported in genes.read_groups_tracking while a different, much higher expression level is reported in the differential expression output files? I'm sure nobody can answer that one except Cole, but I think it's good to report odd findings like this.
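In case anyone wants to check their own output for the same thing, here is a rough Python sketch of the comparison I did by hand: take the per-condition FPKM from gene_exp.diff and compare it with the mean of the per-replicate FPKMs from genes.read_groups_tracking. The column positions are assumed from the rows pasted above (not from a header line), the rows are embedded as strings so the snippet runs on its own, and the function names and the 10x cutoff are made up for illustration, not anything cuffdiff defines.

```python
from collections import defaultdict

# Rows copied from the post. Column positions are an assumption based on
# these rows; real files are tab-separated and start with a header line.
GENE_EXP_ROWS = """\
XLOC_009930 XLOC_009930 Snora64 chr17:24857007-24858872 wt ko OK 52.6625 172.027 1.70779 -0.171531 0.863806 0.999999 no
"""

READ_GROUP_ROWS = """\
XLOC_009930 ko 0 2.00145 1.45042 1.50285 0.822565 - OK Snora64
XLOC_009930 ko 1 0 0 0 0 - OK Snora64
XLOC_009930 ko 2 1.00112 0.95579 0.99034 0.542051 - OK Snora64
XLOC_009930 wt 0 0 0 0 0 - OK Snora64
XLOC_009930 wt 1 1 1.14625 1.11053 0.607837 - OK Snora64
XLOC_009930 wt 2 0 0 0 0 - OK Snora64
"""


def condition_fpkms(text):
    """Map (gene_id, condition) -> FPKM from gene_exp.diff-style rows."""
    out = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 9:
            continue
        try:
            v1, v2 = float(f[7]), float(f[8])
        except ValueError:
            continue  # header line or malformed row
        out[(f[1], f[4])] = v1  # gene_id, first condition
        out[(f[1], f[5])] = v2  # gene_id, second condition
    return out


def replicate_fpkms(text):
    """Map (gene_id, condition) -> list of per-replicate FPKMs (7th column)."""
    out = defaultdict(list)
    for line in text.splitlines():
        f = line.split()
        if len(f) < 7:
            continue
        try:
            out[(f[0], f[1])].append(float(f[6]))
        except ValueError:
            continue  # header line or malformed row
    return out


def find_mismatches(cond, reps, ratio=10.0):
    """Flag entries whose condition FPKM exceeds ratio * mean replicate FPKM."""
    flagged = []
    for key, fpkm in sorted(cond.items()):
        values = reps.get(key)
        if not values:
            continue
        mean_rep = sum(values) / len(values)
        if fpkm > ratio * mean_rep:
            flagged.append((key[0], key[1], fpkm, mean_rep))
    return flagged


flagged = find_mismatches(condition_fpkms(GENE_EXP_ROWS),
                          replicate_fpkms(READ_GROUP_ROWS))
for gene_id, condition, fpkm, mean_rep in flagged:
    print(f"{gene_id}\t{condition}\treported={fpkm}\treplicate_mean={mean_rep:.6f}")
```

For Snora64 this flags both conditions: the replicate means work out to roughly 0.45 (ko) and 0.20 (wt) against reported values of 172.027 and 52.6625. To run it on real output you would read the two files instead of the embedded strings; a plain mean of replicate FPKMs is only a sanity check and is not necessarily how cuffdiff arrives at its condition-level estimate.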