
**sdriscoll** · 12-06-2012, 12:06 AM

cuffdiff, like probably every other differential expression testing tool, uses the false discovery rate correction described here: http://en.wikipedia.org/wiki/False_d...berg_procedure

the reasons are described pretty well in that Wikipedia article. if you compare 100 genes between two conditions you're essentially running 100 tests (even though each test is on a different gene), and with that many tests some small p-values will turn up by chance alone. so the p-values from the tests have to be corrected. my opinion is that it's a little weird, since it means the correction is influenced by how many genes you test (so you can cheat it a little by excluding genes which you think aren't testable). whatever - statistics don't always make sense.
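To make the dependence on the number of tests concrete, here is a minimal sketch of a Benjamini-Hochberg style adjustment, assuming that is the procedure the linked article describes. The function name and the p-values are made up for illustration; this is not cuffdiff's own code.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is never
    # larger than the q-value of a less significant test.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values: the same three "interesting" genes, tested
# alone and then alongside 97 uninteresting genes.
p_small = [0.001, 0.02, 0.04]
p_large = p_small + [0.5] * 97

q_small = benjamini_hochberg(p_small)
q_large = benjamini_hochberg(p_large)

print(q_small[:3])
print(q_large[:3])
```

With only three tests the smallest p-value (0.001) adjusts to about 0.003; with 100 tests the same p-value adjusts to about 0.1. That is the sense in which excluding untestable genes changes the outcome.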

in your case you're not seeing an issue with the FDR correction. cuffdiff suffers from something else. just like several other tools, cuffdiff uses a parametric statistical test to find significantly differentially expressed genes. to do that kind of test one usually needs a mean, a variance and possibly a number of degrees of freedom. one also needs to know something about the distribution of the values being compared. so what these tools do is model the expression distributions, extract variance estimates from the models and plug them into some type of statistical test (cuffdiff ends up using something similar to a t-test). last year people were thinking the Poisson distribution was applicable to read-count data, but now most of these tools use the negative binomial distribution. i think in some cases a certain amount of information pooled across genes goes into these models. obviously something is broken here, because empirically a change in expression from 62 to 0.4 is massive. if those were means from normal distributions and you used a t-test, you'd get a p-value so small that the FDR correction would barely change its significance.

so the problem here is likely cuffdiff's modeling and estimation of the variance for this gene.
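As a rough illustration of why the variance estimate matters so much, here is a sketch with made-up replicate values. Both comparisons have means of about 62 and 0.4, but the second has a much larger spread. It uses a plain Welch t statistic and the common negative binomial parameterization (variance = mean + dispersion × mean²); both are stand-ins chosen for illustration, not cuffdiff's actual model.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = variance(a), variance(b)  # sample variances (n - 1)
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))


def nb_variance(mu, dispersion):
    """Negative binomial variance; dispersion = 0 gives the Poisson case."""
    return mu + dispersion * mu ** 2


# Hypothetical replicates: same means (about 62 vs 0.4), different spread.
tight_a, tight_b = [60.0, 62.0, 64.0], [0.3, 0.4, 0.5]
noisy_a, noisy_b = [2.0, 4.0, 180.0], [0.0, 0.1, 1.1]

t_tight = welch_t(tight_a, tight_b)
t_noisy = welch_t(noisy_a, noisy_b)
print(f"tight replicates: t = {t_tight:.1f}")
print(f"noisy replicates: t = {t_noisy:.1f}")

# The modeled variance for a mean of 62 grows quickly with dispersion.
print(nb_variance(62.0, 0.0), nb_variance(62.0, 0.5))
```

The tight replicates give a t statistic above 50, while the noisy ones give a value close to 1, even though the difference in means is identical. A model that overestimates the variance for a gene has the same effect as the noisy case.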

the only thing i can recommend is for you to try a different tool to test for differential expression. unfortunately nothing is as tidy as the tophat/cuffdiff pipeline, but it will be better for you to try something else for this test. i expect you'll be more satisfied with the results. DESeq is a good tool. I've read that EBSeq is a smart new tool as well, but I haven't tried it out. both of those run in R and both of them need count data, so you'll need read counts for your genes. htseq-count can do a good job of counting hits at genes from your alignments, assuming you use an annotation that fits. the author would recommend the Ensembl annotation. all i can recommend is that you use something that gives you a unique name per gene locus.

**jk1124** · 12-06-2012, 07:52 AM

Thanks for your helpful response. I had looked into DESeq, but it also didn't return what I expected (and also doesn't handle multi-mapping reads, which I have a lot of). I have been thinking our data may be more variable than we can work with, so I'm looking into mask files and this sort of thing, but I'm wondering - is there a standard metric(s) for reporting variability? I'm pretty new at this, and wondering, generally, how I might decide whether or not my samples are usable for this kind of analysis.

**oliviera** · 01-04-2013, 06:36 AM

Dear community, dear sdriscoll,
I have a question which could be related to cuffdiff.
Comparing the output in isoform_exp.diff and gene_exp.diff, it looks like there are fewer differentially expressed transcripts than genes. This seems counterintuitive to me, as I would expect at least the same number in both cases. Could you explain that?

**sdriscoll** · 01-04-2013, 09:38 AM

I'm pretty sure that in one of their papers they point out the possibility of differential isoform expression without differential locus expression. There could be isoform switching or activation in a locus without significantly disturbing the total read count of that locus. I would expect, however, that a differentially expressed locus would also have differentially expressed isoforms in it. Since cuffdiff models the variances separately for the isoform-level test and the locus-level test, it might sometimes report a single-isoform gene as significant in one test but not the other. That, I think, is an unfortunate side effect of the statistics used and the variance estimations. The p-values are just a guide. Every gene may actually have a different tolerance to differential expression. For some genes a two-fold change could be biologically meaningful regardless of what the statistical test says. It's important to keep that in mind.
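A toy example of the isoform switching case, with invented expression values for a hypothetical two-isoform locus: each isoform changes about nine-fold, but the locus total does not move at all.

```python
from math import log2

# Hypothetical expression values for one locus with two isoforms.
cond_a = {"isoform_1": 90.0, "isoform_2": 10.0}
cond_b = {"isoform_1": 10.0, "isoform_2": 90.0}


def log2_fold_change(before, after):
    """log2 of the ratio after / before."""
    return log2(after / before)


# Locus-level change: compare the summed expression of all isoforms.
locus_lfc = log2_fold_change(sum(cond_a.values()), sum(cond_b.values()))

# Isoform-level change: compare each isoform on its own.
isoform_lfc = {
    name: log2_fold_change(cond_a[name], cond_b[name]) for name in cond_a
}

print(f"locus: {locus_lfc:.2f}")
for name, lfc in isoform_lfc.items():
    print(f"{name}: {lfc:.2f}")
```

Here the locus-level log2 fold change is 0 while the isoforms sit at roughly -3.17 and +3.17, so a locus-level test has nothing to detect even though both isoforms changed substantially.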


cuffdiff q values
