Seqanswers Leaderboard Ad

**cw11** · 11-15-2011, 12:50 PM

Hmm, I just saw a reply from Cole on another thread that might be relevant:

Originally posted by Cole Trapnell View Post

A bit more on what FAIL means, and how it can happen. We use FAIL for genes that actually throw a numerical exception during isoform abundance calculation. In Cufflinks and Cuffdiff, there's a couple of calculations that require us to build matrices with either a row per transcript and a column per read (more or less) or a square matrix with a row and column for each transcript. Some of these matrices need to be invertible or positive definite or have other properties in order for the next steps in the algorithm to succeed. However, sometimes (due to things like round-off error) they aren't. Other times, missing data causes trouble. Oddly enough, this is actually more likely to happen the more reads you get overall, because you can see that isoforms are present, but you don't actually have enough data to calculate those abundances. This is the effect you were observing above. So since we can't be sure about the values (and in fact, were we to go ahead and do the calculation anyways, they could be *wildly* off in theory, or even negative), we set them to zero and move on.

In order to make differential expression estimates more conservative, version 1.1.0 really ramped up the checks that are done before these steps so we don't end up reporting false positives that are due to numerical exceptions. However, users (like yourselves) have been pretty frustrated by those changes, so I've spent the last few weeks going back and streamlining the overall algorithm to actually eliminate pieces that require the matrices to have some of those properties. The main offender was our "importance sampling" procedure, which tries to give us a sense (for each gene) for the accuracy for the maximum likelihood estimate of isoform abundances. This procedure was originally meant to improve the robustness of the estimate when one or more isoforms were close to zero, but in practice, we found that it actually hurts as often as it helps. Moreover, this procedure would often FAIL genes, so I removed it altogether. I've compensated on the differential expression side with some other statistical improvements and fixes, and the result is globally more accurate differential analysis (both in terms of fewer FAILs and fewer false positives than 1.1.0).

The upcoming version 1.2.0 should drastically reduce the number of FAIL genes, though there will still be some. If we can't calculate an MLE to begin with, or if for some reason the confidence interval calculation fails, a gene will be marked as FAIL.

Hope this sheds light on things.

**tleonardi** · 11-16-2011, 02:25 AM

Thank you cw11 for your reply. I've read the post that you linked, and indeed it is relevant. What I see might in fact be a consequence of removing the "importance sampling" procedure in 1.1.0
I was wondering if this is a problem only for me..

**fangquan** · 11-16-2011, 09:23 AM

I have the same problem. Please read my post here.

Question on Cuffdiff’s DE results.

Code:
Quote:
cuffdiff -p $NSLOTS -o cuffdiff_v110 Ensembl_59.gtf C_3732_hits.bam,C_6341_hits.bam,C_6356_hits.ba
m,C_6124_hits.bam,C_60PO_hits.bam HD_4430_hits.bam,HD_4412_hits.bam,HD_4242_hits.bam,HD_4189_hits
.bam,HD_3584_hits.bam
Versions
v091 v101 v102 v103 v110
OK
18112 33215 33230 33216 6765
Fail
3957 2524 2509 2523 1234
LOWDATA
NA 514 514 514 9000
NOTEST
29582 15408 15408 15408 34634
Total
51651 51661 51661 51661 51633

We wouldn’t expect such a big difference[1] between v110 and v101（v102，v103）especially in the numbers of NOTEST and OK. Cole Trapnell explained what does “Fail” mean [2], we would like to know what “NOTEST” means here.

All default options for the two versions appear to be the same; the command line options are exactly the same when we use different versions.
The cuffdiff v1.1.0 indicated.
“Cuffdiff now includes a more sophisticated check for sufficient sequencing depth prior to testingfor differences, which substantially improves the accuracy of differential expression analysis in loci with low to medium depth.”

Question on Cuffdiff’s q-value.

What is q-value here? Lots of evidence suggests q-value should be less than one [4], [5].

Reference:
(1) Inconsistency between cuffdiff 1.1.0 and cuffdiff 1.0.2

http://seqanswers.com/forums/showthread.php?t=15516

(2) Cole Trapnell on “Fail” test

http://seqanswers.com/forums/showpos...67&postcount=9

(3) Cuffdiff producing q-values > 1

Cuffdiff producing q-values > 1 - SEQanswers

http://seqanswers.com/forums/showthread.php?t=13764

Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

(4) Understanding q-values as a More Intuitive Alternative to p-values

404 - File or directory not found.

http://pages.pomona.edu/~jsh04747/St...imes_Final.pdf

(5) P-values, False Discovery Rate (FDR) and q-values What are p-values?

Web Page Not Found

http://www.nonlinear.com/support/pro...pq-values.aspx

(6) Cuffdiff producing q-values > 1
http://seqanswers.com/forums/showthread.php?t=13764

Topics	Statistics	Last Post
Expanding the Horizons of Cellular Research with the Single Cell Atlas by seqadmin Started by seqadmin, Yesterday, 11:49 AM	0 responses 15 views 0 likes	Last Post by seqadmin Yesterday, 11:49 AM
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, 04-24-2024, 08:47 AM	0 responses 16 views 0 likes	Last Post by seqadmin 04-24-2024, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 61 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM

Seqanswers Leaderboard Ad

Announcement

Inconsistency between cuffdiff 1.1.0 and cuffdiff 1.0.2

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News