Hi Abhijit
Thanks for reporting this issue. Another user recently sent me data that also caused the residual ECDF plots to be off in a similar way, and I looked into it a bit yesterday. My guess is that the cause is heterogeneity of the replicates, i.e., some replicate samples are more similar to each other than to the rest. In that case, the sample variances deviate from the theoretical chi-square distribution, which misleads the variance fit. (Consequently, this problem should only arise if one has more than two replicates.)
Could you look at the heatmap of sample distances (as described in the vignette) to see if your replicates look heterogeneous?
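To make the idea of heterogeneous replicates concrete, here is a minimal sketch in plain Python (not the vignette's R code). It simulates three replicates on made-up log-scale data, where the hypothetical samples rep_A and rep_B share a perturbation that rep_C lacks, and prints the pairwise Euclidean distances that a sample-distance heatmap would colour. All names and numbers are assumptions chosen for illustration.

```python
import math
import random

def simulate_replicates(n_genes=2000, seed=1):
    """Simulate log-scale expression for 3 replicates; A and B share a perturbation."""
    rng = random.Random(seed)
    samples = {"rep_A": [], "rep_B": [], "rep_C": []}
    for _ in range(n_genes):
        base = rng.gauss(8.0, 2.0)    # gene-level mean (illustrative values)
        shared = rng.gauss(0.0, 0.5)  # perturbation common to rep_A and rep_B only
        samples["rep_A"].append(base + shared + rng.gauss(0.0, 0.2))
        samples["rep_B"].append(base + shared + rng.gauss(0.0, 0.2))
        samples["rep_C"].append(base + rng.gauss(0.0, 0.2))
    return samples

def euclidean(x, y):
    """Euclidean distance between two equally long expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def distance_matrix(samples):
    """Pairwise distances between all samples, keyed by (name, name)."""
    names = sorted(samples)
    return {(a, b): euclidean(samples[a], samples[b]) for a in names for b in names}

samples = simulate_replicates()
dist = distance_matrix(samples)
for a in sorted(samples):
    print(a, " ".join(f"{dist[(a, b)]:8.2f}" for b in sorted(samples)))
```

In such a matrix, heterogeneous replicates show up as one pair of samples that is clearly closer together than either is to the third.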
About this fit diagnostic plot: the diagonal green line indicates the ideal chi-square shape of the residual distribution. If the ECDF curves lie above it, the variance will be underestimated and the p values will be too low. If the ECDF curves lie below the green line, it is the other way round. A simple (maybe clumsy, but still valid) fix is to instruct DESeq to multiply its variance estimates by a user-specified factor in order to account for the misspecification of the residual distribution. Just try out a couple of values and choose a correction factor that makes the ECDF curves appear just below the green line. When this is the case, correct control of type-I error is reliably restored, although detection power will suffer if the curve bulges downwards, away from the diagonal green line.
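The effect of such a multiplicative factor can be sketched with a small simulation. This is an illustration in plain Python, not DESeq's actual code: it assumes three replicates (so the scaled sample variance has 2 degrees of freedom, where the chi-square CDF has a closed form), a hypothetical fitted variance that is off by a factor of two, and a residual defined as the chi-square probability of observed over fitted variance. The function names are made up, and the sign convention (negative means the ECDF lies below the diagonal) belongs to this sketch only; DESeq's plot may define its residuals differently.

```python
import math
import random

def chi2_cdf_df2(x):
    """CDF of the chi-square distribution with 2 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0)

def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def residual_probs(n_genes, true_var, fitted_var, factor, seed=0):
    """Per-gene chi-square probabilities of observed vs. (factor * fitted) variance."""
    rng = random.Random(seed)  # same seed, so every factor sees the same data
    sd = math.sqrt(true_var)
    probs = []
    for _ in range(n_genes):
        reps = [rng.gauss(0.0, sd) for _ in range(3)]  # 3 replicates -> 2 df
        ratio = 2.0 * sample_variance(reps) / (factor * fitted_var)
        probs.append(chi2_cdf_df2(ratio))
    return probs

def mean_signed_deviation(probs):
    """Average of (ECDF height - diagonal) over the sorted probabilities."""
    ordered = sorted(probs)
    n = len(ordered)
    return sum((i + 1) / n - p for i, p in enumerate(ordered)) / n

# Hypothetical setup: the true variance is 1.0 but the fit says 0.5.
deviations = {}
for factor in (1.0, 2.0, 4.0):
    probs = residual_probs(5000, true_var=1.0, fitted_var=0.5, factor=factor)
    deviations[factor] = mean_signed_deviation(probs)
    print(f"factor {factor}: mean ECDF deviation {deviations[factor]:+.3f}")
```

Trying a handful of factors and watching how the deviation changes mirrors the "try out a couple of values" procedure: the factor that exactly compensates the misfit (2.0 in this toy setup) brings the curve back onto the diagonal, and larger or smaller factors push it to one side or the other.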
I will add an option to DESeq (hopefully today) that lets you specify such a correction factor. As Bioconductor is currently preparing its new release, I cannot submit the change to the server within the next few days, but I can send you the updated package by e-mail.
Cheers
Simon
Note: See also post #45, where I correct a mistake in this post.