No Replicates Cuffdiff; How does the 'variability model good fit' apply? and other...


I want to clarify (in my head of course) how the system works so that I can make the proper choices when I use it.

I have RNA-seq data for several conditions: a cell line with or without a virus (and deletion mutants) and/or different treatments. There are no biological or technical replicates, and yes, I understand what this means in terms of the results.

So, what Cuffdiff will do is "pool the conditions together to derive a dispersion model", which assumes that most genes won't be differentially expressed and then describes overall gene variances based on expression levels. Roughly: my gene's FPKM is 500; I don't know its variance because I have no replicates, but most genes around that expression level have a variance of 620, so a variance of 620 is assigned to my gene for testing.
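
To make sure I have the idea right, here is a rough Python sketch of how I picture it. This is only my mental model, not Cuffdiff's actual code: the function names, the k-nearest-genes lookup, and all the numbers are made up for illustration, and treating the conditions as if they were replicates of each other is my assumption about what "pooling" means here.

```python
from statistics import mean, median, variance

def pooled_mean_variance(fpkm_by_gene):
    """Treat the conditions as pseudo-replicates (assumption) and
    return one (mean, variance) pair per gene."""
    return [(mean(v), variance(v)) for v in fpkm_by_gene.values()]

def borrowed_variance(expression, reference, k=3):
    """Median variance of the k reference genes closest in expression.
    A stand-in for a fitted mean-variance trend, not the real model."""
    nearest = sorted(reference, key=lambda mv: abs(mv[0] - expression))[:k]
    return median(v for _, v in nearest)

# Made-up (mean expression, variance) pairs for genes across the pooled conditions
reference = [(10, 15), (100, 130), (480, 600), (500, 620), (520, 650), (2000, 3100)]

# A gene at FPKM 500 borrows its variance from genes expressed at a similar level
print(borrowed_variance(500, reference))  # 620
```

If this is right, a gene that really is differentially expressed between my conditions would inflate its own pooled variance, which is why the model has to assume most genes do not change.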


Next, "Cuffdiff assumes that genes and transcripts with similar expression levels have similar variances in those expression levels. However, that's not always the case - some genes have unusually high variability, for biologically meaningful reasons. Thus, Cuffdiff checks that the variability model is a good fit before performing any significance testing. This works as follows: Cuffdiff first calculates the expression level of the gene or transcript in each condition being compared. It does so by pooling the data for the replicates of a condition together. The model gives an expression and variance estimate for each condition, along with confidence intervals around the expression estimates. Then, Cuffdiff calculates the expression level of the gene in each replicate independently. If the expression level of one or more replicates lies outside the confidence interval generated by the model, Cuffdiff flags the transcript as poorly fit by the model, and no significance testing is performed."
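
Again as a rough sketch of how I read that passage (not the real implementation): I am assuming the pooled estimate is simply the mean of the replicates and that the confidence interval is a normal approximation, mean ± z·sd. The function name `poorly_fit`, the `z` parameter, and the numbers are all hypothetical.

```python
from math import sqrt

def poorly_fit(replicate_levels, model_variance, z=1.96):
    """True if any replicate lies outside pooled ± z * sqrt(model_variance).
    The normal-approximation interval is an assumption for illustration."""
    pooled = sum(replicate_levels) / len(replicate_levels)
    half_width = z * sqrt(model_variance)
    low, high = pooled - half_width, pooled + half_width
    return any(not (low <= x <= high) for x in replicate_levels)

print(poorly_fit([480, 500, 520], 620))  # False: all replicates inside the interval
print(poorly_fit([300, 500, 700], 620))  # True: 300 and 700 fall outside
print(poorly_fit([500], 620))            # False: a single sample equals the pooled estimate
```

In this toy version a condition with a single sample can never fail the check, because the pooled estimate is that sample. Whether the real implementation behaves this way is exactly what I am unsure about.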

Q2- How does this apply to my setting (no replicates)?

Q3- What exactly are the 'NOTEST', 'HIDATA' and 'FAIL' flags?

Q4- How does the --min-outlier-p option apply to my setting (no replicates)?

Q5- Should I run Cuffdiff with all my conditions at once so that the variance model benefits from all the possible data instead of doing pairwise Cuffdiff runs?

Q6- In case I want to use the expression values with tools not included in Cuffdiff (PCA, correlations, etc.), is the FPKM value alone the one to use?

Any comments etc. would be greatly appreciated.
