Seqanswers Leaderboard Ad

**Simon Anders** · 01-02-2012, 11:43 PM

Both DESeq and edgeR offer functions to fit generalized linear models (GLMs) to your genes and then perform analysis of deviance (ANODEV; the equivalent to ANOVA for GLMs). See the vignettes for usage examples.

I don't understand why you need to use a mixed model. Generalized linear mixed models (GLMMs) of the Poisson family have been used to analyse RNA-Seq data (e.g. by Blekhman, Marioni et al.) but I think there are good reasons to prefer a negative binomial GLM without random effects, as this allows to incorporate stabilized or moderated dispersion estimates.

In short: Please don't reinvent the wheel. It's all there already.

**ETHANol** · 01-03-2012, 12:23 AM

Originally posted by Simon Anders View Post

In short: Please don't reinvent the wheel. It's all there already.

Thank God, or more appropriately thank all the guys like Simon, because I barely have a clue what you guys are talking about.

**abebe** · 01-03-2012, 08:47 AM

Can I use DESeq and edgeR for mixed ANOVA Reply to Thread ?

Dear Simon,

Thank you for pointing out to me that DESeq is capability of handling mixed effects designs through GLMs and ANODEV. I wasn’t aware of that when I posted the message earlier.

After reading many posts on SEQanswers and Bioconductor, I put the following commands for my mixed model. I am getting an error message after I put the command cds <- estimateDispersions(cds, method="pooled"). I will appreciate if you can check the command lines for any correction

My mixed effects design is a 4x2 factorial with a confounder (block) and contains:

4 tissues: A, B, C, D
2 treatments: Unstressed, Stressed
3 blocks: Block1, Block2, Block3

Tissue and treatment are fixed effects and block is a random effect. You suggested to use block as a fixed effect instead of a random effect but I am not sure if that is a good idea since we collected samples on different dates and we wanted to take that into account.

Here is how I put the R command:

>CountTable <- read.delim("count.txt", header=TRUE, row.names=1, stringsAsFactors=TRUE)
>head(CountTable)
>Design <- data.frame(
+tissue = c("A", "A","A","A","A","A","B", "B","B","B","B","B","C", "C","C","C","C","C","D", "D","D","D","D","D"),
+treatment = c("Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed","Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed","Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed","Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed"),
+ block = c("Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3"))

>countTable <- CountTable
> cds <- newCountDataSet(countTable, Design)

> cds <- estimateSizeFactors(cds)
> sizeFactors( cds )

> cds <- estimateDispersions(cds, method="pooled")

#Error in rowSums(sapply(tapply((1:ncol(counts))[replicated_sample], factor(conditions[replicated_sample]), : 'x' must be an array of at least two dimensions

# Because of the previous error I couldn’t go further to execute commands to fit GLMs but this is what I have in mind

> fit0 <- fitNbinomGLMs (cds, count ~ block)
> fit1 <- fitNbinomGLMs ( cds, count ~ date + treatment )

> pvals <- nbinomGLMTest( fit1, fit0 )
>padj <- p.adjust( pvals, method="BH" )
> rownames(counts(cds))[ padj < .1 ]

Thank you for your help.

T. Abebe

**moritzhess** · 03-20-2013, 06:52 AM

ShrinkSeq http://www.few.vu.nl/~mavdwiel/ShrinkBayes.html , seems to be very flexible and allows for random effects.

**Simon Anders** · 03-20-2013, 08:00 AM

Originally posted by abebe View Post

Tissue and treatment are fixed effects and block is a random effect. You suggested to use block as a fixed effect instead of a random effect but I am not sure if that is a good idea since we collected samples on different dates and we wanted to take that into account.

I still don't see what's wrong with using a fixed effect for 'block', and for a factor with only three levels, a GLM and a GLMM will give nearly the same result, anyway. Furthermore, a lot what you find in older text books about when to use fixed and when random effects is no longer current practice. G K Robinson's 1991 article might be what has changed many people's way of thinking about it. What it says is that you should use random effects if you want shrinkage and fixed effects if you don't, no matter what kind of effect you are modelling.

We have just released DESeq2, which offers empirical-Bayesian shrinkage on both dispersions and coefficients, so, if you want to look at it that way, it gives you mixed models. The shrinkage is using a common prior across all genes, so you actually get proper shrinkage despite having only three levels. (ShrinkSeq, sugegsted above, uses a rather different method, but for similar purposes.)

> cds <- estimateDispersions(cds, method="pooled")

[I]#Error in rowSums(sapply(tapply((1:ncol(counts))[replicated_sample], factor(conditions[replicated_sample]), : 'x' must be an array of at least two dimensions

Please use method="pooled-CR". Not sure why you didn't get a more informative error message.

> fit0 <- fitNbinomGLMs (cds, count ~ block)
> fit1 <- fitNbinomGLMs ( cds, count ~ date + treatment )

Make up you mind whether you call it "block" or "date". Also, what happened to "tissue"?

**lmolokin** · 04-09-2014, 11:56 AM

Mixed ANOVA

Hi guys,
I'm going to revive this thread as I haven't been able to find a straight forward answer to whether it is possible to run a Mixed ANOVA-like analyses in edgeR/DESeq2.

My RNA-seq experiment is simple: 2 phenotype groups, 6 patients, 4 time points per patient:

I'm trying to determine whether any changes in gene expression is a result of interaction between phenotype and time. Time being the within-subject factor and phenotype the between-subject factor.

So far based on the edgeR manual and a few discussions I read on the BioC mailing list my formula looks like this:

model.matrix(~phenotype + phenotype : patient + phenotype : time)

Is it possible to test if there's an interaction between phenotype and time?

Perhaps there's a more appropriate method to analyzing this experimental design that I'm overlooking.

Any help with this is greatly appreciated. Thanks in advance!

**moritzhess** · 04-09-2014, 12:57 PM

I would not necessary consider this design super-simple. At least the fact that you have multiple measurements per individual might be a little tricky because of a high amount of dependency (timepoint 2 might be dependent on timepoint 1, multiple measures within one patient can be more correlated).However I have to admint that I do not have much experience with time series data.

Maybe, because you have time series data, other methods, packages might be more suited.
This publication http://www.pnas.org/content/early/2011/03/22/1017542108
investigates as similar experiment, as you do.

Without knowing the data, I would assume that you are not interested in the individual's response. Thus I would treat the Patient as a random factor. As far as I know this is not possible in EdgeR but in Limma (can be used with count data using the voom() function). You would then end up with something like:
Expression ~ Phenotype + Time + Phenotype:Time + random(Patient).
Please note that it might be dangerous to introduce an interaction term in the model without introducing the main effect, because you would falsely attribute a possible main effect to the interaction.
But then you are still left with the time course which (I would assume ) is not 100% treated correct in the procedure I proposed.

This is my approach but (as I wrote above) I am not experienced with time course data and there might be better / more correct approaches

Best

Moritz

**dpryan** · 04-09-2014, 02:29 PM

Neither DESeq (1 or 2) nor edgeR handle mixed-effects models. These are generally dealt with by simply using "Patient" as a factor. As mentioned by Moritz, one could instead use limma, which has a duplicateCorrelations() function. It'd be interesting to see a side-by-side comparison to see how much one might gain from that.

BTW, you're not interested in phenotype

atient, which is good since you can't actually measure it (you can try, but you'll just get an error). Phenotype is a linear combination of patients, so ~patient + phenotype*time would make more sense (whether to include "phenotype" and/or "time" on their own in the model will depend on the biology).

**Simon Anders** · 04-15-2014, 10:41 PM

There are rather different views in the community about when one should use mixed models. Hence, Moritz, what exactly would you see as the advanage of treating 'patient' as random effect?

**lmolokin** · 04-21-2014, 11:14 AM

Thank you for your responses.

So far, what I've gathered is that the appropriate method to analyze my data is to run a PROC GLIMMIX (nb distribution) repeated measures analysis. So if I understand correctly, neither edgeR/DESeq2 nor JMP Genomics can handle that.

Ultimately, the comparisons of interest will be which genes respond differently between Control Time 1 vs Affected Time 1, Control Time 2 vs Affected Time 2. etc.

Are there possible analogous methods in packages such as edge/DESeq2 that could mimic the SAS analysis I listed above? I have a basic working knowledge of the R language as well as the stat concepts involved so please pardon my ignorance if my questions violate any fundamentals.

Thanks!

**dpryan** · 04-21-2014, 11:25 AM

lme4 can do that (glmer.nb function). Keep in mind that you'll have decreased power with PROC GLIMMIX and the analogous method in R unless you basically reimplement the DESeq2 (or edgeR or whatever) methods yourself.

Having said that, you should really reply Simon's question about what you hope to gain by treating patient as a random effect.

**teshome** · 04-23-2014, 12:33 AM

Originally posted by dpryan View Post

Having said that, you should really reply Simon's question about what you hope to gain by treating patient as a random effect.

I have also similar experiment. I think patient should be considered as a random effect because one would not really care if patient 1 performs better than patient 3 or other patients. Instead, you would care about the patients variance; i.e., how much variation does a paitent provide to the overall model.

**Simon Anders** · 04-23-2014, 01:26 AM

To repeat: Contrary to what some stats textbooks say, is is not necessary to treat an effect as random only because one does not care about individual levels or because is seems to be drawn from a population.

I prefer very much the view promoted e.g. in G. K. Robinson's classic paper [1] that the point of using random effects is to get something like empirical-Bayes shrinkage into frequentist linear-model fitting. Note that DESeq2 now give you such shrinkage in an manner that is technically achieved in a different manner but has very similar effect. [2]

[1] G. K. Robinson: That BLUP is a Good Thing: The Estimation of Random Effects. Statistical Science 6 (1991), no. 1, 15--32. doi:10.1214/ss/1177011926. http://projecteuclid.org/euclid.ss/1177011926.

[2] Michael I Love, Wolfgang Huber, Simon Anders: Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. bioRxiv (2014), doi: 10.1101/002832

**moritzhess** · 12-15-2014, 05:48 AM

I now this answer is really late

. I thought that even if it is possible to introduce a "batch factor" as fixed effect, one would loose a lot of power because more degrees of freedom are consumed compared to treating the batch effect as random. And the more levels your "batch factor" has, the more you benefit from treating it as random ?

Topics	Statistics	Last Post
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, Today, 08:47 AM	0 responses 12 views 0 likes	Last Post by seqadmin Today, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 59 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 54 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM

Seqanswers Leaderboard Ad

Announcement

Can I use DESeq and edgeR for mixed ANOVA

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News