
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
comparing results by cuffdiff, edgeR, DESeq  PFS  Bioinformatics  5  03122014 04:01 AM 
Can edgeR/DESeq have more than one covariate?  arrchi  Bioinformatics  8  10282013 03:37 PM 
DESeq and edgeR up/down regulation  murphycj  Bioinformatics  7  09212011 08:09 AM 
BaySeq vs GLM in EdgeR and DESeq  Hilary April Smith  Bioinformatics  0  08032011 11:14 AM 
edgeR vs DESeq vs bayseq  Azazel  Bioinformatics  1  10072010 08:11 AM 

Thread Tools 
12302011, 12:45 PM  #1 
Junior Member
Location: Cedar Falls, IA Join Date: Dec 2011
Posts: 5

Can I use DESeq and edgeR for mixed ANOVA
Hi guys,
I am working with RNAseq data generated by Illumina GAII. The experiment design is a randomized complete block design (RCBD) with 4 different tissues, 2 treatments (control and treated), and 3 blocks (reps). My interest is to use the mixed ANOVA in ProcMixed (SAS) to determine differentially expressed genes. Initially, I thought I can normalize read counts in RPKM and use that to do the Mixed ANOVA in SAS or JMP Genomics. Reading through the literature I realized that RPKM is not the best normalization tool for RNAseq. I am leaning toward DESeq or edgeR for normalization. My questions are: 1) How do I extract normalized counts from DESeq and EdgeR? 2) Has anybody tried to analyze differential gene expression in normalized data using ProcMixed in SAS? Is there anything I should be cautious in using mixed effect model for RNAseq? I don't think I can use DESeq and edgeR for a mixed model. I appreciate your input. Thanks. T. Abebe 
01022012, 11:43 PM  #2 
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 993

Both DESeq and edgeR offer functions to fit generalized linear models (GLMs) to your genes and then perform analysis of deviance (ANODEV; the equivalent to ANOVA for GLMs). See the vignettes for usage examples.
I don't understand why you need to use a mixed model. Generalized linear mixed models (GLMMs) of the Poisson family have been used to analyse RNASeq data (e.g. by Blekhman, Marioni et al.) but I think there are good reasons to prefer a negative binomial GLM without random effects, as this allows to incorporate stabilized or moderated dispersion estimates. In short: Please don't reinvent the wheel. It's all there already. 
01032012, 12:23 AM  #3 
Senior Member
Location: Western Australia Join Date: Feb 2010
Posts: 310

Thank God, or more appropriately thank all the guys like Simon, because I barely have a clue what you guys are talking about.
__________________
 Ethan 
01032012, 08:47 AM  #4 
Junior Member
Location: Cedar Falls, IA Join Date: Dec 2011
Posts: 5

Can I use DESeq and edgeR for mixed ANOVA Reply to Thread ?
Dear Simon,
Thank you for pointing out to me that DESeq is capability of handling mixed effects designs through GLMs and ANODEV. I wasn’t aware of that when I posted the message earlier. After reading many posts on SEQanswers and Bioconductor, I put the following commands for my mixed model. I am getting an error message after I put the command cds < estimateDispersions(cds, method="pooled"). I will appreciate if you can check the command lines for any correction My mixed effects design is a 4x2 factorial with a confounder (block) and contains: 4 tissues: A, B, C, D 2 treatments: Unstressed, Stressed 3 blocks: Block1, Block2, Block3 Tissue and treatment are fixed effects and block is a random effect. You suggested to use block as a fixed effect instead of a random effect but I am not sure if that is a good idea since we collected samples on different dates and we wanted to take that into account. Here is how I put the R command: >CountTable < read.delim("count.txt", header=TRUE, row.names=1, stringsAsFactors=TRUE) >head(CountTable) >Design < data.frame( +tissue = c("A", "A","A","A","A","A","B", "B","B","B","B","B","C", "C","C","C","C","C","D", "D","D","D","D","D"), +treatment = c("Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed","Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed","Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed","Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed"), + block = c("Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3")) >countTable < CountTable > cds < newCountDataSet(countTable, Design) > cds < estimateSizeFactors(cds) > sizeFactors( cds ) > cds < estimateDispersions(cds, method="pooled") #Error in rowSums(sapply(tapply((1:ncol(counts))[replicated_sample], factor(conditions[replicated_sample]), : 'x' must be an array of at least two dimensions # Because of the previous error I couldn’t go further to execute commands to fit GLMs but this is what I have in mind > fit0 < fitNbinomGLMs (cds, count ~ block) > fit1 < fitNbinomGLMs ( cds, count ~ date + treatment ) > pvals < nbinomGLMTest( fit1, fit0 ) >padj < p.adjust( pvals, method="BH" ) > rownames(counts(cds))[ padj < .1 ] Thank you for your help. T. Abebe 
03202013, 07:52 AM  #5 
Member
Location: freiburg Join Date: Apr 2010
Posts: 25

ShrinkSeq http://www.few.vu.nl/~mavdwiel/ShrinkBayes.html , seems to be very flexible and allows for random effects.

03202013, 09:00 AM  #6  
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 993

Quote:
We have just released DESeq2, which offers empiricalBayesian shrinkage on both dispersions and coefficients, so, if you want to look at it that way, it gives you mixed models. The shrinkage is using a common prior across all genes, so you actually get proper shrinkage despite having only three levels. (ShrinkSeq, sugegsted above, uses a rather different method, but for similar purposes.) Quote:
Quote:


04092014, 12:56 PM  #7 
Member
Location: Beltsville, MD, USA Join Date: Jul 2012
Posts: 24

Mixed ANOVA
Hi guys,
I'm going to revive this thread as I haven't been able to find a straight forward answer to whether it is possible to run a Mixed ANOVAlike analyses in edgeR/DESeq2. My RNAseq experiment is simple: 2 phenotype groups, 6 patients, 4 time points per patient: I'm trying to determine whether any changes in gene expression is a result of interaction between phenotype and time. Time being the withinsubject factor and phenotype the betweensubject factor. So far based on the edgeR manual and a few discussions I read on the BioC mailing list my formula looks like this: model.matrix(~phenotype + phenotype : patient + phenotype : time) Is it possible to test if there's an interaction between phenotype and time? Perhaps there's a more appropriate method to analyzing this experimental design that I'm overlooking. Any help with this is greatly appreciated. Thanks in advance! 
04092014, 01:57 PM  #8 
Member
Location: freiburg Join Date: Apr 2010
Posts: 25

I would not necessary consider this design supersimple. At least the fact that you have multiple measurements per individual might be a little tricky because of a high amount of dependency (timepoint 2 might be dependent on timepoint 1, multiple measures within one patient can be more correlated).However I have to admint that I do not have much experience with time series data.
Maybe, because you have time series data, other methods, packages might be more suited. This publication http://www.pnas.org/content/early/2011/03/22/1017542108 investigates as similar experiment, as you do. Without knowing the data, I would assume that you are not interested in the individual's response. Thus I would treat the Patient as a random factor. As far as I know this is not possible in EdgeR but in Limma (can be used with count data using the voom() function). You would then end up with something like: Expression ~ Phenotype + Time + Phenotype:Time + random(Patient). Please note that it might be dangerous to introduce an interaction term in the model without introducing the main effect, because you would falsely attribute a possible main effect to the interaction. But then you are still left with the time course which (I would assume ) is not 100% treated correct in the procedure I proposed. This is my approach but (as I wrote above) I am not experienced with time course data and there might be better / more correct approaches Best Moritz 
04092014, 03:29 PM  #9 
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,479

Neither DESeq (1 or 2) nor edgeR handle mixedeffects models. These are generally dealt with by simply using "Patient" as a factor. As mentioned by Moritz, one could instead use limma, which has a duplicateCorrelations() function. It'd be interesting to see a sidebyside comparison to see how much one might gain from that.
BTW, you're not interested in phenotypeatient, which is good since you can't actually measure it (you can try, but you'll just get an error). Phenotype is a linear combination of patients, so ~patient + phenotype*time would make more sense (whether to include "phenotype" and/or "time" on their own in the model will depend on the biology). 
04152014, 11:41 PM  #10 
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 993

There are rather different views in the community about when one should use mixed models. Hence, Moritz, what exactly would you see as the advanage of treating 'patient' as random effect?

04212014, 12:14 PM  #11 
Member
Location: Beltsville, MD, USA Join Date: Jul 2012
Posts: 24

Thank you for your responses.
So far, what I've gathered is that the appropriate method to analyze my data is to run a PROC GLIMMIX (nb distribution) repeated measures analysis. So if I understand correctly, neither edgeR/DESeq2 nor JMP Genomics can handle that. Ultimately, the comparisons of interest will be which genes respond differently between Control Time 1 vs Affected Time 1, Control Time 2 vs Affected Time 2. etc. Are there possible analogous methods in packages such as edge/DESeq2 that could mimic the SAS analysis I listed above? I have a basic working knowledge of the R language as well as the stat concepts involved so please pardon my ignorance if my questions violate any fundamentals. Thanks! 
04212014, 12:25 PM  #12 
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,479

lme4 can do that (glmer.nb function). Keep in mind that you'll have decreased power with PROC GLIMMIX and the analogous method in R unless you basically reimplement the DESeq2 (or edgeR or whatever) methods yourself.
Having said that, you should really reply Simon's question about what you hope to gain by treating patient as a random effect. 
04232014, 01:33 AM  #13 
Junior Member
Location: Norway Join Date: Jan 2010
Posts: 4

I have also similar experiment. I think patient should be considered as a random effect because one would not really care if patient 1 performs better than patient 3 or other patients. Instead, you would care about the patients variance; i.e., how much variation does a paitent provide to the overall model.

04232014, 02:26 AM  #14 
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 993

To repeat: Contrary to what some stats textbooks say, is is not necessary to treat an effect as random only because one does not care about individual levels or because is seems to be drawn from a population.
I prefer very much the view promoted e.g. in G. K. Robinson's classic paper [1] that the point of using random effects is to get something like empiricalBayes shrinkage into frequentist linearmodel fitting. Note that DESeq2 now give you such shrinkage in an manner that is technically achieved in a different manner but has very similar effect. [2] [1] G. K. Robinson: That BLUP is a Good Thing: The Estimation of Random Effects. Statistical Science 6 (1991), no. 1, 1532. doi:10.1214/ss/1177011926. http://projecteuclid.org/euclid.ss/1177011926. [2] Michael I Love, Wolfgang Huber, Simon Anders: Moderated estimation of fold change and dispersion for RNASeq data with DESeq2. bioRxiv (2014), doi: 10.1101/002832 
12152014, 05:48 AM  #15 
Member
Location: freiburg Join Date: Apr 2010
Posts: 25

I now this answer is really late . I thought that even if it is possible to introduce a "batch factor" as fixed effect, one would loose a lot of power because more degrees of freedom are consumed compared to treating the batch effect as random. And the more levels your "batch factor" has, the more you benefit from treating it as random ?

12152014, 06:27 AM  #16 
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 993

No, because due to the use of an empiricalBayes estimation method for dispersion estimation, you do not consume DoFs as usual.
Otherwise, it would be completely impossible to perform comparisons between groups with only two to four samples per group. For details, see Gordon Smyth's original 2004 paper on limma, which introduced this whole idea of understanding empiricalBayes shrinkage estimation as a way of "saving degrees of freedom" by sharing information across genes. 
12152014, 07:14 AM  #17 
Member
Location: freiburg Join Date: Apr 2010
Posts: 25

Hi Simon,
thank's for your response. That's very interesting I might need to study the matter more in detail to fully understand. Moritz 
07112017, 09:05 AM  #18  
Junior Member
Location: Missouri, US Join Date: Jul 2017
Posts: 1

More concerns
If we model random effects as fixed effects and put bayes prior on the fixed effects to get shrunken estimates, more concerns are:
(1) When the design is complex with nested designs (such as splitplot) and blocking effects, etc. how to correctly estimate variation merely due to sampling/replication; i.e. socalled "error variation"? The size of error variation is crucial for significance test. Is there any research to assess the nearly equivalence of "fixed effects with empirical bayes" and "random effects with empirical bayes"? Because you can also put prior on random effects also to borrow information across genes. I once analyzed a gene expression data with complex designs. I first use edgeR GLM and DESeq also. edgeR and DESeq results are very similar. Then I used GLMM (no shrinkage, but random effects), I got a lot more significant differentially expressed genes than edgeR and DESeq by controlling FDR at 0.05. So, the above two things "fixed with empirical bayes" and "random effects" are not quite equal from this experience I have. (2) Correlation among observations. Does the "fixed effects with empirical bayes" correctly model correlations among observations? Bayes prior does put correlations among genes, but within a gene, how does it model correlations among observations within a gene where splitplot or blocking happen? Thanks, Ariana G. Quote:


07132017, 05:45 AM  #19 
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 993

Not sure whetehr ther's mich point in reviving this old thread, hence just to clarify:
Of course, fixed models with empirical Bayes shrinkage cannot replace any form of mixed models. Especially they cannot account for nested designs, repeatedmeasure designs, or other designs which cause the strength of correlation between samples within a treatment group to depend on substructure. My point back then (as far as I can say now) was merely that quite often, people think they need mixed models but don't. 
Tags 
deseq, edger, mixed anova 
Thread Tools  

