SEQanswers

Go Back   SEQanswers > Sequencing Technologies/Companies > Illumina/Solexa



Similar Threads
Thread Thread Starter Forum Replies Last Post
comparing results by cuffdiff, edgeR, DESeq PFS Bioinformatics 5 03-12-2014 03:01 AM
Can edgeR/DESeq have more than one covariate? arrchi Bioinformatics 8 10-28-2013 02:37 PM
DESeq and edgeR up/down regulation murphycj Bioinformatics 7 09-21-2011 07:09 AM
BaySeq vs GLM in EdgeR and DESeq Hilary April Smith Bioinformatics 0 08-03-2011 10:14 AM
edgeR vs DESeq vs bayseq Azazel Bioinformatics 1 10-07-2010 07:11 AM

Reply
 
Thread Tools
Old 12-30-2011, 11:45 AM   #1
abebe
Junior Member
 
Location: Cedar Falls, IA

Join Date: Dec 2011
Posts: 5
Talking Can I use DESeq and edgeR for mixed ANOVA

Hi guys,

I am working with RNA-seq data generated by Illumina GAII. The experiment design is a randomized complete block design (RCBD) with 4 different tissues, 2 treatments (control and treated), and 3 blocks (reps). My interest is to use the mixed ANOVA in ProcMixed (SAS) to determine differentially expressed genes. Initially, I thought I can normalize read counts in RPKM and use that to do the Mixed ANOVA in SAS or JMP Genomics. Reading through the literature I realized that RPKM is not the best normalization tool for RNA-seq. I am leaning toward DESeq or edgeR for normalization.

My questions are:
1) How do I extract normalized counts from DESeq and EdgeR?
2) Has anybody tried to analyze differential gene expression in normalized data using ProcMixed in SAS? Is there anything I should be cautious in using mixed effect model for RNA-seq? I don't think I can use DESeq and edgeR for a mixed model.

I appreciate your input.

Thanks.

T. Abebe
abebe is offline   Reply With Quote
Old 01-02-2012, 10:43 PM   #2
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 991
Default

Both DESeq and edgeR offer functions to fit generalized linear models (GLMs) to your genes and then perform analysis of deviance (ANODEV; the equivalent to ANOVA for GLMs). See the vignettes for usage examples.

I don't understand why you need to use a mixed model. Generalized linear mixed models (GLMMs) of the Poisson family have been used to analyse RNA-Seq data (e.g. by Blekhman, Marioni et al.) but I think there are good reasons to prefer a negative binomial GLM without random effects, as this allows to incorporate stabilized or moderated dispersion estimates.

In short: Please don't reinvent the wheel. It's all there already.
Simon Anders is offline   Reply With Quote
Old 01-02-2012, 11:23 PM   #3
ETHANol
Senior Member
 
Location: Western Australia

Join Date: Feb 2010
Posts: 310
Default

Quote:
Originally Posted by Simon Anders View Post
In short: Please don't reinvent the wheel. It's all there already.
Thank God, or more appropriately thank all the guys like Simon, because I barely have a clue what you guys are talking about.
__________________
--------------
Ethan
ETHANol is offline   Reply With Quote
Old 01-03-2012, 07:47 AM   #4
abebe
Junior Member
 
Location: Cedar Falls, IA

Join Date: Dec 2011
Posts: 5
Default Can I use DESeq and edgeR for mixed ANOVA Reply to Thread ?

Dear Simon,

Thank you for pointing out to me that DESeq is capability of handling mixed effects designs through GLMs and ANODEV. I wasn’t aware of that when I posted the message earlier.

After reading many posts on SEQanswers and Bioconductor, I put the following commands for my mixed model. I am getting an error message after I put the command cds <- estimateDispersions(cds, method="pooled"). I will appreciate if you can check the command lines for any correction

My mixed effects design is a 4x2 factorial with a confounder (block) and contains:

4 tissues: A, B, C, D
2 treatments: Unstressed, Stressed
3 blocks: Block1, Block2, Block3

Tissue and treatment are fixed effects and block is a random effect. You suggested to use block as a fixed effect instead of a random effect but I am not sure if that is a good idea since we collected samples on different dates and we wanted to take that into account.

Here is how I put the R command:

>CountTable <- read.delim("count.txt", header=TRUE, row.names=1, stringsAsFactors=TRUE)
>head(CountTable)
>Design <- data.frame(
+tissue = c("A", "A","A","A","A","A","B", "B","B","B","B","B","C", "C","C","C","C","C","D", "D","D","D","D","D"),
+treatment = c("Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed","Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed","Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed","Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed"),
+ block = c("Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3"))

>countTable <- CountTable
> cds <- newCountDataSet(countTable, Design)

> cds <- estimateSizeFactors(cds)
> sizeFactors( cds )

> cds <- estimateDispersions(cds, method="pooled")

#Error in rowSums(sapply(tapply((1:ncol(counts))[replicated_sample], factor(conditions[replicated_sample]), : 'x' must be an array of at least two dimensions

# Because of the previous error I couldn’t go further to execute commands to fit GLMs but this is what I have in mind

> fit0 <- fitNbinomGLMs (cds, count ~ block)
> fit1 <- fitNbinomGLMs ( cds, count ~ date + treatment )

> pvals <- nbinomGLMTest( fit1, fit0 )
>padj <- p.adjust( pvals, method="BH" )
> rownames(counts(cds))[ padj < .1 ]

Thank you for your help.

T. Abebe
abebe is offline   Reply With Quote
Old 03-20-2013, 06:52 AM   #5
moritzhess
Member
 
Location: freiburg

Join Date: Apr 2010
Posts: 25
Default

ShrinkSeq http://www.few.vu.nl/~mavdwiel/ShrinkBayes.html , seems to be very flexible and allows for random effects.
moritzhess is offline   Reply With Quote
Old 03-20-2013, 08:00 AM   #6
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 991
Default

Quote:
Originally Posted by abebe View Post
Tissue and treatment are fixed effects and block is a random effect. You suggested to use block as a fixed effect instead of a random effect but I am not sure if that is a good idea since we collected samples on different dates and we wanted to take that into account.
I still don't see what's wrong with using a fixed effect for 'block', and for a factor with only three levels, a GLM and a GLMM will give nearly the same result, anyway. Furthermore, a lot what you find in older text books about when to use fixed and when random effects is no longer current practice. G K Robinson's 1991 article might be what has changed many people's way of thinking about it. What it says is that you should use random effects if you want shrinkage and fixed effects if you don't, no matter what kind of effect you are modelling.

We have just released DESeq2, which offers empirical-Bayesian shrinkage on both dispersions and coefficients, so, if you want to look at it that way, it gives you mixed models. The shrinkage is using a common prior across all genes, so you actually get proper shrinkage despite having only three levels. (ShrinkSeq, sugegsted above, uses a rather different method, but for similar purposes.)

Quote:
> cds <- estimateDispersions(cds, method="pooled")

[I]#Error in rowSums(sapply(tapply((1:ncol(counts))[replicated_sample], factor(conditions[replicated_sample]), : 'x' must be an array of at least two dimensions
Please use method="pooled-CR". Not sure why you didn't get a more informative error message.

Quote:
> fit0 <- fitNbinomGLMs (cds, count ~ block)
> fit1 <- fitNbinomGLMs ( cds, count ~ date + treatment )
Make up you mind whether you call it "block" or "date". Also, what happened to "tissue"?
Simon Anders is offline   Reply With Quote
Old 04-09-2014, 11:56 AM   #7
lmolokin
Member
 
Location: Beltsville, MD, USA

Join Date: Jul 2012
Posts: 23
Default Mixed ANOVA

Hi guys,
I'm going to revive this thread as I haven't been able to find a straight forward answer to whether it is possible to run a Mixed ANOVA-like analyses in edgeR/DESeq2.

My RNA-seq experiment is simple: 2 phenotype groups, 6 patients, 4 time points per patient:



I'm trying to determine whether any changes in gene expression is a result of interaction between phenotype and time. Time being the within-subject factor and phenotype the between-subject factor.

So far based on the edgeR manual and a few discussions I read on the BioC mailing list my formula looks like this:

model.matrix(~phenotype + phenotype : patient + phenotype : time)

Is it possible to test if there's an interaction between phenotype and time?

Perhaps there's a more appropriate method to analyzing this experimental design that I'm overlooking.

Any help with this is greatly appreciated. Thanks in advance!
lmolokin is offline   Reply With Quote
Old 04-09-2014, 12:57 PM   #8
moritzhess
Member
 
Location: freiburg

Join Date: Apr 2010
Posts: 25
Default

I would not necessary consider this design super-simple. At least the fact that you have multiple measurements per individual might be a little tricky because of a high amount of dependency (timepoint 2 might be dependent on timepoint 1, multiple measures within one patient can be more correlated).However I have to admint that I do not have much experience with time series data.

Maybe, because you have time series data, other methods, packages might be more suited.
This publication http://www.pnas.org/content/early/2011/03/22/1017542108
investigates as similar experiment, as you do.

Without knowing the data, I would assume that you are not interested in the individual's response. Thus I would treat the Patient as a random factor. As far as I know this is not possible in EdgeR but in Limma (can be used with count data using the voom() function). You would then end up with something like:
Expression ~ Phenotype + Time + Phenotype:Time + random(Patient).
Please note that it might be dangerous to introduce an interaction term in the model without introducing the main effect, because you would falsely attribute a possible main effect to the interaction.
But then you are still left with the time course which (I would assume ) is not 100% treated correct in the procedure I proposed.

This is my approach but (as I wrote above) I am not experienced with time course data and there might be better / more correct approaches

Best

Moritz
moritzhess is offline   Reply With Quote
Old 04-09-2014, 02:29 PM   #9
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,475
Default

Neither DESeq (1 or 2) nor edgeR handle mixed-effects models. These are generally dealt with by simply using "Patient" as a factor. As mentioned by Moritz, one could instead use limma, which has a duplicateCorrelations() function. It'd be interesting to see a side-by-side comparison to see how much one might gain from that.

BTW, you're not interested in phenotypeatient, which is good since you can't actually measure it (you can try, but you'll just get an error). Phenotype is a linear combination of patients, so ~patient + phenotype*time would make more sense (whether to include "phenotype" and/or "time" on their own in the model will depend on the biology).
dpryan is offline   Reply With Quote
Old 04-15-2014, 10:41 PM   #10
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 991
Default

There are rather different views in the community about when one should use mixed models. Hence, Moritz, what exactly would you see as the advanage of treating 'patient' as random effect?
Simon Anders is offline   Reply With Quote
Old 04-21-2014, 11:14 AM   #11
lmolokin
Member
 
Location: Beltsville, MD, USA

Join Date: Jul 2012
Posts: 23
Default

Thank you for your responses.

So far, what I've gathered is that the appropriate method to analyze my data is to run a PROC GLIMMIX (nb distribution) repeated measures analysis. So if I understand correctly, neither edgeR/DESeq2 nor JMP Genomics can handle that.

Ultimately, the comparisons of interest will be which genes respond differently between Control Time 1 vs Affected Time 1, Control Time 2 vs Affected Time 2. etc.

Are there possible analogous methods in packages such as edge/DESeq2 that could mimic the SAS analysis I listed above? I have a basic working knowledge of the R language as well as the stat concepts involved so please pardon my ignorance if my questions violate any fundamentals.

Thanks!
lmolokin is offline   Reply With Quote
Old 04-21-2014, 11:25 AM   #12
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,475
Default

lme4 can do that (glmer.nb function). Keep in mind that you'll have decreased power with PROC GLIMMIX and the analogous method in R unless you basically reimplement the DESeq2 (or edgeR or whatever) methods yourself.

Having said that, you should really reply Simon's question about what you hope to gain by treating patient as a random effect.
dpryan is offline   Reply With Quote
Old 04-23-2014, 12:33 AM   #13
teshome
Junior Member
 
Location: Norway

Join Date: Jan 2010
Posts: 4
Default

Quote:
Originally Posted by dpryan View Post
Having said that, you should really reply Simon's question about what you hope to gain by treating patient as a random effect.
I have also similar experiment. I think patient should be considered as a random effect because one would not really care if patient 1 performs better than patient 3 or other patients. Instead, you would care about the patients variance; i.e., how much variation does a paitent provide to the overall model.
teshome is offline   Reply With Quote
Old 04-23-2014, 01:26 AM   #14
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 991
Default

To repeat: Contrary to what some stats textbooks say, is is not necessary to treat an effect as random only because one does not care about individual levels or because is seems to be drawn from a population.

I prefer very much the view promoted e.g. in G. K. Robinson's classic paper [1] that the point of using random effects is to get something like empirical-Bayes shrinkage into frequentist linear-model fitting. Note that DESeq2 now give you such shrinkage in an manner that is technically achieved in a different manner but has very similar effect. [2]

[1] G. K. Robinson: That BLUP is a Good Thing: The Estimation of Random Effects. Statistical Science 6 (1991), no. 1, 15--32. doi:10.1214/ss/1177011926. http://projecteuclid.org/euclid.ss/1177011926.

[2] Michael I Love, Wolfgang Huber, Simon Anders: Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. bioRxiv (2014), doi: 10.1101/002832
Simon Anders is offline   Reply With Quote
Old 12-15-2014, 04:48 AM   #15
moritzhess
Member
 
Location: freiburg

Join Date: Apr 2010
Posts: 25
Default

I now this answer is really late . I thought that even if it is possible to introduce a "batch factor" as fixed effect, one would loose a lot of power because more degrees of freedom are consumed compared to treating the batch effect as random. And the more levels your "batch factor" has, the more you benefit from treating it as random ?
moritzhess is offline   Reply With Quote
Old 12-15-2014, 05:27 AM   #16
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 991
Default

No, because due to the use of an empirical-Bayes estimation method for dispersion estimation, you do not consume DoFs as usual.

Otherwise, it would be completely impossible to perform comparisons between groups with only two to four samples per group. For details, see Gordon Smyth's original 2004 paper on limma, which introduced this whole idea of understanding empirical-Bayes shrinkage estimation as a way of "saving degrees of freedom" by sharing information across genes.
Simon Anders is offline   Reply With Quote
Old 12-15-2014, 06:14 AM   #17
moritzhess
Member
 
Location: freiburg

Join Date: Apr 2010
Posts: 25
Default

Hi Simon,

thank's for your response. That's very interesting I might need to study the matter more in detail to fully understand.
Moritz
moritzhess is offline   Reply With Quote
Old 07-11-2017, 08:05 AM   #18
Ariana G.
Junior Member
 
Location: Missouri, US

Join Date: Jul 2017
Posts: 1
Default More concerns

If we model random effects as fixed effects and put bayes prior on the fixed effects to get shrunken estimates, more concerns are:

(1) When the design is complex with nested designs (such as split-plot) and blocking effects, etc. how to correctly estimate variation merely due to sampling/replication; i.e. so-called "error variation"? The size of error variation is crucial for significance test. Is there any research to assess the nearly equivalence of "fixed effects with empirical bayes" and "random effects with empirical bayes"? Because you can also put prior on random effects also to borrow information across genes.

I once analyzed a gene expression data with complex designs. I first use edgeR GLM and DESeq also. edgeR and DESeq results are very similar. Then I used GLMM (no shrinkage, but random effects), I got a lot more significant differentially expressed genes than edgeR and DESeq by controlling FDR at 0.05. So, the above two things "fixed with empirical bayes" and "random effects" are not quite equal from this experience I have.

(2) Correlation among observations. Does the "fixed effects with empirical bayes" correctly model correlations among observations? Bayes prior does put correlations among genes, but within a gene, how does it model correlations among observations within a gene where split-plot or blocking happen?

Thanks,
Ariana G.


Quote:
Originally Posted by Simon Anders View Post
No, because due to the use of an empirical-Bayes estimation method for dispersion estimation, you do not consume DoFs as usual.

Otherwise, it would be completely impossible to perform comparisons between groups with only two to four samples per group. For details, see Gordon Smyth's original 2004 paper on limma, which introduced this whole idea of understanding empirical-Bayes shrinkage estimation as a way of "saving degrees of freedom" by sharing information across genes.
Ariana G. is offline   Reply With Quote
Old 07-13-2017, 04:45 AM   #19
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 991
Default

Not sure whetehr ther's mich point in reviving this old thread, hence just to clarify:

Of course, fixed models with empirical Bayes shrinkage cannot replace any form of mixed models. Especially they cannot account for nested designs, repeated-measure designs, or other designs which cause the strength of correlation between samples within a treatment group to depend on sub-structure.

My point back then (as far as I can say now) was merely that quite often, people think they need mixed models but don't.
Simon Anders is offline   Reply With Quote
Reply

Tags
deseq, edger, mixed anova

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 12:06 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO