![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
comparing results by cuffdiff, edgeR, DESeq | PFS | Bioinformatics | 5 | 03-12-2014 03:01 AM |
Can edgeR/DESeq have more than one covariate? | arrchi | Bioinformatics | 8 | 10-28-2013 02:37 PM |
DESeq and edgeR up/down regulation | murphycj | Bioinformatics | 7 | 09-21-2011 07:09 AM |
BaySeq vs GLM in EdgeR and DESeq | Hilary April Smith | Bioinformatics | 0 | 08-03-2011 10:14 AM |
edgeR vs DESeq vs bayseq | Azazel | Bioinformatics | 1 | 10-07-2010 07:11 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Junior Member
Location: Cedar Falls, IA Join Date: Dec 2011
Posts: 5
|
![]()
Hi guys,
I am working with RNA-seq data generated by Illumina GAII. The experiment design is a randomized complete block design (RCBD) with 4 different tissues, 2 treatments (control and treated), and 3 blocks (reps). My interest is to use the mixed ANOVA in ProcMixed (SAS) to determine differentially expressed genes. Initially, I thought I can normalize read counts in RPKM and use that to do the Mixed ANOVA in SAS or JMP Genomics. Reading through the literature I realized that RPKM is not the best normalization tool for RNA-seq. I am leaning toward DESeq or edgeR for normalization. My questions are: 1) How do I extract normalized counts from DESeq and EdgeR? 2) Has anybody tried to analyze differential gene expression in normalized data using ProcMixed in SAS? Is there anything I should be cautious in using mixed effect model for RNA-seq? I don't think I can use DESeq and edgeR for a mixed model. I appreciate your input. Thanks. T. Abebe |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 991
|
![]()
Both DESeq and edgeR offer functions to fit generalized linear models (GLMs) to your genes and then perform analysis of deviance (ANODEV; the equivalent to ANOVA for GLMs). See the vignettes for usage examples.
I don't understand why you need to use a mixed model. Generalized linear mixed models (GLMMs) of the Poisson family have been used to analyse RNA-Seq data (e.g. by Blekhman, Marioni et al.) but I think there are good reasons to prefer a negative binomial GLM without random effects, as this allows to incorporate stabilized or moderated dispersion estimates. In short: Please don't reinvent the wheel. It's all there already. |
![]() |
![]() |
![]() |
#3 |
Senior Member
Location: Western Australia Join Date: Feb 2010
Posts: 310
|
![]()
Thank God, or more appropriately thank all the guys like Simon, because I barely have a clue what you guys are talking about.
__________________
-------------- Ethan |
![]() |
![]() |
![]() |
#4 |
Junior Member
Location: Cedar Falls, IA Join Date: Dec 2011
Posts: 5
|
![]()
Dear Simon,
Thank you for pointing out to me that DESeq is capability of handling mixed effects designs through GLMs and ANODEV. I wasn’t aware of that when I posted the message earlier. After reading many posts on SEQanswers and Bioconductor, I put the following commands for my mixed model. I am getting an error message after I put the command cds <- estimateDispersions(cds, method="pooled"). I will appreciate if you can check the command lines for any correction My mixed effects design is a 4x2 factorial with a confounder (block) and contains: 4 tissues: A, B, C, D 2 treatments: Unstressed, Stressed 3 blocks: Block1, Block2, Block3 Tissue and treatment are fixed effects and block is a random effect. You suggested to use block as a fixed effect instead of a random effect but I am not sure if that is a good idea since we collected samples on different dates and we wanted to take that into account. Here is how I put the R command: >CountTable <- read.delim("count.txt", header=TRUE, row.names=1, stringsAsFactors=TRUE) >head(CountTable) >Design <- data.frame( +tissue = c("A", "A","A","A","A","A","B", "B","B","B","B","B","C", "C","C","C","C","C","D", "D","D","D","D","D"), +treatment = c("Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed","Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed","Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed","Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed"), + block = c("Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3")) >countTable <- CountTable > cds <- newCountDataSet(countTable, Design) > cds <- estimateSizeFactors(cds) > sizeFactors( cds ) > cds <- estimateDispersions(cds, method="pooled") #Error in rowSums(sapply(tapply((1:ncol(counts))[replicated_sample], factor(conditions[replicated_sample]), : 'x' must be an array of at least two dimensions # Because of the previous error I couldn’t go further to execute commands to fit GLMs but this is what I have in mind > fit0 <- fitNbinomGLMs (cds, count ~ block) > fit1 <- fitNbinomGLMs ( cds, count ~ date + treatment ) > pvals <- nbinomGLMTest( fit1, fit0 ) >padj <- p.adjust( pvals, method="BH" ) > rownames(counts(cds))[ padj < .1 ] Thank you for your help. T. Abebe |
![]() |
![]() |
![]() |
#5 |
Member
Location: freiburg Join Date: Apr 2010
Posts: 25
|
![]()
ShrinkSeq http://www.few.vu.nl/~mavdwiel/ShrinkBayes.html , seems to be very flexible and allows for random effects.
|
![]() |
![]() |
![]() |
#6 | |||
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 991
|
![]() Quote:
We have just released DESeq2, which offers empirical-Bayesian shrinkage on both dispersions and coefficients, so, if you want to look at it that way, it gives you mixed models. The shrinkage is using a common prior across all genes, so you actually get proper shrinkage despite having only three levels. (ShrinkSeq, sugegsted above, uses a rather different method, but for similar purposes.) Quote:
Quote:
|
|||
![]() |
![]() |
![]() |
#7 |
Member
Location: Beltsville, MD, USA Join Date: Jul 2012
Posts: 23
|
![]()
Hi guys,
I'm going to revive this thread as I haven't been able to find a straight forward answer to whether it is possible to run a Mixed ANOVA-like analyses in edgeR/DESeq2. My RNA-seq experiment is simple: 2 phenotype groups, 6 patients, 4 time points per patient: ![]() I'm trying to determine whether any changes in gene expression is a result of interaction between phenotype and time. Time being the within-subject factor and phenotype the between-subject factor. So far based on the edgeR manual and a few discussions I read on the BioC mailing list my formula looks like this: model.matrix(~phenotype + phenotype : patient + phenotype : time) Is it possible to test if there's an interaction between phenotype and time? Perhaps there's a more appropriate method to analyzing this experimental design that I'm overlooking. Any help with this is greatly appreciated. Thanks in advance! |
![]() |
![]() |
![]() |
#8 |
Member
Location: freiburg Join Date: Apr 2010
Posts: 25
|
![]()
I would not necessary consider this design super-simple. At least the fact that you have multiple measurements per individual might be a little tricky because of a high amount of dependency (timepoint 2 might be dependent on timepoint 1, multiple measures within one patient can be more correlated).However I have to admint that I do not have much experience with time series data.
Maybe, because you have time series data, other methods, packages might be more suited. This publication http://www.pnas.org/content/early/2011/03/22/1017542108 investigates as similar experiment, as you do. Without knowing the data, I would assume that you are not interested in the individual's response. Thus I would treat the Patient as a random factor. As far as I know this is not possible in EdgeR but in Limma (can be used with count data using the voom() function). You would then end up with something like: Expression ~ Phenotype + Time + Phenotype:Time + random(Patient). Please note that it might be dangerous to introduce an interaction term in the model without introducing the main effect, because you would falsely attribute a possible main effect to the interaction. But then you are still left with the time course which (I would assume ) is not 100% treated correct in the procedure I proposed. This is my approach but (as I wrote above) I am not experienced with time course data and there might be better / more correct approaches Best Moritz |
![]() |
![]() |
![]() |
#9 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,478
|
![]()
Neither DESeq (1 or 2) nor edgeR handle mixed-effects models. These are generally dealt with by simply using "Patient" as a factor. As mentioned by Moritz, one could instead use limma, which has a duplicateCorrelations() function. It'd be interesting to see a side-by-side comparison to see how much one might gain from that.
BTW, you're not interested in phenotype ![]() |
![]() |
![]() |
![]() |
#10 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 991
|
![]()
There are rather different views in the community about when one should use mixed models. Hence, Moritz, what exactly would you see as the advanage of treating 'patient' as random effect?
|
![]() |
![]() |
![]() |
#11 |
Member
Location: Beltsville, MD, USA Join Date: Jul 2012
Posts: 23
|
![]()
Thank you for your responses.
So far, what I've gathered is that the appropriate method to analyze my data is to run a PROC GLIMMIX (nb distribution) repeated measures analysis. So if I understand correctly, neither edgeR/DESeq2 nor JMP Genomics can handle that. Ultimately, the comparisons of interest will be which genes respond differently between Control Time 1 vs Affected Time 1, Control Time 2 vs Affected Time 2. etc. Are there possible analogous methods in packages such as edge/DESeq2 that could mimic the SAS analysis I listed above? I have a basic working knowledge of the R language as well as the stat concepts involved so please pardon my ignorance if my questions violate any fundamentals. Thanks! |
![]() |
![]() |
![]() |
#12 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,478
|
![]()
lme4 can do that (glmer.nb function). Keep in mind that you'll have decreased power with PROC GLIMMIX and the analogous method in R unless you basically reimplement the DESeq2 (or edgeR or whatever) methods yourself.
Having said that, you should really reply Simon's question about what you hope to gain by treating patient as a random effect. |
![]() |
![]() |
![]() |
#13 |
Junior Member
Location: Norway Join Date: Jan 2010
Posts: 4
|
![]()
I have also similar experiment. I think patient should be considered as a random effect because one would not really care if patient 1 performs better than patient 3 or other patients. Instead, you would care about the patients variance; i.e., how much variation does a paitent provide to the overall model.
|
![]() |
![]() |
![]() |
#14 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 991
|
![]()
To repeat: Contrary to what some stats textbooks say, is is not necessary to treat an effect as random only because one does not care about individual levels or because is seems to be drawn from a population.
I prefer very much the view promoted e.g. in G. K. Robinson's classic paper [1] that the point of using random effects is to get something like empirical-Bayes shrinkage into frequentist linear-model fitting. Note that DESeq2 now give you such shrinkage in an manner that is technically achieved in a different manner but has very similar effect. [2] [1] G. K. Robinson: That BLUP is a Good Thing: The Estimation of Random Effects. Statistical Science 6 (1991), no. 1, 15--32. doi:10.1214/ss/1177011926. http://projecteuclid.org/euclid.ss/1177011926. [2] Michael I Love, Wolfgang Huber, Simon Anders: Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. bioRxiv (2014), doi: 10.1101/002832 |
![]() |
![]() |
![]() |
#15 |
Member
Location: freiburg Join Date: Apr 2010
Posts: 25
|
![]()
I now this answer is really late
![]() |
![]() |
![]() |
![]() |
#16 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 991
|
![]()
No, because due to the use of an empirical-Bayes estimation method for dispersion estimation, you do not consume DoFs as usual.
Otherwise, it would be completely impossible to perform comparisons between groups with only two to four samples per group. For details, see Gordon Smyth's original 2004 paper on limma, which introduced this whole idea of understanding empirical-Bayes shrinkage estimation as a way of "saving degrees of freedom" by sharing information across genes. |
![]() |
![]() |
![]() |
#17 |
Member
Location: freiburg Join Date: Apr 2010
Posts: 25
|
![]()
Hi Simon,
thank's for your response. That's very interesting I might need to study the matter more in detail to fully understand. Moritz |
![]() |
![]() |
![]() |
#18 | |
Junior Member
Location: Missouri, US Join Date: Jul 2017
Posts: 1
|
![]()
If we model random effects as fixed effects and put bayes prior on the fixed effects to get shrunken estimates, more concerns are:
(1) When the design is complex with nested designs (such as split-plot) and blocking effects, etc. how to correctly estimate variation merely due to sampling/replication; i.e. so-called "error variation"? The size of error variation is crucial for significance test. Is there any research to assess the nearly equivalence of "fixed effects with empirical bayes" and "random effects with empirical bayes"? Because you can also put prior on random effects also to borrow information across genes. I once analyzed a gene expression data with complex designs. I first use edgeR GLM and DESeq also. edgeR and DESeq results are very similar. Then I used GLMM (no shrinkage, but random effects), I got a lot more significant differentially expressed genes than edgeR and DESeq by controlling FDR at 0.05. So, the above two things "fixed with empirical bayes" and "random effects" are not quite equal from this experience I have. (2) Correlation among observations. Does the "fixed effects with empirical bayes" correctly model correlations among observations? Bayes prior does put correlations among genes, but within a gene, how does it model correlations among observations within a gene where split-plot or blocking happen? Thanks, Ariana G. Quote:
|
|
![]() |
![]() |
![]() |
#19 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 991
|
![]()
Not sure whetehr ther's mich point in reviving this old thread, hence just to clarify:
Of course, fixed models with empirical Bayes shrinkage cannot replace any form of mixed models. Especially they cannot account for nested designs, repeated-measure designs, or other designs which cause the strength of correlation between samples within a treatment group to depend on sub-structure. My point back then (as far as I can say now) was merely that quite often, people think they need mixed models but don't. |
![]() |
![]() |
![]() |
Tags |
deseq, edger, mixed anova |
Thread Tools | |
|
|