Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Can I use DESeq and edgeR for mixed ANOVA

    Hi guys,

    I am working with RNA-seq data generated by Illumina GAII. The experiment design is a randomized complete block design (RCBD) with 4 different tissues, 2 treatments (control and treated), and 3 blocks (reps). My interest is to use the mixed ANOVA in ProcMixed (SAS) to determine differentially expressed genes. Initially, I thought I can normalize read counts in RPKM and use that to do the Mixed ANOVA in SAS or JMP Genomics. Reading through the literature I realized that RPKM is not the best normalization tool for RNA-seq. I am leaning toward DESeq or edgeR for normalization.

    My questions are:
    1) How do I extract normalized counts from DESeq and EdgeR?
    2) Has anybody tried to analyze differential gene expression in normalized data using ProcMixed in SAS? Is there anything I should be cautious in using mixed effect model for RNA-seq? I don't think I can use DESeq and edgeR for a mixed model.

    I appreciate your input.

    Thanks.

    T. Abebe

  • #2
    Both DESeq and edgeR offer functions to fit generalized linear models (GLMs) to your genes and then perform analysis of deviance (ANODEV; the equivalent to ANOVA for GLMs). See the vignettes for usage examples.

    I don't understand why you need to use a mixed model. Generalized linear mixed models (GLMMs) of the Poisson family have been used to analyse RNA-Seq data (e.g. by Blekhman, Marioni et al.) but I think there are good reasons to prefer a negative binomial GLM without random effects, as this allows to incorporate stabilized or moderated dispersion estimates.

    In short: Please don't reinvent the wheel. It's all there already.

    Comment


    • #3
      Originally posted by Simon Anders View Post
      In short: Please don't reinvent the wheel. It's all there already.
      Thank God, or more appropriately thank all the guys like Simon, because I barely have a clue what you guys are talking about.
      --------------
      Ethan

      Comment


      • #4
        Can I use DESeq and edgeR for mixed ANOVA Reply to Thread ?

        Dear Simon,

        Thank you for pointing out to me that DESeq is capability of handling mixed effects designs through GLMs and ANODEV. I wasn’t aware of that when I posted the message earlier.

        After reading many posts on SEQanswers and Bioconductor, I put the following commands for my mixed model. I am getting an error message after I put the command cds <- estimateDispersions(cds, method="pooled"). I will appreciate if you can check the command lines for any correction

        My mixed effects design is a 4x2 factorial with a confounder (block) and contains:

        4 tissues: A, B, C, D
        2 treatments: Unstressed, Stressed
        3 blocks: Block1, Block2, Block3

        Tissue and treatment are fixed effects and block is a random effect. You suggested to use block as a fixed effect instead of a random effect but I am not sure if that is a good idea since we collected samples on different dates and we wanted to take that into account.

        Here is how I put the R command:

        >CountTable <- read.delim("count.txt", header=TRUE, row.names=1, stringsAsFactors=TRUE)
        >head(CountTable)
        >Design <- data.frame(
        +tissue = c("A", "A","A","A","A","A","B", "B","B","B","B","B","C", "C","C","C","C","C","D", "D","D","D","D","D"),
        +treatment = c("Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed","Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed","Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed","Untreated","Unstressed","Unstressed","Stressed","Stressed","Stressed"),
        + block = c("Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3","Date1", "Date2","Date3"))

        >countTable <- CountTable
        > cds <- newCountDataSet(countTable, Design)

        > cds <- estimateSizeFactors(cds)
        > sizeFactors( cds )

        > cds <- estimateDispersions(cds, method="pooled")

        #Error in rowSums(sapply(tapply((1:ncol(counts))[replicated_sample], factor(conditions[replicated_sample]), : 'x' must be an array of at least two dimensions

        # Because of the previous error I couldn’t go further to execute commands to fit GLMs but this is what I have in mind

        > fit0 <- fitNbinomGLMs (cds, count ~ block)
        > fit1 <- fitNbinomGLMs ( cds, count ~ date + treatment )

        > pvals <- nbinomGLMTest( fit1, fit0 )
        >padj <- p.adjust( pvals, method="BH" )
        > rownames(counts(cds))[ padj < .1 ]

        Thank you for your help.

        T. Abebe

        Comment


        • #5
          ShrinkSeq http://www.few.vu.nl/~mavdwiel/ShrinkBayes.html , seems to be very flexible and allows for random effects.

          Comment


          • #6
            Originally posted by abebe View Post
            Tissue and treatment are fixed effects and block is a random effect. You suggested to use block as a fixed effect instead of a random effect but I am not sure if that is a good idea since we collected samples on different dates and we wanted to take that into account.
            I still don't see what's wrong with using a fixed effect for 'block', and for a factor with only three levels, a GLM and a GLMM will give nearly the same result, anyway. Furthermore, a lot what you find in older text books about when to use fixed and when random effects is no longer current practice. G K Robinson's 1991 article might be what has changed many people's way of thinking about it. What it says is that you should use random effects if you want shrinkage and fixed effects if you don't, no matter what kind of effect you are modelling.

            We have just released DESeq2, which offers empirical-Bayesian shrinkage on both dispersions and coefficients, so, if you want to look at it that way, it gives you mixed models. The shrinkage is using a common prior across all genes, so you actually get proper shrinkage despite having only three levels. (ShrinkSeq, sugegsted above, uses a rather different method, but for similar purposes.)

            > cds <- estimateDispersions(cds, method="pooled")

            [I]#Error in rowSums(sapply(tapply((1:ncol(counts))[replicated_sample], factor(conditions[replicated_sample]), : 'x' must be an array of at least two dimensions
            Please use method="pooled-CR". Not sure why you didn't get a more informative error message.

            > fit0 <- fitNbinomGLMs (cds, count ~ block)
            > fit1 <- fitNbinomGLMs ( cds, count ~ date + treatment )
            Make up you mind whether you call it "block" or "date". Also, what happened to "tissue"?

            Comment


            • #7
              Mixed ANOVA

              Hi guys,
              I'm going to revive this thread as I haven't been able to find a straight forward answer to whether it is possible to run a Mixed ANOVA-like analyses in edgeR/DESeq2.

              My RNA-seq experiment is simple: 2 phenotype groups, 6 patients, 4 time points per patient:



              I'm trying to determine whether any changes in gene expression is a result of interaction between phenotype and time. Time being the within-subject factor and phenotype the between-subject factor.

              So far based on the edgeR manual and a few discussions I read on the BioC mailing list my formula looks like this:

              model.matrix(~phenotype + phenotype : patient + phenotype : time)

              Is it possible to test if there's an interaction between phenotype and time?

              Perhaps there's a more appropriate method to analyzing this experimental design that I'm overlooking.

              Any help with this is greatly appreciated. Thanks in advance!

              Comment


              • #8
                I would not necessary consider this design super-simple. At least the fact that you have multiple measurements per individual might be a little tricky because of a high amount of dependency (timepoint 2 might be dependent on timepoint 1, multiple measures within one patient can be more correlated).However I have to admint that I do not have much experience with time series data.

                Maybe, because you have time series data, other methods, packages might be more suited.
                This publication http://www.pnas.org/content/early/2011/03/22/1017542108
                investigates as similar experiment, as you do.

                Without knowing the data, I would assume that you are not interested in the individual's response. Thus I would treat the Patient as a random factor. As far as I know this is not possible in EdgeR but in Limma (can be used with count data using the voom() function). You would then end up with something like:
                Expression ~ Phenotype + Time + Phenotype:Time + random(Patient).
                Please note that it might be dangerous to introduce an interaction term in the model without introducing the main effect, because you would falsely attribute a possible main effect to the interaction.
                But then you are still left with the time course which (I would assume ) is not 100% treated correct in the procedure I proposed.

                This is my approach but (as I wrote above) I am not experienced with time course data and there might be better / more correct approaches

                Best

                Moritz

                Comment


                • #9
                  Neither DESeq (1 or 2) nor edgeR handle mixed-effects models. These are generally dealt with by simply using "Patient" as a factor. As mentioned by Moritz, one could instead use limma, which has a duplicateCorrelations() function. It'd be interesting to see a side-by-side comparison to see how much one might gain from that.

                  BTW, you're not interested in phenotypeatient, which is good since you can't actually measure it (you can try, but you'll just get an error). Phenotype is a linear combination of patients, so ~patient + phenotype*time would make more sense (whether to include "phenotype" and/or "time" on their own in the model will depend on the biology).

                  Comment


                  • #10
                    There are rather different views in the community about when one should use mixed models. Hence, Moritz, what exactly would you see as the advanage of treating 'patient' as random effect?

                    Comment


                    • #11
                      Thank you for your responses.

                      So far, what I've gathered is that the appropriate method to analyze my data is to run a PROC GLIMMIX (nb distribution) repeated measures analysis. So if I understand correctly, neither edgeR/DESeq2 nor JMP Genomics can handle that.

                      Ultimately, the comparisons of interest will be which genes respond differently between Control Time 1 vs Affected Time 1, Control Time 2 vs Affected Time 2. etc.

                      Are there possible analogous methods in packages such as edge/DESeq2 that could mimic the SAS analysis I listed above? I have a basic working knowledge of the R language as well as the stat concepts involved so please pardon my ignorance if my questions violate any fundamentals.

                      Thanks!

                      Comment


                      • #12
                        lme4 can do that (glmer.nb function). Keep in mind that you'll have decreased power with PROC GLIMMIX and the analogous method in R unless you basically reimplement the DESeq2 (or edgeR or whatever) methods yourself.

                        Having said that, you should really reply Simon's question about what you hope to gain by treating patient as a random effect.

                        Comment


                        • #13
                          Originally posted by dpryan View Post
                          Having said that, you should really reply Simon's question about what you hope to gain by treating patient as a random effect.
                          I have also similar experiment. I think patient should be considered as a random effect because one would not really care if patient 1 performs better than patient 3 or other patients. Instead, you would care about the patients variance; i.e., how much variation does a paitent provide to the overall model.

                          Comment


                          • #14
                            To repeat: Contrary to what some stats textbooks say, is is not necessary to treat an effect as random only because one does not care about individual levels or because is seems to be drawn from a population.

                            I prefer very much the view promoted e.g. in G. K. Robinson's classic paper [1] that the point of using random effects is to get something like empirical-Bayes shrinkage into frequentist linear-model fitting. Note that DESeq2 now give you such shrinkage in an manner that is technically achieved in a different manner but has very similar effect. [2]

                            [1] G. K. Robinson: That BLUP is a Good Thing: The Estimation of Random Effects. Statistical Science 6 (1991), no. 1, 15--32. doi:10.1214/ss/1177011926. http://projecteuclid.org/euclid.ss/1177011926.

                            [2] Michael I Love, Wolfgang Huber, Simon Anders: Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. bioRxiv (2014), doi: 10.1101/002832

                            Comment


                            • #15
                              I now this answer is really late . I thought that even if it is possible to introduce a "batch factor" as fixed effect, one would loose a lot of power because more degrees of freedom are consumed compared to treating the batch effect as random. And the more levels your "batch factor" has, the more you benefit from treating it as random ?

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              7 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              7 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              49 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              66 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X