Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Golsheed
    Member
    • Oct 2014
    • 49

    Variance decomposition in DESeq2

    Hi,
    I'm trying to figure out the proportion of variance in the data that is explained by each predictor. Is there a way to do this in DESeq2?
    Thanks a lot,
    Golsheed
  • Michael Love
    Senior Member
    • Jul 2013
    • 333

    #2
    Generalized linear models use analysis of deviance instead of analysis of variance.

    http://en.wikipedia.org/wiki/Deviance_(statistics)

    the -2 * the log likelihood for each gene is given in mcols(dds)$deviance

    You can't get the deviance for each predictor, but you can compare the difference in deviance between successively smaller models, where the smallest model is ~ 1. So if you remove one predictor at a time and re-run you will see this.

    This is the same that anova(fit) does when fit is a glm object.

    Comment

    • Golsheed
      Member
      • Oct 2014
      • 49

      #3
      Thanks a bunch, that's a huge help.

      Golsheed

      Originally posted by Michael Love View Post
      Generalized linear models use analysis of deviance instead of analysis of variance.

      http://en.wikipedia.org/wiki/Deviance_(statistics)

      the -2 * the log likelihood for each gene is given in mcols(dds)$deviance

      You can't get the deviance for each predictor, but you can compare the difference in deviance between successively smaller models, where the smallest model is ~ 1. So if you remove one predictor at a time and re-run you will see this.

      This is the same that anova(fit) does when fit is a glm object.

      Comment

      • Golsheed
        Member
        • Oct 2014
        • 49

        #4
        Is there any way in general to estimate the variance explained by each covariate? For example, can we, regardless of the method used (i.e., glm.nb in this case), compute the coefficient of determination between successfully smaller models?
        R^2=\hat{Cor} (\hat {y}, y)^2
        where \hat{Cor}, \hat{y}, and y are the estimated correlation, fittet values, and the observed values.

        Thanks,
        Golsheed





        Originally posted by Michael Love View Post
        Generalized linear models use analysis of deviance instead of analysis of variance.

        http://en.wikipedia.org/wiki/Deviance_(statistics)

        the -2 * the log likelihood for each gene is given in mcols(dds)$deviance

        You can't get the deviance for each predictor, but you can compare the difference in deviance between successively smaller models, where the smallest model is ~ 1. So if you remove one predictor at a time and re-run you will see this.

        This is the same that anova(fit) does when fit is a glm object.

        Comment

        • Golsheed
          Member
          • Oct 2014
          • 49

          #5
          Thanks a lot for your help, Michael. Do you mind clarifying a bit on the expression that was used to calculate the deviance for each gene? I'm trying to calculate a "generalized" coefficient of determination for this glm, and I find that (if I'm not mistaken) people use different definitions of deviance for that purpose, and come up with different measures. Just to double-check, is this the way deviance is calculated for each gene in DESeq2:
          -2*(log P(y|M)-log P(y|S)),
          where M is the model we're testing for and S is the saturated (full) model where every observation has its own coefficient, and y is the observation?

          Thanks again,
          Golsheed





          Originally posted by Michael Love View Post
          Generalized linear models use analysis of deviance instead of analysis of variance.

          http://en.wikipedia.org/wiki/Deviance_(statistics)

          the -2 * the log likelihood for each gene is given in mcols(dds)$deviance

          You can't get the deviance for each predictor, but you can compare the difference in deviance between successively smaller models, where the smallest model is ~ 1. So if you remove one predictor at a time and re-run you will see this.

          This is the same that anova(fit) does when fit is a glm object.

          Comment

          • Michael Love
            Senior Member
            • Jul 2013
            • 333

            #6
            The deviance we return is just -2 logP(y|M). You can get the saturated log likelihood using something like (warning: untested code):

            Code:
            rowSums(dnbinom(counts(dds), mu=counts(dds), size=1/dispersions(dds), log=TRUE))

            Comment

            • Golsheed
              Member
              • Oct 2014
              • 49

              #7
              Thanks a lot for the clarification on the deviance.
              Could you please explain the code? I guess you're trying to sum up the log likelihoods of the observations based on the negative binomial distribution, the parameters of which are estimated by DESeq2, is that right? I have a hard time understanding what each term of the code represents. Specifically, in the alternative parametrization of the negative binomial distribution, how are counts related to mu?

              Thanks again!


              Originally posted by Michael Love View Post
              The deviance we return is just -2 logP(y|M). You can get the saturated log likelihood using something like (warning: untested code):

              Code:
              rowSums(dnbinom(counts(dds), mu=counts(dds), size=1/dispersions(dds), log=TRUE))

              Comment

              • Golsheed
                Member
                • Oct 2014
                • 49

                #8
                Sorry for bugging you this much. If I'm using nbinomlRT to compare design(dds) the full model with the reduced model, is the deviance returning -2 logP(y|M) for the full model or the model that's considered to be significant according to the LRT?

                Thanks a bunch!


                Originally posted by Michael Love View Post
                The deviance we return is just -2 logP(y|M). You can get the saturated log likelihood using something like (warning: untested code):

                Code:
                rowSums(dnbinom(counts(dds), mu=counts(dds), size=1/dispersions(dds), log=TRUE))

                Comment

                • Michael Love
                  Senior Member
                  • Jul 2013
                  • 333

                  #9
                  There is not a simple R^2-like statistic for generalized linear models but a set of proposed ones, and the differences in interpreting these is not trivial.

                  Here are some links which describe these pseudo-R^2 for binomial GLM (logistic regression), but the point is relevant also for NB and Poisson GLM.


                  I found a formula for pseudo $R^2$ in the book Extending the Linear Model with R, Julian J. Faraway (p. 59). $$1-\frac{\text{ResidualDeviance}}{\text{NullDeviance}}$$. Is this a common formula for

                  I have SPSS output for a logistic regression model. The output reports two measures for the model fit, Cox & Snell and Nagelkerke. So as a rule of thumb, which of these $R^²$ measures would you


                  You can try these out yourself or ask someone locally to help implement the one you are interested in. Here are the corresponding terms:

                  Code:
                  y = counts(dds)
                  y.bar = rowMeans(counts(dds))
                  y.hat = assays(dds)[["mu"]]

                  Comment

                  • Golsheed
                    Member
                    • Oct 2014
                    • 49

                    #10
                    Thanks, I appreciate your detailed response.
                    As for the deviance reported by DESeq2, you mentioned that it is just the log likelihood of observations given the estimated parameters. However, when I run DESeq2 on my dataset, I'm getting weird results as follows:

                    > head(mcols(dds)$deviance)
                    [1] 247.3268 1265.1006 1012.0499 792.3526 1722.3851 1112.4335

                    These values don't seem to be representing log P(y|M), so I'm guessing either I'm doing something wrong or the deviance definition differs a bit from what I have in mind (specifically, is it the ratio of log likelihoods of the tested model against the null model?)

                    Thanks a lot for your help and patience.



                    Originally posted by Michael Love View Post
                    There is not a simple R^2-like statistic for generalized linear models but a set of proposed ones, and the differences in interpreting these is not trivial.

                    Here are some links which describe these pseudo-R^2 for binomial GLM (logistic regression), but the point is relevant also for NB and Poisson GLM.


                    I found a formula for pseudo $R^2$ in the book Extending the Linear Model with R, Julian J. Faraway (p. 59). $$1-\frac{\text{ResidualDeviance}}{\text{NullDeviance}}$$. Is this a common formula for

                    I have SPSS output for a logistic regression model. The output reports two measures for the model fit, Cox & Snell and Nagelkerke. So as a rule of thumb, which of these $R^²$ measures would you


                    You can try these out yourself or ask someone locally to help implement the one you are interested in. Here are the corresponding terms:

                    Code:
                    y = counts(dds)
                    y.bar = rowMeans(counts(dds))
                    y.hat = assays(dds)[["mu"]]

                    Comment

                    • Michael Love
                      Senior Member
                      • Jul 2013
                      • 333

                      #11
                      read over my response again (#6 above)

                      Comment

                      • Golsheed
                        Member
                        • Oct 2014
                        • 49

                        #12
                        Oh, right! sorry, I missed the minus sign!
                        If I'm using the nbinomLRT() function with a full and a reduced model, then does the deviance, for each gene, output -2 log P(y|M) where M is the full model or does it report the log likelihood for the significant model?
                        Thanks so much again.

                        Comment

                        • Michael Love
                          Senior Member
                          • Jul 2013
                          • 333

                          #13
                          The column, mcols(dds)$deviance, is -2 log likelihood full model.

                          The LRT statistic is given in results(dds)$stat, and this is -2 log (likelihood reduced model / likelihood full model)

                          See here for more details: http://en.wikipedia.org/wiki/Likelihood-ratio_test

                          Comment

                          • Golsheed
                            Member
                            • Oct 2014
                            • 49

                            #14
                            Thanks a lot for the clarification.

                            Comment

                            Latest Articles

                            Collapse

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by SEQadmin2, 06-09-2026, 11:58 AM
                            0 responses
                            17 views
                            0 reactions
                            Last Post SEQadmin2  
                            Started by SEQadmin2, 06-05-2026, 10:09 AM
                            0 responses
                            27 views
                            0 reactions
                            Last Post SEQadmin2  
                            Started by SEQadmin2, 06-04-2026, 08:59 AM
                            0 responses
                            38 views
                            0 reactions
                            Last Post SEQadmin2  
                            Started by SEQadmin2, 06-02-2026, 12:03 PM
                            0 responses
                            61 views
                            0 reactions
                            Last Post SEQadmin2  
                            Working...