  • Variance decomposition in DESeq2

    Hi,
    I'm trying to figure out the proportion of variance in the data that is explained by each predictor. Is there a way to do this in DESeq2?
    Thanks a lot,
    Golsheed

  • #2
    Generalized linear models use analysis of deviance instead of analysis of variance.

    http://en.wikipedia.org/wiki/Deviance_(statistics)

    The quantity -2 * the log likelihood for each gene is given in mcols(dds)$deviance.

    You can't get a deviance for each predictor directly, but you can compare the change in deviance between successively smaller models, where the smallest model is ~ 1. So if you remove one predictor at a time and re-run, you will see how much each term contributes.

    This is the same thing that anova(fit) does when fit is a glm object.
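
    For illustration, here is a minimal, untested sketch of that drop-one-term comparison. The design ~ batch + condition and the objects cts (count matrix) and coldata (sample table) are placeholders; substitute your own.

    Code:
    library("DESeq2")
    dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                                  design = ~ batch + condition)

    # full model
    dds <- DESeq(dds)
    dev.full <- mcols(dds)$deviance          # -2 * log-likelihood, full model

    # drop 'condition'
    dds.batch <- dds
    design(dds.batch) <- ~ batch
    dds.batch <- DESeq(dds.batch)
    dev.batch <- mcols(dds.batch)$deviance

    # intercept-only model
    dds.null <- dds
    design(dds.null) <- ~ 1
    dds.null <- DESeq(dds.null)
    dev.null <- mcols(dds.null)$deviance

    # per-gene drop in deviance attributable to each term
    head(dev.null - dev.batch)               # explained by 'batch'
    head(dev.batch - dev.full)               # explained by 'condition', given batch

    Note that the dispersions are re-estimated for each fit, so these differences give an approximate, not exact, decomposition.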



    • #3
      Thanks a bunch, that's a huge help.

      Golsheed



      • #4
        Is there any way in general to estimate the variance explained by each covariate? For example, can we, regardless of the method used (i.e., glm.nb in this case), compute a coefficient of determination between successively smaller models?
        $$R^2 = \widehat{\mathrm{Cor}}(\hat{y}, y)^2$$
        where $\widehat{\mathrm{Cor}}$, $\hat{y}$, and $y$ are the estimated correlation, the fitted values, and the observed values.

        Thanks,
        Golsheed


        • #5
          Thanks a lot for your help, Michael. Do you mind clarifying the expression that is used to calculate the deviance for each gene? I'm trying to calculate a "generalized" coefficient of determination for this GLM, and I find that (if I'm not mistaken) people use different definitions of deviance for that purpose and come up with different measures. Just to double-check, is this how the deviance is calculated for each gene in DESeq2:
          $$-2\,(\log P(y \mid M) - \log P(y \mid S)),$$
          where $M$ is the model being tested, $S$ is the saturated model in which every observation has its own parameter, and $y$ is the observed counts?

          Thanks again,
          Golsheed


          • #6
            The deviance we return is just -2 log P(y|M). You can get the saturated log likelihood using something like (warning: untested code):

            Code:
            # log-likelihood of the saturated model: each fitted mean is set to the observed
            # count; size = 1/dispersion in R's negative binomial parametrization
            rowSums(dnbinom(counts(dds), mu=counts(dds), size=1/dispersions(dds), log=TRUE))
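
            Combining this with the deviance column above, a deviance in the textbook sense (relative to the saturated model) could then be assembled as follows; this is again an untested sketch using the same objects:

            Code:
            ll.sat <- rowSums(dnbinom(counts(dds), mu=counts(dds),
                                      size=1/dispersions(dds), log=TRUE))
            ll.fit <- -0.5 * mcols(dds)$deviance   # the deviance column is -2 * log-likelihood
            dev.sat <- -2 * (ll.fit - ll.sat)      # -2 * (log P(y|M) - log P(y|S))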



            • #7
              Thanks a lot for the clarification on the deviance.
              Could you please explain the code? I guess you're summing the log likelihoods of the observations under the negative binomial distribution, with parameters estimated by DESeq2; is that right? I have a hard time understanding what each term of the code represents. Specifically, in the alternative parametrization of the negative binomial distribution, how are the counts related to mu?

              Thanks again!




              • #8
                Sorry for bugging you this much. If I'm using nbinomLRT to compare the full model in design(dds) with the reduced model, does the deviance return -2 log P(y|M) for the full model, or for whichever model is considered significant according to the LRT?

                Thanks a bunch!




                • #9
                  There is not a simple R^2-like statistic for generalized linear models but a set of proposed ones, and the differences in interpreting these are not trivial.

                  Here are some links which describe these pseudo-R^2 measures for the binomial GLM (logistic regression), but the point is also relevant for NB and Poisson GLMs:

                  One linked discussion asks about the pseudo-$R^2$ formula $$1-\frac{\text{Residual Deviance}}{\text{Null Deviance}}$$ from Extending the Linear Model with R, Julian J. Faraway (p. 59), and whether it is in common use.

                  The other concerns SPSS output for a logistic regression model, which reports two measures of model fit, Cox & Snell and Nagelkerke $R^2$, and asks which of these to prefer as a rule of thumb.

                  You can try these out yourself or ask someone locally to help implement the one you are interested in. Here are the corresponding terms:

                  Code:
                  y = counts(dds)                 # observed counts (genes x samples)
                  y.bar = rowMeans(counts(dds))   # per-gene mean of the observed counts
                  y.hat = assays(dds)[["mu"]]     # fitted means from the NB GLM
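
                  As one concrete example, here is a minimal, untested sketch of the squared-correlation pseudo-R^2 proposed above (per gene, between fitted and observed counts), using these terms:

                  Code:
                  y <- counts(dds)
                  y.hat <- assays(dds)[["mu"]]

                  # per-gene squared Pearson correlation between fitted and observed counts
                  # (genes with constant counts across samples return NA)
                  r2 <- sapply(seq_len(nrow(y)), function(i) cor(y.hat[i, ], y[i, ])^2)
                  summary(r2)

                  A deviance-based version such as 1 - ResidualDeviance/NullDeviance would additionally need an intercept-only fit (design ~ 1), as discussed earlier in the thread.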



                  • #10
                    Thanks, I appreciate your detailed response.
                    As for the deviance reported by DESeq2, you mentioned that it is just the log likelihood of the observations given the estimated parameters. However, when I run DESeq2 on my dataset, I'm getting odd results:

                    Code:
                    > head(mcols(dds)$deviance)
                    [1] 247.3268 1265.1006 1012.0499 792.3526 1722.3851 1112.4335

                    These values don't seem to represent log P(y|M), so I'm guessing either I'm doing something wrong or the definition of deviance differs a bit from what I have in mind (specifically, is it the ratio of the log likelihoods of the tested model against the null model?).

                    Thanks a lot for your help and patience.





                    • #11
                      Read over my response again (#6 above).



                      • #12
                        Oh, right! Sorry, I missed the minus sign!
                        If I'm using the nbinomLRT() function with a full and a reduced model, is the per-gene deviance -2 log P(y|M) with M being the full model, or is it reported for whichever model comes out significant in the LRT?
                        Thanks so much again.



                        • #13
                          The column mcols(dds)$deviance is -2 times the log likelihood of the full model.

                          The LRT statistic is given in results(dds)$stat, and this is -2 log(likelihood of the reduced model / likelihood of the full model).

                          See here for more details: http://en.wikipedia.org/wiki/Likelihood-ratio_test
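
                          For illustration, a minimal, untested sketch of how these quantities relate, assuming a hypothetical full design ~ batch + condition tested against the reduced design ~ batch:

                          Code:
                          dds <- DESeq(dds, test = "LRT", reduced = ~ batch)
                          res <- results(dds)

                          head(mcols(dds)$deviance)   # -2 * log-likelihood of the full model, per gene
                          head(res$stat)              # LRT statistic: (-2 logLik reduced) - (-2 logLik full)
                          head(res$pvalue)            # chi-squared p-value on the difference in model degrees of freedom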



                          • #14
                            Thanks a lot for the clarification.
