
  • Can edgeR/DESeq have more than one covariate?

    Besides diagnosis, can I include age, gender, etc. as covariates in the GLM? I could not find this information in their papers or manuals.

    Thanks a lot.
    Last edited by arrchi; 01-13-2012, 01:31 PM.

  • #2
    Sure, this was added to both packages mid 2010. Are you reading the current versions of the manuals?


    • #3
      Hi,

      I read Simon's answer from a few months ago, but could not find details about covariates in the DESeq manual. The covariates I am interested in accounting for are quantitative and are known to affect gene expression (e.g. RIN can have a huge influence on mRNA levels). Could you clarify whether or not DESeq allows for this type of covariate? I previously ran into this post, which seemed to imply that quantitative covariates cannot be easily incorporated.

      Thank you,
      Alexandra


      • #4
        edgeR provides complete support for any number of covariates and factors, provided of course you have enough libraries available to estimate the parameters, in particular more libraries than coefficients. See



        or the edgeR User's Guide.

        There are two case studies in the edgeR User's Guide which involve two experimental factors.

        Gordon
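
As a rough illustration of the "more libraries than coefficients" condition, the sketch below counts the coefficients implied by a set of covariates under the usual treatment coding: an intercept, one column per quantitative covariate, and k - 1 columns for a factor with k levels. It is plain Python rather than edgeR code, and the function names and the example design are hypothetical.

```python
def count_coefficients(covariates):
    """Count design-matrix columns for an additive model with an intercept.

    `covariates` maps a covariate name to either the string "numeric"
    or an integer giving the number of levels of a factor.
    """
    n_coef = 1  # intercept
    for name, kind in covariates.items():
        if kind == "numeric":
            n_coef += 1          # one slope per quantitative covariate
        else:
            n_coef += kind - 1   # k levels -> k - 1 dummy columns
    return n_coef


def residual_df(n_libraries, covariates):
    """Libraries left over for estimating dispersion after fitting the model."""
    return n_libraries - count_coefficients(covariates)


# Hypothetical design: diagnosis (2 levels), gender (2 levels), age and RIN.
design = {"diagnosis": 2, "gender": 2, "age": "numeric", "RIN": "numeric"}
print(count_coefficients(design))   # 5
print(residual_df(32, design))      # 27
```

A residual count of zero or less means the model cannot be fitted with any information left for estimating dispersion.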


        • #5
          In principle, using quantitative covariates should be possible with both packages, even though I have not seen this actually being done, because it is rarely useful in practice: as we are talking about (generalized) linear models, there should be some reason to assume that your covariate influences log expression in a linear manner. For your example, RIN, I do not think that this would be a good assumption.
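
For reference, the linearity assumption can be written out. In a log-link negative binomial GLM of the kind both packages fit, a quantitative covariate enters the linear predictor directly. The symbols are notation chosen here, not taken from either manual: \mu_{gj} is the expected count for gene g in library j, s_j is a library size factor, d_j is a 0/1 indicator for diagnosis, and r_j is the RIN.

```latex
\log \mu_{gj} = \log s_j + \beta_{g0} + \beta_{g1}\, d_j + \beta_{g2}\, r_j
```

Under this model, each one-unit increase in RIN multiplies the expected count by e^{\beta_{g2}}, whatever the starting RIN. That constant fold change per unit of RIN is the assumption in question.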


          • #6
            Hi Simon,

            I do not fully understand your statement regarding RIN. Why did you say that RIN most likely does not influence expression in a linear fashion? Do you have any literature evidence that would suggest this to be the case?

            The RNA-Seq samples we are looking at have been previously analyzed in a larger Parkinson disease/control expression study that used microarrays for assessing gene expression levels. In the microarray study, we included RIN in the linear model because of its high impact on gene expression. In fact, in many cases RIN was a stronger predictor of gene expression than the case/control status itself. From my point of view, it makes sense that samples with low(er) RIN will have more RNA degradation, and the apparent level of expression will be influenced by RIN. To a lesser extent, post-mortem interval and age are also predictors of gene expression.

            Additionally, even after multiple RNA extractions, some of the samples obtained from diseased tissue that we included in our recent RNA-Seq study tended to have lower RINs than the samples obtained from control tissue. Therefore, for our analyses, we need to compare a set of diseased samples and a set of control samples that differ in average RIN, a covariate that we know to be predictive of gene expression. In this case, it seems mandatory to include RIN in the analyses. What would your opinion be?
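
The concern about groups that differ in average RIN can be illustrated with a toy example. The sketch below uses invented, noise-free data in which log expression depends on RIN only, and diseased samples have lower RIN. Ordinary least squares on the log scale stands in for the negative binomial GLM here, and all names and numbers are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(design, y):
    """Least-squares coefficients via the normal equations X'X b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Invented data: four controls (0) with high RIN, four diseased (1) with low RIN.
disease = [0, 0, 0, 0, 1, 1, 1, 1]
rin = [8.0, 7.5, 8.5, 8.0, 6.0, 6.5, 5.5, 6.0]
# Log expression depends on RIN only; there is no true disease effect.
log_expr = [5.0 + 0.5 * r for r in rin]

naive = ols([[1.0, d] for d in disease], log_expr)
adjusted = ols([[1.0, d, r] for d, r in zip(disease, rin)], log_expr)

print(naive[1])     # close to -1.0: a spurious "disease" effect
print(adjusted[1])  # close to 0.0 once RIN is in the model
print(adjusted[2])  # close to 0.5: the RIN slope
```

In this toy case, leaving RIN out attributes the whole RIN-driven difference to diagnosis, while including it recovers a disease effect of about zero.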

            Gordon, thank you for the article reference.

            Alexandra


            • #7
              Interesting. If you say that you saw a linear dependence on RIN in earlier studies, it certainly makes sense to try to add it in a quantitative manner. It did not occur to me that sample integrity can be so hard to control that one needs to account for its variation, but then, I never had to work with post-mortem samples.

              I might not have thought things through in my previous post, because, in principle, the GLM approaches of edgeR and DESeq should not care whether covariates are categorical or quantitative. Just proceed according to the vignettes, use a numerical vector instead of only factors in the model frame, and ask again if this throws an error.
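
To make the "numerical vector instead of only factors" point concrete, here is a minimal plain-Python sketch of how a model frame expands into a design matrix: a factor becomes 0/1 dummy columns, while a numeric covariate is passed through as a single column. The helper `design_matrix` is hypothetical and only mimics the column naming that R's `model.matrix` would produce for `~ sex + RIN`; it is not part of either package.

```python
def design_matrix(frame, reference_levels):
    """Expand a model frame (dict of equal-length columns) into design-matrix rows.

    Numeric columns are used as-is; string columns are treated as factors and
    expanded into 0/1 dummy columns, skipping the reference level.
    """
    n = len(next(iter(frame.values())))
    names = ["(Intercept)"]
    columns = [[1.0] * n]
    for name, values in frame.items():
        if all(isinstance(v, (int, float)) for v in values):
            names.append(name)
            columns.append([float(v) for v in values])
        else:
            for level in sorted(set(values)):
                if level == reference_levels[name]:
                    continue
                names.append(name + level)
                columns.append([1.0 if v == level else 0.0 for v in values])
    rows = [list(r) for r in zip(*columns)]
    return names, rows


# Hypothetical model frame with one factor and one quantitative covariate.
frame = {
    "sex": ["F", "M", "M", "F"],
    "RIN": [2.0, 2.5, 3.0, 3.5],
}
names, rows = design_matrix(frame, {"sex": "F"})
print(names)    # ['(Intercept)', 'sexM', 'RIN']
print(rows[1])  # [1.0, 1.0, 2.5]
```

The quantitative covariate costs one coefficient regardless of how many distinct values it takes.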


              • #8
                Hello,

                I am a complete newbie in statistics, so please forgive me if this question sounds illogical. I tried to do the same thing the OP wanted, but I got stuck at the estimateDispersions function. I have a relatively large sample size (32) and I want to include 2 covariates: one binary, e.g. "sex", and one quantitative, e.g. "RIN", that is partially replicated (e.g. 2, 2, 2.5, 3, 3.5, 3.5, ...).
                When I run the estimateDispersions stage, the internal function modelMatrixToConditionFactor(modelFrame) creates a condition vector with 22 levels (each having 1 to 3 replicates) and then estimates dispersions.

                My question is: can I just leave the quantitative variable out of the estimateDispersions stage and only use it in fitNbinomGLMs(), or is that mathematically incorrect and hence wrong?
                What I am trying to do is show that the quantitative variable has a significant effect on expression.

                Any advice would be greatly appreciated.
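
The effect described in this post, where each distinct combination of covariate values becomes its own condition, can be reproduced with a few lines of plain Python. This is a sketch with hypothetical data, not DESeq's internal code; it only shows why a near-continuous covariate produces many conditions with few or no replicates.

```python
from collections import Counter


def condition_counts(sex, rin):
    """Count libraries per unique (sex, RIN) combination."""
    return Counter(zip(sex, rin))


# Hypothetical covariates for six libraries.
sex = ["F", "F", "M", "M", "F", "M"]
rin = [2.0, 2.0, 2.5, 3.0, 3.5, 3.5]

counts = condition_counts(sex, rin)
print(len(counts))                                 # 5 distinct conditions from 6 libraries
print(sum(1 for n in counts.values() if n == 1))   # 4 conditions without replicates
```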


                • #9
                  Mathematically incorrect.

                  Try the edgeR package. It has fast, reliable glm features and has no trouble with this sort of scenario.

