Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • coef vs contrast in edgeR glmLRT

    I thought I understood what the contrast= option in glmLRT means as opposed to coef= (which just looks for DE wrt the index=1 sample, presumed to be the control), but I'm getting crazy results from using contrast= to get what I think should be the equivalent to coef=.

    I've got a 4 times X 5 conditions design matrix, so 20 total "treatments." I want to get the DE for condition vs control for each of the four times. The first non-control condition is called AS2 and the control is called Col (for AS2-enhanced transgenic samples and "Columbia", a standard Arabidopsis wild type, if you're curious). So, for example, I'd like to get DE{Col(0)|AS2(0)}, DE{Col(30)|AS2(30)}, etc., where 0 and 30 minutes are two of the four times. But I'll focus on t=0 here, which seems to be where the problem lies. The "control" as edgeR is concerned is condition=Col and time=0 with index 1.

    The design matrix is, in order:
    (Intercept), ConditionAS2, ConditionKan, ConditionRev, ConditionSTM,
    ConditionCol:Timet030, ConditionAS2:Timet030, ConditionKan:Timet030, ConditionRev:Timet030, ConditionSTM:Timet030,
    ConditionCol:Timet060, etc.

    Here's the two calls that I think should be equivalent:

    (a) glmLRT(f, coef=2)

    This returns a very reasonable dataset for DE{Col(0)|AS2(0)} which compares favorably to other DE analyses (edgeR and DESeq2) using only the appropriate samples. I don't doubt that it's fine. However,

    (b) glmLRT(f, contrast=c(-1,1,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0))

    returns complete garbage with logFC values in the stratosphere. To make matters more confusing, it does return seemingly correct data for non-t=0 contrasts, for example,

    (c) glmLRT(f, contrast=c(0,0,0,0,0, -1,1,0,0,0, 0,0,0,0,0, 0,0,0,0,0))

    seems to return a very reasonable dataset for DE{Col(30)|AS2(30)}.

    What am I missing about the meaning of the first index in the array passed to contrast= in glmLRT??? Is there some reason why contrast= can only be used for contrasts between non-control samples? I don't get it.
    Sam Hokin
    Computational Scientist, Carnegie and NCGR

  • #2
    It's because your model matrix is using the treatment-contrasts parametrisation. That's why your second constrast doesn't make sense. What you're testing there is if the second coefficient, which is the difference of the first treatment to control is different to the first coefficient, which is the control. Subtracting a difference from an absolute measurement isn't correct.

    Code:
    glmLRT(f, contrast=c(0,1,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0))
    will give you the same answer as your first test.

    Comment


    • #3
      Thanks, Dario! That tells me that my other "contrasts" aren't what I want, either. I don't want the difference between [the difference between Col30 and Col0] and [the difference between AS230 and Col0]. That is actually not meaningful, either, but I guess that's what "contrast" means - variation of DE, so a sort of meta-DE.

      I've just rerun the analysis using only the samples of interest in the "simple" formulation of edgeR, so four times for the four times. Simple enough. It's really like four separate experiments done at four separate times.
      Sam Hokin
      Computational Scientist, Carnegie and NCGR

      Comment


      • #4
        Wait a sec. I'm looking at the edgeR User's Guide, section 3.3.1 "defining each treatment as a group", and I'm pretty much sure it says to do what I originally did. It says:

        > lrt <- glmLRT(fit, contrast=c(-1,0,1,0,0,0))

        would find the Drug.2vs0 contrast.

        That seems exactly what I was doing, and getting nonsense answers for logFC. I don't see how what I was doing differs from the example in 3.3.1.

        Does anyone else find the edgeR documentation to be confusing and inconsistent with application???
        Sam Hokin
        Computational Scientist, Carnegie and NCGR

        Comment


        • #5
          You never explained what design matrix you used for glmFit. It can't be the same one for both types of contrasts. The fact that you have an intercept immediately suggests it is wrong. The right formula would be ~ 0 + factor(paste(condition, time, sep = '.')) The 0 removes the intercept.

          Comment


          • #6
            Ahhhh, thank you. That is really not very clear from the edgeR manual, or maybe I'm just too new to this approach to understand that. Seems subtle to me, but I am now enlightened!
            Sam Hokin
            Computational Scientist, Carnegie and NCGR

            Comment


            • #7
              Hei guys,

              But the section 3.3.1 "defining each treatment as a group" doesn't refers to unpaired samples only? I'm not sure about that
              In your case samhokin (so in mine) seems like you have paired samples, I mean, samples collected from the same patient at different time points. Am I wrong?!?

              Comment


              • #8
                Mine are actually independent bio reps of a lab-grown plant, with different duration of application of a hormone ('duration' is a better word than 'time' in my case). So it's quite different from human studies - my sample groups are the same genome, with reps for added stats.
                Sam Hokin
                Computational Scientist, Carnegie and NCGR

                Comment


                • #9
                  I have data frame called subtest as following:
                  RNA.LATER.MEN2B_S5 RNA.LATER.ROSA_S4 RNA.MEN2B.1_S2 RNA.MEN2B.2_S3 RNA.ROSA_S1
                  1 13707 13866 12193 12671 10178
                  2 0 0 0 0 1
                  3 7165 5002 1256 1341 2087
                  6 8537 16679 9042 9620 19168
                  10 19438 25234 15563 16419 16582
                  16 3 3 11 3 5
                  I would like to analysis the MEN samples to ROSA samples I did script like that .
                  > group=c("LMS5","LRS4","MS2","MS3","RS1")
                  > y=DGEList(counts=data.matrix(subtest),group=group ,genes=genes)
                  > indices=which(rowSums(cpm(y)>1)<3)
                  > y=y[-indices,]
                  > y=calcNormFactors(y)
                  > design=model.matrix(~0+group,data=subtest)
                  > cont.matrix= makeContrasts(groupLMS5-groupMS2-groupMS3-(groupLRS4-groupRS1),levels=design)
                  > fit <- glmFit(y, design,dispersion=y$trended.dispersion)
                  > lrt.all <- glmLRT(fit, contrast=cont.matrix)
                  > topTags(lrt.all)
                  summary(dexp<-decideTestsDGE(lrt.all,p=0.05,adjust="BH"))
                  -1 1
                  0 36
                  1 14248
                  I got one gene as down regulated and 14248 as upregulated so I think I did something wrong.please Iam a new user in that package and I want to get a appropriate number of up and down regulated genes to downstream analysis

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Strategies for Sequencing Challenging Samples
                    by seqadmin


                    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                    03-22-2024, 06:39 AM
                  • seqadmin
                    Techniques and Challenges in Conservation Genomics
                    by seqadmin



                    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                    Avian Conservation
                    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                    03-08-2024, 10:41 AM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, Yesterday, 06:37 PM
                  0 responses
                  11 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, Yesterday, 06:07 PM
                  0 responses
                  10 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-22-2024, 10:03 AM
                  0 responses
                  51 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-21-2024, 07:32 AM
                  0 responses
                  67 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X