Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • edgeR GLM model - factorial design with batch effects

    Hello,

    I am analyzing some RNA-Seq data with edgeR, and I have a question about how to set up the GLM model.matrix. My experiment is a 2X2 factorial design where both WT and KO animals are subjected to either of two treatments. The 4 groups are WT-Oil, WT-E2, KO-Oil, and KO-E2. I am interested in determining which genes are DE in the WT-E2 group (our phenotype after treating with E2 is rescued in similarly treated KO animals).

    Here is the code that I used

    > data.frame(Sample=colnames(a),genotype2,trt2)
    Sample genotype2 trt2
    1 X1_WT_Oil_140710 WT OIL
    2 X5_WT_Oil_140710 WT OIL
    3 X1_WT_Oil_141109 WT OIL
    4 X2_WT_Oil_141109 WT OIL
    5 X2_WT_E2_140710 WT E2
    6 X6_WT_E2_140710 WT E2
    7 X3_WT_E2_141109 WT E2
    8 X4_WT_E2_141109 WT E2
    9 X3_KO_Oil_140710 KO OIL
    10 X8_KO_Oil_140710 KO OIL
    11 X4_KO_E2_140710 KO E2
    12 X5_KO_E2_141109 KO E2
    13 X6_KO_E2_141109 KO E2
    14 X7_KO_E2_141109 KO E2

    > design2 <- model.matrix(~0+genotype2:trt2)
    > design2
    genotype2KO:trt2E2 genotype2WT:trt2E2 genotype2KO:trt2OIL genotype2WT:trt2OIL
    1 0 0 0 1
    2 0 0 0 1
    3 0 0 0 1
    4 0 0 0 1
    5 0 1 0 0
    6 0 1 0 0
    7 0 1 0 0
    8 0 1 0 0
    9 0 0 1 0
    10 0 0 1 0
    11 1 0 0 0
    12 1 0 0 0
    13 1 0 0 0
    14 1 0 0 0
    attr(,"assign")
    [1] 1 1 1 1
    attr(,"contrasts")
    attr(,"contrasts")$genotype2
    [1] "contr.treatment"

    attr(,"contrasts")$trt2
    [1] "contr.treatment"

    > b <- estimateGLMCommonDisp(b,design2,verbose=TRUE)
    Disp = 0.20378 , BCV = 0.4514
    > fit <- glmFit(b,design2)
    > lrt <- glmLRT(fit,contrast=c(-1,3,-1,-1))

    This worked just fine, but now I would like to reanalyze this data taking batch effects into account. The last part of the sample IDs (140710 and 141109) indicate the two separate dates in which the samples were collected/sequenced. The new data frame would look like this:

    > data.frame(Sample=colnames(a),genotype2,trt2,chip2)
    Sample genotype2 trt2 chip2
    1 X1_WT_Oil_140710 WT OIL 1
    2 X5_WT_Oil_140710 WT OIL 1
    3 X1_WT_Oil_141109 WT OIL 2
    4 X2_WT_Oil_141109 WT OIL 2
    5 X2_WT_E2_140710 WT E2 1
    6 X6_WT_E2_140710 WT E2 1
    7 X3_WT_E2_141109 WT E2 2
    8 X4_WT_E2_141109 WT E2 2
    9 X3_KO_Oil_140710 KO OIL 1
    10 X8_KO_Oil_140710 KO OIL 1
    11 X4_KO_E2_140710 KO E2 1
    12 X5_KO_E2_141109 KO E2 2
    13 X6_KO_E2_141109 KO E2 2
    14 X7_KO_E2_141109 KO E2 2

    The manual doesn't describe how to set up the model.matrix when there are two factors plus batch effects, and I'm struggling setting this up.

    Could somebody guide me as to how to properly define the model.matrix as well as properly define the correct contrasts in the glmLRT function? Thanks!
    Last edited by rhcr56; 01-13-2015, 04:38 PM. Reason: highlight important changes

  • #2
    I never got the fixation with treating a simple factorial design as if it's not. You can skip the contrasts in your first design and simply use:
    Code:
    design2 <- model.matrix(~genotype2*trt2)
    You then don't need any contrasts:
    Code:
    fit <- glmFit(b,design2)
    lrt <- glmLRT(fit) #You'll get the last coefficient, which will be the interaction, by default.
    BTW, just make sure that KO and OIL are the base levels of their respective factors.

    This also lends itself nicely to adding a batch (not that it's significantly different using the "collapse the groups and make contrasts" example, where you will still contrast the same groups):
    Code:
    design = model.matrix(~batch+genotype2*trt2)

    Comment


    • #3
      Thanks for the reply. This was my first inclination, however I'm confused about the last coefficient. The code you provided was used below

      > design3 <- model.matrix(~genotype2*trt2)
      > fit <- glmFit(b,design3)
      > lrt2 <- glmLRT(fit)
      > topTags(lrt2)
      Coefficient: genotype2WT:trt2OIL
      logFC logCPM LR PValue FDR
      1600002K03Rik -8.515051 2.068219 37.67803 8.343907e-10 8.445590e-06
      Mettl21c 9.151695 2.339162 36.03935 1.933725e-09 8.445590e-06
      Il10ra 9.719101 2.009320 35.98865 1.984707e-09 8.445590e-06
      Mpo 5.802613 3.754349 34.41918 4.443238e-09 1.265384e-05
      Cd177 7.962947 2.954040 34.20659 4.956072e-09 1.265384e-05
      Hspa1a -4.576308 4.852461 33.63713 6.641285e-09 1.413044e-05
      Tpte 7.864546 2.116211 32.57592 1.146264e-08 2.090457e-05
      Bace2 -10.589050 3.180579 31.81367 1.696943e-08 2.707897e-05
      Hspa1b -4.314080 4.794562 30.50166 3.335810e-08 4.731661e-05
      Fuom -8.057760 1.793066 30.06869 4.170094e-08 5.323542e-05

      If I am interpreting this correctly, this will identify genes that are DE in the WT-Oil group compared to every other group. I am interested in those DE genes in the WT-E2 group. Do I need to use a different coefficient? When I used the contrasts, I got the appropriate test.

      > topTags(lrt,n=343)
      Coefficient: -1*genotype2KO:trt2E2 3*genotype2WT:trt2E2 -1*genotype2KO:trt2OIL -1*genotype2WT:trt2OIL

      Thanks

      Comment


      • #4
        I'll reiterate my previous "BTW":
        BTW, just make sure that KO and OIL are the base levels of their respective factors.

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Strategies for Sequencing Challenging Samples
          by seqadmin


          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
          03-22-2024, 06:39 AM
        • seqadmin
          Techniques and Challenges in Conservation Genomics
          by seqadmin



          The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

          Avian Conservation
          Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
          03-08-2024, 10:41 AM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, Yesterday, 06:37 PM
        0 responses
        12 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, Yesterday, 06:07 PM
        0 responses
        10 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-22-2024, 10:03 AM
        0 responses
        51 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-21-2024, 07:32 AM
        0 responses
        68 views
        0 likes
        Last Post seqadmin  
        Working...
        X