Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • edgeR GLM model - factorial design with batch effects

    Hello,

    I am analyzing some RNA-Seq data with edgeR, and I have a question about how to set up the GLM model.matrix. My experiment is a 2X2 factorial design where both WT and KO animals are subjected to either of two treatments. The 4 groups are WT-Oil, WT-E2, KO-Oil, and KO-E2. I am interested in determining which genes are DE in the WT-E2 group (our phenotype after treating with E2 is rescued in similarly treated KO animals).

    Here is the code that I used

    > data.frame(Sample=colnames(a),genotype2,trt2)
    Sample genotype2 trt2
    1 X1_WT_Oil_140710 WT OIL
    2 X5_WT_Oil_140710 WT OIL
    3 X1_WT_Oil_141109 WT OIL
    4 X2_WT_Oil_141109 WT OIL
    5 X2_WT_E2_140710 WT E2
    6 X6_WT_E2_140710 WT E2
    7 X3_WT_E2_141109 WT E2
    8 X4_WT_E2_141109 WT E2
    9 X3_KO_Oil_140710 KO OIL
    10 X8_KO_Oil_140710 KO OIL
    11 X4_KO_E2_140710 KO E2
    12 X5_KO_E2_141109 KO E2
    13 X6_KO_E2_141109 KO E2
    14 X7_KO_E2_141109 KO E2

    > design2 <- model.matrix(~0+genotype2:trt2)
    > design2
    genotype2KO:trt2E2 genotype2WT:trt2E2 genotype2KO:trt2OIL genotype2WT:trt2OIL
    1 0 0 0 1
    2 0 0 0 1
    3 0 0 0 1
    4 0 0 0 1
    5 0 1 0 0
    6 0 1 0 0
    7 0 1 0 0
    8 0 1 0 0
    9 0 0 1 0
    10 0 0 1 0
    11 1 0 0 0
    12 1 0 0 0
    13 1 0 0 0
    14 1 0 0 0
    attr(,"assign")
    [1] 1 1 1 1
    attr(,"contrasts")
    attr(,"contrasts")$genotype2
    [1] "contr.treatment"

    attr(,"contrasts")$trt2
    [1] "contr.treatment"

    > b <- estimateGLMCommonDisp(b,design2,verbose=TRUE)
    Disp = 0.20378 , BCV = 0.4514
    > fit <- glmFit(b,design2)
    > lrt <- glmLRT(fit,contrast=c(-1,3,-1,-1))

    This worked just fine, but now I would like to reanalyze this data taking batch effects into account. The last part of the sample IDs (140710 and 141109) indicate the two separate dates in which the samples were collected/sequenced. The new data frame would look like this:

    > data.frame(Sample=colnames(a),genotype2,trt2,chip2)
    Sample genotype2 trt2 chip2
    1 X1_WT_Oil_140710 WT OIL 1
    2 X5_WT_Oil_140710 WT OIL 1
    3 X1_WT_Oil_141109 WT OIL 2
    4 X2_WT_Oil_141109 WT OIL 2
    5 X2_WT_E2_140710 WT E2 1
    6 X6_WT_E2_140710 WT E2 1
    7 X3_WT_E2_141109 WT E2 2
    8 X4_WT_E2_141109 WT E2 2
    9 X3_KO_Oil_140710 KO OIL 1
    10 X8_KO_Oil_140710 KO OIL 1
    11 X4_KO_E2_140710 KO E2 1
    12 X5_KO_E2_141109 KO E2 2
    13 X6_KO_E2_141109 KO E2 2
    14 X7_KO_E2_141109 KO E2 2

    The manual doesn't describe how to set up the model.matrix when there are two factors plus batch effects, and I'm struggling setting this up.

    Could somebody guide me as to how to properly define the model.matrix as well as properly define the correct contrasts in the glmLRT function? Thanks!
    Last edited by rhcr56; 01-13-2015, 04:38 PM. Reason: highlight important changes

  • #2
    I never got the fixation with treating a simple factorial design as if it's not. You can skip the contrasts in your first design and simply use:
    Code:
    design2 <- model.matrix(~genotype2*trt2)
    You then don't need any contrasts:
    Code:
    fit <- glmFit(b,design2)
    lrt <- glmLRT(fit) #You'll get the last coefficient, which will be the interaction, by default.
    BTW, just make sure that KO and OIL are the base levels of their respective factors.

    This also lends itself nicely to adding a batch (not that it's significantly different using the "collapse the groups and make contrasts" example, where you will still contrast the same groups):
    Code:
    design = model.matrix(~batch+genotype2*trt2)

    Comment


    • #3
      Thanks for the reply. This was my first inclination, however I'm confused about the last coefficient. The code you provided was used below

      > design3 <- model.matrix(~genotype2*trt2)
      > fit <- glmFit(b,design3)
      > lrt2 <- glmLRT(fit)
      > topTags(lrt2)
      Coefficient: genotype2WT:trt2OIL
      logFC logCPM LR PValue FDR
      1600002K03Rik -8.515051 2.068219 37.67803 8.343907e-10 8.445590e-06
      Mettl21c 9.151695 2.339162 36.03935 1.933725e-09 8.445590e-06
      Il10ra 9.719101 2.009320 35.98865 1.984707e-09 8.445590e-06
      Mpo 5.802613 3.754349 34.41918 4.443238e-09 1.265384e-05
      Cd177 7.962947 2.954040 34.20659 4.956072e-09 1.265384e-05
      Hspa1a -4.576308 4.852461 33.63713 6.641285e-09 1.413044e-05
      Tpte 7.864546 2.116211 32.57592 1.146264e-08 2.090457e-05
      Bace2 -10.589050 3.180579 31.81367 1.696943e-08 2.707897e-05
      Hspa1b -4.314080 4.794562 30.50166 3.335810e-08 4.731661e-05
      Fuom -8.057760 1.793066 30.06869 4.170094e-08 5.323542e-05

      If I am interpreting this correctly, this will identify genes that are DE in the WT-Oil group compared to every other group. I am interested in those DE genes in the WT-E2 group. Do I need to use a different coefficient? When I used the contrasts, I got the appropriate test.

      > topTags(lrt,n=343)
      Coefficient: -1*genotype2KO:trt2E2 3*genotype2WT:trt2E2 -1*genotype2KO:trt2OIL -1*genotype2WT:trt2OIL

      Thanks

      Comment


      • #4
        I'll reiterate my previous "BTW":
        BTW, just make sure that KO and OIL are the base levels of their respective factors.

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Essential Discoveries and Tools in Epitranscriptomics
          by seqadmin




          The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
          04-22-2024, 07:01 AM
        • seqadmin
          Current Approaches to Protein Sequencing
          by seqadmin


          Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
          04-04-2024, 04:25 PM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 04-11-2024, 12:08 PM
        0 responses
        59 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-10-2024, 10:19 PM
        0 responses
        57 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-10-2024, 09:21 AM
        0 responses
        51 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-04-2024, 09:00 AM
        0 responses
        55 views
        0 likes
        Last Post seqadmin  
        Working...
        X