Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • edgeR model.matrix issue for multiple factors

    Hi, there,

    I've recently started to analyze RNAseq data myself. With the help of guidebooks from a friend and the packages and forums like this, I was able to get DESeq and DESeq2 done.
    I then tried to do edgeR on the same set of data but I got some issues on the multifactor comparisons.

    Background:

    Samples were collected from wild-type (WT) or knockout (KO) mice and the mice were fed on control diet (CD) or sugar diet (SD). So I got 4 groups of samples: WT_CD, KO_CD,WT_SD,KO_SD. There are 5 animals per group and hence I have total 20 samples.

    I was also able to do comparison of any 2 groups with edgeR (e.g. WT_CD vs KO_CD or WT_CD vs WT_SD) but I got stuck when I tried to more complicated comparisons with a different design.matrix.

    So basically

    A:
    I'd like to compare the DEGs ("W") from (WT_CD vs WT_SD) with the DEGs ("X") from (KO_CD vs KO_SD) and figure out the difference between W and X.

    B:
    Similarly, I'd like to compare the DEGs ("Y") from (WT_CD vs KO_CD) with the DEGs ("Z") from (WT_SD vs KO_SD) and figure out the difference between Y and Z.

    Q1: Is it possible to do this kind of comparison in edgeR? Or I should just directly compare these DEGs?

    The input file is one merged reads-count file that I used for DESeq.

    So here are my codes:

    > library(edgeR)
    Loading required package: limma
    > rawdata <- read.delim("/Users/NGS/all20_concat.sam.count.txt", check.names=FALSE, stringsAsFactors=FALSE)
    > y <- DGEList(counts=rawdata[,2:21], genes=rawdata[,1])
    > y <- calcNormFactors(y)

    > Genotype <- factor(c("WT","WT","WT","WT","WT","KO","KO","KO","KO","KO","WT","WT","WT","WT","WT","KO","KO","KO","KO","KO"))

    > Diet <- factor(c("CD","CD","CD","CD","CD","CD","CD","CD","CD","CD","SD","SD","SD","SD","SD","SD","SD","SD","SD","SD"))

    > data.frame(Sample=colnames(y),Genotype,Diet)
    Sample Genotype Diet
    1 01.sam.count WT CD
    2 02.sam.count WT CD
    3 03.sam.count WT CD
    4 04.sam.count WT CD
    5 05.sam.count WT CD
    6 06.sam.count KO CD
    7 07.sam.count KO CD
    8 08.sam.count KO CD
    9 09.sam.count KO CD
    10 10.sam.count KO CD
    11 11.sam.count WT SD
    12 12.sam.count WT SD
    13 13.sam.count WT SD
    14 14.sam.count WT SD
    15 15.sam.count WT SD
    16 16.sam.count KO SD
    17 17.sam.count KO SD
    18 18.sam.count KO SD
    19 19.sam.count KO SD
    20 20.sam.count KO SD

    # below is the potential issue

    > design <- model.matrix(~0+Genotype+Diet+Genotypeiet)
    > colnames(design)
    [1] "GenotypeKO" "GenotypeWT" "DietSD"
    "GenotypeWTietSD"

    # Q2: I'm not sure that was the right matrix for later comparisons like: (GenotypeKOietSD-GenotypeKOietCD)-(GenotypeWTietSD-GenotypeWTietCD) ?

    # Q3: Is this (GenotypeKOietSD-GenotypeKOietCD)-(GenotypeWTietSD-GenotypeWTietCD) comparison the right one to answer my question A mentioned above?

    Setup

    > sessionInfo()
    R version 3.1.3 (2015-03-09)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X 10.10.2 (Yosemite)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] edgeR_3.8.6 limma_3.22.7

    loaded via a namespace (and not attached):
    [1] tools_3.1.3

    Any input will be much appreciated.
    Last edited by neokao; 03-17-2015, 08:07 AM.

  • #2
    I'm not quite sure what you are trying to do. Are you simply trying to test the interaction: (KO_CD - KO_SD) - (WT_CD - WT_SD)?

    Have you looked at Section 3.3.1 of the edgeR User's Guide? The instructions there make it very easy to make any comparison from an experiment such as yours.

    Comment


    • #3
      Thanks for the reply. I guess that I did not express it clearly.
      Alright. So after I did simple DEG analysis, I got the DEGs from the (WT_SD vs WT_CD) comparison and also DEGs from the (KO_SD vs KO_CD) comparison. Say in the 1st DEG list, I have gene A, B, C, D, E, F, G, H, I, J up-regulated and gene N, O, P, Q, R, S, T, U, V down-regulated by SD in WT mice.
      Then in the 2nd DEG list, I have gene E, F, G, H, I, J, K, L, M up-regulated and gene N, O, P, Q, W, X,Y, Z down-regulated by SD in KO mice.
      If I’d like to know which genes are specifically up-regulated by SD only in KO mice but not up-regulated by SD in WT mice, am I supposed to do
      (KO_SD-KO_CD) - (WT_SD-WT_CD) for edgeR? Or i should just directly compare the DEG lists manually to pull out “K, L, M"?


      Thanks for suggestions.
      Last edited by neokao; 03-25-2015, 03:49 PM.

      Comment


      • #4
        Originally posted by neokao View Post
        I got the DEGs from the (WT_SD vs WT_CD) comparison and also DEGs from the (KO_SD vs KO_CD) comparison. If I’d like to know which genes are specifically up-regulated by SD only in KO mice but not up-regulated by SD in WT mice, am I supposed to do (KO_SD-KO_CD) - (WT_SD-WT_CD) for edgeR? Or i should just directly compare the DEG lists manually?
        Computing the contrast is not the same thing so, yes, you just could just compare the DEG lists. The same answer would apply in any of the packages (edgeR, DESEq2 etc).

        If you have a large number of DE genes, and you want a more stringent list, then you could select genes that are DE in the KO mice and not in the WT mice and also DE for the interaction. This ensures that you only have genes for which the KO effect is substantially greater than the WT effect.
        Last edited by Gordon Smyth; 03-25-2015, 05:21 PM.

        Comment


        • #5
          Wow...Thanks for your quick reply. That was clear.

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Current Approaches to Protein Sequencing
            by seqadmin


            Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
            04-04-2024, 04:25 PM
          • seqadmin
            Strategies for Sequencing Challenging Samples
            by seqadmin


            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
            03-22-2024, 06:39 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 04-11-2024, 12:08 PM
          0 responses
          24 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 10:19 PM
          0 responses
          25 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 09:21 AM
          0 responses
          21 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-04-2024, 09:00 AM
          0 responses
          52 views
          0 likes
          Last Post seqadmin  
          Working...
          X