Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DESeq2 - factor base level changes the DE genes

    Hello,

    I have an RNAseq experiment from two strains of mice, that have been exposed to two different environments (three biological replicates of each combination). So my experimental setup is

    > colData
    sample genetics environment
    black2 B6 black
    black3 B6 black
    black5 B6 black
    black1 B6 yellow
    black4 B6 yellow
    black6 B6 yellow
    agouti1 129 yellow
    agouti4 129 yellow
    agouti6 129 yellow
    agouti2 129 black
    agouti3 129 black
    agouti5 129 black

    I want to test which genes change their expression when the environment changes, controlling for the genetic background. I am using DESeq2 for this, with the formula ~ genetics + environment + genetics:environment.

    However, the choice of the base level for the factors in my experimental design changes the genes that are significant. If I leave the default reference levels (129 for genetics and black for env) I get more DE genes that if I set as reference B6 for genetics and yellow for environemnt.
    I thought the reference level is used to decide how to calculate the fold change and is necessary to correctly interpret the coefficients, but shouldn't the results be the same?

    Any thoughts on what am I missing?

    For cases where you have a control and mutant, it makes sense to use the control as the base level and compare everything to that, but in this case, I don't have any valid reason to chose one strain over the other. My results are different depending on this, so what is the correct form of doing this sort of analysis?

    Thank you very much in advance.


    My code is below.

    ## 1 ##
    > levels(colData$genetics)
    [1] "129" "B6"
    > levels(colData$environment)
    [1] "black" "yellow"

    > dds <- DESeqDataSetFromMatrix(countData = data[,8:19], colData = colData, design = ~ genetics + environment + genetics:environment)
    > dds <- DESeq(dds)

    > resG <- results(dds, alpha=0.05, name="genetics_B6_vs_129")
    > resE <- results(dds, alpha=0.05, name="environment_yellow_vs_black")
    > resGE <- results(dds, alpha=0.05, name="geneticsB6.environmentyellow")

    > dim(subset(resG, resG$padj < 0.05))
    [1] 3675 6
    > dim(subset(resE, resE$padj < 0.05))
    [1] 20 6
    > dim(subset(resGE, resGE$padj < 0.05))
    [1] 24 6

    ## 2 ##
    > colData2 <- colData
    > colData2$genetics <- relevel(colData2$genetics, ref="B6")
    > colData2$environment <- relevel(colData2$environment, ref="yellow")
    > levels(colData2$genetics)
    [1] "B6" "129"
    > levels(colData2$environment)
    [1] "yellow" "black"

    > dds2 <- DESeqDataSetFromMatrix(countData = data[,8:19], colData = colData2, design = ~ genetics + environment + genetics:environment)
    > dds2 <- DESeq(dds2)

    > res2G <- results(dds2, alpha=0.05, name="genetics_129_vs_B6")
    > res2E <- results(dds2, alpha=0.05, name="environment_black_vs_yellow")
    > res2GE <- results(dds2, alpha=0.05, name="genetics129.environmentblack")

    > dim(subset(res2G, resG$padj < 0.05))
    [1] 3127 6
    > dim(subset(res2E, resE$padj < 0.05))
    [1] 2 6
    > dim(subset(res2GE, resGE$padj < 0.05))
    [1] 0 6

    ##

    > g1 <- subset(resG, resG$padj < 0.05)
    > g2 <- subset(res2G, res2G$padj < 0.05)

    > length(intersect(rownames(g1), rownames(g2)))
    [1] 2640



    -- output of sessionInfo():

    R version 3.0.2 (2013-09-25)
    Platform: x86_64-apple-darwin10.8.0 (64-bit)

    locale:
    [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

    attached base packages:
    [1] parallel stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] DESeq2_1.2.10 RcppArmadillo_0.4.320.0 Rcpp_0.11.2
    [4] GenomicRanges_1.14.4 XVector_0.2.0 IRanges_1.20.7
    [7] BiocGenerics_0.8.0

    loaded via a namespace (and not attached):
    [1] annotate_1.40.1 AnnotationDbi_1.24.0 Biobase_2.22.0 DBI_0.2-7
    [5] genefilter_1.44.0 grid_3.0.2 lattice_0.20-23 locfit_1.5-9.1
    [9] RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2
    [13] survival_2.37-7 tools_3.0.2 XML_3.95-0.2 xtable_1.7-3

  • #2
    At a guess, I'd say it has to be your interaction term. When you change the parameters, you change the interaction term, and the fact that the results are different is telling you there is some difference in the interaction of 129&black versus B6&yellow.

    DESeq2 includes some mechanisms of looking at those specific interaction affects if I remember the vingette correctly? So you might want to read through that and try to pull out the interaction affect in both your contrasts. I think it is potentially telling you something about your actual data and biological responses, so you need to determine if that interaction term is significant or not. You should also try the other, alternate combinations of interaction so you get a complete overview of which interactions are or are not significant. Significant interaction affects will really confound trying to determine the significance of individual factors if you wish to stick with a single combined data set and analysis. You may have to separate the data so you can analyze a single factor at a time without any interaction.

    Or again if the interaction is significant, you could look at some alternate normalizations to subtract one factor before significance testing. For example a TMM or KDMM normalization with the 129 mean genotype responses subtracted as baseline.
    Last edited by mbblack; 08-13-2014, 12:06 PM.
    Michael Black, Ph.D.
    ScitoVation LLC. RTP, N.C.

    Comment


    • #3
      hi xibarra,

      Note that version 1.2 is from October 2013. It's admittedly a hassle to keep R and Bioc updated, but version 1.2 was still only the second release of DESeq2. Some users reported to us last year that for certain designs, changing the base level could change the LFCs and p-values, usually only slightly, due to an asymmetry of the prior on LFCs. We then developed the methods such that changes of base level would have no effect on the results, and these were released with version 1.4 in March 2014. An additional change was to not shrink the main effect terms in designs with interaction terms.

      So you can either update R and Bioc, or for the old version (1.2) you can set betaPrior=FALSE, which should produce symmetric results tables.

      Comment


      • #4
        Thank you for your replies.
        I suspected I needed to update to the latest version, which I'll do. Still nice to know I'm not misunderstanding something or doing the wrong analysis.
        Best,
        Ximena

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Essential Discoveries and Tools in Epitranscriptomics
          by seqadmin


          The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist on Modified Bases...
          Yesterday, 07:01 AM
        • seqadmin
          Current Approaches to Protein Sequencing
          by seqadmin


          Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
          04-04-2024, 04:25 PM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 04-11-2024, 12:08 PM
        0 responses
        55 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-10-2024, 10:19 PM
        0 responses
        52 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-10-2024, 09:21 AM
        0 responses
        45 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-04-2024, 09:00 AM
        0 responses
        55 views
        0 likes
        Last Post seqadmin  
        Working...
        X