Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DESeq2 - factor base level changes the DE genes

    Hello,

    I have an RNAseq experiment from two strains of mice, that have been exposed to two different environments (three biological replicates of each combination). So my experimental setup is

    > colData
    sample genetics environment
    black2 B6 black
    black3 B6 black
    black5 B6 black
    black1 B6 yellow
    black4 B6 yellow
    black6 B6 yellow
    agouti1 129 yellow
    agouti4 129 yellow
    agouti6 129 yellow
    agouti2 129 black
    agouti3 129 black
    agouti5 129 black

    I want to test which genes change their expression when the environment changes, controlling for the genetic background. I am using DESeq2 for this, with the formula ~ genetics + environment + genetics:environment.

    However, the choice of the base level for the factors in my experimental design changes the genes that are significant. If I leave the default reference levels (129 for genetics and black for env) I get more DE genes that if I set as reference B6 for genetics and yellow for environemnt.
    I thought the reference level is used to decide how to calculate the fold change and is necessary to correctly interpret the coefficients, but shouldn't the results be the same?

    Any thoughts on what am I missing?

    For cases where you have a control and mutant, it makes sense to use the control as the base level and compare everything to that, but in this case, I don't have any valid reason to chose one strain over the other. My results are different depending on this, so what is the correct form of doing this sort of analysis?

    Thank you very much in advance.


    My code is below.

    ## 1 ##
    > levels(colData$genetics)
    [1] "129" "B6"
    > levels(colData$environment)
    [1] "black" "yellow"

    > dds <- DESeqDataSetFromMatrix(countData = data[,8:19], colData = colData, design = ~ genetics + environment + genetics:environment)
    > dds <- DESeq(dds)

    > resG <- results(dds, alpha=0.05, name="genetics_B6_vs_129")
    > resE <- results(dds, alpha=0.05, name="environment_yellow_vs_black")
    > resGE <- results(dds, alpha=0.05, name="geneticsB6.environmentyellow")

    > dim(subset(resG, resG$padj < 0.05))
    [1] 3675 6
    > dim(subset(resE, resE$padj < 0.05))
    [1] 20 6
    > dim(subset(resGE, resGE$padj < 0.05))
    [1] 24 6

    ## 2 ##
    > colData2 <- colData
    > colData2$genetics <- relevel(colData2$genetics, ref="B6")
    > colData2$environment <- relevel(colData2$environment, ref="yellow")
    > levels(colData2$genetics)
    [1] "B6" "129"
    > levels(colData2$environment)
    [1] "yellow" "black"

    > dds2 <- DESeqDataSetFromMatrix(countData = data[,8:19], colData = colData2, design = ~ genetics + environment + genetics:environment)
    > dds2 <- DESeq(dds2)

    > res2G <- results(dds2, alpha=0.05, name="genetics_129_vs_B6")
    > res2E <- results(dds2, alpha=0.05, name="environment_black_vs_yellow")
    > res2GE <- results(dds2, alpha=0.05, name="genetics129.environmentblack")

    > dim(subset(res2G, resG$padj < 0.05))
    [1] 3127 6
    > dim(subset(res2E, resE$padj < 0.05))
    [1] 2 6
    > dim(subset(res2GE, resGE$padj < 0.05))
    [1] 0 6

    ##

    > g1 <- subset(resG, resG$padj < 0.05)
    > g2 <- subset(res2G, res2G$padj < 0.05)

    > length(intersect(rownames(g1), rownames(g2)))
    [1] 2640



    -- output of sessionInfo():

    R version 3.0.2 (2013-09-25)
    Platform: x86_64-apple-darwin10.8.0 (64-bit)

    locale:
    [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

    attached base packages:
    [1] parallel stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] DESeq2_1.2.10 RcppArmadillo_0.4.320.0 Rcpp_0.11.2
    [4] GenomicRanges_1.14.4 XVector_0.2.0 IRanges_1.20.7
    [7] BiocGenerics_0.8.0

    loaded via a namespace (and not attached):
    [1] annotate_1.40.1 AnnotationDbi_1.24.0 Biobase_2.22.0 DBI_0.2-7
    [5] genefilter_1.44.0 grid_3.0.2 lattice_0.20-23 locfit_1.5-9.1
    [9] RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2
    [13] survival_2.37-7 tools_3.0.2 XML_3.95-0.2 xtable_1.7-3

  • #2
    At a guess, I'd say it has to be your interaction term. When you change the parameters, you change the interaction term, and the fact that the results are different is telling you there is some difference in the interaction of 129&black versus B6&yellow.

    DESeq2 includes some mechanisms of looking at those specific interaction affects if I remember the vingette correctly? So you might want to read through that and try to pull out the interaction affect in both your contrasts. I think it is potentially telling you something about your actual data and biological responses, so you need to determine if that interaction term is significant or not. You should also try the other, alternate combinations of interaction so you get a complete overview of which interactions are or are not significant. Significant interaction affects will really confound trying to determine the significance of individual factors if you wish to stick with a single combined data set and analysis. You may have to separate the data so you can analyze a single factor at a time without any interaction.

    Or again if the interaction is significant, you could look at some alternate normalizations to subtract one factor before significance testing. For example a TMM or KDMM normalization with the 129 mean genotype responses subtracted as baseline.
    Last edited by mbblack; 08-13-2014, 12:06 PM.
    Michael Black, Ph.D.
    ScitoVation LLC. RTP, N.C.

    Comment


    • #3
      hi xibarra,

      Note that version 1.2 is from October 2013. It's admittedly a hassle to keep R and Bioc updated, but version 1.2 was still only the second release of DESeq2. Some users reported to us last year that for certain designs, changing the base level could change the LFCs and p-values, usually only slightly, due to an asymmetry of the prior on LFCs. We then developed the methods such that changes of base level would have no effect on the results, and these were released with version 1.4 in March 2014. An additional change was to not shrink the main effect terms in designs with interaction terms.

      So you can either update R and Bioc, or for the old version (1.2) you can set betaPrior=FALSE, which should produce symmetric results tables.

      Comment


      • #4
        Thank you for your replies.
        I suspected I needed to update to the latest version, which I'll do. Still nice to know I'm not misunderstanding something or doing the wrong analysis.
        Best,
        Ximena

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Strategies for Sequencing Challenging Samples
          by seqadmin


          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
          03-22-2024, 06:39 AM
        • seqadmin
          Techniques and Challenges in Conservation Genomics
          by seqadmin



          The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

          Avian Conservation
          Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
          03-08-2024, 10:41 AM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, Yesterday, 06:37 PM
        0 responses
        10 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, Yesterday, 06:07 PM
        0 responses
        9 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-22-2024, 10:03 AM
        0 responses
        50 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-21-2024, 07:32 AM
        0 responses
        67 views
        0 likes
        Last Post seqadmin  
        Working...
        X