Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Skiaphrene
    Member
    • Aug 2013
    • 18

    Advanced interaction contrast in DESeq2

    Hi everyone,



    I am currently working on some RNA-seq data in DESeq2 and I would like to know if I can transform a specific biological question into a DESeq2 glm contrast for which Differentially Expressed Genes (DEGs) can be called. I am not very comfortable with contrasts (and not even sure they’re the way to go, despite having read the helpful related Seqanswers threads) so I’m looking for some help!



    We have an experiment with 3 immune cell Populations (say A, B, and C, with A as the reference) and 2 Conditions (a Control condition as reference, and a Stimulated condition), with 3 biological replicates per group (hence 18 samples in total).

    So far we’ve analysed each cell population independently, deriving the list of DEGs between Stimulated and Control in each, and then looking at how these lists intersect (e.g. with Venn diagrams).



    However, it would make much more sense to analyse all samples in one go, both statistically (more power) and biologically (as everything was done with the same protocol). In this case cell Population would be added as a new factor, and I’d also include the Population:Condition interaction. I have been experimenting with this setup:
    Code:
    design = ~ Population + Condition + Population:Condition
    The list of resultsNames in the model is thus:
    Code:
    "Intercept"                  
    "ConditionControl"                   "ConditionStimulated"               
    "PopulationA"                          "PopulationB"           "PopulationC"
    "ConditionControl.PopulationA" "ConditionStimulated.PopulationA"
    "ConditionControl.PopulationB" "ConditionStimulated.PopulationB" 
    "ConditionControl.PopulationC" "ConditionStimulated.PopulationC"
    Getting the list of overall DEGs for Stimulated versus Control is straightforward:
    Code:
    contrast=c(“Condition”, “Stimulated”, “Control”)
    Writing a contrast to compare Population C versus B, irrespective of Condition, is easy too:
    Code:
    contrast=c(“Population”, “C”, “B”)
    I’m not entirely sure of myself but it seems to me that the contrast to test the effect of the interaction of Population B and Condition Stimulated would be:
    Code:
    contrast=list(c(“PopulationB”, “ConditionStimulated”, “PopulationB.ConditionStimulated”))
    Am I right so far?



    Either way, then comes the tricky part.

    Our biological question is: what are the genes that are differentially expressed between Stimulated and Control in populations A and B BUT NOT differentially expressed between Stimulated and Control in population C? (this is because we know A and B drive a specific type of downstream response, while C is more of an “on-looker”, a sort of internal control if you wish)

    Is this the same as asking for the glm coefficients of PopulationB.Stimulated and PopulationC.Stimulated? If yes, how does one write this up as a contrast? I haven’t been able to figure it out, despite reading the various posts about writing up the contrasts... Can it be so simple as any of the following:
    Code:
    contrast=list(c(“PopulationB.Stimulated”, “PopulationC.Stimulated”))
    contrast=list(c(“PopulationB.Stimulated”, “PopulationC.Stimulated”, “ConditionStimulated”))
    contrast=list(c(“PopulationB.Stimulated”, “PopulationC.Stimulated”, “ConditionStimulated”, “PopulationB”, “PopulationC”))


    ...Or would it just be better to test for the overall Stimulated vs Control DEGs, and then the DEGs for each interaction with Population, and do set analysis on all that?



    Thanks in advance for your help!

    Best,

    -- Alex
  • Michael Love
    Senior Member
    • Jul 2013
    • 333

    #2
    hi Alex,

    In the latest DESeq2 version, 1.8, I have some new instructions for users who might have a difficult time setting up such contrasts.

    Can you look over the first code chunk in Section 3.3 Interactions:

    The Bioconductor project aims to develop and share open source software for precise and repeatable analysis of biological data. We foster an inclusive and collaborative community of developers and data scientists.


    This makes comparisons of condition effects in each group much easier to perform.

    For your other question:

    "what are the genes that are differentially expressed between Stimulated and Control in populations A and B BUT NOT differentially expressed between Stimulated and Control in population C?"

    There is not a way to build such a set with one results() call. But you can build each table individually and then look at the union() of the first two, setdiff() the last table.

    The last table can be constructed by specifying a threshold for LFC, say log2 fold change < 2, and then using lfcThreshold=2 and altHypothesis="lessAbs". You need to come up with a threshold value which makes sense for your system, and you have to turn off betaPrior for this comparison. I'd recommend inspecting how this changes with plotMA(res). And see the DESeq2 paper for more details on the meaning of this threshold test.
    Last edited by Michael Love; 04-28-2015, 11:12 AM.

    Comment

    • Skiaphrene
      Member
      • Aug 2013
      • 18

      #3
      Hi Michael,


      Thank you very much for your quick reply!!!


      Originally posted by Michael Love View Post
      In the latest DESeq2 version, 1.8, I have some new instructions for users who might have a difficult time setting up such contrasts.

      Can you look over the first code chunk in Section 3.3 Interactions:

      The Bioconductor project aims to develop and share open source software for precise and repeatable analysis of biological data. We foster an inclusive and collaborative community of developers and data scientists.


      This makes comparisons of condition effects in each group much easier to perform.
      => This documentation is exactly what I needed! It makes things so much clearer... I'll upgrade our version of R, Bioconductor and DESeq2 just to be sure and I'll try this out. Thank you!



      Originally posted by Michael Love View Post
      For your other question:

      "what are the genes that are differentially expressed between Stimulated and Control in populations A and B BUT NOT differentially expressed between Stimulated and Control in population C?"

      There is not a way to build such a set with one results() call. But you can build each table individually and then look at the union() of the first two, setdiff() the last table.

      The last table can be constructed by specifying a threshold for LFC, say log2 fold change < 2, and then using lfcThreshold=2 and altHypothesis="lessAbs". You need to come up with a threshold value which makes sense for your system, and you have to turn off betaPrior for this comparison. I'd recommend inspecting how this changes with plotMA(res). And see the DESeq2 paper for more details on the meaning of this threshold test.
      => This is essentially what I had in mind, but doing it within DESeq2 rather than downstream is certainly much better from a statistical point of view. I'll try this as soon as possible!


      Thank you again for your help! I may have further questions later, we'll see...

      Best regards,

      -- Alex

      Comment

      Latest Articles

      Collapse

      • SEQadmin2
        Nine Things a Sample Prep Scientist Thinks About Before Sequencing
        by SEQadmin2


        I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.


        Here are nine questions we think about, in roughly the order they matter, before...
        06-18-2026, 07:11 AM
      • SEQadmin2
        From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
        by SEQadmin2


        Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


        The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
        ...
        06-02-2026, 10:05 AM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by SEQadmin2, 06-17-2026, 06:09 AM
      0 responses
      30 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-09-2026, 11:58 AM
      0 responses
      44 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-05-2026, 10:09 AM
      0 responses
      50 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-04-2026, 08:59 AM
      0 responses
      51 views
      0 reactions
      Last Post SEQadmin2  
      Working...