Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DESeq2 : design for estimating effects of treatment at various time points

    Dear community,

    I read several explanations on the usage of DESeq2 for DGE, but I am entirely new to this as well as to R usage, and am struggling to understand all the different options enough to make sure that I design my analysis correctly so as to answer the biological questions of interest.

    Briefly, I have tested a treatment over several time points and I have 2 biological replicates for each time point (0, 1 and 14 days) and treatment (SED, CON):
    SED: T1(x2), T14(x2)
    CON:T0(x2), T1(x2), T14(x2)

    I do not have T0 samples for the treatment because I took the baseline control samples from each of the 2 replicates then initiated the treatment on "treatment" organisms.

    I would like to answer the following questions:

    1) Are there genes that are typical responses to the treatment, irrespectively of the time the treatment is applied for (i.e. that vary generally with the treatment, regardless of time), and if so, which are those genes?
    2) Which are the genes that are differentially expressed after 24h? 1week? 2weeks of treatment?
    3) Is there an effect of time (e.g. does the expression of genes affected by the treatment increase over time)?

    I have seen comments on this thread https://support.bioconductor.org/p/61801/, but as I mentioned I am a bit confused by all these various arguments.

    Following the DESeq2 manual I attempted to set my design as a time series as follows:
    Code:
    dds <- DESeqDataSetFromMatrix(countData=countTable, colData= coldata ,coldata = ~ time + treatment + time:treatment)
    but I got an error message: "error: inv(): matrix appears to be singular".

    I therefore tried (successfully) the following:
    Code:
    >dds <- DESeqDataSetFromMatrix(countData=countTable, colData= coldata ,design = ~ time + treatment)
    >dds <- DESeq(dds)
    If I understand things correctly, this design will tell me genes that are differentially expressed across treatments for all time points in general; is this correct? However, from what I have read in the documentation, the questions I am asking to these data would probably best be answered using a likelihood ratio test. So this is where I start getting really confused...

    So, how would I obtain differential expression results for the following tests:

    SED 24h vs CON 24h; SED 2wks vs CON 2wks? SED vs CON (all time points together)?

    And is it possible to compare the evolution in expression of certain genes to see if they fit a regression model (e.g. increase in expression of treatment-affected genes over time)?

    I have tried the following (I list all steps for clarity):

    Code:
    > countTable <- read.table("141029_eXpress_ndndns_eX_uniq.csv", header=TRUE, row.names=1)
    > coldata = data.frame(row.names=colnames(countTable), treatment = as.factor(c("SED","SED","CON","CON","CON","SED","SED","CON","CON","CON")),time=as.factor(c("1","14","0","1","14","1","14","0","1","14")), colony=as.factor(c("A","A","A","A","A","H","H","H","H","H")))
    > treatment= coldata$treatment
    > time=coldata$time
    > colony=coldata$colony
    > time
    [1] 1  14 0  1  14 1  14 0  1  14
    Levels: 0 1 14
    > class(time)
    [1] "factor"
    
    > treatment
     [1] SED SED CON CON CON SED SED CON CON CON
    Levels: CON SED
    > class(treatment)
    [1] "factor"
    > library("DESeq2")
    > dds <- DESeqDataSetFromMatrix(countData=countTable, colData= coldata ,design = ~ colony + time + treatment)
    > dds <- DESeqDataSetFromMatrix(countData=countTable, colData= coldata ,coldata = ~ colony + time + treatment) 
    > design(dds) = ~ colony + time + treatment + time:treatment
    > ​dds = DESeq(dds, test="LRT", reduced= ~ time + treatment)
    However I get the same error message as above:

    Code:
    estimating size factors
    estimating dispersions
    gene-wise dispersion estimates
    
    error: inv(): matrix appears to be singular
    
    Error: inv(): matrix appears to be singular
    Thank you very much in advance for your help! This is all very interesting, but the learning curve is quite steep!

    Thanks,

    Chris
    Last edited by chris_s; 10-30-2014, 01:30 AM. Reason: Added information

  • #2
    The generally recommended design for time series (~ time + treatment + time:treatment) doesn't work for your experiment because there aren't treated samples at time0. These are good to have to control for any differences in your two groups before you apply treatment. But you should still be able to analyze your data.

    Note that you can add a column:

    Code:
    dds$group <- factor(paste(dds$time, dds$treatment))
    and use a design of ~group. Then this analysis,

    Code:
    design(dds) <- ~ group
    dds <- DESeq(dds)
    results(dds, contrast=c("group","...","..."))
    would allow you to compare any two groups.

    If you assume that the treatment group had no differences than the control group at time 0 (I'm not a fan of this assumption but the experiment's already done), you can test for genes which show changes due to treatment at all times. You have to do a little R legwork, because R's model.matrix function is expecting there to be treated samples at time 0.

    In DESeq2 version >= 1.4, you can provide model matrices to some of the lower level functions, like I do below. I think I need to work on the DESeq2 interface to allow for this kind of manipulation of model matrices for experimental designs like this one, without having to use the lower level functions (estimateDispersions*, etc.). The design is not used, but I noticed a bug in v1.6 that arises if it is ~ 1, so below I use ~ time. Updated (2015.01.05): Fixed bug in 1.6 and 1.7, now v1.7 allows user-supplied model matrices to top level functions DESeq() and estimateDispersions(), so the following extra code is no longer necessary...

    Code:
    m <- model.matrix(~ time + time:treatment)
    m
    m <- m[,-4] # get rid of the column of 0's
    design(dds) <- ~ time
    dds <- estimateSizeFactors(dds)
    dds <- estimateDispersionsGeneEst(dds, modelMatrix=m)
    dds <- estimateDispersionsFit(dds)
    dds <- estimateDispersionsMAP(dds, modelMatrix=m)
    dds <- nbinomLRT(dds, full=m, reduced=m[,-c(4,5)]) # remove the treatment effects
    results(dds)
    In the second to last line above, I remove the two treatment interactions in the reduced model, so this is testing for a significant treatment effect at any time point.
    Last edited by Michael Love; 01-05-2015, 03:38 PM. Reason: added part about bug

    Comment


    • #3
      Hi Michael,

      Thanks for your quick reply. Yes, I realise the issue of not having taken "treated" samples at time 0... I've been facing similar issues with other assays on the same experiment.

      Thanks for this, it seems to have worked nicely!

      I have one question regarding the test for the effects of treatment over all time. I tried the commands you suggested but in the results it says "log2 fold change: time14.treatmentSED". Doesn't this mean that these results are only valid for the effects of the treatment after 2 weeks and not for all time points, or am I misinterpreting this?

      Code:
      log2 fold change: time14.treatmentSED 
      LRT p-value: full vs reduced
      By the way, I forgot to mention that my samples are clonal pairs across treatments. In order to take the paired design into account, I guess I would have to adapt your suggested design to be :

      Code:
      design(dds) <- ~colony + group
      Is that correct?

      Thanks!
      Last edited by chris_s; 10-30-2014, 09:41 AM.

      Comment


      • #4
        "Doesn't this mean that these results are only valid for the effects of the treatment after 2 weeks and not for all time points"

        no, the LRT is for all time points, but we can only show one fold change in the table.

        See the more detailed explanation of this in ?results

        Yes, adding colony should work

        Comment


        • #5
          Hi Michael,

          One more question if you don't mind. When I run:
          Code:
          results(dds, contrast=c("group","1.CON","1.SED"))
          If the results show that a gene is upregulated, does it means the it is upregulated in 1.SED samples in comparison to 1.CON, or the opposite? If I follow the general principles of DESeq/DESeq2, the last variable is that of interest and the 1st is the control variable, but does it apply in this case?

          Thanks!

          Comment


          • #6
            Your first stop with a question should always be the help file for the function:

            in ?results we have the description for the 'contrast' argument:

            a character vector with exactly three elements: the name of a factor in the
            design formula, the name of the numerator level for the fold change, and
            the name of the denominator level for the fold change
            so the vector you have specified gives log2(CON/SED).

            This is also printed at the top of the results table when you examine it in the R console, something to the effect of "log2 fold change for group CON vs SED"

            A log2FoldChange of 1 implies CON is 2x SED.

            Comment


            • #7
              A note: DESeq2 versions >= 1.7 allow user-supplied model matrices to top level functions DESeq() and estimateDispersions().

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Advancing Precision Medicine for Rare Diseases in Children
                by seqadmin




                Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
                12-16-2024, 07:57 AM
              • seqadmin
                Recent Advances in Sequencing Technologies
                by seqadmin



                Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

                Long-Read Sequencing
                Long-read sequencing has seen remarkable advancements,...
                12-02-2024, 01:49 PM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, 12-17-2024, 10:28 AM
              0 responses
              26 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 12-13-2024, 08:24 AM
              0 responses
              42 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 12-12-2024, 07:41 AM
              0 responses
              28 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 12-11-2024, 07:45 AM
              0 responses
              42 views
              0 likes
              Last Post seqadmin  
              Working...
              X