Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • DrWorm
    Member
    • Apr 2013
    • 17

    Differential gene expression across life cycle with DESeq2

    I'm trying to study gene expression across the life cycle of a nematode using DESeq2. We've sequenced eggs, L1, L2, L3, L4, L5, and adults. I've figured out how to do stage v. stage comparisons (e.g., egg v. L1, L1 v. L2, L2 v. L3, etc). Now, I'd like to generate a list of genes whose expression fluctuates at some point in the life cycle -- between any of the stages. I'd appreciate some advice on how to tease this out.
  • Michael Love
    Senior Member
    • Jul 2013
    • 333

    #2
    hi,

    I think what you want to use is the likelihood ratio test. You can either do this with DESeq():

    dds <- DESeq(dds, test="LRT", full=~stage, reduced=~1)

    or you can also use the nbinomLRT() function:

    dds <- nbinomLRT(dds, full=~stage, reduced=~1)

    The likelihood ratio test compares the difference in log likelihood using the stage information vs removing the stage information altogether. This then gives you a significance test for whether the gene expression changes with stage, at any point in the life cycle like you say.

    Comment

    • DrWorm
      Member
      • Apr 2013
      • 17

      #3
      Thanks so much for your reply!

      Just to be 100% sure...
      I run the DESeq function just as you demonstrated, then I grab the results:
      res <- results(dds)
      Any gene with an adjusted p-value less than 1e-5 (or whatever confidence cutoff we choose) will be differentially expressed at some stage, whereas the others show stable expression levels over the life cycle. Is this correct?

      Comment

      • Michael Love
        Senior Member
        • Jul 2013
        • 333

        #4
        yes. And with the LRT, note that the log2FoldChange and lfcSE columns in the results object give the log2 fold change and standard error of the last level over the first, which is not related to the LRT statistic, p-value or adjusted p-value in the other columns of the results object.

        Also, that is a very low adjusted p-value. We usually use a false discovery rate of 10% for examples. The optimal FDR to aim for depends on the cost of follow-up experiments and the cost of missing a true discovery, of course, but 1/10 or 1/20 is often reasonable. For exploratory experiments 1/5 might make sense.

        If you meant 1e-5 for a p-value threshold, we recommend to focus on adjusted p-values for building lists, as these are much more interpretable.

        Comment

        • DrWorm
          Member
          • Apr 2013
          • 17

          #5
          Thanks for clearing that up. I'll take another look at my results with padj < 0.1 or 0.05.

          I really appreciate your help! Thanks again for replying to my question.

          Comment

          • DrWorm
            Member
            • Apr 2013
            • 17

            #6
            Michael, I was hoping you might be able to answer one more related question...

            I've decided that I need to control for batch variation since some of the parasites I'm studying were derived from different host animals at different times of year, etc. I still want to generate a list of genes whose expression varies from stage to stage -- I just want to disregard variation due to other factors as best I can.

            Would changing this:
            dds <- DESeq(dds, test="LRT", full=~stage, reduced=~1)

            to this:
            dds <- DESeq(dds, test="LRT", full=~Stage+Date, reduced=~Date)

            accomplish my goal? I wanted to double check that I'm not making the reverse comparison and accidentally measuring variation due to collection date.

            Also, would you expect to see more DE genes controlling for batch variation or less? I seem to be finding more, and that surprised me (which is what inspired this post -- I'm nearly certain that I'm doing something wrong). I figured that some of the variation I was seeing before was likely due to differences in season or host animal and that pulling those factors out would reduce my DE gene count.

            Comment

            • Michael Love
              Senior Member
              • Jul 2013
              • 333

              #7
              hi,

              yes i would just change this to:

              dds <- DESeq(dds, test="LRT", full=~Date + Stage, reduced=~Date)

              in general always put the variable of interest at the end. while this won't make a difference for the LRT statistic, or p-values, the results() also prints out a log2 fold change based on the last variable in the design formula, and comparing the last level of this variable to the base level.

              this is admittedly a bit confusing when you have more than 2 levels for Stage, as all of the levels are taken into account when calculating the LRT statistic and p-values, but we wanted to keep the results() object standardized, and adding every possible LFC contrast would get messy.

              you can extract other LFCs, if you are interested, like so:

              lfcBvsA <- results(dds, contrast=c("Stage","B","A")$log2FoldChange
              lfcCvsA <- results(dds, contrast=c("Stage","C","A")$log2FoldChange
              lfcCvsB <- results(dds, contrast=c("Stage","C","B")$log2FoldChange
              etc


              regarding your second question, it can go either way, but often accounting for batch will end up with more DE genes. If the batches are balanced, the unaccounted-for batch effect contributes more to the denominator of the Wald statistic than the top. And similarly in the case of the LRT, unaccounted-for batch effect is likely to reduce the LRT and raise p-values. Not accounting for batch effect increases the estimates of dispersion at the gene-wise level, and additionally increases the fitted values for dispersion over all genes.

              for example, with normal linear models:

              > condition <- factor(rep(1:2,each=10))
              > batch <- factor(rep(rep(1:2,each=5),2))
              > condition
              [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
              Levels: 1 2
              > batch
              [1] 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 2 2 2 2 2
              Levels: 1 2

              > y <- rnorm(20,mean=as.numeric(condition) + 3*as.numeric(batch))
              > y
              [1] 3.563869 3.065336 4.289239 1.650792 4.086886 6.564326 4.452499
              [8] 7.612532 7.720049 6.091950 6.644862 5.132749 5.328683 5.825075
              [15] 4.834425 6.843204 6.743685 9.943101 8.852742 10.021774
              > coef(summary(lm(y ~ condition)))
              Estimate Std. Error t value Pr(>|t|)
              (Intercept) 4.909748 0.6246445 7.860068 3.148028e-07
              condition2 2.107282 0.8833807 2.385475 2.825605e-02
              > coef(summary(lm(y ~ batch + condition)))
              Estimate Std. Error t value Pr(>|t|)
              (Intercept) 3.388551 0.4597157 7.370970 1.092649e-06
              batch2 3.042395 0.5308340 5.731349 2.446249e-05
              condition2 2.107282 0.5308340 3.969757 9.900114e-04

              Note the jump in t-statistic for condition from 2.38 to 3.96. This is because the unexplained variance was reduced by adding the batch variable. The estimate stayed at 2.10, while the SE of the estimate went down from 0.88 to 0.53.

              Comment

              • DrWorm
                Member
                • Apr 2013
                • 17

                #8
                Thank you!

                Comment

                Latest Articles

                Collapse

                • SEQadmin2
                  From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                  by SEQadmin2


                  Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                  The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                  ...
                  06-02-2026, 10:05 AM
                • SEQadmin2
                  Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                  by SEQadmin2


                  With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                  Introduction

                  Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                  05-22-2026, 06:42 AM
                • SEQadmin2
                  Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                  by SEQadmin2

                  Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                  Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                  05-06-2026, 09:04 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by SEQadmin2, Yesterday, 08:59 AM
                0 responses
                14 views
                0 reactions
                Last Post SEQadmin2  
                Started by SEQadmin2, 06-02-2026, 12:03 PM
                0 responses
                22 views
                0 reactions
                Last Post SEQadmin2  
                Started by SEQadmin2, 06-02-2026, 11:40 AM
                0 responses
                19 views
                0 reactions
                Last Post SEQadmin2  
                Started by SEQadmin2, 05-28-2026, 11:40 AM
                0 responses
                32 views
                0 reactions
                Last Post SEQadmin2  
                Working...