
  • DESeq: more than 2 levels per condition?

    Hi,

    Is it possible in DESeq to analyze a design with more than 2 levels per condition/factor?
    I'm working with a design that has 3 different treatments (untreated, treatment1, treatment2) at several time points (I also have replicates for all of them):

    treatment     time
    untreated     0h
    untreated     24h
    untreated     48h
    treatment1    0h
    treatment1    24h
    treatment1    48h
    treatment2    0h
    treatment2    24h
    treatment2    48h

    Thanks in advance,
    Elena
    Last edited by edue; 11-18-2011, 05:23 AM.

  • #2
    Sure, you can analyse more complex designs. See the section on GLMs in the vignette. How precisely to set up the test depends on what hypothesis you want to test.
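
    As a rough illustration of what a two-factor setup for the design in the question could look like, here is a minimal base-R sketch. It only builds a sample table and the model matrices for a full and a reduced formula; it does not call DESeq. The replicate count (2) and the choice of reduced model are assumptions made for the example, not something stated in this thread.

```r
# Hypothetical sample table for the design in the question:
# 3 treatments x 3 time points, with 2 replicates each (replicate count assumed).
design <- expand.grid(
  replicate = 1:2,
  time      = c("0h", "24h", "48h"),
  treatment = c("untreated", "treatment1", "treatment2"),
  stringsAsFactors = FALSE
)

# Make "untreated" and "0h" the reference levels, so coefficients read as
# changes relative to the untreated, 0h baseline.
design$treatment <- factor(design$treatment,
                           levels = c("untreated", "treatment1", "treatment2"))
design$time <- factor(design$time, levels = c("0h", "24h", "48h"))

# Full model: treatment and time effects. Reduced model: time only.
# Comparing the two asks whether treatment matters once time is accounted for.
full_matrix    <- model.matrix(~ treatment + time, data = design)
reduced_matrix <- model.matrix(~ time, data = design)

print(colnames(full_matrix))
print(colnames(reduced_matrix))
```

    Which terms to drop in the reduced model depends on the hypothesis: dropping treatment tests for a treatment effect, dropping time tests for a time effect.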



    • #3
      Hi Simon,
      Can you please clarify this for me? If I have more than one factor, e.g. treatment and timepoint, I use the GLM full model approach. If I have only one factor, but it has more than two levels (A, B, C), should I still use the GLM approach? Or is it better to use the simpler model and run nbinomTest several times, once for each pairwise comparison (A vs B; A vs C; B vs C)? Is there a way to use the simpler model but still test for differential expression in one step (e.g. ANOVA, especially for many-level factors)?
      Many thanks for all your work on DESeq!
      Matt



      • #4
        For pair-wise comparisons, you have to subset your data set to only the samples involved. To be consistent with the ANOVA-style result for all levels, you should do the subsetting after the dispersion estimation.



        • #5
          Thanks, Simon. By subsetting, I assume you mean to simply run a number of nbinomTest commands, one for each comparison, using the same countDataSet (after dispersion estimation). For example:

          Code:
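          library(DESeq)

          # Sample table: sample names, HTSeq count file names, then the
          # condition of each sample (three replicates each of A, B and C).
          design <- data.frame(
            sample.names = sampleTable$V1,
            count.files  = sampleTable$V2,
            condition    = c("A", "A", "A", "B", "B", "B", "C", "C", "C")
          )

          # Estimate size factors and dispersions once, using all nine samples.
          cds <- newCountDataSetFromHTSeqCount(design, directory = "/data/dir")
          cds <- estimateSizeFactors(cds)
          cds <- estimateDispersions(cds)

          # One pairwise test per comparison, all on the same count data set.
          AvsB <- nbinomTest(cds, "A", "B")
          AvsC <- nbinomTest(cds, "A", "C")
          BvsC <- nbinomTest(cds, "B", "C")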



          • #6
            I was just going to start a thread in a similar vein, so I may as well ask my question in this one.

            I'm also dealing with a subset of pairwise comparisons in an analysis, and want to know the correct way to run it with DESeq.

            Say you have a time course analysis with 3 biological replicates collected at each of 6 different time points. What we are interested in is how each of the later time points differs from time 1.

            So 5 different pairwise tests: t1 vs t2, t1 vs t3, t1 vs t4, t1 vs t5, and t1 vs t6.

            So Simon, would the appropriate way to run this analysis with DESeq be to have them all in one count data set and then just run 5 different nbinomTests with (cds, "t1", "t2"), (cds, "t1", "t3")... etc? Or to take the raw counts for "t1" and "t2", put them in their own table, and create and test a separate count data set for each pair?

            In addition, since we are doing multiple tests on the same data set, is there a need to re-do the False Discovery Rate calculation by combining the raw p-values from all 5 pairwise tests into a single list, and re-running p.adjust on the full set of results? Or is keeping the FDR values for each individual test acceptable?
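
            To make the two options in this question concrete, here is a minimal base-R sketch. The p-values are made-up numbers for four hypothetical genes and three of the comparisons; nothing here comes from DESeq output, and the sketch does not say which option is preferable.

```r
# Hypothetical raw p-values from three pairwise tests on the same four genes
# (made-up numbers, for illustration only).
pvals <- list(
  t1_vs_t2 = c(geneA = 0.001, geneB = 0.04, geneC = 0.20, geneD = 0.80),
  t1_vs_t3 = c(geneA = 0.030, geneB = 0.01, geneC = 0.50, geneD = 0.90),
  t1_vs_t4 = c(geneA = 0.002, geneB = 0.60, geneC = 0.04, geneD = 0.70)
)

# Long-format table: one row per (comparison, gene) pair.
pooled <- data.frame(
  comparison = rep(names(pvals), lengths(pvals)),
  gene       = unlist(lapply(pvals, names), use.names = FALSE),
  pval       = unlist(pvals, use.names = FALSE)
)

# Option 1: Benjamini-Hochberg adjustment within each comparison separately.
pooled$padj_separate <- unlist(lapply(pvals, p.adjust, method = "BH"),
                               use.names = FALSE)

# Option 2: pool all raw p-values and adjust once across all comparisons.
pooled$padj_pooled <- p.adjust(pooled$pval, method = "BH")

print(pooled)
```

            Keeping both columns side by side shows how much the choice changes the adjusted values for a given gene and comparison.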



            • #7
              I have a similar experimental set-up as above and therefore face the same decision. Essentially the question is, assuming > 2 samples (comparisons) should the variance estimation (estimateDispersions) be performed using ALL of the samples before performing the pairwise DE test, or should the variance estimation be restricted to the pair of samples that one is testing for DE?

              Cheers,



              • #8
                Will the comparison between the full model and the reduced model (only intercept) give the overall significance of time effect?

                Code:
                dfit1 <- fitNbinomGLMs(d, count ~ condition)
                dfit0 <- fitNbinomGLMs(d, count ~ 1)
                dpval <- nbinomGLMTest(dfit1, dfit0)
                dpadj <- p.adjust(dpval, method="BH")
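
                For readers unfamiliar with full-versus-reduced model comparisons, here is a minimal base-R sketch of the general idea for a single simulated gene. Note the substitution: it uses a Poisson GLM from base R's glm() rather than DESeq's negative binomial fit, and the counts, group means and seed are all made up for illustration. It is not meant to describe how DESeq computes its test.

```r
set.seed(1)

# One hypothetical gene measured in 3 conditions with 3 replicates each.
condition <- factor(rep(c("A", "B", "C"), each = 3))

# Simulated counts: conditions A and B share a mean, C is clearly higher.
counts <- rpois(9, lambda = rep(c(50, 50, 200), each = 3))

# Full model: one mean per condition. Reduced model: a single overall mean.
fit_full    <- glm(counts ~ condition, family = poisson())
fit_reduced <- glm(counts ~ 1, family = poisson())

# Likelihood ratio test: the drop in deviance is compared to a chi-squared
# distribution whose degrees of freedom equal the number of dropped coefficients.
lr_stat <- deviance(fit_reduced) - deviance(fit_full)
df_diff <- df.residual(fit_reduced) - df.residual(fit_full)
pval    <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

print(c(statistic = lr_stat, df = df_diff, p.value = pval))
```

                A small p-value here says only that at least one condition differs from the others; it does not say which one, which is why pairwise tests or contrasts are still needed afterwards.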



                • #9
                  This is indeed an important point, john_nl. My impression is that estimating dispersion from only the two levels you're going to compare is cheating a bit statistically... One would expect the dispersion to be estimated from all the condition levels, followed by an ANOVA with contrasts... Does DESeq support this?



                  • #10
                    Gabriela,
                    Yes, DESeq supports GLMs of any type. See also Simon's earlier posts.
                    Wolfgang Huber
                    EMBL



                    • #11
                      Hello,

                      I have a couple of questions regarding biological replicates in DESeq. I have HTSeq output files from RNA-seq data examining the effects of three chemicals on gene expression. There are two experiments in this data set. One examines expression changes at a high chemical concentration (10uM) compared to a vehicle control (1%); the other examines expression changes at a lower chemical concentration (1uM) with a lower vehicle concentration (0.1%). Currently, I run DESeq on the two experiments separately, i.e. with a separate R script for each experimental setup, so the design variable in each contains the 4 conditions corresponding to the respective HTSeq output files (hopefully this all makes sense).
                      My first question is whether I should run DESeq on the two experiments combined instead of keeping them separate. In that case the design variable would contain all 8 conditions. My (potentially naive) reasoning for combining the experiments is that I would better estimate the dispersions for all genes examined and still be able to run 'nbinomTest()' normally, provided I define the conditions correctly. (Maybe I’m getting confused by the vignette’s definitions of condition and factor?)
                      My second question is with regards to outliers as identified by the PCA plot function of DESeq. I have generated PCA plots for both experiments (keeping them separate) in order to see whether the treatments group together and the general pattern of the data. For both the high concentration and low concentration experiments the PCA plots show that some of the replicates differ rather substantially from their respective treatment groups (see images). Now, I know that removing outliers from analyses is risky business and needs to be justified, but based on how different these replicates are from the treatments would it be ok to take these out?
                      [Attached images: Plot 1.png, Plot2.png]

                      Thanks for all the help!!
