  • interactions with DESeq2 in a time-course analysis

    Dear all,

    We have a time-course data set with 8 time points (0, 16, 24, 30, 48, 72, 90, and 100 hours).
    We would like to test for genes that change over time relative to 0h (the control).

    I have read the guide and a lot of posts here and on biostar.org. I understand how to create a design matrix for my data, but I am still confused about the interactions in the design matrix.

    I set up the design like this:

    Code:
    dds <- DESeqDataSetFromMatrix(countData = countTable, colData = phenotype, design = ~ replica + hours)
    dds <- DESeq(dds, test = "LRT", reduced = ~ replica)
    Question 1:
    If I understand correctly, the results I get represent all genes that behave significantly differently across the time points.
    But I get a lot of them. Attached is [the plotMA of the counts].

    Code:
    > res
    log2 fold change (MLE): hours 100 vs 0 
    LRT p-value: '~ replica + hours' vs '~ replica' 
    DataFrame with 17558 rows and 6 columns
                   baseMean log2FoldChange     lfcSE      stat       pvalue         padj
                  <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric>
    FBgn0085804  0.18052802    -1.18153049  7.162547  1.518897 0.9816482138           NA
    ...
    Do all these genes change over time across all time points?

    Question 2:
    Does this list also include genes that change between only two time points?
    (That might be the reason why the list is so long.)

    Question 3:
    What will change in terms of my question (and, of course, the list of genes I get) if I change my full and reduced models to this:
    Code:
    dds <- DESeqDataSetFromMatrix(countData = countTable, colData = phenotype, design = ~ replica + hours + replica:hours)
    dds <- DESeq(dds, test = "LRT", reduced = ~ replica + replica:hours)

    Thanks in advance

    Assa
    Last edited by frymor; 11-05-2015, 03:21 AM.

  • #2
    1. Yup, you understood exactly. If you really want to be technical, what you're actually testing is whether including "hours" results in a better fit of the data...though the practical effect is asking for all genes changing over time. I should point out that you may not see all of these changes in direct pairwise comparisons (you'll probably see most of them though).

    2. Yup. This can be thought of as a superset of the results from all pairwise comparisons. If it's ever DE in a pairwise comparison, it'll likely be DE in the LRT (the reverse isn't the case).
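To make the point in 2. concrete, here is a minimal base-R sketch for a single gene. It is not DESeq2: it swaps in a plain Poisson GLM for simplicity (DESeq2 fits a negative binomial model), and the counts are hypothetical. The idea is the same, though: the LRT asks one question about the whole "hours" term, while a Wald test on one coefficient asks about one comparison only.

```r
# Toy single-gene illustration (Poisson GLM as a stand-in for DESeq2's NB model).
# All counts below are hypothetical.
hours   <- factor(rep(c(0, 16, 24, 30, 48, 72, 90, 100), each = 2))
replica <- factor(rep(c("r1", "r2"), times = 8))
counts  <- c(50, 54, 52, 57, 60, 58, 61, 66, 70, 65, 72, 77, 75, 80, 78, 84)

full    <- glm(counts ~ replica + hours, family = poisson())
reduced <- glm(counts ~ replica,         family = poisson())

# LRT: a single test of whether the 'hours' term improves the fit at all
lrt    <- anova(reduced, full, test = "Chisq")
lrt_df <- lrt$Df[2]             # number of coefficients dropped in the reduced model
lrt_p  <- lrt[["Pr(>Chi)"]][2]

# Wald: one coefficient, i.e. one comparison (16 h vs the reference level 0 h)
wald_p <- coef(summary(full))["hours16", "Pr(>|z|)"]

print(c(lrt_df = lrt_df, lrt_p = lrt_p, wald_p_16_vs_0 = wald_p))
```

With a slow drift like this one, the gene can be picked up by the LRT even though the 16h vs 0h comparison on its own shows very little.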

    • #3
      Originally posted by dpryan
      1. Yup, you understood exactly. If you really want to be technical, what you're actually testing is whether including "hours" results in a better fit of the data...though the practical effect is asking for all genes changing over time. I should point out that you may not see all of these changes in direct pairwise comparisons (you'll probably see most of them though).
      I didn't expect to see all of the genes in a single pairwise comparison, but I do expect to see all of them as a subset (or subsets, if testing multiple time-point comparisons), as the LRT supposedly tests for all DE genes over time.

      When checking the pairwise comparisons using the results() function, would it be better to keep using the LRT, or to use the Wald test for more robust statistical results?

      When comparing the two tests for a specific pair of time points, I can see a difference. I have read that the Wald test calculates the LFC shrinkage for the data, while the LRT takes multiple parameters into account.
      Code:
      > resTP16h_90hwald
      log2 fold change (MLE): hours 90 vs 16 
      Wald test p-value: hours 90 vs 16 
      DataFrame with 17558 rows and 6 columns
                     baseMean log2FoldChange     lfcSE       stat      pvalue        padj
                    <numeric>      <numeric> <numeric>  <numeric>   <numeric>   <numeric>
      FBgn0085804  0.18052802      -3.642448  6.899080 -0.5279614 0.597526110          NA
      FBgn0267431 19.73070118      -2.155084  1.270286 -1.6965346 0.089784690 0.125377361
      FBgn0039987  0.08559842      -2.183327  6.937402 -0.3147183 0.752975578          NA
      FBgn0058182  0.49195220      -2.710627  5.724264 -0.4735329 0.635833037          NA
      FBgn0267430 27.36264804      -4.362455  1.481378 -2.9448633 0.003230974 0.006781261
      ...                 ...            ...       ...        ...         ...         ...
      > resTP16h_90h
      log2 fold change (MLE): hours 90 vs 16 
      LRT p-value: '~ replica + hours' vs '~ replica' 
      DataFrame with 17558 rows and 6 columns
                     baseMean log2FoldChange     lfcSE      stat       pvalue         padj
                    <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric>
      FBgn0085804  0.18052802      -3.642448  6.899080  1.518897 0.9816482138           NA
      FBgn0267431 19.73070118      -2.155084  1.270286 17.423700 0.0148591771 0.0184819228
      FBgn0039987  0.08559842      -2.183327  6.937402  0.610748 0.9989315237           NA
      FBgn0058182  0.49195220      -2.710627  5.724264  3.607104 0.8237543782           NA
      FBgn0267430 27.36264804      -4.362455  1.481378 25.744205 0.0005595056 0.0007857541
      ...                 ...            ...       ...       ...          ...          ...
      Are the two methods even comparable?
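One way to see how the two "stat" columns relate to their p-values is to recompute them by hand. The sketch below uses the numbers printed above for FBgn0267431. It assumes the LRT has 7 degrees of freedom, because the 8 levels of "hours" contribute 7 coefficients that are dropped in the reduced model; that assumption reproduces the printed p-value.

```r
# Values copied from the two result tables above (gene FBgn0267431)
wald_stat <- -1.6965346   # Wald z-statistic for the 90 vs 16 coefficient
lrt_stat  <- 17.423700    # LRT statistic: deviance difference, full vs reduced model

# Wald: two-sided p-value from the standard normal distribution
wald_p <- 2 * pnorm(abs(wald_stat), lower.tail = FALSE)

# LRT: upper tail of a chi-square distribution.
# Assumption: df = 7, i.e. 8 time points minus the reference level.
lrt_df <- 8 - 1
lrt_p  <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)

print(c(wald_p = wald_p, lrt_p = lrt_p))
```

So the Wald statistic describes one coefficient (one pair of time points), while the LRT statistic describes all 7 time coefficients at once, which is why the LRT stat and p-value are identical in the "100 vs 0" and "90 vs 16" tables.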

      Originally posted by dpryan
      2. Yup. This can be thought of as a superset of the results from all pairwise comparisons. If it's ever DE in a pairwise comparison, it'll likely be DE in the LRT (the reverse isn't the case).
      This is where I don't understand what happens. If a gene changes over time in the analysis across all time points, it must also change in at least one of the pairwise comparisons. Isn't that true?
      So why is the reverse not always the case?

      • #4
        Originally posted by dpryan
        2. Yup. This can be thought of as a superset of the results from all pairwise comparisons. If it's ever DE in a pairwise comparison, it'll likely be DE in the LRT (the reverse isn't the case).
        When calculating the pairwise comparisons of the time points, does it make more sense to add the parameter test="Wald", or can I keep the LRT results?

        I can see that there is a substantial difference in the number of DE genes with an adjusted p-value < 0.1.

        Code:
        # Default test (LRT, as set when DESeq() was run)
        resTP16h_90h <- results(dds.filtered, contrast = c("hours", "90", "16"))
        # Wald test for the same 90h vs 16h comparison
        resTP16h_90h.wald <- results(dds.filtered, test = "Wald", contrast = c("hours", "90", "16"))
        
        > addmargins(table(wald.test =(resTP16h_90h.wald$padj <.1), LRT.test=(resTP16h_90h$padj<.1)))
                 LRT.test
        wald.test FALSE  TRUE   Sum
            FALSE   624  2958  3582
            TRUE      0  9725  9725
            Sum     624 12683 13307
        Am I correct in assuming that the LRT result from the first results() command contains not only the genes that differ between 90h and 16h, but also the genes that differ at the time points in between (24h, 30h, 48h and 72h)?
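A side note on the table above: it sums to 13307 rather than 17558 because genes with padj = NA drop out of table() by default. A small sketch with hypothetical padj vectors (the names padj_wald and padj_lrt are made up for illustration) shows how to keep those genes visible:

```r
# Hypothetical adjusted p-values for six genes; NA mimics genes whose padj
# was set to NA (e.g. by independent filtering of low-count genes)
padj_wald <- c(0.01, 0.20, NA, 0.04, 0.50, NA)
padj_lrt  <- c(0.001, 0.03, NA, 0.02, 0.70, 0.09)

# useNA = "ifany" adds an NA row/column instead of silently dropping those genes
tab <- table(wald.test = padj_wald < 0.1,
             LRT.test  = padj_lrt  < 0.1,
             useNA = "ifany")
print(addmargins(tab))
```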
        Last edited by frymor; 11-12-2015, 02:27 AM.

        • #5
          For pair-wise comparisons you need a Wald test. For "Is there a time effect, regardless of when?" you need an LRT. So yes, your assumption is absolutely correct.
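Put together, the workflow could look roughly like the sketch below. It uses simulated counts as a stand-in for the real data (the simulated matrix, the gene names and the sample names are all assumptions for illustration); only the column names "replica" and "hours" and the contrast come from the thread.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in data: 200 genes, 8 time points x 2 replicates (no true effect)
hours   <- factor(rep(c(0, 16, 24, 30, 48, 72, 90, 100), each = 2))
replica <- factor(rep(c("r1", "r2"), times = 8))
countTable <- matrix(rnbinom(200 * 16, mu = 100, size = 10), nrow = 200,
                     dimnames = list(paste0("gene", 1:200), paste0("s", 1:16)))
phenotype <- data.frame(replica, hours, row.names = colnames(countTable))

dds <- DESeqDataSetFromMatrix(countData = countTable, colData = phenotype,
                              design = ~ replica + hours)

# "Is there a time effect, regardless of when?" -> LRT against the model without hours
dds <- DESeq(dds, test = "LRT", reduced = ~ replica)
res_lrt <- results(dds)

# Asking for a contrast on an LRT fit changes the reported LFC, not the LRT p-value
res_lrt_pair <- results(dds, contrast = c("hours", "90", "16"))

# "Does the gene differ between 90h and 16h?" -> Wald test for that contrast
res_wald <- results(dds, test = "Wald", contrast = c("hours", "90", "16"))
```

Here res_lrt and res_lrt_pair carry the same p-values, which matches the identical stat and pvalue columns in the two LRT tables posted earlier; only res_wald is specific to the 90h vs 16h pair.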
