Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DESeq2 model design + contrasts

    I've skimmed through previous threads about contrasts and I'm still getting hung up with my data.

    My data are 16s rDNA count data from a full factorial design with 3 tissue types (A,B,C) that were either exposed to Chemical1, Chemical2, or both. Each condition has three replicates.

    Edit: Each replicate has tissue A, B, C samples from the same animal. So, 3 animals per treatment. How should I account for this?

    For each tissue (just showing A for example)

    TissueA, Chem1+, Chem2+ x 3 replicates
    TissueA, Chem1-, Chem2+ x 3 replicates
    TissueA, Chem1+, Chem2- x 3 replicates
    TissueA, Chem1-, Chem2- x 3 replicates [aka control]

    The questions I want to test are:
    • Which OTUs are different from Tissue A vs B, B vs C, and A vs C?
    • Effects of Chem1 on each Tissue
    • Effects of Chem2 on each Tissue
    • The effects of Chem1 and Chem2 on each tissue


    Is it worth taking the chem1 and chem2 treatments and merge them as one condition? i.e. condition1= chem1+chem2+, condition2=chem1-chem2+ etc for simplicity and setup my model as ~tissue + condition?
    Last edited by gamb; 10-25-2014, 12:04 PM.

  • #2
    hi gamb,

    Firstly, as you are just starting to analyze, best to update to Bioconductor 3.0 and DESeq2 version 1.6 which was just released.

    As the analysis is a bit complicated, I'd recommend using DESeq(dds, betaPrior=FALSE) as this simplifies the building of contrasts with interactions.

    The design should be

    Code:
    ~ tissue*chem1*chem2
    ...because you're interested in both the interactions of chem1 and chem2, and in the specific effects for each tissue.

    The list of effects is given by

    Code:
    resultsNames(dds)
    You can build up the contrasts using a list like below. I'll give some examples and see if you can extend to the others. Comparing the tissues alone is simple by specifying the name of the factor and the two levels to compare, e.g.:

    Code:
    results(dds, contrast=c("tissue","C","B"))
    Then some basics about interaction terms: the effect of chem 1 in tissue B will be:

    Code:
    results(dds, contrast=list(c("chem1","chem1.tissueB")))
    ...because you need to add the main chem 1 effect (the effect for tissue A, the base level of tissue) and the extra chem 1 effect for tissue B.

    If you wanted to test if chem 1 has an effect in tissue B different than tissue A you would only test the interaction term "chem1.tissueB", because this is that difference.

    Tissue A is the base level, so the effect of chem 1 in tissue A is simply

    Code:
    results(dds, name="chem1")
    The effect of chem 1 and chem 2 in tissue B is a bit lengthy, you need to add main effects and all relevant interaction terms:

    Code:
    results(dds, contrast=list(c("chem1","chem2","chem1.tissueB","chem2.tissueB","chem1.chem2","chem1.chem2.tissueB")))
    It's simpler for tissue A because this is the base level:

    Code:
    results(dds, contrast=list(c("chem1","chem2","chem1.chem2")))
    And if you want to test only the interaction, answering if the effect of the two chems is different than each separately, you remove some of these effects. For tissue B this contrast would be:

    Code:
    results(dds, contrast=list(c("chem1.chem2","chem1.chem2.tissueB")))
    For tissue A it is again simpler:

    Code:
    results(dds, name="chem1.chem2")
    Note that another way to compare individual groups would be to create a new variable which describes all groups:

    Code:
    dds$group = factor(paste(dds$chem1, dds$chem2, dds$tissue))
    design(dds) = ~ group
    Then use the following to compare to specific groups:

    Code:
    results(dds, contrast=c("group","...","..."))
    Last edited by Michael Love; 10-26-2014, 09:20 AM.

    Comment


    • #3
      Thanks Michael-

      I updated everything and now I have problems!

      Following your code,

      Code:
      results(dds, contrast=c("tissue","C","B"))
      Works for me but this doesn't:

      Code:
      results(dds, contrast=list("chem1","chem1.tissueB"))
      I get the following:

      Code:
      results(dds, contrast=list("Chem1", "Chem1.tissueB"))
      Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues) : 
        all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'
      My code:

      Code:
      dds=phyloseq_to_deseq2(data, ~tissue*Chem1*Chem2)
      dds=DESeq(dds, test = "Wald", fitType = "local", betaPrior=FALSE)
      resultsNames(dds)
      
       [1] "Intercept"                       "tissue_tissueC_vs_tissueA"      
       [3] "tissue_tissueB_vs_tissueA"       "Chem1_Yes_vs_No"                
       [5] "Chem2_Yes_vs_No"                 "tissuetissueC.Chem1Yes"         
       [7] "tissuetissueB.Chem1Yes"          "tissuetissueC.Chem2Yes"         
       [9] "tissuetissueB.Chem2Yes"          "Chem1Yes.Chem2Yes"              
      [11] "tissuetissueC.Chem1Yes.Chem2Yes" "tissuetissueB.Chem1Yes.Chem2Yes"

      Also, the tissues are paired by the source animal (i.e. Chem1+Chem2+ has tissue samples A,B, and C, from animals 1,2 and 3). How do I account for this?

      Comment


      • #4
        Can you post the output of:

        sessionInfo()

        Comment


        • #5
          Code:
          > sessionInfo()
          R version 3.1.1 (2014-07-10)
          Platform: x86_64-apple-darwin13.1.0 (64-bit)
          
          locale:
          [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
          
          attached base packages:
           [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
          [10] base     
          
          other attached packages:
           [1] fossil_0.3.7              shapefiles_0.7            foreign_0.8-61           
           [4] maps_2.3-9                sp_1.0-15                 phyloseq_1.10.0          
           [7] gridExtra_0.9.1           plyr_1.8.1                reshape2_1.4             
          [10] ggdendro_0.1-15           gdata_2.13.3              plotrix_3.5-7            
          [13] ellipse_0.3-8             ggplot2_1.0.0             vegan_2.0-10             
          [16] lattice_0.20-29           permute_0.8-3             RColorBrewer_1.0-5       
          [19] biom_0.3.12               ape_3.1-4                 DESeq2_1.6.1             
          [22] RcppArmadillo_0.4.450.1.0 Rcpp_0.11.3               GenomicRanges_1.18.1     
          [25] GenomeInfoDb_1.2.0        IRanges_2.0.0             S4Vectors_0.4.0          
          [28] BiocGenerics_0.12.0      
          
          loaded via a namespace (and not attached):
           [1] acepack_1.3-3.3      ade4_1.6-2           annotate_1.44.0      AnnotationDbi_1.28.0
           [5] base64enc_0.1-2      BatchJobs_1.4        BBmisc_1.7           Biobase_2.26.0      
           [9] BiocParallel_1.0.0   Biostrings_2.34.0    brew_1.0-6           checkmate_1.5.0     
          [13] chron_2.3-45         cluster_1.15.3       codetools_0.2-9      colorspace_1.2-4    
          [17] data.table_1.9.4     DBI_0.3.1            digest_0.6.4         fail_1.2            
          [21] foreach_1.4.2        Formula_1.1-2        genefilter_1.48.1    geneplotter_1.44.0  
          [25] gtable_0.1.2         gtools_3.4.1         Hmisc_3.14-5         igraph_0.7.1        
          [29] iterators_1.0.7      latticeExtra_0.6-26  locfit_1.5-9.1       MASS_7.3-35         
          [33] Matrix_1.1-4         multtest_2.22.0      munsell_0.4.2        nlme_3.1-118        
          [37] nnet_7.3-8           proto_0.3-10         RJSONIO_1.3-0        rpart_4.1-8         
          [41] RSQLite_1.0.0        scales_0.2.4         sendmailR_1.2-1      splines_3.1.1       
          [45] stringr_0.6.2        survival_2.37-7      tools_3.1.1          XML_3.98-1.1        
          [49] xtable_1.7-4         XVector_0.6.0        zlibbioc_1.12.0
          Edit: Closed out of my session and re-ran my code in a new session. Now getting this readout:

          Code:
          > dds=phyloseq_to_deseq2(data, ~tissue*Chem1*Chem2)
          converting counts to integer mode
          > dds=DESeq(dds, test = "Wald", fitType = "local", betaPrior=FALSE)
          estimating size factors
          Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc,  : 
            every gene contains at least one zero, cannot compute log geometric means
          Last edited by gamb; 10-26-2014, 08:52 AM.

          Comment


          • #6
            hi,

            sorry, I had a typo in my previous posts: I forgot to wrap the names in c(). I've corrected this above.

            To adapt to your level names, where I wrote "chem1", substitute "Chem1_Yes_vs_No", where I wrote "tissueC.chem1" substitute "tissuetissueC.Chem1Yes", etc.

            To add a fixed effect which controls for animal, you should add this variable to the design:

            Code:
            ~ animal + tissue*chem1*chem2
            The "every gene contains at least one zero" can be solved by using an alternate size factor estimation. I just answered this question on the Bioconductor support site this week. If you search there you should be able to find a solution.

            Comment


            • #7
              Thank you! It seems like everything is now working and makes sense!

              Comment


              • #8
                Spoke too soon. It appears that either "animal" variable is forcing this error or the tissue*chem1*chem2


                Code:
                dds=phyloseq_to_deseq2(data, ~ animal + tissue*Chem1*Chem2)
                converting counts to integer mode
                [COLOR="Red"]Error in DESeqDataSet(se, design = design, ignoreRank) : 
                  the model matrix is not full rank, so the model cannot be fit as specified.
                  one or more variables or interaction terms in the design formula
                  are linear combinations of the others and must be removed[/COLOR]
                Last edited by gamb; 10-26-2014, 11:45 AM.

                Comment


                • #9
                  Can you show as.data.frame(colData(dds))?

                  When I build a data frame as you described the samples, I get a full rank matrix with this design:

                  Code:
                  > d = data.frame(t=factor(rep(1:3,each=12)),
                    c1=factor(rep(rep(1:2,each=6),3)),
                    c2=factor(rep(rep(1:2,each=3),6)),
                    a=factor(rep(1:3,12)))
                  > m = model.matrix(~ a + t*c1*c2, data=d)
                  > qr(m)$rank
                  [1] 14
                  > ncol(m)
                  [1] 14

                  Comment


                  • #10
                    Data is structured like this:

                    Code:
                       Animal Chem1 Chem2  Tissue
                    1       F   Yes    No tissueA
                    2       F   Yes    No tissueB
                    3       F   Yes    No tissueC
                    4       G    No   Yes tissueA
                    5       G    No   Yes tissueB
                    6       G    No   Yes tissueC
                    7       H    No   Yes tissueA
                    8       H    No   Yes tissueB
                    9       H    No   Yes tissueC
                    10      I    No   Yes tissueA
                    11      I    No   Yes tissueB
                    12      I    No   Yes tissueC
                    13      J   Yes   Yes tissueA
                    14      J   Yes   Yes tissueB
                    15      J   Yes   Yes tissueC
                    16      K   Yes   Yes tissueA
                    17      K   Yes   Yes tissueB
                    18      K   Yes   Yes tissueC
                    19      L   Yes   Yes tissueA
                    20      L   Yes   Yes tissueB
                    21      L   Yes   Yes tissueC
                    22      A    No    No tissueA
                    23      A    No    No tissueB
                    24      A    No    No tissueC
                    25      B    No    No tissueA
                    26      B    No    No tissueB
                    27      B    No    No tissueC
                    28      C    No    No tissueA
                    29      C    No    No tissueB
                    30      C    No    No tissueC
                    31      D   Yes    No tissueA
                    32      D   Yes    No tissueB
                    33      D   Yes    No tissueC
                    34      E   Yes    No tissueA
                    35      E   Yes    No tissueB
                    36      E   Yes    No tissueC

                    Comment


                    • #11
                      aha, I thought you meant that it was only three animals total. You can't just add a term for animal because it forms linear combinations with chem1*chem2...
                      Last edited by Michael Love; 10-26-2014, 04:40 PM. Reason: misread the previous dataframe

                      Comment


                      • #12
                        I'll add more tomorrow, you can control for animal, but you have to reformulate the terms.

                        Comment


                        • #13
                          There is a way to control for the animal effect in this experiment, by recoding the animals within blocks of chem1*chem2 and adding an interaction term. It gets a bit trickier to form contrasts, but I will show how to do this.

                          It helps to clean up the colData a bit, I'll show on a clean version, and you can reorder your samples correspondingly. Note that you should reorder the columns of the count matrix so that they match. If you have a SummarizedExperiment, the colData and matrix are tied together, so simply reordering the columns of the SummarizedExperiment does both for you.

                          I made a sample data.frame like your data:

                          Code:
                          d <- data.frame(anim=factor(rep(1:12,each=3)),
                            chem1=factor(rep(rep(c("N","Y"),each=6),3)),
                            chem2=factor(rep(rep(c("N","Y"),each=3),6)),
                            tiss=factor(rep(1:3,12)))
                          this looks like

                          Code:
                              anim chem1 chem2 tiss
                           1     1     N     N    1
                           2     1     N     N    2
                           3     1     N     N    3
                           4     2     N     Y    1
                           5     2     N     Y    2
                           6     2     N     Y    3
                           7     3     Y     N    1
                           8     3     Y     N    2
                           9     3     Y     N    3
                           10    4     Y     Y    1
                           11    4     Y     Y    2
                           12    4     Y     Y    3
                           13    5     N     N    1
                           14    5     N     N    2
                           15    5     N     N    3
                           16    6     N     Y    1
                           17    6     N     Y    2
                           18    6     N     Y    3
                           19    7     Y     N    1
                           20    7     Y     N    2
                           21    7     Y     N    3
                           22    8     Y     Y    1
                           23    8     Y     Y    2
                           24    8     Y     Y    3
                           25    9     N     N    1
                           26    9     N     N    2
                           27    9     N     N    3
                           28   10     N     Y    1
                           29   10     N     Y    2
                           30   10     N     Y    3
                           31   11     Y     N    1
                           32   11     Y     N    2
                           33   11     Y     N    3
                           34   12     Y     Y    1
                           35   12     Y     Y    2
                           36   12     Y     Y    3
                          Now reorder by chem1, chem2 and tiss. Then add a new variable which will be used to control animal specific effects within each block of chem1*chem2.

                          Code:
                          idx <- with(d, order(chem1, chem2, tiss))
                          d <- d[idx,]
                          d$anim.nested <- factor(rep(1:3,12))
                          You can examine d to see how this works.

                          The model matrix is then the chem1 and chem2 terms and interaction, as well as the interactions of these with animal, and the interaction of these with tissue:

                          Code:
                          m <- model.matrix(~ chem1 + chem2 + chem1:chem2 + chem1:chem2:anim.nested + chem1:chem2:tiss, data=d)
                          The column names of this are:

                          Code:
                          > colnames(m)
                            [1] "(Intercept)"                "chem1Y"
                            [3] "chem2Y"                     "chem1Y:chem2Y"
                            [5] "chem1N:chem2N:anim.nested2" "chem1Y:chem2N:anim.nested2"
                            [7] "chem1N:chem2Y:anim.nested2" "chem1Y:chem2Y:anim.nested2"
                            [9] "chem1N:chem2N:anim.nested3" "chem1Y:chem2N:anim.nested3"
                           [11] "chem1N:chem2Y:anim.nested3" "chem1Y:chem2Y:anim.nested3"
                           [13] "chem1N:chem2N:tiss2"        "chem1Y:chem2N:tiss2"
                           [15] "chem1N:chem2Y:tiss2"        "chem1Y:chem2Y:tiss2"
                           [17] "chem1N:chem2N:tiss3"        "chem1Y:chem2N:tiss3"
                           [19] "chem1N:chem2Y:tiss3"        "chem1Y:chem2Y:tiss3"

                          Many of the contrasts become hard to form, though possible with numeric contrasts. A trick for getting the numeric contrasts of the other comparisons, is to take the column means of the model matrix, for the groups of samples you want to compare. For instance, if you want to find the main chem1 effect in tissue 2, the contrast would be:

                          Code:
                          ctrst = colMeans(m[d$tiss == "2" & d$chem1 == "Y" & d$chem2 == "N",]) -
                                  colMeans(m[d$tiss == "2" & d$chem1 == "N" & d$chem2 == "N",])
                          This ends up being the chem1 base effect, the chem1Y effect in tissue 2 minus the chem1N effect in tissue 2 and a few 1/3 and -1/3 to the animal interaction effects.

                          Edit: note that interaction terms can be tested using the same logic, but you need to subtract the groups with only one chem and then add the base group. So it looks like: chem1Y chem2Y - chem1N chem2Y - chem1Y chem2N + chem1N chem2N.

                          Also, I had to update that custom line of size factor code for when all rows have a zero. If you were using that line of code to avoid the issue that all rows of your count matrix have one or more zeros, you should get the edited version:



                          We're implementing a solution here which was on the list of todos for a while now, but it won't be out for 6 months. Typical RNA-Seq doesn't have this issue, but since phyloseq we are seeing more metagenomics datasets which often have no genes without a zero.
                          Last edited by Michael Love; 10-27-2014, 02:24 PM. Reason: added info on interaction

                          Comment


                          • #14
                            Ok! So, just checking in to make sure this is right... I reformatted the data as described named "d", "bbb" is a phyloseq data object, and "m" is the matrix model.

                            Code:
                            counts=as.data.frame(otu_table(bbb))
                            dds=DESeqDataSetFromMatrix(countData=counts, colData=d, design=~ chem1 + chem2 + chem1:chem2 + chem1:chem2:anim.nested + chem1:chem2:tiss)
                            
                            cts = counts(dds)
                            geoMeans = apply(cts, 1, function(row) mean(log(row[row != 0])))
                            dds = estimateSizeFactors(dds, geoMeans=geoMeans)
                            bdds=DESeq(dds, test = "Wald", fitType = "local", betaPrior=FALSE)
                            resultsNames(bdds)
                            And for contrast example.....
                            Code:
                            ctrst = colMeans(m[d$Type == “2” & d$chem1 == "Y" & d$chem2 == "N",]) -
                                    colMeans(m[d$Type == “2” & d$chem1 == "N" & d$chem2 == "N",])
                                    
                            
                            res=results(bdds, contrast=ctrst)

                            Does this seem right? Also, I've noticed I get quite a few segfaults within R when running the deseq2/phyloseq packages... is this just unique to my setup?

                            Comment


                            • #15
                              The geoMeans code you have is the old one, where I forgot to exponentiate the mean of log counts. See the link in my previous reply.

                              Otherwise seems right. Be very careful that the columns of your count matrix line up with colData.

                              I've never seen or heard of a segfault from our code. Do you have a normal release of R or did you build yourself?

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              11 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              10 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              51 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              68 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X