Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • bioconductor design question

    We recently commissioned some RNASeq work and I am getting my first taste of using bioconductor. This is probably a bit of a silly question for all you experts, but I just cannot seem to figure it out: Our experiment involved two cell lines that were each treated with four compounds. Each treatment was done in triplicate and there are untreated controls in triplicate as well. We are interested in comparing differential expression between the treatments vs. control in each cell line as well as comparing the same treatment between the two cell lines.

    My coldata table looks something like this:
    HTML Code:
              cell       compound    rep
    c1c1_1    c1         c1          1
    c1c1_2    c1         c1          2
    c1c1_3    c1         c1          3
    c1c2_1    c1         c2          1
    ...
    c2c3_3    c2         c3          3
    c2c4_1    c2         c4          1
    c2c4_2    c2         c4          2
    c2c4_3    c2         c4          3
    c2cc_1    c2         cc          1
    c2cc_2    c2         cc          2
    c2cc_3    c2         cc          3
    Now to my questions:
    1. Does it make sense to work with both cell lines in the same DESeqDataSet object?
    2. If yes, would '~cell + cell:compound' be a reasonable design?
    3. When comparing results between cell lines, I assume I need to compare fold-changes. Are there methods in bioconductor that can compare two DESeqResults objects?


    Any feedback, pointers, or advice greatly appreciated!
    Last edited by sunkid; 11-14-2014, 10:38 AM. Reason: typo

  • #2
    1. Yes
    2. ~cell:compound + compound or ~cell*compound would make more sense. You may also find it more convenient to just make 10 groups of cell:compound, fit ~group, and then use contrasts. The results will be the same, it's just a matter of what you find easier.
    3. You fit a model with both cell-types, so you can directly test for differences between them. Don't test by fold-change, that'd be inaccurate.

    Comment


    • #3
      Thanks for the reply. Much appreciated!

      Could you please elaborate a little on the third bullet? Is this what you mean, for example:
      HTML Code:
      > design(expmat) <- ~cell*compound
      > dds <- DESeq(expmat)
      estimating size factors
      estimating dispersions
      gene-wise dispersion estimates
      mean-dispersion relationship
      final dispersion estimates
      fitting model and testing
      > res <- results(dds, contrast=list("cellc1.compoundc3", "cellc2.compoundc3"))
      Isn't this simply comparing absolute expression levels of genes in the two cells as affected by compound c3 without taking base levels of expression into account?

      Comment


      • #4
        No, all of the possible changes have already been fit, you're just comparing the coefficients. So the baseline cell c2 vs c1 changes have already been accounted for in that comparison.

        Comment


        • #5
          Thanks again! I think I have some reading to do... any advice on a good starting point?

          Comment


          • #6
            Sorry to be a bit dense, but I am thoroughly confused, it seems

            Here is what I did:
            HTML Code:
            > design(expmat) <- ~cell*compound
            > dds <- DESeq(expmat)
            estimating size factors
            estimating dispersions
            gene-wise dispersion estimates
            mean-dispersion relationship
            final dispersion estimates
            fitting model and testing
            > res <- results(dds, contrast=list("cellc1.compoundc1", "cellc2.compoundc1"))
            > res <- res[!is.na(res$padj) & res$padj < 0.001, ]
            > o <- order(res$log2FoldChange)
            > res[o, ]
            log2 fold change (MAP): cellc1.compoundc1 vs cellc2.compoundc1 
            Wald test p-value: cellc1.compoundc1 vs cellc2.compoundc1 
            DataFrame with 254 rows and 6 columns
                    baseMean log2FoldChange      lfcSE       stat       pvalue         padj
                   <numeric>      <numeric>  <numeric>  <numeric>    <numeric>    <numeric>
            VASN    149.8294      -1.181500 0.22067214  -5.354097 8.598481e-08 1.103503e-05
            CCL20   110.8390      -1.173693 0.14111207  -8.317450 8.987607e-17 8.162821e-14
            GIMAP4 1535.1766      -1.043297 0.23876844  -4.369495 1.245343e-05 6.449020e-04
            PPAP2B  730.2799      -1.036536 0.06861464 -15.106631 1.464377e-51 1.728989e-47
            AMIGO2  169.6221      -1.016898 0.11045060  -9.206812 3.359549e-20 5.666600e-17
            ...          ...            ...        ...        ...          ...          ...
            SLIT2  130.18488       1.109605  0.1802282   6.156670 7.429022e-10 2.039871e-07
            LMO2   308.49523       1.206971  0.2256925   5.347857 8.900159e-08 1.129937e-05
            ABCB1   35.54997       1.261060  0.2257014   5.587293 2.306362e-08 3.583055e-06
            KYNU   574.17227       1.309038  0.1550106   8.444829 3.044912e-17 2.995940e-14
            RAB43   78.84993       1.346877  0.2333998   5.770686 7.894942e-09 1.412357e-06
            > plotCounts(dds, gene="VASN", intgroup = c("cell", "compound"))
            and I get


            What am I missing?

            Comment


            • #7
              Sorry to bump this, but I am still confused about the suggested design and what the comparison above actually does.

              FWIW, we are interested both in genes that are differentially expressed in each of the cell types when treated with our compounds (i.e. a comparison between the control and the treated samples) as well as those genes that are differentially expressed in BOTH cell lines treated with the same compound.

              Not to complicate things too much, but the compounds actually have overlapping enzyme inhibition profiles. C1 inhibits E1, C2 inhibits E1 and E2, and C3 inhibits just E2, for example. Hence, we are also interested in comparing the differential expression of genes (compound vs. control) between compounds.

              Comment


              • #8
                answered here: https://support.bioconductor.org/p/62973/

                Comment


                • #9
                  Originally posted by Michael Love View Post
                  Yes, and thank you very much again!

                  Comment


                  • #10
                    If in the original question, we have more than 2 cell lines, and we want to compare the "global" differences between conditions c1 and c2 (still taking into account the fact that we have multiple cell lines), would our design be something like what dpryan mentioned (cell + condition + cell:condition) and in our contrast use list(conditionc1, conditionc2)?

                    Thanks,
                    Praful

                    Comment


                    • #11
                      Yes that would work or equivalently contrast=c("condition","c1","c2") for the global c1 vs c2 comparison.

                      Comment


                      • #12
                        Thank you for the prompt reply.

                        Comment

                        Latest Articles

                        Collapse

                        • seqadmin
                          Essential Discoveries and Tools in Epitranscriptomics
                          by seqadmin




                          The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                          04-22-2024, 07:01 AM
                        • seqadmin
                          Current Approaches to Protein Sequencing
                          by seqadmin


                          Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                          04-04-2024, 04:25 PM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by seqadmin, Today, 08:47 AM
                        0 responses
                        12 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-11-2024, 12:08 PM
                        0 responses
                        60 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 10:19 PM
                        0 responses
                        59 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 09:21 AM
                        0 responses
                        54 views
                        0 likes
                        Last Post seqadmin  
                        Working...
                        X