  • How do I analyze double nested design? (DESeq2)

    Hi everybody,

    I am currently analysing an RNA-seq experiment I ran on recombinant inbred lines.

    So far I have used DESeq but because my design is somewhat complicated and I couldn't do the proper analysis with it, I am now using DESeq2.

    But I am still unsure which statistical approach is legitimate and appropriate for my data, and I would be immensely grateful for some help and insight from you smart people! :-)

    So, this is what I have done so far:

    - I have a count matrix with raw read counts for my mapped genes
    - I set up the design data.frame shown below

    > ExpDesign
            genotype   flowering sampling
    Bur_1_1 parent     late      tp1
    Bur_1_2 parent     late      tp1
    Bur_1_3 parent     late      tp1
    Bur_2_1 parent     late      tp2
    Bur_2_2 parent     late      tp2
    Bur_2_3 parent     late      tp2
    Col_1_1 parent     early     tp1
    Col_1_2 parent     early     tp1
    Col_1_3 parent     early     tp1
    Col_2_1 parent     early     tp2
    Col_2_2 parent     early     tp2
    Col_2_3 parent     early     tp2
    pool01  pool_lines early     tp1
    pool02  pool_lines early     tp1
    pool03  pool_lines early     tp1
    pool04  pool_lines late      tp1
    pool05  pool_lines late      tp1
    pool06  pool_lines late      tp1
    pool07  pool_lines early     tp2
    pool08  pool_lines early     tp2
    pool09  pool_lines early     tp2
    pool10  pool_lines late      tp2
    pool11  pool_lines late      tp2
    pool12  pool_lines late      tp2

    - se_input <- DESeqDataSetFromMatrix(countData = se, colData = ExpDesign, design = ~ genotype + flowering + sampling)
    se_input
    class: DESeqDataSet
    dim: 21646 24
    exptData(0):
    assays(1): counts
    rownames(21646): AT1G01010 AT1G01020 ... ATMG01380 ATMG01390
    rowData metadata column names(0):
    colnames(24): Bur_1_1 Bur_1_2 ... pool11 pool12
    colData names(3): genotype flowering sampling

    se_input_DESeq <- DESeq(se_input)
    se_res <- results(se_input_DESeq)
    head(se_res)
    DataFrame with 6 rows and 6 columns
              baseMean   log2FoldChange lfcSE      stat       pvalue       padj
              <numeric>  <numeric>      <numeric>  <numeric>  <numeric>    <numeric>
    AT1G01010 64.594945  -0.30440446    0.23903587 -1.2734677 0.2028521166 0.339217533
    AT1G01020 140.260321 -0.05978862    0.17145139 -0.3487205 0.7272991151 0.820126969
    AT1G01030 33.542471  -0.73783880    0.21824157 -3.3808352 0.0007226586 0.004030036
    AT1G01040 514.335016 -0.49919387    0.14493772 -3.4441958 0.0005727608 0.003338910
    AT1G01046 6.191449   -0.49877559    0.26084673 -1.9121405 0.0558581775 0.129957010
    AT1G01050 689.292548 -0.13212720    0.06934961 -1.9052335 0.0567497202 0.131422678


    So, now I am wondering how to account for the interactions of my three conditions in the analysis?

    And what does the output table se_res actually tell me? Does the p-value signify a significant difference across all three conditions?
    Last edited by Sciurus; 04-07-2014, 01:35 AM.

  • #2
    If you don't specify which coefficient you want results for, results() returns the one for the last variable in the design (sampling in your case). So the adjusted p-values (padj; ignore the unadjusted ones) indicate changes due to the sampling time point.
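To make the default concrete, here is a rough sketch rather than a definitive recipe. It assumes DESeq2 is installed and is skipped otherwise. The counts are simulated with makeExampleDESeqDataSet() purely for illustration, and the sample names and object names (col_data, sim_counts, dds) are placeholders, not the objects from the first post. Only the design layout (2 genotypes x 2 flowering classes x 2 time points, 3 replicates each) follows the thread.

```r
coef_names <- NULL  # stays NULL when DESeq2 is not available

if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq2))

  # Same layout as ExpDesign: 24 samples, 3 replicates per combination.
  col_data <- expand.grid(
    replicate = 1:3,
    sampling  = c("tp1", "tp2"),
    flowering = c("early", "late"),
    genotype  = c("parent", "pool_lines"),
    stringsAsFactors = TRUE
  )
  rownames(col_data) <- sprintf("sample%02d", seq_len(nrow(col_data)))

  # Simulated counts only; real data would come from the mapped reads.
  set.seed(1)
  sim_counts <- counts(makeExampleDESeqDataSet(n = 500, m = nrow(col_data)))
  colnames(sim_counts) <- rownames(col_data)

  dds <- DESeqDataSetFromMatrix(countData = sim_counts,
                                colData   = col_data,
                                design    = ~ genotype + flowering + sampling)
  dds <- DESeq(dds, quiet = TRUE)

  # List the fitted coefficients; the last one is what results() uses by default.
  coef_names <- resultsNames(dds)
  print(coef_names)

  res_default   <- results(dds)  # sampling: tp2 vs tp1
  res_flowering <- results(dds, contrast = c("flowering", "late", "early"))
  res_genotype  <- results(dds, name = "genotype_pool_lines_vs_parent")

  # The description column states which comparison a results table holds.
  print(mcols(res_default)$description[2])
}
```

Checking resultsNames() first and then asking for a factor explicitly via contrast or name avoids relying on the order of terms in the formula.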

    You account for interactions in R with formulas like:

    Code:
    design=~genotype*flowering+sampling
    which is identical to:

    Code:
    design=~genotype+flowering+genotype:flowering+sampling
    Here "genotype:flowering" denotes the interaction term. You could also include a 3-way interaction if you want. I recommend thinking about which interactions make biological sense, since often they won't all.
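The equivalence of the two formulas can be checked in base R without DESeq2. This is a small sketch that rebuilds the design layout from the first post with expand.grid(); the object names (exp_design, mm_star, mm_expanded) are made up for the illustration.

```r
# 2 genotypes x 2 flowering classes x 2 time points, 3 replicates each = 24 samples.
exp_design <- expand.grid(
  replicate = 1:3,
  sampling  = c("tp1", "tp2"),
  flowering = c("early", "late"),
  genotype  = c("parent", "pool_lines"),
  stringsAsFactors = TRUE
)

# Shorthand with '*' versus the spelled-out main effects plus interaction.
mm_star     <- model.matrix(~ genotype * flowering + sampling, data = exp_design)
mm_expanded <- model.matrix(~ genotype + flowering + genotype:flowering + sampling,
                            data = exp_design)

# Both formulas produce the same columns with the same values.
same_columns <- identical(colnames(mm_star), colnames(mm_expanded))
same_values  <- all(mm_star == mm_expanded)
print(c(same_columns = same_columns, same_values = same_values))

# One column per coefficient the model will estimate.
print(colnames(mm_star))
```

R places interaction terms after all main effects, so in this design the last column is the genotype:flowering interaction rather than sampling. That is one more reason to request the coefficient of interest by name instead of relying on a default.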



    • #3
      Originally posted by dpryan
      If you don't specify which coefficient you want results for, results() returns the one for the last variable in the design (sampling in your case). So the adjusted p-values (padj; ignore the unadjusted ones) indicate changes due to the sampling time point.

      You account for interactions in R with formulas like:

      Code:
      design=~genotype*flowering+sampling
      which is identical to:

      Code:
      design=~genotype+flowering+genotype:flowering+sampling
      Here "genotype:flowering" denotes the interaction term. You could also include a 3-way interaction if you want. I recommend thinking about which interactions make biological sense, since often they won't all.
      Thank you!

      Yes, that is what I have been wondering: What makes sense and is statistically legitimate to do.
      What would a 3-way interaction mean in terms of statistical power and the conclusions I can draw from it?



      • #4
        At least in terms of power, you'd be decreasing your residual degrees of freedom, which you really only want to do if a given interaction makes biological sense. I don't know enough about your experiment to say whether a 3-way genotype*flowering*sampling model really makes sense; unfortunately, that's not something someone uninvolved in your project can easily guide you on.
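The cost in degrees of freedom can be counted directly in base R. This sketch assumes the balanced 24-sample layout from the first post; design_df and the formula labels are illustrative names only.

```r
# Balanced layout from the thread: 2 x 2 x 2 combinations, 3 replicates each.
design_df <- expand.grid(
  replicate = 1:3,
  sampling  = c("tp1", "tp2"),
  flowering = c("early", "late"),
  genotype  = c("parent", "pool_lines"),
  stringsAsFactors = TRUE
)

formulas <- list(
  additive  = ~ genotype + flowering + sampling,
  two_way   = ~ genotype * flowering + sampling,
  three_way = ~ genotype * flowering * sampling
)

# Number of coefficients = number of model matrix columns (intercept included).
n_coef <- sapply(formulas, function(f) ncol(model.matrix(f, data = design_df)))

# Residual degrees of freedom = samples minus estimated coefficients.
resid_df <- nrow(design_df) - n_coef

print(data.frame(coefficients = n_coef, residual_df = resid_df))
```

With 24 samples, the additive model leaves 20 residual degrees of freedom, the single 2-way interaction leaves 19, and the full 3-way model leaves 16. Whether that loss is worth it depends on whether the extra terms are biologically meaningful, as noted above.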



        • #5
          Originally posted by dpryan
          At least in terms of power, you'd be decreasing your residual degrees of freedom, which you really only want to do if a given interaction makes biological sense. I don't know enough about your experiment to say whether a 3-way genotype*flowering*sampling model really makes sense; unfortunately, that's not something someone uninvolved in your project can easily guide you on.
          Okay, thank you! I ran both options and will discuss with my supervisor when I get a chance.



          • #6
            So, I am still struggling with this nested design. If you would be so kind as to have a look here, I'd very much appreciate any help I can get! :-)

