  • DESeq2 question: complex multifactor experiment

    Hello,

    I have a complex experimental setup.

    What I've done:

    countTable <- read.table("HTSeq_Table.txt", header = TRUE, row.names = 1)

    And I have a design like this, where the sample names are the row names.

    > design
                      geno age libType
    S1534             Ezh1 P14  single
    S1536             Ezh1 P14  single
    S8633             Ezh1 P14  single
    S1532            Ezh12 P14  single
    S8631            Ezh12 P14  single
    S1141         Ezh12del P14  single
    S1142         Ezh12del P14  single
    S1541               Wt P14  single
    S1547               Wt P14  single
    S8Wrep1             Wt  W8  paired
    S8Wrep2             Wt  W8  paired
    SE18rep1            Wt E18  paired
    SE18rep2            Wt E18  paired
    P0.expt1.bio1       Wt  P0  single
    P0.expt1.bio2       Wt  P0  single
    P0.expt2.bio1       Wt  P0  single
    P0.expt2.bio2       Wt  P0  single

    So, I have multiple timepoints and a mix of single- and paired-end sequencing, but only one timepoint has multiple genotypes.

    To make a DESeqDataSet, I ran

    > cds=DESeqDataSetFromMatrix(countData=countTable,colData=designnogroup, design=~geno+age+libType)
    Error in DESeqDataSet(se, design = design, ignoreRank) :
    the model matrix is not full rank, so the model cannot be fit as specified.
    one or more variables or interaction terms in the design formula
    are linear combinations of the others and must be removed

    1. I'm not sure how to deal with a timecourse experiment where only one timepoint has multiple genotypes.
    2. Do I need to include single/paired-end (libType) in the design?

    Thank you so much!

    Addendum: I found the answer, in the manual of course. See section 3.12.1, "Linear combinations".
    Last edited by jmgrindheim; 02-01-2016, 05:35 PM. Reason: Found Answer

  • #2
    Without playing around with the matrix, it looks like "libType" is confounded with the "W8" and "E18" ages, since those are the only paired-end samples. You're going to have to either realign the PE data as SE (just ignore read 2) or accept that the W8 and E18 estimates might be confounded by a batch effect (they likely will be regardless, though realigning will help minimize that). Either way, use ~geno+age instead of ~geno+age+libType.
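
The confounding described in this reply can be checked in base R without loading DESeq2. This is a minimal sketch, assuming the sample table from the first post is rebuilt by hand; the names `design_df` and `is_full_rank` are made up for the example and are not part of any package.

```r
# Rebuild the sample table from the first post (row names left out for brevity).
# design_df is a hypothetical name used only in this sketch.
design_df <- data.frame(
  geno    = c(rep("Ezh1", 3), rep("Ezh12", 2), rep("Ezh12del", 2), rep("Wt", 10)),
  age     = c(rep("P14", 9), rep("W8", 2), rep("E18", 2), rep("P0", 4)),
  libType = c(rep("single", 9), rep("paired", 4), rep("single", 4)),
  stringsAsFactors = TRUE
)

# libType is completely determined by age: only W8 and E18 are paired-end.
print(table(design_df$age, design_df$libType))

# A design can only be fit if its model matrix has full column rank.
# is_full_rank is a hypothetical helper, not a DESeq2 function.
is_full_rank <- function(formula, data) {
  mm <- model.matrix(formula, data)
  qr(mm)$rank == ncol(mm)
}

full    <- is_full_rank(~ geno + age + libType, design_df)  # design from the first post
reduced <- is_full_rank(~ geno + age, design_df)            # design suggested in this reply

print(c(with_libType = full, without_libType = reduced))
```

The cross-table shows that every paired-end sample is W8 or E18, so the libType column adds no information beyond age. That is why the first formula is rank deficient and the second is not.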

    • #3
      Thanks Devon, that was really helpful.

      Also, do you know what the effect will be if I put ~age+geno vs. ~geno+age? It's kinda hard to decide which is more important. I want to see whether the genotype causes gene expression to look like a different timepoint, so besides differential expression, I really want normalized count values to create a heatmap, and I don't see an easy way to get them in DESeq2. With the original DESeq, I would take the baseMeanA or baseMeanB values from a differential expression test, but I haven't yet found an equivalent in DESeq2.

      • #4
        The order only affects plotting and what results() outputs by default (it'll default to whichever variable you specify last); the actual statistics will be the same. For normalized counts, just use counts(dds, normalized=TRUE). That's vastly more meaningful than creating heatmaps with the group means.
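
The point about term order can be illustrated with base R model matrices. This is a rough sketch under the same assumption as before: the sample table is rebuilt by hand as a hypothetical `design_df`, and nothing here calls DESeq2 itself.

```r
# Same hand-built sample table as in the first post (design_df is a made-up name).
design_df <- data.frame(
  geno = c(rep("Ezh1", 3), rep("Ezh12", 2), rep("Ezh12del", 2), rep("Wt", 10)),
  age  = c(rep("P14", 9), rep("W8", 2), rep("E18", 2), rep("P0", 4)),
  stringsAsFactors = TRUE
)

mm_geno_first <- model.matrix(~ geno + age, design_df)
mm_age_first  <- model.matrix(~ age + geno, design_df)

# Both formulas produce the same set of coefficient columns...
same_columns <- setequal(colnames(mm_geno_first), colnames(mm_age_first))

# ...with identical values once the columns are put in the same order,
# so the fitted model is the same.
same_values <- all(mm_geno_first == mm_age_first[, colnames(mm_geno_first)])

# Only the position of the terms differs, which is what changes the
# comparison reported by default.
last_geno_first <- tail(colnames(mm_geno_first), 1)  # an age coefficient
last_age_first  <- tail(colnames(mm_age_first), 1)   # a geno coefficient

print(c(same_columns = same_columns, same_values = same_values))
print(c(geno_first = last_geno_first, age_first = last_age_first))
```

To avoid relying on the default, a specific comparison can be requested explicitly, for example `results(dds, contrast = c("geno", "Ezh1", "Wt"))`, where the level names are taken from the table in the first post.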

        • #5
          So you think that plotting count values for individual biological replicates is better than plotting them for the replicates grouped?

          • #6
            Yes, you don't lose the variance that way. If you have a crazy number of groups/samples then that might not be feasible, of course (though then the heatmap likely won't tell you much anyway).
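
What is lost by averaging can be seen on a toy example. The sketch below uses made-up counts and a base R version of median-of-ratios scaling, the approach behind DESeq2 size factors; with a real DESeqDataSet, counts(dds, normalized=TRUE) from the earlier reply would replace the hand-rolled normalization. All object names are hypothetical.

```r
# Toy counts: 4 genes x 4 samples, two groups with two replicates each (made-up numbers).
counts_toy <- matrix(
  c( 10,  20,  12,  40,
    100, 210,  95, 190,
     50,  95, 400, 820,
     30,  62,  28,  60),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("A_rep1", "A_rep2", "B_rep1", "B_rep2"))
)
group <- factor(c("A", "A", "B", "B"))

# Median-of-ratios size factors: compare each sample to the per-gene geometric mean.
log_geo_means <- rowMeans(log(counts_toy))
size_factors <- apply(counts_toy, 2, function(k) {
  keep <- is.finite(log_geo_means) & k > 0
  exp(median((log(k) - log_geo_means)[keep]))
})

# Normalized counts: one column per biological replicate.
norm_counts <- sweep(counts_toy, 2, size_factors, "/")

# Collapsing to group means keeps one column per group and drops the
# replicate-to-replicate spread, which is reported separately here.
group_means <- t(apply(norm_counts, 1, function(x) tapply(x, group, mean)))
group_sds   <- t(apply(norm_counts, 1, function(x) tapply(x, group, sd)))

print(round(norm_counts, 1))
print(round(group_means, 1))
print(round(group_sds, 1))
```

A heatmap drawn from `norm_counts` shows whether replicates within a group agree, while one drawn from `group_means` cannot show that at all.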
