  • DESeq2 column/row arrangement

    Hi,

    I am trying to automate some of my analysis by pulling condition and sample information from a master experiment design sheet and an HTSeq count table for the entire experiment. Because the design sheet comes from the investigator and the HTSeq counts are in run order using identifiers from the lab, the two tables might list the samples in different orders. Example:

    Code:
    slim_expt[,1:2]
           SampleID     Day
    P3-D1     P3-D1 control
    P3-D2     P3-D2 control
    P3-D3     P3-D3 control
    P3-D4     P3-D4 control
    P12-D1   P12-D1 treated
    P12-D2   P12-D2 treated
    P12-D3   P12-D3 treated
    P12-D4   P12-D4 treated
    Code:
    head(slim_HTS_table)
                       P12-D1 P12-D2 P12-D3 P12-D4 P3-D1 P3-D2 P3-D3 P3-D4
    ENSMUSG00000000001   1546   1322   1248   1156  2162  2211  2811  2223
    ENSMUSG00000000003      0      0      0      0     0     0     0     0
    ENSMUSG00000000028     14     23     11     17    42    30    53    55
    Note that the first sample in the experiment design sheet is not the first sample in the HTSeq count matrix.

    Now I do this:
    slim_dds<-DESeqDataSetFromMatrix(countData = slim_HTS_table, colData = slim_expt, design = ~Day)

    And voilà, the sample names are now in design-sheet order, but the column order of the count matrix was preserved, i.e. the samples are now misassigned.

    Code:
    head(counts(slim_dds))
                       P3-D1 P3-D2 P3-D3 P3-D4 P12-D1 P12-D2 P12-D3 P12-D4
    ENSMUSG00000000001  1546  1322  1248  1156   2162   2211   2811   2223
    ENSMUSG00000000003     0     0     0     0      0      0      0      0
    ENSMUSG00000000028    14    23    11    17     42     30     53     55
    Is there an easy flag to fix this or do I simply need to sort both data frames (by row or column as appropriate) prior to creating the DESeq data set?

  • #2
    Yes, you need to bring your count table into the right order beforehand.

    Do, for example:

    # Reorder the count columns by name so they follow the rows of the design sheet
    slim_dds <- DESeqDataSetFromMatrix(
      countData = slim_HTS_table[ , rownames(slim_expt) ],
      colData = slim_expt,
      design = ~Day)

    However, it is unfortunate that DESeqDataSetFromMatrix silently mixes up the column names. It takes the row names from the colData data frame and uses them as sample names, overwriting the column names of the count matrix. It should issue a warning, or even an error, in such a case. We'll change that.
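
The reordering step can be tried out in base R without DESeq2. The following is only a minimal sketch: the toy objects reuse the sample names and the first gene's counts from the question, `counts_ordered` is a name introduced here for illustration, and the checks are one possible way to guard the call, not something DESeq2 requires.

```r
# Toy stand-ins for the tables in the question (counts copied from the first gene).
samples <- c("P3-D1", "P3-D2", "P3-D3", "P3-D4",
             "P12-D1", "P12-D2", "P12-D3", "P12-D4")
slim_expt <- data.frame(
  SampleID = samples,
  Day = factor(rep(c("control", "treated"), each = 4)),
  row.names = samples
)
slim_HTS_table <- matrix(
  c(1546, 1322, 1248, 1156, 2162, 2211, 2811, 2223),
  nrow = 1,
  dimnames = list(
    "ENSMUSG00000000001",
    c("P12-D1", "P12-D2", "P12-D3", "P12-D4", "P3-D1", "P3-D2", "P3-D3", "P3-D4")
  )
)

# Every sample in the design sheet must have a count column, and vice versa.
stopifnot(setequal(colnames(slim_HTS_table), rownames(slim_expt)))

# Reorder the count columns by name so they follow the design sheet.
# drop = FALSE keeps the result a matrix even when only one gene is present.
counts_ordered <- slim_HTS_table[, rownames(slim_expt), drop = FALSE]

# Confirm the alignment before passing both objects to DESeqDataSetFromMatrix.
stopifnot(identical(colnames(counts_ordered), rownames(slim_expt)))
```

Indexing by name rather than sorting both tables independently means each count column travels with its own label, so the result does not depend on how either table happened to be ordered.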


    • #3
      Good idea, I will implement a check for this in the devel branch. I think it should be an error.


      • #4
        I've implemented a test for this in version 1.3.46. If the names are the same, but in the wrong order, it throws an error.

        > count <- matrix(1:20, ncol=4)
        > colnames(count) <- c("a","b","c","d")
        > cols <- data.frame(condition=factor(c(1,1,2,2)))
        > rownames(cols) <- c("1","2","3","4")
        > DESeqDataSetFromMatrix(count, cols, ~ condition)
        class: DESeqDataSet
        dim: 5 4
        exptData(0):
        assays(1): counts
        rownames: NULL
        rowData metadata column names(0):
        colnames(4): 1 2 3 4
        colData names(1): condition
        > rownames(cols) <- c("a","b","c","d")
        > DESeqDataSetFromMatrix(count, cols, ~ condition)
        class: DESeqDataSet
        dim: 5 4
        exptData(0):
        assays(1): counts
        rownames: NULL
        rowData metadata column names(0):
        colnames(4): a b c d
        colData names(1): condition
        > rownames(cols) <- c("b","c","d","a")
        > DESeqDataSetFromMatrix(count, cols, ~ condition)
        Error in DESeqDataSetFromMatrix(count, cols, ~condition) :
        rownames of the colData:
        b,c,d,a
        are not in the same order as the colnames of the countData:
        a,b,c,d
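
The behaviour shown in the session above (same names in a different order gives an error, unrelated names are accepted) can be imitated in a few lines of base R. This is a rough sketch of the idea only: `check_sample_order` is a hypothetical helper written for this illustration and is not DESeq2's actual implementation.

```r
# Hypothetical helper, not DESeq2 code: stop if the two name vectors
# contain the same samples but list them in a different order.
check_sample_order <- function(count_names, coldata_names) {
  same_set <- setequal(count_names, coldata_names)
  same_order <- identical(count_names, coldata_names)
  if (same_set && !same_order) {
    stop("rownames of the colData:\n  ",
         paste(coldata_names, collapse = ","),
         "\nare not in the same order as the colnames of the countData:\n  ",
         paste(count_names, collapse = ","))
  }
  invisible(TRUE)
}

count <- matrix(1:20, ncol = 4)
colnames(count) <- c("a", "b", "c", "d")

# Identical names in identical order: passes silently.
check_sample_order(colnames(count), c("a", "b", "c", "d"))

# Entirely different names: no error, matching the first case in the session.
check_sample_order(colnames(count), c("1", "2", "3", "4"))

# Same names, wrong order: the helper raises an error, captured here as text.
result <- tryCatch(
  check_sample_order(colnames(count), c("b", "c", "d", "a")),
  error = function(e) conditionMessage(e)
)
```

Note that a check of this kind only catches permuted names. When the names differ entirely, as in the `"1","2","3","4"` case, the order still has to be verified by hand.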
