  • DESeq2 error in data.frame (multiple treatments and multiple replicates)

    I have a text file containing read counts per gene for each treatment and for the control, with the following columns:

    [Gene Symbol] [C1] [C2] [C3] [A1] [A2] [A3] [B1] [B2] [B3]
    • C is Control
    • A is Treatment 1
    • B is Treatment 2

    -> Each of C, A and B has 3 replicates.

    When I call data.frame(), it generates an error:

    Code:
    library("DESeq2")
    library("Biobase")
    mydata = read.table("matrix.txt", header=TRUE)
    col1 <- mydata[,1]   # first column: gene symbols

    ExpDesign = data.frame(row.names=col1, condition=c("C", "C", "C", "A", "A", "A", "B", "B", "B"))
    ## Error message:
    Error in data.frame(row.names = col1, condition = c("C", "C", "C", "A",  :
      row names supplied are of the wrong length
    ## The following is what I would do next if I didn't get the error:
    Code:
    countdata <- assay( mydata )
    head( countdata )
    coldata <- colData( mydata )
    rownames( coldata ) <- coldata$run
    colnames( countdata ) <- coldata$run
    head( coldata[ , c("C", "C", "C", "A", "A", "A", "B", "B", "B") ] )
    Eventually, the goal is to produce a heatmap showing each replicate of the control and of treatments A and B.

    I think the problem is that I need to subset my data, but I have no idea how to do that. Where is the error message coming from, and how should I subset the data (if I need to subset it at all)?
    Last edited by KYR; 07-15-2014, 10:57 PM. Reason: typo

  • #2
    This is your countData, which has as many rows as genes:

    Code:
    mydata = read.table("matrix.txt", header=TRUE)
    col1 <- mydata[,1]
    It looks like this will be the colData (sample information table).

    Code:
    ExpDesign = data.frame(row.names=col1, condition=c("C", "C", "C", "A", "A", "A", "B", "B", "B"))
    ...which has as many rows as samples.

    So the error comes from trying to name the rows of your colData (one row per sample, 9 in total) with the gene names in col1 (one entry per gene).
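    The mismatch can be reproduced in base R with a small hypothetical table standing in for matrix.txt (5 made-up genes, 9 samples named as in the question). This is only a sketch: it shows the failing call, then builds the sample table using the sample column names as row names. Setting the factor levels to C, A, B is a choice made here so that the control is the first (reference) level, which is the level DESeq2 compares against by default.

    ```r
    # Hypothetical toy stand-in for matrix.txt: 5 made-up genes x 9 samples.
    mydata <- data.frame(
      Gene = c("g1", "g2", "g3", "g4", "g5"),
      C1 = c(10, 0, 5, 8, 2), C2 = c(12, 1, 4, 9, 3), C3 = c(11, 0, 6, 7, 2),
      A1 = c(30, 2, 5, 1, 4), A2 = c(28, 3, 6, 0, 5), A3 = c(33, 1, 4, 2, 4),
      B1 = c(9, 9, 5, 8, 20), B2 = c(8, 10, 6, 7, 22), B3 = c(10, 8, 5, 9, 19)
    )
    col1 <- mydata[, 1]  # gene symbols: one per ROW of the count table
    condition <- c("C", "C", "C", "A", "A", "A", "B", "B", "B")  # one per SAMPLE

    # Reproduce the error: 5 gene names cannot label 9 sample rows.
    msg <- tryCatch(
      data.frame(row.names = col1, condition = condition),
      error = function(e) conditionMessage(e)
    )
    print(msg)

    # Label the sample-table rows with the sample (column) names instead,
    # dropping the gene-symbol column.
    ExpDesign <- data.frame(
      row.names = colnames(mydata)[-1],
      condition = factor(condition, levels = c("C", "A", "B"))
    )
    print(ExpDesign)
    ```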

    You will also get an error later when you try to run
    Code:
    assay( mydata )
    because mydata is a data.frame. assay() is a function for getting a matrix from SummarizedExperiment objects. You can just use
    Code:
    as.matrix( mydata )
    in order to supply a matrix to DESeqDataSet.
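    One caveat when applying this to the table read from matrix.txt: if the gene-symbol column is still part of the data frame, as.matrix() returns a character matrix. A minimal sketch, assuming a hypothetical 3-gene toy table with two replicates per condition, that moves the symbols into row names before building the dataset; the DESeq2 step only runs if the package is installed.

    ```r
    # Hypothetical toy table: gene symbols in column 1, integer counts after,
    # two replicates per condition.
    mydata <- data.frame(
      Gene = c("g1", "g2", "g3"),
      C1 = c(10L, 0L, 5L), C2 = c(12L, 1L, 4L),
      A1 = c(30L, 2L, 5L), A2 = c(28L, 3L, 6L),
      B1 = c(9L, 9L, 5L),  B2 = c(8L, 10L, 6L)
    )

    # Converting the whole data frame gives a character matrix,
    # because the gene-symbol column is character.
    print(typeof(as.matrix(mydata)))  # "character"

    # Keep only the count columns and use the symbols as row names.
    countdata <- as.matrix(mydata[, -1])
    rownames(countdata) <- mydata[, 1]
    print(typeof(countdata))          # "integer"

    # One sample-table row per count column, in the same order.
    coldata <- data.frame(
      row.names = colnames(countdata),
      condition = factor(c("C", "C", "A", "A", "B", "B"), levels = c("C", "A", "B"))
    )
    stopifnot(identical(rownames(coldata), colnames(countdata)))

    # Build the DESeq2 object only if the package is installed.
    if (requireNamespace("DESeq2", quietly = TRUE)) {
      dds <- DESeq2::DESeqDataSetFromMatrix(countData = countdata,
                                            colData = coldata,
                                            design = ~ condition)
      print(dds)
    }
    ```

    The stopifnot() check makes the requirement explicit: the row names of the sample table must match the column names of the count matrix, in the same order.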

    • #3
      Hi,

      I followed your advice and converted the data to a matrix, but when I try to set up colData I still get an error.

      This is my code:

      Code:
      > deseq2_analysis2 <- read_excel("deseq2_analysis2.xlsx")
      > View(deseq2_analysis2)
      > analysis3 <- as.matrix(deseq2_analysis2)
      > (condition <- factor(c(rep("group1", 4), rep("group2", 4), rep("group3", 4), rep("group4", 4))))
       [1] group1 group1 group1 group1 group2 group2 group2 group2 group3 group3 group3 group3 group4
      [14] group4 group4 group4
      Levels: group1 group2 group3 group4
      > (coldata <- data.frame(row.names=colnames(analysis3), condition))
      Error in data.frame(row.names = colnames(analysis3), condition) :
        row names supplied are of the wrong length
      This is the output of head():

      Code:
      > head(deseq2_analysis2)
      # A tibble: 6 x 17
        gene  Sample1_group1 Sample2_group1 Sample3_group1 Sample4_group1 Sample1_group2 Sample2_group2
        <chr>          <dbl>          <dbl>          <dbl>          <dbl>          <dbl>          <dbl>
      1 YAL0~              0              0              0              0              2              0
      2 YAL0~              0              0              0              0              0              0
      3 YAL0~            243            242            109            130            271            233
      4 YAL0~             16              7             52             30             23             10
      5 YAL0~             23             21             21             33             11             28
      6 YAL0~             38             42             76             88             47             40
      # ... with 10 more variables: Sample3_group2 <dbl>, Sample4_group2 <dbl>, Sample1_group3 <dbl>,
      #   Sample2_group3 <dbl>, Sample3_group3 <dbl>, Sample4_group3 <dbl>, Sample1_group4 <dbl>,
      #   Sample2_group4 <dbl>, Sample3_group4 <dbl>, Sample4_group4 <dbl>
      What am I doing wrong here?
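      A likely cause, judging from the head() output (a 6 x 17 tibble whose first column is gene): as.matrix() keeps the gene column, so colnames(analysis3) has 17 entries while condition has 16. The sketch below reproduces this with a hypothetical base-R data frame in place of the tibble from read_excel() (made-up gene IDs and random counts), then drops the gene column before building colData. The same [, -1] and $gene indexing should also work on a tibble.

      ```r
      # Hypothetical stand-in for deseq2_analysis2: a character "gene" column
      # followed by 16 count columns (4 replicates x 4 groups), as in head().
      sample_cols <- paste0("Sample", rep(1:4, times = 4), "_group", rep(1:4, each = 4))
      set.seed(1)
      counts <- matrix(rpois(3 * 16, lambda = 50), nrow = 3,
                       dimnames = list(NULL, sample_cols))
      deseq2_analysis2 <- data.frame(gene = c("geneA", "geneB", "geneC"), counts)

      condition <- factor(rep(paste0("group", 1:4), each = 4))

      # Converting the whole table keeps the "gene" column: 17 column names
      # for only 16 condition labels, hence "row names ... wrong length".
      bad <- as.matrix(deseq2_analysis2)
      print(c(matrix_columns = ncol(bad), conditions = length(condition)))

      # Move the gene IDs into row names and keep only the count columns.
      analysis3 <- as.matrix(deseq2_analysis2[, -1])
      rownames(analysis3) <- deseq2_analysis2$gene
      coldata <- data.frame(row.names = colnames(analysis3), condition)

      # Sanity check: the group encoded in each column name matches its label.
      stopifnot(all(sub(".*_", "", colnames(analysis3)) == as.character(condition)))
      print(coldata)
      ```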



      • #4
        Originally posted by Michael Love
        This is your countData, which has as many rows as genes:

        Code:
        mydata = read.table("matrix.txt", header=TRUE)
        col1 <- mydata[,1]
        It looks like this will be the colData (sample information table).

        Code:
        ExpDesign = data.frame(row.names=col1, condition=c("C", "C", "C", "A", "A", "A", "B", "B", "B"))
        ...which has as many rows as samples.

        So the error comes when you try to name the rows of your colData using the gene names in col1.

        You will also get an error later when you try to run
        Code:
        assay( mydata )
        because mydata is a data.frame. assay() is a function for getting a matrix from SummarizedExperiment objects. You can just use
        Code:
        as.matrix( mydata )
        in order to supply a matrix to DESeqDataSet.