Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • KYR
    Member
    • May 2012
    • 18

    Need help with DESeq2

    I have a text file, containing read counts per gene for each treatments and control, with the following column

    [Gene Symbol] [C1] [C2] [C3] [B1] [B2] [B3] [A1] [A2] [A3]
    C is Control
    A is Treatment 1
    B is Treatment 2
    -> Each of C, A an B have 3 replicates

    When I do data.frame it generates an error

    Code:
    library( "DESeq2" )
    library("Biobase")
    mydata4 = read.table("matrix4.txt", header=TRUE)
    head(mydata4)
    
    samples <- data.frame(row.names=c("C1", "C2", "C3", "B1", "B2", "B3", "A1", "A2", "A3"), condition=as.factor(c(rep("C",3), rep("B", 3), rep("A", 3))))
    Error in data.frame(row.names = c("C1", "C2", "C3", "B1", "B2", "B3",  : 
      row names supplied are of the wrong length

    How can I fix that ? I seems I need to subset my data though I don't know how to do this and deseq2 doesn't transpose my columns to rows
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    That command won't produce the error you showed.

    Comment

    • KYR
      Member
      • May 2012
      • 18

      #3
      Originally posted by dpryan View Post
      That command won't produce the error you showed.
      Indeed, I started again from a fresh R command shell, and that error doesn't appear anymore. However it generates another one:


      Code:
      library( "DESeq2" )
      library("Biobase")
      mydata = read.table("matrix4.txt", header=TRUE)
      samples <- data.frame(row.names=c("C1", "C2", "C3", "B1", "B2", "B3", "A1", "A2", "A3"), condition=as.factor(c(rep("C",3), rep("B", 3), rep("A", 3))))
      
      dds <- DESeqDataSetFromMatrix(countData = as.matrix(mydata), colData=samples, design=~condition)
      
      Error in validObject(.Object) : 
        invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol

      So I check the number of columns and rows for each and ncol has 1 more than ncol

      I'm guessing it's coming from the gene symbol column..though how can I fix this??
      Last edited by KYR; 07-16-2014, 01:11 PM. Reason: typo

      Comment

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        What's the output of
        Code:
        dim(as.matrix(mydata))
        I'm guessing that the mydata object has the gene names as a column rather than as the row names.

        Comment

        • Richard Finney
          Senior Member
          • Feb 2009
          • 701

          #5
          Slight Tangent ... Using "htseq-count" outputs ... this works for me ....

          Here's the template I hack for deseq2 ....

          source("http://bioconductor.org/biocLite.R")
          library(DESeq2,lib.loc="/home/finneyr/Rlib")

          sampleFiles = c(
          "file1.txt" ,
          "file2.txt" ,
          #... (fill in the names of you htseq count files here 1 to N files.
          :"filen.txt"
          )

          #set your condtions for the files in SampleFiles
          sampleCondition = c( "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "untreated" , "untreated" , "untreated" , "untreated" , "untreated" )

          sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)

          #might need this ... I'm not sure
          libType = c ( "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" )

          options(max.print=100000)
          options(width=500)

          #set directory to your place where you keep your files llisted in "sampleFiles" which are htseqcount output files.
          directory="/data/nextgen/finneyr/novo/CNT"
          ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
          print(ddsHTSeq)

          colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("treated","untreated"))

          dds<-DESeq(ddsHTSeq)
          print(dds)
          res<-results(dds)
          print(res);
          # sort by padj (:adjusted p-value") ...
          res<-res[order(res$padj),]

          #write results to file name "rpt5" , change this to your output file name, deseq2 explains log2foldchange and other fields.
          write.csv(as.data.frame(res),file="rpt5")
          q(save="no")
          Last edited by Richard Finney; 07-16-2014, 01:21 PM.

          Comment

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            Why do you print the results (and not even in an abbreviated form!) to screen when you're just going to write them to a file as well?

            Comment

            • Michael Love
              Senior Member
              • Jul 2013
              • 333

              #7
              see Devon Ryan's answer above. if you show us head(as.matrix(mydata)) I'm guessing the first column might not be counts. it could be gene names, converted by read.table into factors.

              Comment

              • KYR
                Member
                • May 2012
                • 18

                #8
                Originally posted by dpryan View Post
                What's the output of
                Code:
                dim(as.matrix(mydata))
                I'm guessing that the mydata object has the gene names as a column rather than as the row names.
                I gives me the following:

                Code:
                > dim(as.matrix(mydata4))
                [1] 25197    10
                I guess I have one more row column because of the gene symbol column which is first and should be row.names.
                Last edited by KYR; 07-16-2014, 01:40 PM. Reason: typo on column number

                Comment

                • KYR
                  Member
                  • May 2012
                  • 18

                  #9
                  Originally posted by Michael Love View Post
                  see Devon Ryan's answer above. if you show us head(as.matrix(mydata)) I'm guessing the first column might not be counts. it could be gene names, converted by read.table into factors.

                  yes that's what's happening here, the first column gene symbol should be row.names. Though I don't know how to fix that. Any help would be greatlly appreciated..

                  Comment

                  • dpryan
                    Devon Ryan
                    • Jul 2011
                    • 3478

                    #10
                    You actually have 4 additional columns, since you only described 9 samples. You probably just need to:
                    Code:
                    mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)

                    Comment

                    • KYR
                      Member
                      • May 2012
                      • 18

                      #11
                      Originally posted by dpryan View Post
                      You actually have 4 additional columns, since you only described 9 samples. You probably just need to:
                      Code:
                      mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)

                      That's what I've done originally, but it gave me the following error:

                      Code:
                      > mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)
                      Error in read.table("matrix4.txt", header = TRUE, row.names = 1) : 
                        duplicate 'row.names' are not allowed
                      So I decided to dump the row.names, this is weird it never happened before.

                      Comment

                      • dpryan
                        Devon Ryan
                        • Jul 2011
                        • 3478

                        #12
                        You could also simply:
                        Code:
                        mydata4 <- mydata4[,-1]
                        However, you should investigate why you have duplicate gene names. It's likely that something went amiss when that file was made.

                        Comment

                        • KYR
                          Member
                          • May 2012
                          • 18

                          #13
                          Originally posted by dpryan View Post
                          You could also simply:
                          Code:
                          mydata4 <- mydata4[,-1]
                          However, you should investigate why you have duplicate gene names. It's likely that something went amiss when that file was made.

                          Uhh indeed we have duplicated gene names, we have to investigate before proceeding further. Thanks for your answers

                          Comment

                          • rookie_genomics
                            Junior Member
                            • Mar 2019
                            • 4

                            #14
                            Hi

                            I just started using deseq2 for DE analysis
                            I have an excel sheet input with gene names followed by 16 columns with reads.
                            I tried to generate a matrix using this file and I keep getting an error similar to that is mentioned here
                            So this is what is happening

                            analysis2 <- as.matrix(deseq2_analysis2)
                            (condition <- factor(c(rep("group1", 4), rep("group2", 4), rep("group3", 4), rep("group4", 4))))
                            group1 group1 group1 group1 group2 group2 group2 group2 group3 group3 group3 group3 group4
                            group4 group4 group4
                            Levels: group1 group2 group3 group4
                            (coldata <- data.frame(row.names=colnames(analysis2), condition))
                            Error in data.frame(row.names = colnames(analysis2), condition) :
                            row names supplied are of the wrong length
                            What am I doing wrong? I want an output that is sorted by gene names

                            This is the output of the head command

                            head(analysis2)
                            gene.name Sample1_group1 Sample2_group1 Sample3_group1 Sample4_group1 Sample1_group2
                            [1,] "YAL068C" " 0" " 0" " 0" " 0" " 2"
                            [2,] "YAL067W-A" " 0" " 0" " 0" " 0" " 0"
                            [3,] "YAL067C" " 243" " 242" " 109" " 130" " 271"
                            [4,] "YAL065C" " 16" " 7" " 52" " 30" " 23"
                            [5,] "YAL064W-B" " 23" " 21" " 21" " 33" " 11"
                            [6,] "YAL064C-A" " 38" " 42" " 76" " 88" " 47"
                            Sample2_group2 Sample3_group2 Sample4_group2 Sample1_group3 Sample2_group3 Sample3_group3
                            [1,] " 0" " 2" " 0" " 0" " 1" " 6"
                            [2,] " 0" " 2" " 0" " 0" " 0" " 0"
                            [3,] " 233" " 132" " 150" " 228" " 212" " 174"
                            [4,] " 10" " 22" " 46" " 15" " 17" " 46"
                            [5,] " 28" " 56" " 19" " 19" " 22" " 40"
                            [6,] " 40" " 44" " 65" " 42" " 35" " 74"
                            Sample4_group3 Sample1_group4 Sample2_group4 Sample3_group4 Sample4_group4
                            [1,] " 2" " 0" " 1" " 0" " 0"
                            [2,] " 0" " 0" " 0" " 0" " 0"
                            [3,] " 176" " 96" " 73" " 132" " 77"
                            [4,] " 39" " 18" " 11" " 39" " 27"
                            [5,] " 27" " 20" " 16" " 26" " 18"
                            [6,] " 83" " 49" " 23" " 55" " 52"
                            I am new to R and DE analysis and any help will be appreciated
                            Last edited by rookie_genomics; 03-11-2019, 11:58 AM.

                            Comment

                            Latest Articles

                            Collapse

                            • SEQadmin2
                              From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                              by SEQadmin2


                              Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                              The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                              ...
                              06-02-2026, 10:05 AM
                            • SEQadmin2
                              Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                              by SEQadmin2


                              With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                              Introduction

                              Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                              05-22-2026, 06:42 AM
                            • SEQadmin2
                              Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                              by SEQadmin2

                              Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                              Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                              05-06-2026, 09:04 AM

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by SEQadmin2, Today, 08:59 AM
                            0 responses
                            8 views
                            0 reactions
                            Last Post SEQadmin2  
                            Started by SEQadmin2, 06-02-2026, 12:03 PM
                            0 responses
                            21 views
                            0 reactions
                            Last Post SEQadmin2  
                            Started by SEQadmin2, 06-02-2026, 11:40 AM
                            0 responses
                            15 views
                            0 reactions
                            Last Post SEQadmin2  
                            Started by SEQadmin2, 05-28-2026, 11:40 AM
                            0 responses
                            29 views
                            0 reactions
                            Last Post SEQadmin2  
                            Working...