  • KYR
    Member
    • May 2012
    • 18

    Need help with DESeq2

    I have a text file containing read counts per gene for each treatment and the control, with the following columns:

    [Gene Symbol] [C1] [C2] [C3] [B1] [B2] [B3] [A1] [A2] [A3]
    C is Control
    A is Treatment 1
    B is Treatment 2
    -> Each of C, A and B has 3 replicates

    When I do data.frame it generates an error

    Code:
    library( "DESeq2" )
    library("Biobase")
    mydata4 = read.table("matrix4.txt", header=TRUE)
    head(mydata4)
    
    samples <- data.frame(row.names=c("C1", "C2", "C3", "B1", "B2", "B3", "A1", "A2", "A3"), condition=as.factor(c(rep("C",3), rep("B", 3), rep("A", 3))))
    Error in data.frame(row.names = c("C1", "C2", "C3", "B1", "B2", "B3",  : 
      row names supplied are of the wrong length

    How can I fix that? It seems I need to subset my data, though I don't know how to do this, and DESeq2 doesn't transpose my columns to rows.
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    That command won't produce the error you showed.

    Comment

    • KYR
      Member
      • May 2012
      • 18

      #3
      Originally posted by dpryan View Post
      That command won't produce the error you showed.
      Indeed, I started again from a fresh R command shell, and that error doesn't appear anymore. However it generates another one:


      Code:
      library( "DESeq2" )
      library("Biobase")
      mydata = read.table("matrix4.txt", header=TRUE)
      samples <- data.frame(row.names=c("C1", "C2", "C3", "B1", "B2", "B3", "A1", "A2", "A3"), condition=as.factor(c(rep("C",3), rep("B", 3), rep("A", 3))))
      
      dds <- DESeqDataSetFromMatrix(countData = as.matrix(mydata), colData=samples, design=~condition)
      
      Error in validObject(.Object) : 
        invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol

      So I checked the number of rows and columns of each: the count matrix has 1 more column than samples has rows.

      I'm guessing it comes from the gene symbol column, though how can I fix this?
      Last edited by KYR; 07-16-2014, 01:11 PM. Reason: typo
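
The mismatch named in the error can be checked directly before calling DESeqDataSetFromMatrix. Below is a minimal base R sketch, assuming a toy table laid out like matrix4.txt; the gene symbols and counts are invented for illustration and the real file may differ.

```r
# Toy stand-in for mydata: a gene symbol column followed by 9 count columns.
# All values are invented for illustration.
sample_ids <- c("C1", "C2", "C3", "B1", "B2", "B3", "A1", "A2", "A3")
set.seed(1)
counts <- matrix(rpois(3 * 9, lambda = 20), nrow = 3,
                 dimnames = list(NULL, sample_ids))
mydata <- data.frame(GeneSymbol = c("geneA", "geneB", "geneC"), counts)

samples <- data.frame(row.names = sample_ids,
                      condition = factor(rep(c("C", "B", "A"), each = 3)))

# The count matrix needs exactly one column per row of the sample table.
ncol(mydata)    # counts the gene symbol column as well
nrow(samples)

# Columns of the data that have no matching row in the sample table.
extra_cols <- setdiff(colnames(mydata), rownames(samples))
extra_cols
```

Any name listed in extra_cols is a column that has to be removed, or moved into the row names, before the matrix is handed to DESeq2.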

      Comment

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        What's the output of
        Code:
        dim(as.matrix(mydata))
        I'm guessing that the mydata object has the gene names as a column rather than as the row names.

        Comment

        • Richard Finney
          Senior Member
          • Feb 2009
          • 701

          #5
          Slight Tangent ... Using "htseq-count" outputs ... this works for me ....

          Here's the template I hack for deseq2 ....

          source("http://bioconductor.org/biocLite.R")
          library(DESeq2,lib.loc="/home/finneyr/Rlib")

          sampleFiles = c(
          "file1.txt" ,
          "file2.txt" ,
          #... fill in the names of your htseq-count files here (1 to N files)
          "filen.txt"
          )

          #set your conditions for the files in sampleFiles
          sampleCondition = c( "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "untreated" , "untreated" , "untreated" , "untreated" , "untreated" )

          sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)

          #might need this ... I'm not sure
          libType = c ( "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" )

          options(max.print=100000)
          options(width=500)

          #set directory to the place where you keep the files listed in "sampleFiles", which are htseq-count output files.
          directory="/data/nextgen/finneyr/novo/CNT"
          ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
          print(ddsHTSeq)

          colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("treated","untreated"))

          dds<-DESeq(ddsHTSeq)
          print(dds)
          res<-results(dds)
          print(res);
          # sort by padj ("adjusted p-value") ...
          res<-res[order(res$padj),]

          #write results to file name "rpt5" , change this to your output file name, deseq2 explains log2foldchange and other fields.
          write.csv(as.data.frame(res),file="rpt5")
          q(save="no")
          Last edited by Richard Finney; 07-16-2014, 01:21 PM.

          Comment

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            Why do you print the results (and not even in an abbreviated form!) to screen when you're just going to write them to a file as well?

            Comment

            • Michael Love
              Senior Member
              • Jul 2013
              • 333

              #7
              see Devon Ryan's answer above. if you show us head(as.matrix(mydata)) I'm guessing the first column might not be counts. it could be gene names, converted by read.table into factors.
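
A quick way to see what this would look like, sketched in base R on an invented two-gene table (the column names are assumptions, not the poster's actual data): a single non-numeric column is enough to turn the whole result of as.matrix() into text.

```r
# Toy data frame: one non-numeric column plus two count columns (invented values).
mydata <- data.frame(GeneSymbol = c("geneA", "geneB"),
                     C1 = c(10L, 0L), C2 = c(7L, 3L))

# Column classes reveal the non-count column at a glance.
col_classes <- sapply(mydata, class)
col_classes

# as.matrix() on mixed columns falls back to a character matrix,
# so the counts are no longer stored as numbers.
m_all <- as.matrix(mydata)
is.numeric(m_all)

# Leaving out the first column keeps the matrix numeric.
m_counts <- as.matrix(mydata[, -1])
is.numeric(m_counts)
```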

              Comment

              • KYR
                Member
                • May 2012
                • 18

                #8
                Originally posted by dpryan View Post
                What's the output of
                Code:
                dim(as.matrix(mydata))
                I'm guessing that the mydata object has the gene names as a column rather than as the row names.
                It gives me the following:

                Code:
                > dim(as.matrix(mydata4))
                [1] 25197    10
                I guess I have one more column because of the gene symbol column, which comes first and should be the row.names.
                Last edited by KYR; 07-16-2014, 01:40 PM. Reason: typo on column number

                Comment

                • KYR
                  Member
                  • May 2012
                  • 18

                  #9
                  Originally posted by Michael Love View Post
                  see Devon Ryan's answer above. if you show us head(as.matrix(mydata)) I'm guessing the first column might not be counts. it could be gene names, converted by read.table into factors.

                  Yes, that's what's happening here: the first column, gene symbol, should be the row.names, though I don't know how to fix that. Any help would be greatly appreciated.

                  Comment

                  • dpryan
                    Devon Ryan
                    • Jul 2011
                    • 3478

                    #10
                    You actually have 4 additional columns, since you only described 9 samples. You probably just need to:
                    Code:
                    mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)
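
The effect of row.names=1 can be tried without the real file. The sketch below uses read.table's text argument on an inline stand-in for matrix4.txt; the gene symbols and counts are invented.

```r
# Inline stand-in for matrix4.txt (invented gene symbols and counts).
txt <- "GeneSymbol C1 C2 C3
geneA 10 12 9
geneB 0 1 0
geneC 33 40 28"

# row.names = 1 moves the first column into the row names,
# leaving only count columns in the data frame.
mydata <- read.table(text = txt, header = TRUE, row.names = 1)
dim(mydata)
rownames(mydata)

# With only counts left, as.matrix() gives a numeric matrix.
count_matrix <- as.matrix(mydata)
```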

                    Comment

                    • KYR
                      Member
                      • May 2012
                      • 18

                      #11
                      Originally posted by dpryan View Post
                      You actually have 4 additional columns, since you only described 9 samples. You probably just need to:
                      Code:
                      mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)

                      That's what I've done originally, but it gave me the following error:

                      Code:
                      > mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)
                      Error in read.table("matrix4.txt", header = TRUE, row.names = 1) : 
                        duplicate 'row.names' are not allowed
                      So I decided to dump the row.names, this is weird it never happened before.
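
To see which symbols trigger the duplicate row.names error, the file can be read without row.names and the first column inspected. This is a base R sketch on an invented table in which geneA deliberately appears twice; the column name GeneSymbol is an assumption.

```r
# Inline stand-in for the count file, with one symbol repeated on purpose.
txt <- "GeneSymbol C1 C2
geneA 10 12
geneB 0 1
geneA 5 6"

# Reading without row.names keeps the symbols as an ordinary column.
mydata <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Symbols that occur more than once, and every row that carries them.
dup_symbols <- unique(mydata$GeneSymbol[duplicated(mydata$GeneSymbol)])
dup_rows <- mydata[mydata$GeneSymbol %in% dup_symbols, ]
dup_symbols
dup_rows
```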

                      Comment

                      • dpryan
                        Devon Ryan
                        • Jul 2011
                        • 3478

                        #12
                        You could also simply:
                        Code:
                        mydata4 <- mydata4[,-1]
                        However, you should investigate why you have duplicate gene names. It's likely that something went amiss when that file was made.
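
Dropping the first column also drops the gene labels. Once the cause of the duplicates is understood, two possible ways to keep the labels are sketched below in base R on invented data. Neither is necessarily the right choice for a given file; summing in particular only makes sense if the repeated rows really describe the same gene.

```r
# Toy table with a repeated symbol (invented values).
mydata <- data.frame(GeneSymbol = c("geneA", "geneB", "geneA"),
                     C1 = c(10L, 0L, 5L), C2 = c(12L, 1L, 6L))
symbols <- as.character(mydata$GeneSymbol)

# Option 1: drop the symbol column but keep the labels,
# made unique by appending a numeric suffix to repeats.
counts_unique <- as.matrix(mydata[, -1])
rownames(counts_unique) <- make.unique(symbols)

# Option 2: add up the counts of rows that share a symbol.
counts_summed <- rowsum(as.matrix(mydata[, -1]), group = symbols)

counts_unique
counts_summed
```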

                        Comment

                        • KYR
                          Member
                          • May 2012
                          • 18

                          #13
                          Originally posted by dpryan View Post
                          You could also simply:
                          Code:
                          mydata4 <- mydata4[,-1]
                          However, you should investigate why you have duplicate gene names. It's likely that something went amiss when that file was made.

                          Uhh indeed we have duplicated gene names, we have to investigate before proceeding further. Thanks for your answers

                          Comment

                          • rookie_genomics
                            Junior Member
                            • Mar 2019
                            • 4

                            #14
                            Hi

                            I just started using deseq2 for DE analysis
                            I have an excel sheet input with gene names followed by 16 columns with reads.
                            I tried to generate a matrix from this file and I keep getting an error similar to the one mentioned here.
                            So this is what is happening

                            analysis2 <- as.matrix(deseq2_analysis2)
                            (condition <- factor(c(rep("group1", 4), rep("group2", 4), rep("group3", 4), rep("group4", 4))))
                            group1 group1 group1 group1 group2 group2 group2 group2 group3 group3 group3 group3 group4
                            group4 group4 group4
                            Levels: group1 group2 group3 group4
                            (coldata <- data.frame(row.names=colnames(analysis2), condition))
                            Error in data.frame(row.names = colnames(analysis2), condition) :
                            row names supplied are of the wrong length
                            What am I doing wrong? I want an output that is sorted by gene names

                            This is the output of the head command

                            head(analysis2)
                            gene.name Sample1_group1 Sample2_group1 Sample3_group1 Sample4_group1 Sample1_group2
                            [1,] "YAL068C" " 0" " 0" " 0" " 0" " 2"
                            [2,] "YAL067W-A" " 0" " 0" " 0" " 0" " 0"
                            [3,] "YAL067C" " 243" " 242" " 109" " 130" " 271"
                            [4,] "YAL065C" " 16" " 7" " 52" " 30" " 23"
                            [5,] "YAL064W-B" " 23" " 21" " 21" " 33" " 11"
                            [6,] "YAL064C-A" " 38" " 42" " 76" " 88" " 47"
                            Sample2_group2 Sample3_group2 Sample4_group2 Sample1_group3 Sample2_group3 Sample3_group3
                            [1,] " 0" " 2" " 0" " 0" " 1" " 6"
                            [2,] " 0" " 2" " 0" " 0" " 0" " 0"
                            [3,] " 233" " 132" " 150" " 228" " 212" " 174"
                            [4,] " 10" " 22" " 46" " 15" " 17" " 46"
                            [5,] " 28" " 56" " 19" " 19" " 22" " 40"
                            [6,] " 40" " 44" " 65" " 42" " 35" " 74"
                            Sample4_group3 Sample1_group4 Sample2_group4 Sample3_group4 Sample4_group4
                            [1,] " 2" " 0" " 1" " 0" " 0"
                            [2,] " 0" " 0" " 0" " 0" " 0"
                            [3,] " 176" " 96" " 73" " 132" " 77"
                            [4,] " 39" " 18" " 11" " 39" " 27"
                            [5,] " 27" " 20" " 16" " 26" " 18"
                            [6,] " 83" " 49" " 23" " 55" " 52"
                            I am new to R and DE analysis and any help will be appreciated
                            Last edited by rookie_genomics; 03-11-2019, 11:58 AM.
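
In the head() output above, gene.name is a column of the matrix and every count is quoted. That suggests analysis2 is a character matrix with 17 columns, while condition has only 16 entries, which would explain the length error. Below is a minimal base R sketch of one way around it, assuming deseq2_analysis2 is a data frame whose first column is gene.name. The toy table keeps three genes from the post but only four samples in two groups.

```r
# Toy stand-in for deseq2_analysis2, shortened to 4 samples in 2 groups.
deseq2_analysis2 <- data.frame(
  gene.name = c("YAL068C", "YAL067W-A", "YAL067C"),
  Sample1_group1 = c(0, 0, 243),
  Sample2_group1 = c(0, 0, 242),
  Sample1_group2 = c(2, 0, 271),
  Sample2_group2 = c(0, 0, 233),
  stringsAsFactors = FALSE
)

# Keep only the count columns and move the gene names into the row names.
analysis2 <- as.matrix(deseq2_analysis2[, -1])
rownames(analysis2) <- deseq2_analysis2$gene.name
storage.mode(analysis2) <- "integer"

# One condition entry per remaining column, so the lengths now agree.
condition <- factor(rep(c("group1", "group2"), each = 2))
coldata <- data.frame(row.names = colnames(analysis2), condition)

# Order the rows by gene name if that ordering is wanted.
analysis2 <- analysis2[order(rownames(analysis2)), ]
```

For the full 16-sample table the condition vector from the post, with each group repeated 4 times, should line up with the 16 count columns in the same way.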

                            Comment
