Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Need help with DESeq2

    I have a text file, containing read counts per gene for each treatments and control, with the following column

    [Gene Symbol] [C1] [C2] [C3] [B1] [B2] [B3] [A1] [A2] [A3]
    C is Control
    A is Treatment 1
    B is Treatment 2
    -> Each of C, A an B have 3 replicates

    When I do data.frame it generates an error

    Code:
    library( "DESeq2" )
    library("Biobase")
    mydata4 = read.table("matrix4.txt", header=TRUE)
    head(mydata4)
    
    samples <- data.frame(row.names=c("C1", "C2", "C3", "B1", "B2", "B3", "A1", "A2", "A3"), condition=as.factor(c(rep("C",3), rep("B", 3), rep("A", 3))))
    Error in data.frame(row.names = c("C1", "C2", "C3", "B1", "B2", "B3",  : 
      row names supplied are of the wrong length

    How can I fix that ? I seems I need to subset my data though I don't know how to do this and deseq2 doesn't transpose my columns to rows

  • #2
    That command won't produce the error you showed.

    Comment


    • #3
      Originally posted by dpryan View Post
      That command won't produce the error you showed.
      Indeed, I started again from a fresh R command shell, and that error doesn't appear anymore. However it generates another one:


      Code:
      library( "DESeq2" )
      library("Biobase")
      mydata = read.table("matrix4.txt", header=TRUE)
      samples <- data.frame(row.names=c("C1", "C2", "C3", "B1", "B2", "B3", "A1", "A2", "A3"), condition=as.factor(c(rep("C",3), rep("B", 3), rep("A", 3))))
      
      dds <- DESeqDataSetFromMatrix(countData = as.matrix(mydata), colData=samples, design=~condition)
      
      Error in validObject(.Object) : 
        invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol

      So I check the number of columns and rows for each and ncol has 1 more than ncol

      I'm guessing it's coming from the gene symbol column..though how can I fix this??
      Last edited by KYR; 07-16-2014, 01:11 PM. Reason: typo

      Comment


      • #4
        What's the output of
        Code:
        dim(as.matrix(mydata))
        I'm guessing that the mydata object has the gene names as a column rather than as the row names.

        Comment


        • #5
          Slight Tangent ... Using "htseq-count" outputs ... this works for me ....

          Here's the template I hack for deseq2 ....

          source("http://bioconductor.org/biocLite.R")
          library(DESeq2,lib.loc="/home/finneyr/Rlib")

          sampleFiles = c(
          "file1.txt" ,
          "file2.txt" ,
          #... (fill in the names of you htseq count files here 1 to N files.
          :"filen.txt"
          )

          #set your condtions for the files in SampleFiles
          sampleCondition = c( "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "untreated" , "untreated" , "untreated" , "untreated" , "untreated" )

          sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)

          #might need this ... I'm not sure
          libType = c ( "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" )

          options(max.print=100000)
          options(width=500)

          #set directory to your place where you keep your files llisted in "sampleFiles" which are htseqcount output files.
          directory="/data/nextgen/finneyr/novo/CNT"
          ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
          print(ddsHTSeq)

          colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("treated","untreated"))

          dds<-DESeq(ddsHTSeq)
          print(dds)
          res<-results(dds)
          print(res);
          # sort by padj (:adjusted p-value") ...
          res<-res[order(res$padj),]

          #write results to file name "rpt5" , change this to your output file name, deseq2 explains log2foldchange and other fields.
          write.csv(as.data.frame(res),file="rpt5")
          q(save="no")
          Last edited by Richard Finney; 07-16-2014, 01:21 PM.

          Comment


          • #6
            Why do you print the results (and not even in an abbreviated form!) to screen when you're just going to write them to a file as well?

            Comment


            • #7
              see Devon Ryan's answer above. if you show us head(as.matrix(mydata)) I'm guessing the first column might not be counts. it could be gene names, converted by read.table into factors.

              Comment


              • #8
                Originally posted by dpryan View Post
                What's the output of
                Code:
                dim(as.matrix(mydata))
                I'm guessing that the mydata object has the gene names as a column rather than as the row names.
                I gives me the following:

                Code:
                > dim(as.matrix(mydata4))
                [1] 25197    10
                I guess I have one more row column because of the gene symbol column which is first and should be row.names.
                Last edited by KYR; 07-16-2014, 01:40 PM. Reason: typo on column number

                Comment


                • #9
                  Originally posted by Michael Love View Post
                  see Devon Ryan's answer above. if you show us head(as.matrix(mydata)) I'm guessing the first column might not be counts. it could be gene names, converted by read.table into factors.

                  yes that's what's happening here, the first column gene symbol should be row.names. Though I don't know how to fix that. Any help would be greatlly appreciated..

                  Comment


                  • #10
                    You actually have 4 additional columns, since you only described 9 samples. You probably just need to:
                    Code:
                    mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)

                    Comment


                    • #11
                      Originally posted by dpryan View Post
                      You actually have 4 additional columns, since you only described 9 samples. You probably just need to:
                      Code:
                      mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)

                      That's what I've done originally, but it gave me the following error:

                      Code:
                      > mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)
                      Error in read.table("matrix4.txt", header = TRUE, row.names = 1) : 
                        duplicate 'row.names' are not allowed
                      So I decided to dump the row.names, this is weird it never happened before.

                      Comment


                      • #12
                        You could also simply:
                        Code:
                        mydata4 <- mydata4[,-1]
                        However, you should investigate why you have duplicate gene names. It's likely that something went amiss when that file was made.

                        Comment


                        • #13
                          Originally posted by dpryan View Post
                          You could also simply:
                          Code:
                          mydata4 <- mydata4[,-1]
                          However, you should investigate why you have duplicate gene names. It's likely that something went amiss when that file was made.

                          Uhh indeed we have duplicated gene names, we have to investigate before proceeding further. Thanks for your answers

                          Comment


                          • #14
                            Hi

                            I just started using deseq2 for DE analysis
                            I have an excel sheet input with gene names followed by 16 columns with reads.
                            I tried to generate a matrix using this file and I keep getting an error similar to that is mentioned here
                            So this is what is happening

                            analysis2 <- as.matrix(deseq2_analysis2)
                            (condition <- factor(c(rep("group1", 4), rep("group2", 4), rep("group3", 4), rep("group4", 4))))
                            group1 group1 group1 group1 group2 group2 group2 group2 group3 group3 group3 group3 group4
                            group4 group4 group4
                            Levels: group1 group2 group3 group4
                            (coldata <- data.frame(row.names=colnames(analysis2), condition))
                            Error in data.frame(row.names = colnames(analysis2), condition) :
                            row names supplied are of the wrong length
                            What am I doing wrong? I want an output that is sorted by gene names

                            This is the output of the head command

                            head(analysis2)
                            gene.name Sample1_group1 Sample2_group1 Sample3_group1 Sample4_group1 Sample1_group2
                            [1,] "YAL068C" " 0" " 0" " 0" " 0" " 2"
                            [2,] "YAL067W-A" " 0" " 0" " 0" " 0" " 0"
                            [3,] "YAL067C" " 243" " 242" " 109" " 130" " 271"
                            [4,] "YAL065C" " 16" " 7" " 52" " 30" " 23"
                            [5,] "YAL064W-B" " 23" " 21" " 21" " 33" " 11"
                            [6,] "YAL064C-A" " 38" " 42" " 76" " 88" " 47"
                            Sample2_group2 Sample3_group2 Sample4_group2 Sample1_group3 Sample2_group3 Sample3_group3
                            [1,] " 0" " 2" " 0" " 0" " 1" " 6"
                            [2,] " 0" " 2" " 0" " 0" " 0" " 0"
                            [3,] " 233" " 132" " 150" " 228" " 212" " 174"
                            [4,] " 10" " 22" " 46" " 15" " 17" " 46"
                            [5,] " 28" " 56" " 19" " 19" " 22" " 40"
                            [6,] " 40" " 44" " 65" " 42" " 35" " 74"
                            Sample4_group3 Sample1_group4 Sample2_group4 Sample3_group4 Sample4_group4
                            [1,] " 2" " 0" " 1" " 0" " 0"
                            [2,] " 0" " 0" " 0" " 0" " 0"
                            [3,] " 176" " 96" " 73" " 132" " 77"
                            [4,] " 39" " 18" " 11" " 39" " 27"
                            [5,] " 27" " 20" " 16" " 26" " 18"
                            [6,] " 83" " 49" " 23" " 55" " 52"
                            I am new to R and DE analysis and any help will be appreciated
                            Last edited by rookie_genomics; 03-11-2019, 11:58 AM.

                            Comment

                            Latest Articles

                            Collapse

                            • seqadmin
                              Strategies for Sequencing Challenging Samples
                              by seqadmin


                              Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                              03-22-2024, 06:39 AM
                            • seqadmin
                              Techniques and Challenges in Conservation Genomics
                              by seqadmin



                              The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                              Avian Conservation
                              Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                              03-08-2024, 10:41 AM

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by seqadmin, Yesterday, 06:37 PM
                            0 responses
                            7 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, Yesterday, 06:07 PM
                            0 responses
                            7 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 03-22-2024, 10:03 AM
                            0 responses
                            49 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 03-21-2024, 07:32 AM
                            0 responses
                            66 views
                            0 likes
                            Last Post seqadmin  
                            Working...
                            X