
  • DESeq cds error; help needed

    Hello,

    I am trying to do some differential expression work on my two Illumina datasets. I only have experience with tophat => cufflinks (mostly through galaxy but some command line) so my computing skills are at a minimum, especially in R. I am trying to run the DESeq package in R but an having trouble with the creating the cds to do analysis through DESeq.

    I made a raw count file using htseq-count and used copy and paste to put the counts in Excel with 3 columns: the genes, sample A counts, sample B counts. My initial steps in R for reading the file seem to be working, but I get an error "not an integer: missing value where TRUE/FALSE needed" when making the cds.

    Thank you in advance for any help in correcting my errors. My R work is as follows after loading the DESeq library:

    > countTable <- read.csv( "~/Desktop/mergedcounts.csv", header=TRUE, row.names=1)
    > head(countTable)
                A  B
    20ALPHA-HSD 0  0
    A1BG        0  0
    A2M         0  0
    A2ML1       0 21
    A4GNT       0  0
    AAAS        0  1
    > conds <- factor( c( "highfert", "lowfert" ) )
    > conds
    [1] highfert lowfert
    Levels: highfert lowfert
    > cds <- newCountDataSet( countTable, conds )
    Error in if (any(round(countData) != countData)) stop("The countData is not integer.") :
    missing value where TRUE/FALSE needed

  • #2
    Hi ccard28

    you need to make sure that your 'countTable' is a data.frame whose columns are numeric variables of storage class 'integer', and contain no NA (and no negative) values. It might be necessary to read a basic R intro to familiarize yourself with these concepts.

    To trouble-shoot, you could try (not tested):

    sapply(countTable, function(x) which(is.na(x)))

    Best wishes
    Wolfgang
    Wolfgang Huber
    EMBL
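For readers following along, here is a minimal sketch of what that sapply call returns. The table `toy` and its values are made up for illustration and are not the poster's data; the NA entries stand in for blank cells.

```r
# Hypothetical toy table standing in for countTable.
toy <- data.frame(
  A = c(0L, 5L, NA, 2L),
  B = c(0L, 21L, 3L, NA),
  row.names = c("gene1", "gene2", "gene3", "gene4")
)

# For each column, which() gives the row positions where is.na() is TRUE.
na_rows <- sapply(toy, function(x) which(is.na(x)))
print(na_rows)

# Look up the gene names of the affected rows.
bad_genes <- rownames(toy)[sort(unique(unlist(na_rows)))]
print(bad_genes)
```

The call only reports positions; it does not modify the table. When the columns contain different numbers of NA values, sapply returns a list rather than a vector, which is why the lookup goes through unlist().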


    • #3
      I feel that my data table does follow the correct parameters. From my understanding of the commands I have used thus far:

      countTable <- read.csv( "~/Desktop/mergedcounts.csv", header=TRUE, row.names=1)

      This creates the data.frame that is needed. With header=TRUE, the command reads my table and treats the first row as column headers; in my .csv file these are A and B for my 2 samples. The row.names=1 argument should say that column one holds my row names, which in my .csv file are gene names. All of my values are read counts, which are whole, positive numbers with many 0s as well, so this should satisfy the integer requirement.

      If "header=TRUE" takes row 1 as the column names and "row.names=1" takes the first column as the row names, that leaves me with only positive whole numbers and 0s, which should satisfy the integer requirement. So why do I keep getting the error:
      "Error in if (any(round(countData) != countData)) stop("The countData is not integer.") : missing value where TRUE/FALSE needed" ?

      My data prints fine in R, so the table seems to be importing correctly, but I still don't understand why the error keeps occurring. Could my header or row.names arguments not be separating the letters/gene names correctly? Could the conditions I set up with "conds <- factor( c( "highfert", "lowfert" ) )" be messing things up when trying to create the cds?

      Creating the cds seems like it should be a simple step especially with my data apparently printing correctly within the R console when checking the countTable. Without the cds working I can't do any actual analysis within DESeq.

      I read up on sapply and tried your sapply command, but it did not change anything about the error, and I am not entirely sure of the basis for using sapply in this instance.

      Any other input would be very welcome.

      Thank You,
      ccard28
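A table can print without complaint and still contain a missing value. The sketch below, which uses a made-up three-gene CSV (the blank count for AAAS is an assumption for illustration), shows how a blank cell is read as NA and how the check quoted in the error message then fails.

```r
# Hypothetical CSV text with one blank count, as can happen after copy-pasting in Excel.
csv_text <- "gene,A,B
A1BG,0,0
A2ML1,0,21
AAAS,,1"

toy <- read.csv(text = csv_text, header = TRUE, row.names = 1)

# The blank cell is read as NA, although the table still prints normally.
print(toy)

# The comparison propagates NA, so any() returns NA instead of TRUE or FALSE.
check <- any(round(toy) != toy)
print(check)

# if() refuses an NA condition; capture the error message instead of stopping.
msg <- tryCatch(
  if (check) "not integer" else "ok",
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the message is less about the counts not being whole numbers and more about the integer check itself hitting a missing value.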


      • #4
        You might just cut to the chase and:

        Code:
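        which(is.na(countTable) | round(countTable) != countTable, arr.ind = TRUE)
        to see the row and column of each entry in (the matrix conversion of) countTable that's causing problems. The is.na() term matters because which() silently skips NA comparisons.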

        The point of Wolfgang's sapply method was to list, for each column, the row numbers of the cells that are NA, so you can see which cells of your table are causing the problem. It won't actually change anything, but instead prints the results to the screen. You can easily find out how many cells are NA with:

        Code:
        sum(is.na(countTable))
        You'll find a basic fluency in R to be extremely useful in bioinformatics.
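As a rough sketch of both ideas on a made-up table (`toy` and its values are assumptions, including a deliberately non-whole 2.5), the following counts the NA cells and reports the row and column of every entry that is either missing or not a whole number.

```r
# Hypothetical toy table: two NA cells and one non-integer value.
toy <- data.frame(
  A = c(0, 5, NA, 2.5),
  B = c(0, 21, 3, NA),
  row.names = c("gene1", "gene2", "gene3", "gene4")
)

# Total number of NA cells, and the count per column.
n_na <- sum(is.na(toy))
na_per_col <- colSums(is.na(toy))
print(n_na)
print(na_per_col)

# Flag cells that are NA or not whole numbers; arr.ind = TRUE reports row and column.
m <- as.matrix(toy)
bad <- is.na(m) | (!is.na(m) & round(m) != m)
bad_cells <- which(bad, arr.ind = TRUE)
print(bad_cells)
```

Guarding the comparison with !is.na(m) keeps NA out of the logical matrix, so which() sees every problem cell.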


        • #5
          Thank you both very much for your input. I was able to interpret the sapply function that you both mentioned and determine the 2 rows that had missing values that were causing problems with my cds creation. Without the sapply I never would have found them amongst the thousands of rows, much appreciated.
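One possible follow-up, sketched on a made-up table (`toy` is hypothetical): drop the rows with missing values and repeat the check from the error message before building the cds. Whether to drop such rows or to go back to the original count files and restore the real values is a judgment call that depends on why the cells were empty.

```r
# Hypothetical toy table with two incomplete rows.
toy <- data.frame(
  A = c(0L, 5L, NA, 2L),
  B = c(0L, 21L, 3L, NA),
  row.names = c("gene1", "gene2", "gene3", "gene4")
)

# Keep only genes that have a count in every sample.
clean <- toy[complete.cases(toy), , drop = FALSE]
print(clean)

# Repeat the check quoted in the error message; it should now be a plain FALSE.
still_bad <- any(round(clean) != clean)
print(still_bad)
```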
