  • DESeq2 - some values in assay are negative

    Dear all,

    I am a first-time DESeq2 user, and I am already stuck importing my dataset.

    My RNA-seq data went through the HISAT2 - StringTie pipeline, and I created a gene counts file using the Python script provided with StringTie.

    As far as I can tell, my gene count data set looks just fine, except that there is something weird going on with negative values, and I have no idea what.

    I am trying to import the data into DESeq2 with the DESeqDataSetFromMatrix function.

    Here's a step-by-step version of what I have done so far:

    # Import data file that contains gene counts
    countdata <- as.matrix(read_excel("DEseqcounts.xlsx"),header=TRUE)
    # take row names from the first column
    rownames(countdata) <- countdata[ , 1]
    # first column is now duplicated, so remove
    countdata <- countdata[,-1]

    # Import data file that contains phenotype data in columns
    coldata=as.matrix(read_excel("coldata.xlsx"),header=TRUE)
    # take row names from the first column
    rownames(coldata) <- coldata[ , 1]
    # first column is now duplicated, so remove
    coldata <- coldata[,-1]

    (I have visually checked that the files are imported correctly, and I can't seem to find anything that looks wrong)

    I would like to run the DESeqDataSetFromMatrix as follows:

    DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ treatment, tidy = FALSE, ignoreRank = FALSE)

    which returns this error message:
    Error in DESeqDataSet(se, design = design, ignoreRank) : some values in assay are negative

    Indeed, there seem to be values in my "countdata" object that are somehow classified as negative:

    Subsetting with countdata["" < 0] returns values like the following (R truncated the printout and omitted 1280373 entries):

    [1] " 0" " 0" " 0" " 0" " 5" " 0" " 26" " 104" " 10" " 24"
    [11] " 22" " 3" " 22" " 0" " 226" " 0" " 152" " 2" " 153" " 178"
    [21] " 0" " 2" " 427" " 153" " 0" " 475" " 0" " 0" " 16" " 101"
    [31] " 78" " 26" " 71" " 372" " 35" " 17" " 108" " 100" " 43" " 0"

    I have no idea where this comes from. I couldn't find any negative, empty or NA cells in my count data file, nor are there any spaces in the cells.

    Does anyone have a solution, or an idea on what went wrong?

    Any help is highly appreciated,

    Thanks so much!

  • #2
    It looks like you have an extra space in front of all of your numbers and that's screwing everything up. Fix how the values are imported and ensure they're actually numbers and not strings.
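
    To illustrate the point, here is a minimal sketch in base R. The vector x is a made-up stand-in for a few of the values shown in the first post, not the poster's real data. When a string is compared with a number, R converts the number to a string and compares text, and the result of that text comparison depends on the locale's collation rules, so no output is claimed for it here.

    ```r
    # Hypothetical values copied from the style of the printout above:
    # character strings with a leading space, not numbers
    x <- c(" 0", " 5", " 26", " 104")

    # Comparing strings with a number coerces 0 to "0",
    # so this is a text comparison, not a numeric one
    text_cmp <- x < 0

    # as.numeric() tolerates surrounding whitespace and returns real numbers
    x_num <- as.numeric(x)

    # Now the comparison is numeric, and nothing is negative
    any(x_num < 0)
    ```

    Once the values are real numbers, the check for negative values behaves as expected.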


    • #3
      I'm not so familiar with the StringTie pipeline, but I recommend avoiding Excel for most NGS-related analyses (see Zeeberg et al. 2004: Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics).

      Can you use the Python script to get simple csv/tsv output?
      [Update]
      The prepDE.py script produces csv files. Import these directly into R; any selection and computation you've done with Excel can be done there as well.
      Last edited by Michael.Ante; 11-29-2016, 12:15 AM.
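
      A minimal sketch of importing such a csv directly, using only base R. The file contents, the column names (gene_id, sample1, sample2) and the temporary path are made up for illustration; the real file from prepDE.py will look different.

      ```r
      # Write a tiny stand-in for a gene count csv (hypothetical contents)
      csv_path <- tempfile(fileext = ".csv")
      writeLines(c("gene_id,sample1,sample2",
                   "geneA,0,5",
                   "geneB,26,104"), csv_path)

      # row.names = 1 moves the ID column into the row names,
      # so only the count columns remain in the data frame
      counts_df <- read.csv(csv_path, row.names = 1, check.names = FALSE)

      # With only numeric columns left, as.matrix() gives a numeric matrix
      countdata <- as.matrix(counts_df)
      storage.mode(countdata)
      ```

      Moving the gene IDs into the row names at read time means the ID column never ends up inside the matrix.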


      • #4
        I have double-checked and there is no extra space in any of my cells;
        that is actually why I later saved this file as Excel.

        The Python script gives me the gene counts in csv format. I have of course tried that too, and it gives the same error.

        Using the same file in edgeR for example works without issues.


        • #5
          Try as a first solution:
          countdata <- as.matrix(read_excel("DEseqcounts.xlsx"),header=TRUE, row.names=1)

          Then check
          summary(is.numeric(countdata[,1]))

          Maybe there are some empty lines at the end, which cause R to read the column as factors rather than numbers. You can check this with tail(countdata).
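
          The checks above can be sketched as follows, with a way to locate cells that do not parse as numbers. The matrix m is a made-up example with one empty cell, standing in for countdata.

          ```r
          # Hypothetical character matrix with one empty cell
          m <- matrix(c("0", "5", "", "104"), nrow = 2,
                      dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

          # FALSE for a character matrix
          is.numeric(m)

          # Inspect the last rows for stray empty lines
          tail(m)

          # Cells that cannot be parsed as numbers become NA
          parsed <- suppressWarnings(as.numeric(m))
          bad_idx <- which(is.na(parsed))

          # Row and column positions of the problem cells
          bad_pos <- arrayInd(bad_idx, dim(m))
          ```

          If bad_idx is empty, every cell can be converted safely and the problem is only the storage type.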


          • #6
            The class of countdata[,1] is "character"

            summary(is.numeric(countdata[,1]))
               Mode   FALSE    NA's
            logical       1       0

            class(countdata[,1])
            [1] "character"

            That should be the issue I guess?

            Thanks for your help!
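
            For reference, a minimal sketch of how a character matrix with padded values can arise, and one way to get integer counts instead. The data frame df is a made-up stand-in for what the import returns (an ID column plus count columns); it is an assumption, not the actual data.

            ```r
            # Hypothetical import result: gene IDs in the first column, counts after
            df <- data.frame(gene_id = c("geneA", "geneB", "geneC"),
                             sample1 = c(0, 26, 104),
                             sample2 = c(5, 0, 226),
                             stringsAsFactors = FALSE)

            # With a character column present, as.matrix() returns a character
            # matrix; numeric columns pass through format(), which pads each
            # column to a common width with leading spaces
            bad <- as.matrix(df)
            bad[, "sample1"]

            # Drop the ID column before converting, then restore the IDs as row names
            good <- as.matrix(df[, -1])
            rownames(good) <- df$gene_id

            # Store the counts as integers
            storage.mode(good) <- "integer"
            ```

            Taking the row names after as.matrix(), as in the first post, is too late: by then the whole matrix is already character.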
