
  • DESeq2 - some values in assay are negative

    Dear all,

    I am a first-time DESeq2 user, and I am already stuck on importing my dataset.

    My RNA-seq data went through the HISAT2 - StringTie pipeline, and I created a gene counts file using the Python script provided with StringTie.

    As far as I can tell, my gene count data set looks just fine, except that there is something weird going on with negative values, and I have no idea what.

    I am trying to import the data into DESeq2 with the DESeqDataSetFromMatrix function.

    Here's a step-by-step version of what I have done so far:

    # Import data file that contains gene counts
    countdata <- as.matrix(read_excel("DEseqcounts.xlsx"),header=TRUE)
    # take row names from the first column
    rownames(countdata) <- countdata[ , 1]
    # first column is now duplicated, so remove
    countdata <- countdata[,-1]

    # Import data file that contains phenotype data in columns
    coldata=as.matrix(read_excel("coldata.xlsx"),header=TRUE)
    # take row names from the first column
    rownames(coldata) <- coldata[ , 1]
    # first column is now duplicated, so remove
    coldata <- coldata[,-1]

    (I have visually checked that the files are imported correctly, and I can't seem to find anything that looks wrong)

    I would like to run the DESeqDataSetFromMatrix as follows:

    DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ treatment, tidy = FALSE, ignoreRank = FALSE)

    which returns this error message:
    Error in DESeqDataSet(se, design = design, ignoreRank) : some values in assay are negative

    Indeed, there seem to be values in my "countdata" object that are somehow classified as negative:

    countdata["" < 0] omitted 1280373 entries, which look like this:

    [1] " 0" " 0" " 0" " 0" " 5" " 0" " 26" " 104" " 10" " 24"
    [11] " 22" " 3" " 22" " 0" " 226" " 0" " 152" " 2" " 153" " 178"
    [21] " 0" " 2" " 427" " 153" " 0" " 475" " 0" " 0" " 16" " 101"
    [31] " 78" " 26" " 71" " 372" " 35" " 17" " 108" " 100" " 43" " 0"

    I have no idea where that comes from. I couldn't find any negative, empty or NA cells in my count data file, nor are there any spaces in the cells.

    Does anyone have a solution, or an idea on what went wrong?

    Any help is highly appreciated,

    Thanks so much!

  • #2
    It looks like you have an extra space in front of all of your numbers and that's screwing everything up. Fix how the values are imported and ensure they're actually numbers and not strings.
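To illustrate the point above, here is a minimal sketch using a toy character matrix (the values and dimnames are made up) that mimics the padded strings shown in the question. When a character value is compared with a number, R converts the number to a string and compares text, so entries can test as "less than 0" without being negative. The collation is set to "C" only to make the comparison reproducible, since string comparison depends on the locale.

```r
# Toy data (hypothetical), padded like the values shown in the question
counts_chr <- matrix(c(" 0", " 5", "26", "104"), nrow = 2,
                     dimnames = list(c("geneA", "geneB"),
                                     c("sample1", "sample2")))

# String comparison is locale dependent; "C" collation makes it reproducible
invisible(Sys.setlocale("LC_COLLATE", "C"))

# The matrix holds strings, not numbers
print(is.numeric(counts_chr))

# 0 is coerced to "0", so this is a text comparison, not a numeric one
flagged <- counts_chr[counts_chr < 0]
print(flagged)

# Convert the storage mode; dim and dimnames are preserved
counts_int <- counts_chr
storage.mode(counts_int) <- "integer"
print(counts_int)

# After conversion the comparison is numeric
print(any(counts_int < 0))
```

Once the matrix is truly numeric, the negative-value check compares numbers rather than text.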


    • #3
      I'm not so familiar with the StringTie pipeline, but I recommend avoiding Excel for most NGS-related analyses (see Zeeberg et al. 2004, "Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics").

      Can you use the python script to get simple csv/tsv output?
      [Update]
      The prepDE.py script produces CSV files. Import these directly into R; any selection and computation you've done with Excel can be done there as well.
      Last edited by Michael.Ante; 11-29-2016, 12:15 AM.
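A minimal sketch of the direct CSV import suggested above, using base R only. The file contents are a made-up stand-in written to a temporary file, and the column names are assumptions; with real data, point `read.csv` at the CSV that prepDE.py produced.

```r
# Write a tiny stand-in CSV (hypothetical contents) to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("gene_id,sample1,sample2",
             "geneA,0,26",
             "geneB,5,104"),
           csv_path)

# row.names = 1 turns the first column into row names, so only the
# count columns remain and as.matrix() yields a numeric matrix
countdata <- as.matrix(read.csv(csv_path, row.names = 1, check.names = FALSE))

print(countdata)
print(is.numeric(countdata))
```

Passing `row.names = 1` at import time keeps the gene IDs out of the data columns, which is what lets the matrix stay numeric.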


      • #4
        I have double-checked, and there are no extra spaces in any of my cells; that is actually the reason I later saved this file as Excel.

        The Python script gives me the gene counts in CSV format. I have of course tried that too, and it gives the same error.

        Using the same file in edgeR for example works without issues.


        • #5
          Try as a first solution:
          countdata <- as.matrix(read_excel("DEseqcounts.xlsx"),header=TRUE, row.names=1)

          Then check:
          summary(is.numeric(countdata[,1]))

          Maybe there are some empty lines at the end, which cause R to read the columns as factors rather than numbers. You can check this with tail(countdata).
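One way to run the checks described above is sketched below. It assumes a toy character matrix with a made-up empty trailing row, and it locates the rows whose entries cannot be parsed as numbers.

```r
# Toy character matrix (hypothetical) with an empty trailing row
countdata <- matrix(c(" 0", " 5", "", "26", "104", ""), ncol = 2,
                    dimnames = list(c("geneA", "geneB", ""),
                                    c("sample1", "sample2")))

# Look at the last rows and test the type of the whole matrix
print(tail(countdata))
print(is.numeric(countdata))

# Try to parse every entry; anything unparseable becomes NA
parsed <- suppressWarnings(as.numeric(countdata))
bad <- matrix(is.na(parsed), nrow = nrow(countdata))

# Indices of rows containing at least one entry that is not a number
bad_rows <- which(rowSums(bad) > 0)
print(bad_rows)
```

If `bad_rows` is empty, the values are parseable and the problem is only the storage type of the matrix.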


          • #6
            The class of countdata[,1] is "character"

            summary(is.numeric(countdata[,1]))
            Mode FALSE NA's
            logical 1 0

            class(countdata[,1])
            [1] "character"

            That should be the issue I guess?

            Thanks for your help!
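A likely explanation for the character class, sketched here under the assumption that the imported table still contains the gene ID column when `as.matrix()` is called, as in the code in the first post. A matrix can hold only one type, so a single text column turns every column into text, and base R formats the numeric columns to a common width in the process. A base `data.frame` with made-up values stands in for the imported table.

```r
# Stand-in for the imported table (hypothetical values)
raw <- data.frame(gene_id = c("geneA", "geneB", "geneC"),
                  sample1 = c(0, 5, 104),
                  sample2 = c(26, 0, 3),
                  stringsAsFactors = FALSE)

# Converting with the ID column still present gives a character matrix
as_posted <- as.matrix(raw)
print(class(as_posted[, "sample1"]))
print(as_posted[, "sample1"])

# Dropping the ID column before the conversion keeps the counts numeric
countdata <- as.matrix(raw[, -1])
rownames(countdata) <- raw$gene_id
print(countdata)
print(is.numeric(countdata))
```

A numeric matrix built this way can then be passed as `countData` to DESeqDataSetFromMatrix (not run here).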
