  • DESeq: "NA" generated in the resulted differentially expressed genes

    I am using DESeq to analyze my RNA-seq data. However, the resulting list of differentially expressed genes contains a number of "NA" rows. Please see the table below for details. The number of these "NA" genes differs between comparisons.

          id      baseMean     baseMeanA  baseMeanB    foldChange  log2FoldChange  pval      padj      resVarA  resVarB
    NA    NA      NA           NA         NA           NA          NA              NA        NA        NA       NA
    NA.1  NA      NA           NA         NA           NA          NA              NA        NA        NA       NA
    616   GPR128  187.5648803  0          234.4561004  Inf         Inf             1.19E-15  1.16E-12  0        19.90527498


    Has anyone else experienced this before? What could be the problem? Thanks.

  • #2
    Did you check that the same set of gene IDs was used for read counting in every sample (with a count of 0 for genes that have no reads)?
    Marco



    • #3
      Yes, the gene names are consistent.

      I found the problem: in my input read count data, some genes have no reads mapped at all, and those genes cause the NA values in the results.
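
A minimal base R sketch of why zero counts can lead to undefined values. It assumes the fold change is a plain ratio of the two condition means, which is consistent with the `Inf` in the table above (baseMeanA is 0) but is not taken from the DESeq source; the gene names and values are hypothetical.

```r
# Toy normalized mean counts for two conditions (hypothetical values).
baseMeanA <- c(geneX = 0, geneY = 0,        geneZ = 120)
baseMeanB <- c(geneX = 0, geneY = 234.4561, geneZ = 60)

# Assumed definition: fold change as a simple ratio B / A.
foldChange <- baseMeanB / baseMeanA
print(foldChange)
# geneX: 0 / 0 gives NaN; geneY: a positive value / 0 gives Inf; geneZ: 0.5

# is.na() reports NaN as missing, so an all-zero gene looks like an "NA" gene
# to any later filtering step.
naFlags <- is.na(foldChange)
print(naFlags)
```

Genes with zero reads in only one condition still get a defined (infinite) ratio; only genes with zero reads everywhere end up undefined.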



      • #4
        Specifically, what do you mean by no reads mapped at all?

        Do you mean that most samples within one group have no reads at all for that gene, and thus show "0"?

        Visually, suppose there are 2 samples per group

        [group1] sample 1: 0, sample 2: 0
        [group2] sample 1: 1, sample 2: 2

        ?
        Marco



        • #5
          idyll_ty, have you checked your data??
          Marco



          • #6
            Typically, such entries appear in R when you subset with a conditional expression that contains or evaluates to NA. Please post your full R code (and the output of sessionInfo()), then we can have a look.
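
The subsetting behaviour described here can be reproduced in base R without DESeq. This is a minimal sketch; the data frame `res` and its `padj` values are made up for illustration.

```r
# Toy results table; the ids and padj values are hypothetical.
res <- data.frame(
  id   = c("g1", "g2", "g3", "g4"),
  padj = c(0.01, NA, 0.20, 0.03),
  stringsAsFactors = FALSE
)

# A logical index that contains NA yields an all-NA row for each NA entry.
withNA <- res[res$padj < 0.05, ]
print(withNA)

# which() returns only the positions that are TRUE, so the NA entry is dropped.
clean <- res[which(res$padj < 0.05), ]
print(clean)
```

Wrapping the condition in `which()` is one common way to avoid the NA rows; `subset(res, padj < 0.05)` behaves the same way, because it also treats NA in the condition as FALSE.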



            • #7
              Dear DESeq experts,

              Apologies for continuing this thread - I have an identical problem.

              I am trying to identify differentially expressed genes (DEGs) in a known dataset, and I would like to understand why over 80% of the entries in the result obtained from the counts table with DESeq_1.8.2 are 'NA'. I have seen similar queries in the forum, but I believe I am using the latest release of DESeq rather than a development version. If this issue is fixed in an updated version, please let us know, along with how to load that library in R. Thanks.
              Best,
              sarosh


              My workflow has two steps:


              Part_1- extract dataset (pasilla dataset)
              Part_2- use DESeq library calls to identify DEGs and show the 'NA' values.

              This dataset (countstable.txt) has
              14,470 entries with count information, of which
              ~2,500 entries with count information 0 for all case replicates


              Code:
              ################################################
              #
              #Part_1- extract a dataset
              #
              
              rm(list = ls());
              
              #require(DESeq);
              require(pasilla);
              
              data("pasillaGenes");
              
              head(counts(pasillaGenes));
              
              #save_data to view and contrast
              write.table(counts(pasillaGenes), file="countstable.txt", quote=FALSE, sep="  ", row.names=TRUE);

              ################################################

              #edit countstable.txt - remove header
              #count the number of entries with all counts 0
              # (use grep command ..)
              #start R again


              Code:
              ################################################
              ################################################
              #
              #Part_2
              
              require(DESeq);
              require(pasilla);
              
              countsTable <-read.table("countstable.txt", header=TRUE, stringsAsFactors=TRUE)
              rownames( countsTable ) <- countsTable$gene
              countsTable <- countsTable[,-1]
              conds=c("U","U","U","U","T","T","T");
              
              cds <- newCountDataSet( countsTable, conds);
              cds <-estimateSizeFactors(cds);
              
              #normcds <- counts( cds, normalized=TRUE );
              #write.table(normcds, file="normalized.countstable.txt", quote=FALSE, sep="\t", row.names=TRUE);
              
              cds <- estimateDispersions( cds, sharingMode="fit-only" );
              res <- nbinomTest(cds, "U","T");
              
              resSig <- res[ res$padj < 0.05,];
              resSig <- resSig[ order(resSig$pval), ];
              write.table(resSig, file="DEGsig_list.txt", quote=FALSE, sep="\t", row.names=FALSE);
              
              #############################################
              Final list of DEG has a large majority of NA entries.



              • #8
                When you build resSig with res[ res$padj < 0.05, ], R drops the rows that fail the cutoff but returns an all-NA row for every gene whose padj is itself NA, because the comparison evaluates to NA rather than FALSE. The presence of NA in these rows is not a problem.

                To get rid of them, just use the na.omit function:

                Code:
                resSig<-na.omit(resSig)
                This will omit all lines that have an NA, leaving you only those lines with differentially expressed genes.
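
One thing to keep in mind is that `na.omit` drops a row if any column is NA, not only `padj`. A minimal sketch of the difference, using a made-up table with a hypothetical `symbol` annotation column that is not part of the DESeq output:

```r
# Toy filtered table: one all-NA row plus a real hit that lacks an annotation.
# The "symbol" column is a hypothetical extra column added for illustration.
resSig <- data.frame(
  id     = c("g1", NA, "g4"),
  symbol = c("GPR128", NA, NA),
  padj   = c(0.01, NA, 0.03),
  stringsAsFactors = FALSE
)

# na.omit() removes every row with an NA in any column, so g4 is lost too.
cleaned <- na.omit(resSig)
print(cleaned)

# Filtering on padj alone removes only the all-NA row.
byPadj <- resSig[!is.na(resSig$padj), ]
print(byPadj)
```

If the table contains only the DESeq result columns, both approaches are likely to give the same rows; the difference matters once extra columns with missing values are merged in.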



                • #9
                  I usually trim out genes with zero counts across all samples before calling newCountDataSet, like so:

                  Code:
                  mycounts <- mycounts[rowSums(mycounts) > 0,]
                  /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                  Salk Institute for Biological Studies, La Jolla, CA, USA */
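
A small self-contained sketch of the zero-count filter above on a toy matrix. The gene and sample names are hypothetical, and counting the all-zero rows first is an optional check, similar to the grep step mentioned earlier in the thread.

```r
# Toy count matrix: rows are genes, columns are samples (hypothetical values).
mycounts <- matrix(
  c(0, 0,  0, 0,
    5, 0,  3, 1,
    0, 0, 12, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("U1", "U2", "T1", "T2"))
)

# Count how many genes have no reads in any sample.
nZero <- sum(rowSums(mycounts) == 0)
print(nZero)

# Keep only genes with at least one read; drop = FALSE keeps the matrix shape
# even if a single gene remains.
filtered <- mycounts[rowSums(mycounts) > 0, , drop = FALSE]
print(filtered)
```

Note that geneC, which has zero reads in only one condition, is kept; the filter removes only genes that are zero everywhere.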

