
  • DESeq: "NA" rows generated in the resulting differentially expressed genes

    I am using DESeq to analyze my RNA-seq data. However, I found that my list of differentially expressed genes contains a bunch of "NA" rows. Please see the table below for details. The number of these "NA" genes differs between comparisons.

          id      baseMean     baseMeanA  baseMeanB    foldChange  log2FoldChange  pval      padj      resVarA  resVarB
    NA    NA      NA           NA         NA           NA          NA              NA        NA        NA       NA
    NA.1  NA      NA           NA         NA           NA          NA              NA        NA        NA       NA
    616   GPR128  187.5648803  0          234.4561004  Inf         Inf             1.19E-15  1.16E-12  0        19.90527498


    Has anyone else experienced this before? What could be the problem? Thanks.

  • #2
    Did you check that the same set of gene IDs was used for read counting in every sample (with genes that have no reads recorded as 0)?
    Marco



    • #3
      Yes, the gene names are consistent.

      I found the problem: in my input read count data, some genes have no reads mapped at all, and those genes cause the NA values in the results.



      • #4
        Specifically, what do you mean by no reads mapped at all?

        Do you mean that the majority of samples within one group have no reads at all, and thus show "0"?

        Visually, suppose there are 2 samples per group

        [group1] sample 1: 0, sample 2: 0
        [group2] sample 1: 1, sample 2: 2

        ?
        Marco



        • #5
          idyll_ty, have you checked your data??
          Marco



          • #6
            Typically, such entries appear in R when subsetting with a conditional expression that may contain or result in NA. Please post your full R code (and the output of sessionInfo()), then we can have a look.
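
The behaviour described above can be reproduced with a small toy table. This is only a sketch: the data frame, gene names, and values below are invented for illustration and are not DESeq output.

```r
# Toy results table; gene names and values are invented for illustration.
res <- data.frame(
  id   = c("geneA", "geneB", "geneC", "geneD"),
  padj = c(0.01, NA, 0.50, NA),
  stringsAsFactors = FALSE
)

# The comparison evaluates to NA wherever padj is NA.
keep <- res$padj < 0.05

# Indexing rows with a logical vector that contains NA returns an
# all-NA row for each NA, with row names "NA", "NA.1", and so on.
sig_with_na <- res[keep, ]
print(sig_with_na)
```

Rows that simply fail the cutoff (geneC here) are dropped as expected; only the rows whose condition is NA come back as all-NA rows.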



            • #7
              Dear DESeq experts,

              Apologies for continuing this thread - I have an identical problem.

              I am trying to identify differentially expressed genes (DEGs) from a known dataset, and to understand why over 80% of the entries in the results extracted from the counts table via DESeq_1.8.2 have 'NA' values. I have seen similar queries in the forum, but I believe I am using the latest release of DESeq that is not a development version. However, if this issue is fixed in an updated version, please let us know, along with how to load that library in R. Thanks.
              Best,
              sarosh


              My workflow has two steps:


              Part_1- extract dataset (pasilla dataset)
              Part_2- use DESeq library calls to identify DEGs and show the 'NA' values.

              This dataset (countstable.txt) has
              14,470 entries with count information, of which
              ~2,500 entries have a count of 0 in all case replicates


              Code:
              ################################################
              #
              #Part_1- extract a dataset
              #
              
              rm(list = ls());
              
              #require(DESeq);
              require(pasilla);
              
              data("pasillaGenes");
              
              head(counts(pasillaGenes));
              
              #save_data to view and contrast
              write.table(counts(pasillaGenes), file="countstable.txt", quote=FALSE, sep="  ", row.names=TRUE);

              ################################################

              #edit countstable.txt - remove header
              #count the number of entries with all counts 0
              # (use grep command ..)
              #start R again


              Code:
              ################################################
              ################################################
              #
              #Part_2
              
              require(DESeq);
              require(pasilla);
              
              countsTable <-read.table("countstable.txt", header=TRUE, stringsAsFactors=TRUE)
              rownames( countsTable ) <- countsTable$gene
              countsTable <- countsTable[,-1]
              conds=c("U","U","U","U","T","T","T");
              
              cds <- newCountDataSet( countsTable, conds);
              cds <-estimateSizeFactors(cds);
              
              #normcds <- counts( cds, normalized=TRUE );
              #write.table(normcds, file="normalized.countstable.txt", quote=FALSE, sep="\t", row.names=TRUE);
              
              cds <- estimateDispersions( cds, sharingMode="fit-only" );
              res <- nbinomTest(cds, "U","T");
              
              resSig <- res[ res$padj < 0.05,];
              resSig <- resSig[ order(resSig$pval), ];
              write.table(resSig, file="DEGsig_list.txt", quote=FALSE, sep="\t", row.names=FALSE);
              
              #############################################
              The final list of DEGs consists largely of NA entries.



              • #8
                When you build resSig, the condition res$padj < 0.05 evaluates to NA for every row of res whose padj is NA, and R writes an all-NA line for each of those rows; lines that simply fail the cutoff are dropped. The presence of NA in these lines is not a problem.

                To get rid of them, just use the na.omit function:

                Code:
                resSig<-na.omit(resSig)
                This will omit all lines that have an NA, leaving you only those lines with differentially expressed genes.
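
As an alternative sketch, the NA rows can be avoided at the subsetting step itself by wrapping the condition in which(), which ignores NA. The column names below mimic the results table shown in this thread, but the values are invented for illustration.

```r
# Toy results table; values are invented, column names follow the thread.
res <- data.frame(
  id      = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  pval    = c(0.001, NA, 0.2, 0.0005, NA),
  padj    = c(0.01, NA, 0.5, 0.004, NA),
  resVarA = c(1.2, NA, 0.8, NA, NA),
  stringsAsFactors = FALSE
)

# which() returns only the indices where the condition is TRUE,
# so rows with an NA padj never produce all-NA rows.
sig <- res[which(res$padj < 0.05), ]
sig <- sig[order(sig$pval), ]

# For comparison: na.omit() drops every row with an NA in ANY column,
# so geneD (significant, but NA in resVarA) is lost as well.
sig_naomit <- na.omit(res[res$padj < 0.05, ])
```

The difference only matters if a significant gene happens to carry an NA in some other column; otherwise both approaches give the same genes.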



                • #9
                  I usually trim out zero-count genes (zero across all samples) before calling newCountDataSet, like so:

                  Code:
                  mycounts <- mycounts[rowSums(mycounts) > 0,]
                  /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                  Salk Institute for Biological Studies, La Jolla, CA, USA */
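
A minimal illustration of that filter on a toy count matrix follows. The gene and sample names are invented, and drop = FALSE is an addition here so that the result stays a matrix even if only one gene survives.

```r
# Toy count matrix: 3 genes x 4 samples (names invented for illustration).
mycounts <- matrix(
  c(0, 0, 0, 0,   # geneA: no reads in any sample
    5, 3, 0, 1,   # geneB
    0, 0, 2, 4),  # geneC: zero in one group only
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("U1", "U2", "T1", "T2"))
)

# Keep only genes with at least one read across all samples.
filtered <- mycounts[rowSums(mycounts) > 0, , drop = FALSE]
print(filtered)
```

Note that a gene with zero counts in only one condition (geneC here) is kept; such genes are the ones that can show Inf fold changes, like the GPR128 row in the first post.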
