
**marcowanger** · 11-01-2011, 08:21 PM

Originally posted by idyll_ty

I am using DESeq to analyze my RNA-seq data. However, in the list of differentially expressed genes I generated, there are a bunch of "NA" rows. Please see the attached table for details. The number of these "NA" genes differs between comparisons.

     id     baseMean    baseMeanA baseMeanB   foldChange log2FoldChange pval     padj     resVarA resVarB
NA   NA     NA          NA        NA          NA         NA             NA       NA       NA      NA
NA.1 NA     NA          NA        NA          NA         NA             NA       NA       NA      NA
616  GPR128 187.5648803 0         234.4561004 Inf        Inf            1.19E-15 1.16E-12 0       19.90527498

Has anyone else experienced this before? What could be the problem? Thanks.

Did you check that the same set of gene IDs was used for read counting in every sample (with a count of 0 for genes that have no reads)?

**idyll_ty** · 11-01-2011, 08:49 PM

Yes, the gene names are consistent.

I found the problem. In my input read count data, some genes have no reads mapped at all, and those genes cause the NA values in the results.

**marcowanger** · 11-01-2011, 09:19 PM

Specifically, do you mean there are no reads mapped at all?

Or do you mean that the majority of samples within one group have no reads at all, and thus show "0"?

To illustrate, suppose there are 2 samples per group:

[group1] sample 1: 0, sample 2: 0
[group2] sample 1: 1, sample 2: 2

?

**marcowanger** · 11-04-2011, 09:44 PM

Originally posted by marcowanger

Specifically, do you mean there are no reads mapped at all?

Or do you mean that the majority of samples within one group have no reads at all, and thus show "0"?

To illustrate, suppose there are 2 samples per group:

[group1] sample 1: 0, sample 2: 0
[group2] sample 1: 1, sample 2: 2

?

idyll_ty, have you checked your data?

**Simon Anders** · 11-05-2011, 11:30 AM

Typically, such entries appear in R when subsetting with a conditional expression that may contain or result in NA. Please post your full R code (and the output of sessionInfo()), then we can have a look.
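
The subsetting behaviour described here can be reproduced in base R without DESeq. The following is only a minimal sketch: the data frame `res` and its values are made up for illustration and merely stand in for a real result table.

```r
# Toy stand-in for a result table (hypothetical values, not real DESeq output).
res <- data.frame(
  id   = c("geneA", "geneB", "geneC", "geneD"),
  padj = c(0.01, NA, 0.20, NA),
  stringsAsFactors = FALSE
)

# The condition evaluates to TRUE, NA, FALSE, NA.
# Each NA in a logical row index yields a row filled with NA.
bad <- res[res$padj < 0.05, ]
print(bad)  # geneA plus two all-NA rows with row names "NA" and "NA.1"

# Two ways to keep only rows where the condition is definitely TRUE.
good_which <- res[which(res$padj < 0.05), ]
good_isna  <- res[!is.na(res$padj) & res$padj < 0.05, ]
print(good_which)
```

Both `which()` and the explicit `!is.na()` test drop the rows where the comparison is NA, so either should avoid the "NA" and "NA.1" rows seen in the table above.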

**fatakias** · 05-02-2012, 02:34 PM

Dear DESeq experts,

Apologies for continuing this thread - I have an identical problem.

I am trying to identify differentially expressed genes (DEGs) from a known dataset, and I would like to understand why over 80% of the entries in the result extracted from the counts table with DESeq_1.8.2 have 'NA' values. I have seen similar queries in the forum, but I believe I am using the latest release of DESeq rather than a development version. If this issue is fixed in an updated version, please let us know, and also how to load that library in R. Thanks.
Best,
sarosh

My workflow has two steps:

Part_1- extract dataset (pasilla dataset)
Part_2- use DESeq library calls to identify DEGs and show the 'NA' values.

This dataset (countstable.txt) has
14,470 entries with count information, of which
~2,500 entries have a count of 0 for all case replicates

Code:

################################################
#
#Part_1- extract a dataset
#

rm(list = ls());

#require(DESeq);
require(pasilla);

data("pasillaGenes");

head(counts(pasillaGenes));

#save_data to view and contrast
write.table(counts(pasillaGenes), file="countstable.txt", quote=FALSE, sep="  ", row.names=TRUE);

################################################

#edit countstable.txt - remove header
#count the number of entries with all counts 0
# (use grep command ..)
#start R again

Code:

################################################
################################################
#
#Part_2

require(DESeq);
require(pasilla);

countsTable <-read.table("countstable.txt", header=TRUE, stringsAsFactors=TRUE)
rownames( countsTable ) <- countsTable$gene
countsTable <- countsTable[,-1]
conds=c("U","U","U","U","T","T","T");

cds <- newCountDataSet( countsTable, conds);
cds <-estimateSizeFactors(cds);

#normcds <- counts( cds, normalized=TRUE );
#write.table(normcds, file="normalized.countstable.txt", quote=FALSE, sep="\t", row.names=TRUE);

cds <- estimateDispersions( cds, sharingMode="fit-only" );
res <- nbinomTest(cds, "U","T");

resSig <- res[ res$padj < 0.05,];
resSig <- resSig[ order(resSig$pval), ];
write.table(resSig, file="DEGsig_list.txt", quote=FALSE, sep="\t", row.names=FALSE);

#############################################

The final list of DEGs consists largely of NA entries.

**chadn737** · 05-02-2012, 03:09 PM

When you make resSig, every line of res whose padj is NA comes through the subset as a line of NAs, while lines that simply fail the cutoff are dropped. The presence of NA in these lines is not a problem.

To get rid of them, just use the na.omit function:

Code:

resSig<-na.omit(resSig)

This will omit all lines that have an NA, leaving you only those lines with differentially expressed genes.
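
One thing worth knowing about `na.omit` is that it drops a row if any column is NA, not only `padj`. A small sketch of the difference follows, using a made-up `resSig` (the gene names and values are hypothetical):

```r
# Toy resSig with one all-NA row and one real row that has NA in another column.
resSig <- data.frame(
  id      = c("geneA", NA, "geneB"),
  padj    = c(1e-3, NA, 2e-2),
  resVarA = c(0.5, NA, NA),
  stringsAsFactors = FALSE
)

# na.omit removes every row containing at least one NA,
# so geneB is lost along with the all-NA row.
cleaned <- na.omit(resSig)
print(cleaned)

# Filtering on padj alone keeps geneB.
by_padj <- resSig[!is.na(resSig$padj), ]
print(by_padj)
```

If the other columns of your table never contain NA for tested genes, the two approaches should give the same result.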

**sdriscoll** · 05-02-2012, 03:28 PM

I usually trim out zero-count genes (across all samples) before calling newCountDataSet, like so:

Code:

mycounts <- mycounts[rowSums(mycounts) > 0,]
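
To see what this filter does, here is a small sketch on a made-up count matrix (the gene and sample names are hypothetical):

```r
# Toy count matrix: 3 genes x 4 samples (2 untreated, 2 treated).
mycounts <- matrix(
  c(0, 0, 0, 0,   # gene1: no reads in any sample
    5, 3, 0, 1,   # gene2: reads in both groups
    0, 0, 2, 4),  # gene3: reads in one group only
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene1", "gene2", "gene3"),
                  c("U1", "U2", "T1", "T2"))
)

# Keep genes with at least one read in total across all samples.
# drop = FALSE keeps the result a matrix even if only one gene survives.
keep     <- rowSums(mycounts) > 0
filtered <- mycounts[keep, , drop = FALSE]
print(filtered)
```

Only gene1 is removed. A gene like gene3, with zero counts in one group only, stays in the table, which is the situation behind the `Inf` fold change shown for GPR128 earlier in the thread.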


DESeq: "NA" generated in the resulted differentially expressed genes
