**Wolfgang Huber** · 04-09-2012, 12:45 AM

Dear Kentnf,

In a nutshell, you are asking two different questions, and so you get two different answers. In the first case, you estimate the dispersions from all samples; in the second, only from the two samples c24 and t24. The dispersions are parameters that DESeq estimates for each gene and that represent the variance that DESeq's error model expects between replicate measurements. It seems that in test0 the dispersions are estimated to be smaller than in test1, so slightly smaller fold changes are already deemed significant, and more genes are called differentially expressed.

Which method is correct? The first point to note is that you are comparing just two single samples, without replicates, so it is expected that the results are highly volatile. With a sufficient number of replicates and good-quality data, the differences should be less drastic. So don't overinterpret your result from just these two samples, and also try it out for other samples and more replicates.

Second, you are asking two different questions, and it is up to you to decide which is the one you care about. This is not a technical question, it is one that only you can decide.

Third, ideally it should not matter much. Plot the -log10(p-values) from the two tests against each other in a scatterplot. I'd expect that they broadly agree (at the high values), and that much of the difference in the number of genes that you see is an effect of arbitrary thresholding (i.e. genes that just make it below p_adj < 0.05 in one case might be just above it in the other).
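To make the thresholding point concrete, here is a minimal sketch in plain Python (not DESeq code). It assumes you have exported the adjusted p-values of both tests as gene-to-value mappings; the gene names, the numbers, and the helper names `neg_log10` and `compare_tests` are made up for illustration.

```python
import math

def neg_log10(p, floor=1e-300):
    """Return -log10(p), flooring p to avoid log(0)."""
    return -math.log10(max(p, floor))

def compare_tests(padj_a, padj_b, alpha=0.05):
    """Compare two sets of adjusted p-values (dicts: gene -> p_adj).

    Returns the scatterplot coordinates and the genes called
    significant in both tests, only in A, and only in B.
    """
    shared = sorted(set(padj_a) & set(padj_b))
    points = [(g, neg_log10(padj_a[g]), neg_log10(padj_b[g])) for g in shared]
    both = [g for g in shared if padj_a[g] < alpha and padj_b[g] < alpha]
    only_a = [g for g in shared if padj_a[g] < alpha <= padj_b[g]]
    only_b = [g for g in shared if padj_b[g] < alpha <= padj_a[g]]
    return points, both, only_a, only_b

# Hypothetical adjusted p-values, invented for this example only.
test0 = {"g1": 1e-8, "g2": 0.04, "g3": 0.5, "g4": 0.03}
test1 = {"g1": 1e-6, "g2": 0.06, "g3": 0.6, "g4": 0.02}

points, both, only0, only1 = compare_tests(test0, test1)
for gene, x, y in points:
    print(f"{gene}\t{x:.2f}\t{y:.2f}")
print("both:", both, "only test0:", only0, "only test1:", only1)
```

In this toy data, g2 is counted as a difference between the two tests although its adjusted p-values (0.04 and 0.06) sit right next to each other on either side of the cutoff. The `points` list can be passed to any plotting tool to draw the scatterplot.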

About the filtering of genes with overall low counts (i.e. the sum of counts across all samples is so small that the gene could never be significantly differentially expressed anyway): this is good practice. See Section 5 of the DESeq vignette.
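As a rough illustration of such a filter, here is a small sketch in plain Python. It is not the procedure from the vignette: the cutoff `min_total=10`, the helper name `filter_low_counts`, and the count table are assumptions chosen only for the example.

```python
def filter_low_counts(counts, min_total=10):
    """Keep genes whose summed count across all samples is >= min_total.

    counts: dict mapping gene -> list of per-sample raw counts.
    min_total: hypothetical cutoff; choose it to suit your own data.
    """
    return {gene: c for gene, c in counts.items() if sum(c) >= min_total}

# Hypothetical raw counts for four samples, invented for this example.
counts = {
    "g1": [120, 98, 130, 110],
    "g2": [0, 1, 0, 2],
    "g3": [0, 0, 0, 0],
    "g4": [5, 3, 4, 6],
}

kept = filter_low_counts(counts)
print(sorted(kept))
```

The filter uses only the total count per gene, not the condition labels, so it does not depend on which samples are being compared.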

**kentnf** · 04-09-2012, 08:14 AM

Hi, Wolfgang Huber

I compared test0 and test1. All the DE genes in test1 are also included in test0. I think this is because of the different variance estimates you mentioned. Our experiment has a design problem, so we cannot add more replicates. Based on your suggestions and the current results, I'd like to use the test1 code for the analysis, since it seems more reliable than test0. I should also remove the genes without expression.

If there is any problem, please give me some advice.

Thanks again for your help.

Question about different results generated by DESeq.
