02-09-2012, 12:19 PM   #8
Simon Anders
Senior Member

Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Quote:
 Originally Posted by ETHANol
 So why is it bad to threshold raw p-values?
Didn't see the question until now, but to not leave it unanswered:

Imagine your genome has 10,000 genes. You think that some of them are differentially expressed but, in reality, none of them is. You cut your p values at 0.05.

Now remember the definition of a p value: if a test result is assigned the p value p, then the probability of seeing a result this strong or stronger purely due to noise (i.e., when there is no real effect) is p.

Hence, even if no genes are differentially expressed, you should expect about 5% of the genes to have a p value below 0.05. For 10,000 genes, that is about 500.
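A quick simulation makes this concrete. The sketch below assumes the textbook fact that, under the null hypothesis, a p value from a continuous test is uniformly distributed on [0, 1]. It draws one such p value per gene for 10,000 genes with no real effect and counts how many pass the raw 0.05 cutoff. The function name `count_null_hits` is purely illustrative.

```python
import random


def count_null_hits(n_genes=10_000, alpha=0.05, seed=1):
    """Simulate a study in which no gene is truly differentially expressed.

    Assumes null p values are uniform on [0, 1] (continuous test statistic),
    so each gene's p value is just a uniform random draw.
    """
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(n_genes)]
    # Count genes that a naive raw-p threshold would call "significant".
    return sum(p < alpha for p in p_values)


hits = count_null_hits()
print(f"{hits} of 10000 null genes have p < 0.05")  # close to 500, varies with seed
```

Every one of those hits is a false positive, because by construction no gene has a real effect.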

Now, let's assume there are truly differentially expressed genes in your study. Let's say you find 1,000 of your 10,000 genes to have a raw p value below 0.05. From the argument above, you should still expect this list of 1,000 genes to contain roughly 500 false positives, i.e., your false discovery rate is about 500/1,000 = 50%. This is clearly unacceptably large.
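The back-of-the-envelope estimate above can be written as a tiny helper. This is a sketch, not a standard function: `naive_fdr_estimate` is a hypothetical name, and it makes the conservative assumption that all genes could be null, so the expected number of false positives is simply the number of genes times the cutoff.

```python
def naive_fdr_estimate(n_genes, alpha, n_discoveries):
    """Rough false discovery rate for a raw-p cutoff.

    Conservatively assumes every gene might be null, so the expected
    number of false positives is n_genes * alpha.
    """
    expected_false_positives = n_genes * alpha
    return expected_false_positives / n_discoveries


fdr = naive_fdr_estimate(10_000, 0.05, 1_000)
print(f"estimated FDR: {fdr:.2f}")  # prints "estimated FDR: 0.50"
```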

The Benjamini-Hochberg adjustment, which formalizes this argument, will hence adjust a raw p value of 0.05 to an adjusted p value of 0.5 (here, 0.05 × 10,000 / 1,000). In practice, you use the logic the other way round: you decide on a false discovery rate that you deem acceptable and look up which genes got an adjusted p value below it.
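For illustration, here is a minimal pure-Python sketch of the Benjamini-Hochberg adjustment, following the same recipe as R's `p.adjust(p, method = "BH")`: multiply each p value by m / rank, then take a running minimum from the largest p value downwards and cap at 1. The name `bh_adjust` and the toy data are assumptions made for this example; in real analyses you would use an existing implementation such as R's `p.adjust` or `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")`.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    m = len(p_values)
    # Visit genes from largest to smallest p value so the running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps adjusted values at 1
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of p_values[i] in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy version of the scenario in the post: 1,000 genes with raw p <= 0.05
# (the largest being exactly 0.05, at rank 1,000) and 9,000 genes above it.
raw = [0.05 * r / 1000 for r in range(1, 1001)] + [r / 10_000 for r in range(1001, 10_001)]
adj = bh_adjust(raw)
print(round(adj[999], 3))  # the gene with raw p = 0.05 gets adjusted p 0.5

# Choose an acceptable FDR first, then keep genes whose adjusted p is below it.
fdr_cutoff = 0.1
significant = [i for i, q in enumerate(adj) if q < fdr_cutoff]
print(len(significant))  # none of the 1,000 nominally significant genes survive
```

In this constructed example all 1,000 genes below the raw 0.05 cutoff end up with adjusted p values of about 0.5, so none of them passes a 10% FDR threshold.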

Last edited by Simon Anders; 02-09-2012 at 12:21 PM.