Quote:
Originally Posted by ETHANol
So why is it bad to threshold raw pvalues?

Didn't see the question until now, but to not leave it unanswered:
Imagine your genome has 10,000 genes. You think that some of them are differentially expressed but, in reality, none of them is. You cut your p values at 0.05.
Now remember the definition of a p value: if a test result is assigned the p value p, the probability of seeing a result this strong or stronger only due to noise (i.e., with there being no real effect) is p.
Hence, even if no genes are differentially expressed, you should expect about 5% of the genes to have a p value below 0.05. For 10,000 genes, that is about 500 genes.
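To see this for yourself, here is a small simulation sketch. It assumes that p values under the null hypothesis are uniformly distributed between 0 and 1 (which holds for a well-calibrated test with a continuous test statistic); the function name and the seed are just made up for illustration.

```python
import random

def count_null_hits(n_genes=10_000, alpha=0.05, seed=1):
    """Count genes with p < alpha when NO gene is truly differentially expressed."""
    rng = random.Random(seed)
    # Assumption: under the null, each p value is uniform on [0, 1).
    p_values = [rng.random() for _ in range(n_genes)]
    return sum(p < alpha for p in p_values)

hits = count_null_hits()
print(hits)  # varies with the seed, but stays close to 10,000 * 0.05 = 500
```

Every one of these hits is a false positive, because the simulation contains no real effect at all.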
Now, let's assume there are truly differentially expressed genes in your study. Let's say you find 1,000 of your 10,000 genes to have a raw p value below 0.05. From the argument above, you should still expect this list of 1,000 genes to contain roughly 500 false positives, i.e., your false discovery rate is about 500/1,000 = 50%. This is clearly unacceptably large.
The Benjamini-Hochberg adjustment, which formalizes this argument, will hence adjust a raw p value of 0.05 to an adjusted p value of 0.5 in this example. In practice, you use the logic the other way round: you decide on a false discovery rate that you deem acceptable, and look up which genes got an adjusted p value below this.
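For the curious, a minimal sketch of how the Benjamini-Hochberg adjustment can be computed: each raw p value is multiplied by (number of tests / its rank), and the result is then made monotone from the largest p value downwards. The function name bh_adjust and the artificial p values are my own illustration, not taken from any particular package; in a real analysis you would use a library routine (e.g., R's p.adjust with method="BH").

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    n = len(p_values)
    # Indices of the p values, sorted from smallest to largest p.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p value to the smallest, so that an adjusted
    # value never exceeds that of a gene with a larger raw p value.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Artificial data mimicking the example: 10,000 genes, of which exactly
# 1,000 have a raw p value of at most 0.05 (the largest of them is 0.05).
small = [0.05 * k / 1000 for k in range(1, 1001)]
large = [0.05 + 0.95 * j / 9000 for j in range(1, 9001)]
raw = small + large

adj = bh_adjust(raw)
adj_at_cutoff = adj[999]    # the gene with raw p value 0.05
print(round(adj_at_cutoff, 3))
```

With these made-up numbers, the gene at the raw cutoff of 0.05 ends up with an adjusted p value of 0.05 * 10,000 / 1,000 = 0.5, matching the 50% false discovery rate worked out above.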