I have a couple of lists that I'd like to test for enrichment of (1) pseudogenes, (2) operons, and (3) a couple of gene families. The phyper test has gone smoothly, but I'd like to run a correction for multiple testing, and I'm not really sure what I should be using as the "number of experiments" in each case.
So, for a phyper(131,1658,48246-1658,585,lower.tail=F) pseudogene enrichment test, I'm tempted to multiply the resulting p-value by both 48246 (the number of genomic features) and 11 (the number of "gene types" from Ensembl, one of which is pseudogenes).
For operons, I'd multiply the p-value by genomic features * 2 (one option for in-an-operon, one option for not-in-an-operon).
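To make the arithmetic concrete, here is a minimal sketch in R of what I have in mind. The helper name bonferroni_by and the variable names are just made up for illustration, and the multipliers are exactly the ones I'm unsure about, so please read this as a statement of the plan rather than something I believe is correct.

```r
# Pseudogene test exactly as written above.
# Note: with lower.tail = FALSE, phyper returns P(X > 131), i.e. strictly greater.
p_pseudo <- phyper(131, 1658, 48246 - 1658, 585, lower.tail = FALSE)

# Hypothetical helper: Bonferroni-style scaling by an arbitrary multiplier,
# capped at 1 so the result is still a valid p-value.
bonferroni_by <- function(p, multiplier) pmin(1, p * multiplier)

n_features   <- 48246  # genomic features
n_gene_types <- 11     # Ensembl "gene types"

# Pseudogene plan: multiply by genomic features * gene types.
pseudo_multiplier <- n_features * n_gene_types
p_pseudo_adj <- bonferroni_by(p_pseudo, pseudo_multiplier)

# Operon plan: the multiplier would be genomic features * 2.
operon_multiplier <- n_features * 2

# The same scaling via base R's p.adjust, where the multiplier is the `n` argument.
p_pseudo_adj_builtin <- p.adjust(p_pseudo, method = "bonferroni", n = pseudo_multiplier)
```

Put this way, my question is really what value belongs in the n argument of p.adjust for each of the three kinds of test.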
For the specific gene families, should I find the number of all gene families in my organism?
Do these plans sound reasonable, or am I way off base?