Drake 02-01-2017 09:23 AM

GAGE update and result analysis
I have recently started using gage and I am a bit confused about the pathway results.

1) My pathway results after running gage contain 114 pathways (80 of them are NA); however, when I check the KEGG website, I find 120 pathways listed for my organism. Isn't gage able to use the latest version of the online data?

2) My greater and less results contain the same list of pathways, just in reverse order (but with completely different p/q values). This is a bit confusing: for example, my differentially expressed gene list contains no down-regulated genes for one pathway, only genes that are up-regulated in that pathway. Why does this pathway appear in the less results at all when not a single gene in it is down-regulated? Since its p/q values are too high, I will discard the pathway anyway; I was just wondering why it is listed. And what is the purpose of listing these pathways if all their q-values are essentially the same, e.g. 0.9929488953? In this context, what is a generally accepted q-value cut-off for publishing data? Is 0.2 or even 0.5 still accepted?
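To see why the greater and less lists can hold the same pathways in reverse order, and why many q-values can come out identical, here is a simplified Python sketch. It is not gage's actual procedure (gage's default test and the way it combines results across samples are more involved); it assumes a plain one-sided z-test on the mean of hypothetical gene-level statistics against a standard-normal background, followed by a Benjamini-Hochberg style FDR adjustment. The gene-set names and values are made up for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sided_pvalues(set_stats, bg_mean=0.0, bg_sd=1.0):
    """Simplified one-sided z-test on the mean statistic of one gene set.

    Returns (p_greater, p_less). Both tests are run on every set,
    and for a continuous test the two p-values always sum to 1.
    """
    n = len(set_stats)
    z = (sum(set_stats) / n - bg_mean) / (bg_sd / math.sqrt(n))
    p_less = normal_cdf(z)
    p_greater = 1.0 - p_less
    return p_greater, p_less

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest; the running minimum
    # keeps q-values monotone, which is why many of them can tie.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals

# Hypothetical gene-level statistics for three hypothetical gene sets.
gene_sets = {
    "setA": [2.1, 1.8, 2.5, 1.9],      # mostly up-regulated
    "setB": [0.3, -0.2, 0.1, 0.4],     # no clear direction
    "setC": [-1.5, -2.0, -1.7, -1.2],  # mostly down-regulated
}

p_greater = {k: one_sided_pvalues(v)[0] for k, v in gene_sets.items()}
p_less = {k: one_sided_pvalues(v)[1] for k, v in gene_sets.items()}

# Ranking each list by its own p-value gives the same sets in reverse order.
greater_rank = sorted(gene_sets, key=lambda k: p_greater[k])
less_rank = sorted(gene_sets, key=lambda k: p_less[k])
print("greater:", greater_rank)  # ['setA', 'setB', 'setC']
print("less:   ", less_rank)     # ['setC', 'setB', 'setA']

# Large, similar p-values collapse onto one shared q-value after BH.
print(bh_adjust([0.95, 0.96, 0.97, 0.99]))  # [0.99, 0.99, 0.99, 0.99]
```

In this toy model an up-regulated set such as setA still gets a less p-value; it is simply close to 1, so it lands at the bottom of the less list rather than being left out.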

3) Are pathways with a set.size lower than 10 not considered in the stats?

Any feedback is appreciated! Thanks

bigmw 05-16-2017 06:48 PM

gage does use the latest data from KEGG if you generate the gene set data with the kegg.gsets() function. However, the reference pathway list could be out of date. The reason you see NA for 80 pathways is that those pathways are outside the set.size range. In other words, they may have too few genes mapped, so no gene set test was done on them.
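The NA rows come from a size filter applied before testing; gage's `set.size` argument defaults to `c(10, 500)`, which is also why pathways with fewer than 10 mapped genes are not tested. The Python sketch below only mimics that filter on a plain dict. It assumes, following "too few genes mapped" above, that size is counted after intersecting each set with the measured genes, and that both bounds are inclusive; the function name `mapped_set_sizes` and the pathway data are hypothetical.

```python
def mapped_set_sizes(gsets, measured_genes, set_size=(10, 500)):
    """Mimic a set-size filter on a dict of gene sets (hypothetical helper).

    Each set is first intersected with the genes actually measured.
    Sets whose mapped size falls inside the inclusive [min, max] range
    keep their size; the rest get None, standing in for an NA row.
    """
    measured = set(measured_genes)
    min_size, max_size = set_size
    result = {}
    for name, genes in gsets.items():
        mapped = len(measured.intersection(genes))
        result[name] = mapped if min_size <= mapped <= max_size else None
    return result

# Hypothetical data: 100 measured genes and three hypothetical pathways.
measured = [f"g{i}" for i in range(100)]
gsets = {
    "tiny_pathway": ["g1", "g2", "g3", "x1", "x2"],
    "normal_pathway": [f"g{i}" for i in range(20)] + ["x9"],
    "poorly_mapped_pathway": [f"g{i}" for i in range(8)] + [f"x{i}" for i in range(30)],
}

print(mapped_set_sizes(gsets, measured))
# {'tiny_pathway': None, 'normal_pathway': 20, 'poorly_mapped_pathway': None}
print(mapped_set_sizes(gsets, measured, set_size=(5, 500)))
# {'tiny_pathway': None, 'normal_pathway': 20, 'poorly_mapped_pathway': 8}
```

Note that poorly_mapped_pathway has 38 members but only 8 of them are measured, so it is dropped at the default bounds; loosening the lower bound brings it back, at the cost of a less robust test on a very small set.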
The greater and less elements in the result list correspond to the up- and down-regulation tests on all pathways (gene sets), not just the significant ones. You may want to go through the gage function documentation for details, something like:

