SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Generally Applicable Gene-set Analysis (GAGE) problem wilson90 Bioinformatics 1 08-19-2013 09:28 AM
Gene set enrichment analysis of RNA-Seq data jel4h Bioinformatics 1 06-21-2012 04:25 AM

Reply
 
Thread Tools
Old 07-17-2014, 10:29 AM   #81
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123
Default

This is normal. Different analysis procedures give you different p-values, although you can still see lot of consistency between the them (the top hits or order of significant pathways). We expect native gage/pathview workflow to be more sensitive than the joint workflows due to the design of GAGE analysis procedure. Importantly, GAGE takes the sample size into account by default, but average fold change scores output from other tools donít have that info.
For details, there was an earlier thread talking on the same question:
http://seqanswers.com/forums/showthread.php?t=34655#6


Quote:
Originally Posted by tigerxu View Post
I have followed the default workflows of gage and pathview on the example RNA-seq dataset. I also used the fold changes inferred by deseq2, then followed by the gage and pathview. I found both pipelines will output different results. The pipeline based on the fold changes by deseq2 generate much fewer significant pathways. For example below

> gage.kegg.sig<-sigGeneSet(gage.kegg.p, outname="sig.kegg",pdf.size=c(7,8))
[1] "there are 22 signficantly up-regulated gene sets"
[1] "there are 17 signficantly down-regulated gene sets"

> deseq2.kegg.sig<-sigGeneSet(deseq2.kegg.p, outname="deseq2.sig.kegg",pdf.size=c(7,8))
[1] "gs.data needs to be a matrix-like object!"
[1] "No heatmap produced for down-regulated gene sets, only 1 or none signficant."
[1] "gs.data needs to be a matrix-like object!"
[1] "there are 7 signficantly up-regulated gene sets"
[1] "there are 0 signficantly down-regulated gene sets"

I'm wondering which pipeline is more reliable for biological interpretation. Why the pipeline based on deseq2 return much fewer pathways? Can anyone give me some advice?

Thanks!
bigmw is offline   Reply With Quote
Old 07-17-2014, 11:13 AM   #82
crazyhottommy
Senior Member
 
Location: Gainesville

Join Date: Apr 2012
Posts: 140
Default

Quote:
Originally Posted by bigmw View Post
GAGE and other methods (GSEA etc) require all genes included. This way GAGE test gene perturbations within pathways against the background of all genes. You selected a list of differentially expressed gene first, it is expected that you donít any pathways standing out in that perforeground, right? Including all genes instead of a selected list of genes give you a major advantages: you included all your data in the analysis, which is usually more powerful. In addition, you donít need some more or less arbitrary q-/p-value cutoff.

Otherwise, you code seem to work fine. You may want to check the DESeq section and the native workflow on the demo code:
http://www.bioconductor.org/packages...eqWorkflow.pdf
I did try using all the genes, but still no pathways are selected....Maybe that's just the nature of my data?
crazyhottommy is offline   Reply With Quote
Old 07-17-2014, 12:49 PM   #83
tigerxu
Member
 
Location: Wuhan, China

Join Date: Oct 2013
Posts: 20
Default

I very appreciate your information! Very helpful to further my understanding on the statistical background of GAGE.

Quote:
Originally Posted by bigmw View Post
This is normal. Different analysis procedures give you different p-values, although you can still see lot of consistency between the them (the top hits or order of significant pathways). We expect native gage/pathview workflow to be more sensitive than the joint workflows due to the design of GAGE analysis procedure. Importantly, GAGE takes the sample size into account by default, but average fold change scores output from other tools don’t have that info.
For details, there was an earlier thread talking on the same question:
http://seqanswers.com/forums/showthread.php?t=34655#6
tigerxu is offline   Reply With Quote
Old 07-18-2014, 06:50 AM   #84
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123
Default

Obviously, no pathways has q-val <0.1. this is still common due to the sample size, noise level etc in your data. You may do either one or both of the following:
-increase the q-val cutoff, something like:
q.cut=0.2
sel<- fc.kegg.p$greater[,"q.val"] < q.cut & !is.na(fc.kegg.p$greater[,"q.val"])
sel.l<- fc.kegg.p$less[,"q.val"] < q.cut & !is.na(fc.kegg.p$greater[,"q.val"])

-use the native GAGE workflow instead of the joint workflow. The former has higher testing power as it take sample size into account. For details check my answer at #81 of this thread above.

I assume your data and analysis was done correctly above. But first of all, make sure you did not mix up your control and experiment samples (or their labels), and also your data quality has no major problem.



Quote:
Originally Posted by crazyhottommy View Post
I did try using all the genes, but still no pathways are selected....Maybe that's just the nature of my data?
bigmw is offline   Reply With Quote
Old 09-12-2014, 01:20 AM   #85
nmerienn
Member
 
Location: Switzerland

Join Date: Sep 2014
Posts: 12
Default Gage analysis for RNAseq

Dear all,

We are two biologists (so not bioinformaticians...) working with RNAseq data and having little "troubles" with pathways analysis. We performed RNA sequencing on 4 distinct cell populations to compare their transcriptional profile (platform Illumina HiSeq 2000). Row reads were mapped using TopHat and differential analysis was performed with edgeR+voom+limma packages. Our final output is a table (.txt file) for each contrast containing our 16058 expressed genes with respective log fold change, expression values (normalized) and adjusted p-values. We wish to perform pathway enrichment analysis to determine which pathways are enriched/depleted in our respective cell populations to examine for distinct functions and the gage pathway seems to be very complete for both GO and KEGG. However, we are not sure to use the good data for the analysis. Do we have to make the analysis separately for all the cell populations (loading into R only the log fold changes for all the genes for the contrast cell population A vs all the other cell populations) or do we have to load a table containing 4 columns (our 4 cell populations) with normalized log2 transformed expression values? It is not very clear for us... In addition, as we don't have a treated group vs another non-treated, we don't have a "biological reference" for the analysis, does it make sense to perform all the analysis with ref = NULL and samp = NULL?
We apologize for the very naive question.

Thank you for your help.
nmerienn is offline   Reply With Quote
Old 09-13-2014, 11:00 AM   #86
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123
Default

GAGE accepts log fold changes or other test statistics from differential expression analysis. Yes, you need to set ref = NULL and samp = NULL in gage function when differential expression analysis is already done. Details are described in section 5-6 of the workflow toturial:
http://www.bioconductor.org/packages...eqWorkflow.pdf

Speaking of log fold changes, you must have some reference state. In your case, it can be one cell population or some virtual state of their combination. Please make sure you understand your upstream differential expression analysis.
bigmw is offline   Reply With Quote
Old 08-27-2015, 04:24 AM   #87
BobbyT
Junior Member
 
Location: Melbourne

Join Date: Aug 2015
Posts: 2
Default Missing samples on gage() > sigGeneSet() heatmap

Hello everyone. It's seems I'm a little late to this party. So fingers crossed this thread is still active.

So my problem is that only 3 of my 6 samples are appearing in heatmaps generated with sigGeneSet. The 'samp' samples are appearing but not 'ref' samples.

This only occurs when visualizing significant gene sets (gage() > sigGeneSet()). When running gage() > essGene() > geneData() to visualize genes in specific gene sets all samples are present (Figure 2, main vignette).

I have looked over the vignettes and sigGeneSet seem to always be pictured with 'samp' samples only (Figure 1, main vignette).

Is there a way to make sigGeneSet/gage generate heatmaps with all samples such as in Figure 2/3 of the main vignette but displaying gene set names as in Figure 1 of the main vignette?

Thanks,
BobbyT
BobbyT is offline   Reply With Quote
Old 08-28-2015, 06:06 AM   #88
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123
Default

Bobby,
Heatmaps from sigGeneSet function show the perturbations of whole gene sets or pathways (in experiment samples vs controls). It is only meaningful for exmperiment samples, you donít really expect controls receive such perturbation score.
The gene set level heatmap is actually a unique feature of GAGE. Because it conducts a pair-wise comparison between experiment samples and controls. Therefore, we can have a gene set level test statistics (or p-values) for each experiment sample. The global statistics and p-values are then summarized from these individual tests. These stats/p-values are all included in the gage function output. You may check the paper for details of the statistical tests.
bigmw is offline   Reply With Quote
Old 08-28-2015, 09:25 PM   #89
BobbyT
Junior Member
 
Location: Melbourne

Join Date: Aug 2015
Posts: 2
Default

Hi bigmw,

Thanks for the reply. I think I understand where you're coming from. However if I generate a sample wide heatmap with heatmap() or the like you can see that there are differences between the 'control' samples. I should probably say that I have 2 groupss with 3 replicates in each group and it goes witbout saying that biological samples, even from the same 'group', are never identical. I would have thought that this variation would be able to be visualised for all samples, even at the gene set/pathway level.
i can understand why this hasn't been included in the package and I can talk my way around it but it woumd have been nice to see what baseline is doi g at the pathway level too
BobbyT is offline   Reply With Quote
Old 09-14-2015, 07:08 AM   #90
Tom2013
Member
 
Location: Canada

Join Date: Sep 2013
Posts: 13
Default

Hi bigmw,

I am working on non-model species. I have done differentially expressed gene analysis (gene level) using DESeq2 and used blastx to blast all expressed genes to the nr database. I am wondering whether I can use GAGE for downstream GO enrichment analysys and pathway analysis.

Thanks,
Tom
Tom2013 is offline   Reply With Quote
Old 09-24-2015, 05:21 PM   #91
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123
Default

theoretically, you can. You need to derive GO annotation for the genes of this species using tools like blast2go. Then generate GO gene set data like the ones used in gage tutorial. You need to use R function split() to get the gene set list.
bigmw is offline   Reply With Quote
Old 11-17-2015, 08:20 AM   #92
user 31888
Member
 
Location: earth

Join Date: Nov 2015
Posts: 19
Default

Any chance someone already encountered this kind of summarizeOverlaps error?
http://seqanswers.com/forums/showthr...806#post184806
user 31888 is offline   Reply With Quote
Old 11-17-2015, 09:48 AM   #93
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 836
Default

Please don't post the same issue in multiple threads. It's good that you created a new thread for this problem, but you need to give people some credit for being able to find existing threads without further advertising.
gringer is offline   Reply With Quote
Reply

Tags
gene set, pathway analysis, r/bioconductor, rna-seq, visualization

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 05:39 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO