SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Generally Applicable Gene-set Analysis (GAGE) problem wilson90 Bioinformatics 1 08-19-2013 09:28 AM
Gene set enrichment analysis of RNA-Seq data jel4h Bioinformatics 1 06-21-2012 04:25 AM

Reply
 
Thread Tools
Old 03-30-2014, 08:00 AM   #61
entrez
Junior Member
 
Location: NY

Join Date: Nov 2010
Posts: 7
Default

Thanks for the input. I agree on your feeling on these two packages. Do you see differences in the results using GAGE and GOseq?
entrez is offline   Reply With Quote
Old 03-30-2014, 12:16 PM   #62
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

I have not done a direct comparison yet, but I will in the future.
sindrle is offline   Reply With Quote
Old 04-07-2014, 12:10 PM   #63
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123
Default

Thanks for the good words on GAGE. Let us know if you have more comments/questions.

Quote:
Originally Posted by sindrle View Post
I have done a quick test with GOseq, but I must admit I like GAGE better after first glance. Easy to follow, nice manual, nice plots, lots of results and possibilities. It really facilitates further analysis I think.

But Im going to give GOseq another go for sure!
bigmw is offline   Reply With Quote
Old 07-07-2014, 03:13 AM   #64
tigerxu
Member
 
Location: Wuhan, China

Join Date: Oct 2013
Posts: 20
Default

Quote:
Originally Posted by bigmw View Post
Forgot that sigGeneSet function has been updated to give users more control on the margin and font size. sigGeneSet calls a internal function heatmap2 to generate the heatmaps. So check the argument for this function
args(gage:::heatmap2)
The argument two relevant arguments here are margins and cexRow, which control the margins for column/row names and row name font size, you may do something like:
kegg.sig<-sigGeneSet(cnts.kegg.p,outname="~/RNAseq/13_Acute-Changes/14_GAGE_native_A1A2/A1A2All/A1A2All.kegg",pdf.size = c(7,12), margins = c(5,10))
I have a question about the margin argument in the sigGeneSet function when I run the following command
> rcount.kegg.sig<-sigGeneSet(rcount.kegg.p, outname="sig.kegg",pdf.size=c(7,12),margins=c(5, 10))
Error in heatmap2(gs.data, Colv = F, Rowv = F, dendrogram = "none", col = cols, :
formal argument "margins" matched by multiple actual arguments

Can anyone help me?

Thanks!
tigerxu is offline   Reply With Quote
Old 07-07-2014, 08:54 AM   #65
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123
Default

You may want to check the version of the gage package you are running, which can be seen by:
sessionInfo()
bigmw is offline   Reply With Quote
Old 07-09-2014, 03:06 AM   #66
tigerxu
Member
 
Location: Wuhan, China

Join Date: Oct 2013
Posts: 20
Default

Quote:
Originally Posted by bigmw View Post
You may want to check the version of the gage package you are running, which can be seen by:
sessionInfo()
other attached packages:
[1] gage_2.14.2 GenomicAlignments_1.0.2
[3] BSgenome_1.32.0 Rsamtools_1.16.1
[5] Biostrings_2.32.0 XVector_0.4.0
[7] DESeq2_1.4.5 RcppArmadillo_0.4.300.8.0
[9] Rcpp_0.11.2 GenomicRanges_1.16.3
[11] GenomeInfoDb_1.0.2 IRanges_1.22.9
[13] BiocGenerics_0.10.0

Is the version of gage not proper?
tigerxu is offline   Reply With Quote
Old 07-09-2014, 09:45 AM   #67
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123
Default

This is the latest version. Do you still get the problem?
bigmw is offline   Reply With Quote
Old 07-09-2014, 10:39 AM   #68
tigerxu
Member
 
Location: Wuhan, China

Join Date: Oct 2013
Posts: 20
Default

Quote:
Originally Posted by bigmw View Post
This is the latest version. Do you still get the problem?
The problem is still there. But I have modified the margins parameters in the internal function sigGeneSet within the gage package. It can work!
tigerxu is offline   Reply With Quote
Old 07-10-2014, 06:05 AM   #69
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123
Default

Just checked the source code for sigGeneSet and internal functions gs.heatmap. there seems to be a potential conflict in argument margins indeed. Will have the problem fixed. you can check the updated version 2.14.3 in the next couple of days here:
http://www.bioconductor.org/packages...html/gage.html
bigmw is offline   Reply With Quote
Old 07-10-2014, 06:26 AM   #70
tigerxu
Member
 
Location: Wuhan, China

Join Date: Oct 2013
Posts: 20
Default

Quote:
Originally Posted by bigmw View Post
Just checked the source code for sigGeneSet and internal functions gs.heatmap. there seems to be a potential conflict in argument margins indeed. Will have the problem fixed. you can check the updated version 2.14.3 in the next couple of days here:
http://www.bioconductor.org/packages...html/gage.html
Okay, thank! I will try version 2.14.3 later.
tigerxu is offline   Reply With Quote
Old 07-11-2014, 12:26 PM   #71
tigerxu
Member
 
Location: Wuhan, China

Join Date: Oct 2013
Posts: 20
Default

I have followed the default workflows of gage and pathview on the example RNA-seq dataset. I also used the fold changes inferred by deseq2, then followed by the gage and pathview. I found both pipelines will output different results. The pipeline based on the fold changes by deseq2 generate much fewer significant pathways. For example below

> gage.kegg.sig<-sigGeneSet(gage.kegg.p, outname="sig.kegg",pdf.size=c(7,8))
[1] "there are 22 signficantly up-regulated gene sets"
[1] "there are 17 signficantly down-regulated gene sets"

> deseq2.kegg.sig<-sigGeneSet(deseq2.kegg.p, outname="deseq2.sig.kegg",pdf.size=c(7,8))
[1] "gs.data needs to be a matrix-like object!"
[1] "No heatmap produced for down-regulated gene sets, only 1 or none signficant."
[1] "gs.data needs to be a matrix-like object!"
[1] "there are 7 signficantly up-regulated gene sets"
[1] "there are 0 signficantly down-regulated gene sets"

I'm wondering which pipeline is more reliable for biological interpretation. Why the pipeline based on deseq2 return much fewer pathways? Can anyone give me some advice?

Thanks!

Last edited by tigerxu; 07-11-2014 at 12:29 PM.
tigerxu is offline   Reply With Quote
Old 07-14-2014, 12:56 PM   #72
crazyhottommy
Senior Member
 
Location: Gainesville

Join Date: Apr 2012
Posts: 140
Default

Hi there, thank you for making this awesome tool.

I am working with mouse data, I want to know how to convert the gene set into gene symbol format.

kg.mouse<- kegg.gsets("mouse")
kegg.gs<- kg.mouse$kg.sets[kg.mouse$sigmet.idx]
lapply(kegg.gs[1:3],head)


the eg2sym function is only for human data. I can not do things below:

data(egSymb)
kegg.gs.sym<- lapply(kegg.gs, eg2sym)

Thank you!
Tommy
crazyhottommy is offline   Reply With Quote
Old 07-15-2014, 12:49 PM   #73
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123
Default

The pathview package provides two functions: eg2id and id2eg, for ID mapping/conversion for major research species. For details:
?pathview::eg2id

BTW, I would suggest you to convert your data ID from symbol to Entrez Gene, rather than your gene set ID from Entrez to symbol. The former should be much faster as it only need to call the conversion function once.
bigmw is offline   Reply With Quote
Old 07-15-2014, 05:09 PM   #74
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123
Default

BTW, has a separate tutorial on data preparation, you can check Section 5 -- gene or transcript ID conversion:
http://www.bioconductor.org/packages...c/dataPrep.pdf
bigmw is offline   Reply With Quote
Old 07-15-2014, 06:46 PM   #75
crazyhottommy
Senior Member
 
Location: Gainesville

Join Date: Apr 2012
Posts: 140
Default

Quote:
Originally Posted by bigmw View Post
BTW, has a separate tutorial on data preparation, you can check Section 5 -- gene or transcript ID conversion:
http://www.bioconductor.org/packages...c/dataPrep.pdf
Thank you, I followed it, after DESeq. 1724 differentially expressed genes were used for pathway analysis.

res <- nbinomTest( cds, 'control, 'treat' )

resSig <- res[ res$padj < 0.01 & (res$log2FoldChange >1| res$log2FoldChange < -1), ]

resSig <- na.omit(resSig)

require(gage)
datakegg.gs)
deseq.fc<- resSig$log2FoldChange
names(deseq.fc)<- resSig$id
sum(is.infinite(deseq.fc)) # there are some infinite numbers, if use DESeq2, no such problem.
deseq.fc[deseq.fc>10]=10
deseq.fc[deseq.fc<-10]=-10
exp.fc<- deseq.fc

#kegg.gsets works with 3000 KEGG speicies
data(korg)
head(korg[,1:3], n=20)


#let's get the annotation files for mouse and convert the gene set to gene symbol format
kg.mouse<- kegg.gsets("mouse")
kegg.gs<- kg.mouse$kg.sets[kg.mouse$sigmet.idx]
lapplykegg.gs[1:3],head)



# to convert IDs among gene/transcript ID to Entrez GeneID or reverse, use eg2id and id2eg in the pathview package
library(pathview)
data(bods)
bods

gene.symbol.eg<- id2eg(ids=names(exp.fc), category='SYMBOL', org='Mm')
# convert the gene symbol to Entrez Gene ID
head(gene.symbol.eg, n=100)
head(gene.symbol.eg[,2], n=10)

names(exp.fc)<- gene.symbol.eg[,2]

fc.kegg.p<- gage(exp.fc, gsets= kegg.gs, ref=NULL, samp=NULL)
sel<- fc.kegg.p$greater[,"q.val"] < 0.1 & !is.na(fc.kegg.p$greater[,"q.val"])
table(sel)

sel.l<- fc.kegg.p$less[,"q.val"] < 0.1 & !is.na(fc.kegg.p$greater[,"q.val"])
table(sel.l)



> table(sel.l)
sel.l
FALSE
202

> table(sel)
sel
FALSE
202

Am I doing it right?
crazyhottommy is offline   Reply With Quote
Old 07-15-2014, 06:48 PM   #76
crazyhottommy
Senior Member
 
Location: Gainesville

Join Date: Apr 2012
Posts: 140
Default

Quote:
Originally Posted by bigmw View Post
The pathview package provides two functions: eg2id and id2eg, for ID mapping/conversion for major research species. For details:
?pathview::eg2id

BTW, I would suggest you to convert your data ID from symbol to Entrez Gene, rather than your gene set ID from Entrez to symbol. The former should be much faster as it only need to call the conversion function once.
Also, if I do want to convert gene set ID from Entrez to symbol.
How can I do it?

Thank you.
crazyhottommy is offline   Reply With Quote
Old 07-16-2014, 09:24 AM   #77
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123
Default

GAGE and other methods (GSEA etc) require all genes included. This way GAGE test gene perturbations within pathways against the background of all genes. You selected a list of differentially expressed gene first, it is expected that you donít any pathways standing out in that perforeground, right? Including all genes instead of a selected list of genes give you a major advantages: you included all your data in the analysis, which is usually more powerful. In addition, you donít need some more or less arbitrary q-/p-value cutoff.

Otherwise, you code seem to work fine. You may want to check the DESeq section and the native workflow on the demo code:
http://www.bioconductor.org/packages...eqWorkflow.pdf


Quote:
Originally Posted by crazyhottommy View Post
Thank you, I followed it, after DESeq. 1724 differentially expressed genes were used for pathway analysis.

res <- nbinomTest( cds, 'control, 'treat' )

resSig <- res[ res$padj < 0.01 & (res$log2FoldChange >1| res$log2FoldChange < -1), ]

resSig <- na.omit(resSig)

require(gage)
...

Am I doing it right?
bigmw is offline   Reply With Quote
Old 07-16-2014, 09:26 AM   #78
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123
Default

eg2id and id2eg are the pair of functions for ID mapping from and to Entrez Genes. For info:
?eg2id


Quote:
Originally Posted by crazyhottommy View Post
Also, if I do want to convert gene set ID from Entrez to symbol.
How can I do it?

Thank you.
bigmw is offline   Reply With Quote
Old 07-16-2014, 09:27 AM   #79
crazyhottommy
Senior Member
 
Location: Gainesville

Join Date: Apr 2012
Posts: 140
Default

Quote:
Originally Posted by bigmw View Post
GAGE and other methods (GSEA etc) require all genes included. This way GAGE test gene perturbations within pathways against the background of all genes. You selected a list of differentially expressed gene first, it is expected that you donít any pathways standing out in that perforeground, right? Including all genes instead of a selected list of genes give you a major advantages: you included all your data in the analysis, which is usually more powerful. In addition, you donít need some more or less arbitrary q-/p-value cutoff.

Otherwise, you code seem to work fine. You may want to check the DESeq section and the native workflow on the demo code:
http://www.bioconductor.org/packages...eqWorkflow.pdf
Thank you very much for your reply. I will feed all the genes to GAGE to see, but one question is that the fold change for each gene has different p values (and adjust p values), so GAGE only takes the fold change into account but not the p values, right?
crazyhottommy is offline   Reply With Quote
Old 07-16-2014, 01:35 PM   #80
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123
Default

You are right. In the demo examples GAGE takes fold changes, not the p-values as input.
GAGE works with fold change (default), t-stats and other types of statistics, please check the GAGE paper and the RNA-seq workflow tutorial Section 5 Per gene score choices.
bigmw is offline   Reply With Quote
Reply

Tags
gene set, pathway analysis, r/bioconductor, rna-seq, visualization

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 01:02 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO