View Single Post
Old 08-14-2013, 07:59 PM   #1
wilson90
Member
 
Location: Singapore

Join Date: May 2012
Posts: 48
Default Generally Applicable Gene-set Analysis (GAGE) problem

I am trying to use GAGE to do some pathway analysis based on RNA-Seq data. The vignette is nicely written, but there is an error that I do not know if it is my fault in my script.

I will show you the error first:
Code:
> head(expression.kegg.esg.up$coreGeneSets,4)
$`hsa00190 Oxidative phosphorylation`
character(0)

$`hsa03010 Ribosome`
character(0)

$`hsa04141 Protein processing in endoplasmic reticulum`
character(0)

$`hsa04120 Ubiquitin mediated proteolysis`
character(0)

> head(expression.kegg.esg.dn$coreGeneSets,4)
$`hsa04610 Complement and coagulation cascades`
character(0)

$`hsa04640 Hematopoietic cell lineage`
character(0)

$`hsa00140 Steroid hormone biosynthesis`
character(0)
Apparently, these are the list of significant pathway identified. I couldnt figure why the gene sets are missing in each pathway, unlike what is shown in the vignette. It is required for next step of the analysis.
(link: http://bioconductor.org/packages/2.1...html/gage.html)

I am thinking that because I might have probably converted the Entrez ID of Kegg.gs to Symbol instead. But, I am clueless on how it caused this error.

My script:
Code:
#read data
data = read.table("data.txt")
row = read.table("rownames3.txt")
col = read.table("colnames3.txt")
length(col$V1)
expression = matrix(scan("data3.txt"),byrow=T,ncol=length(col$V1),dimnames=list(row$V1,col$V1))
str(expression)
head(expression)

#reading Gene set data input
filename=system.file("extdata/c2.demo.gmt",package="gage")
demo.gs=readList(filename)
cn=colnames(expression)
hn=grep('A710003',cn,ignore.case=T)
dn=grep('A710004',cn,ignore.case=T)
data(kegg.gs)
data(egSymb)

#convert it back to entrez id
head(rownames(expression))
kegg.gs.sym<-lapply(kegg.gs,eg2sym)
lapply(kegg.gs.sym[1:3],head)

#preliminra
expression.kegg.p <- gage(expression,gsets=kegg.gs.sym, ref=hn, samp=dn, saaTest=gs.KSTest)
write.table(rbind(expression.kegg.p$greater,expression.kegg.p$less), file="kegg_kstest.txt",sep="\t")
expression.kegg.2d.p <- gage(expression,gsets=kegg.gs.sym, ref=hn, samp=dn, saaTest=gs.KSTest, same.dir=F)
write.table(rbind(expression.kegg.2d.p$greater,expression.kegg.2d.p$less), file="kegg_kstest_2d.txt",sep="\t")

#find sig pathway
expression.kegg.sig = sigGeneSet(expression.kegg.p, outname="expression.kegg")
expression.kegg.2d.sig = sigGeneSet(expression.kegg.2d.p, outname="expression.kegg.2d")
write.table(expression.kegg.2d.sig$greater, file="kegg_kstest_2d_sig.txt",sep="\t")

#sort and count significant gene sets based on q-value or p-va
expression.kegg.esg.up <- esset.grp(expression.kegg.p$greater,expression,gsets=kegg.gs.sym, ref=hn,samp=dn,test4up=T,output=T,outname="expression.kegg.up",make.plot=F)
expression.kegg.esg.dn <- esset.grp(expression.kegg.p$less,expression,gsets=kegg.gs.sym, ref=hn,samp=dn,test4up=T,output=T,outname="expression.kegg.dn",make.plot=F)
head(expression.kegg.esg.up$setGroups,4)
head(expression.kegg.esg.up$essentialSets,4)
head(expression.kegg.esg.up$coreGeneSets,4)
head(expression.kegg.esg.dn$coreGeneSets,4)
My problem is that I can't proceed to the next step of analysis unless I ensure that the pathways contain the genes.
Many thanks.
wilson90 is offline   Reply With Quote