I am trying to use GAGE to do some pathway analysis based on RNA-Seq data. The vignette is nicely written, but there is an error that I do not know if it is my fault in my script.
I will show you the error first:
Apparently, these are the list of significant pathway identified. I couldnt figure why the gene sets are missing in each pathway, unlike what is shown in the vignette. It is required for next step of the analysis.
(link: http://bioconductor.org/packages/2.1...html/gage.html)
I am thinking that because I might have probably converted the Entrez ID of Kegg.gs to Symbol instead. But, I am clueless on how it caused this error.
My script:
My problem is that I can't proceed to the next step of analysis unless I ensure that the pathways contain the genes.
Many thanks.
I will show you the error first:
Code:
> head(expression.kegg.esg.up$coreGeneSets,4) $`hsa00190 Oxidative phosphorylation` character(0) $`hsa03010 Ribosome` character(0) $`hsa04141 Protein processing in endoplasmic reticulum` character(0) $`hsa04120 Ubiquitin mediated proteolysis` character(0) > head(expression.kegg.esg.dn$coreGeneSets,4) $`hsa04610 Complement and coagulation cascades` character(0) $`hsa04640 Hematopoietic cell lineage` character(0) $`hsa00140 Steroid hormone biosynthesis` character(0)
(link: http://bioconductor.org/packages/2.1...html/gage.html)
I am thinking that because I might have probably converted the Entrez ID of Kegg.gs to Symbol instead. But, I am clueless on how it caused this error.
My script:
Code:
#read data data = read.table("data.txt") row = read.table("rownames3.txt") col = read.table("colnames3.txt") length(col$V1) expression = matrix(scan("data3.txt"),byrow=T,ncol=length(col$V1),dimnames=list(row$V1,col$V1)) str(expression) head(expression) #reading Gene set data input filename=system.file("extdata/c2.demo.gmt",package="gage") demo.gs=readList(filename) cn=colnames(expression) hn=grep('A710003',cn,ignore.case=T) dn=grep('A710004',cn,ignore.case=T) data(kegg.gs) data(egSymb) #convert it back to entrez id head(rownames(expression)) kegg.gs.sym<-lapply(kegg.gs,eg2sym) lapply(kegg.gs.sym[1:3],head) #preliminra expression.kegg.p <- gage(expression,gsets=kegg.gs.sym, ref=hn, samp=dn, saaTest=gs.KSTest) write.table(rbind(expression.kegg.p$greater,expression.kegg.p$less), file="kegg_kstest.txt",sep="\t") expression.kegg.2d.p <- gage(expression,gsets=kegg.gs.sym, ref=hn, samp=dn, saaTest=gs.KSTest, same.dir=F) write.table(rbind(expression.kegg.2d.p$greater,expression.kegg.2d.p$less), file="kegg_kstest_2d.txt",sep="\t") #find sig pathway expression.kegg.sig = sigGeneSet(expression.kegg.p, outname="expression.kegg") expression.kegg.2d.sig = sigGeneSet(expression.kegg.2d.p, outname="expression.kegg.2d") write.table(expression.kegg.2d.sig$greater, file="kegg_kstest_2d_sig.txt",sep="\t") #sort and count significant gene sets based on q-value or p-va expression.kegg.esg.up <- esset.grp(expression.kegg.p$greater,expression,gsets=kegg.gs.sym, ref=hn,samp=dn,test4up=T,output=T,outname="expression.kegg.up",make.plot=F) expression.kegg.esg.dn <- esset.grp(expression.kegg.p$less,expression,gsets=kegg.gs.sym, ref=hn,samp=dn,test4up=T,output=T,outname="expression.kegg.dn",make.plot=F) head(expression.kegg.esg.up$setGroups,4) head(expression.kegg.esg.up$essentialSets,4) head(expression.kegg.esg.up$coreGeneSets,4) head(expression.kegg.esg.dn$coreGeneSets,4)
Many thanks.
Comment