SEQanswers

03-19-2014, 05:07 PM   #41
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123

Again, that’s because in this pathway, the node labelled “Gene A” includes gene(s) other than “Gene A” with similar functions. The node value is a summary of the fold changes of all genes mapped to this node, so you should not be surprised that it differs from the fold change of “Gene A” alone.
You may use the node.sum argument to control how the node summary is calculated: mean, median, max etc.
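To make the node summary concrete, the base R sketch below (not pathview's internal code) collapses the fold changes of several genes mapped to one node into one value per sample, which is the idea behind node.sum. The gene names and numbers are made up, and summarise_node() is a hypothetical helper. As far as I know pathview's node.sum also accepts "random" and defaults to "sum", so check ?pathview for the options in your version.

```r
# Hypothetical log2 fold changes for three genes that all map to one
# KEGG node, in two samples (made-up numbers, for illustration only).
node_fc <- rbind(
  geneA = c(s1 = 1.2, s2 = -0.4),
  geneB = c(s1 = -1.5, s2 = 0.6),
  geneC = c(s1 = 0.3, s2 = 0.2)
)

# Hypothetical helper mimicking the idea behind pathview's node.sum:
# collapse all genes on a node into one value per sample (column).
summarise_node <- function(fc, method = c("sum", "mean", "median", "max", "max.abs")) {
  method <- match.arg(method)
  summary_fun <- switch(method,
    sum     = sum,
    mean    = mean,
    median  = median,
    max     = max,
    # value with the largest magnitude, keeping its sign
    max.abs = function(x) x[which.max(abs(x))]
  )
  apply(fc, 2, summary_fun)
}

summarise_node(node_fc, "median")   # s1 = 0.3, s2 = 0.2
summarise_node(node_fc, "max.abs")  # s1 = -1.5, s2 = 0.6
```

So a node labelled with geneA would show 0.3 and 0.2 under a median summary, not geneA's own 1.2 and -0.4.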
03-20-2014, 02:11 AM   #42
shriram
Member
 
Location: UK

Join Date: May 2010
Posts: 13

Thanks for the quick reply.
In the above example, GeneA is the only gene on that node for that species [shown in green in the original KEGG map], as the other genes on the node are not present in that species.
Thanks,
Shriram
03-20-2014, 04:59 AM   #43
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266

I'm having problems with Pathview. I can only get the native KEGG view; kegg.native=F does not work.

Also, the native KEGG view only shows green, not red (up-regulated) and green (down-regulated).

Why am I having these two problems?

Native KEGG
pv.out.list <- sapply(path.ids2, function(pid) pathview(gene.data = d[, 1:2],
    pathway.id = pid, species = "hsa",
    kegg.dir = "~/RNAseq/13_Acute-Changes/13_GAGE_native_A1A2/A1A2pT2D/Pathview"))

Graphviz view
pv.out.list <- sapply(path.ids2, function(pid) pathview(gene.data = d[, 1:2],
    pathway.id = pid, species = "hsa", kegg.native = FALSE,
    sign.pos = "bottomleft",
    kegg.dir = "~/RNAseq/13_Acute-Changes/13_GAGE_native_A1A2/A1A2pT2D/Pathview"))


# > sessionInfo()
# R version 3.0.3 (2013-09-25)
# Platform: x86_64-apple-darwin10.8.0 (64-bit)

# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

# attached base packages:
# [1] parallel stats graphics grDevices utils datasets methods
# [8] base

# other attached packages:
# [1] Rsamtools_1.14.3
# [2] Biostrings_2.30.1
# [3] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
# [4] GenomicFeatures_1.14.5
# [5] AnnotationDbi_1.24.0
# [6] Biobase_2.22.0
# [7] GenomicRanges_1.14.4
# [8] XVector_0.2.0
# [9] IRanges_1.20.7
# [10] BiocGenerics_0.8.0
# [11] BiocInstaller_1.12.0

# loaded via a namespace (and not attached):
# [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0
# [4] DBI_0.2-7 RCurl_1.95-4.1 RSQLite_0.11.4
# [7] rtracklayer_1.22.5 stats4_3.0.2 tools_3.0.2
# [10] XML_3.95-0.2 zlibbioc_1.8.0
03-20-2014, 05:52 PM   #44
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123

May I know which node, gene, pathway and species you are talking about?

Quote:
Originally Posted by shriram
Thanks for the quick reply.
In the above example, GeneA is the only gene on that node for that species [shown in green in the original KEGG map], as the other genes on the node are not present in that species.
Thanks,
Shriram
03-20-2014, 06:23 PM   #45
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123

You don’t even have the pathview package loaded, based on your sessionInfo().


Quote:
Originally Posted by sindrle
I'm having problems with Pathview. I can only get the native KEGG view; kegg.native=F does not work.

Also, the native KEGG view only shows green, not red (up-regulated) and green (down-regulated).

Why am I having these two problems?

# > sessionInfo()
# R version 3.0.3 (2013-09-25)
# Platform: x86_64-apple-darwin10.8.0 (64-bit)

...
03-21-2014, 02:50 AM   #46
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266

Pasted the wrong sessionInfo..

# > sessionInfo()
# R version 3.0.3 (2014-03-06)
# Platform: x86_64-apple-darwin10.8.0 (64-bit)

# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

# attached base packages:
# [1] grid parallel stats graphics grDevices utils datasets
# [8] methods base

# other attached packages:
# [1] Rgraphviz_2.6.0
# [2] gageData_2.0.3
# [3] pathview_1.2.4
# [4] org.Hs.eg.db_2.10.1
# [5] RSQLite_0.11.4
# [6] DBI_0.2-7
# [7] KEGGgraph_1.20.0
# [8] graph_1.40.1
# [9] XML_3.95-0.2
# [10] gage_2.12.3
# [11] Rsamtools_1.14.3
# [12] Biostrings_2.30.1
# [13] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
# [14] GenomicFeatures_1.14.5
# [15] AnnotationDbi_1.24.0
# [16] Biobase_2.22.0
# [17] GenomicRanges_1.14.4
# [18] XVector_0.2.0
# [19] IRanges_1.20.7
# [20] BiocGenerics_0.8.0

# loaded via a namespace (and not attached):
# [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0
# [4] digest_0.6.4 httr_0.2 KEGGREST_1.2.0
# [7] png_0.1-7 RCurl_1.95-4.1 rtracklayer_1.22.6
# [10] stats4_3.0.3 stringr_0.6.2 tools_3.0.3
# [13] zlibbioc_1.8.0


Pathview works, but I don't get colors or up/down-regulated genes...
03-21-2014, 03:37 AM   #47
shriram
Member
 
Location: UK

Join Date: May 2010
Posts: 13

pv.out.list <- sapply(enriched_pathways, function(pid) pathview(gene.data = gene_fc, pathway.id = pid, species = "sce", gene.idtype="KEGG", same.layer = F, kegg.native = T, node.sum="median"))

Data in pathview:
pv.out.list[1]
GLK1 -0.35 0.000 0.620 -1.118 -0.900

# original data supplied for pathview
gene_fc[1,]
GLK 0.14 -1.6 0.62 -1.1 -0.37

I have attached the resulting pathway image.
The original image, sce00051.png, shows the genes specific to yeast (in green).

I am wondering why the pathview data differ for GLK when GLK is the only gene on that node.

Thanks,
Shriram
Attached Images
sce00051.Fructose_and_mannose_metabolism_.multi.png
sce00051.png
03-21-2014, 07:20 AM   #48
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266

Another question: how do I change the size of the text part of the heatmaps (the KEGG pathway names)?
The names are all cut off, so they're unreadable. The "pdf.size" option only controls the graphic part.

###################################################
### significant.genesets
###################################################
kegg.sig<-sigGeneSet(cnts.kegg.p, outname="~/RNAseq/13_Acute-Changes/14_GAGE_native_A1A2/A1A2All/A1A2All.kegg",pdf.size = c(7,12))
03-21-2014, 10:13 AM   #49
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123

Shriram,
Here is the original graph from the official KEGG website:
http://www.genome.jp/kegg-bin/show_p...e00051+YCL040W
The GLK1 gene node is marked by a red box. Hover your mouse over it: do you see three budding yeast genes in this node, YCL040W (GLK1), YFR053C (HXK1) and YGL253W (HXK2)? You can click on the node and check the details of these genes. Again, for clarity, the nodes in pathview graphs are labelled with the most representative gene name rather than all mapped gene names.
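To check this from R rather than on the KEGG website, you can look at the plot.data.gene data frame that pathview() returns; its all.mapped column lists the input IDs mapped to each node. This is a minimal sketch, assuming internet access (pathview downloads the sce00051 KGML file) and a made-up fold-change vector keyed by the three ORF IDs above; the values, the name fc and the out.suffix are hypothetical.

```r
library(pathview)

# Hypothetical log2 fold changes keyed by KEGG (ORF) IDs for the three
# yeast genes on the GLK1 node; the numbers are made up.
fc <- c(YCL040W = 0.14, YFR053C = -0.80, YGL253W = -0.35)

pv <- pathview(gene.data = fc, pathway.id = "00051", species = "sce",
               gene.idtype = "KEGG", node.sum = "median",
               kegg.native = TRUE, out.suffix = "glk1.check")

pd <- pv$plot.data.gene
# Nodes whose mapped genes include GLK1 (YCL040W); all.mapped holds every
# input ID that pathview mapped to that node.
pd[grepl("YCL040W", pd$all.mapped), c("labels", "all.mapped")]
```

If all three IDs show up in all.mapped for the GLK1 node, the node value is the median of all three, not GLK1's fold change alone.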


Quote:
Originally Posted by shriram
pv.out.list <- sapply(enriched_pathways, function(pid) pathview(gene.data = gene_fc, pathway.id = pid, species = "sce", gene.idtype="KEGG", same.layer = F, kegg.native = T, node.sum="median"))

Data in pathview:
pv.out.list[1]
GLK1 -0.35 0.000 0.620 -1.118 -0.900

# original data supplied for pathview
gene_fc[1,]
GLK 0.14 -1.6 0.62 -1.1 -0.37

I have attached the resulting pathway image.
The original image, sce00051.png, shows the genes specific to yeast (in green).

I am wondering why the pathview data differ for GLK when GLK is the only gene on that node.

Thanks,
Shriram
03-21-2014, 10:38 AM   #50
shriram
Member
 
Location: UK

Join Date: May 2010
Posts: 13

Quote:
Originally Posted by bigmw
Shriram,
Here is the original graph from the official KEGG website:
http://www.genome.jp/kegg-bin/show_p...e00051+YCL040W
The GLK1 gene node is marked by a red box. Hover your mouse over it: do you see three budding yeast genes in this node, YCL040W (GLK1), YFR053C (HXK1) and YGL253W (HXK2)? You can click on the node and check the details of these genes. Again, for clarity, the nodes in pathview graphs are labelled with the most representative gene name rather than all mapped gene names.
That's great, it makes complete sense now. Thank you very much for your help.
Shriram
03-21-2014, 01:18 PM   #51
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123

Well, you can check the first attached image in post 47. Obviously you do get different colors on pathview graphs. I think you might have checked the pathview output in the wrong directory:
kegg.dir = "~/RNAseq/13_Acute-Changes/13_GAGE_native_A1A2/A1A2pT2D/Pathview"

This is the directory that stores the original KEGG pathway graphs (and XML data). You should check your current working directory for the pathview output:
getwd()
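A sketch of the same situation with pathview's demo data, so the two locations can be compared side by side. It assumes pathview and Rgraphviz are installed and that KEGG is reachable; kegg_dir is a hypothetical folder for the downloaded KEGG files, kept separate from the working directory where pathview writes its own images.

```r
library(pathview)   # kegg.native = FALSE also needs Rgraphviz installed

data(gse16873.d)    # demo log2 ratio matrix (Entrez IDs) shipped with pathview

# Hypothetical folder for the downloaded KEGG files (the KGML .xml, plus
# the .png when a native view is drawn); pathview's output does NOT go here.
kegg_dir <- file.path(tempdir(), "kegg_files")
dir.create(kegg_dir, showWarnings = FALSE)

pv <- pathview(gene.data = gse16873.d[, 1:2], pathway.id = "04110",
               species = "hsa", kegg.native = FALSE,
               sign.pos = "bottomleft", kegg.dir = kegg_dir)

list.files(kegg_dir)                        # downloaded KEGG data
list.files(getwd(), pattern = "^hsa04110")  # pathview output; graphviz views are PDFs
```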


Quote:
Originally Posted by sindrle
Pasted the wrong sessionInfo..

# > sessionInfo()
# R version 3.0.3 (2014-03-06)
# Platform: x86_64-apple-darwin10.8.0 (64-bit)
...

Pathview works, but I don't get colors or up/down-regulated genes...
03-22-2014, 05:42 AM   #52
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123

Someone asked a similar question here:
https://stat.ethz.ch/pipermail/bioco...ch/044427.html

Quote:
Originally Posted by sindrle
Another question: how do I change the size of the text part of the heatmaps (the KEGG pathway names)?
The names are all cut off, so they're unreadable. The "pdf.size" option only controls the graphic part.

###################################################
### significant.genesets
###################################################
kegg.sig<-sigGeneSet(cnts.kegg.p, outname="~/RNAseq/13_Acute-Changes/14_GAGE_native_A1A2/A1A2All/A1A2All.kegg",pdf.size = c(7,12))
03-22-2014, 05:46 AM   #53
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266

Hm.
OK, I'll try Adobe Illustrator. I tried Preview, but the names were cut off there as well...

I ended up editing it in Excel: cropping off the names, copying them in again from "significant-gs.txt", then saving as a new PDF.
03-22-2014, 10:17 AM   #54
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123

Why don’t you go beyond the preview step and actually open the file? You will see that the full graph extends beyond the artboard area.

You can do the following:
1. Save a PNG file:
File -> Save for Microsoft Office

2. Edit the artboard size and save a new copy of the PDF:
Object -> Artboards -> Fit to Artwork Bounds
03-23-2014, 02:15 PM   #55
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123

I forgot that the sigGeneSet function has been updated to give users more control over the margins and font size. sigGeneSet calls an internal function, heatmap2, to generate the heatmaps, so check the arguments of this function:
args(gage:::heatmap2)
The two relevant arguments here are margins and cexRow, which control the margins for the column/row names and the row-name font size, respectively. You may do something like:
kegg.sig <- sigGeneSet(cnts.kegg.p, outname = "~/RNAseq/13_Acute-Changes/14_GAGE_native_A1A2/A1A2All/A1A2All.kegg", pdf.size = c(7,12), margins = c(5,10))
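For a reproducible version, here is the same kind of call on the demo data that ships with gage (the gse16873 example from the gage vignette), with cexRow added for the row-label font size. It assumes a gage version recent enough that sigGeneSet passes extra arguments on to heatmap2; the margins and cexRow values are only starting points to tune, and the tempdir() output path is arbitrary.

```r
library(gage)

# Demo data shipped with gage: a microarray matrix whose odd columns are
# HN (control) samples and even columns DCIS samples, plus KEGG gene sets.
data(gse16873)
data(kegg.gs)
hn   <- (1:6) * 2 - 1
dcis <- (1:6) * 2
gse16873.kegg.p <- gage(gse16873, gsets = kegg.gs, ref = hn, samp = dcis)

# Arguments of the internal heatmap function that sigGeneSet calls
args(gage:::heatmap2)

# Extra arguments are handed on to heatmap2:
#   margins = c(column-name margin, row-name margin), in lines of text
#   cexRow  = font size of the row labels (the gene set names)
kegg.sig <- sigGeneSet(gse16873.kegg.p,
                       outname = file.path(tempdir(), "gse16873.kegg"),
                       pdf.size = c(7, 12),
                       margins = c(5, 15), cexRow = 0.6)
```

Widening the second margin value gives long pathway names more room, while lowering cexRow shrinks them to fit.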


Quote:
Originally Posted by sindrle
Another question: how do I change the size of the text part of the heatmaps (the KEGG pathway names)?
The names are all cut off, so they're unreadable. The "pdf.size" option only controls the graphic part.

###################################################
### significant.genesets
###################################################
kegg.sig<-sigGeneSet(cnts.kegg.p, outname="~/RNAseq/13_Acute-Changes/14_GAGE_native_A1A2/A1A2All/A1A2All.kegg",pdf.size = c(7,12))
03-24-2014, 01:33 PM   #56
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266

Thank you!


Another question: why are the absolute expression levels of genes not important when doing GAGE/GOseq/DAVID etc.?

If you have a new cell type/tissue, or even a new species, how do you go about doing GAGE if you do NO COMPARISONS and only want to get familiar with this new transcriptome?
You only have the set of genes that are expressed (with CPM/FPKM values), and you also know which genes are not expressed.

Can you use this information? Basically, you just use one condition and the expression values of the expressed genes.
03-25-2014, 06:14 PM   #57
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123

In pathway analysis or gene set analysis, you may still use absolute expression levels rather than relative expression levels. However, two things make absolute expression analyses less desirable.

First, it is very hard to quantify absolute expression levels. Expression measurements are affected by multiple factors other than the actual expression level or abundance, including transcript sequence, length, GC content, etc. These factors are mostly gene/transcript-specific, and normalization methods don’t model all of them well.
Second, absolute expression levels are not very biologically relevant, even if estimated accurately. Different genes, groups or pathways have very different working abundance ranges and dynamics, so it is less meaningful to compare absolute expression levels between genes or pathways directly.

In contrast, relative expression levels can be estimated accurately and are more comparable between genes, because all gene-specific factors cancel out in such measurements, i.e. in a gene's expression relative to its own reference level.
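The cancellation can be written out under a deliberately simple, assumed model: the measured expression x_{g,s} of gene g in sample s is its true abundance \mu_{g,s} scaled by a gene-specific factor c_g (standing in for length, GC content, sequence, etc.), with t a treatment sample, r a reference sample and h a second gene. Real biases are not purely multiplicative, so treat this as an illustration only.

```latex
% Assumed model: x_{g,s} = c_g \, \mu_{g,s}
% Same gene, treatment t vs reference r: the factor c_g cancels
\log_2 \frac{x_{g,t}}{x_{g,r}}
  = \log_2 \frac{c_g \, \mu_{g,t}}{c_g \, \mu_{g,r}}
  = \log_2 \frac{\mu_{g,t}}{\mu_{g,r}}

% Different genes g and h in one sample s: the bias term remains
\log_2 \frac{x_{g,s}}{x_{h,s}}
  = \log_2 \frac{c_g}{c_h} + \log_2 \frac{\mu_{g,s}}{\mu_{h,s}}
```

The within-gene ratio contains no c_g at all, while the between-gene comparison keeps log2(c_g/c_h), which is why direct comparisons of absolute levels across genes are unreliable.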
03-26-2014, 08:15 AM   #58
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123

That said, you can always do GAGE (or other pathway analyses) on absolute expression levels. GAGE does work on a single sample or condition (single-column matrices or vectors). But very likely you will see some housekeeping pathways/groups, like cell growth, protein synthesis and energy metabolism, more highly expressed than others.
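A minimal sketch of a single-sample, absolute-level run, using the gse16873 demo matrix shipped with gage as a stand-in for one condition (it holds log2 microarray intensities, not RNA-seq values). With ref = NULL and samp = NULL there is no reference group, so, roughly speaking, each gene set is tested against the background of all genes within that one column. For CPM/FPKM you would presumably log-transform first, e.g. log2(fpkm + 1); that step is my assumption, not something gage requires.

```r
library(gage)

data(gse16873)   # demo expression matrix shipped with gage (log2 intensities)
data(kegg.gs)    # human KEGG gene sets shipped with gage

# Stand-in for "one condition, absolute expression": a single column.
# (Assumption: RNA-seq CPM/FPKM values would be log-transformed first.)
one.sample <- gse16873[, 1, drop = FALSE]

# ref = NULL, samp = NULL: no control group, so the values are used as
# they are instead of being compared with reference samples.
abs.kegg.p <- gage(one.sample, gsets = kegg.gs, ref = NULL, samp = NULL)

# Gene sets expressed higher than the background; per the post above,
# expect housekeeping-type pathways near the top.
head(abs.kegg.p$greater[, c("p.val", "q.val", "set.size")])
```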
03-28-2014, 06:13 PM   #59
entrez
Junior Member
 
Location: NY

Join Date: Nov 2010
Posts: 7

Have you compared GAGE with other gene set analysis methods, like GOseq?

Quote:
Originally Posted by sindrle
Thank you!


Another question: why are the absolute expression levels of genes not important when doing GAGE/GOseq/DAVID etc.?

If you have a new cell type/tissue, or even a new species, how do you go about doing GAGE if you do NO COMPARISONS and only want to get familiar with this new transcriptome?
You only have the set of genes that are expressed (with CPM/FPKM values), and you also know which genes are not expressed.

Can you use this information? Basically, you just use one condition and the expression values of the expressed genes.
03-29-2014, 04:38 AM   #60
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266

I have done a quick test with GOseq, but I must admit I like GAGE better at first glance. Easy to follow, nice manual, nice plots, lots of results and possibilities. I think it really facilitates further analysis.

But I'm going to give GOseq another go for sure!

Tags
gene set, pathway analysis, r/bioconductor, rna-seq, visualization

All times are GMT -8.