  • #16
    The gage package has a function, kegg.gsets, that generates up-to-date pathway gene set data in real time for ~2300 KEGG species and KEGG Orthology (with species="ko").
    The gageData package provides KEGG and GO gene sets for 4 common research species: human, mouse, rat and budding yeast.
    You may want to go through the main vignette and the other documents of the gage package (besides the RNA-Seq workflow tutorial):

    GAGE is a published method for gene set (enrichment or GSEA) or pathway analysis. GAGE is generally applicable independent of microarray or RNA-Seq data attributes including sample sizes, experimental designs, assay platforms, and other types of heterogeneity, and consistently achieves superior performance over other frequently used methods. In gage package, we provide functions for basic GAGE analysis, result processing and presentation. We have also built pipeline routines for multiple GAGE analyses in a batch, comparison between parallel analyses, and combined analysis of heterogeneous data from different sources/studies. In addition, we provide demo microarray data and commonly used gene set data based on KEGG pathways and GO terms. These functions and data are also useful for gene set analysis using other methods.


    gageData is available:
    This is a supportive data package for the software package, gage. However, the data supplied here are also useful for gene set or pathway analysis or microarray data analysis in general. In this package, we provide two demo microarray datasets: GSE16873 (a breast cancer dataset from GEO) and BMP6 (originally published as a demo dataset for GAGE, also registered as GSE13604 in GEO). This package also includes commonly used gene set data based on KEGG pathways and GO terms for major research species, including human, mouse, rat and budding yeast. Mapping data between common gene IDs for budding yeast are also included.



    • #17
      Hi
      I am using pathview for yeast; however, I get the following error while retrieving pathway information for 19 different pathways.
      [1] "Downloading xml files for sce04113 Meiosis - yeast, 1/19 pathways.."
      [1] "Downloading png files for sce04113 Meiosis - yeast, 1/19 pathways.."
      Download of sce04113 Meiosis - yeast xml and png files failed!
      Failed to download KEGG xml/png files, sce04113 Meiosis - yeast skipped!

      Same functionality works fine with human data.

      below is my R version
      R version 3.0.2 (2013-09-25)
      Platform: x86_64-w64-mingw32/x64 (64-bit)

      Thanks
      Shriram



      • #18
        Originally posted by shriram
        Hi
        I am using pathview for yeast; however, I get the following error while retrieving pathway information for 19 different pathways.
        [1] "Downloading xml files for sce04113 Meiosis - yeast, 1/19 pathways.."
        [1] "Downloading png files for sce04113 Meiosis - yeast, 1/19 pathways.."
        Download of sce04113 Meiosis - yeast xml and png files failed!
        Failed to download KEGG xml/png files, sce04113 Meiosis - yeast skipped!

        Same functionality works fine with human data.

        below is my R version
        R version 3.0.2 (2013-09-25)
        Platform: x86_64-w64-mingw32/x64 (64-bit)

        Thanks
        Shriram
        ############
        Issue resolved
        by taking a substring of the actual KEGG pathway name and specifying gene.idtype="KEGG":
        path.ids <- substr(path.ids, 1, 8)
        ############



        • #19
          gene.idtype="KEGG" specifies the ID type used for the gene.data. It is not related to the error message, which indicates a download problem. As shown in your solution, this download problem is due to the wrong pathway IDs.
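A minimal sketch of why the substring step fixes the download, assuming the pathway names carry the 8-character KEGG pathway ID followed by the pathway title, as in the log messages quoted in this thread. The second name is only an illustrative input, and path.names is a hypothetical variable name:

```r
# Pathway names as they appear in the messages: "<8-character ID> <title>"
path.names <- c("sce04113 Meiosis - yeast", "sce04111 Cell cycle - yeast")

# Keep only the ID part (3-letter species code + 5 digits)
path.ids <- substr(path.names, 1, 8)
print(path.ids)
```

Passing the full name instead of the bare ID is what makes the xml/png download fail.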

          Originally posted by shriram
          ############
          Issue resolved
          by taking a substring of the actual KEGG pathway name and specifying gene.idtype="KEGG":
          path.ids <- substr(path.ids, 1, 8)
          ############



          • #20
            I have a question about using GAGE with data from cufflinks, as described in the RNA-Seq workflow tutorial. I have RNAseq data from pigs that was aligned using Tophat and analyzed for DEGs using cufflinks. I'm going through the process listed in the cufflinks section, but I'm running into an error. Below are the commands I've been entering. Everything runs fine until I get to the last command.

            > cuff.res=read.delim(file="gene_exp.diff", sep="\t")
            > cuff.fc=cuff.res$log2.fold_change
            > gnames=cuff.res$gene
            > sel=gnames!="-"
            > gnames=as.character(gnames[sel])
            > cuff.fc=cuff.fc[sel]
            > names(cuff.fc)=gnames
            > gnames.eg=pathview::id2eg(gnames, category ="symbol")
            > sel2=gnames.eg[,2]>""
            > cuff.fc=cuff.fc[sel2]
            > names(cuff.fc)=gnames.eg[sel2,2]
            > range(exp.fc)
            Error: object 'exp.fc' not found

            Do you know what the issue could be? I'm just starting out with RNAseq data and using R, and I haven't been able to find anyone else with this issue. Thanks.



            • #21
              You have to use cuff.fc instead of exp.fc, I guess.
              This is an R issue: the object exp.fc was never defined in your session.


              • #22
                changing range(exp.fc) to range(cuff.fc) results in the following result printed by R:
                [1] -Inf Inf

                So is this a typo in the RNAseq workflow, or am I already supposed to have exp.fc defined? I'm trying to go through these pdfs, but I'm confused as to what this step is doing.



                • #23
                  This is indeed a typo: it should be cuff.fc instead of exp.fc at this step. You will assign cuff.fc to exp.fc in a later step and then work exclusively with exp.fc:
                  range(cuff.fc)
                  # cap the -Inf and Inf values at -10 and 10, since infinite values block the downstream analysis
                  cuff.fc[cuff.fc>10]=10
                  cuff.fc[cuff.fc< -10]=-10
                  exp.fc=cuff.fc
                  out.suffix="cuff"
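The same capping can also be written with pmin and pmax. This is only a sketch on a made-up vector (the gene names g1 to g5 and their values are hypothetical), not part of the workflow tutorial:

```r
# Hypothetical log2 fold changes, including the infinite values cuffdiff
# reports when one condition has zero expression
cuff.fc <- c(g1 = -Inf, g2 = -2.5, g3 = 0, g4 = 3.1, g5 = Inf)

# Clamp everything into [-10, 10]; names are carried over from cuff.fc
capped <- pmax(pmin(cuff.fc, 10), -10)

print(range(capped))
```

Finite values inside the interval are left unchanged, so only the extreme entries are affected.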


                  BTW, the demo example uses human data. For your data, you have to specify the organism as Sus scrofa (pig) when you map gene symbols to Entrez Gene IDs, as below:
                  gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
                  Last edited by bigmw; 03-13-2014, 12:40 PM.



                  • #24
                    Thanks for your reply, and for informing me about the org="Ss" command. Unfortunately I am still unable to make it through the workflow without error. Below is my code and final output. Do you see what the issue could be? Thanks.

                    > kg.ssc=kegg.gsets("ssc")
                    > kegg.gs=kg.ssc$kg.sets[kg.ssc$sigmet.idx]
                    > cuff.res=read.delim(file="gene_exp.diff", sep="\t")
                    > cuff.fc=cuff.res$log2.fold_change.
                    > gnames=cuff.res$gene
                    > sel=gnames!="-"
                    > gnames=as.character(gnames[sel])
                    > cuff.fc=cuff.fc[sel]
                    > names(cuff.fc)=gnames
                    > gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
                    Loading required package: org.Ss.eg.db

                    > sel2=gnames.eg[,2]>""
                    > cuff.fc=cuff.fc[sel2]
                    > names(cuff.fc)=gnames.eg[sel2,2]
                    > range(cuff.fc)
                    [1] -Inf Inf
                    > cuff.fc[cuff.fc>10]=10
                    > cuff.fc[cuff.fc< -10]=-10
                    > exp.fc=cuff.fc
                    > out.suffix="cuff"
                    > require(gage)
                    > data(kegg.gs)
                    > fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
                    > sel <- fc.kegg.p$greater[, "q.val"] < 0.1 &
                    + !is.na(fc.kegg.p$greater[, "q.val"])
                    > path.ids <- rownames(fc.kegg.p$greater)[sel]
                    > sel.l <- fc.kegg.p$less[, "q.val"] < 0.1 &
                    + !is.na(fc.kegg.p$less[, "q.val"])
                    > path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
                    > path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
                    > require(pathview)
                    > pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(
                    + gene.data = exp.fc, pathway.id = pid,
                    + species = "ssc", out.suffix=out.suffix))
                    Start tag expected, '<' not found
                    Parsing ./sscNA.xml file failed, please check the file!
                    Start tag expected, '<' not found
                    Parsing ./sscNA.xml file failed, please check the file!
                    Start tag expected, '<' not found
                    Parsing ./sscNA.xml file failed, please check the file!



                    • #25
                      It looks like the download failed and no pathway data file was retrieved. Can you confirm this?


                      • #26
                        I think you are right, but I'm not sure how exactly to download the pathway data files. I thought the step you told me to change (below) would download the package I need, but it fails:

                        gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
                        Loading required package: org.ssc.eg.db
                        Bioconductor version 2.13 (BiocInstaller 1.12.0), ?biocLite for help
                        BioC_mirror: http://bioconductor.org
                        Using Bioconductor version 2.13 (BiocInstaller 1.12.0), R version 3.0.3.
                        Installing package(s) 'org.Ss.eg.db'
                        Loading required package: org.Ss.eg.db
                        Error in pathview::id2eg(gnames, category = "symbol", org = "Ss") :
                        Fail to install/load gene annotation package org.Ss.eg.db!
                        In addition: Warning messages:
                        1: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
                        there is no package called ‘org.Ss.eg.db’
                        2: package ‘org.ssc.eg.db’ is not available (for R version 3.0.3)
                        3: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
                        there is no package called ‘org.Ss.eg.db’

                        I changed org= "Ss" to "Ssc" and "ssc", but the result is the same, there does not seem to be a database with this name. However on the KEGG website, pig is listed as ssc. Is this the proper way to download the database, or is there another step I am missing? Thanks.



                        • #27
                          The files you downloaded are named sscNA.xml, which means the entries in path.ids2 are all NA. In other words, no real pathways were selected in your gage analysis step. Your gage analysis had this problem because you used kegg.gs, which is human gene set data, not pig data.
                          What you did:
                          data(kegg.gs)
                          fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)

                          You should generate the pig gene set data first, using the kegg.gsets function in the gage package:
                          kg.ssc=kegg.gsets("ssc")
                          kegg.gs=kg.ssc$kg.sets[kg.ssc$sigmet.idx]
                          fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)

                          And this is the only problem in your original analysis session I quoted below. It will work once you get this right.
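A quick way to spot this kind of species mismatch before running gage is to count how many genes in each gene set also occur in your data. The sketch below uses base R only, with toy stand-ins: exp.ids plays the role of names(exp.fc), gsets plays the role of kegg.gs, and all IDs and set names are hypothetical.

```r
# Hypothetical Entrez Gene IDs standing in for names(exp.fc)
exp.ids <- c("100152000", "397108", "396921")

# Hypothetical gene sets standing in for kegg.gs
gsets <- list(
  setA = c("397108", "396921", "100000001"),
  setB = c("100152000")
)

# Number of genes per set that are present in the data
overlap <- sapply(gsets, function(gs) sum(gs %in% exp.ids))
print(overlap)
```

If nearly every set shows zero overlap, the gene set data and the expression data most likely use different species or ID types.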
                          Please always pay attention to matching the species across your own data and the gene set or pathway data, as documented in the gage and pathview packages. You will find everything in the package tutorials and in the documentation for the functions you work with, such as gage or pathview:
                          GAGE is a published method for gene set (enrichment or GSEA) or pathway analysis. GAGE is generally applicable independent of microarray or RNA-Seq data attributes including sample sizes, experimental designs, assay platforms, and other types of heterogeneity, and consistently achieves superior performance over other frequently used methods. In gage package, we provide functions for basic GAGE analysis, result processing and presentation. We have also built pipeline routines for multiple GAGE analyses in a batch, comparison between parallel analyses, and combined analysis of heterogeneous data from different sources/studies. In addition, we provide demo microarray data and commonly used gene set data based on KEGG pathways and GO terms. These functions and data are also useful for gene set analysis using other methods.

                          Pathview is a tool set for pathway based data integration and visualization. It maps and renders a wide variety of biological data on relevant pathway graphs. All users need to do is supply their data and specify the target pathway. Pathview automatically downloads the pathway graph data, parses the data file, maps user data to the pathway, and renders the pathway graph with the mapped data. In addition, Pathview also seamlessly integrates with pathway and gene set (enrichment) analysis tools for large-scale and fully automated analysis.



                          Originally posted by shocker8786
                          Thanks for your reply, and for informing me about the org="Ss" command. Unfortunately I am still unable to make it through the workflow without error. Below is my code and final output. Do you see what the issue could be? Thanks.

                          > kg.ssc=kegg.gsets("ssc")
                          > kegg.gs=kg.ssc$kg.sets[kg.ssc$sigmet.idx]
                          > cuff.res=read.delim(file="gene_exp.diff", sep="\t")
                          > cuff.fc=cuff.res$log2.fold_change.
                          > gnames=cuff.res$gene
                          > sel=gnames!="-"
                          > gnames=as.character(gnames[sel])
                          > cuff.fc=cuff.fc[sel]
                          > names(cuff.fc)=gnames
                          > gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
                          Loading required package: org.Ss.eg.db

                          > sel2=gnames.eg[,2]>""
                          > cuff.fc=cuff.fc[sel2]
                          > names(cuff.fc)=gnames.eg[sel2,2]
                          > range(cuff.fc)
                          [1] -Inf Inf
                          > cuff.fc[cuff.fc>10]=10
                          > cuff.fc[cuff.fc< -10]=-10
                          > exp.fc=cuff.fc
                          > out.suffix="cuff"
                          > require(gage)
                          > data(kegg.gs)
                          > fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
                          > sel <- fc.kegg.p$greater[, "q.val"] < 0.1 &
                          + !is.na(fc.kegg.p$greater[, "q.val"])
                          > path.ids <- rownames(fc.kegg.p$greater)[sel]
                          > sel.l <- fc.kegg.p$less[, "q.val"] < 0.1 &
                          + !is.na(fc.kegg.p$less[, "q.val"])
                          > path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
                          > path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
                          > require(pathview)
                          > pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(
                          + gene.data = exp.fc, pathway.id = pid,
                          + species = "ssc", out.suffix=out.suffix))
                          Start tag expected, '<' not found
                          Parsing ./sscNA.xml file failed, please check the file!
                          Start tag expected, '<' not found
                          Parsing ./sscNA.xml file failed, please check the file!
                          Start tag expected, '<' not found
                          Parsing ./sscNA.xml file failed, please check the file!



                          • #28
                            Your original code worked well in this step, as shown in your response #24 above.
                            > gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
                            Loading required package: org.Ss.eg.db

                            The line above had no problem. Changing org="Ss" to "Ssc" or "ssc" caused all the errors. An input of org="Ss" cannot produce a message that mentions org.ssc.eg.db, as in the output below:
                            >gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
                            Loading required package: org.ssc.eg.db


                            Originally posted by shocker8786
                            I think you are right, but I'm not sure how exactly to download the pathway data files. I thought the step you told me to change (below) would download the package I need, but it fails:

                            gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
                            Loading required package: org.ssc.eg.db
                            Bioconductor version 2.13 (BiocInstaller 1.12.0), ?biocLite for help
                            BioC_mirror: http://bioconductor.org
                            Using Bioconductor version 2.13 (BiocInstaller 1.12.0), R version 3.0.3.
                            Installing package(s) 'org.Ss.eg.db'
                            Loading required package: org.Ss.eg.db
                            Error in pathview::id2eg(gnames, category = "symbol", org = "Ss") :
                            Fail to install/load gene annotation package org.Ss.eg.db!
                            In addition: Warning messages:
                            1: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
                            there is no package called ‘org.Ss.eg.db’
                            2: package ‘org.ssc.eg.db’ is not available (for R version 3.0.3)
                            3: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
                            there is no package called ‘org.Ss.eg.db’

                            I changed org= "Ss" to "Ssc" and "ssc", but the result is the same, there does not seem to be a database with this name. However on the KEGG website, pig is listed as ssc. Is this the proper way to download the database, or is there another step I am missing? Thanks.



                            • #29
                              Thank you for your help. I have 2 data sets that I am trying to perform this analysis on (all pig), and I was indeed able to produce the results files as expected by changing those lines of code for one of the data sets (the one I had not tried this on previously):

                              > kg.ssc=kegg.gsets("ssc")
                              > kegg.gs=kg.ssc$kg.sets[kg.ssc$sigmet.idx]
                              > cuff.res=read.delim(file="gene_exp.diff", sep="\t")
                              > cuff.fc=cuff.res$log2.fold_change.
                              > gnames=cuff.res$gene
                              > sel=gnames!="-"
                              > gnames=as.character(gnames[sel])
                              > cuff.fc=cuff.fc[sel]
                              > names(cuff.fc)=gnames
                              > gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
                              Loading required package: org.Ss.eg.db

                              > sel2=gnames.eg[,2]>""
                              > cuff.fc=cuff.fc[sel2]
                              > names(cuff.fc)=gnames.eg[sel2,2]
                              > range(cuff.fc)
                              [1] -Inf Inf
                              > cuff.fc[cuff.fc>10]=10
                              > cuff.fc[cuff.fc< -10]=-10
                              > exp.fc=cuff.fc
                              > out.suffix="cuff"
                              > require(gage)
                              > kg.ssc=kegg.gsets("ssc")
                              > kegg.gs=kg.ssc$kg.sets[kg.ssc$sigmet.idx]
                              > fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
                              > sel <- fc.kegg.p$greater[, "q.val"] < 0.1 &
                              + !is.na(fc.kegg.p$greater[, "q.val"])
                              > path.ids <- rownames(fc.kegg.p$greater)[sel]
                              > sel.l <- fc.kegg.p$less[, "q.val"] < 0.1 &
                              + !is.na(fc.kegg.p$less[, "q.val"])
                              > path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
                              > path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
                              > require(pathview)
                              > pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(
                              + gene.data = exp.fc, pathway.id = pid,
                              + species = "ssc", out.suffix=out.suffix))
                              Working in directory /srv/mds01/shared/Epigenetics/Rnaseq/kyle_results/TJT/tophat_expression/cuffdiff
                              Writing image file ssc04151.cuff.png
                              Working in directory /srv/mds01/shared/Epigenetics/Rnaseq/kyle_results/TJT/tophat_expression/cuffdiff
                              Writing image file ssc04010.cuff.png
                              Working in directory /srv/mds01/shared/Epigenetics/Rnaseq/kyle_results/TJT/tophat_expression/cuffdiff
                              Writing image file ssc04080.cuff.png

                              However, with my original data set, I still receive an error after entering the code exactly the same way:

                              > cuff.res=read.delim(file="gene_exp.diff", sep="\t")
                              > cuff.fc=cuff.res$log2.fold_change.
                              > gnames=cuff.res$gene
                              > sel=gnames!="-"
                              > gnames=as.character(gnames[sel])
                              > cuff.fc=cuff.fc[sel]
                              > names(cuff.fc)=gnames
                              > gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
                              Loading required package: org.Ss.eg.db

                              > sel2=gnames.eg[,2]>""
                              > cuff.fc=cuff.fc[sel2]
                              > names(cuff.fc)=gnames.eg[sel2,2]
                              > range(cuff.fc)
                              [1] -Inf Inf
                              > cuff.fc[cuff.fc>10]=10
                              > cuff.fc[cuff.fc< -10]=-10
                              > exp.fc=cuff.fc
                              > out.suffix="cuff"
                              > require(gage)
                              > kg.ssc=kegg.gsets("ssc")
                              > kegg.gs=kg.ssc$kg.sets[kg.ssc$sigmet.idx]
                              > fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
                              > sel <- fc.kegg.p$greater[, "q.val"] < 0.1 &
                              + !is.na(fc.kegg.p$greater[, "q.val"])
                              > path.ids <- rownames(fc.kegg.p$greater)[sel]
                              > sel.l <- fc.kegg.p$less[, "q.val"] < 0.1 &
                              + !is.na(fc.kegg.p$less[, "q.val"])
                              > path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
                              > path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
                              > require(pathview)
                              > pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(
                              + gene.data = exp.fc, pathway.id = pid,
                              + species = "ssc", out.suffix=out.suffix))
                              [1] "Downloading xml files for sscNA, 1/1 pathways.."
                              [1] "Downloading png files for sscNA, 1/1 pathways.."
                              Download of sscNA xml and png files failed!
                              Failed to download KEGG xml/png files, sscNA skipped!
                              Start tag expected, '<' not found
                              Parsing ./sscNA.xml file failed, please check the file!
                              Start tag expected, '<' not found
                              Parsing ./sscNA.xml file failed, please check the file!
                              Warning message:
                              In download.file(png.url, png.target, quiet = T, mode = "wb") :
                              cannot open: HTTP status was '404 Not Found'

                              It seems that R is remembering that I was using the human kegg database before, and now I cannot get it to use the pig one. I deleted the sscNA.xml files and unlinked the previous workspace using the unlink(".RData") command, but am still unable to produce the result.



                              • #30
                                R does not remember the old kegg.gs once you assign that name to new gene set data. If you are sure your code was right, then very likely your data had some problem. To locate the problem, start a new R session and run the same analysis code on your problematic data up to the pathview step. If you still get the same error message, run the following code and post your output here so that we can check what happened to your data:

                                str(cuff.fc)
                                head(cuff.fc)
                                lapply(kegg.gs[1:3], head, 3)
                                head(fc.kegg.p$greater)
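One possibility worth ruling out, offered as an assumption rather than a confirmed diagnosis: if fewer than three pathways pass the q-value cutoff, path.ids2[1:3] pads the result with NA, which would match the "sscNA" file name in the messages. A base R sketch with a hypothetical one-element result (picked and safe.ids are illustrative names):

```r
# Hypothetical case: only one pathway passed the q-value cutoff
path.ids2 <- c("ssc04151")

# Indexing past the end of a vector pads with NA
picked <- path.ids2[1:3]
print(picked)

# Take at most three IDs and never an NA
safe.ids <- head(path.ids2[!is.na(path.ids2)], 3)
print(safe.ids)
```

Looping over safe.ids instead of path.ids2[1:3] keeps NA out of the pathway.id argument. If safe.ids is empty, no pathway was significant in this data set.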
