Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • gage - local data set

    Hi,

    I am trying to run gage and FGNet with my local list. But I can't seems to make it works.

    I have created a gmt file and read it with readList to a list structure. I converted the list elements into Entrez IDs. when I try to run the gage command I keep getting `NA`

    This is what I'm doing
    Code:
    kegg_gsea <- readList("c2.cp.kegg.v4.0.symbols.gmt") #data set from the mSigDB
    
    kegg_gsea[1]
    $KEGG_GLYCOLYSIS_GLUCONEOGENESIS
     [1] "ACSS2"   "GCK"     "PGK2"    "PGK1"    "PDHB"    "PDHA1"   "PDHA2"  
     [8] "PGM2"    "TPI1"    "ACSS1"   "FBP1"    "ADH1B"   "HK2"     "ADH1C"  
    [15] "HK1"     "HK3"     "ADH4"    "PGAM2"   "ADH5"    "PGAM1"   "ADH1A"  
    [22] "ALDOC"   "ALDH7A1" "LDHAL6B" "PKLR"    "LDHAL6A" "ENO1"    "PKM2"   
    [29] "PFKP"    "BPGM"    "PCK2"    "PCK1"    "ALDH1B1" "ALDH2"   "ALDH3A1"
    [36] "AKR1A1"  "FBP2"    "PFKM"    "PFKL"    "LDHC"    "GAPDH"   "ENO3"   
    [43] "ENO2"    "PGAM4"   "ADH7"    "ADH6"    "LDHB"    "ALDH1A3" "ALDH3B1"
    [50] "ALDH3B2" "ALDH9A1" "ALDH3A2" "GALM"    "ALDOA"   "DLD"     "DLAT"   
    [57] "ALDOB"   "G6PC2"   "LDHA"    "G6PC"    "PGM1"    "GPI"    
    
    kegg_gsea_eg <- lapply(kegg_gsea_Up, sym2eg) # convert the file from gene symbols to entrez IDs ( human data).
    kegg_gsea_eg[1]
    $KEGG_GLYCOLYSIS_GLUCONEOGENESIS
     [1] "55902"  "2645"   "5232"   "5230"   "5162"   "5160"   "5161"   "55276" 
     [9] "7167"   "84532"  "2203"   "125"    "3099"   "126"    "3098"   "3101"  
    [17] "127"    "5224"   "128"    "5223"   "124"    "230"    "501"    "92483" 
    [25] "5313"   "160287" "2023"   "5315"   "5214"   "669"    "5106"   "5105"  
    [33] "219"    "217"    "218"    "10327"  "8789"   "5213"   "5211"   "3948"  
    [41] "2597"   "2027"   "2026"   "441531" "131"    "130"    "3945"   "220"   
    [49] "221"    "222"    "223"    "224"    "130589" "226"    "1738"   "1737"  
    [57] "229"    "57818"  "3939"   "2538"   "5236"   "2821"
    This all looks fine, but when I try to run the gage function, I get only NaN as a results.

    Code:
    kegg.gs.gsea <- gage(data.norm, gsets = kegg_gsea_eg, ref = ctrl, samp = cr2w, compare ="unpaired")
    
    > head(kegg.gs.gsea$greater)
                                                  p.geomean stat.mean p.val q.val 
    KEGG_GLYCOLYSIS_GLUCONEOGENESIS                      NA       NaN    NA    NA 
    KEGG_CITRATE_CYCLE_TCA_CYCLE                         NA       NaN    NA    NA 
    KEGG_PENTOSE_PHOSPHATE_PATHWAY                       NA       NaN    NA    NA 
    KEGG_PENTOSE_AND_GLUCURONATE_INTERCONVERSIONS        NA       NaN    NA    NA 
    KEGG_FRUCTOSE_AND_MANNOSE_METABOLISM                 NA       NaN    NA    NA 
    KEGG_GALACTOSE_METABOLISM                            NA       NaN    NA    NA 
                                                  set.size CR2Wo1 CR2Wo2 CR2Wo3   
    KEGG_GLYCOLYSIS_GLUCONEOGENESIS                      0     NA     NA     NA   
    KEGG_CITRATE_CYCLE_TCA_CYCLE                         0     NA     NA     NA   
    KEGG_PENTOSE_PHOSPHATE_PATHWAY                       0     NA     NA     NA   
    KEGG_PENTOSE_AND_GLUCURONATE_INTERCONVERSIONS        0     NA     NA     NA   
    KEGG_FRUCTOSE_AND_MANNOSE_METABOLISM                 0     NA     NA     NA   
    KEGG_GALACTOSE_METABOLISM                            0     NA     NA     NA   
                                                  CR2Wo4 CR2Wo5 CR2Wo6
    KEGG_GLYCOLYSIS_GLUCONEOGENESIS                   NA     NA     NA
    KEGG_CITRATE_CYCLE_TCA_CYCLE                      NA     NA     NA
    KEGG_PENTOSE_PHOSPHATE_PATHWAY                    NA     NA     NA
    KEGG_PENTOSE_AND_GLUCURONATE_INTERCONVERSIONS     NA     NA     NA
    KEGG_FRUCTOSE_AND_MANNOSE_METABOLISM              NA     NA     NA
    KEGG_GALACTOSE_METABOLISM                         NA     NA     NA
    When I run everything with the gage function kegg.gsets()
    it works fine.

    Code:
    >kegg.gs.gage$greater[1:4,]
                                                  p.geomean stat.mean        p.val
    mmu03010 Ribosome                          0.0007681659  3.143027 1.394098e-14
    mmu00982 Drug metabolism - cytochrome P450 0.0051657235  2.555047 4.341639e-10
    mmu05204 Chemical carcinogenesis           0.0154393498  2.093465 2.108867e-07
    mmu02010 ABC transporters                  0.0164200321  2.079450 3.064449e-07
                                                      q.val set.size       CR2Wo1 
    mmu03010 Ribosome                          3.541008e-12      135 0.0007615765 
    mmu00982 Drug metabolism - cytochrome P450 5.513882e-08       58 0.0049954942 
    mmu05204 Chemical carcinogenesis           1.635328e-05       74 0.0169161660 
    mmu02010 ABC transporters                  1.635328e-05       44 0.0071786706 
                                                    CR2Wo2      CR2Wo3       CR2Wo4
    mmu03010 Ribosome                          0.004924298 0.001273332 0.0008053285
    mmu00982 Drug metabolism - cytochrome P450 0.042250736 0.001703304 0.0045428924
    mmu05204 Chemical carcinogenesis           0.093709197 0.009452226 0.0075235397
    mmu02010 ABC transporters                  0.001556945 0.021834009 0.0356210166
                                                    CR2Wo5       CR2Wo6
    mmu03010 Ribosome                          0.003067049 3.569067e-05
    mmu00982 Drug metabolism - cytochrome P450 0.006710034 3.373549e-03
    mmu05204 Chemical carcinogenesis           0.023693002 1.426880e-02
    mmu02010 ABC transporters                  0.032930730 1.248961e-01
    As far as I can tell, there is no difference in the structure between the two list. But somehow gage can't seems to find any hits in my list.

    What am doing wrong here?

    thanks
    Assa

  • #2
    mSigDB gene sets are human data, not mouse. Therefore, there won’t be any human gene sets mappable or enriched in your mouse data. Please always make sure the species and gene ID types are the same for your data and the gene set data as described in the Common Errors section of the gage tutorial.
    You can always use kegg.gsets() and go.gsets() in gage package to derive the latest KEGG pathway or GO gene sets. Actually it is recommended to do so even for human data because the gene sets in mSigDB may not be up-to-date. Please check gage package documentations for details:
    GAGE is a published method for gene set (enrichment or GSEA) or pathway analysis. GAGE is generally applicable independent of microarray or RNA-Seq data attributes including sample sizes, experimental designs, assay platforms, and other types of heterogeneity, and consistently achieves superior performance over other frequently used methods. In gage package, we provide functions for basic GAGE analysis, result processing and presentation. We have also built pipeline routines for of multiple GAGE analyses in a batch, comparison between parallel analyses, and combined analysis of heterogeneous data from different sources/studies. In addition, we provide demo microarray data and commonly used gene set data based on KEGG pathways and GO terms. These funtions and data are also useful for gene set analysis using other methods.


    BTW, to convert gene symbol or other IDs to Entrez Genes, you can use id2eg() function in pathview package for the major research species. Function sym2eg() in gage package works for human genes only. For more details, please check the pathview package:
    Pathview is a tool set for pathway based data integration and visualization. It maps and renders a wide variety of biological data on relevant pathway graphs. All users need is to supply their data and specify the target pathway. Pathview automatically downloads the pathway graph data, parses the data file, maps user data to the pathway, and render pathway graph with the mapped data. In addition, Pathview also seamlessly integrates with pathway and gene set (enrichment) analysis tools for large-scale and fully automated analysis.

    Comment


    • #3
      oh

      my bad. thanks for noticing that.

      You can always use kegg.gsets() and go.gsets() in gage package to derive the latest KEGG pathway or GO gene sets.
      Do the functions always get the latest KEGG- and GO information, or is it based on a local DB?

      Thanks
      Last edited by frymor; 03-18-2015, 02:33 AM.

      Comment


      • #4
        kegg.gsets() generates the latest gene set data by connecting to KEGG database on real time.
        go.gsets() uses Bioconductor gene annotation database packages, which are updated regularly. hence it should be close to the latest version if not identical.

        Comment


        • #5
          KEGG set from annotation file

          Hello,

          Can you direct me to information on how you constructed your gmt file for GAGE analysis? I have an annotation file (.txt) that contains my geneID, KEGG, KO, KOG, and GO information.

          I am trying to use GAGE for gene enrichment.

          Thank you

          Comment


          • #6
            What species you are working on? Gage package can generates gene set data for many species (3000 for KEGG and 19 for GO) via some dedicated functions. For details, check:
            ?kegg.gsets
            ?go.gsets

            Comment


            • #7
              Thanks,

              I am working with Sorghum bicolor which there is an available kegg db for, however it is based on the v1 genome ID and we are currently at v3 with different gene_IDs. My work has been based on the v3 genome assembly. This is why I was trying to make my own file for analysis. Thanks for your reply!

              Comment


              • #8
                To use custom gene set data as in your case, the simplest way to go is read in your annotation data file into R and process it into gene set data.
                You may use read.delim function to read in data, and split function to split your gene ID column into gene sets (based on KEGG pathway or GO term assignment). For more info check within R:
                ?read.delim
                ?split

                Comment


                • #9
                  Thank you so much for the information! I will give this a try.

                  Thank you again

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Strategies for Sequencing Challenging Samples
                    by seqadmin


                    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                    03-22-2024, 06:39 AM
                  • seqadmin
                    Techniques and Challenges in Conservation Genomics
                    by seqadmin



                    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                    Avian Conservation
                    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                    03-08-2024, 10:41 AM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, Yesterday, 06:37 PM
                  0 responses
                  7 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, Yesterday, 06:07 PM
                  0 responses
                  7 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-22-2024, 10:03 AM
                  0 responses
                  49 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-21-2024, 07:32 AM
                  0 responses
                  66 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X