  • #76
    Originally posted by bigmw View Post
    The pathview package provides two functions: eg2id and id2eg, for ID mapping/conversion for major research species. For details:
    ?pathview::eg2id

    BTW, I would suggest converting your data IDs from symbol to Entrez Gene, rather than your gene set IDs from Entrez to symbol. The former should be much faster, as it only needs to call the conversion function once.
    Also, what if I do want to convert the gene set IDs from Entrez to symbol?
    How can I do it?

    Thank you.



    • #77
      GAGE and other methods (GSEA etc.) require all genes to be included. This way GAGE tests gene perturbations within pathways against the background of all genes. Because you selected a list of differentially expressed genes first, it is expected that you don't see any pathways standing out within that pre-selected list, right? Including all genes instead of a selected list gives you two major advantages: you use all your data in the analysis, which is usually more powerful, and you don't need a more or less arbitrary q-/p-value cutoff.
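
A minimal sketch of the difference, in base R only. The data frame `res` is a toy stand-in for a DESeq result table (the column names follow the `nbinomTest` output quoted in this thread; the values are invented), and `kegg.gs` in the final comment is an assumed gene set list keyed by the same ID type.

```r
# Toy stand-in for a DESeq result table; values are invented for illustration.
res <- data.frame(
  id             = c("1", "10", "100", "1000", "10000"),
  log2FoldChange = c(2.3, -0.1, NA, 0.4, -Inf),
  padj           = c(0.001, 0.8, NA, 0.3, 0.02),
  stringsAsFactors = FALSE
)

# Pre-filtered list (what GAGE should NOT be given): significant genes only.
sig <- !is.na(res$padj) & res$padj < 0.01 & abs(res$log2FoldChange) > 1
fc.selected <- setNames(res$log2FoldChange[sig], res$id[sig])

# Full background (what GAGE expects): every gene with a finite fold change.
keep <- is.finite(res$log2FoldChange)
fc.all <- setNames(res$log2FoldChange[keep], res$id[keep])

# With the gage package installed, fc.all would then be passed on, e.g.:
#   fc.kegg.p <- gage(fc.all, gsets = kegg.gs, ref = NULL, samp = NULL)
```

Only genes with a missing or infinite fold change are dropped here; no significance cutoff is applied before the gene set test.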

      Otherwise, your code seems to work fine. You may want to check the DESeq section and the native workflow in the demo code:



      Originally posted by crazyhottommy View Post
      Thank you, I followed it. After DESeq, 1724 differentially expressed genes were used for pathway analysis.

      res <- nbinomTest(cds, 'control', 'treat')

      resSig <- res[res$padj < 0.01 & (res$log2FoldChange > 1 | res$log2FoldChange < -1), ]

      resSig <- na.omit(resSig)

      require(gage)
      ...

      Am I doing it right?



      • #78
        eg2id and id2eg are the pair of functions for ID mapping from and to Entrez Genes. For info:
        ?eg2id
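
For the gene set direction, a rough sketch in base R. The lookup vector `eg2sym` is a hypothetical stand-in for a mapping built from `eg2id` output (check `?eg2id` for the actual return format), and the gene sets are toy examples with made-up names.

```r
# Hypothetical Entrez-to-symbol lookup; in practice build it from eg2id() output.
eg2sym <- c("7157" = "TP53", "1956" = "EGFR", "672" = "BRCA1")

# Toy gene sets keyed by Entrez Gene ID (set names are made up).
gsets.eg <- list(
  setA = c("7157", "1956"),
  setB = c("672", "7157", "99999999")  # the last ID has no mapping
)

# Map every set; drop IDs without a symbol and remove duplicates.
gsets.sym <- lapply(gsets.eg, function(ids) {
  sym <- unname(eg2sym[ids])
  unique(sym[!is.na(sym)])
})
```

The mapping has to be applied to every gene set in the list, whereas the data IDs only need to be converted once.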


        Originally posted by crazyhottommy View Post
        Also, what if I do want to convert the gene set IDs from Entrez to symbol?
        How can I do it?

        Thank you.



        • #79
          Originally posted by bigmw View Post
          GAGE and other methods (GSEA etc.) require all genes to be included. This way GAGE tests gene perturbations within pathways against the background of all genes. Because you selected a list of differentially expressed genes first, it is expected that you don't see any pathways standing out within that pre-selected list, right? Including all genes instead of a selected list gives you two major advantages: you use all your data in the analysis, which is usually more powerful, and you don't need a more or less arbitrary q-/p-value cutoff.

          Otherwise, your code seems to work fine. You may want to check the DESeq section and the native workflow in the demo code:
          http://www.bioconductor.org/packages...eqWorkflow.pdf
          Thank you very much for your reply. I will feed all the genes to GAGE and see. One question, though: each gene's fold change comes with its own p-value (and adjusted p-value), so GAGE only takes the fold change into account and not the p-values, right?



          • #80
            You are right. In the demo examples, GAGE takes fold changes, not p-values, as input.
            GAGE works with fold changes (default), t-statistics and other types of per-gene statistics. Please check the GAGE paper and Section 5, "Per gene score choices", of the RNA-seq workflow tutorial.



            • #81
              This is normal. Different analysis procedures give you different p-values, although you can still see a lot of consistency between them (the top hits or the order of significant pathways). We expect the native gage/pathview workflow to be more sensitive than the joint workflows due to the design of the GAGE analysis procedure. Importantly, GAGE takes the sample size into account by default, but the average fold change scores output by other tools don't carry that information.
              For details, there was an earlier thread on the same question:
              http://seqanswers.com/forums/showthread.php?t=34655#6



              Originally posted by tigerxu View Post
              I have followed the default workflows of gage and pathview on the example RNA-seq dataset. I also used the fold changes inferred by DESeq2, followed by gage and pathview. I found that the two pipelines output different results: the pipeline based on the DESeq2 fold changes generates far fewer significant pathways. For example:

              > gage.kegg.sig<-sigGeneSet(gage.kegg.p, outname="sig.kegg",pdf.size=c(7,8))
              [1] "there are 22 signficantly up-regulated gene sets"
              [1] "there are 17 signficantly down-regulated gene sets"

              > deseq2.kegg.sig<-sigGeneSet(deseq2.kegg.p, outname="deseq2.sig.kegg",pdf.size=c(7,8))
              [1] "gs.data needs to be a matrix-like object!"
              [1] "No heatmap produced for down-regulated gene sets, only 1 or none signficant."
              [1] "gs.data needs to be a matrix-like object!"
              [1] "there are 7 signficantly up-regulated gene sets"
              [1] "there are 0 signficantly down-regulated gene sets"

              I'm wondering which pipeline is more reliable for biological interpretation. Why does the pipeline based on DESeq2 return far fewer pathways? Can anyone give me some advice?

              Thanks!



              • #82
                Originally posted by bigmw View Post
                GAGE and other methods (GSEA etc.) require all genes to be included. This way GAGE tests gene perturbations within pathways against the background of all genes. Because you selected a list of differentially expressed genes first, it is expected that you don't see any pathways standing out within that pre-selected list, right? Including all genes instead of a selected list gives you two major advantages: you use all your data in the analysis, which is usually more powerful, and you don't need a more or less arbitrary q-/p-value cutoff.

                Otherwise, your code seems to work fine. You may want to check the DESeq section and the native workflow in the demo code:
                http://www.bioconductor.org/packages...eqWorkflow.pdf
                I did try using all the genes, but still no pathways are selected....Maybe that's just the nature of my data?



                • #83
                  I really appreciate your information! It is very helpful for furthering my understanding of the statistical background of GAGE.

                  Originally posted by bigmw View Post
                  This is normal. Different analysis procedures give you different p-values, although you can still see a lot of consistency between them (the top hits or the order of significant pathways). We expect the native gage/pathview workflow to be more sensitive than the joint workflows due to the design of the GAGE analysis procedure. Importantly, GAGE takes the sample size into account by default, but the average fold change scores output by other tools don't carry that information.
                  For details, there was an earlier thread on the same question:
                  http://seqanswers.com/forums/showthread.php?t=34655#6



                  • #84
                    Obviously, no pathway has q-val < 0.1. This is still common, due to the sample size, noise level etc. in your data. You may do either or both of the following:
                    - increase the q-val cutoff, something like:
                    q.cut <- 0.2
                    sel <- fc.kegg.p$greater[, "q.val"] < q.cut & !is.na(fc.kegg.p$greater[, "q.val"])
                    sel.l <- fc.kegg.p$less[, "q.val"] < q.cut & !is.na(fc.kegg.p$less[, "q.val"])

                    - use the native GAGE workflow instead of the joint workflow. The former has higher testing power, as it takes sample size into account. For details, check my answer at #81 of this thread above.

                    I assume your data and analysis above were done correctly. But first of all, make sure you did not mix up your control and experiment samples (or their labels), and that your data quality has no major problems.
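
To see what relaxing the cutoff does, here is a small base R sketch. The `fc.kegg.p` object is a toy stand-in with invented values that only mimics the `$greater` / `$less` matrices and their `q.val` column, and `pick.sets` is a hypothetical helper, not a gage function.

```r
# Toy stand-in for gage() output: $greater and $less matrices that each
# carry a "q.val" column (all values invented).
fc.kegg.p <- list(
  greater = cbind(p.val = c(0.001, 0.03, NA), q.val = c(0.05, 0.15, NA)),
  less    = cbind(p.val = c(0.02, 0.4, 0.9),  q.val = c(0.18, 0.6, 0.9))
)
rownames(fc.kegg.p$greater) <- c("pathway1", "pathway2", "pathway3")
rownames(fc.kegg.p$less)    <- c("pathway4", "pathway5", "pathway6")

# Hypothetical helper: names of gene sets passing a q-value cutoff.
# The NA check uses the same matrix that is being filtered.
pick.sets <- function(tab, q.cut) {
  q <- tab[, "q.val"]
  rownames(tab)[!is.na(q) & q < q.cut]
}

up.01   <- pick.sets(fc.kegg.p$greater, 0.1)
up.02   <- pick.sets(fc.kegg.p$greater, 0.2)
down.02 <- pick.sets(fc.kegg.p$less, 0.2)
```

In this toy, moving the cutoff from 0.1 to 0.2 adds one up-regulated set and yields one down-regulated set; gene sets with a missing q-value are never selected.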



                    Originally posted by crazyhottommy View Post
                    I did try using all the genes, but still no pathways are selected....Maybe that's just the nature of my data?



                    • #85
                      Gage analysis for RNAseq

                      Dear all,

                      We are two biologists (so not bioinformaticians...) working with RNA-seq data and having a little trouble with pathway analysis. We performed RNA sequencing on 4 distinct cell populations to compare their transcriptional profiles (platform: Illumina HiSeq 2000). Raw reads were mapped using TopHat, and differential analysis was performed with the edgeR + voom + limma packages. Our final output is a table (.txt file) for each contrast containing our 16058 expressed genes with their respective log fold changes, expression values (normalized) and adjusted p-values.

                      We wish to perform pathway enrichment analysis to determine which pathways are enriched/depleted in our respective cell populations and so examine them for distinct functions. The gage package seems to be very complete for both GO and KEGG. However, we are not sure we are using the right data for the analysis. Do we have to run the analysis separately for each cell population (loading into R only the log fold changes of all genes for the contrast of cell population A vs. all the other cell populations), or do we have to load a table containing 4 columns (our 4 cell populations) with normalized, log2-transformed expression values? It is not very clear to us...

                      In addition, as we don't have a treated group vs. a non-treated one, we don't have a "biological reference" for the analysis. Does it make sense to perform the whole analysis with ref = NULL and samp = NULL?
                      We apologize for the very naive question.

                      Thank you for your help.



                      • #86
                        GAGE accepts log fold changes or other test statistics from differential expression analysis. Yes, you need to set ref = NULL and samp = NULL in the gage function when the differential expression analysis is already done. Details are described in Sections 5-6 of the workflow tutorial:


                        Speaking of log fold changes, you must have some reference state. In your case, it can be one cell population or some virtual state of their combination. Please make sure you understand your upstream differential expression analysis.
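
As a rough illustration of the input preparation, the base R sketch below collects per-contrast log fold changes into one gene-by-contrast matrix. The tables, gene IDs and contrast names are invented; real tables would come from the limma output files, and the gene IDs would need to match the ID type of the gene sets used.

```r
# Toy limma-style tables, one per contrast (gene IDs and values invented).
tabs <- list(
  A_vs_rest = data.frame(gene = c("g1", "g2", "g3"), logFC = c(1.2, -0.5, 0.1),
                         stringsAsFactors = FALSE),
  B_vs_rest = data.frame(gene = c("g3", "g1", "g2"), logFC = c(-0.8, 0.3, 2.0),
                         stringsAsFactors = FALSE)
)

# Align all contrasts on a common gene order and bind them into one matrix:
# rows are genes, columns are contrasts.
genes <- sort(unique(unlist(lapply(tabs, function(x) x$gene))))
fc.mat <- sapply(tabs, function(x) x$logFC[match(genes, x$gene)])
rownames(fc.mat) <- genes

# Each column is a set of per-gene scores for one contrast. With gage
# installed, a column (or the matrix) would be passed with
# ref = NULL, samp = NULL as described in the post above.
```

Matching on gene ID rather than row position matters here because the contrast tables are not guaranteed to list genes in the same order.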



                        • #87
                          Missing samples on gage() > sigGeneSet() heatmap

                          Hello everyone. It seems I'm a little late to this party, so fingers crossed this thread is still active.

                          So my problem is that only 3 of my 6 samples are appearing in heatmaps generated with sigGeneSet. The 'samp' samples are appearing but not 'ref' samples.

                          This only occurs when visualizing significant gene sets (gage() > sigGeneSet()). When running gage() > essGene() > geneData() to visualize genes in specific gene sets all samples are present (Figure 2, main vignette).

                          I have looked over the vignettes, and sigGeneSet seems to always be pictured with 'samp' samples only (Figure 1, main vignette).

                          Is there a way to make sigGeneSet/gage generate heatmaps with all samples such as in Figure 2/3 of the main vignette but displaying gene set names as in Figure 1 of the main vignette?

                          Thanks,
                          BobbyT



                          • #88
                            Bobby,
                            Heatmaps from the sigGeneSet function show the perturbations of whole gene sets or pathways (in experiment samples vs. controls). This is only meaningful for experiment samples; you don't really expect controls to receive such a perturbation score.
                            The gene set level heatmap is actually a unique feature of GAGE. Because GAGE conducts pair-wise comparisons between experiment samples and controls, we can have a gene set level test statistic (or p-value) for each experiment sample. The global statistics and p-values are then summarized from these individual tests. These stats/p-values are all included in the gage function output. You may check the paper for details of the statistical tests.
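
A small base R sketch of how the per-sample values might be pulled out of that output. The `stats` matrix is a toy stand-in with invented values, laid out on the assumption that the first column is the summary statistic (`stat.mean`) and the remaining columns are the per-sample statistics; check `?gage` for the exact layout in your version.

```r
# Toy stand-in for the $stats matrix of a gage() result (values invented):
# one summary column followed by one column per experiment sample.
stats <- cbind(
  stat.mean = c(3.1, -2.4),
  samp1     = c(2.9, -2.0),
  samp2     = c(3.5, -2.6),
  samp3     = c(2.9, -2.6)
)
rownames(stats) <- c("pathwayA", "pathwayB")

# Per-sample gene set scores: drop the summary column, keep matrix shape.
per.sample <- stats[, colnames(stats) != "stat.mean", drop = FALSE]

# per.sample has one column per experiment sample and none for controls,
# which is why only the 'samp' samples appear in a gene set level heatmap.
```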



                            • #89
                              Hi bigmw,

                              Thanks for the reply. I think I understand where you're coming from. However, if I generate a sample-wide heatmap with heatmap() or the like, you can see that there are differences between the 'control' samples. I should probably say that I have 2 groups with 3 replicates in each group, and it goes without saying that biological samples, even from the same 'group', are never identical. I would have thought that this variation could be visualised for all samples, even at the gene set/pathway level.
                              I can understand why this hasn't been included in the package, and I can talk my way around it, but it would have been nice to see what the baseline is doing at the pathway level too.



                              • #90
                                Hi bigmw,

                                I am working on a non-model species. I have done differentially expressed gene analysis (gene level) using DESeq2 and used blastx to blast all expressed genes against the nr database. I am wondering whether I can use GAGE for downstream GO enrichment analysis and pathway analysis.

                                Thanks,
                                Tom
