  • Unsure how to use GAGE with biological replicates

    Hello,

    I have some RNAseq data (cuffdiff) that I am trying to run gage/pathview on, to determine which pathways are enriched. However, my data set uses biological replicates (two samples of each cell type), which I have not used before with gage.

    I read the manuals and found a mention of a 'weights' argument for gage here: http://www.bioconductor.org/packages...e/man/gage.pdf
    But it does not describe what the numeric weight vector should contain or how to determine it from the number of replicates. While there are many other examples of how to use gage/pathview on the Bioconductor website, none of them mention the 'weights' argument or any sort of replicates.

    The command I used without biological replicates is:
    > fc.kegg.p <- gage(exp.fc, gsets = kegg.sets.rn, ref = NULL, samp = NULL)

    Does anyone have any experience with this? How would I go about analyzing these data?

  • #2
    When your expression data are differential expression scores such as fold changes, with or without replicates (matrix or vector), you can use gage with ref=NULL and samp=NULL. If your data are the original expression levels, you should specify the column numbers of the control(s) and experiment(s) in your data matrix with ref and samp. If columns 1-2 are controls and columns 3-4 are experiments, and the samples are not paired, you can do something like:
    kegg.p <- gage(exp.data, gsets = kegg.sets.rn, ref = 1:2, samp = 3:4, compare = "unpaired")
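
If the column layout is not as tidy as "1-2 controls, 3-4 experiments", the indices for ref and samp can be derived from the column names instead of being typed by hand. The following is a minimal sketch in base R. The sample names and the toy matrix are hypothetical placeholders, not part of the original data.

```r
# Hypothetical sample names; replace with the column names of your own matrix.
sample.names <- c("ctrl_1", "ctrl_2", "treat_1", "treat_2")

# Toy expression matrix (5 genes x 4 samples) standing in for real data.
exp.data <- matrix(
  seq_len(20), nrow = 5,
  dimnames = list(paste0("gene", 1:5), sample.names)
)

# Column indices of controls and experiments, looked up by name prefix.
ref.idx  <- grep("^ctrl_",  colnames(exp.data))
samp.idx <- grep("^treat_", colnames(exp.data))

# Sanity checks: both groups are non-empty and do not overlap.
stopifnot(length(ref.idx) > 0, length(samp.idx) > 0)
stopifnot(length(intersect(ref.idx, samp.idx)) == 0)

# These indices then go into the call shown above:
# kegg.p <- gage(exp.data, gsets = kegg.sets.rn, ref = ref.idx,
#                samp = samp.idx, compare = "unpaired")
```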

    Check the gage function documentation for details:
    ?gage
    You may also want to check the quick start and basic analysis sections in the gage tutorial:

    • #3
      My data are the cuffdiff output for two samples each of four different cell types. I had already been using the (very helpful) tutorial you linked, which explains how to retrieve the needed data from the gene_exp.diff file of the cuffdiff output. This file, though, combines the data for biological replicates, making it impossible to do what you describe with multiple columns. This is why I was interested in the 'weights' argument.

      There is a file in the cuffdiff output, 'genes.read_group_tracking', which does list the FPKM for each gene in each replicate. Perhaps I should use that.
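
As a rough sketch of that idea, the per-replicate table can be reshaped in base R into a gene-by-sample matrix of the kind gage takes with ref and samp. The column names used here (tracking_id, condition, replicate, FPKM) are an assumption based on the usual cuffdiff layout and should be checked against the actual file. The toy data frame is made up for illustration.

```r
# Toy stand-in for genes.read_group_tracking; values are invented.
# With a real file: tracking <- read.delim("genes.read_group_tracking")
tracking <- data.frame(
  tracking_id = rep(c("geneA", "geneB", "geneC"), each = 4),
  condition   = rep(rep(c("ctrl", "treat"), each = 2), times = 3),
  replicate   = rep(c(0, 1), times = 6),
  FPKM        = c(10, 12, 40, 44, 5, 6, 5, 7, 0, 1, 3, 3),
  stringsAsFactors = FALSE
)

# One column per replicate, labelled condition_replicate.
tracking$sample <- paste(tracking$condition, tracking$replicate, sep = "_")

# Long table -> gene x sample matrix (each cell holds a single FPKM value).
fpkm <- tapply(tracking$FPKM,
               list(tracking$tracking_id, tracking$sample),
               sum)

# Log-scale the values; the pseudocount of 1 avoids log2(0).
exp.data <- log2(fpkm + 1)
print(round(exp.data, 2))
```

In this toy matrix the columns come out as ctrl_0, ctrl_1, treat_0, treat_1, so columns 1-2 would be passed as ref and columns 3-4 as samp. Note that the row names must use the same gene ID type as the gene sets, which may require an ID conversion step.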

      I don't see how gage will still be statistically accurate when working with fold changes with biological replicates without specifying them. If there are multiple biological replicates, then a fold change of a certain amount should be more significant than a fold change of the same amount with only one sample of each cell type. Could you elaborate on how that works?

      • #4
        GAGE does take sample size into account. It does pair-wise comparisons between experiments and controls (disease vs normal etc.), i.e. it conducts a gene set or pathway test for each sample pair and then summarizes the results. The more samples (hence independent experiment-control sample pairs) you have, the more testing power you get this way. You may also read the GAGE paper for more details of the method, at http://www.biomedcentral.com/1471-2105/10/161.
        So to take full advantage of the testing power of GAGE, it is recommended to follow the native workflow. Alternatively, you can use the normalized data from other tools (like the FPKM for each gene for each replicate from cuffdiff) and feed the data into GAGE as mentioned above. The joint workflows are there for users' convenience. People can do the differential expression analysis (at the individual gene level) with well-established tools like DESeq, edgeR, Cufflinks etc. and input the results into the GAGE/Pathview workflow for pathway analysis and visualization. This is convenient, but less sensitive, because sample size is not considered this way.
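
To illustrate why more sample pairs give more power, here is a simplified sketch that combines independent per-pair p-values in the Fisher style. It is not GAGE's own code; the exact summary GAGE uses is described in the paper linked above. The helper name combine.pairs and the per-pair p-value of 0.05 are hypothetical.

```r
# Fisher-style combination of independent per-pair p-values:
# under the null, -log(p) is Exponential(1), so the sum over k pairs
# follows a Gamma distribution with shape k and rate 1.
combine.pairs <- function(p) {
  stopifnot(all(p > 0 & p <= 1))
  pgamma(-sum(log(p)), shape = length(p), rate = 1, lower.tail = FALSE)
}

# The same modest per-pair evidence (p = 0.05), seen in 1, 2, or 4 pairs.
p.one  <- combine.pairs(0.05)
p.two  <- combine.pairs(rep(0.05, 2))
p.four <- combine.pairs(rep(0.05, 4))

print(c(one = p.one, two = p.two, four = p.four))
```

With a single pair the combined p-value is just the per-pair value, and it shrinks as more pairs show the same evidence. A single fold change per gene, computed after pooling replicates, carries no such information about how many samples support it.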
        People have asked similar questions on this forum previously.
