SEQanswers




08-21-2014, 01:23 AM   #1
rdsqc22
Junior Member
 
Location: Rochester

Join Date: Nov 2013
Posts: 7
Unsure how to use GAGE with biological replicates

Hello,

I have some RNA-seq data (cuffdiff output) that I am trying to run gage/pathview on, to determine which pathways are enriched. However, my data set includes biological replicates (two samples of each cell type), which I have not used with gage before.

I read the manuals and found a mention of a 'weights' argument for gage here: http://www.bioconductor.org/packages...e/man/gage.pdf
But it does not describe what the numeric weight vector should contain or how to derive it from the number of replicates. While there are many other examples of how to use gage/pathview on the Bioconductor website, none of them mention the 'weights' argument or replicates of any kind.

The command I used without biological replicates is:
> fc.kegg.p <- gage(exp.fc, gsets = kegg.sets.rn, ref = NULL, samp = NULL)

Does anyone have any experience with this? How would I go about analyzing these data?
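For context, the `exp.fc` vector in the command above holds a single log2 fold change per gene, taken from cuffdiff's `gene_exp.diff`. Below is a minimal base-R sketch of that step; the toy table, its values, and the filtering choices (keep `status == "OK"`, drop genes named `-`) are assumptions for illustration and not necessarily exactly what the gage tutorial does. Note that `read.delim()` turns the `log2(fold_change)` header into the column name `log2.fold_change.`.

```r
# Toy stand-in for cuffdiff's gene_exp.diff: only a few of its columns, made-up values.
# With a real file: cuff.res <- read.delim("gene_exp.diff", stringsAsFactors = FALSE)
txt <- paste(
  "test_id\tgene\tstatus\tvalue_1\tvalue_2\tlog2(fold_change)",
  "XLOC_000001\tGeneA\tOK\t10.0\t40.0\t2.0",
  "XLOC_000002\tGeneB\tOK\t8.0\t4.0\t-1.0",
  "XLOC_000003\t-\tOK\t5.0\t5.0\t0.0",
  "XLOC_000004\tGeneC\tNOTEST\t0.0\t0.0\t0.0",
  sep = "\n"
)
cuff.res <- read.delim(text = txt, stringsAsFactors = FALSE)

# Keep tested genes that have a gene name
keep <- cuff.res$status == "OK" & cuff.res$gene != "-"

# One value per gene: the replicates are already collapsed into one fold change
exp.fc <- cuff.res$log2.fold_change.[keep]
names(exp.fc) <- cuff.res$gene[keep]
print(exp.fc)
```

At this point the number of replicates behind each fold change is no longer visible to gage. For KEGG gene sets such as `kegg.sets.rn`, the gene symbols would presumably still need to be mapped to Entrez Gene IDs before calling gage.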
08-21-2014, 05:31 PM   #2
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123

When your expression data are differential expression scores such as fold changes, with or without replicates (matrix or vector), you can use gage with ref = NULL and samp = NULL. If your data are the original expression levels, you should specify the column numbers of the control(s) and experiment(s) in your data matrix with ref and samp. If columns 1-2 are controls and columns 3-4 are experiments, and the samples are not paired, you can do something like:
kegg.p <- gage(exp.data, gsets = kegg.sets.rn, ref = 1:2, samp = 3:4, compare = "unpaired")
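To make the `ref`/`samp` indexing concrete, here is a base-R sketch of the per-pair log-ratios such a design produces. My understanding (check `?gage`) is that `compare = "unpaired"` compares every experiment column with every control column, giving 2 × 2 = 4 comparisons here, whereas `compare = "paired"` matches the columns one to one. The toy matrix and its values are made up, and the code only mimics the pairing scheme, not gage's internal implementation.

```r
# Toy log2-scale expression matrix: columns 1-2 are controls, 3-4 are experiments
exp.data <- matrix(
  c(5.0, 5.2, 7.0, 7.4,
    3.0, 3.1, 2.0, 2.2,
    8.0, 7.9, 8.0, 8.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), c("ctl1", "ctl2", "exp1", "exp2"))
)
ref <- 1:2
samp <- 3:4

# Unpaired: every experiment column against every control column
pairs <- expand.grid(s = samp, r = ref)
fc.unpaired <- sapply(seq_len(nrow(pairs)), function(i) {
  exp.data[, pairs$s[i]] - exp.data[, pairs$r[i]]
})
colnames(fc.unpaired) <- paste(colnames(exp.data)[pairs$s],
                               colnames(exp.data)[pairs$r], sep = "-")

# Paired: exp1 vs ctl1 and exp2 vs ctl2 only
fc.paired <- exp.data[, samp] - exp.data[, ref]

print(fc.unpaired)
print(fc.paired)
```

Each column is one experiment-control comparison, so the number of comparisons (and the information gage can summarize) grows with the number of replicates.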

Check the gage function documentation for details:
?gage
You may also want to check the quick start and basic analysis sections in the gage tutorial:
http://www.bioconductor.org/packages...t/doc/gage.pdf
08-26-2014, 07:23 AM   #3
rdsqc22
Junior Member
 
Location: Rochester

Join Date: Nov 2013
Posts: 7

My data are the cuffdiff output for two samples each of four different cell types. I had already been using the (very helpful) tutorial you linked, which explains how to retrieve the needed data from the gene_exp.diff file of the cuffdiff output. That file, though, combines the data for biological replicates, making it impossible to use multiple columns as you describe. This is why I was interested in the 'weights' argument.

There is a file with the cuffdiff output, 'genes.read_group_tracking', which does list FPKM for each gene for each replicate. Perhaps I should use that.
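A minimal base-R sketch of that idea: reshape `genes.read_group_tracking` (one row per gene and replicate) into a gene × replicate matrix that gage can take with `ref`/`samp`. The column names `tracking_id`, `condition`, `replicate` and `FPKM` follow the Cufflinks output format as I understand it; the toy rows, the condition names and the log2(FPKM + 1) transform are assumptions.

```r
# Toy stand-in for genes.read_group_tracking; with the real file use
# rgt <- read.delim("genes.read_group_tracking", stringsAsFactors = FALSE)
rgt <- data.frame(
  tracking_id = rep(c("XLOC_000001", "XLOC_000002"), each = 4),
  condition   = rep(rep(c("ctl", "trt"), each = 2), times = 2),
  replicate   = rep(0:1, times = 4),
  FPKM        = c(10, 12, 40, 44, 8, 9, 4, 5),
  stringsAsFactors = FALSE
)

# One column per condition/replicate combination, e.g. "ctl_0"
rgt$sample <- paste(rgt$condition, rgt$replicate, sep = "_")
fpkm <- tapply(rgt$FPKM, list(rgt$tracking_id, rgt$sample), sum)

# Columns come out sorted (ctl_0, ctl_1, trt_0, trt_1), so ref = 1:2 and samp = 3:4
exp.data <- log2(fpkm + 1)
print(exp.data)
```

The matrix could then go into a call like the one in the reply above, e.g. `gage(exp.data, gsets = kegg.sets.rn, ref = 1:2, samp = 3:4, compare = "unpaired")`, once the row IDs have been mapped to the Entrez Gene IDs used by the KEGG gene sets. With four cell types, the two conditions being compared are selected by their column positions.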

I don't see how gage can remain statistically accurate when it works with fold changes from biological replicates without being told about the replicates. With multiple biological replicates, a given fold change should be more significant than the same fold change from only one sample of each cell type. Could you elaborate on how that works?
08-26-2014, 05:30 PM   #4
bigmw
Senior Member
 
Location: US

Join Date: Aug 2013
Posts: 123

GAGE does take sample size into account. It does pair-wise comparisons between experiments and controls (disease vs normal, etc.), i.e. it conducts the gene set or pathway test for each sample pair and then summarizes the results. The more samples (hence independent experiment-control sample pairs) you have, the more testing power you get this way. You may also read the GAGE paper for more details of the method, at http://www.biomedcentral.com/1471-2105/10/161.
So to take advantage of the full testing power of GAGE, it is recommended to follow the native workflow. Alternatively, you can take the normalized data from other tools (like the per-replicate FPKM for each gene from cuffdiff) and feed them into GAGE as described above. The joint workflows are there for users’ convenience: people can do the differential expression analysis (at the individual gene level) with well established tools like DESeq, edgeR, Cufflinks, etc. and input the results into the GAGE/Pathview workflow for pathway analysis and visualization. This is convenient, but less sensitive, because sample size is not considered this way.
Similar questions have been asked before:
http://seqanswers.com/forums/showthread.php?t=34655#6
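As a rough illustration of why more sample pairs give more power under a test-each-pair-then-summarize scheme, the sketch below combines one-sided per-pair p-values with Stouffer's Z method. Stouffer's method is chosen here as one standard way to combine p-values from independent comparisons; it is not necessarily the exact summary step gage applies (see `?gage` and the paper linked above), and the p-values are made up.

```r
# Combine one-sided p-values from independent comparisons with Stouffer's Z method
stouffer <- function(p) {
  z <- qnorm(p, lower.tail = FALSE)   # each p-value expressed as a z-score
  pnorm(sum(z) / sqrt(length(z)), lower.tail = FALSE)
}

# Same per-pair evidence (p = 0.05), different numbers of sample pairs
p.two  <- stouffer(rep(0.05, 2))   # about 0.01
p.four <- stouffer(rep(0.05, 4))   # about 0.0005
print(c(two.pairs = p.two, four.pairs = p.four))
```

A single fold change vector from `gene_exp.diff`, by contrast, gives gage one comparison no matter how many replicates produced it.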
All times are GMT -8.

