
**yzzhang** · 10-18-2013, 12:16 PM

Hi sazz, have you solved your problem? I have been wondering about the same things. If you find a good solution, could you kindly share it? Thanks in advance.

**sazz** · 10-20-2013, 05:01 AM

Hi yzzhang,

I wrote to the GSEA developers, and this was their answer:

"The best approach would be to create a GCT file where rows are gene identifiers
(ideally, unique instances of human gene symbols), columns are the biological
replicates for each of two phenotypes and the values are FPKM values.

In any case, you should avoid filtering your data in any way because this would
significantly reduce power of GSEA."
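As a rough illustration of the layout described in that answer, here is a minimal Python sketch that writes a GCT file. The `#1.2` version line, the dimensions line, and the `NAME`/`Description` columns follow the GCT format as I understand it; the function name `write_gct`, the gene symbols, the sample names, and the FPKM values are all made up for the example.

```python
import io

def write_gct(genes, samples, fpkm_rows, out):
    """Write an expression matrix in GCT layout (hypothetical helper).

    genes:     list of unique gene symbols, one per row
    samples:   list of replicate names, one per column
    fpkm_rows: list of lists of FPKM values, same order as genes
    out:       any writable text handle
    """
    out.write("#1.2\n")                                  # format version line
    out.write(f"{len(genes)}\t{len(samples)}\n")         # rows, columns
    out.write("NAME\tDescription\t" + "\t".join(samples) + "\n")
    for gene, row in zip(genes, fpkm_rows):
        values = "\t".join(f"{v:g}" for v in row)
        out.write(f"{gene}\tna\t{values}\n")             # "na" = no description

# Made-up data: two replicates for each of two phenotypes, unfiltered.
genes = ["GENE_A", "GENE_B"]
samples = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]
fpkm_rows = [[12.5, 11.8, 30.1, 28.4],
             [0.2, 0.3, 0.1, 0.2]]

buf = io.StringIO()
write_gct(genes, samples, fpkm_rows, buf)
gct_text = buf.getvalue()
print(gct_text)
```

Passing an open file instead of the `StringIO` buffer would write the same text to disk. The phenotype labels for the replicates go in a separate file and are not covered here.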

But apart from this problem, I don't trust the ranking methods; the t-test and signal2noise metrics are not well suited to this kind of analysis. Even though the formulas for those methods take the variation between replicates into account, they do not match the logic of Cuffdiff's significance calculations. If you check the ranked list at the end, you will see that genes with very low expression but also low variance are not in Cuffdiff's "significant" list, yet they rank highly in the GSEA output, probably because of their low expression. A formula that considers both the Cuffdiff q-value and the log2 fold change would be best, so that you could do a pre-ranking. I haven't found anything like this yet, though, and my statistics is not good enough to work it out on my own.
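One heuristic that is sometimes used for this kind of pre-ranking is the signed score sign(log2 fold change) × −log10(q). The sketch below is only an illustration of that idea, not something recommended by the GSEA developers or the Cuffdiff authors. The function names, the `q_floor` value, and the example numbers are assumptions, and the output is laid out as two tab-separated columns (gene, score) on the assumption that this is what a pre-ranked list should look like.

```python
import math

def signed_score(log2fc, q, q_floor=1e-10):
    """Hypothetical ranking metric: sign(log2FC) * -log10(q).

    q is clamped to [q_floor, 1] so that q == 0 does not give an
    infinite score. Returns None if either input is NaN.
    """
    if math.isnan(log2fc) or math.isnan(q):
        return None
    q = min(max(q, q_floor), 1.0)
    sign = (log2fc > 0) - (log2fc < 0)   # +1, 0 or -1; works for +/-inf too
    return sign * -math.log10(q)

def preranked(rows):
    """rows: iterable of (gene, log2fc, q). Returns (gene, score), best first."""
    scored = []
    for gene, log2fc, q in rows:
        score = signed_score(log2fc, q)
        if score is not None:
            scored.append((gene, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Made-up values standing in for Cuffdiff output.
rows = [
    ("GENE_A", 2.0, 0.001),
    ("GENE_B", -1.5, 0.01),
    ("GENE_C", 0.3, 0.9),            # small change, not significant
    ("GENE_D", float("inf"), 0.05),  # zero expression in one condition
]

for gene, score in preranked(rows):
    print(f"{gene}\t{score:.4f}")
```

With this score, a gene only reaches the top or bottom of the list if its q-value is small, so the magnitude of the fold change affects the ranking only through the q-value. Whether that is the right trade-off is an open question.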

Actually, in Cole Trapnell's paper "Differential analysis of gene regulation at transcript resolution with RNA-seq", they use GSEA after RNA-seq, and in the methods section they say:

"Enrichment for up- or downregulation sets of genes from the REACTOME pathway database was computed by running GSEA against the fold-change ranked list of genes in the experiment. Ranking was based on Cuffdiff 2–derived fold change."

So they say they used the fold change from the Cuffdiff result, but it can't be that simple, just disregarding the q-value. I emailed Cole Trapnell to ask, but he didn't respond.
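For comparison, a plain fold-change ranking like the one the methods text describes could be sketched as follows. This is a guess at the general idea, not the authors' actual procedure. The column names `gene`, `status`, and `log2(fold_change)` are assumed to match a Cuffdiff 2 `gene_exp.diff` table, and skipping rows with a non-`OK` status or an infinite fold change is my own choice. The input here is a small made-up table, not real output.

```python
import csv
import io
import math

def fold_change_rank(handle):
    """Rank genes by log2 fold change, highest first (hypothetical helper).

    handle: text handle over a tab-separated table with at least the
            columns 'gene', 'status' and 'log2(fold_change)' (assumed names).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    ranked = []
    for row in reader:
        if row["status"] != "OK":        # assumption: keep only tested genes
            continue
        fc = float(row["log2(fold_change)"])
        if math.isfinite(fc):            # assumption: drop +/-inf and NaN
            ranked.append((row["gene"], fc))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

# Made-up table in the assumed layout.
example = io.StringIO(
    "gene\tstatus\tlog2(fold_change)\tq_value\n"
    "GENE_A\tOK\t2.1\t0.001\n"
    "GENE_B\tOK\t-1.7\t0.02\n"
    "GENE_C\tOK\t0.4\t0.8\n"
    "GENE_D\tNOTEST\t0\t1\n"
    "GENE_E\tOK\tinf\t0.03\n"
)

ranked = fold_change_rank(example)
for gene, fc in ranked:
    print(f"{gene}\t{fc}")
```

Note that the q-value column is read but never used, which is exactly the concern raised above: a large fold change from a noisy, low-expression gene ranks the same as a well-supported one.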

**yzzhang** · 10-20-2013, 05:47 AM

Hi, sazz,
Thanks a lot. I appreciate your help, and I will consider whether this method is suitable for my data. Thanks again.


GSEA for RNA-seq data
