SEQanswers

Old 04-15-2013, 04:43 PM   #1
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default SeqGSEA: Gene Set Enrichment Analysis of RNA-Seq Data

I am glad to introduce to you guys a new Bioconductor package, SeqGSEA, developed by our group. The detailed description of this package is:

SeqGSEA: Gene set enrichment analysis of high-throughput RNA-Seq data by integrating differential expression and splicing. It uses the negative binomial distribution to model read count data, which accounts for sequencing biases and biological variation. The statistical significance of each gene set is assessed by subject permutation. Based on the same permutations, the statistical significance of each gene's differential expression and of its differential splicing can also be obtained.

The package can be accessed at the URL:
http://bioconductor.org/packages/rel...l/SeqGSEA.html

Should you have any questions, comments, or suggestions, please feel free to email me at (xi.wang (at) newcastle.edu.au). Thanks.
__________________
Xi Wang
Old 04-27-2013, 07:42 PM   #2
xkcao
Junior Member
 
Location: CHN

Join Date: Apr 2013
Posts: 3
Default

Could you offer an example and describe how to use this package for RNA-seq data analysis?
Old 04-28-2013, 04:18 AM   #3
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

Quote:
Originally Posted by xkcao
Could you offer an example and describe how to use this package for RNA-seq data analysis?
Thanks for your message. I have included an example in Section 6 in the package's vignette available at http://bioconductor.org/packages/rel...oc/SeqGSEA.pdf

If you have any further questions or comments, please feel free to let me know. Thanks.
Old 07-02-2013, 10:53 PM   #4
wilson90
Member
 
Location: Singapore

Join Date: May 2012
Posts: 48
Default

I am sorry, but it is not available for R version 2.15.1. Is it still available?
Old 07-02-2013, 11:26 PM   #5
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

SeqGSEA was first released for R version 3.0.0, so you may need to install the latest release of R. If you would rather not upgrade R and you are on a Linux platform, you can download the package tarball at

http://www.bioconductor.org/packages...A_1.0.2.tar.gz

and type 'R CMD INSTALL SeqGSEA_1.0.2.tar.gz' to install.

Cheers
Xi
Old 09-08-2013, 10:41 PM   #6
master_shake
Junior Member
 
Location: Mountain View, CA

Join Date: Jul 2012
Posts: 4
Default Using EasyRNASeq for SeqGSEA

The latest vignette claims one can use easyRNAseq for read counts. Using easyRNAseq, should one count exons or genes? Also, how should the results be output: as a summarized experiment? A count table? Normalized values for DESeq?
Old 09-08-2013, 11:10 PM   #7
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

Quote:
Originally Posted by master_shake
The latest vignette claims one can use easyRNAseq for read counts. Using easyRNAseq, should one count exons or genes? Also, how should the results be output: as a summarized experiment? A count table? Normalized values for DESeq?
Thanks for checking the latest version of SeqGSEA :-)

You should also count reads on exons, and output the read counts per sample. As to the format, please refer to the files in the directory:

Code:
dat.dir = system.file("extdata", package="SeqGSEA", mustWork=TRUE)
Perhaps I will provide an easier way to connect easyRNAseq and SeqGSEA in the upcoming versions.

Cheers
Xi
Old 03-02-2014, 06:54 PM   #8
wilson90
Member
 
Location: Singapore

Join Date: May 2012
Posts: 48
Default

Just wondering, what is a geneset file?
How do you define the geneset file?
Any example format?
Thank you.
Old 03-03-2014, 05:28 AM   #9
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

Quote:
Originally Posted by wilson90
Just wondering, what is a geneset file?
How do you define the geneset file?
Any example format?
Thank you.
Thanks for considering SeqGSEA.

The gene sets can be downloaded from http://www.broadinstitute.org/gsea/msigdb/index.jsp

Or, if you want to define the gene sets yourself, use the GMT format:
http://www.broadinstitute.org/cancer..._.28.2A.gmt.29
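
For anyone writing their own file: in the GMT format each line is one gene set, with tab-separated fields for the set name, a description, and then the member genes. Below is a minimal base-R sketch of reading such a file. The helper `read_gmt`, the set names, and the toy file are made up for illustration and are not part of SeqGSEA.

```r
# Read a GMT file into a named list: one element per gene set.
# Each line is tab-separated: set name, description, then member genes.
read_gmt <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]                 # skip empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) f[-(1:2)]) # drop name and description
  names(sets) <- vapply(fields, `[`, character(1), 1)
  sets
}

# Toy GMT file with hypothetical set names and a few Ensembl gene IDs.
gmt_file <- tempfile(fileext = ".gmt")
writeLines(c(
  "SET_A\tdescription of set A\tENSG00000000003\tENSG00000000419",
  "SET_B\tna\tENSG00000000457\tENSG00000000460\tENSG00000000938"
), gmt_file)

gene_sets <- read_gmt(gmt_file)
lengths(gene_sets)  # number of genes in each set
```

The gene IDs in the file should be of the same type as the IDs in your count table (Ensembl IDs in this toy example).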
Old 05-12-2014, 08:23 AM   #10
Yvone
Junior Member
 
Location: Germany

Join Date: Oct 2013
Posts: 6
Default

I noticed that SeqGSEA uses the CountDataSet object from DESeq. I am currently using DESeq2, where the main object for analyses is DESeqDataSet, which cannot be used directly in SeqGSEA. I guess the two are similar, but is there an easy way to convert a DESeqDataSet to a CountDataSet?
Old 05-12-2014, 12:14 PM   #11
rpauly
Member
 
Location: Atlanta

Join Date: Apr 2011
Posts: 32
Default Sample data available?

Hi,

I came across the SeqGSEA package for GSEA of RNA-seq data.
We extensively work on cancer data sets and found this tool to be quite intriguing.

I was wondering if there is a sample data set available to help understand all the file formats. I already have the DESeq count files available.

~Thanks,
Rini
Old 05-13-2014, 12:23 AM   #12
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

Quote:
Originally Posted by Yvone
I noticed that SeqGSEA uses the CountDataSet object from DESeq. I am currently using DESeq2, where the main object for analyses is DESeqDataSet, which cannot be used directly in SeqGSEA. I guess the two are similar, but is there an easy way to convert a DESeqDataSet to a CountDataSet?
I will try to move to DESeq2 in the future, but at present the easiest way to deal with this issue is simply to rerun the DESeq analysis using the SeqGSEA pipeline. We already provide an all-in-one command that runs through all the analyses.

Cheers
Xi
Old 05-13-2014, 12:25 AM   #13
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

Quote:
Originally Posted by rpauly
Hi,

I came across the SeqGSEA package for GSEA of RNA-seq data.
We extensively work on cancer data sets and found this tool to be quite intriguing.

I was wondering if there is a sample data set available to help understand all the file formats. I already have the DESeq count files available.

~Thanks,
Rini
A set of example data can be found by executing the following command:
system.file("extdata", package="SeqGSEA", mustWork=TRUE)
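
That command only prints the directory path. To see which example files are in it, something like the sketch below should do. The wrapper `list_extdata` is a made-up helper name, and it simply returns an empty vector if the package (or its extdata directory) is not installed.

```r
# List the example files shipped in a package's "extdata" directory.
# list_extdata is a hypothetical helper, not a SeqGSEA function.
list_extdata <- function(pkg = "SeqGSEA") {
  dat.dir <- system.file("extdata", package = pkg)  # "" if not found
  if (!nzchar(dat.dir)) return(character(0))
  list.files(dat.dir, recursive = TRUE)
}

list_extdata()
```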
Old 05-13-2014, 04:38 AM   #14
Yvone
Junior Member
 
Location: Germany

Join Date: Oct 2013
Posts: 6
Default

Thanks for your quick reply.

I have other questions.
1). Regarding independent filtering of weakly expressed genes in DE analyses, is it also needed in SeqGSEA?
2). Can I apply a multi-factor design in SeqGSEA? In DESeq, I used GLMs to correct for variation from factors other than the one of interest, for example calling DE genes for the treatment while correcting for sex and batch effects. Does SeqGSEA also allow a multi-factor design?
Old 05-13-2014, 11:32 AM   #15
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

Quote:
Originally Posted by Yvone
Thanks for your quick reply.

I have other questions.
1). Regarding independent filtering of weakly expressed genes in DE analyses, is it also needed in SeqGSEA?
2). Can I apply a multi-factor design in SeqGSEA? In DESeq, I used GLMs to correct for variation from factors other than the one of interest, for example calling DE genes for the treatment while correcting for sex and batch effects. Does SeqGSEA also allow a multi-factor design?
1) I suggest you feed all genes to SeqGSEA; SeqGSEA applies its own filter to exclude unexpressed genes.
2) Sorry, the current version does not support multi-factor designs.
Old 05-14-2014, 08:16 AM   #16
thaleko
Junior Member
 
Location: Norway

Join Date: Oct 2013
Posts: 4
Default

Hi,

I have recently started using SeqGSEA to analyse RNA-seq data from 12 tumors (paired-end, 70-90 million reads, mapped with TopHat, counted with htseq-count).

I would like to compare two of these tumors (samples v6 and v11, see below) to the other ten regarding certain gene sets. However, I run into the following issue when following 6.3 in the manual (DE only):

Genes with read count 0 across all 12 samples have been removed from my counts table.

> nrow(counts)
[1] 45418
> head(counts)
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12
ENSG00000000003 43888 15516 15307 4287 17635 3342 3904 7235 22298 12464 4476 10353
ENSG00000000005 2238 35 27 23 81 0 12 46 335 39 4 56
ENSG00000000419 2781 3032 1805 2644 2027 2651 3383 2230 2416 2070 2503 1381
ENSG00000000457 1212 727 699 962 707 626 1054 944 950 647 507 493
ENSG00000000460 524 538 396 640 415 1467 852 1073 568 419 409 399
ENSG00000000938 173 1349 616 467 280 1769 987 1159 415 448 1591 679
> label <- as.factor(c(0,0,0,0,0,1,0,0,0,0,1,0))
> DEG <- newCountDataSet(counts,label)
> DEG <- estimateSizeFactors(DEG)
> DEG <- estimateDispersions(DEG, method="pooled", fitType="local")
> DEGres <- DENBStat4GSEA(DEG)
> permuteMat <- genpermuteMat(label, times=perm.times)
> DEpermNBstat <- DENBStatPermut4GSEA(DEG, permuteMat)
Error in { :
task 1 failed - "Parametric dispersion fit failed. Try a local fit and/or a pooled estimation. (See '?estimateDispersions')"


I am using estimateDispersions and estimateSizeFactors instead of runDESeq in order to use fitType=local and method=pooled, but am still experiencing trouble. Any idea what I'm doing wrong? Anyone else having this problem? Any help would be greatly appreciated!


Cheers,
Thale
Old 05-15-2014, 12:42 AM   #17
thaleko
Junior Member
 
Location: Norway

Join Date: Oct 2013
Posts: 4
Default

I finally noticed that I need at least five samples per group, so I guess the problem is that I only have two tumors of interest...?

Thanks anyway!

Thale
Old 05-15-2014, 04:58 AM   #18
Yvone
Junior Member
 
Location: Germany

Join Date: Oct 2013
Posts: 6
Default

Hello there,

I encounter this problem. As I am using the DE-only approach, I don't have an RCS object. Any ideas why the function genpermuteMat() doesn't take a vector? Thanks in advance!

> label <- as.factor(c(rep(0,6), rep(1,6)))
> permMat <- genpermuteMat(label, times=100)
Error: is(RCS, "ReadCountSet") is not TRUE
Old 05-15-2014, 10:19 AM   #19
Yvone
Junior Member
 
Location: Germany

Join Date: Oct 2013
Posts: 6
Default

Quote:
Originally Posted by Yvone
Hello there,

I encounter this problem. As I am using the DE-only approach, I don't have an RCS object. Any ideas why the function genpermuteMat() doesn't take a vector? Thanks in advance!

> label <- as.factor(c(rep(0,6), rep(1,6)))
> permMat <- genpermuteMat(label, times=100)
Error: is(RCS, "ReadCountSet") is not TRUE
Just figured out that it was an out-dated version of SeqGSEA. With the latest version, it works with vectors.

But just out of curiosity: if I have 5 samples in each treatment group, there are a total of 252 ways of assigning the labels. Does it then make sense to run 1000 permutations?
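
For reference, the 252 comes from choosing which 5 of the 10 samples get the first label. A small base-R sketch of the arithmetic is below. It uses plain `sample()` to shuffle the labels as a stand-in, which is an assumption and not necessarily how genpermuteMat() draws its permutations.

```r
n_per_group <- 5

# Number of distinct ways to split 10 samples into two groups of 5.
n_distinct <- choose(2 * n_per_group, n_per_group)
n_distinct

# Draw 1000 random label shufflings and count how many are actually distinct.
set.seed(1)
label <- rep(c(0, 1), each = n_per_group)
perms <- replicate(1000, paste(sample(label), collapse = ""))
n_unique <- length(unique(perms))
n_unique  # can never exceed n_distinct, so many draws are repeats
```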
Old 05-26-2014, 05:27 AM   #20
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

Quote:
Originally Posted by thaleko
I finally noticed that I need at least five samples per group, so I guess the problem is that I only have two tumors of interest...?

Thanks anyway!

Thale
Sorry, the current version of SeqGSEA cannot work with fewer than 5 samples per group.
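
A quick way to check this before starting a run is to tabulate the label vector. The sketch below is base R only and uses the label vector from the post above; the threshold of 5 is the one stated in this thread, and the variable names are just for illustration.

```r
# Label vector from the example above: 2 samples in group "1", 10 in group "0".
label <- as.factor(c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0))

min_per_group <- 5            # minimum group size mentioned in this thread
group_sizes <- table(label)   # samples per group

# Names of the groups that fall below the minimum.
too_small <- names(group_sizes)[as.vector(group_sizes) < min_per_group]
if (length(too_small) > 0) {
  message("Groups with fewer than ", min_per_group, " samples: ",
          paste(too_small, collapse = ", "))
}
```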

Xi