SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Bioinformatics (http://seqanswers.com/forums/forumdisplay.php?f=18)
-   -   DESeq Variance Stabilizing Transformation (http://seqanswers.com/forums/showthread.php?t=29980)

john_nl 05-07-2013 01:02 AM

DESeq Variance Stabilizing Transformation
 
Hello,

I am looking for some feedback regarding the use of the variance-stabilization (VST) methods found in the DESeq2 package. Hopefully one of the authors will respond and the comments will be of help to others.

For me, the purpose for applying this transformation is to be able to generate moderated fold changes for clustering of genes (not samples as in the vignette).

My data consists of a time series, where for each time point there is a "treated" sample and a "control" sample. Each sample (timepoint) consists of 4 biological replicates.

I performed the VST on the entire set of data and plot the per-gene standard deviation against the rank of the
mean*, for the shifted logarithm log2 (n + 1) (left) and the variance stabilizing transformation (right), it does not appear to have a pronounced effect.

http://i1287.photobucket.com/albums/...ps775bca63.png

However, if i set up a count dataset that consists of the samples corresponding to one timepoint only (first timepoint in the example below), and perform the VST and plot the standard deviation against rank of the mean, the transformed values have a much better stabilized standard deviation.

http://i1287.photobucket.com/albums/...psbaf85e24.png

So my questions are: Is there anyway to obtain better variance stabilized data when considering the entire timeseries? Should I just perform the VST on a per timepoint basis; after all I will only be computing fold changes between treatment and control samples at the same timepoint.

*The procedure was performed as per the DESeq2 manual:

dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
vsd <- varianceStabilizingTransformation(dds)
par(mfrow=c(1,2))
plot(rank(rowMeans(counts(dds))), genefilter::rowVars(log2(counts(dds)+1)), main="log2(x+1) transform")
plot(rank(rowMeans(assay(vsd))), genefilter::rowVars(assay(vsd)), main="VST")

moritzhess 07-23-2013 05:50 AM

As far as I know, you have to tell DESEQ to treat all expression values as if they were emerging from a single condition by specifying method="blind" when extimating the Dispersions.

Him26 11-20-2013 02:57 PM

I have a slightly unrelated question. It's about the plot.
Why is the variance low for low mean ? shouldn't it start high and decrease as the mean increase?
I have a similar data set and even if I filter requiring higher cpm the trend still persists.
Any one know of why this is the case?

Shawn_Li 11-28-2013 07:18 AM

DESeq2 variance
 
1 Attachment(s)
I guess it all depends on the type of data. For my NGS bacterial 16sRNA data, SD increase as the mean increases.

Michael Love 12-09-2013 06:08 AM

hi John,

The VST helps to stabilize the variance over the mean, insofar as this can be captured by the parametric curve of dispersion over mean. You might also try the rlog transformation, which sometimes performs qualitatively better than the VST (for example, if the size factors vary a lot across samples).

ayana.rajagopal 04-28-2014 01:11 AM

Hi guys,
Is the VST package of DESeq still functional? Because most of the functions of VST including getVarianceStabilizedData() seem to be dysfunctional in R version 3.0.1. Please help.

Michael Love 04-30-2014 06:38 AM

hi Ayana,

Can you post the code which you think is not working. Please include full code, R output and sessionInfo()

The VST and rlog are both implemented in DESeq2, which we suggest you use over DESeq.

Wolfgang Huber 04-30-2014 11:11 AM

Quote:

Originally Posted by moritzhess (Post 111391)
As far as I know, you have to tell DESEQ to treat all expression values as if they were emerging from a single condition by specifying method="blind" when extimating the Dispersions.

Yes. And depending on the data, there may not always be a variance stabilising transformation. In particular, the error model on which the transformation is based assumes that for most genes the variance is dominated by technical noise and natural biological variation between replicates, and that the effects of true differential expression affect only a minority of genes. If that is not the case, then the whole concept does not really work.

As Mike Love says, the variance stabilsing transformation tends to be misled in cases when the size factors strongly vary between samples, and (at least) in these case the rlog transformation is preferable.

Wolfgang Huber 04-30-2014 11:14 AM

@Him26: Note that in John's plots the y-axis is on a log-scale.
If you do the same kind of plot with sd computed on the original scale of the counts, then you will indeed expect them to increase with the mean.


All times are GMT -8. The time now is 06:03 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.