DESeq Variance Stabilizing Transformation
Hello,
I am looking for some feedback regarding the use of the variance stabilizing transformation (VST) methods found in the DESeq2 package. Hopefully one of the authors will respond and the comments will be of help to others.

For me, the purpose of applying this transformation is to generate moderated fold changes for clustering genes (not samples, as in the vignette). My data consist of a time series, where for each time point there is a "treated" sample and a "control" sample. Each sample (time point) consists of 4 biological replicates.

I performed the VST on the entire data set and plotted the per-gene standard deviation against the rank of the mean* for the shifted logarithm log2(n + 1) (left) and the variance stabilizing transformation (right). The VST does not appear to have a pronounced effect.
http://i1287.photobucket.com/albums/...ps775bca63.png

However, if I set up a count data set that consists of the samples from one time point only (the first time point in the example below), perform the VST, and plot the standard deviation against the rank of the mean, the transformed values have a much better stabilized standard deviation.
http://i1287.photobucket.com/albums/...psbaf85e24.png

So my questions are:
Is there any way to obtain better variance-stabilized data when considering the entire time series?
Should I just perform the VST on a per-time-point basis? After all, I will only be computing fold changes between treatment and control samples at the same time point.

*The procedure was performed as per the DESeq2 manual:

dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
vsd <- varianceStabilizingTransformation(dds)
par(mfrow=c(1,2))
plot(rank(rowMeans(counts(dds))), genefilter::rowVars(log2(counts(dds)+1)), main="log2(x+1) transform")
plot(rank(rowMeans(assay(vsd))), genefilter::rowVars(assay(vsd)), main="VST")
As far as I know, you have to tell DESeq to treat all expression values as if they came from a single condition by specifying method="blind" when estimating the dispersions.
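For readers on DESeq2 rather than DESeq, the comparable switch is the blind argument of varianceStabilizingTransformation(). The sketch below is only an illustration under stated assumptions: it uses simulated counts from makeExampleDESeqDataSet() instead of the time-series data discussed here, and the object names vsd_blind and vsd_design are made up for the example. With blind = TRUE the design is ignored when the dispersion trend is fitted; with blind = FALSE the design is used, which may matter when many genes differ strongly between groups.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for a real count data set (assumption: 1000 genes, 8 samples,
# default two-group design ~ condition)
dds <- makeExampleDESeqDataSet(n = 1000, m = 8)
dds <- estimateSizeFactors(dds)

# Dispersion trend fitted as if all samples came from a single condition
vsd_blind <- varianceStabilizingTransformation(dds, blind = TRUE)

# Dispersion trend fitted using the experimental design
vsd_design <- varianceStabilizingTransformation(dds, blind = FALSE)

# Compare the per-gene SD of the two transformed matrices side by side
par(mfrow = c(1, 2))
plot(rank(rowMeans(assay(vsd_blind))), apply(assay(vsd_blind), 1, sd),
     main = "VST, blind = TRUE", xlab = "rank(mean)", ylab = "sd")
plot(rank(rowMeans(assay(vsd_design))), apply(assay(vsd_design), 1, sd),
     main = "VST, blind = FALSE", xlab = "rank(mean)", ylab = "sd")
```

Whether either setting improves the plots for a particular time series would need to be checked on the real data.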

I have a slightly unrelated question about the plot.
Why is the variance low for low means? Shouldn't it start high and decrease as the mean increases? I have a similar data set, and even if I filter by requiring a higher cpm, the trend still persists. Does anyone know why this is the case?
DESeq2 variance
I guess it all depends on the type of data. For my NGS bacterial 16S rRNA data, the SD increases as the mean increases.

hi John,
The VST helps to stabilize the variance over the mean, insofar as this can be captured by the parametric curve of dispersion over mean. You might also try the rlog transformation, which sometimes performs qualitatively better than the VST (for example, if the size factors vary a lot across samples). 
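A minimal sketch of trying rlog next to the VST, again on simulated counts from makeExampleDESeqDataSet() rather than the data in this thread (the data set size and the names rld, vsd and sd_by_gene are assumptions for illustration only):

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for a real count data set (assumption: 500 genes, 8 samples)
dds <- makeExampleDESeqDataSet(n = 500, m = 8)

# Regularized log transformation and VST, both blind to the design
rld <- rlog(dds, blind = TRUE)
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# Helper (hypothetical name): per-gene standard deviation of a transformed matrix
sd_by_gene <- function(x) apply(assay(x), 1, sd)

par(mfrow = c(1, 2))
plot(rank(rowMeans(assay(rld))), sd_by_gene(rld),
     main = "rlog", xlab = "rank(mean)", ylab = "sd")
plot(rank(rowMeans(assay(vsd))), sd_by_gene(vsd),
     main = "VST", xlab = "rank(mean)", ylab = "sd")
```

Note that rlog can be noticeably slower than the VST when there are many samples.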
Hi guys,
Is the VST package of DESeq still functional? Because most of the functions of VST including getVarianceStabilizedData() seem to be dysfunctional in R version 3.0.1. Please help. 
hi Ayana,
Can you post the code which you think is not working? Please include the full code, the R output, and sessionInfo(). The VST and rlog are both implemented in DESeq2, which we suggest you use over DESeq.
As Mike Love says, the variance stabilizing transformation tends to be misled in cases where the size factors vary strongly between samples, and (at least) in these cases the rlog transformation is preferable.
@Him26: Note that in John's plots the y-axis is on a log scale.
If you do the same kind of plot with sd computed on the original scale of the counts, then you will indeed expect them to increase with the mean. 
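A small base-R simulation can illustrate the point. This is a sketch under assumptions, not a model of anyone's data: counts are drawn from a negative binomial distribution with an arbitrarily chosen dispersion of 0.1, and the means are picked only to span a few orders of magnitude.

```r
set.seed(42)

means <- c(1, 10, 100, 1000)  # assumed true means
alpha <- 0.1                  # assumed dispersion: var = mu + alpha * mu^2
n     <- 20000                # draws per mean

# Negative binomial counts for each mean (size = 1 / dispersion)
sim <- lapply(means, function(mu) rnbinom(n, mu = mu, size = 1 / alpha))

# SD on the original count scale and on the shifted-log scale
sd_raw  <- sapply(sim, sd)
sd_log2 <- sapply(sim, function(x) sd(log2(x + 1)))

print(data.frame(mean = means, sd_raw = sd_raw, sd_log2 = sd_log2))
```

On the count scale the SD grows steadily with the mean, while on the log2(n + 1) scale it stays within a much narrower range, which is why the shape of the plot depends on the scale used.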