SEQanswers

Old 07-05-2011, 02:00 PM   #1
KellerMac
Member
 
Location: Baton Rouge, Louisiana

Join Date: Jun 2011
Posts: 11
Variance Estimation

I have been using DESeq to analyze gene expression from SAGE samples. To decide how to compare samples, we have been using ECDF (empirical cumulative distribution function) plots to judge the quality of samples. I was wondering if I could turn this into a single quantitative number by taking the integral of the ECDF. I haven't yet found a way to do this in DESeq. Is there a better program for this analysis?
Old 07-06-2011, 01:03 AM   #2
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

A few weeks ago, we completely rewrote the DESeq vignette (manual). One of our changes was to remove everything about this ECDF plot of the variance residuals, as people kept misunderstanding its purpose (which was maybe never that clear anyway). It is not meant for checking the quality of samples.

The point of the variance residual ECDF plots was to check whether the assumption holds well that genes of similar expression strength have similar variance, because the old DESeq version did not deal well with "variance outliers", i.e., genes with a variance much larger than that of similar genes. See the new vignette to learn how we now simply take the maximum of the fitted value and the per-gene estimate to avoid making an error here.
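
As a rough illustration of the "maximum of fitted value and per-gene estimate" rule, here is a minimal Python sketch. It is not DESeq's code: the function name and all numbers are made up, and it only shows the idea of never going below the fitted value.

```python
def conservative_estimate(fitted, per_gene):
    """Return the larger of the fitted value and the gene's own estimate.

    Hypothetical helper for illustration only; not part of DESeq.
    """
    return max(fitted, per_gene)


# Made-up values: (fitted value from the curve, per-gene estimate)
genes = {
    "typical_gene": (0.10, 0.04),  # own estimate lies below the fit
    "outlier_gene": (0.10, 0.90),  # "variance outlier", far above the fit
}

for name, (fitted, per_gene) in genes.items():
    print(name, conservative_estimate(fitted, per_gene))
```

The typical gene ends up with the fitted value, while the outlier keeps its own, larger estimate, so its variability is not underestimated.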

To judge the reproducibility of a protocol, i.e., the similarity of replicate samples, I now recommend the following two possibilities:

(i) use the new 'estimateDispersions' function, which by default no longer does a local fit but a parametric fit, fitting a curve alpha = alpha_0 + alpha_1/mu on the dispersion alpha, or equivalently, a curve v = ( 1 + alpha_1 ) * mu + alpha_0 * mu^2 on the variance v. The value alpha_0 is a good measure of the overall (intensity-independent) variation between replicates; the value alpha_1 is a measure of the additional variance for weak genes. See the vignette for details.

(ii) use the variance-stabilizing transformation to make a sample-clustering heatmap, as described in the vignette, to see whether your replicates are more similar than samples from different treatment groups.
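
To make the parametric curve from option (i) concrete, the following Python sketch evaluates both forms of the curve and checks that they agree with v = mu + alpha * mu^2. The coefficient values are invented for illustration and do not come from any real fit; the function names are likewise hypothetical.

```python
import math


def dispersion(mu, alpha_0, alpha_1):
    """Dispersion curve: alpha = alpha_0 + alpha_1 / mu."""
    return alpha_0 + alpha_1 / mu


def variance(mu, alpha_0, alpha_1):
    """Equivalent variance curve: v = (1 + alpha_1) * mu + alpha_0 * mu^2."""
    return (1 + alpha_1) * mu + alpha_0 * mu ** 2


# Made-up coefficients, chosen only to show the shape of the curve.
ALPHA_0, ALPHA_1 = 0.05, 2.0

for mu in (1, 10, 100, 1000, 10000):
    alpha = dispersion(mu, ALPHA_0, ALPHA_1)
    v = variance(mu, ALPHA_0, ALPHA_1)
    # Both parametrizations describe the same relation v = mu + alpha * mu^2.
    assert math.isclose(v, mu + alpha * mu ** 2)
    print(f"mu={mu:>6}  dispersion={alpha:.4f}  variance={v:.1f}")
```

For large mu the dispersion approaches alpha_0, which is why alpha_0 reflects the intensity-independent variation, while the alpha_1/mu term only matters for weakly expressed genes.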

Note that the new DESeq is available in the devel branch, not yet in the release branch, of Bioconductor.
Old 07-06-2011, 04:57 AM   #3
labunit
Member
 
Location: Giessen, Germany

Join Date: Sep 2010
Posts: 10

Hello Simon,
the "Package Downloads" links on the Bioconductor homepage (http://www.bioconductor.org/packages...tml/DESeq.html) are wrong. They still link to version 1.5.18 but should link to 1.5.19. I don't know whether you have any control over that.

Best,
Mark Onyango
Old 07-06-2011, 09:28 AM   #4
KellerMac
Member
 
Location: Baton Rouge, Louisiana

Join Date: Jun 2011
Posts: 11

Do I need to delete the older version of DESeq? If so, where do you think it would be?
Old 07-07-2011, 12:41 AM   #5
labunit
Member
 
Location: Giessen, Germany

Join Date: Sep 2010
Posts: 10

Hello Simon,
could you please elaborate on why you switched from the local fit to a parametric fit as a default setting? I always found your idea for a more data-driven fit very sound.

@KellerMac:
It depends on what operating system you are using. If you use Windows you can safely install the development version parallel to the release version as it will also create a new library folder. So the two do not interfere.
If you are using Linux (e.g. Ubuntu) you simply download the development sources of R into a folder of your choosing and compile it there. It won't be installed system-wide and can be started from that folder. All packages downloaded will be kept in that folder as well.
So all in all, there is no need to delete the current version of DESeq from your PC.
Old 07-12-2011, 09:19 AM   #6
chrisbala
Member
 
Location: North Carolina

Join Date: Jan 2010
Posts: 82
Error: could not find function "estimateDispersion"

I'm getting:

Error: could not find function "estimateDispersion"

What have I done wrong?

I'm running R on OS X. I've had no trouble using DESeq before, just with this new function.

As far as I can tell, my DESeq is up to date.
Old 07-12-2011, 10:45 AM   #7
chrisbala
Member
 
Location: North Carolina

Join Date: Jan 2010
Posts: 82

Oops, I think I am just struggling with how to update DESeq. I am still at DESeq 1.4 and the "update" window is not doing anything...
Old 07-12-2011, 11:44 AM   #8
chrisbala
Member
 
Location: North Carolina

Join Date: Jan 2010
Posts: 82

OK, last one: it seems something is wrong with the files linked on Bioconductor:
http://bioconductor.org/packages/dev...tml/DESeq.html

Am I wrong?
Old 07-12-2011, 12:35 PM   #9
labunit
Member
 
Location: Giessen, Germany

Join Date: Sep 2010
Posts: 10

You also need to use the development version of R (2.14) to be able to install the latest DESeq.
Old 07-12-2011, 01:40 PM   #10
chrisbala
Member
 
Location: North Carolina

Join Date: Jan 2010
Posts: 82
devel version

Found the relevant thread about needing to install the development version of R as well... done. Things are working for now!
Old 12-10-2011, 07:19 AM   #11
labunit
Member
 
Location: Giessen, Germany

Join Date: Sep 2010
Posts: 10

I am sorry to awaken this thread, but I seem to have a problem with the latest release version of DESeq (1.6.1):

Whenever I try to execute the estimateDispersions function I receive the following error:

Parametric dispersion fit failed. Try a local fit and/or a pooled estimation. (See '?estimateDispersions')

Now this can only happen if the coefficients (or at least some of them) become negative during the fitting process. Using the local fit kind of cures this, but I still see some negative dispersion coefficients. My question therefore is: How can the coefficients become negative during fitting, and how do I properly handle or interpret these?
Old 12-10-2011, 12:07 PM   #12
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

The problem with the fit has little to do with the negative values, because DESeq "lifts" all negative dispersion values to something slightly above zero. Rather, our new parametric fit routine still has some weaknesses that we are not yet fully sure how to straighten out. This is why the package recommends reverting to the old method if the new one fails. In practice, the difference between the two methods turned out to be not that large, anyway.

To nevertheless explain the negative values: A random variable that is distributed according to a negative binomial with mean µ and dispersion a has variance v = µ + a µ². DESeq estimates a from the data with a method-of-moments estimator, i.e., it estimates µ and v and then calculates a = (v - µ) / µ². (I'm skipping here over a few subtleties, explained in the supplement to our paper.) Especially for low µ, it may happen that the estimate for v is smaller than that for µ, and then the estimate for the dispersion a becomes negative. On the one hand, we know that a should be positive, and hence, we need to replace all negative values with small positive ones before the test. However, I prefer to do this only after the fit, as it introduces a positive bias.
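
The method-of-moments formula above can be tried out with a few lines of Python. This is only a sketch under simplifying assumptions: it ignores size factors and the other subtleties mentioned, the counts are invented, and the function names and the floor value of 1e-8 are hypothetical choices, not what DESeq uses.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments estimate a = (v - mu) / mu^2.

    Uses the sample mean and the sample variance (n - 1 denominator).
    Assumes the mean is greater than zero.
    """
    mu = mean(counts)
    v = variance(counts)
    return (v - mu) / mu ** 2


def lift(a, floor=1e-8):
    """Replace negative (or tiny) estimates with a small positive value.

    The floor of 1e-8 is an arbitrary choice made for this example.
    """
    return max(a, floor)


# Made-up replicate counts for two genes.
weak_gene = [3, 4, 3, 4]       # sample variance below the mean
noisy_gene = [0, 10, 2, 20]    # sample variance far above the mean

for name, counts in (("weak_gene", weak_gene), ("noisy_gene", noisy_gene)):
    a = moment_dispersion(counts)
    print(name, "raw estimate:", a, "after lifting:", lift(a))
```

For the weak gene the sample variance is smaller than the mean, so the raw estimate is negative and only becomes usable after lifting. Lifting only ever moves values upward, which is the positive bias mentioned above.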
Old 02-13-2013, 02:05 PM   #13
kasutubh
Member
 
Location: US

Join Date: Mar 2010
Posts: 25

Hi..
I'm running DESeq (2.11) on R (2.25.2) on Windows. I'm getting the same error as chrisbala:
Error: could not find function "estimateDispersion"
Do I need to update anything, or what am I doing wrong here?
Thanks!
Old 02-18-2013, 01:07 AM   #14
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Please install current versions of R and Bioconductor and try again.

Tags
deseq, ecdf, variance
