
02-15-2010, 08:21 AM  #1 
Junior Member
Location: Santa Cruz, CA Join Date: Jan 2010
Posts: 4

edgeR with no replication (Common disp or poisson)
Hi,
This may be a naive question, but here goes. I'm using edgeR to analyse RNA-seq data. The idea is simply to compare two conditions with no replicates. I know that the common dispersion will be set to zero in this case, so I tried quantile adjustment under the Poisson model and got exactly the same p-values as with the common dispersion set to zero.

So my question is: if the common dispersion is set to 0, how are the p-values calculated? Are they calculated in exactly the same way as when I do quantile adjustment under the Poisson model?

Cheers,
Sergio
02-17-2010, 12:44 PM  #2 
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 993

Hi Sergio,
two answers:

1. I invite you to try out our new tool for differential expression calling, called "DESeq", which supports testing without replicates. DESeq is quite similar in spirit to edgeR: it is also a Bioconductor package, takes counts as input, and uses a similar test based on the negative binomial distribution. The main difference is that we do not estimate a single fixed common dispersion constant but rather a whole curve of dispersion values, to accommodate the fact that the dispersion depends on the expression strength. This gives a more balanced hit list and avoids the biases associated with the assumption of constant dispersion. Have a look at http://wwwhuber.embl.de/users/anders/DESeq/ The vignette explains how to work without replicates, and also contains some calculations showing how much power you lose compared with proper replication. If you want to know more about the details of the method, contact me for a preprint of the paper.

2. The edgeR developers have recently changed edgeR's handling of replicate-free data. In previous versions, you had to switch to Poisson, i.e., to zero dispersion. The p-values resulting from this are, of course, way too low. In the newest version, however, the edgeR people came up with a solution quite similar to what our DESeq package does when there are no replicates, namely to treat all samples as if they were replicates of a single condition. This gives you an upper limit for the dispersion, because the genes that really are differentially expressed drive up the variance estimate. If there are not too many of them, you can get away with it. Again, see the discussion in the DESeq vignette for more. (I think the edgeR vignette does not yet say much about it.)

Best regards
Simon
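To make the "treat all samples as replicates of a single condition" idea concrete, here is a minimal Python sketch. It is an illustration only, not the estimator that edgeR or DESeq actually use: it assumes a simple method-of-moments estimate based on the negative binomial relation var = mu + phi * mu^2, and the function name `pooled_dispersion` and the toy counts are made up for the example.

```python
from statistics import mean, variance

def pooled_dispersion(counts_by_gene, lib_sizes):
    """Rough common dispersion, treating every library as a replicate
    of one condition (hypothetical helper, method of moments)."""
    ref = mean(lib_sizes)
    num = 0.0  # running sum of (variance - mean) over genes
    den = 0.0  # running sum of mean^2 over genes
    for counts in counts_by_gene:
        # Rescale each count to a common library size.
        scaled = [c * ref / n for c, n in zip(counts, lib_sizes)]
        mu = mean(scaled)
        if mu == 0:
            continue  # genes with no reads carry no information
        # Negative binomial: var = mu + phi * mu^2, so phi = (var - mu) / mu^2.
        num += variance(scaled) - mu
        den += mu ** 2
    if den == 0:
        return 0.0
    # Negative estimates mean "no more than Poisson noise"; truncate at zero.
    return max(num / den, 0.0)

# Toy data: two libraries of equal size, one per condition.
libs = [1_000_000, 1_000_000]
unchanged = [[100, 110], [200, 190], [50, 55]]
with_de_gene = unchanged + [[100, 400]]  # one strongly changed gene

print(pooled_dispersion(unchanged, libs))
print(pooled_dispersion(with_de_gene, libs))
```

A single strongly changed gene is enough to pull the pooled estimate well above zero in this toy data, which is why the value is best read as an upper limit.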
02-17-2010, 12:50 PM  #3 
Junior Member
Location: Santa Cruz, CA Join Date: Jan 2010
Posts: 4

Thanks Simon! I will definitely try DESeq. Thanks for the help!

05-24-2010, 09:53 PM  #4 
Member
Location: Melbourne Join Date: May 2010
Posts: 16

edgeR with no replication
Hi Sergio and Simon
A couple more points to add to the discussion.

1. In response to your question, Sergio, about how p-values are calculated if the common dispersion is set to zero: when the dispersion is set to zero, the negative binomial reduces to the Poisson model. In the Poisson case, the p-values are exact p-values obtained from the appropriate binomial distribution, although in edgeR they are currently computed using the negative binomial with a very, very small value for the dispersion. Quantile adjustment should not make much difference. However, in our experience, RNA-seq datasets in general have more variation than can be accounted for by the Poisson model, especially when there is biological replication. Therefore, using the Poisson model is likely to substantially overestimate the true amount of differential expression, as Simon notes.

2. Simon writes: "... in the newest version, the edgeR people came up with a solution quite similar to what our DESeq package does in case of no replicates, namely to treat all samples as if they were replicates of a single condition. This gives you an upper limit for the dispersion, as the really differentially expressed genes drive up the variance estimate. If there are not too many of them, you can get away with it."

This is a reasonable approach, although I'm not sure it is one that we have publicly advocated. It would be a useful way to get a feel for how much inter-library variation there is in the data. As Simon suggests, doing this would overestimate the dispersion and therefore underestimate differential expression. All of the RNA-seq datasets we have seen so far have very large amounts of DE, so it would be better to take the dispersion estimate as an upper limit rather than an accurate value. We would prefer to input a nonzero value for the dispersion based on prior experiments. See the edgeR User's Guide for some examples.
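On point 1 above, the Poisson case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not edgeR's code: for a single gene with counts y1 and y2 from libraries of size n1 and n2, and conditional on the total y1 + y2, the count y1 is binomial with success probability n1 / (n1 + n2) under the null. The two-sided rule used here (sum the probabilities of all outcomes no more likely than the observed one) is one common convention, and the function names are hypothetical.

```python
from math import exp, lgamma, log

def binom_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability of k successes."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def poisson_exact_pvalue(y1, y2, n1, n2):
    """Two-sided exact p-value for one gene under the Poisson model
    (hypothetical helper; assumes one library per condition)."""
    total = y1 + y2
    if total == 0:
        return 1.0  # no reads, no evidence either way
    p = n1 / (n1 + n2)  # expected share of reads from library 1
    observed = binom_logpmf(y1, total, p)
    tol = 1e-9  # guards against floating-point ties
    pval = 0.0
    for k in range(total + 1):
        lp = binom_logpmf(k, total, p)
        if lp <= observed + tol:
            pval += exp(lp)
    return min(pval, 1.0)

# Toy counts for one gene, equal library sizes.
print(poisson_exact_pvalue(10, 10, 1_000_000, 1_000_000))
print(poisson_exact_pvalue(0, 10, 1_000_000, 1_000_000))
```

Because this test allows for no variation beyond Poisson counting noise, moderate differences between two deeply sequenced libraries quickly produce very small p-values, which matches the warning in this thread that such p-values are too low.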
We have noted that in experiments where there is no biological difference between the two experimental groups (for example when using cell lines), the dispersion is quite low, say less than 0.05. In the LNCaP data from Li et al (2008)* we get a common dispersion of 0.02. Where there is biological replication (or simply a real biological difference) between groups, the common dispersion is much higher, say ~0.2 or so (as found when analysing public data from 't Hoen et al (2008)**), and possibly even greater than 0.2 for other datasets. As such, we expect that choosing an appropriate nonzero value for the dispersion based on the particularities of your experiment will give good results. As we see more datasets it should become clearer what value would be most appropriate to plug in. However, the DE analysis is not going to be upset by small changes in the dispersion.

Kind regards
Davis

*[http://www.ncbi.nlm.nih.gov/entrez/q...&hl=en&num=50]
**[http://nar.oxfordjournals.org/cgi/co...hort/gkn705v1]
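To get a feel for what dispersion values such as 0.02 and 0.2 mean, the following Python sketch compares the variance implied by each under the usual negative binomial parametrisation, var = mu + phi * mu^2. The mean count of 100 is an arbitrary value chosen for illustration, and the helper names are hypothetical. For large counts the coefficient of variation approaches sqrt(phi), i.e. roughly 14% for phi = 0.02 and roughly 45% for phi = 0.2.

```python
from math import sqrt

def nb_variance(mu, dispersion):
    """Negative binomial variance; dispersion = 0 gives the Poisson case."""
    return mu + dispersion * mu ** 2

def coefficient_of_variation(mu, dispersion):
    """Standard deviation relative to the mean."""
    return sqrt(nb_variance(mu, dispersion)) / mu

mu = 100  # arbitrary mean count, for illustration only
for phi in (0.0, 0.02, 0.2):
    print(f"dispersion={phi}: variance={nb_variance(mu, phi):.1f}, "
          f"CV={coefficient_of_variation(mu, phi):.3f}")
```

Two values that look close on paper, such as 0.02 and 0.2, correspond to quite different amounts of library-to-library variation, so it matters which kind of prior experiment the plugged-in value is taken from.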
Tags 
common dispersion, edger 

