SEQanswers

Old 02-15-2010, 08:21 AM   #1
sergio
Junior Member
 
Location: Santa Cruz, CA

Join Date: Jan 2010
Posts: 4
edgeR with no replication (Common disp or poisson)

Hi,
It might be a naive question, but here goes. I'm using edgeR to analyse RNA-seq data. The idea is simply to compare two conditions with no replicates. I know that the common dispersion will be set to zero in this case, so I tried the quantile adjustment under the Poisson model and got exactly the same p-values as with the common dispersion set to zero.

So my question is: if the common dispersion is set to 0, how are the p-values calculated? Are they calculated in exactly the same way as when I do the quantile adjustment under the Poisson model?

Cheers,
Sergio
Old 02-17-2010, 12:44 PM   #2
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 993

Hi Sergio,

two answers:

1. I invite you to try out our new tool for differential expression calling, called "DESeq", which supports testing without replicates.

DESeq is quite similar in spirit to edgeR, i.e. it is also a Bioconductor package, takes counts as input and uses a similar test based on the negative binomial distribution. The main difference is that we do not simply estimate one fixed common dispersion constant but rather a whole curve of dispersion values, to account for the fact that the dispersion depends on the expression strength. This gives a more balanced hit list, avoiding biases associated with the assumption of constant dispersion.

Have a look at http://www-huber.embl.de/users/anders/DESeq/

The vignette explains how to work without replicates, and also contains some calculations to show how much power you lose as opposed to using proper replication. If you want to know more about the details of the method, contact me for a preprint of the paper.

2. The edgeR developers have recently changed edgeR's handling of replicate-free data. In previous versions, you had to switch to Poisson, i.e., to zero dispersion. The p-values resulting from this are, of course, far too low.

However, in the newest version, the edgeR people came up with a solution quite similar to what our DESeq package does in the case of no replicates, namely to treat all samples as if they were replicates of a single condition. This gives you an upper limit for the dispersion, as the truly differentially expressed genes drive up the variance estimate. If there are not too many of them, you can get away with it. Again, see the discussion in the DESeq vignette for more. (I think the edgeR vignette does not yet say much about it.)
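
To illustrate why pooling the two conditions gives an upper limit, here is a minimal Python sketch. It is not the estimator used by edgeR or DESeq: it is a crude method-of-moments estimate, the function name `moment_common_dispersion` and the example counts are made up for illustration, and it assumes both libraries have already been scaled to the same size.

```python
def moment_common_dispersion(pairs):
    """Crude common-dispersion estimate, treating two samples as replicates.

    Assumes the negative binomial variance form var = mean + dispersion * mean**2
    and equal library sizes. Illustration only, not the edgeR or DESeq method.
    """
    estimates = []
    for a, b in pairs:
        mean = (a + b) / 2
        if mean == 0:
            continue  # a gene with no reads carries no information
        var = (a - b) ** 2 / 2  # unbiased sample variance for n = 2
        estimates.append((var - mean) / mean ** 2)
    if not estimates:
        return 0.0
    # Negative averages mean "no more than Poisson noise", so floor at zero.
    return max(0.0, sum(estimates) / len(estimates))


# Hypothetical counts: four genes that differ only by noise, plus one 4-fold change.
unchanged = [(100, 120), (200, 170), (50, 62), (400, 460)]
with_de = unchanged + [(100, 400)]

print(moment_common_dispersion(unchanged))
print(moment_common_dispersion(with_de))
```

The second estimate comes out much larger than the first, because the single truly changed gene is counted as if it were replicate noise. That is the sense in which the pooled estimate is an upper limit.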

Best regards
Simon
Old 02-17-2010, 12:50 PM   #3
sergio
Junior Member
 
Location: Santa Cruz, CA

Join Date: Jan 2010
Posts: 4

Thanks Simon! I will try DESeq for sure. Thanks for the help!
Old 05-24-2010, 09:53 PM   #4
Davis McC
Member
 
Location: Melbourne

Join Date: May 2010
Posts: 16
edgeR with no replication

Hi Sergio and Simon

A couple more points to add to the discussion.

1. In response to your question about how p-values are calculated if the common dispersion is set to zero, Sergio:

When the dispersion is set to zero, then the negative binomial reduces to the Poisson model. In the Poisson case, the p-values are exact p-values obtained from the appropriate binomial distribution, although in edgeR they are currently computed using the negative binomial with a very, very small value for the dispersion. Quantile adjustment should not make much difference. However, in our experience, RNA-seq datasets in general have more variation than can be accounted for by the Poisson model, especially when there is biological replication. Therefore, using the Poisson model is likely to substantially overestimate the true amount of differential expression, as Simon notes.
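
As a rough illustration of the "exact p-values obtained from the appropriate binomial distribution", here is a Python sketch of the conditional test for a single gene. It is not edgeR's code. The function name `poisson_exact_pvalue` and its arguments are made up for this example, and the two-sided rule used here (sum the probabilities of all outcomes no more likely than the observed one) is one common convention, assumed rather than taken from edgeR.

```python
from math import exp, lgamma, log


def poisson_exact_pvalue(count_a, count_b, libsize_a=1.0, libsize_b=1.0):
    """Two-sided exact p-value for equal expression under a Poisson model.

    Conditional on the total n = count_a + count_b, count_a follows
    Binomial(n, p) with p = libsize_a / (libsize_a + libsize_b).
    """
    n = count_a + count_b
    p = libsize_a / (libsize_a + libsize_b)

    def pmf(k):
        # Binomial probability computed on the log scale to avoid overflow.
        log_prob = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + k * log(p) + (n - k) * log(1 - p))
        return exp(log_prob)

    probs = [pmf(k) for k in range(n + 1)]
    observed = probs[count_a]
    # Small relative tolerance so that ties are not lost to rounding.
    tail = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, tail)


print(poisson_exact_pvalue(0, 10))     # about 0.00195, i.e. 2 / 1024
print(poisson_exact_pvalue(100, 200))  # very small: Poisson noise alone cannot explain a 2-fold change
```

With equal library sizes, the test simply asks how unusual the observed split of the total count is under a fair coin. Because only counting noise is modelled, moderately large counts give tiny p-values even for modest fold changes, which is why the Poisson model overstates differential expression when there is extra variation.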

2. Simon writes: "... in the newest version, the edgeR people came up with a solution quite similar to what our DESeq package does in case of no replicates, namely to treat all samples as if they were replicates of a single condition. This gives you an upper limit for the dispersion, as the really differentially expressed genes drive up the variance estimate. If there are not too many of them, you can get away with it."

This is a reasonable approach, although I'm not sure it is one that we have publicly advocated. It would be a useful approach to get a feel for how much inter-library variation there is in the data. As Simon suggests, doing this would overestimate the dispersion and therefore underestimate differential expression. All of the RNA-seq datasets we have seen so far have very large amounts of DE, so it would be better to take the dispersion estimate as an upper limit rather than an accurate value.

We would prefer to input a non-zero value for the dispersion based on prior experiments. See the edgeR User's Guide for some examples. We have noted that in experiments where there is no biological difference between the two experimental groups (for example when using cell lines), the dispersion is quite low, say less than 0.05. In the LNCaP data from Li et al (2008)* we get a common dispersion of 0.02. Where there is biological replication (or simply a real biological difference) between groups, the common dispersion is much higher, say ~0.2 or so (as found when analysing public data from 't Hoen et al (2008)**), and possibly even greater than 0.2 for other datasets.
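
To give a feel for what these dispersion values mean, the Python sketch below uses the negative binomial variance form var = mean + dispersion * mean^2, where the square root of the dispersion is the coefficient of variation left over after Poisson noise. The helper name `nb_sd` and the mean count of 1000 are chosen only for illustration.

```python
from math import sqrt


def nb_sd(mean, dispersion):
    """Standard deviation of a negative binomial count
    with variance mean + dispersion * mean**2."""
    return sqrt(mean + dispersion * mean ** 2)


# A gene with an expected count of 1000 under the dispersions mentioned above.
for phi in (0.0, 0.02, 0.2):
    print(f"dispersion={phi}: sd={nb_sd(1000, phi):.1f}, extra CV={sqrt(phi):.3f}")
# dispersion=0.0: sd=31.6, extra CV=0.000
# dispersion=0.02: sd=144.9, extra CV=0.141
# dispersion=0.2: sd=448.3, extra CV=0.447
```

At this expression level, a dispersion of 0.02 already gives a standard deviation more than four times the Poisson value, and 0.2 gives one about fourteen times larger.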

As such, we expect that choosing an appropriate non-zero value for the dispersion based on the particularities of your experiment will give good results. As we see more datasets it should become clearer what value would be most appropriate to plug in. However, the DE analysis is not going to be upset by small changes in the dispersion.

Kind regards
Davis

*[http://www.ncbi.nlm.nih.gov/entrez/q...&hl=en&num=50]
**[http://nar.oxfordjournals.org/cgi/co...hort/gkn705v1]