  • edgeR with no replication (Common disp or poisson)

    Hi,
    This might be a naive question, but here goes. I'm using edgeR to analyse RNA-seq data. The idea is just to compare two conditions with no replication. I know that the common dispersion will be set to zero in this case, so I tried quantile adjustment with the Poisson model and got exactly the same p-values as with the common dispersion set to zero.

    So my question is: if the common dispersion is set to 0, how are the p-values calculated? Are they calculated exactly as if I did a quantile adjustment with the Poisson model?

    Cheers,
    Sergio

  • #2
    Hi Sergio,

    two answers:

    1. I invite you to try out our new tool for differential expression calling, called "DESeq", which supports testing without replicates.

    DESeq is quite similar in spirit to edgeR, i.e. it also is a Bioconductor package, takes counts as input and uses a similar test based on the negative binomial distribution. The main difference is that we do not simply estimate one fixed common dispersion constant but rather a whole curve of dispersion values to accommodate that the dispersion depends on the expression strength. This gives a more balanced hit list, avoiding biases associated with the assumption of constant dispersion.

    Have a look at http://www-huber.embl.de/users/anders/DESeq/

    The vignette explains how to work without replicates, and also contains some calculations to show how much power you lose as opposed to using proper replication. If you want to know more about the details of the method, contact me for a preprint of the paper.

    2. The edgeR developers have recently changed edgeR's handling of replicate-free data. In previous versions, you had to switch to Poisson, i.e., to zero dispersion. The p-values resulting from this are, of course, way too low.

    However, in the newest version, the edgeR people came up with a solution quite similar to what our DESeq package does in case of no replicates, namely to treat all samples as if they were replicates of a single condition. This gives you an upper limit for the dispersion, as the really differentially expressed genes drive up the variance estimate. If there are not too many of them, you can get away with it. Again, see the discussion in the DESeq vignette for more. (I think the edgeR vignette does not yet say much about it.)

    Best regards
    Simon


    • #3
      Thanks, Simon! I will certainly try DESeq. Thanks for the help!


      • #4
        edgeR with no replication

        Hi Sergio and Simon

        A couple more points to add to the discussion.

        1. In response to your question about how p-values are calculated if the common dispersion is set to zero, Sergio:

        When the dispersion is set to zero, then the negative binomial reduces to the Poisson model. In the Poisson case, the p-values are exact p-values obtained from the appropriate binomial distribution, although in edgeR they are currently computed using the negative binomial with a very, very small value for the dispersion. Quantile adjustment should not make much difference. However, in our experience, RNA-seq datasets in general have more variation than can be accounted for by the Poisson model, especially when there is biological replication. Therefore, using the Poisson model is likely to substantially overestimate the true amount of differential expression, as Simon notes.
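To make the Poisson case concrete: with one library per condition and zero dispersion, conditioning on a gene's total count turns the comparison into an exact binomial test. The following is a minimal Python sketch of that idea, not edgeR's code. The function name `poisson_exact_pvalue`, the library-size arguments, and the two-sided rule (summing all outcomes no more probable than the observed one) are assumptions made for illustration.

```python
from math import exp, lgamma, log


def poisson_exact_pvalue(y1, y2, lib1=1.0, lib2=1.0):
    """Two-sided exact p-value for one gene with counts y1 and y2.

    Under the Poisson null of equal expression, y1 given n = y1 + y2 is
    Binomial(n, p) with p = lib1 / (lib1 + lib2).
    lib1 and lib2 are (hypothetical) library sizes and must be positive.
    """
    n = y1 + y2
    p = lib1 / (lib1 + lib2)

    def binom_pmf(k):
        # Work in log space so large counts do not overflow.
        log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        return exp(log_choose + k * log(p) + (n - k) * log(1 - p))

    probs = [binom_pmf(k) for k in range(n + 1)]
    observed = probs[y1]
    # Sum the probabilities of all outcomes at most as likely as the observed
    # one; the small tolerance keeps exact ties on both sides.
    pval = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, pval)


if __name__ == "__main__":
    for y1, y2 in [(10, 10), (0, 5), (20, 60)]:
        print(y1, y2, poisson_exact_pvalue(y1, y2))
```

Because this test allows for no variation beyond Poisson counting noise, any extra variability between libraries shows up as spuriously small p-values, which is the overestimation of differential expression mentioned above.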

        2. Simon writes: "... in the newest version, the edgeR people came up with a solution quite similar to what our DESeq package does in case of no replicates, namely to treat all samples as if they were replicates of a single condition. This gives you an upper limit for the dispersion, as the really differentially expressed genes drive up the variance estimate. If there are not too many of them, you can get away with it."

        This is a reasonable approach, although I'm not sure it is one that we have publicly advocated. It would be a useful approach to get a feel for how much inter-library variation there is in the data. As Simon suggests, doing this would overestimate the dispersion and therefore underestimate differential expression. All of the RNA-seq datasets we have seen so far have very large amounts of DE, so it would be better to take the dispersion estimate as an upper limit rather than an accurate value.
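A rough way to see why treating the two libraries as replicates gives an upper limit is a pooled method-of-moments estimate. The sketch below is a deliberately crude illustration in Python, not the estimator used by edgeR or DESeq. It assumes the counts have already been scaled to a common library size, and the function name `pooled_moment_dispersion` is hypothetical.

```python
from statistics import mean, variance


def pooled_moment_dispersion(count_table):
    """Crude common-dispersion estimate, treating all columns as replicates.

    count_table: list of rows, one row of counts per gene, with at least
    two samples per row. Uses the negative binomial relation
    var = mu + phi * mu**2 and pools over genes:
    phi = sum(var_g - mean_g) / sum(mean_g**2), truncated at zero.
    """
    excess = 0.0   # running sum of (variance - mean) over genes
    scale = 0.0    # running sum of mean**2 over genes
    for row in count_table:
        m = mean(row)
        v = variance(row)  # sample variance (n - 1 denominator)
        excess += v - m
        scale += m * m
    if scale == 0:
        return 0.0
    return max(0.0, excess / scale)


if __name__ == "__main__":
    # Gene 1 is unchanged between the two libraries; gene 2 differs 3-fold.
    table = [[100, 100], [50, 150]]
    print(pooled_moment_dispersion(table))
```

In the toy table, all of the apparent dispersion comes from the gene that differs between the two libraries. Real differential expression is absorbed into the variance estimate in the same way, so the result is better read as an upper limit than as an accurate value.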

        We would prefer to input a non-zero value for the dispersion based on prior experiments. See the edgeR User's Guide for some examples. We have noted that in experiments where there is no biological difference between the two experimental groups (for example when using cell lines), the dispersion is quite low, say less than 0.05. In the LNCaP data from Li et al (2008)* we get a common dispersion of 0.02. Where there is biological replication (or simply a real biological difference) between groups, the common dispersion is much higher, say ~0.2 or so (as found when analysing public data from 't Hoen et al (2008)**), and possibly even greater than 0.2 for other datasets.

        As such, we expect that choosing an appropriate non-zero value for the dispersion based on the particularities of your experiment will give good results. As we see more datasets it should become clearer what value would be most appropriate to plug in. However, the DE analysis is not going to be upset by small changes in the dispersion.
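To get a feel for what these dispersion values mean, recall that under the negative binomial model the variance is mu + phi * mu^2, so the square root of the dispersion approximates the coefficient of variation for highly expressed genes. The short Python sketch below evaluates this for the values quoted above; the helper names and the example mean of 100 are assumptions chosen for illustration.

```python
from math import sqrt


def nb_variance(mu, dispersion):
    """Negative binomial variance for mean mu and dispersion phi."""
    return mu + dispersion * mu ** 2


def nb_cv(mu, dispersion):
    """Coefficient of variation (standard deviation / mean) of the counts."""
    return sqrt(nb_variance(mu, dispersion)) / mu


if __name__ == "__main__":
    mu = 100  # hypothetical expected count for one gene
    # 0.0 is the Poisson case; 0.02 and 0.2 are the values quoted in the post.
    for phi in (0.0, 0.02, 0.2):
        print(f"dispersion={phi}: variance={nb_variance(mu, phi):.0f}, "
              f"CV={nb_cv(mu, phi):.3f}")
```

With zero dispersion the spread is pure Poisson counting noise, while a dispersion of 0.2 gives a variance many times larger at the same mean, which is why plugging in a plausible non-zero value changes the p-values so much compared with the Poisson analysis.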

        Kind regards
        Davis

        *[http://www.ncbi.nlm.nih.gov/entrez/q...&hl=en&num=50]
        **[http://nar.oxfordjournals.org/cgi/co...hort/gkn705v1]
