
  • differential gene expression without replicates: edgeR, DESeq?

    Hi all,

    I am a Cufflinks user and I am trying out other popular gene expression analysis tools such as edgeR and DESeq. In most of my projects we only have one Normal and one Tumor sample. Though there has been a lot of discussion, it is still unclear to me whether edgeR or DESeq is "better" than cuffdiff when there are no biological replicates.

    Any advice will be appreciated.

  • #2
    In the complete absence of replicates, I don't think any statistical tool is going to be worth a dang for differential gene expression. All you can do is look at simple differences in counts, with no means at all of assessing the significance of those differences. The statistics cannot compensate for a complete lack of adequate data for the analysis in question, and without some minimal number of replicates (3 is really the minimum, 4 or more would be far better), there is no way to assign statistical significance.

    I know the vignettes for tools like edgeR talk about good performance "...even for experiments with minimal levels of biological replication" (quoting from the edgeR manual), but note the use of the word "minimal". A complete absence of replication is not minimal replication, and in the complete absence of replication, you cannot perform statistical tests of significance for differences.

    And since you have no statistical power at all, comparing different analytical tools seems pointless to me.
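
To make "simple differences in counts" concrete, here is a minimal sketch of a purely descriptive ranking: counts are scaled to counts per million and genes are sorted by absolute log2 fold change. The gene names, counts, the 0.5 pseudocount, and the function names are all illustrative assumptions, and nothing here assigns significance.

```python
import math

def cpm(counts, prior=0.5):
    """Counts per million; `prior` is an assumed pseudocount to avoid log(0)."""
    total = sum(counts.values())
    return {gene: (c + prior) / total * 1e6 for gene, c in counts.items()}

def ranked_log2_fold_changes(normal, tumor):
    """Rank genes by |log2(tumor CPM / normal CPM)|, largest first.

    Both dictionaries are assumed to contain the same gene keys.
    """
    n_cpm, t_cpm = cpm(normal), cpm(tumor)
    lfc = {gene: math.log2(t_cpm[gene] / n_cpm[gene]) for gene in normal}
    return sorted(lfc.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Made-up counts for one Normal and one Tumor library.
normal = {"geneA": 100, "geneB": 50, "geneC": 850}
tumor = {"geneA": 400, "geneB": 50, "geneC": 550}

ranking = ranked_log2_fold_changes(normal, tumor)
for gene, value in ranking:
    print(f"{gene}\t{value:+.2f}")
```

The output is only an ordering by effect size; it says nothing about how variable those differences would be across replicates.
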
    Michael Black, Ph.D.
    ScitoVation LLC. RTP, N.C.

    • #3
      I have to agree with mbblack. You should try to gain more statistical power by getting at least 3 replicates per treatment; otherwise your comparison is not really meaningful.

      • #4
        Many thanks, mbblack and lexa.

        Lack of replicates is indeed an issue for some of my projects. Unfortunately, these collaborators will not proceed to sequence replicates until they find something interesting in the current data.
        They even wish to have a short, "reliable" list of DE or differentially spliced genes that makes sense, which we cannot produce without replicates. It is really a dilemma.

        It is important for biologists to discuss with bioinformaticians before they submit the samples for sequencing.

        • #5
          What you could do is run both and show them the resulting gene list from each, plus the intersection (Venn diagram?).

          • #6
            That's hard. Anyway, you could try to get a 'reliable' gene set by running different methods and taking the overlap between them. Maybe you should keep only genes reported by at least 2 different methods. Then do a literature search for the genes you found; some of them may already be described.
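
The overlap idea from the last two posts can be sketched as follows. The tool names are the ones discussed in this thread, but the gene lists are made-up placeholders and `genes_supported_by` is a hypothetical helper, not a function from any of these packages.

```python
from collections import Counter

def genes_supported_by(gene_lists, min_methods=2):
    """Return genes reported by at least `min_methods` of the given methods."""
    # set() guards against a gene being listed twice by the same method.
    votes = Counter(g for genes in gene_lists.values() for g in set(genes))
    return sorted(g for g, n in votes.items() if n >= min_methods)

# Placeholder candidate lists, one per method.
calls = {
    "cuffdiff": ["TP53", "MYC", "EGFR"],
    "edgeR": ["MYC", "EGFR", "KRAS"],
    "DESeq": ["EGFR", "KRAS", "BRCA1"],
}

in_all = genes_supported_by(calls, min_methods=len(calls))  # strict intersection
in_two = genes_supported_by(calls, min_methods=2)           # "at least 2 methods"
print(in_all)
print(in_two)
```

Agreement between methods only shows that the tools rank the same single-sample differences highly; it does not stand in for replication.
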

            • #7
              edgeR does mention a method for dealing with a lack of replication by assigning a dispersion value. From the User Guide:

              "Simply pick a reasonable dispersion value, based on your experience with similar data, and use that. Although subjective, this is still more defensible than assuming Poisson variation. Typical values are dispersion=0.4 for human data, dispersion=0.1 for data on genetically identical model organisms or dispersion=0.01 for technical replicates."

              There is more detail in the User Guide. It's an option anyway, though replication is always better.
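
A minimal sketch of what an assumed dispersion implies, using the standard negative binomial mean-variance relationship (variance = mean + dispersion × mean²) that this kind of model rests on. The function names and the example mean count of 100 are illustrative assumptions, not part of edgeR's API. As far as I can tell, more recent editions of the edgeR User Guide give the 0.4 / 0.1 / 0.01 figures as typical BCV values (the square root of the dispersion) rather than as dispersions, so it is worth checking the edition you are reading.

```python
import math

def nb_variance(mean, dispersion):
    """Negative binomial variance; dispersion=0 reduces to Poisson (variance == mean)."""
    return mean + dispersion * mean ** 2

def bcv(dispersion):
    """Biological coefficient of variation, defined as sqrt(dispersion)."""
    return math.sqrt(dispersion)

mean_count = 100  # assumed expected count for one gene
for phi in (0.0, 0.01, 0.1, 0.4):
    print(f"dispersion={phi:<5} variance={nb_variance(mean_count, phi):8.1f} "
          f"BCV={bcv(phi):.3f}")
```

Even a modest dispersion inflates the variance far beyond the Poisson value at this count, which is why the choice of dispersion dominates any downstream test.
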

              • #8
                As I recall, I tried that a long time ago. I found that the result is sensitive to the chosen dispersion value.
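
The sensitivity described here can be illustrated with a rough sketch. This is not edgeR's exact test; it is a Wald-style approximation that assumes equal library sizes, non-zero counts, and a delta-method variance of 1/count + dispersion for each log count. The counts (100 vs 300) and the function name are made up for illustration.

```python
import math

def approx_p_value(count_a, count_b, dispersion):
    """Two-sided normal p-value for log(count_b / count_a) under an assumed dispersion.

    Rough approximation: var(log count) ~ 1/count + dispersion for each sample.
    """
    log_ratio = math.log(count_b / count_a)
    se = math.sqrt(1 / count_a + 1 / count_b + 2 * dispersion)
    z = log_ratio / se
    # Two-sided tail probability of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))

# The same three-fold difference, judged under three assumed dispersions.
for phi in (0.01, 0.1, 0.4):
    print(f"dispersion={phi:<5} p~{approx_p_value(100, 300, phi):.3g}")
```

Under these assumptions the same pair of counts moves from clearly "significant" to clearly not as the assumed dispersion increases, so the resulting gene list is largely decided by a number the analyst picked.
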

                • #9
                  Originally posted by mrfox:
                  Many thanks, mbblack and lexa.

                  Lack of replicates is indeed an issue for some of my projects. Unfortunately, these collaborators will not proceed to sequence replicates until they find something interesting in the current data.
                  They even wish to have a short, "reliable" list of DE or differentially spliced genes that makes sense, which we cannot produce without replicates. It is really a dilemma.

                  It is important for biologists to discuss with bioinformaticians before they submit the samples for sequencing.

                  You need to discuss this with them. Without replicates, there is no way to actually give them the answers they seek. "Reliable" list of DE genes? That cannot possibly be derived without some statistical significance assigned to the results, and you cannot have any statistically significant results without replicates. At best, all you could give them would be a ranked list of simple differences in gene counts or RPKM for mapped genes, with no hint of what the variance about those differences may be.

                  They really need to do a proper pilot study, with 3-5 replicates to see just what they have to work with. Otherwise, all you can tell them is what is different, but with no statistical ranking of significance nor any idea of how variable those differences may be.

                  It is not that you have minimal statistical power without replicates, you have none. All you have is simple numeric differences of some count or normalized values, and nothing more. And you have no idea at all if those differences are real biological differences, or random experimental noise.

                  And there is nothing unique to RNAseq data about that - you cannot compute statistics on a simple difference between two single numbers.
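
The point about single numbers can be shown with Python's standard library: the sample variance divides by n - 1, so with one observation per condition there is nothing to estimate. The counts below are made up and `sample_variance_or_none` is a hypothetical helper for illustration.

```python
import statistics

def sample_variance_or_none(values):
    """Return the sample variance, or None when fewer than two values are given."""
    try:
        return statistics.variance(values)
    except statistics.StatisticsError:
        # Raised because the n - 1 denominator would be zero.
        return None

single = sample_variance_or_none([120])               # one library per condition
triplicate = sample_variance_or_none([120, 95, 140])  # three replicates

print("n=1:", single)
print("n=3:", triplicate)
```

With n = 1 there is no within-group variance to compare the between-group difference against, which is why no test statistic can be formed.
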
                  Michael Black, Ph.D.
                  ScitoVation LLC. RTP, N.C.

                  • #10
                    I could not agree more. Inferring a short list of DE genes from an expensive (compared to array data) RNA-Seq run on even a single pair of samples is some collaborators' dream. Some even prefer to spend money on sequencing more cell line types rather than replicates. I find it hard to persuade them.

                    Without replicates, all we can provide is a list of DE genes based on statistical models such as the Poisson, but this will never reflect the truth without sufficient replicates.



                    • #11
                      edgeR without replicates

                      I know it is unwise to do experiments without replication, but I find myself in exactly that situation (pooled samples).

                      I've analysed these data with older versions of DESeq, but now would also like to try edgeR. From the vignette, I can't seem to decipher exactly how one does this analysis without replicates. Is anyone able to help me out or share a script?

                      It's pretty clear that both the DESeq and edgeR camps now strongly discourage such efforts (does DESeq2 even still incorporate such analyses?), but I still need to give it a go in this case.

                      Thanks!

                      • #12
                        To be honest, my opinion is that the first option mentioned in the edgeR vignette is really the only valid approach to follow in that situation. To quote from page 18:

                        "1. Be satisfied with a descriptive analysis, that might include an MDS plot and an analysis of fold changes. Do not attempt a significance analysis. This may be the best advice."

                        In other words, make your argument for differentially expressed genes based solely on the magnitude of the measured differences between samples, and accept that you cannot perform any reliable or valid statistical significance testing. I just think it is pointless to spend a lot of time running algorithms or code on a data set that fundamentally cannot be analyzed statistically.

                        Basically, what is the point of the effort if the stats are meaningless or open to vigorous negative criticism?
                        Michael Black, Ph.D.
                        ScitoVation LLC. RTP, N.C.

                        • #13
                          Thanks, option 1 is basically what we are doing, but we are also trying to scrutinize the data in as many ways as possible. We pooled 10 individuals per library, and our results seem not hopeless in that we can see some of the things we know we should see, and these do hold up to DESeq stats ("working without replicates"). But it's the novel stuff that is more problematic. We'll be finding out via qPCR and in situs, I suppose, how well these stats hold up. But yes, not so optimistic. I should also say that we have 3 groups, not 2, so we at least have a bit more information on variability.
                          Last edited by chrisbala; 05-23-2013, 07:23 AM.
