  • mrfox
    Senior Member
    • Aug 2010
    • 103

    differential gene expression without replicates: edgeR, DESeq?

    Hi all,

    I am a Cufflinks user and I am trying to test other popular gene expression analysis tools such as edgeR and DESeq. In most of my projects we only have one normal and one tumor sample. Though there has been a lot of discussion, it is still unclear to me whether edgeR or DESeq is "better" than cuffdiff when there are no biological replicates.

    Any advice will be appreciated.
  • mbblack
    Senior Member
    • Aug 2009
    • 245

    #2
    In the complete absence of replicates, I don't think any statistical tool is going to be worth a dang for differential gene expression. All you can do is look at simple differences in counts, with no means at all of assessing the significance of those differences. The statistics cannot compensate for a complete lack of adequate data for the analysis in question, and without some minimal number of replicates (3 is really the minimum, 4 or more would be far better), there is no way to assign statistical significance.

    I know the vignettes for tools like edgeR talk about good performance "...even for experiments with minimal levels of biological replication" (quoting from the edgeR manual), but note the use of the word "minimal". A complete absence of replication is not minimal replication, and in the complete absence of replication, you cannot perform statistical tests of significance for differences.

    And since you have no statistical power at all, comparing different analytical tools seems pointless to me.
    Michael Black, Ph.D.
    ScitoVation LLC. RTP, N.C.
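
The point about replicates can be made concrete with the sample variance, which divides by n - 1 and so cannot be estimated from a single observation per group. A minimal Python sketch, using made-up read counts for one gene (the helper name `sample_variance_or_none` is hypothetical):

```python
import statistics

def sample_variance_or_none(counts):
    """Return the unbiased sample variance, or None when it cannot be estimated."""
    try:
        return statistics.variance(counts)  # divides by n - 1
    except statistics.StatisticsError:
        return None  # fewer than two observations: no variance estimate

# Hypothetical read counts for a single gene (illustrative values only)
no_replicates = [120]              # one sample in the group
three_replicates = [120, 95, 140]  # three biological replicates

print(sample_variance_or_none(no_replicates))     # None
print(sample_variance_or_none(three_replicates))  # a finite number
```

With one sample per condition there is simply nothing to put in the denominator of a test statistic, whichever tool is used.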

    Comment

    • lexa
      Member
      • Jun 2010
      • 17

      #3
      I have to agree with mbblack. You should try to gain more statistical power by getting at least 3 replicates per treatment; otherwise your comparison is not really meaningful.

      Comment

      • mrfox
        Senior Member
        • Aug 2010
        • 103

        #4
        Many thanks, mbblack and lexa.

        Lack of replicates is indeed an issue for some of my projects. Unfortunately, these collaborators will not proceed to sequence replicates until they find something interesting in the current data.
        They even wish to have a short, "reliable" list of differentially expressed (DE) or differentially spliced genes that makes sense, which we cannot deliver without replicates. It is really a dilemma.

        It is important for biologists to discuss with bioinformaticians before they submit the samples for sequencing.

        Comment

        • mgogol
          Senior Member
          • Mar 2008
          • 197

          #5
          What you could do is run both tools and show them the gene list from each, plus the intersection (as a Venn diagram?).

          Comment

          • lexa
            Member
            • Jun 2010
            • 17

            #6
            That's hard. Anyway, you could try to get a 'reliable' gene set by running different methods and taking the overlap, perhaps keeping only genes supported by at least 2 different methods. Then do a literature search for the genes you found; some of them may already be described.
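
The "at least 2 methods" idea is easy to sketch once each tool has produced a gene list. The gene names and tool outputs below are hypothetical placeholders, and `genes_supported_by` is an illustrative helper, not part of any of the packages mentioned:

```python
from collections import Counter

def genes_supported_by(calls_by_method, min_methods=2):
    """Return genes that appear in the lists of at least `min_methods` methods."""
    counts = Counter(
        gene
        for genes in calls_by_method.values()
        for gene in set(genes)  # count each gene once per method
    )
    return sorted(gene for gene, n in counts.items() if n >= min_methods)

# Hypothetical gene lists, one per tool (illustrative only)
calls = {
    "edgeR": ["TP53", "MYC", "EGFR", "KRAS"],
    "DESeq": ["TP53", "MYC", "BRCA1"],
    "cuffdiff": ["MYC", "EGFR", "PTEN"],
}

shortlist = genes_supported_by(calls, min_methods=2)
all_three = genes_supported_by(calls, min_methods=3)
print(shortlist)  # ['EGFR', 'MYC', 'TP53']
print(all_three)  # ['MYC']
```

Keep in mind that all methods see the same two samples, so the overlap shows where the models agree, not that a difference is statistically supported.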

            Comment

            • Tom Bair
              Member
              • Oct 2008
              • 28

              #7
              edgeR does mention a method for dealing with a lack of replication by assigning a variance value. From the User Guide:
              "Simply pick a reasonable dispersion value, based on your experience with similar data, and use that. Although subjective, this is still more defensible than assuming Poisson variation. Typical values are dispersion=0.4 for human data, dispersion=0.1 for data on genetically identical model organisms or dispersion=0.01 for technical replicates."
              More detail is in the User Guide. It's an option anyway, though replication is always better.
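
To see what picking a dispersion value implies, here is a small sketch of the negative binomial mean-variance relationship, variance = mean + dispersion * mean^2, with dispersion 0 reducing to the Poisson case. The dispersion values are the ones quoted above; the mean count of 100 and the function names are assumptions for illustration. Later editions of the edgeR User Guide may state these typical values as the biological coefficient of variation (the square root of the dispersion), so it is worth checking which quantity your version means.

```python
from math import sqrt

def nb_variance(mean, dispersion):
    """Negative binomial variance; dispersion = 0 gives the Poisson variance."""
    return mean + dispersion * mean ** 2

def coefficient_of_variation(mean, dispersion):
    """Standard deviation divided by the mean under the same model."""
    return sqrt(nb_variance(mean, dispersion)) / mean

mean_count = 100  # hypothetical expected read count for one gene
settings = [
    ("Poisson", 0.0),
    ("technical replicates", 0.01),
    ("model organisms", 0.1),
    ("human", 0.4),
]
for label, dispersion in settings:
    var = nb_variance(mean_count, dispersion)
    cv = coefficient_of_variation(mean_count, dispersion)
    print(f"{label:22s} dispersion={dispersion:<5} variance={var:8.1f} CV={cv:.2f}")
```

At a mean of 100 reads the assumed variance ranges from 100 (Poisson) to about 4100 (dispersion 0.4), which is why the choice matters so much.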

              Comment

              • mrfox
                Senior Member
                • Aug 2010
                • 103

                #8
                As I recall, I tried that a long time ago and found that the result is sensitive to the chosen dispersion value.
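
That sensitivity is easy to reproduce. The sketch below is not edgeR's implementation; it is a simplified two-sample test under stated assumptions: equal library sizes, one count per condition, and a dispersion fixed by hand. Under the null hypothesis both counts are negative binomial with the same mean and dispersion, so the distribution of one count given the total has a closed form, and a two-sided p-value is taken as twice the smaller tail. The counts (100 vs 200) are made up.

```python
from math import lgamma, exp

def nb_conditional_pmf(y1, total, dispersion):
    """P(Y1 = y1 | Y1 + Y2 = total) for two iid negative binomial counts."""
    r = 1.0 / dispersion  # negative binomial size parameter

    def log_nb_coef(y, size):
        # log of Gamma(y + size) / (Gamma(size) * y!)
        return lgamma(y + size) - lgamma(size) - lgamma(y + 1)

    log_num = log_nb_coef(y1, r) + log_nb_coef(total - y1, r)
    log_den = log_nb_coef(total, 2 * r)  # the sum has size parameter 2r
    return exp(log_num - log_den)

def two_sample_p(y1, y2, dispersion):
    """Two-sided p-value (twice the smaller tail) for one count per condition."""
    total = y1 + y2
    pmf = [nb_conditional_pmf(k, total, dispersion) for k in range(total + 1)]
    lower = sum(pmf[: y1 + 1])  # P(Y1 <= y1 | total)
    upper = sum(pmf[y1:])       # P(Y1 >= y1 | total)
    return min(1.0, 2 * min(lower, upper))

# Same hypothetical gene (100 vs 200 reads), different assumed dispersions
for dispersion in (0.01, 0.1, 0.4):
    print(dispersion, two_sample_p(100, 200, dispersion))
```

The same two-fold difference goes from clearly "significant" to unremarkable purely as a function of the dispersion you chose, which is the subjectivity the quoted passage acknowledges.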

                Comment

                • mbblack
                  Senior Member
                  • Aug 2009
                  • 245

                  #9
                  Originally posted by mrfox
                  Many thanks, mbblack and lexa.

                  Lack of replicates is indeed an issue for some of my projects. Unfortunately, these collaborators will not proceed to sequence replicates until they find something interesting in the current data.
                  They even wish to have a short, "reliable" list of differentially expressed (DE) or differentially spliced genes that makes sense, which we cannot deliver without replicates. It is really a dilemma.

                  It is important for biologists to discuss with bioinformaticians before they submit the samples for sequencing.
                  You need to discuss this with them. Without replicates, there is no way to actually give them the answers they seek. "Reliable" list of DE genes? That cannot possibly be derived without some statistical significance assigned to the results, and you cannot have any statistically significant results without replicates. At best, all you could give them would be a ranked list of simple differences in gene counts or RPKM for mapped genes, with no hint of what the variance about those differences may be.

                  They really need to do a proper pilot study, with 3-5 replicates to see just what they have to work with. Otherwise, all you can tell them is what is different, but with no statistical ranking of significance nor any idea of how variable those differences may be.

                  It is not that you have minimal statistical power without replicates, you have none. All you have is simple numeric differences of some count or normalized values, and nothing more. And you have no idea at all if those differences are real biological differences, or random experimental noise.

                  And there is nothing unique to RNAseq data about that - you cannot compute statistics on a simple difference between two single numbers.
                  Michael Black, Ph.D.
                  ScitoVation LLC. RTP, N.C.
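
A ranked list of simple differences of the kind described here could look like the following sketch. RPKM is computed as reads per kilobase of gene length per million mapped reads; the gene names, lengths, counts, library sizes, and the pseudocount of 1 added before taking the log ratio are all assumptions for illustration. The output is a descriptive ranking only and carries no significance.

```python
from math import log2

def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)

def rank_by_log2_fold_change(genes, total_normal, total_tumor, pseudo=1.0):
    """Rank genes by absolute log2(tumor / normal) RPKM ratio, largest first.

    `pseudo` is an assumed pseudocount that keeps the ratio finite
    when a gene has zero reads in one sample.
    """
    rows = []
    for name, length_bp, normal_count, tumor_count in genes:
        normal = rpkm(normal_count, length_bp, total_normal)
        tumor = rpkm(tumor_count, length_bp, total_tumor)
        lfc = log2((tumor + pseudo) / (normal + pseudo))
        rows.append((name, normal, tumor, lfc))
    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)

# Hypothetical genes: (name, length in bp, normal count, tumor count)
genes = [
    ("geneA", 2000, 500, 2000),
    ("geneB", 1000, 300, 310),
    ("geneC", 1500, 40, 5),
]
ranked = rank_by_log2_fold_change(genes, total_normal=20_000_000, total_tumor=25_000_000)
for name, normal, tumor, lfc in ranked:
    print(f"{name}: normal={normal:.2f} tumor={tumor:.2f} log2FC={lfc:+.2f}")
```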

                  Comment

                  • mrfox
                    Senior Member
                    • Aug 2010
                    • 103

                    #10
                    I could not agree more. Inferring a short list of DE genes from an expensive (compared to array data) RNA-Seq run on even one single pair of samples is some collaborators' dream. Some even prefer to spend money on sequencing more cell line types rather than replicates. I find it hard to persuade them.

                    Without replicates, all we can provide is a list of DE genes based on an assumed statistical model such as the Poisson, and such a list cannot be trusted to reflect the truth.
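
Why a Poisson-based list is over-confident can be shown with a short sketch. Assuming equal library sizes and pure Poisson noise, the count in one sample given the total is Binomial(total, 0.5), so a two-sided p-value follows directly. The counts are made up, and `poisson_two_sample_p` is an illustrative helper rather than any package's function.

```python
from math import comb

def poisson_two_sample_p(y1, y2):
    """Two-sided p-value for two Poisson counts with equal library sizes.

    Conditional on the total, y1 is Binomial(total, 0.5) under the null.
    """
    total = y1 + y2
    denom = 2 ** total
    pmf = [comb(total, k) / denom for k in range(total + 1)]
    lower = sum(pmf[: y1 + 1])  # P(Y1 <= y1 | total)
    upper = sum(pmf[y1:])       # P(Y1 >= y1 | total)
    return min(1.0, 2 * min(lower, upper))

# The same 1.2-fold difference at three hypothetical expression levels
for normal, tumor in [(10, 12), (100, 120), (1000, 1200)]:
    print(normal, tumor, poisson_two_sample_p(normal, tumor))
```

For a highly expressed gene, a 1.2-fold difference gets a very small p-value because the Poisson model only accounts for counting noise. Biological variation between individuals is ignored, so many such calls would likely not survive replication.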



                    Comment

                    • chrisbala
                      Member
                      • Jan 2010
                      • 82

                      #11
                      edgeR without replicates

                      Knowing that it is unwise to do experiments without replication, I find myself in exactly that situation (pooled samples).

                      I've analysed these data with older versions of DESeq, but now would also like to try edgeR. I can't seem to decipher from the vignette exactly how one does this analysis without replicates. Is anyone able to help me out or share a script?

                      It's pretty clear that both the DESeq and edgeR camps are now strongly discouraging such efforts (does DESeq2 even still incorporate such analyses?), but I still need to give it a go in this case.

                      Thanks!

                      Comment

                      • mbblack
                        Senior Member
                        • Aug 2009
                        • 245

                        #12
                        To be honest, my opinion is that the first option mentioned in the edgeR vignette is really the only valid approach to follow in that situation. To quote from page 18:

                        "1. Be satisfied with a descriptive analysis, that might include an MDS plot and an analysis of fold changes. Do not attempt a significance analysis. This may be the best advice."

                        In other words, make your argument for significantly differentially expressed genes based solely on the magnitude of measured differences between samples and accept that you cannot perform any reliable or valid statistical significance testing. I just think it is pointless to spend a lot of time running algorithms or code on a data set that fundamentally cannot be analyzed statistically.

                        Basically, what is the point of the effort if the stats are meaningless or open to vigorous negative criticism?
                        Michael Black, Ph.D.
                        ScitoVation LLC. RTP, N.C.

                        Comment

                        • chrisbala
                          Member
                          • Jan 2010
                          • 82

                          #13
                          Thanks, option 1 is basically what we are doing, but we are also trying to scrutinize the data in as many ways as possible. We pooled 10 individuals per library, and our results seem not hopeless in that we can see some of the things we know we should see, and these do hold up to DESeq stats ("working without replicates"). But it's the novel stuff that is more problematic. We'll be finding out via qPCR and in situs, I suppose, how well these stats hold up. But yes, not so optimistic. I should also say that we have 3 groups, not 2, so we at least have a bit more information on variability.
                          Last edited by chrisbala; 05-23-2013, 07:23 AM.

                          Comment
