  • RNA-seq statistical analysis without replicate

    Dear all,

    I am using RNA-seq to analyze small non-coding RNAs.
    I am performing the differential expression analysis with edgeR (and will use baySeq and maybe DESeq to reinforce my analyses).
    The problem is that I don't have, and will not have, any replicates for my samples, so I'm relying on your expertise to tell me whether any statistical analysis is relevant without replicates.

    Thanks for your answers

    Claudia

  • #2
    No, it isn't relevant.


    • #3
      Hi, I'm in the same boat (attempting to compare two libraries without having replicates - they are similar to RNA-seq libraries) and am wondering whether this is going to be possible or not.

      Originally posted by NicoBxl
      No, it isn't relevant.

      Could you be a bit more specific about your answer here? With respect to edgeR, I just read in the abstract of the Applications Note describing the program [Robinson MD, McCarthy DJ and Smyth GK, Bioinformatics, vol. 26(1), 2010] that:

      "An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated."

      So to me, it sounds like you *must* have replicates to use edgeR.

      Might anyone know of a method that does not require any replicates?


      • #4
        You have to check the p-value for each small RNA after the DE analysis. If you have only one sample per condition, the p-values will be large (and the DE analysis therefore irrelevant).

        You can still run DESeq with only one sample per condition (I don't know about edgeR).

        But it's very dangerous to jump to conclusions with no replicates.


        • #5
          Analyzing RNA-seq libraries with no replicates

          Perusing the vignette for DESeq (mentioned above) by Simon Anders, entitled "Analysing RNA-Seq data with the 'DESeq' package", I found this bit, which I think is quite informative and to the point:

          ###hope it's ok to post this here, the pdf is posted on the web for free...###

          "Proper replicates are essential to interpret a biological experiment.
          After all, if one compares two conditions and find a difference, how else
          would one know that this difference is due to the different conditions and
          would not have arisen between replicates, as well, just due to noise?
          Hence, any attempt to work without any replicates will lead to conclusions
          of very limited reliability. Nevertheless, such experiments are often
          undertaken, especially in HTS, and the DESeq package can deal with them,
          even though the soundness of the results may depend very much on the
          circumstances.

          Our primary assumption is still that the mean is a good predictor for the
          variance. Hence, if a number of genes with similar expression level are
          compared between replicates, we expect that their variation is of
          comparable magnitude. Once we accept this assumption, we may argue as
          follows: Given two samples from different conditions and a number of genes
          with comparable expression levels, of which we expect only a minority to
          be influenced by the condition, we may take the variance estimated from
          comparing their count rates across conditions as ersatz for a proper
          estimate of the variance across replicates. After all, we assume most
          genes to behave the same within replicates as across conditions, and
          hence, the estimated variance should not change too much due to the
          influence of the hopefully few differentially expressed genes.
          Furthermore, the differentially expressed genes will only cause the
          variance estimate to be too high, so that the test will err to the side of
          being too conservative, i.e., we only lose power."

          I'm not sure if there is a way to change the parameters of edgeR analysis to account for not having replicates, but I also haven't looked very hard yet. I am going to try out DESeq and see if it is sufficient for the level of analysis I want out of my libraries for the time being. No reason to blow another Illumina lane if it's not really necessary....
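
The argument in the quoted passage can be tried out on simulated counts. What follows is only a rough sketch in base R, not the procedure DESeq itself uses: it assumes negative binomial counts with a made-up mean (`mu`) and dispersion (`alpha`), and the helper `estimate_dispersion` is a hypothetical method-of-moments estimator written for this illustration. It pools genes of similar expression across two unreplicated samples, first with no differentially expressed genes and then with 10% of genes changed four-fold.

```r
# Simplified illustration only: a method-of-moments dispersion estimate,
# not the fitting procedure that DESeq itself uses.
set.seed(1)

n_genes <- 5000      # hypothetical number of genes at a similar expression level
mu      <- 100       # assumed common mean count
alpha   <- 0.1       # assumed true dispersion: var = mu + alpha * mu^2

# Pool genes across two unreplicated samples and estimate the dispersion.
estimate_dispersion <- function(x1, x2) {
  s2 <- (x1 - x2)^2 / 2          # per-gene variance estimate from two values
  m  <- (x1 + x2) / 2            # per-gene mean
  # When nothing is DE: E[s2] - E[m] = alpha * mu^2 and E[x1 * x2] = mu^2
  (mean(s2) - mean(m)) / mean(x1 * x2)
}

# Case 1: no gene is differentially expressed.
a1 <- rnbinom(n_genes, size = 1 / alpha, mu = mu)
b1 <- rnbinom(n_genes, size = 1 / alpha, mu = mu)
alpha_null <- estimate_dispersion(a1, b1)

# Case 2: 10% of genes are four-fold higher in the second sample.
is_de <- seq_len(n_genes) <= n_genes / 10
a2 <- rnbinom(n_genes, size = 1 / alpha, mu = mu)
b2 <- rnbinom(n_genes, size = 1 / alpha, mu = ifelse(is_de, 4 * mu, mu))
alpha_with_de <- estimate_dispersion(a2, b2)

cat("true dispersion:        ", alpha, "\n")
cat("estimate, no DE genes:  ", round(alpha_null, 3), "\n")
cat("estimate, 10% DE genes: ", round(alpha_with_de, 3), "\n")
```

In this toy setting the estimate without DE genes should land close to the assumed dispersion, while the DE genes push the pooled estimate upward. That is the direction the vignette describes: the variance is overestimated, so a test built on it becomes conservative and loses power.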


          • #6
            Originally posted by NicoBxl
            You have to check the p-value for each small RNA after the DE analysis. If you have only one sample per condition, the p-values will be large (and the DE analysis therefore irrelevant).

            You can still run DESeq with only one sample per condition (I don't know about edgeR).

            But it's very dangerous to jump to conclusions with no replicates.

            Thanks! That makes a lot of sense now.


            • #7
              Kerhard's post makes sense! I also read "Analysing RNA-Seq data with the 'DESeq' package", and I have another question: all the DESeq code in the vignette is for the example with replicates. For instance, it says that the minimal set of commands to run a full analysis is:
              > cds <- newCountDataSet( countsTable, conds )
              > cds <- estimateSizeFactors( cds )
              > cds <- estimateVarianceFunctions( cds )
              > res <- nbinomTest( cds, "T", "N")

              So how should these commands be changed to fit two conditions without replicates?

              Thanks for the response and further discussions.



              Originally posted by kerhard (post #5 above)


              • #8
                Hi byou678,

                A few pages later in the DESeq documentation you'll find the answer. The only thing you have to change is the estimateVarianceFunctions call, like this:
                cds <- estimateVarianceFunctions(cds, method="blind")

                Regards,
                Patrick

                Originally posted by byou678 (post #7 above)


                • #9
                  I have the same situation.
                  When I run the minimal set of commands:
                  > cds <- newCountDataSet( countsTable, conds )
                  > cds <- estimateSizeFactors( cds )
                  > cds <- estimateVarianceFunctions( cds, method="blind")
                  > res <- nbinomTest( cds, "A", "B")
                  Error: condA %in% levels(conditions(cds)) is not TRUE
                  What's the problem?
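
Read literally, the error says that the label passed as `condA` (here "A") is not among `levels(conditions(cds))`, so the two labels given to `nbinomTest` probably do not match the entries of `conds`. The sketch below shows that check in base R only, without DESeq. The `conds` vector is an assumption (the poster's own is not shown; "T" and "N" follow the earlier example), and `labels_ok` is a hypothetical helper.

```r
# Hypothetical condition vector, one label per column of the count table.
# The poster's actual `conds` is not shown; "T" and "N" follow the earlier example.
conds <- factor(c("T", "N"))

# Return TRUE only if both requested labels occur among the condition levels.
labels_ok <- function(conds, condA, condB) {
  all(c(condA, condB) %in% levels(conds))
}

labels_ok(conds, "A", "B")   # FALSE: the situation that triggers the error
labels_ok(conds, "T", "N")   # TRUE: the labels match the condition levels

# List the valid labels so the call can be corrected.
print(levels(conds))
```

If this is the cause, either pass the labels that `levels(conds)` reports, or build `conds` with "A" and "B" in the first place.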


                  • #10
                    Could some expert in the field comment on my finding that only DESeq and the recently published NOISeq allow DE testing with no replicates? All the other tools I looked at (edgeR, DEGSeq, baySeq, Cufflinks) need at least one of the conditions to be in duplicate. Is this correct?

                    Thanks in advance!


                    • #11
                      I've used cufflinks and edgeR without replicates. They work without problems in such situations.


                      • #12
                        Originally posted by pschwien
                        Could some expert in the field comment on my finding that only DESeq and the recently published NOISeq allow DE testing with no replicates? All the other tools I looked at (edgeR, DEGSeq, baySeq, Cufflinks) need at least one of the conditions to be in duplicate. Is this correct?

                        Thanks in advance!

                        The issue is not whether you can find some tool that will compute a mathematical solution and spit out a p-value. The real question is whether your biological interpretation of those numbers has any real significance or meaning in the absence of replicates. The growing consensus seems to be that it does not: such results are largely valueless because they have no statistical rigor.

                        In other words, you can have little to no confidence that your apparently significant results are, in fact, significant at all and not just the product of random events.

                        I can fully understand those situations where it is truly impossible to replicate. I've handled data from some human studies where tissue or samples were extraordinarily hard to come by, and we were lucky to get enough tissue for a single experiment per individual (we did not use NGS though, we went with Affy arrays so I could at least use probe level data to gain some statistical rigor in DE analysis - used SScore in R).

                        But other than the situation where it truly is not possible, I think replication should be considered an absolutely essential condition for an experiment. Especially once you factor in the issue of multiplicity in the huge number of tests being compared in a DE analysis, the absence of replicates makes it impossible to impose any statistically rigorous significance threshold.
                        Last edited by mbblack; 10-24-2011, 09:29 AM.
                        Michael Black, Ph.D.
                        ScitoVation LLC. RTP, N.C.
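
To make the multiplicity point concrete, here is a small base R illustration on simulated p-values. It is not output from any DE tool, and the number of tests is made up. Every p-value is drawn under the null, so any call at p < 0.05 is a false positive. The Benjamini-Hochberg adjustment from `p.adjust` stands in for a multiple-testing correction.

```r
# Illustration only: simulated p-values, not output from any DE tool.
set.seed(42)

n_tests <- 10000                 # hypothetical number of genes tested
p_raw   <- runif(n_tests)        # under the null, p-values are uniform on [0, 1]

# Benjamini-Hochberg adjustment controls the false discovery rate.
p_bh <- p.adjust(p_raw, method = "BH")

n_raw <- sum(p_raw < 0.05)       # "significant" calls without correction
n_bh  <- sum(p_bh  < 0.05)       # calls that survive the correction

cat("raw p < 0.05:", n_raw, "of", n_tests, "\n")
cat("BH  p < 0.05:", n_bh,  "of", n_tests, "\n")
```

The correction only helps when the p-values themselves are trustworthy, and they are only as good as the variance estimate behind them. That is why the lack of replicates undermines any threshold chosen afterwards.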


                        • #13
                          That's a good point. That design would give more reproducible conclusions. Here is a good article I read some time ago that discusses exactly that.


                          • #14
                            Have you tried GFOLD? It is a tool specifically designed for the no-replicate case.
