  • RNA-seq bio-replication with low correlation

    Hi!
    I sequenced two biological replicates of one condition on the HiSeq 2000 platform,
    but the Pearson correlation (R) of gene expression (raw counts) between the two biological replicates is only 0.93, i.e. an R² of only about 0.87.
    Can I use these two samples for differential expression analysis?
    Any suggestions for how to use them to call DE genes?
    Recommended reading would also be very helpful.

    Thanks all.

  • #2
    Raw counts are not normally distributed (they are heavily skewed), so use Spearman, not Pearson. And discard any genes with 0 counts. Actually, I'd probably discard genes with < 10 counts.

    Secondly, is this human data, or in other words are your biological replicates sampled from different individuals with a heterogeneous genetic background?
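The filter-then-correlate suggestion above can be sketched in plain Python as follows. This is only an illustration: the toy counts are made up, the helper names are hypothetical, and keeping a gene only when it has at least 10 reads in both replicates is one possible reading of "discard genes with < 10".

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def filter_low_counts(rep1, rep2, min_count=10):
    """Keep genes with at least min_count reads in both replicates (an assumption)."""
    kept = [(a, b) for a, b in zip(rep1, rep2) if a >= min_count and b >= min_count]
    return [a for a, _ in kept], [b for _, b in kept]

# Toy raw counts, one entry per gene (made up for illustration).
rep1 = [0, 3, 15, 40, 120, 900, 15000]
rep2 = [1, 0, 22, 35, 150, 700, 21000]

f1, f2 = filter_low_counts(rep1, rep2)
print("genes kept:", len(f1))
print("Pearson  on raw counts:", round(pearson(f1, f2), 3))
print("Spearman on raw counts:", round(spearman(f1, f2), 3))
```

On real data the Pearson value on raw counts tends to be driven by the few most highly expressed genes, which is part of why a rank-based measure was suggested here.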



    • #3
      Originally posted by mikep (#2):
      Thank you for your advice.
      I removed all genes with < 10 counts and calculated the Spearman correlation, but it is still only about 0.93.
      I also calculated the Spearman correlation with a gene cutoff of < 1 RPKM, and it didn't change.
      Now I wonder whether I can use this data to call DE genes, and how much it will affect the result.
      Or, if I do use the data, how can I minimize the impact of the difference?



      • #4
        The Pearson and Spearman correlation coefficients are not well suited to RNA-seq count data. What we want to know is whether expression values are the same between two samples (linearity => Pearson coefficient), not just whether they have an increasing or decreasing trend (Spearman coefficient). But Pearson’s r is generally ambiguous and highly dependent on sequencing depth and on the range of expression levels inherent to the sample (difference between lowest and highest bin count).

        I think it is difficult, from these coefficients, to determine if the samples are good replicates or not.

        I advise you to read this publication and to use the SERE coefficient, which is well suited to the comparison of RNA-seq samples:

        SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter.


        A score of 1 indicates faithful replication, and the higher the score, the more the samples differ. I use this coefficient to explore my data.
        Last edited by velt; 08-06-2014, 12:08 AM.
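For anyone who wants to try the idea without the original scripts, here is a minimal pure-Python sketch based on my reading of the published SERE definition. For each gene, the expected count in a sample is the gene's total across samples scaled by that sample's share of all reads. The squared deviations from that expectation are divided by the expected count (the Poisson variance) and summed, and SERE is the square root of that sum divided by (genes with at least one read) × (samples − 1). The function names, the simple Poisson sampler, and the simulated counts are illustrative assumptions, not the authors' code.

```python
import math
import random

def sere(samples):
    """SERE for a list of samples, each a list of raw counts per gene."""
    n_samples = len(samples)
    depths = [sum(s) for s in samples]   # total reads per sample
    total = sum(depths)
    chi2 = 0.0
    n_genes = 0
    for gene_counts in zip(*samples):
        gene_total = sum(gene_counts)
        if gene_total == 0:
            continue  # genes with no reads in any sample are skipped
        n_genes += 1
        for observed, depth in zip(gene_counts, depths):
            expected = gene_total * depth / total
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n_genes * (n_samples - 1)))

def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

rng = random.Random(0)
means = [rng.uniform(5, 50) for _ in range(2000)]   # simulated "true" expression

rep1 = [poisson(m, rng) for m in means]
rep2 = [poisson(m, rng) for m in means]             # same means, Poisson noise only
# A genuinely different sample: every gene is halved or doubled at random.
other = [poisson(m * rng.choice([0.5, 2.0]), rng) for m in means]

print("exact duplicate    :", sere([rep1, rep1]))   # exact duplicate -> 0.0
print("Poisson replicates :", round(sere([rep1, rep2]), 2))
print("different samples  :", round(sere([rep1, other]), 2))
```

In this simulation an exact duplicate scores 0, replicates that differ only by Poisson sampling noise land close to 1, and samples with real expression differences score well above 1.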



        • #5
          You didn't mention your sample source. If it is different people then 0.93 might be as good as it gets. I get around 0.95 on my data.

          Another option (for future use) is to use a spike-in like ERCC; you can then correlate counts independently of biological variability.

          As for DE, my advice is suck it and see.

          Finally Velt, nice call. Assimilating SERE into our pipeline in 3...2...1...
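The spike-in idea mentioned in this post could look roughly like the sketch below: restrict both replicates to the spike-in features and correlate only those. It assumes the count tables are dicts keyed by feature ID and that spike-in IDs start with `ERCC-`. The counts, the gene names, and the `log2(count + 1)` transform are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spike_in_correlation(counts1, counts2, prefix="ERCC-"):
    """Correlate log2(count + 1) over spike-in features shared by both samples."""
    ids = sorted(k for k in counts1 if k.startswith(prefix) and k in counts2)
    x = [math.log2(counts1[k] + 1) for k in ids]
    y = [math.log2(counts2[k] + 1) for k in ids]
    return len(ids), pearson(x, y)

# Toy count tables: four spike-ins plus two endogenous genes (all values made up).
rep1 = {"ERCC-00002": 12000, "ERCC-00003": 800, "ERCC-00004": 3500,
        "ERCC-00009": 40, "GENE_A": 150, "GENE_B": 2200}
rep2 = {"ERCC-00002": 11000, "ERCC-00003": 950, "ERCC-00004": 3000,
        "ERCC-00009": 55, "GENE_A": 600, "GENE_B": 900}

n_spikes, r = spike_in_correlation(rep1, rep2)
print("spike-ins used:", n_spikes)
print("spike-in Pearson r (log2 scale):", round(r, 3))
```

Because the spike-ins are added at known amounts, a low correlation on this subset would point to technical problems rather than biological variability.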



          • #6
            Originally posted by velt (#4):
            Hi velt,
            Thank you very much.
            I tried it on my data: the SERE score is 5.8,
            and for another pair of replicates it is about 3.3.
            Is this too high? Any suggestions?
            By the way, I think this standard is really strict.



            • #7
              Well, your single greatest source of variation when it comes to differential expression is biological variation amongst individuals in your population. So if these were two different individuals, then your observed correlations might not be far off, at least when looking only at raw read counts.

              Also, did you have equal or near equal read depth for each sample? If you had large differences in read depth across the two samples, then raw counts will also vary a great deal because of that.

              Honestly, I would not worry about such differences in raw counts between biological replicates. That sort of variability is the very reason you use biological replication, so you can compute a robust mean population response. Individuals will inherently vary, often a great deal, in raw expression estimates.

              How do your normalized read counts compare for these two samples? That is by far a more meaningful comparison than raw counts. Also, basing a comparison on an N of just 2 can be very misleading, as you have no idea how those two biological samples fall out in terms of the range of variation in expression for your population.
              Last edited by mbblack; 08-06-2014, 04:09 AM.
              Michael Black, Ph.D.
              ScitoVation LLC. RTP, N.C.
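To make the read-depth point concrete, here is a small sketch using counts per million (CPM) as the normalization. CPM is just one simple choice made for this example; DE packages such as DESeq and edgeR apply their own normalization methods. The counts are made up, and the second replicate is deliberately built as the same expression profile at three times the depth.

```python
def counts_per_million(counts):
    """Scale raw counts so that every sample sums to one million."""
    depth = sum(counts)
    return [c * 1_000_000 / depth for c in counts]

# Toy raw counts (made up): same profile, sequenced three times deeper in rep2.
rep1 = [5, 12, 30, 80, 200, 5000]
rep2 = [3 * c for c in rep1]

cpm1 = counts_per_million(rep1)
cpm2 = counts_per_million(rep2)

# Largest per-gene difference before and after normalization.
raw_gap = max(abs(a - b) for a, b in zip(rep1, rep2))
cpm_gap = max(abs(a - b) for a, b in zip(cpm1, cpm2))

print("largest raw-count difference:", raw_gap)
print("largest CPM difference      :", cpm_gap)
```

The raw counts differ a lot purely because of depth, while the depth-normalized values agree, which is why comparing normalized counts says more about replicate quality than comparing raw counts.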



              • #8
                SERE over log transform

                My understanding is that log(Poisson) [log(counts) in this case] approximates a normal distribution, thereby achieving linearity. Is there a benefit to using SERE over the Pearson correlation of log-transformed counts?
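For reference, the alternative raised in this question (Pearson on log-transformed counts) can be sketched as below. The `+ 1` pseudocount used to avoid log(0) is an assumption of this example, and the counts are made up. It does not answer whether SERE is preferable; it only shows how the two Pearson variants can differ on the same data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def log2_counts(counts, pseudocount=1):
    """log2(count + pseudocount); the pseudocount avoids log(0)."""
    return [math.log2(c + pseudocount) for c in counts]

# Toy raw counts (made up): one very highly expressed gene dominates.
rep1 = [5, 12, 30, 80, 200, 50000]
rep2 = [20, 6, 90, 40, 450, 52000]

raw_r = pearson(rep1, rep2)
log_r = pearson(log2_counts(rep1), log2_counts(rep2))

print("Pearson on raw counts :", round(raw_r, 4))
print("Pearson on log2 counts:", round(log_r, 4))
```

On the raw scale the single highly expressed gene dominates and the correlation looks nearly perfect. On the log scale the disagreement among the low-count genes becomes visible.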
