  • danwiththeplan
    Member
    • Sep 2011
    • 72

    Statistical treatment of biological reps from pooled samples

    Hi, I have a theoretical question. I know that it's important to do biological reps when doing differential expression RNA-seq experiments (3 seems to be a minimum). I very much see the point when it's from 3 different frogs/trees/chickens/humans. However I'm encountering situations where people are doing 3 biological reps which are derived from pooled samples (e.g. a pool of lots of individuals, where for practical reasons you can't get enough RNA from one individual).

    In this case, what is the advantage of doing biological reps (for example, 3 biological reps with 3 different library preps/multiplex tags)? And how does this differ from doing 3x as much sequencing on one sample and later randomly splitting the reads (bioinformatically) into 3 pools?

    To me it's a similar situation:

    Situation 1: you have a pool of samples, split it into 3, extract RNA and do some sequencing, as opposed to:

    Situation 2: you have a pool of samples, extract RNA, do some sequencing, and split the reads into 3.

    What's the difference biologically/statistically? Does it really make sense to have 3x the library-prep/multiplexing costs for situation 1?

    The advantage for the latter situation would be a (considerable) saving in library prep / multiplexing costs (I'm talking about Illumina PE sequencing).

    Thoughts anyone?
  • chadn737
    Senior Member
    • Jan 2009
    • 392

    #2
    If you pool all samples and split the reads randomly after sequencing, then your variation is not due to biological differences but to your random sampling. It's going to behave differently.
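A minimal sketch of this point, assuming a single gene and that the in-silico split assigns each read independently and uniformly to one of three subsets (the helper names and the read count are hypothetical). Splitting reads only reproduces sampling noise: each subset receives a Binomial(N, 1/3) share of the gene's N reads, so the spread between subsets is N·(1/3)·(2/3) no matter how variable the organisms themselves are.

```python
import random
import statistics


def split_reads(n_reads, n_parts=3, rng=None):
    """Hypothetical helper: randomly assign each read of one gene to one of n_parts subsets.

    Returns the per-subset read counts (they always sum to n_reads).
    """
    rng = rng or random.Random()
    counts = [0] * n_parts
    for _ in range(n_reads):
        counts[rng.randrange(n_parts)] += 1
    return counts


def splitting_variance(n_reads, n_parts=3, trials=1000, seed=1):
    """Empirical variance of one subset's count over many repeated random splits."""
    rng = random.Random(seed)
    first_subset = [split_reads(n_reads, n_parts, rng)[0] for _ in range(trials)]
    return statistics.variance(first_subset)


n = 600  # hypothetical number of reads for one gene in the combined library
p = 1 / 3
print("binomial variance N*p*(1-p):", round(n * p * (1 - p), 1))
print("simulated variance:", round(splitting_variance(n), 1))
```

Nothing in this model depends on differences between individuals, which is why the split "replicates" say nothing about biological variance.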

    Pooling samples into biological reps so that you can get enough RNA reduces the biological variance by a factor equal to the pool size (the number of individuals averaged in each pool). However, since you still have independent biological reps, you can still estimate the biological variance.
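The "reduced by the pool size" claim can be checked with a quick simulation, assuming each individual's expression level is an independent normal draw with standard deviation sigma and that a pool's value is the plain average of its k members (all numbers hypothetical). The variance between independent pools comes out close to sigma²/k, and it can still be estimated as long as there are at least two pools.

```python
import random
import statistics


def pool_mean(k, mu, sigma, rng):
    """Hypothetical model: a pool's expression is the average of k independent individuals."""
    return statistics.fmean(rng.gauss(mu, sigma) for _ in range(k))


def variance_of_pools(k, n_pools=5000, mu=100.0, sigma=20.0, seed=2):
    """Empirical variance between independent pools, each made of k individuals."""
    rng = random.Random(seed)
    return statistics.variance([pool_mean(k, mu, sigma, rng) for _ in range(n_pools)])


sigma = 20.0
for k in (1, 5, 25):
    print(f"pool size {k:>2}: simulated {variance_of_pools(k):7.2f}, sigma^2/k = {sigma ** 2 / k:7.2f}")
```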


    • danwiththeplan
      Member
      • Sep 2011
      • 72

      #3
      Thanks for your answer. I understand the difference between variation caused by random sampling (bioinformatically) and biological variation, but I'm still not sure I see that it would make much difference in the situation where your biological reps are basically random samples from a bigger pool (for example, bacteria from a fermenter), assuming that your sampling method is random (i.e. you've shaken the fermenter appropriately and you're not sampling from a different physical area). I am guessing that the more individual biological units (i.e. bacteria, zebrafish, whatever) in your pool, the less difference it makes.

      Of course you could simply (in the example above) grow 3 different pools, but in this case, are you looking at biological variation or variation caused by lab equipment?
      Last edited by danwiththeplan; 09-30-2013, 08:01 PM.


      • Jeremy
        Senior Member
        • Nov 2009
        • 190

        #4
        The situations you are describing are technical replicates, not biological replicates. Essentially you have no biological replicates in those examples. To be biological replicates, they would need to be pools of different individuals,
        e.g. pool 1 = samples A, B and C;
        pool 2 = samples D, E and F, etc.

        A biological replicate allows you to determine the variance in gene expression between different samples. The pooled approach is better than just a single control vs a single treatment, but you still have no idea what the variance is.
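Jeremy's definition can be made concrete with a small sketch (the function names and expression values are hypothetical): individuals are split into disjoint pools such as [A, B, C] and [D, E, F], and the between-pool variance is computed from the pool averages. With a single pool, as when one pool is split into three, there is nothing to estimate that variance from, so the sketch refuses.

```python
import statistics


def make_pools(individuals, pool_size):
    """Split a list of individuals into disjoint pools, e.g. [A, B, C], [D, E, F]."""
    if len(individuals) % pool_size:
        raise ValueError("individuals must divide evenly into pools")
    return [individuals[i:i + pool_size] for i in range(0, len(individuals), pool_size)]


def between_pool_variance(expression, pool_size):
    """Sample variance of pool-average expression across disjoint pools.

    `expression` maps individual -> expression level (hypothetical values).
    At least two independent pools are required to estimate a variance.
    """
    pools = make_pools(sorted(expression), pool_size)
    if len(pools) < 2:
        raise ValueError("need at least two independent pools to estimate variance")
    means = [statistics.fmean(expression[name] for name in pool) for pool in pools]
    return statistics.variance(means)


# Hypothetical expression values for individuals A-F.
expr = {"A": 10.0, "B": 12.0, "C": 11.0, "D": 15.0, "E": 14.0, "F": 13.0}
print(make_pools(sorted(expr), 3))     # [['A', 'B', 'C'], ['D', 'E', 'F']]
print(between_pool_variance(expr, 3))  # 4.5
```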


        • danwiththeplan
          Member
          • Sep 2011
          • 72

          #5
          Yes, I've had some internal discussions about this and I agree: I was using a not-very-correct definition of a biological replicate. So both the situations I describe above are really technical reps, since they are derived from the same pool.

          However, where 2 pooled samples are derived from different individuals, as in your example above, I think that if each pool contains a large number of individuals (even if they are different individuals between pools), any biological variation will be washed out by the number of individual biological units, and the more units, the more washed out.

          So, theoretically speaking, if you have two pools of 10,000 bacteria each, then by the strict definition they are biological reps because each pool is derived from different individuals. Practically speaking, though, any variance you see is much more likely to be "technical", i.e. derived from your bacteria being grown closer to the window, next to the coffee machine, at a different time, or in a different tube.

          Contrast this with the situation where you have two pools of 3 zebrafish embryos. Again both pools are "biological reps" but any variance is much, much more likely to be derived from true biological variation because of the small number of individuals in the pool. I guess that's what I was trying to say.
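This argument corresponds to a two-level variance model. As a hedged sketch, assume each pool shares a random environmental shift (window, coffee machine, tube) with standard deviation env_sd, and each individual varies independently with standard deviation indiv_sd (names and numbers are hypothetical). The variance between pools is then env_sd² + indiv_sd²/n: with 10,000 bacteria the individual term all but vanishes, while with 3 embryos it dominates.

```python
import math
import random
import statistics


def expected_variance(n_individuals, env_sd, indiv_sd):
    """Model prediction for the variance between pools: env_sd^2 + indiv_sd^2 / n."""
    return env_sd ** 2 + indiv_sd ** 2 / n_individuals


def simulate_pools(n_individuals, env_sd, indiv_sd, n_pools=4000, mu=100.0, seed=3):
    """Variance between pools under the hypothetical two-level model.

    Each pool gets one environmental shift ~ N(0, env_sd) shared by all its members;
    individuals then vary independently ~ N(0, indiv_sd).
    """
    rng = random.Random(seed)
    pool_values = []
    for _ in range(n_pools):
        env_shift = rng.gauss(0.0, env_sd)
        # The mean of n independent N(0, indiv_sd) draws is N(0, indiv_sd / sqrt(n)),
        # so it is drawn directly instead of simulating every individual.
        indiv_mean = rng.gauss(0.0, indiv_sd / math.sqrt(n_individuals))
        pool_values.append(mu + env_shift + indiv_mean)
    return statistics.variance(pool_values)


for n in (3, 10_000):  # 3 zebrafish embryos vs 10,000 bacteria per pool
    total = expected_variance(n, env_sd=5.0, indiv_sd=20.0)
    share = (20.0 ** 2 / n) / total
    print(f"n={n}: expected {total:.2f}, simulated {simulate_pools(n, 5.0, 20.0):.2f}, "
          f"share from individuals {share:.1%}")
```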


          • Jeremy
            Senior Member
            • Nov 2009
            • 190

            #6
            Single-cell organisms are a tricky case: each cell is capable of determining its own expression levels, but they often live in communities that 'talk' to each other and so have some degree of co-regulation. So are a few separate flasks of the same bacterial strain to be considered technical replicates, biological replicates, or a pool of samples? I would say they incorporate a part of each of those, depending on the species in question. How does that affect the statistics? I don't know.
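One hedged way to think about the co-regulation question is to assume, purely for illustration, that cells in the same flask have equal pairwise correlation rho in expression. The variance of the flask average is then sigma²·(1 + (n − 1)·rho)/n, which levels off at sigma²·rho instead of shrinking to zero as n grows. Under that assumption, co-regulated cells behave partly like one unit, and separate flasks carry more information than extra cells in the same flask. The function names and parameter values below are hypothetical.

```python
import math
import random
import statistics


def flask_mean_variance(n_cells, sigma, rho):
    """Variance of the average of n equicorrelated cells: sigma^2 * (1 + (n - 1) * rho) / n."""
    return sigma ** 2 * (1 + (n_cells - 1) * rho) / n_cells


def simulate_flask_means(n_cells, sigma, rho, n_flasks=3000, seed=4):
    """Simulate flask averages where cells share a common 'signalling' component.

    Each cell = sigma * (sqrt(rho) * shared + sqrt(1 - rho) * own), which gives
    every cell variance sigma^2 and every pair of cells correlation rho.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_flasks):
        shared = rng.gauss(0.0, 1.0)
        cells = [sigma * (math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0))
                 for _ in range(n_cells)]
        means.append(statistics.fmean(cells))
    return statistics.variance(means)


for n in (1, 10, 100):
    print(n, round(flask_mean_variance(n, 1.0, 0.2), 3), round(simulate_flask_means(n, 1.0, 0.2), 3))
```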

            Take zebrafish, or something small like lice (where you may need to pool to get enough RNA): if you had 3 control pools of 5 individuals each and 3 treatment pools of 5 individuals each, then you could get some idea of the biological variation in gene expression, even though it would be the variation between averages.
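"Variation of different averages" can still be scaled back to the individual level under simple assumptions. This sketch assumes individuals are independent and contribute equally to each pool, so Var(pool mean) = sigma²/k and sigma² ≈ k·Var(pool means); the function name and the pool values are hypothetical.

```python
import statistics


def individual_variance_from_pools(pool_means, pool_size):
    """Estimate per-individual variance from pool averages.

    Assumes independent, equally weighted individuals in each pool, so
    Var(pool mean) = sigma^2 / pool_size and sigma^2 ~= pool_size * Var(pool means).
    """
    if len(pool_means) < 2:
        raise ValueError("need at least two pools per group")
    return pool_size * statistics.variance(pool_means)


# Hypothetical pool averages for 3 control and 3 treatment pools of 5 individuals each.
control = [50.0, 54.0, 52.0]
treatment = [60.0, 63.0, 66.0]
for name, pools in (("control", control), ("treatment", treatment)):
    print(name, statistics.fmean(pools), individual_variance_from_pools(pools, 5))
# control 52.0 20.0
# treatment 63.0 45.0
```

In practice, unequal RNA contributions between individuals and library-prep noise both add to the between-pool variance, so this is only a rough guide.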


            • danwiththeplan
              Member
              • Sep 2011
              • 72

              #7
              Agree, I think the distinction between technical and biological reps is quite fuzzy really, and depends on the biology quite a lot. Good point about bacterial communities.


              • Skiaphrene
                Member
                • Aug 2013
                • 18

                #8
                Hi danwiththeplan,

                I'm not very experienced with RNA-seq stats, but I have a few comments/questions that I hope will advance the discussion...

                My understanding is that when you're NOT doing single-cell RNA-seq, you're already acquiring reads from a "pool" of cells that may or may not be very similar to each other; the smallest "statistical individual" would be a single cell (and that assumes transcription is homogeneous within the cell...). So for me, there is already some pooling in the process! Not to mention the absence of fine time resolution, hence the tag "steady-state" RNA-seq...

                Regardless, any measurement (such as a gene expression level) made is going to be the result of a compendium of variability factors, and the difficulty is identifying them, deciding which ones you can estimate, which ones you care for, which ones you want to ignore... Which is, I think, why the definition of a "biological replicate" is so fuzzy: it corresponds to whatever replicate the experimenter considers as capable of capturing "biologically-relevant" variability.

                For example, if you're RNA-sequencing a load of cells from 5 human patients in 2 groups, with 100 cells per patient and 1 patient per sequenced sample, then cell-to-cell variability is disregarded, but "biological" variability (variability per patient) is taken into account, and thus the experiment has 5 biological replicates.
                However, if you're RNA-sequencing from pools of immune cells from several mice from 2 groups, pooling 5 mice to create one sample to sequence, and creating 3 samples per group, then mouse variability is disregarded as well, but you DO get the variability of a "cohort-of-5-mice" which, as chadn737 pointed out, may be related back to the variability in the population of mice. And the experimenter might be perfectly happy with that (even if it makes the biostatistician cringe).

                How this can then be taken into account in DGE analysis, at this stage, I have no idea...

                I think I'm going to try and do some very simple simulations to work out how to best setup pooling: more individuals per pool? or more pools?
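Such a simulation could start from something like this sketch. It assumes each sequenced pool's measurement is the average of its individuals plus independent library-level noise (lib_sd), with a fixed number of individuals per group; all names and values are hypothetical, and it ignores count noise, normalization and the actual DGE test. Under this model the variance of a group mean is (indiv_sd²/k + lib_sd²)/m for m pools of k individuals. With m·k fixed, the individual term depends only on the total number of individuals, so more pools mainly shrink the library term and give more degrees of freedom (m − 1) for estimating the variance, at the cost of more library preps.

```python
import random
import statistics


def group_mean_var(n_pools, pool_size, indiv_sd, lib_sd):
    """Variance of a group's average over n_pools libraries of pool_size individuals each.

    Hypothetical model: pool measurement = mean of pool_size individuals + library noise.
    """
    return (indiv_sd ** 2 / pool_size + lib_sd ** 2) / n_pools


def simulate_group_mean_var(n_pools, pool_size, indiv_sd, lib_sd, n_experiments=4000, seed=5):
    """Monte Carlo check of group_mean_var by simulating whole experiments."""
    rng = random.Random(seed)
    group_means = []
    for _ in range(n_experiments):
        pools = []
        for _ in range(n_pools):
            individuals = [rng.gauss(0.0, indiv_sd) for _ in range(pool_size)]
            pools.append(statistics.fmean(individuals) + rng.gauss(0.0, lib_sd))
        group_means.append(statistics.fmean(pools))
    return statistics.variance(group_means)


# 30 hypothetical individuals per group, split into pools in different ways.
for n_pools, pool_size in ((2, 15), (3, 10), (6, 5), (10, 3)):
    v = group_mean_var(n_pools, pool_size, indiv_sd=20.0, lib_sd=5.0)
    print(f"{n_pools:>2} pools x {pool_size:>2}: var(group mean) = {v:6.2f}, df = {n_pools - 1}")
```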

                Am I making sense??? Any thoughts?

                -- Alex
