  • Statistical treatment of biological reps from pooled samples

    Hi, I have a theoretical question. I know that it's important to do biological reps when doing differential expression RNA-seq experiments (3 seems to be a minimum). I very much see the point when it's from 3 different frogs/trees/chickens/humans. However I'm encountering situations where people are doing 3 biological reps which are derived from pooled samples (e.g. a pool of lots of individuals, where for practical reasons you can't get enough RNA from one individual).

    In this case, what would be the advantage of doing biological reps (for example, doing 3 biological reps, with 3 different library preps/multiplex tags) and what is the difference between this situation and the situation where you just do 3x as much sequencing on one sample, and later randomly split the reads (bioinformatically) into 3 pools?

    To me it's a similar situation:

    Situation 1: you have a pool of samples, split it into 3, extract RNA and do some sequencing, as opposed to..

    Situation 2: you have a pool of samples, extract RNA and do some sequencing, and split it into 3.

    What's the difference biologically/statistically? Does it really make sense to have 3x the library-prep/multiplexing costs for situation 1?

    The advantage for the latter situation would be a (considerable) saving in library prep / multiplexing costs (I'm talking about Illumina PE sequencing).

    Thoughts anyone?

  • #2
    If you pool all the samples and split the reads randomly after sequencing, then your variation is not due to biological differences but due to your random sampling. It's going to behave differently.

    Pooling samples into biological reps so that you can get enough RNA reduces the biological variance between reps by roughly a factor equal to the pool size (assuming each individual contributes about equally). However, since you still have independent biological reps, you can still estimate the biological variance.
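
The contrast in this post can be illustrated with a small simulation. This is only a sketch under assumptions that are not from the thread: a single gene, individuals whose expression varies around `mean_expr` with coefficient of variation `bio_cv`, equal RNA contribution per individual, and a normal approximation standing in for Poisson counting noise. All function names and parameter values are hypothetical.

```python
import random
import statistics

def poisson_approx(mean, rng):
    """Normal approximation to a Poisson draw (assumption: mean well above 10)."""
    return max(0, round(rng.gauss(mean, mean ** 0.5)))

def pool_level(pool_size, mean_expr, bio_cv, rng):
    """Expected count for one pool: the average of its individuals' levels."""
    levels = [max(0.0, rng.gauss(mean_expr, bio_cv * mean_expr))
              for _ in range(pool_size)]
    return statistics.fmean(levels)

def split_reads(total_count, n_parts, rng):
    """Assign each read of the gene to one of n_parts subsets at random."""
    parts = [0] * n_parts
    for _ in range(total_count):
        parts[rng.randrange(n_parts)] += 1
    return parts

def simulate(n_trials=500, pool_size=5, mean_expr=200, bio_cv=0.3, seed=1):
    """Return (mean variance among split reads, mean variance among pooled reps)."""
    rng = random.Random(seed)
    var_split, var_reps = [], []
    for _ in range(n_trials):
        # One pool, one library at 3x depth, reads split in silico into 3.
        one_pool = pool_level(pool_size, mean_expr, bio_cv, rng)
        total = poisson_approx(3 * one_pool, rng)
        var_split.append(statistics.variance(split_reads(total, 3, rng)))
        # Three independent pools, each made from different individuals.
        counts = [poisson_approx(pool_level(pool_size, mean_expr, bio_cv, rng), rng)
                  for _ in range(3)]
        var_reps.append(statistics.variance(counts))
    return statistics.fmean(var_split), statistics.fmean(var_reps)

if __name__ == "__main__":
    v_split, v_reps = simulate()
    print("variance among split reads:", round(v_split, 1))
    print("variance among pooled reps:", round(v_reps, 1))
```

Under these assumptions the variance among the split reads should stay close to pure counting noise, while the variance among the independent pools also carries the biological variance divided by the pool size, so it comes out noticeably larger.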

    • #3
      Thanks for your answer. I understand the difference between variation caused by random sampling (bioinformatically) and biological variation, but I'm still not sure I see that it would make much difference in the situation where your biological reps are basically random samples from a bigger pool (for example, bacteria from a fermenter), assuming that your sampling method is random (i.e. you've shaken the fermenter appropriately and you're not sampling from a different physical area). I am guessing that the more individual biological units (i.e. bacteria, zebrafish, whatever) in your pool, the less difference it makes.

      Of course you could simply (in the example above) grow 3 different pools, but in this case, are you looking at biological variation or variation caused by lab equipment?
      Last edited by danwiththeplan; 09-30-2013, 08:01 PM.

      • #4
        The situations you are describing are technical replicates, not biological replicates. Essentially you have no biological replicates in those examples. To be biological replicates, they would need to be pools of different individuals,
        e.g. pool 1 = sample A, B and C
        pool 2 = sample D, E and F etc.

        A biological replicate allows you to determine the variance in gene expression between different samples. The pooled approach is better than just a single control vs a single treatment, but you still have no idea what the variance is.

        • #5
          Yes I've had some internal discussions about this and I agree, it's a case of me using a not-very-correct definition of what a biological replicate is. So, I agree that both the situations I describe above are really technical reps since they are derived from the same pool.

          However, in the case where 2 different pooled samples are derived from different individuals, as per your example above, then for me, if the pools are derived from a large number of individuals (even if they are different individuals between pools), any biological variation will be washed out by the number of individual biological units, and the more units, the more washed out.

          So, theoretically speaking, if you have two pools of 10,000 bacteria each, then according to the strict definition they are biological reps, because each pool is derived from different individuals. Practically speaking, though, any variance you see is much more likely to be "technical", i.e. derived from your bacteria being grown closer to the window/next to the coffee machine/at a different time/in a different tube.

          Contrast this with the situation where you have two pools of 3 zebrafish embryos. Again both pools are "biological reps" but any variance is much, much more likely to be derived from true biological variation because of the small number of individuals in the pool. I guess that's what I was trying to say.
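
The "washing out" argued in this post can be written as a formula. This is a sketch under assumptions that are not from the thread: each pool j averages n independent individuals that contribute equal amounts of RNA, each individual has biological variance \sigma_b^2, and everything tied to the tube, bench position, timing or library prep is lumped into a per-pool variance \sigma_t^2. All symbols are illustrative.

```latex
\[
y_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij} + \varepsilon_j,
\qquad
\operatorname{Var}(x_{ij}) = \sigma_b^2,
\qquad
\operatorname{Var}(\varepsilon_j) = \sigma_t^2
\]
\[
\operatorname{Var}(y_j) = \frac{\sigma_b^2}{n} + \sigma_t^2
\]
```

With n = 10,000 the biological term is one ten-thousandth of \sigma_b^2, so nearly all of the variance between pools is \sigma_t^2; with n = 3 it is still a third of \sigma_b^2. The independence assumption is the weak point: if individuals within a pool are positively correlated, the biological term shrinks more slowly than 1/n.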

          • #6
            Single-celled organisms are a tricky case: each cell is capable of determining its own expression levels, but they often live in communities that 'talk' to each other and so have some degree of co-regulation. So are a few separate flasks of the same bacterial strain to be considered technical replicates, biological replicates, or a pool of samples? I would say it incorporates a part of each of those, depending on the species in question. How does that affect the statistics? I don't know.

            Take zebrafish, for example, or something small, let's say lice (where you may need to pool to get enough RNA): if you had 3 control pools of 5 samples each and 3 treatment pools of 5 samples each, then you could get some idea of the biological variation in gene expression, even though it would be the variation between different averages.

            • #7
              Agree, I think the distinction between technical and biological reps is quite fuzzy really, and depends on the biology quite a lot. Good point about bacterial communities.

              • #8
                Hi danwiththeplan,

                I'm not very experienced with RNA-seq stats, but I have a few comments/questions that I hope will advance the discussion...

                My understanding is that when you're NOT doing single-cell RNA-seq, you're already acquiring reads from a "pool" of cells that may or may not be very similar to each other; the smallest "statistical individual" would be a single cell (and that is assuming transcription is homogeneous in the cell...). So for me, there is already some pooling in the process! Not to mention the absence of fine time resolution, hence the tag "steady state" RNA-seq...

                Regardless, any measurement (such as a gene expression level) made is going to be the result of a compendium of variability factors, and the difficulty is identifying them, deciding which ones you can estimate, which ones you care for, which ones you want to ignore... Which is, I think, why the definition of a "biological replicate" is so fuzzy: it corresponds to whatever replicate the experimenter considers as capable of capturing "biologically-relevant" variability.

                For example, if you're RNA-sequencing from a load of cells from 5 human patients from 2 groups, 100 cells per patient and 1 patient per sample to sequence, then cell-to-cell variability is disregarded, but "biological" variability - variability per patient - is taken into account, and thus the experiment has 5 biological replicates.
                However, if you're RNA-sequencing from pools of immune cells from several mice from 2 groups, pooling 5 mice to create one sample to sequence, and creating 3 samples per group, then mouse variability is disregarded as well, but you DO get the variability of a "cohort-of-5-mice" which, as chadn737 pointed out, may be related back to the variability in the population of mice. And the experimenter might be perfectly happy with that (even if it makes the biostatistician cringe).

                How this can then be taken into account in DGE analysis, at this stage, I have no idea...

                I think I'm going to try and do some very simple simulations to work out how to best setup pooling: more individuals per pool? or more pools?

                Am I making sense??? Any thoughts?

                -- Alex
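
A very simple simulation of the kind proposed in this post might look like the following. It is only a sketch under assumptions that are not from the thread: one gene measured on a roughly normal (for example log) scale, a fixed budget of 30 individuals per group, per-individual biological noise `bio_sd`, per-library technical noise `tech_sd`, and a true group difference `effect`. All names and values are hypothetical, and the test cutoff is calibrated empirically from simulated null data rather than taken from any DGE package.

```python
import random
import statistics

def pool_measurement(pool_size, group_mean, bio_sd, tech_sd, rng):
    """One library: the average of pool_size individuals plus library-level noise."""
    individuals = [rng.gauss(group_mean, bio_sd) for _ in range(pool_size)]
    return statistics.fmean(individuals) + rng.gauss(0.0, tech_sd)

def t_statistic(a, b):
    """Two-sample t statistic for equal group sizes, using a pooled variance."""
    n = len(a)
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    return (statistics.fmean(b) - statistics.fmean(a)) / (2 * pooled_var / n) ** 0.5

def simulate_abs_t(n_pools, pool_size, effect, bio_sd, tech_sd, n_sims, rng):
    """Absolute t statistics from repeated control-vs-treatment experiments."""
    results = []
    for _ in range(n_sims):
        control = [pool_measurement(pool_size, 0.0, bio_sd, tech_sd, rng)
                   for _ in range(n_pools)]
        treated = [pool_measurement(pool_size, effect, bio_sd, tech_sd, rng)
                   for _ in range(n_pools)]
        results.append(abs(t_statistic(control, treated)))
    return results

def power(n_pools, pool_size, effect=1.0, bio_sd=1.0, tech_sd=0.3,
          n_sims=2000, seed=0):
    """Fraction of simulated experiments that detect the effect at about 5%."""
    rng = random.Random(seed)
    null = sorted(simulate_abs_t(n_pools, pool_size, 0.0, bio_sd, tech_sd, n_sims, rng))
    critical = null[int(0.95 * n_sims)]  # empirical two-sided 5% cutoff
    alt = simulate_abs_t(n_pools, pool_size, effect, bio_sd, tech_sd, n_sims, rng)
    return sum(t > critical for t in alt) / n_sims

if __name__ == "__main__":
    # Same 30 individuals per group, divided up differently.
    for n_pools, pool_size in [(3, 10), (6, 5), (10, 3)]:
        print(n_pools, "pools x", pool_size, "individuals: power ~",
              power(n_pools, pool_size))
```

For a fixed number of individuals, this model favours more, smaller pools: each extra pool averages down the per-library noise and adds degrees of freedom for the variance estimate. What the sketch leaves out is the cost side, since every additional pool is another library prep.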
