**chadn737** · 09-30-2013, 07:51 PM

If you pool all samples and then split the reads randomly after sequencing, your variation is not due to biological differences but to your random sampling. It's going to behave differently from biological variation.
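
To illustrate the point above, here is a minimal Python sketch (function names are hypothetical, counts are made up). It takes the reads of one gene from a single sequenced pool and assigns each read to one of k pseudo-replicates at random. The count landing in a pseudo-replicate is then binomial, so its variance is about (1 - 1/k) times its mean, i.e. slightly *below* the mean. Counts from genuine biological replicates in RNA-seq are usually overdispersed (variance above the mean), which is why count models such as the negative binomial used by edgeR and DESeq2 include an extra dispersion term.

```python
import random
import statistics


def split_reads(gene_count: int, n_reps: int, rng: random.Random) -> list[int]:
    """Assign each read of one gene to one of n_reps pseudo-replicates uniformly at random."""
    counts = [0] * n_reps
    for _ in range(gene_count):
        counts[rng.randrange(n_reps)] += 1
    return counts


def variance_to_mean(gene_count: int, n_reps: int, trials: int = 2000, seed: int = 1) -> float:
    """Variance/mean ratio of the count in pseudo-replicate 0 over many random splits."""
    rng = random.Random(seed)
    first = [split_reads(gene_count, n_reps, rng)[0] for _ in range(trials)]
    return statistics.variance(first) / statistics.mean(first)


if __name__ == "__main__":
    # Binomial theory predicts 1 - 1/3, roughly 0.67: no biological overdispersion at all.
    print(round(variance_to_mean(gene_count=300, n_reps=3), 2))
```

The design choice here is deliberate: only read-level sampling noise is simulated, so whatever spread you see is purely the "random sampling" variance described in the post.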

Pooling samples into biological reps so that you can get enough RNA has the effect of dividing the biological variance by the size of the pool. However, since you still have independent biological reps, you can still estimate the biological variance.
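
To make the pooling effect concrete, here is a sketch under simplifying assumptions that are not stated in the post: each pool mixes RNA from $m$ independent individuals in equal amounts, the pooled measurement behaves like the average of their expression levels $X_{ij}$, individual expression has biological variance $\sigma_b^2$, and each sequenced library adds independent technical noise $\varepsilon_i$ with variance $\sigma_t^2$:

$$
Y_i = \frac{1}{m}\sum_{j=1}^{m} X_{ij} + \varepsilon_i,
\qquad
\operatorname{Var}(Y_i) = \frac{\sigma_b^2}{m} + \sigma_t^2 .
$$

With $n \ge 2$ independent pools per condition, the sample variance of the $Y_i$ estimates $\sigma_b^2/m + \sigma_t^2$, which is why pooled reps still let you estimate a variance; multiplying that estimate by $m$ approximates $\sigma_b^2$ only if $\sigma_t^2$ is negligible.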

**danwiththeplan** · 09-30-2013, 07:57 PM

Thanks for your answer. I understand the difference between variation caused by random sampling (bioinformatically) and biological variation, but I'm still not sure I see that it would make much difference in the situation where your biological reps are basically random samples from a bigger pool (for example, bacteria from a fermenter), assuming that your sampling method is random (i.e. you've shaken the fermenter appropriately and you're not sampling from a different physical area). I am guessing that the more individual biological units (i.e. bacteria, zebrafish, whatever) in your pool, the less difference it makes.

Of course you could simply (in the example above) grow 3 different pools, but in this case, are you looking at biological variation or variation caused by lab equipment?

**Jeremy** · 10-02-2013, 05:45 PM

The situations you are describing are technical replicates, not biological replicates. Essentially, you have no biological replicates in those examples. To be biological replicates, they would need to be pools of different individuals, e.g.
pool 1 = samples A, B and C
pool 2 = samples D, E and F, etc.

A biological replicate allows you to determine the variance in gene expression between different samples. The pooled approach is better than just a single control vs a single treatment, but you still have no idea what the variance is.
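
A trivial Python sketch of the last point (the expression values are made up and the function name is hypothetical): with one sample per condition there are zero degrees of freedom, so a within-condition variance simply cannot be computed, whereas three independent replicates give an estimate.

```python
import statistics
from typing import Optional


def within_group_variance(values: list[float]) -> Optional[float]:
    """Sample variance across replicates of one condition, or None if it cannot be estimated."""
    if len(values) < 2:
        return None  # a single sample (or a single pool) leaves zero degrees of freedom
    return statistics.variance(values)


if __name__ == "__main__":
    single_control = [10.2]                 # one control library: a fold change, but no variance
    replicate_controls = [10.2, 11.0, 9.6]  # three independent biological reps
    print(within_group_variance(single_control))      # None
    print(within_group_variance(replicate_controls))  # about 0.493
```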

**danwiththeplan** · 10-02-2013, 06:08 PM

Yes, I've had some internal discussions about this and I agree: it's a case of me using a not-very-correct definition of what a biological replicate is. So I agree that both situations I described above are really technical reps, since they are derived from the same pool.

However, in the case where two pooled samples are derived from different individuals, as in your example above, it seems to me that if each pool contains a large number of individuals (even if they are different individuals between pools), any biological variation will be washed out by the number of individual biological units, and the more units, the more washed out.

So, theoretically speaking, if you have two pools of 10,000 bacteria each, then according to the strict definition they are biological reps, because each pool is derived from different individuals. Practically speaking, though, any variance you see is much more likely to be "technical", i.e. derived from your bacteria being grown closer to the window/next to the coffee machine/at a different time/in a different tube.

Contrast this with the situation where you have two pools of 3 zebrafish embryos. Again both pools are "biological reps" but any variance is much, much more likely to be derived from true biological variation because of the small number of individuals in the pool. I guess that's what I was trying to say.
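
A rough way to put numbers on the 10,000-bacteria versus 3-embryo contrast is the sketch below. It assumes that individuals in a pool vary independently (which, as noted later in the thread, may not hold for bacteria that co-regulate), that every individual contributes equally, and that technical noise does not depend on pool size; here the technical variance lumps together library noise and flask-level effects such as position in the lab. Under those assumptions the between-pool variance is sigma_b2 / m + sigma_t2, and the share due to biology shrinks as the pool size m grows. The function name and variance values are hypothetical.

```python
def biological_share(sigma_b2: float, sigma_t2: float, pool_size: int) -> float:
    """Fraction of between-pool variance coming from biology, assuming independent
    individuals, equal RNA contributions and technical noise added per library/flask."""
    bio = sigma_b2 / pool_size  # averaging m independent individuals divides their variance by m
    return bio / (bio + sigma_t2)


if __name__ == "__main__":
    # Hypothetical: technical variance is 10% of the per-individual biological variance.
    for m in (3, 10_000):
        print(m, round(biological_share(1.0, 0.1, m), 4))
```

With these made-up numbers, pools of 3 leave roughly three quarters of the between-pool variance biological, while pools of 10,000 leave about 0.1%, in line with the intuition that large pools mostly expose technical and environmental differences.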

**Jeremy** · 10-02-2013, 07:00 PM

Single-cell organisms are a tricky case: each cell is capable of determining its own expression levels, but they often live in communities that 'talk' to each other and so have some degree of co-regulation. So should a few separate flasks of the same bacterial strain be considered technical replicates, biological replicates or a pool of samples? I would say it incorporates a part of each of those, depending on the species in question. How does that affect the statistics? I don't know.

Take zebrafish, for example, or something small like lice (where you may need to pool to get enough RNA): if you had 3 pools of control, each consisting of 5 samples, and 3 pools of treatment, each consisting of 5 samples, then you could get some idea of the biological variation in gene expression, even though it would be the variation of different averages.
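
A minimal sketch of that 3 x 5 design; the pool-level values are invented for illustration and the names are hypothetical. Each pool is treated as one replicate, so the variance really is a variance of averages; multiplying it by the pool size gives a rough per-individual figure only if technical noise is negligible and every individual contributes equally. The comparison uses an ordinary two-sample t statistic with pooled variance on the pool values (4 degrees of freedom), not an RNA-seq count model.

```python
import math
import statistics

POOL_SIZE = 5  # individuals per pool, as in the example above


def pooled_t(control: list[float], treatment: list[float]) -> float:
    """Two-sample t statistic with a pooled variance, treating each pool as one replicate."""
    n1, n2 = len(control), len(treatment)
    s2 = ((n1 - 1) * statistics.variance(control)
          + (n2 - 1) * statistics.variance(treatment)) / (n1 + n2 - 2)
    se = math.sqrt(s2 * (1 / n1 + 1 / n2))
    return (statistics.mean(treatment) - statistics.mean(control)) / se


if __name__ == "__main__":
    control = [5.0, 5.4, 4.6]    # hypothetical normalized expression of 3 control pools
    treatment = [6.1, 6.5, 5.9]  # hypothetical values for 3 treatment pools
    var_of_averages = statistics.variance(control)
    print("variance between control pools:", round(var_of_averages, 3))
    print("rough per-individual variance:", round(var_of_averages * POOL_SIZE, 3))
    print("t statistic:", round(pooled_t(control, treatment), 2))
```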

**danwiththeplan** · 10-02-2013, 07:02 PM

Agreed. I think the distinction between technical and biological reps is quite fuzzy really, and depends on the biology quite a lot. Good point about bacterial communities.

**Skiaphrene** · 10-23-2013, 08:28 PM

Hi danwiththeplan,

I'm not very experienced with RNA-seq stats, but I have a few comments/questions that I hope will advance the discussion...

My understanding is that when you're NOT doing single-cell RNA-seq, you're already acquiring reads from a "pool" of cells that may or may not be very similar to each other; the smallest "statistical individual" would be a single cell (and that is assuming transcription is homogeneous within the cell...). So for me, there is already some pooling in the process! Not to mention the absence of fine time resolution, hence the tag "steady state" RNA-seq...

Regardless, any measurement (such as a gene expression level) made is going to be the result of a compendium of variability factors, and the difficulty is identifying them, deciding which ones you can estimate, which ones you care for, which ones you want to ignore... Which is, I think, why the definition of a "biological replicate" is so fuzzy: it corresponds to whatever replicate the experimenter considers as capable of capturing "biologically-relevant" variability.

For example, if you're RNA-sequencing a load of cells from 5 human patients in 2 groups, with 100 cells per patient and 1 patient per sequenced sample, then cell variability is disregarded, but "biological" variability (variability per patient) is taken into account, and thus the experiment has 5 biological replicates.
However, if you're RNA-sequencing from pools of immune cells from several mice from 2 groups, pooling 5 mice to create one sample to sequence, and creating 3 samples per group, then mouse variability is disregarded as well, but you DO get the variability of a "cohort-of-5-mice" which, as chadn737 pointed out, may be related back to the variability in the population of mice. And the experimenter might be perfectly happy with that (even if it makes the biostatistician cringe).

How this can then be taken into account in DGE analysis, at this stage, I have no idea...

I think I'm going to try and do some very simple simulations to work out how to best setup pooling: more individuals per pool? or more pools?
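
A minimal simulation along those lines, as a sketch under simplifying assumptions that are not from the thread: normally distributed individual expression with biological variance SIGMA_B2, pools modelled as the average of their members, one independent technical error of variance SIGMA_T2 per library, and a fixed budget of 30 individuals per group. It compares the variance of the difference between group means for different ways of splitting those 30 individuals into pools; all names and parameter values are hypothetical.

```python
import random
import statistics

SIGMA_B2 = 1.0  # hypothetical per-individual biological variance
SIGMA_T2 = 0.2  # hypothetical per-library technical variance


def analytic_var_diff(pools: int, per_pool: int,
                      sigma_b2: float = SIGMA_B2, sigma_t2: float = SIGMA_T2) -> float:
    """Variance of (mean of group A - mean of group B) with `pools` libraries per group."""
    return 2 * (sigma_b2 / per_pool + sigma_t2) / pools


def simulate_var_diff(pools: int, per_pool: int, trials: int = 2000, seed: int = 0,
                      sigma_b2: float = SIGMA_B2, sigma_t2: float = SIGMA_T2) -> float:
    """Monte Carlo estimate of the same variance when there is no true group difference."""
    rng = random.Random(seed)
    sd_b, sd_t = sigma_b2 ** 0.5, sigma_t2 ** 0.5

    def group_mean() -> float:
        libraries = []
        for _ in range(pools):
            # Pooled RNA is modelled as the average of its members' expression.
            pooled = statistics.fmean(rng.gauss(0.0, sd_b) for _ in range(per_pool))
            # Each sequenced library then adds its own technical error.
            libraries.append(pooled + rng.gauss(0.0, sd_t))
        return statistics.fmean(libraries)

    diffs = [group_mean() - group_mean() for _ in range(trials)]
    return statistics.variance(diffs)


if __name__ == "__main__":
    # Same 30 individuals per group, split into fewer/larger or more/smaller pools.
    for pools, per_pool in [(3, 10), (6, 5), (10, 3), (15, 2), (30, 1)]:
        print(f"{pools:>2} pools x {per_pool:>2}: "
              f"analytic {analytic_var_diff(pools, per_pool):.3f}, "
              f"simulated {simulate_var_diff(pools, per_pool):.3f}")
```

Under these assumptions the variance works out to 2(σ_b²/N + σ_t²/n) when N individuals per group are split into n pools, so the biological part is fixed by N and only the technical part shrinks with more pools; more pools also give more degrees of freedom for the test. The trade-off is the cost of extra libraries, and the picture can change if pools are not equimolar or individuals are not independent.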

Am I making sense??? Any thoughts?

-- Alex

Statistical treatment of biological reps from pooled samples