SEQanswers

09-30-2013, 08:41 PM   #1
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72
Statistical treatment of biological reps from pooled samples

Hi, I have a theoretical question. I know that it's important to do biological reps when doing differential expression RNA-seq experiments (3 seems to be a minimum). I very much see the point when it's from 3 different frogs/trees/chickens/humans. However I'm encountering situations where people are doing 3 biological reps which are derived from pooled samples (e.g. a pool of lots of individuals, where for practical reasons you can't get enough RNA from one individual).

In this case, what would be the advantage of doing biological reps (for example, doing 3 biological reps, with 3 different library preps/multiplex tags) and what is the difference between this situation and the situation where you just do 3x as much sequencing on one sample, and later randomly split the reads (bioinformatically) into 3 pools?

To me it's a similar situation:

Situation 1: you have a pool of samples, split it into 3, extract RNA from each part and sequence each separately, as opposed to...

Situation 2: you have a pool of samples, extract RNA once, sequence it, and split the reads into 3.

What's the difference biologically/statistically? Does it really make sense to have 3x the library-prep/multiplexing costs for situation 1?

The advantage for the latter situation would be a (considerable) saving in library prep / multiplexing costs (I'm talking about Illumina PE sequencing).

Thoughts anyone?
09-30-2013, 08:51 PM   #2
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

If you pool all samples and split the reads randomly after sequencing, then your variation is not due to biological differences but due to your random sampling. It's going to behave differently.

Pooling samples into biological reps so that you can get enough RNA reduces the biological variance between reps, roughly in proportion to the size of the pool. However, since you still have independent biological reps, you can still estimate the biological variance.
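
To make the first point concrete, here is a minimal simulation sketch, not a definitive model. It assumes a single gene, Poisson read sampling, and gamma-distributed expression between independent replicates; the function names, the mean count of 1000 and the 30% biological coefficient of variation are all hypothetical values chosen for illustration.

```python
import numpy as np

def split_reads(total_count, n_splits, rng):
    """Randomly assign the reads of ONE library to n_splits pseudo-replicates."""
    return rng.multinomial(total_count, [1.0 / n_splits] * n_splits)

def biological_reps(mean_count, cv_bio, n_reps, rng):
    """Counts from independent replicates: true expression differs between
    replicates (gamma with coefficient of variation cv_bio), and each
    replicate is then sequenced (Poisson sampling)."""
    shape = 1.0 / cv_bio**2
    true_expr = rng.gamma(shape, mean_count / shape, size=n_reps)
    return rng.poisson(true_expr)

rng = np.random.default_rng(0)
n_sims = 2000  # repeat the experiment many times to get stable averages

# Situation 2: one library with 3000 reads for the gene, split into 3.
split_var = np.mean(
    [split_reads(3000, 3, rng).var(ddof=1) for _ in range(n_sims)]
)
# Independent replicates with about 1000 reads each for the gene.
bio_var = np.mean(
    [biological_reps(1000, 0.3, 3, rng).var(ddof=1) for _ in range(n_sims)]
)

print(f"random read split:      mean variance = {split_var:.0f}")
print(f"independent replicates: mean variance = {bio_var:.0f}")
```

Under these assumptions the split reads show only counting noise (variance close to the mean count), while the independent replicates show the much larger between-replicate variance that a differential expression test actually needs to know about.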
09-30-2013, 08:57 PM   #3
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72

Thanks for your answer. I understand the difference between variation caused by random sampling (bioinformatically) and biological variation, but I'm still not sure I see that it would make much difference in the situation where your biological reps are basically random samples from a bigger pool (for example, bacteria from a fermenter), assuming that your sampling method is random (i.e. you've shaken the fermenter appropriately and you're not sampling from a different physical area). I am guessing that the more individual biological units (i.e. bacteria, zebrafish, whatever) in your pool, the less difference it makes.

Of course you could simply (in the example above) grow 3 different pools, but in this case, are you looking at biological variation or variation caused by lab equipment?

Last edited by danwiththeplan; 09-30-2013 at 09:01 PM.
10-02-2013, 06:45 PM   #4
Jeremy
Senior Member
 
Location: Pathum Thani, Thailand

Join Date: Nov 2009
Posts: 190

The situations you are describing are technical replicates, not biological replicates. Essentially, you have no biological replicates in those examples. To be biological replicates, they would need to be pools of different individuals,
e.g. pool 1 = sample A, B and C
pool 2 = sample D, E and F etc.

A biological replicate allows you to determine the variance in gene expression between different samples. The pooled approach is better than just a single control vs a single treatment, but you still have no idea what the variance is.
10-02-2013, 07:08 PM   #5
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72

Yes I've had some internal discussions about this and I agree, it's a case of me using a not-very-correct definition of what a biological replicate is. So, I agree that both the situations I describe above are really technical reps since they are derived from the same pool.

However, in the case where 2 different pooled samples are derived from different individuals, as per your example above, it seems to me that if the pools are derived from a large number of individuals (even if they are different individuals between pools), any biological variation will be washed out by the number of individual biological units, and the more units, the more washed out.

So, theoretically speaking, if you have two pools of 10,000 bacteria each, then according to the strict definition they are biological reps, because each pool is derived from different individuals. Practically speaking, though, any variance you see is much more likely to be "technical", i.e. derived from your bacteria being grown closer to the window/next to the coffee machine/at a different time/in a different tube.

Contrast this with the situation where you have two pools of 3 zebrafish embryos. Again both pools are "biological reps" but any variance is much, much more likely to be derived from true biological variation because of the small number of individuals in the pool. I guess that's what I was trying to say.
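
As a rough worked example of this contrast, the sketch below assumes a pool behaves like a plain average of independent individuals, with one extra per-pool "technical" term (tube, timing, library prep) that pooling does not shrink. The function name and the two standard deviations are hypothetical, and individuals sharing a flask may not be independent at all, so treat the numbers as illustrative only.

```python
def biological_fraction(sd_individual, sd_technical, pool_size):
    """Fraction of between-pool variance that comes from individual differences.

    Assumes each pool is a plain average of `pool_size` independent
    individuals plus independent per-pool technical noise.
    """
    var_bio = sd_individual**2 / pool_size  # averaging shrinks this term
    var_tech = sd_technical**2              # pooling does not shrink this term
    return var_bio / (var_bio + var_tech)

# Hypothetical standard deviations on a log-expression scale.
for n in (3, 10_000):
    frac = biological_fraction(sd_individual=0.5, sd_technical=0.1, pool_size=n)
    print(f"pool of {n}: {frac:.1%} of between-pool variance is biological")
```

With these made-up values, roughly 89% of the variance between pools of 3 is biological, against well under 1% for pools of 10,000.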
10-02-2013, 08:00 PM   #6
Jeremy
Senior Member
 
Location: Pathum Thani, Thailand

Join Date: Nov 2009
Posts: 190

Single-cell organisms are a tricky case. Each cell is capable of determining its own expression levels, but they often live in communities that 'talk' to each other and so have some degree of co-regulation. So are a few separate flasks of the same bacterial strain to be considered technical replicates, biological replicates, or a pool of samples? I would say it incorporates a part of each of those, depending on the species in question. How does that affect the statistics? I don't know.

Take zebrafish, for example, or something small, let's say lice (where you may need to pool to get enough RNA). If you had 3 control pools of 5 samples each and 3 treatment pools of 5 samples each, then you could get some idea of biological variation in gene expression, even though it would be variation between pool averages rather than between individuals.
10-02-2013, 08:02 PM   #7
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72

Agree, I think the distinction between technical and biological reps is quite fuzzy really, and depends on the biology quite a lot. Good point about bacterial communities.
10-23-2013, 09:28 PM   #8
Skiaphrene
Member
 
Location: Lausanne CH

Join Date: Aug 2013
Posts: 18

Hi danwiththeplan,

I'm not very experienced with RNA-seq stats, but I have a few comments/questions that I hope will advance the discussion...

My understanding is that when you're NOT doing single-cell RNA-seq, then you're already acquiring reads from a "pool" of cells that may or may not be very similar to each other; the smallest "statistical individual" would be a single cell (and that is assuming transcription is homogeneous in the cell...). So for me, there is already some pooling in the process! Not to mention the absence of fine time resolution, hence the tag "steady state" RNA-seq...

Regardless, any measurement (such as a gene expression level) made is going to be the result of a compendium of variability factors, and the difficulty is identifying them, deciding which ones you can estimate, which ones you care for, which ones you want to ignore... Which is, I think, why the definition of a "biological replicate" is so fuzzy: it corresponds to whatever replicate the experimenter considers as capable of capturing "biologically-relevant" variability.

For example, if you're RNA-sequencing from a load of cells from 5 human patients from 2 groups, 100 cells per patient and 1 patient per sample to sequence, then cell variability is disregarded, but "biological" variability - variability per patient - is taken into account, and thus the experiment has 5 biological replicates.
However, if you're RNA-sequencing from pools of immune cells from several mice from 2 groups, pooling 5 mice to create one sample to sequence, and creating 3 samples per group, then mouse variability is disregarded as well, but you DO get the variability of a "cohort-of-5-mice" which, as chadn737 pointed out, may be related back to the variability in the population of mice. And the experimenter might be perfectly happy with that (even if it makes the biostatistician cringe).

How this can then be taken into account in DGE analysis, at this stage, I have no idea...

I think I'm going to try and do some very simple simulations to work out how to best setup pooling: more individuals per pool? or more pools?
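
As a starting point for that kind of comparison, here is a minimal sketch rather than a full simulation. It assumes each pool is a plain average of independent individuals plus independent per-pool technical noise, and it compares designs that use the same 12 individuals per group. The function name and both variance components are hypothetical.

```python
def group_mean_variance(var_individual, var_technical, n_pools, pool_size):
    """Variance of a group's mean expression under the assumed model:
    each pool averages `pool_size` independent individuals and adds its own
    technical noise; the group mean averages `n_pools` such pools."""
    return (var_individual / pool_size + var_technical) / n_pools

# Hypothetical variance components on a log-expression scale.
VAR_INDIVIDUAL, VAR_TECHNICAL = 0.25, 0.04
TOTAL_INDIVIDUALS = 12

for n_pools in (2, 3, 6, 12):
    pool_size = TOTAL_INDIVIDUALS // n_pools
    v = group_mean_variance(VAR_INDIVIDUAL, VAR_TECHNICAL, n_pools, pool_size)
    df = n_pools - 1  # degrees of freedom for estimating between-pool variance
    print(f"{n_pools:2d} pools x {pool_size:2d} individuals: "
          f"variance of mean = {v:.4f}, df = {df}")
```

In this simple model, for a fixed number of individuals, more (smaller) pools always give a more precise group mean and more degrees of freedom, at the price of more library preps. For a fixed number of libraries, adding individuals to each pool still helps, but with diminishing returns once the per-pool technical variance dominates.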

Am I making sense??? Any thoughts?

-- Alex

Tags
design, pooled, repeats, rnaseq
