SEQanswers

02-27-2013, 11:45 AM   #1
shocker8786
Member
 
Location: Urbana Illinois

Join Date: Jan 2013
Posts: 28
Pooling Samples for Sequencing

I am working on an experiment in which we are going to be doing reduced representation bisulfite sequencing and RNA-seq. Our plan is to pool the samples for each treatment group and sequence each treatment group as a single sample. So for example if we have 4 samples for treatment A we will combine them and sequence the group as a single treatment A sample.

My question is when using this method are there any issues with comparing pools of different sample size? I have two treatments, one has 4 samples and the other has 5. Can I use all the samples from each treatment, or do I have to remove one from the second group, so I have pools of 4 samples for each group?

In other words, are there any issues associated with comparing pools with unequal sample sizes?

02-27-2013, 11:09 PM   #2
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Sigh. Well, at least you ask before doing the experiment and ruining your project. No, the unequal sample sizes are not your problem.

But how would you ever know whether an observed difference is statistically significant, i.e., large compared to what you observe between samples treated the same way, if you don't know how strong the differences between samples in the same treatment group are?

Maybe I'm in a bad mood because it's early in the morning, but as you are the n-th person to ask this question here: I still don't get it. Why would anyone even think about pooling samples without multiplexing? I met people who claimed that they knew that the differences between equally treated samples are so small that they don't need to check, but curiously, these are only those people who have never done such an experiment.

02-28-2013, 09:35 AM   #3
shocker8786
Member
 
Location: Urbana Illinois

Join Date: Jan 2013
Posts: 28

Thank you for your reply. I'm new to NGS analysis, so I may have this wrong, but my understanding was that when comparing differentially methylated sites between groups your statistics are based on comparing the number of methylated/unmethylated reads for each group.

For example, you have a region where 50 reads are aligned in both pools. You would then determine statistical significance by comparing the methylated and unmethylated read counts of the two pools at that region.

I was under this assumption based on the paper below.
http://www.pnas.org/content/110/6/2354.short

The sentence below was taken from the supplemental methods, where they explain how statistical significance was determined between the two cell lines.

"For each methylation region, statistical significance of differential methylation was calculated using a Fisher’s exact test on a 2 × 2 contingency table of methylated and nonmethylated counts in the two cell lines. "

The way I interpret that is the reads are what give you statistical significance. If I'm mistaken would you be able to explain what I am missing? Thank you very much for your help, I really appreciate it!

02-28-2013, 09:51 AM   #4
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Short answer: Using Fisher's exact test for this purpose is wrong. I don't have much time at the moment to look at it in detail, but the paper's analysis is most likely seriously flawed.

Imagine you have 2 treated and 2 control samples:

Control 1: 10 of 50 reads methylated
Control 2: 30 of 50 reads methylated
Treatment 1: 20 of 50 reads methylated
Treatment 2: 40 of 50 reads methylated

So, the methylation goes up by 10 reads, but between two samples within the same treatment group, the difference is 20 reads. Would you believe that this increase in methylation by 10 reads is due to the treatment? I'd rather say it is due to the same random variation that you see within group. Next time you do the same experiment, you might get the opposite result if things vary so much.

Now, imagine you pooled the samples, so you see only the averages:

Control: 20 of 50 reads methylated
Treatment: 30 of 50 reads methylated

Now you no longer know that there was a change of 20 between replicates, and might think that an increase by 10 is a lot. Fisher's exact test cannot know this either, which is why it is wrong to use this test.

The advantage of pooling is, of course, precisely that you do not see that your results are unlikely to be reproducible, and hence are not discouraged from writing a paper anyway. The fact that referees still fail to spot this elementary mistake seems to help.
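
The example above can be checked numerically. What follows is only a rough sketch in plain Python, not the analysis from the cited paper: `fisher_exact_two_sided` and `welch_t` are hypothetical helper names written for this illustration, and the "10x depth" counts are made-up numbers that keep the same proportions. The point is that Fisher's exact test on pooled counts treats every read as an independent replicate, so its p-value depends on sequencing depth, whereas a statistic computed across replicates depends on the spread between samples.

```python
from math import comb, sqrt
from statistics import mean, variance


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x methylated reads in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum over all tables that are at most as likely as the observed one.
    return sum(p for p in map(prob, support) if p <= p_observed * (1 + 1e-9))


def welch_t(group1, group2):
    """Welch t-statistic comparing the means of two groups of replicates."""
    se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group2) - mean(group1)) / se


# Pooled view from the example: 20 of 50 vs. 30 of 50 methylated reads.
p_pooled = fisher_exact_two_sided(20, 30, 30, 20)
# Assumed variant: same proportions at ten times the depth (200/500 vs. 300/500).
p_pooled_deep = fisher_exact_two_sided(200, 300, 300, 200)

# Replicate view from the example: methylated fraction per sample.
control = [10 / 50, 30 / 50]
treatment = [20 / 50, 40 / 50]
t_replicates = welch_t(control, treatment)

print(f"Fisher, pooled reads:      p = {p_pooled:.3g}")
print(f"Fisher, pooled reads, 10x: p = {p_pooled_deep:.3g}")
print(f"Welch t across replicates: t = {t_replicates:.2f}")
```

Sequencing deeper makes the pooled Fisher p-value far smaller even though nothing new has been learned about sample-to-sample variation. The replicate-based t-statistic stays at about 0.7, because the difference between group means (0.2) is smaller than the spread within each group (0.4).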

02-28-2013, 09:51 AM   #5
microgirl123
Senior Member
 
Location: New England

Join Date: Jun 2012
Posts: 199

I think what Simon is trying to say doesn't relate to NGS sequencing specifically. It relates to any set of samples you are trying to perform statistics on and get meaningful results. Basically, you cannot statistically compare two things unless you have replicates (n must be greater than 1 in your statistics formulas!). If you pool all your samples together into two groups, then you can't perform statistics because you only have one of each of two things (n=1).

You should index each of your 4 samples for Treatment A and each of your 5 samples for Treatment B before pooling. Then you can perform your NGS analysis on the pooled sample and see how the differences between samples in Treatment A compare to the differences between samples in Treatment B.

02-28-2013, 10:28 AM   #6
shocker8786
Member
 
Location: Urbana Illinois

Join Date: Jan 2013
Posts: 28

Thank you very much for taking the time to explain, I understand what you are saying now. I cannot remember why the decision to pool was originally made, but your argument against it makes perfect sense. I'm definitely going to talk with my group about reconsidering our experimental design.

Thanks again!

02-28-2013, 10:29 AM   #7
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Quote:
Originally Posted by microgirl123
I think what Simon is trying to say doesn't relate to NGS sequencing specifically.

Of course. But NGS is one of the few fields where people don't know this and nevertheless routinely get papers into high-ranking journals, which then causes newcomers to think that this is how it should be done.

09-20-2013, 11:46 AM   #8
Rick_R
Junior Member
 
Location: USA

Join Date: Sep 2013
Posts: 2

I know this is many months after the original post, but I would like to pose a similar question.

I work with cell lines, and can therefore produce many biological replicates. However, the cost of sequencing them all separately would be too high. One could sequence, say, 6 samples:
1. Control A
2. Control B
3. Control C
4. Treatment A
5. Treatment B
6. Treatment C

Might it be better to sequence this instead:
1. Control A + Control B
2. Control C + Control D
3. Control E + Control F
4. Treatment A + Treatment B
5. Treatment C + Treatment D
6. Treatment E + Treatment F

Is this a reasonable way to reduce the "noise" from biological variability/random variation while maintaining the number of samples sequenced?

09-20-2013, 11:54 AM   #9
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Yes, it is.

It's still worth double-checking whether multiplexing really is that expensive: Even if you want to use only one lane for two samples, you can still gain information by marking the fragments from each sample with a barcode. You don't pay more for the sequencing, but you do pay extra for the steps up to the barcode ligation because they cannot be performed in a pooled fashion.
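
Why pooling pairs reduces the noise can be illustrated with a small simulation. This is a sketch under loudly stated assumptions, not a model of a real experiment: every individual contributes equally to its pool, individuals vary independently, and technical noise from library prep and sequencing is ignored. The function name `simulate_library_values` and all of its parameters are hypothetical.

```python
import random
from statistics import variance


def simulate_library_values(pool_size, n_libraries, biological_sd=1.0, seed=0):
    """Simulate one gene's expression per library, each library a pool of individuals.

    Assumes equal contribution of every individual to its pool and independent
    biological variation with standard deviation `biological_sd`.
    """
    rng = random.Random(seed)
    libraries = []
    for _ in range(n_libraries):
        individuals = [rng.gauss(0.0, biological_sd) for _ in range(pool_size)]
        # The library measures the average of the pooled individuals.
        libraries.append(sum(individuals) / pool_size)
    return libraries


# Many simulated libraries, so that the variance estimates are stable.
var_single = variance(simulate_library_values(pool_size=1, n_libraries=20000))
var_pairs = variance(simulate_library_values(pool_size=2, n_libraries=20000, seed=1))
ratio = var_pairs / var_single

print(f"variance, one individual per library:  {var_single:.3f}")
print(f"variance, two individuals per library: {var_pairs:.3f}")
print(f"ratio: {ratio:.2f}")
```

Under these assumptions the between-library variance for pools of two comes out at roughly half that of single individuals, while the number of libraries, and hence the ability to estimate within-group variation, stays the same. The same function accepts larger values of `pool_size`.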

01-08-2014, 12:39 PM   #10
aliceb
Member
 
Location: Switzerland

Join Date: Jan 2010
Posts: 18

Hi all,

To dredge up an old question again, I was wondering if I could get an opinion on a pooling / not pooling design.

First, I understand that I want biological replicates! But is it better to work with replicates of pools or replicates of individuals? I'm leaning towards individuals because we can better call alleles, I think. But my main goal is to identify differentially expressed genes.

An example. We have 3 treatments to compare:

Option A: 5 individuals per treatment, giving me 15 libraries.
Option B: 5 pools per treatment (of 10 individuals each?), again giving me 15 libraries, but summarizing 150 individuals.

Any thoughts on this option would be appreciated.

Thanks!

01-08-2014, 02:12 PM   #11
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Of course, B is the better option if you have so many samples anyway. (What are we talking about? Flies?) Unless you want to look at allele-specific expression, as you already noted. The trade-off here depends on how much signal you gain with B vs A and how much potentially interesting biology you lose by not being able to look at alleles.

The option I argued against is

Option C: Pool all the samples from each treatment, giving you 3 libraries in total.

It seems to be non-obvious to distressingly many practitioners why that one is not acceptable.


If it does not cost anything extra, you should consider

Option D: Label the cDNA from each individual with a barcode, then pool them all in one big library, spread over 15 sequencing lanes.

This offers you the most information, but requires you to do all the sample-prep steps up to the barcoding 150 times in parallel, which is practicable only if there are just a few steps before the pooling and/or you have suitable robotics or lots of patience.

01-08-2014, 11:49 PM   #12
aliceb
Member
 
Location: Switzerland

Join Date: Jan 2010
Posts: 18

Thanks for the reply! We're working with wasps that can be grown up, but high numbers will be a bit of a struggle. And as they're variable, sexual populations, there will certainly be information that is lost by pooling.

Option D sounds fantastic. But as I actually have 12 experimental lines to sequence (well, 3 blocks of 4 parallel lines), with at least 5 biological replicates each, I think it's outside of my budget and pipetting capacity.

Also, when it comes to pooling, do you have an opinion on how many individuals to use? It seems like pools of only 5 individuals might have problems with one weirdo dominating the response. But how high would one have to go to avoid that? This is where my number limitations come in. I would like 10 per pool, but might be limited to fewer.

Last edited by aliceb; 01-09-2014 at 03:29 AM.

01-22-2014, 12:16 PM   #13
revAMI
Junior Member
 
Location: West Virginia

Join Date: Jan 2014
Posts: 1

Library prep can be more expensive than the sequencing, so option D would have a significant added cost.

I have money for 18 preps and one run. I have three treatment groups and hundreds of samples. Is it better to pick six from each group at random, or to do six pools (of how many?) for each group?

Pooling would reduce chance bias from biological variability, and give a stronger signal for the most changed genes. It would also be more emotionally satisfying to use more of my samples. On the other hand, it would make allele-specific expression and alternative splicing much harder to do.

This is in humans, so I'm not concerned about creating a de novo transcriptome.

Which would look better when applying for a follow-up grant to do more samples?

All times are GMT -8.