  • Pooling Samples for Sequencing

    I am working on an experiment in which we are going to be doing reduced representation bisulfite sequencing and RNA-seq. Our plan is to pool the samples for each treatment group and sequence each treatment group as a single sample. So for example if we have 4 samples for treatment A we will combine them and sequence the group as a single treatment A sample.

    My question is when using this method are there any issues with comparing pools of different sample size? I have two treatments, one has 4 samples and the other has 5. Can I use all the samples from each treatment, or do I have to remove one from the second group, so I have pools of 4 samples for each group?

    In other words, are there any issues associated with comparing pools of unequal sample size?

  • #2
    Sigh. Well, at least you ask before doing the experiment and ruining your project. No, the unequal sample sizes are not your problem.

    But how would you ever know whether an observed difference is statistically significant, i.e., large compared to what you observe between samples treated the same way, if you don't know how strong the differences between samples in the same treatment group are?

    Maybe I'm in a bad mood because it's early in the morning, but as you are the n-th person to ask this question here: I still don't get it. Why would anyone even think about pooling samples without multiplexing? I have met people who claimed to know that the differences between equally treated samples are so small that they don't need to check, but curiously, these are only ever people who have never done such an experiment.


    • #3
      Thank you for your reply. I'm new to NGS analysis, so I may have this wrong, but my understanding was that when comparing differentially methylated sites between groups your statistics are based on comparing the number of methylated/unmethylated reads for each group.

      For example, you have a region where 50 reads are aligned in both pools. You would then determine statistical significance by comparing the methylated and unmethylated read counts of the two pools at that region.

      I was under this assumption based on the paper below.


      The sentence below was taken from the supplemental methods, where they explain how statistical significance was determined between the two cell lines.

      "For each methylation region, statistical significance of differential methylation was calculated using a Fisher’s exact test on a 2 × 2 contingency table of methylated and nonmethylated counts in the two cell lines."

      The way I interpret that is the reads are what give you statistical significance. If I'm mistaken would you be able to explain what I am missing? Thank you very much for your help, I really appreciate it!


      • #4
        Short answer: Using Fisher's exact test for this purpose is wrong. I don't have much time at the moment to look at it in detail, but the paper's analysis is most likely seriously flawed.

        Imagine you have 2 treated and 2 control samples:

        Control 1: 10 of 50 reads methylated
        Control 2: 30 of 50 reads methylated
        Treatment 1: 20 of 50 reads methylated
        Treatment 2: 40 of 50 reads methylated

        So, on average the methylation goes up by 10 reads, but between two samples within the same treatment group, the difference is 20 reads. Would you believe that this increase in methylation by 10 reads is due to the treatment? I'd rather say it is due to the same random variation that you see within a group. Next time you do the same experiment, you might get the opposite result if things vary so much.

        Now, imagine you pooled the samples, so you see only the averages:

        Control: 20 of 50 reads methylated
        Treatment: 30 of 50 reads methylated

        Now you no longer know that there was a change of 20 between replicates, and might think that an increase by 10 is a lot. Fisher's exact test cannot know this either, which is why it is wrong to use this test.

        The advantage of pooling is, of course, precisely that you do not see that your results are unlikely to be reproducible, and hence are not discouraged from writing a paper anyway. The fact that referees still fail to spot this elementary mistake seems to help.
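The toy numbers above can be checked with a minimal sketch. It uses only the Python standard library; the helper names `fisher_exact_two_sided` and `welch_t` are made up for this illustration, and the 10x deeper coverage case is an added assumption, not part of the original example. The point it tries to show: Fisher's exact test on pooled counts responds to read depth, while a per-sample comparison responds to the spread between replicates.

```python
from math import comb, sqrt
from statistics import mean, variance


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as probable as the observed one
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def welch_t(x, y):
    """Welch t statistic for two groups of per-sample values."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(y) - mean(x)) / se


# Pooled view: (methylated, unmethylated) reads for control and treatment
p_shallow = fisher_exact_two_sided(20, 30, 30, 20)    # 50 reads per pool
p_deep = fisher_exact_two_sided(200, 300, 300, 200)   # assumed 10x deeper coverage

# Per-sample view: methylated fraction in each replicate
control = [10 / 50, 30 / 50]
treatment = [20 / 50, 40 / 50]
t_stat = welch_t(control, treatment)

print(f"Fisher p, 50 reads per pool:  {p_shallow:.3g}")
print(f"Fisher p, 500 reads per pool: {p_deep:.3g}")
print(f"Welch t on replicates:        {t_stat:.3f}")
```

Sequencing the same pools more deeply shrinks the Fisher p-value dramatically even though nothing about the biology has changed, whereas the t statistic on the replicate fractions stays at about 0.71, far from anything one would call a treatment effect with two samples per group.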


        • #5
          I think what Simon is trying to say doesn't relate to NGS sequencing specifically. It relates to any set of samples you are trying to perform statistics on and get meaningful results. Basically, you cannot statistically compare two things unless you have replicates (n must be greater than 1 in your statistics formulas!). If you pool all your samples together into two groups, then you can't perform statistics because you only have one of each of two things (n=1).

          You should index each of your 4 samples for Treatment A and each of your 5 samples for Treatment B before pooling. Then you can perform your NGS analysis on the pooled sample and see how the differences between samples in Treatment A compare to the differences between samples in Treatment B.


          • #6
            Thank you very much for taking the time to explain, I understand what you are saying now. I cannot remember why the decision to pool was originally made, but your argument against it makes perfect sense. I'm definitely going to talk with my group about reconsidering our experimental design.

            Thanks again!


            • #7
              Originally posted by microgirl123:
              "I think what Simon is trying to say doesn't relate to NGS sequencing specifically."

              Of course. But NGS is one of the few fields where people don't know this and nevertheless routinely get papers into high-ranking journals, which then causes newcomers to think that this is how it should be done.


              • #8
                I know this is many months after the original post, but I would like to pose a similar question.

                I work with cell lines, and can therefore produce many biological replicates. However, the cost of sequencing them all separately would be too high. One could sequence, say, 6 samples:
                1. Control A
                2. Control B
                3. Control C
                4. Treatment A
                5. Treatment B
                6. Treatment C

                Might it be better to sequence this instead:
                1. Control A + Control B
                2. Control C + Control D
                3. Control E + Control F
                4. Treatment A + Treatment B
                5. Treatment C + Treatment D
                6. Treatment E + Treatment F

                Is this a reasonable way to reduce the "noise" from biological variability/random variation while maintaining the number of samples sequenced?


                • #9
                  Yes, it is.

                  It's still worth double-checking whether multiplexing really is that expensive: Even if you want to use only one lane for two samples, you can still gain information by marking the fragments from each sample with a barcode. You don't pay more for the sequencing, but you do pay extra for the steps up to the barcode ligation because they cannot be performed in a pooled fashion.
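As a rough illustration of what barcoding buys you, here is a minimal sketch of splitting the reads from one shared lane back into samples. Everything specific in it is an assumption for the example: the 4-base barcodes, the sample names, and the idea that the barcode sits at the start of each read (many real protocols read the index separately, and real pipelines use dedicated demultiplexing tools).

```python
from collections import defaultdict

# Hypothetical barcodes and sample names, for illustration only
BARCODES = {
    "ACGT": "control_A",
    "TGCA": "control_B",
    "GATC": "treatment_A",
}


def demultiplex(reads, barcodes, barcode_len=4):
    """Group reads by their leading barcode and strip the barcode off."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        # Reads whose tag matches no known barcode are kept apart, not discarded
        sample = barcodes.get(tag, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Toy reads from one pooled lane
reads = ["ACGTTTGACC", "TGCAGGGTTA", "ACGTCCATGA", "NNNNACGTAC", "GATCTTTTAA"]
result = demultiplex(reads, BARCODES)

for sample, inserts in sorted(result.items()):
    print(sample, len(inserts))
```

Once the reads are separated like this, each sample can be counted on its own, so the within-group variation is still visible even though everything ran in one lane.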


                  • #10
                    Hi all,

                    To dredge up an old question again, I was wondering if I could get an opinion on a pooling / not pooling design.

                    First, I understand that I want biological replicates! But is it better to work with replicates of pools or replicates of individuals? I'm leaning towards individuals because we can better call alleles, I think. But my main goal is to identify differentially expressed genes.

                    An example. We have 3 treatments to compare:

                    Option A: 5 individuals per treatment, giving me 15 libraries.
                    Option B: 5 pools (of 10 individuals?), again giving me 15 libraries, but summarizing 150 individuals.

                    Any thoughts on this option would be appreciated.

                    Thanks!


                    • #11
                      Of course, B is the better option if you have so many samples anyway. (What are we talking about? Flies?) Unless you want to look at allele-specific expression, as you already noted. The trade-off here depends on how much signal you gain with B vs A and how much potentially interesting biology you lose by not being able to look at alleles.

                      The option I argued against is

                      Option C: Pool all the samples from each treatment, giving you 3 libraries in total.

                      It seems to be non-obvious to distressingly many practitioners why that one is not acceptable.


                      If it does not cost anything extra, you should consider

                      Option D: Label the cDNA from each individual with a barcode, then pool them all in one big library, spread over 15 sequencing lanes.

                      This offers you the most information, but requires you to do all the sample-prep steps up to the barcoding 150 times in parallel, which is practicable only if there are only a few steps before the pooling and/or you have suitable robotics or lots of patience.
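To make the trade-off between Options A, B and C a bit more concrete, here is a small sketch under a simple variance model. The model and all numbers are assumptions for illustration: each library's value is the group mean plus the average of `pool_size` independent biological deviations (standard deviation `sd_bio`) plus per-library technical noise (`sd_tech`), and individuals contribute equally to a pool. Option C is taken to be one pool of all 50 individuals per treatment.

```python
from math import sqrt


def design_summary(n_libraries, pool_size, sd_bio=1.0, sd_tech=0.3):
    """Return (standard error of the group mean, degrees of freedom).

    sd_bio and sd_tech are hypothetical values, not estimates from real data.
    """
    # Pooling averages out biological variation but not technical noise
    var_library = sd_bio ** 2 / pool_size + sd_tech ** 2
    se_group_mean = sqrt(var_library / n_libraries)
    # Degrees of freedom left for estimating within-group variance
    df = n_libraries - 1
    return se_group_mean, df


# (libraries per treatment, individuals per library)
designs = {
    "A: 5 individuals": (5, 1),
    "B: 5 pools of 10": (5, 10),
    "C: 1 pool of 50": (1, 50),
}

summary = {name: design_summary(n, k) for name, (n, k) in designs.items()}
for name, (se, df) in summary.items():
    print(f"{name:18s} SE of group mean = {se:.3f}, df for variance = {df}")
```

Under these assumptions Option B gives a noticeably smaller standard error than Option A for the same number of libraries. Option C also yields a fairly precise mean, but with zero degrees of freedom there is nothing left to estimate the within-group variance from, so that precision cannot be turned into a test.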


                      • #12
                        Thanks for the reply! We're working with wasps that can be grown up, but high numbers will be a bit of a struggle. And as they're variable, sexual populations there will certainly be information that is lost by pooling.

                        Option D sounds fantastic. But as I actually have 12 experimental lines to sequence (well, 3 blocks of 4 parallel lines), with at least 5 biological replicates each, I think it's outside of my budget and pipetting capacity.

                        Also, when it comes to pooling, do you have an opinion on how many individuals to use? It seems like pools of only 5 individuals might have problems with one weirdo dominating the response. But how high would one have to go to avoid that? This is where my number limitations come in. I would like 10 per pool, but might be limited to fewer.
                        Last edited by aliceb; 01-09-2014, 04:29 AM.
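The "one weirdo" worry can be put into numbers with a back-of-the-envelope sketch. It assumes every individual contributes an equal amount of RNA to the pool, that normal individuals express a gene at a baseline of 1.0, and that a single outlier expresses it 10-fold higher; the function name and these values are invented for illustration.

```python
def pooled_fold_change(pool_size, outlier_fold=10.0, n_outliers=1):
    """Apparent fold change of a pool containing n_outliers aberrant individuals.

    Assumes equal input from each individual and a baseline expression of 1.0.
    """
    n_normal = pool_size - n_outliers
    return (n_normal * 1.0 + n_outliers * outlier_fold) / pool_size


fold_by_size = {k: pooled_fold_change(k) for k in (2, 5, 10, 20)}
for k, fold in fold_by_size.items():
    print(f"pool of {k:2d}: apparent fold change {fold:.2f}")
```

With these assumed values, one 10-fold outlier still makes a pool of 5 look like a 2.8-fold change, a pool of 10 like 1.9-fold, and a pool of 20 like 1.45-fold. The outlier's weight falls off only as 1 over the pool size, so larger pools dilute it but never remove it; replicate pools are what let you see that one pool is behaving oddly.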


                        • #13
                          Library prep can be more expensive than the sequencing, so option D would have a significant added cost.

                          I have money for 18 preps and one run. I have three treatment groups and hundreds of samples. Is it better to pick six from each group at random, or do six pools (of how many?) for each group?

                          Pooling would reduce chance bias from biological variability, and give a stronger signal for the most changed genes. It would also be more emotionally satisfying to use more of my samples. On the other hand, it would make allele-specific expression and alternative splicing much harder to do.

                          This is in humans, so I'm not concerned about creating a de novo transcriptome.

                          Which would look better when applying for a follow-up grant to do more samples?
