  • Barcoding vs anonymous pooling

    Hi, new to here, but hope someone may be able to offer some advice.

    I'm currently thinking of designing an experiment whereby we sequence ~1000 human samples through a 1.7 Mb custom region using Agilent SureSelect.

    2 strategies suggested are:

    1) Pool DNA from 25 samples, make 1 library from the pooled DNA, analyse for variants using Syzygy, then follow up called variants through the original 25 samples to identify which sample each variant originates in.

    2) Barcode the samples from scratch, which would require an individual library for each sample and possibly more lanes of sequencing.

    Obviously option 2 makes follow up easier but significantly increases the costs and time required.

    Does anyone have any experience of either method?

  • #2
    One of the problems with anonymous pooling is that a heterozygous SNP in one of your 25 samples sits on 1 of the 50 chromosomes in the pool, so it will be present in only about 2% of your reads for that region, which isn't much higher than the error rate. However, if the samples are barcoded, then the reads with the SNP will all be identified as coming from one person, which makes the variant statistically a lot easier to spot.
    I do understand why you wouldn't want to make 1000 libraries though.
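
The 2% figure can be checked with a little arithmetic. The sketch below is only an illustration: it assumes equimolar pooling and uses made-up values for depth, error rate, and the calling threshold (none of these come from the thread), then compares how often a real heterozygous singleton and pure sequencing error would each reach that threshold.

```python
from math import comb


def het_allele_fraction(n_samples: int, ploidy: int = 2) -> float:
    """Expected fraction of reads carrying one heterozygous variant,
    assuming every sample contributes an equal amount of DNA to the pool."""
    return 1.0 / (ploidy * n_samples)


def prob_at_least(k: int, depth: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, p)."""
    return 1.0 - sum(comb(depth, i) * p**i * (1 - p) ** (depth - i) for i in range(k))


# Assumed values for illustration only (not taken from the thread).
DEPTH = 500          # reads covering the site
ERROR_RATE = 0.005   # per-read chance of a miscall to this specific allele
MIN_ALT_READS = 6    # hypothetical calling threshold

frac = het_allele_fraction(25)  # 1 variant chromosome out of 50
p_true = prob_at_least(MIN_ALT_READS, DEPTH, frac)
p_noise = prob_at_least(MIN_ALT_READS, DEPTH, ERROR_RATE)

print(f"expected variant read fraction: {frac:.3f}")
print(f"P(>= {MIN_ALT_READS} alt reads | real het):   {p_true:.3f}")
print(f"P(>= {MIN_ALT_READS} alt reads | error only): {p_noise:.3f}")
```

With a barcoded sample the same heterozygous site would sit near 50% of that sample's reads, which is why the separation from noise is so much cleaner.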

    • #3
      Have you considered doing this by PCR rather than capture? We have developed a system that allows simple preparation of PCR products in which the library preparation and barcoding takes place during amplification. It scales well with large numbers of samples and amplicons, and works with both 454 and Illumina sequencing.

      • #4
        Thanks both - we are committed to the pulldown capture following a pilot project, so PCR is not currently an option. In the pilot, the analysis did seem to work reasonably well at identifying even a single het call within the pooled system, although clearly the false positive rate will be higher than if we barcoded. However, the follow-up through the pools is potentially fairly substantial.

        • #5
          If you are willing to make 40 pooled libraries, would you be willing to make 80?
          If you put each sample into two internally anonymous libraries - and sequence at sufficient depth - then you will be able to determine which sample near-unique variations came from. It'll probably work okay for rare mutations, although the more common they are, the more follow-up work is required.

          • #6
            I agree with Loris. A good way is to make 2 times the libraries in a row/column pool fashion. We used to do this with 'overgos' (ref: https://www.ncbi.nlm.nih.gov/project...chOvergo.shtml) and the same idea should be applicable to any sequencing project.
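
As a rough sketch of the row/column idea (the grid layout, sample names, and helper functions here are hypothetical, not taken from the overgo protocol or from Syzygy): each sample goes into exactly one row pool and one column pool, and a variant is traced back to the samples sitting at the intersections of the pools in which it was called.

```python
def build_grid_pools(samples, n_cols):
    """Assign each sample to one row pool and one column pool of a grid."""
    return {
        sample: (f"R{idx // n_cols}", f"C{idx % n_cols}")
        for idx, sample in enumerate(samples)
    }


def candidate_carriers(assignment, positive_pools):
    """Samples whose row pool AND column pool both showed the variant."""
    positive = set(positive_pools)
    return {s for s, (row, col) in assignment.items() if row in positive and col in positive}


# Hypothetical example: 25 samples in a 5 x 5 grid, i.e. 10 pooled libraries.
samples = [f"S{i}" for i in range(25)]
assignment = build_grid_pools(samples, n_cols=5)

# A singleton carried only by S7 shows up in exactly one row and one column pool.
singleton = candidate_carriers(assignment, assignment["S7"])

# A variant carried by both S7 and S13 lights up two rows and two columns,
# so four grid positions are consistent with the calls.
doubleton = candidate_carriers(assignment, assignment["S7"] + assignment["S13"])

print(sorted(singleton))
print(sorted(doubleton))
```

The second case shows why this works best for rare variants: every extra carrier adds ambiguous intersections that need individual follow-up.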

            Also I agree with henry.wood in that 2% is getting very close to the noise level. In theory with enough sequencing depth we should be able to detect variants below 1% but in practice I find this hard to accomplish as per the spiked controls we have used.

            • #7
              WRT pooling, you might also look at DNA Sudoku.

              As per the comments above, you could also see this as an optimization problem -- what is the smallest number of pooled libraries that gives acceptable sensitivity, accepting some loss of the ability to precisely localize a variant in the first run (i.e. instead of 1 pooled anonymous library, what about 2 each with half the samples, 4 each with 1/4, etc.)?
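
One way to look at that trade-off is a small table of pool size against library count and required depth. This is only a sketch under stated assumptions: equimolar pooling, perfectly even coverage, no sequencing error, and a hypothetical rule that a singleton het counts as detectable when at least 5 variant reads are seen with 95% probability.

```python
from math import ceil, comb


def libraries_needed(n_samples: int, pool_size: int) -> int:
    """Number of pooled libraries required to cover all samples once."""
    return ceil(n_samples / pool_size)


def min_depth(pool_size: int, min_alt_reads: int = 5, power: float = 0.95,
              max_depth: int = 20000):
    """Smallest depth at which a singleton het (read fraction 1 / (2 * pool_size))
    yields at least `min_alt_reads` variant reads with probability >= `power`.
    Returns None if no depth up to `max_depth` is enough."""
    p = 1.0 / (2 * pool_size)
    for depth in range(min_alt_reads, max_depth + 1):
        # Probability of seeing fewer than min_alt_reads variant reads.
        miss = sum(comb(depth, i) * p**i * (1 - p) ** (depth - i)
                   for i in range(min_alt_reads))
        if 1.0 - miss >= power:
            return depth
    return None


N_SAMPLES = 1000  # cohort size from the original question
for pool_size in (5, 10, 25, 50):
    print(pool_size, libraries_needed(N_SAMPLES, pool_size), min_depth(pool_size))
```

Smaller pools cost more libraries but need less depth per pool; real error rates and uneven capture would push the required depth higher than this idealized estimate.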

              If you haven't run this SureSelect design yet, beware that you may get uneven coverage -- some regions will capture much more efficiently than others, which further complicates designing in the right sensitivity. Also, I believe Agilent still recommends capturing each library separately, though you will certainly find folks here discussing capturing pooled libraries.

              • #8
                Thanks all - this is extremely helpful.

                • #9
                  Hello Snapper,

                  I developed Syzygy while at the Broad Institute. Syzygy performs well with 25 individuals per pool. In fact, we have several small targeted experiments that we designed with 50 individuals per pool (100 chromosomes) across 10 pools. We observe a high validation rate (~90%) for all variants, singletons and above. You can get more information about Syzygy from


                  We are currently optimizing Syzygy to deal with larger target sizes. The intended target size for its applications was approximately 60-100 kb.

                  Best Regards,
                  Manuel Rivas

                  • #10
                    Hello Manuel Rivas,

                    I have a pooled experiment with target size of ~803 kb.
                    Can I use Syzygy?

                    If not, does anyone have suggestions for what tool to use to call the SNPs from a pooled run (10 individuals in one Illumina run)?

                    • #11
                      check out http://genomebiology.com/2011/12/1/R1/abstract in the latest Genome Biology.

                      A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries

                      • #12
                        Originally posted by gfmgfm
                        Hello Manuel Rivas,

                        I have a pooled experiment with target size of ~803 kb.
                        Can I use Syzygy?

                        If not, does anyone have suggestions for what tool to use to call the SNPs from a pooled run (10 individuals in one Illumina run)?
                        Yes, you can use Syzygy. I am uploading an optimized version of Syzygy in the next couple of days; it should handle an 800 kb target without a problem. You can send an e-mail to [email protected]



                        Is the Software's website.

                        Best Regards,
                        Manuel

                        • #13
                          The current version handles 800 kb target size without a problem.

                          • #14
                            Great.
                            Thanks!

                            • #15
                              For calling variants from pooled sequencing data, you can also try CRISP, a method specifically designed to detect variants using sequence reads from multiple pools (each with a moderate number of individuals). The statistical model behind CRISP is described in this Bioinformatics article: http://bioinformatics.oxfordjournals...i318.full?etoc

                              A Python implementation of CRISP is available here: http://polymorphism.scripps.edu/~vba...oftware/CRISP/
                              A faster and more accurate C implementation is under development and is available on request. We have used CRISP to call variants (both SNPs and short indels) from pooled sequencing of ~600 kb of DNA (captured using Agilent SureSelect) from 100 individuals, using 5 pools of 20 each. The false discovery rate for detecting SNPs on this dataset was ~1%.
