  • Protocol for sequencing of pooled subjects

    Dear All,

    I would like to ask for some help with the design of a sequencing project. The aim is to sequence the exons of 20 genes in 1000 subjects. I've looked through the forum and did see some threads that have this or that bit of information, but none of them gives a clear pipeline or design for a similar project. I do apologize for being very basic, though.

    So far, this is what I've got: I am planning on pooling 20 subjects and doing a PCR amplification of the exons, then combining the PCR products and multiplex sequencing them on an Illumina HiSeq 2000. This is a very broad outline and I've got no idea about the details. I assume I first need to design primers to capture all of the exons, but what size should my products be?
    For example, say I need 350 products to capture all of the exons; then I would run 350 x 50 reactions to get the products (50 because the 1000 subjects are divided into pools of 20). So far so good... then what do I do? My understanding is that I can pool the PCR products again, so I can combine all 350 products from one pool of 20 subjects and then label it for sequencing? So, there will be 50 sequencing reads?

    Again, I am sorry if this is very basic, but I've got limited options for support and am really confused at the moment...

    many thanks for your time and kind help!

  • #2
    So I'm biased towards using an algorithm called SPLINTER as I know the people who created it. Feel free to look it up and consider if you want to use it (you can do pool sizes of hundreds, not just 20, but it requires a positive and negative control).

    That said, regardless of the analysis methodology you end up choosing, you will probably want to do the pooling as you said, do the PCR as you said, then purify all of the amplicons and quantify them. Then, pool them equimolarly (i.e., with a pool of 20 and 350 reactions, pool equal moles of each of the 350 amplicons for that set of 20 together). Then, you will probably want to ligate all of the products together and sonicate them to a fragment size of, say, 300-500 bp. The point of this is that if you have some amplicons that are, say, 700 bp long and you are using Illumina paired-end sequencing, you will sequence from the ends of the amplicons and never reach the middle (with 2x101 bp reads you would have around 500 bp of uncaptured sequence).
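The "uncaptured middle" point can be checked with a few lines of arithmetic. This is only a minimal sketch: it assumes reads start exactly at each amplicon end and ignores adapters, primers and quality trimming, and the function name `uncovered_middle` is made up for illustration.

```python
def uncovered_middle(amplicon_bp: int, read_bp: int = 101) -> int:
    """Bases in the centre of an amplicon that a 2 x read_bp paired-end run never reaches."""
    return max(0, amplicon_bp - 2 * read_bp)


# Compare a few amplicon sizes for 2x101 bp paired-end reads.
for size in (150, 200, 350, 700):
    print(f"{size} bp amplicon: {uncovered_middle(size)} bp never sequenced")
```

Under these assumptions, anything up to about 200 bp is read end to end, while a 700 bp amplicon leaves roughly 500 bp in the middle unsequenced unless it is fragmented first.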

    As for the size of the amplicon, the smaller the better in the sense that you will waste less sequencing on the intronic regions. That said, you do want some sort of flanking regions around the exons to take away boundary effects if you do the ligation/sonication I said above (because some reads will span the boundaries of two exons and these will not align unless you have an aligner specific for this purpose).

    Part of your approach depends on the manpower you have to do this. 350 x 50 = 17,500 PCR reactions, each of which you'll want to purify and quantify. With SPLINTER you can do pool sizes of hundreds of individuals and cut down the workload dramatically.
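To see how the workload scales with pool size, here is a rough sketch of the reaction count. It assumes one PCR per amplicon per pool and does not count the positive and negative controls that SPLINTER requires; `pcr_workload` is a hypothetical helper name.

```python
import math


def pcr_workload(subjects: int, pool_size: int, amplicons: int) -> tuple[int, int]:
    """Return (number of pools, number of PCR reactions) for a pooled design."""
    pools = math.ceil(subjects / pool_size)
    return pools, pools * amplicons


# 1000 subjects, 350 amplicons, with a few candidate pool sizes.
for pool_size in (20, 50, 100, 200):
    pools, reactions = pcr_workload(1000, pool_size, 350)
    print(f"pool size {pool_size}: {pools} pools, {reactions} reactions")
```

With pools of 20 this gives 50 pools and 17,500 reactions; with pools of 200 it drops to 5 pools and 1,750 reactions.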

    Just some thoughts, I'm sure others will chime in with other useful ideas.


    • #3
      Hi Heisman,

      Many thanks for your reply and ideas!
      I had a look at SPLINTER and the problem is that it seems one needs to be able to perform DNA fragmentation (I think the authors used a Diagenode Bioruptor), which I don't think my lab can do.

      I understand that a product of more than 200 bp will be hard (if not impossible) to sequence, so I was thinking of having products of 100-150 bp in size to cover all of the exons and the immediate exon/intron sequence. The question is: do I need the products to overlap? If so, by how many bp?

      Thank you so much for confirming the steps of PCR pooling and PCR product combining. The point where I get confused is how to sequence the combined PCR products with the Illumina HiSeq. My understanding is that one can label the products and then run several products on one flow cell, but if the PCR product needs to be about 200 bp in length for good sequencing, then how would combining several products of 200 bp each work?

      I had the impression that I should combine all 350 products of the pooled 20 subjects, label them as one pool and then combine it with another 10 such pools, so there will be 11 reads on one cell... but I just don't get how it is going to work...

      Many thanks for your time and help!


      • #4
        Any type of fragmentation would work, if doable. If not, so be it.
        With Illumina library preps you can put indexes on each fragment. You can also put barcodes on each fragment. So, you will create a library with a mixture of PCR products from the same 20 people, and each fragment will be indexed/barcoded. Then, you will repeat with another group of 20 and use a different index/barcode. Then you can pool these libraries (not reads) together and sequence on the flowcell.
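To make the index idea concrete, here is a toy sketch of how reads from several indexed libraries sequenced together could be sorted back into their pools afterwards. The 6-nt index sequences, the pool names and the `demultiplex` function are illustrative assumptions only; in practice this step is done by the sequencer's own software, usually with some tolerance for mismatches.

```python
from collections import defaultdict

# Illustrative index-to-pool assignment; a real kit supplies its own index list.
INDEX_TO_POOL = {
    "ATCACG": "pool_01",
    "CGATGT": "pool_02",
    "TTAGGC": "pool_03",
}


def demultiplex(reads, index_to_pool):
    """Group reads by pool using an exact match on the index sequence.

    reads: iterable of (index_sequence, insert_sequence) tuples.
    Reads whose index is not in the table go to "undetermined".
    """
    by_pool = defaultdict(list)
    for index_seq, insert_seq in reads:
        pool = index_to_pool.get(index_seq, "undetermined")
        by_pool[pool].append(insert_seq)
    return dict(by_pool)


# Four made-up reads from a mixed lane.
example_reads = [
    ("ATCACG", "AAAA"),
    ("CGATGT", "CCCC"),
    ("ATCACG", "GGGG"),
    ("NNNNNN", "TTTT"),
]
for pool, inserts in sorted(demultiplex(example_reads, INDEX_TO_POOL).items()):
    print(pool, len(inserts))
```

Each pool of 20 people gets one index, so 50 pools need 50 distinct indexes, and every read can be traced back to its pool but not to an individual.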

        If you are going to have overlapping amplicons, the one big potential problem will be sequencing the PCR primers themselves. If the primers from one amplicon overlap sequence from a different amplicon, it will be tricky to get out just the biological sequence that you desire. It can be done but it'll take a bit of computational work. I can draw out a paint image to explain if desired.


        • #5
          Thank you so much!
          The picture does get clearer and I don't need to do overlapping products... but now I am lost in calculations:
          so, to get 1000 subjects with pooling 20 people per pool, I will have 50 pools in total;
          then, I will do the PCR to amplify the exons of 20 genes and get 350 products (it will be 350 plates with 50 "subjects" per plate);
          then, I take 350 products from one pool and label each of them (350 labels);
          then, I do the same for another pool and so on, so it will be 350*50 labels;

          OR can I combine all the products of one gene for one pool and then label it? so, it will be 20 genes for 50 pools = 20*50 = 1000 labels?

          then, I can combine the labelled products for a read - my understanding is that one flowcell can take up to 96 subjects and since I've got 50 pools does this mean I can read everything at once? I think it cannot be right... maybe it is 96 subjects for a 200 bp product? in which case there would be 350 reads? I am sorry, I get totally confused here...

          Many thanks for your time and help!


          • #6
            You are using the word "reads" when you should be using the word "libraries" or "pools".

            You can pool everything and then label. So, 1000 labels total. The flowcell can take as many unique labels as you have. You can sequence everything on one lane, but it will be useful to calculate how many reads you theoretically need to get good coverage for your entire library.


            • #7
              Thank you once again! and sorry for using the wrong term...

              so, I've got as far as having my products labelled... now, as you've suggested, I need to figure out how many reads I need to get good coverage for the library... if it is not too much to ask, could you please help me and let me know how to start?

              Many thanks for your time!


              • #8
                Let's say you have 350 products at 200 bp each. That is 70,000 bp total that you want to sequence. 20 people per pool means you have 40 alleles per pool. That is 2.8 million bp total. If you want, say, 20x coverage per allele, that would be 56 million bp total. With 50 pools, that is 2.8 billion bp total. This is assuming perfectly even coverage across all amplicons in all pools. In reality there will be some type of distribution, and so to be more on the safe side you probably want double this amount of coverage, so 5.6 billion bp total. If you use the HiSeq with 2x101 bp paired-end reads (and all of your amplicons are 200 bp or greater), that would be roughly 200 bp per read pair, so you will need 28 million reads (read pairs).

                Work through that and make sure it's right as I just typed it off the top of my head.
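The same arithmetic as a short script, so the inputs can be changed and the result re-checked. It is a sketch under the assumptions stated above (perfectly even coverage before the safety factor, and each read pair contributing about 200 bp); the function and parameter names are made up for illustration.

```python
def reads_needed(
    amplicons: int = 350,
    amplicon_bp: int = 200,
    subjects_per_pool: int = 20,
    pools: int = 50,
    coverage_per_allele: int = 20,
    safety_factor: int = 2,
    bp_per_read_pair: int = 200,
) -> tuple[int, int]:
    """Return (total bp to sequence, read pairs required)."""
    target_bp = amplicons * amplicon_bp                        # 70,000 bp of target
    alleles = 2 * subjects_per_pool                            # 40 alleles per pool
    bp_per_pool = target_bp * alleles * coverage_per_allele    # 56 million bp per pool
    total_bp = bp_per_pool * pools * safety_factor             # 5.6 billion bp overall
    read_pairs = -(-total_bp // bp_per_read_pair)              # ceiling division
    return total_bp, read_pairs


total_bp, read_pairs = reads_needed()
print(f"{total_bp:,} bp total, {read_pairs:,} read pairs")
```

With the default values this reproduces the figures in the post: 5.6 billion bp and 28 million read pairs.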


                • #9
                  That is great help!!!! Thank you so much!!!

                  I think my problem is that I confuse read and lane (as you've pointed out as well). I was told that one lane is 4 million reads and I couldn't get the difference, but with your explanation I think it is becoming clearer...

                  I went through your numbers and got the same result = 28 million reads for 40x coverage... this would mean 7 lanes then...

                  I think my project is now more or less clear for me. Thank you so much for your help! I do appreciate it and your explanations really did help me!

                  Once I am done with all that, I will come back to ask about the analysis pipeline ;-)

                  Thank you once again!!!!


                  • #10
                    With the HiSeq you should get way over 28 million reads (with the new chemistry we're getting 150-200 million reads routinely).
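How many lanes the 28 million reads translate into depends entirely on the per-lane yield, so it is worth plugging in both figures mentioned in the thread (4 million and 150-200 million reads per lane). A minimal sketch; `lanes_needed` is a hypothetical helper, and the real yield should come from your own sequencing facility.

```python
import math


def lanes_needed(reads_required: int, reads_per_lane: int) -> int:
    """Whole lanes needed to reach the required number of reads."""
    return math.ceil(reads_required / reads_per_lane)


required = 28_000_000
for per_lane in (4_000_000, 150_000_000, 200_000_000):
    lanes = lanes_needed(required, per_lane)
    share = required / per_lane
    print(f"{per_lane:,} reads/lane: {lanes} lane(s), {share:.2f} of one lane's output")
```

At 4 million reads per lane the project needs 7 lanes; at 150 million or more it fits in a fraction of a single lane.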


                    • #11
                      mmmmhhhhhhh.... I will doublecheck how many reads we've got from one lane...

                      Many thanks and have a wonderful evening!
