  • Illumina barcoding methods: two vs three primer sequencing

    Hi all,
    I'd love some help sorting this out, as I'm new to this field. I want to start making 16S amplicon libraries for illumina sequencing. As this is pretty new, there aren't super well-established protocols yet, and I'm trying to pick through what's out there and come up with the easiest, most effective, most cost-conscious method.

    Caporaso et al, PNAS 2011 (here) use a three-primer system, which I guess is what Illumina's expensive 12-maximum indexing system uses, as well as the Nextera system. My home core is nervous about the 3-primer system, but could be convinced. I'm worried about the increased bioinformatic difficulty (is there any?), concerns I've seen here on seqanswers about library quality, and general increased complexity of this approach.

    Bartram et al in AEM 2011 (here) use a different, seemingly simpler strategy with barcodes in line with the sequencing primer and illumina adapters. I think this ends up with them sequencing through the primer, which "wastes" precious illumina read length, but if it ends up being simpler, I'm intrigued by it. I don't understand, though, how the barcode gets read, as it seems to me to be on the wrong side of the illumina sequencing primer?

    here's the description: Lowercase letters denote adapter sequences necessary for binding to the flow cell, underlined lowercase are binding sites for the Illumina sequencing primers, bold uppercase highlight the index sequences (the first 12 indexes were obtained from Illumina) and regular uppercase are the V3 region primers (341F for the forward primers and 518R for the reverse primers).

    and the sequences:
    V3_f_modified
    aatgatacggcgaccaccgagatctacactctttccctacacgacgctcttccgatctNNNNCCTACGGGAGGCAGCAG

    an example of a barcoded reverse:
    aagcagaagacggcatacgagatCGTGATgtgactggagttcagacgtgtgctcttccgatctATTACCGCGGCTGCTGG
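
The layout described above can be made visible by splitting each primer on letter case. This is only a minimal sketch: the function name `split_by_case` is made up for illustration, and because the underlining and bold from the paper are lost in plain text, the flow-cell adapter and the sequencing-primer site come out as a single lowercase run, and the NNNN spacer stays attached to the uppercase V3 primer.

```python
from itertools import groupby

def split_by_case(primer):
    """Split a primer string into consecutive runs of lowercase and uppercase letters."""
    return ["".join(run) for _, run in groupby(primer, key=str.islower)]

forward = "aatgatacggcgaccaccgagatctacactctttccctacacgacgctcttccgatctNNNNCCTACGGGAGGCAGCAG"
reverse = "aagcagaagacggcatacgagatCGTGATgtgactggagttcagacgtgtgctcttccgatctATTACCGCGGCTGCTGG"

for name, primer in (("V3_f_modified", forward), ("barcoded reverse", reverse)):
    print(name)
    for segment in split_by_case(primer):
        # Lowercase runs are adapter / sequencing-primer sites,
        # uppercase runs are the index, the N spacer, or the V3 primer.
        print(f"  {len(segment):3d} nt  {segment}")
```

For the reverse primer this gives four runs: flow-cell adapter, the 6-nt index (CGTGAT), the sequencing-primer site, and the 518R primer.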


    Final questions:
    - what sort of modifications do you need to order on these primers? phosphate, phosphorothioate, ...?
    - the second paper mentions the improvement when using the four N's, so as to improve the complexity and help with cluster generation. Does anyone have experience with this?
    - any other practical help would be really appreciated!

  • #2
    nextflex

    Save yourself the hassle and check out the NEXTflex DNA Barcodes


    • #3
      Nanelle,

      I looked at this problem in depth recently, including studying both of the papers you cite. I'm associated with a sequencing core, so we would not be producing the amplicons ourselves; we were consulting with one of our researchers. First of all, both of these methods use a three-read methodology:

      Read 1 is the forward read; its sequencing primer anneals to the underlined region in the V3_F(modified) PCR primer.

      Read 2 is the index read; its sequencing primer anneals to the underlined region in the V3_nR primer, on the same strand as the read 1 primer.

      The clusters are then "regenerated" so that they now contain the complementary strand.

      Read 3 is the reverse read; sequencing primer here anneals to the same location as the index primer but it is the reverse complement of the index primer so it primes synthesis in the opposite direction.

      The significant difference between these two methods is the sequencing primers used. Bartram uses the standard Illumina sequencing primers, whose sites are incorporated into the middle of the PCR primers. This means that you have to sequence over the portion of the 16S target introduced by your PCR primer. This part of the read cannot legitimately be used for analysis, so it would need to be trimmed (i.e. wasted sequencing). The Caporaso method uses custom sequencing primers which overlap the 16S-specific portion of the PCR primers, so you are reading usable sequence from base 1.
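
To make the trimming cost concrete, here is a rough sketch of stripping the primer-derived bases from read 1 in a Bartram-style design. It assumes the read starts with the four random N positions followed by the 17-nt 341F primer from the original post. The function names and the mismatch tolerance are my own illustrative choices, not anything taken from either paper.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def trim_read1(seq, qual, n_random=4, primer="CCTACGGGAGGCAGCAG", max_mismatch=2):
    """Remove the random N positions and the primer-derived bases from read 1.

    Returns (trimmed_seq, trimmed_qual), or None if the read is too short or
    does not start with something resembling the expected primer.
    """
    end = n_random + len(primer)
    if len(seq) < end:
        return None
    if hamming(seq[n_random:end], primer) > max_mismatch:
        return None
    return seq[end:], qual[end:]

# Made-up read: 4 random bases + 341F primer + 12 bases of real 16S sequence.
read = "GATC" + "CCTACGGGAGGCAGCAG" + "TGGGGAATATTG"
qual = "I" * len(read)
print(trim_read1(read, qual))
```

With these numbers, 21 cycles of read 1 are spent before the first usable base, which is the "wasted" read length being discussed.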

      The purpose of the four N's is not to improve cluster generation but to improve cluster identification by the image analysis software. The cluster identification algorithms require a reasonable degree of base diversity to properly identify the cluster locations. This is not normally a problem with genomic or RNA sequence, since the fragments are a random collection. With amplicon sequencing, all of your clusters will have (nearly) identical sequence. The N's are meant to provide a random base distribution so the cluster identification can be done. The cluster identification routines only consider the initial few cycles of read 1. It used to be 4 cycles but I believe it is now 5, so if you use this method you might want to add another N.
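
A toy simulation of the diversity argument, for illustration only. It assumes the N positions are synthesized as a perfectly even A/C/G/T mix (real oligo synthesis is not exactly even) and that every amplicon read otherwise starts with the same 341F primer. The function name is hypothetical.

```python
import random
from collections import Counter

def max_base_fraction_per_cycle(reads, n_cycles):
    """For each of the first n_cycles, return the fraction of reads sharing the most common base."""
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads)
        fractions.append(counts.most_common(1)[0][1] / len(reads))
    return fractions

rng = random.Random(0)  # fixed seed so the sketch is repeatable
primer = "CCTACGGGAGGCAGCAG"

# Without N's every cluster shows the same base at every cycle.
plain = [primer for _ in range(1000)]
# With four leading N's the first four cycles look random.
with_ns = ["".join(rng.choice("ACGT") for _ in range(4)) + primer for _ in range(1000)]

print("no N's  :", max_base_fraction_per_cycle(plain, 4))
print("with N's:", max_base_fraction_per_cycle(with_ns, 4))
```

Without the N's the most common base accounts for 100% of clusters in each early cycle; with them it should sit near 25%.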

      I don't believe any chemical modifications are required for these primers. I don't recall Caporaso or Bartram describing any modifications.

      I'm not sure why your core would be worried about the 3 read strategy. If they are using the Illumina TruSeq library system they should be familiar with it.

      As far as bioinformatic difficulty goes, there really isn't any to speak of. The current version of the Illumina downstream software easily sorts reads by barcode when using the three-read strategy and produces sample-specific, paired, compressed FASTQ files.
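
For a sense of what sorting by barcode involves, here is a minimal sketch of the idea. It is not the Illumina software, and the sample names, the one-mismatch tolerance and the second index are illustrative assumptions. It also assumes the standard layout in which the index read reports the reverse complement of the index as written in the reverse PCR primer (CGTGAT in the primer reads out as ATCACG).

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def demultiplex(records, index_to_sample, max_mismatch=1):
    """Group (index_read, read1, read2) records by sample.

    An index read is assigned only if exactly one known index lies within
    max_mismatch of it; everything else goes to 'undetermined'.
    """
    bins = defaultdict(list)
    for index_read, read1, read2 in records:
        hits = [sample for index, sample in index_to_sample.items()
                if len(index) == len(index_read)
                and sum(a != b for a, b in zip(index, index_read)) <= max_mismatch]
        sample = hits[0] if len(hits) == 1 else "undetermined"
        bins[sample].append((read1, read2))
    return dict(bins)

# Keys are the indexes as they are expected to appear in the index read.
index_to_sample = {
    reverse_complement("CGTGAT"): "sample_1",  # index from the example reverse primer
    reverse_complement("ACATCG"): "sample_2",  # illustrative second index
}

records = [
    ("ATCACG", "AAA", "TTT"),  # exact match to sample_1
    ("ATCACC", "CCC", "GGG"),  # one mismatch, still sample_1
    ("GGGGGG", "ACG", "CGT"),  # matches nothing
]
print(demultiplex(records, index_to_sample))
```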


      • #4
        Hey, thank you very much! I really really appreciate it. So am I right in that if I order and prep the samples using the barcoded PCR primers and provide the three sequencing primers, it should sort itself out (assuming the facility manager is on board)? In the Caporaso method, they don't use Ns - but it'd probably help things to include the 4/5 N's on the end of the forward primer, no?

        Do you have any other hints for me ...


        • #5
          Nanelle,

          The Caporaso et al. design precludes inclusion of N's following the sequencing primer start site, since they use custom sequencing primers whose 3' ends coincide with the 3' ends of their PCR primers. If you put random N's at the end of the PCR primer it would no longer work. Bartram et al. are able to do this because the 3' end of the sequencing primer is held far back from the 3' end of the PCR primer; thus they can insert the N's in the middle of their PCR primers, where they will not affect target annealing.

          If you decide to use a design strategy like that of Bartram et al., which relies on the standard Illumina PE and multiplex sequencing primers, you would not need to provide the sequencing primers yourself. They are included with the cluster generation kit.

          The most important thing is to make sure that the sequencing core fully understands the experimental design and intent, so they can make any adjustments necessary and there are no surprises down the road. For example, the barcode tags in the Caporaso paper are 12 nt long, twice as long as the standard Illumina barcodes. The core facility will need to adjust the sequencing recipe to account for this. This line is tossed into the methods section of the Caporaso paper:

          Because of technical limitations at the sequencing facility, only part of the barcode was sequenced, so we were unable to exploit the error-correcting properties fully;...
          I wonder if there was a miscommunication with the facility and they did not realize the barcodes were 12nt long, not 6.
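
A small sketch of why reading only part of a barcode costs error-correcting power. The three 12-nt barcodes below are made up for illustration and are not the barcodes from the Caporaso paper. Which part of the barcode was actually sequenced is not stated in the quote, so reading only the first 6 bases is purely an assumption here.

```python
from itertools import combinations

def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))

def correctable_errors(min_distance):
    """Substitution errors guaranteed correctable for a given minimum distance."""
    return (min_distance - 1) // 2

# Hypothetical 12-nt barcodes, for illustration only.
barcodes = ["ACGTACGTACGT", "ACGTTCGAACCA", "TGGTACCAGTCA"]
truncated = [b[:6] for b in barcodes]  # assume only the first 6 nt are read

for label, codes in (("full 12 nt", barcodes), ("first 6 nt", truncated)):
    d = min_pairwise_distance(codes)
    print(f"{label}: min distance {d}, correctable errors {correctable_errors(d)}")
```

In this toy set the full barcodes tolerate one sequencing error, while the truncated ones tolerate none.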


          • #6
            Just thought of one additional point I wanted to make.

            If you are not using a strategy which allows you to insert random bases at the beginning of read 1 there are a couple of things you could do to improve the ability of the image analysis software to identify the clusters. (You could consider using either or both of these).

            1. Spike in some non-amplicon DNA library. The Illumina phiX control DNA would work fine. For this to be effective, though, it would have to be present at an appreciable percentage of the sample. I have no knowledge of what that may be, but I'd guess at least 10%. Another issue with this method is the sequencing and index primers. If the design of your amplicons calls for custom sequencing and index primers, you would need to mix the standard primers in as well to generate sequence from the spiked-in library.

            2. Use a lower cluster density than normal, perhaps 50% of normal. If the clusters are well separated the software will be better able to identify them even if it doesn't have a good cross-talk matrix.
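
Back-of-the-envelope arithmetic for the spike-in option, as a sketch under an idealised model: every amplicon cluster shows the same base at a given cycle, and the spiked-in library is an even A/C/G/T mix at that cycle. The function name is made up, and the model says nothing about what fraction the image analysis software actually needs.

```python
def dominant_base_fraction(spike_fraction):
    """Expected fraction of clusters showing the amplicon's base at a given cycle.

    Assumes all amplicon clusters agree and the spike-in contributes each of
    A/C/G/T equally, so a quarter of the spike-in matches the amplicon base.
    """
    if not 0.0 <= spike_fraction <= 1.0:
        raise ValueError("spike_fraction must be between 0 and 1")
    return (1.0 - spike_fraction) + spike_fraction / 4.0

for f in (0.0, 0.10, 0.25, 0.50):
    print(f"{f:.0%} spike-in -> {dominant_base_fraction(f):.1%} of clusters share one base")
```

Under this model a 10% spike-in still leaves about 92.5% of clusters showing the same base per cycle, which is one way to see why a small spike-in may not be enough on its own.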


            • #7
              Additional comment--spiking in PhiX control DNA (~200 bp) should work if the amplicon is of comparable size. We've used 10% successfully. It may not work if the product is 330 bp as in the Bartram paper. We end-sequenced an amplicon of >850 bp and spiked in someone else's random genomic library (~800 bp) at 25%--didn't need all those amplicon reads.

              I had thought of the NNNN approach too and it was nice to hear that it worked.


              • #8
                Originally posted by advanT
                Save yourself the hassle and check out the NEXTflex DNA Barcodes
                Not everyone wants to ligate on adapters. That's a pretty big hassle itself.
