  • Do extra PCR cycles really increase errors?

    So the dogma goes:

    Use as few PCR cycles during library construction as possible.

    One can imagine various rationales behind this one.
    1. Any bias in the PCR process becomes more pronounced as more cycles are performed.
    2. If your library is very small (e.g., 1 million amplicons) then PCR amplifying it to 1 billion amplicons serves little purpose. You will just end up sequencing those original 1 million amplicons an average of 1000 times each.
    3. A PCR polymerase has an inherent error rate. The more product strands created, the more errors introduced.


    But is #3 even true? Before delving into it, I would like to exclude issues having to do with overrunning the supply of reactants in the PCR. If the amount of final product approaches the total supply of dNTPs in the reaction, I can easily imagine higher levels of misincorporation.

    By "errors" here, we mean errors per total bases, right? If the PCR polymerase used is like Taq polymerase it likely has an error rate of about 1 in 10,000 bases polymerized.



    Templates:

    One thousand 100 base amplicons.

    The question: will the error rate per amplicon be higher after 20 cycles than it was after 10 cycles?




    Again, presuming reactants are not limiting and each cycle is 100% efficient -- that is, doubling the number of amplicons each cycle: After 10 cycles there will be 1 million (2^10 * 1000) amplicons (~0.1 pg of DNA). After 20 cycles there will be 1 billion (2^20 * 1000) amplicons (~0.1 ng of DNA).
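
    A quick back-of-the-envelope check of those numbers in Python (a rough sketch, assuming double-stranded 100 bp amplicons at ~650 g/mol per base pair):

        # Amplicon counts and masses after n perfect doublings.
        AVOGADRO = 6.022e23
        BP_MASS = 650.0        # g/mol per base pair of double-stranded DNA
        amplicon_len = 100     # bp
        start = 1000           # starting templates

        for cycles in (10, 20):
            count = start * 2 ** cycles
            grams = count * amplicon_len * BP_MASS / AVOGADRO
            print(f"{cycles} cycles: {count:.2e} amplicons, {grams * 1e12:.2f} pg")
        # 10 cycles: ~1e6 amplicons, ~0.1 pg; 20 cycles: ~1e9 amplicons, ~0.1 ng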

    Doesn't intuition tell us that after 10 additional cycles the errors will have compounded and the overall error rate we might detect by sequencing 1000 of the amplicons will have increased?

    Ignoring indels, I don't see that this is the case. Sure, Taq polymerase would tend to misincorporate a base in 1% of the product strands that it creates, and that erroneous template will then be amplified each cycle. But the 99% of product strands that did not contain an error will also be amplified each cycle. Meaning, your error rate in your product strands will just be the error rate of the polymerase -- 1 in 10,000 bases, 1% of the amplicons in this case.

    Don't get me wrong: I have sequenced PCR products. The error rate is way higher than 1 in 10,000 bases.

    So what is wrong with my logic?

    --
    Phillip

  • #2
    So I'm rather new to bioinformatics, and I have little choice other than to go by what I'm taught and told. One of our lab veterans once expressed his opinion on PCR prior to sequencing: he said that back in the days when he was using "Taq version 0.1" he hardly saw any bias, and that he didn't expect PCR to cause much trouble now, even though there's much discussion on the subject of PCR duplicate reads and errors introduced by PCR.

    And that is why I'd be interested to read reactions to Phillip's post.



    • #3
      Hi Phillip,

      The problem arises because the errors are cumulative. Once you generate a mutation, it essentially becomes fixed at that fraction of the population since it's now template for all subsequent rounds of amplification. And every round of amplification generates additional errors, so the fraction of mutant molecules is always increasing.

      A quick probability calculation should make this clear. Consider a single 100bp amplicon. The likelihood that a single base replicates faithfully is 1-0.0001 (the error rate) = 0.9999. The likelihood that the entire amplicon is correct is 0.9999^100 (the amplicon size) = .99. After 10 rounds of PCR, the likelihood of being error-free is .99^10 = .90 (still pretty good odds), but after 20 cycles that drops to .99^20 = .82. The remainder will contain one or more errors.
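
      The same arithmetic in a few lines of Python (a sketch that assumes the 1-in-10,000 per-base error rate above and that every molecule is copied once per cycle):

          # Probability that an amplicon is still error-free after n cycles.
          per_base_error = 1e-4      # Taq-like error rate
          amplicon_len = 100

          per_copy_ok = (1 - per_base_error) ** amplicon_len   # ~0.99
          for cycles in (10, 20):
              print(cycles, "cycles:", round(per_copy_ok ** cycles, 3))
          # 10 cycles: ~0.905 error-free; 20 cycles: ~0.819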

      -Harold



      • #4
        Originally posted by Bruins View Post
        So I'm rather new to bioinformatics, and I have little choice other than to go by what I'm taught and told. One of our lab veterans once expressed his opinion on PCR prior to sequencing: he said that back in the days when he was using "Taq version 0.1" he hardly saw any bias, and that he didn't expect PCR to cause much trouble now, even though there's much discussion on the subject of PCR duplicate reads and errors introduced by PCR.

        And that is why I'd be interested to read reactions to Phillip's post.
        Bias against high-GC and low-GC sequences is seen in 2nd-gen sequencing as well as in other PCR-based applications such as gene synthesis. Genomic PCR tends to behave worse in high-GC regions, though I don't have real stats for that (but have seen it frequently). A number of papers have used low numbers of PCR cycles for library construction, though I don't know off-hand how carefully they looked at GC bias.



        • #5
          I wrote:

          Originally posted by pmiguel View Post
          Sure, Taq polymerase would tend to misincorporate a base in 1% of the product strands that it creates, and that erroneous template will then be amplified each cycle. But the 99% of product strands that did not contain an error will also be amplified each cycle. Meaning, your error rate in your product strands will just be the error rate of the polymerase -- 1 in 10,000 bases, 1% of the amplicons in this case.
          To which Harold replies:
          Originally posted by HESmith View Post
          The problem arises because the errors are cumulative. Once you generate a mutation, it essentially becomes fixed at that fraction of the population since it's now template for all subsequent rounds of amplification. And every round of amplification generates additional errors, so the fraction of mutant molecules is always increasing.
          Thanks Harold. Very clear.
          Using this method, one could calculate a best case scenario for detection of minor variants in pools of samples.
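
          For instance, a rough sketch of the per-position PCR-error background that a minor variant would have to rise above (assuming the Taq-like 1-in-10,000 rate and one copy of every molecule per cycle, so this is an upper bound):

              # Upper bound on the fraction of molecules carrying a PCR-induced
              # error at any one particular position after `cycles` cycles.
              def pcr_background(per_base_error, cycles):
                  return 1 - (1 - per_base_error) ** cycles

              for cycles in (10, 20, 35):
                  bg = pcr_background(1e-4, cycles)
                  print(f"{cycles} cycles: ~{bg:.2%} background at a single position")
              # ~0.10%, ~0.20%, ~0.35% -- a minor allele is only safely callable
              # well above these levels.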

          --
          Phillip



          • #6
            Are these probabilities real?

            Originally posted by HESmith View Post
            Hi Phillip,

            A quick probability calculation should make this clear. Consider a single 100bp amplicon. The likelihood that a single base replicates faithfully is 1-0.0001 (the error rate) = 0.9999. The likelihood that the entire amplicon is correct is 0.9999^100 (the amplicon size) = .99. After 10 rounds of PCR, the likelihood of being error-free is .99^10 = .90 (still pretty good odds), but after 20 cycles that drops to .99^20 = .82. The remainder will contain one or more errors.

            -Harold
            Hi Harold,

            I used your principle above to calculate error rates in a library preparation for sequencing (I am getting a high number of false heterozygous calls and I am trying to investigate why). My calculations are below:

            enzyme error rate: 5.5x10^-6
            base accuracy: 1 - 5.5x10^-6
            likelihood of correct amplicon: accuracy^length (304) = 0.989077
            likelihood of correct amplicon after 35 cycles: 0.680865

            I thought that this error rate was way too high! Can you imagine doing variant calling with such an error rate?

            When I try to calculate the total number of amplicon molecules that do not have an error in my library at the end, I get stuck. I know that the input DNA is 50 ng in the reaction, so using a simple calculation considering 3.3 pg = 1 genome, I should have around 15,100 copies of my target at the beginning of the reaction. Theoretically, the calculations above show me that over 30% of my product will contain an error, not to mention sequencing errors! I know that in practice I don't see that; it is at most about 1% from library prep and sequencing error combined. Where am I mistaken?

            Best,
            Camila.



            • #7
              Originally posted by cientista_carioca View Post
              Hi Harold,

              I used your principle above to calculate error rates in a library preparation for sequencing (I am getting a high number of false heterozygous calls and I am trying to investigate why). My calculations are below:

              enzyme error rate: 5.5x10^-6
              base accuracy: 1 - 5.5x10^-6
              likelihood of correct amplicon: accuracy^length (304) = 0.989077
              likelihood of correct amplicon after 35 cycles: 0.680865

              I thought that this error rate was way too high! Can you imagine doing variant calling with such an error rate?

              When I try to calculate the total number of amplicon molecules that do not have an error in my library at the end, I get stuck. I know that the input DNA is 50 ng in the reaction, so using a simple calculation considering 3.3 pg = 1 genome, I should have around 15,100 copies of my target at the beginning of the reaction. Theoretically, the calculations above show me that over 30% of my product will contain an error, not to mention sequencing errors! I know that in practice I don't see that; it is at most about 1% from library prep and sequencing error combined. Where am I mistaken?

              Best,
              Camila.
              The 0.989 figure is the probability that a copy has no error anywhere over the 304 bases (so the ~32% after 35 cycles is the chance of an error somewhere in the amplicon), whereas when you call variants you are looking at the probability of an error at a single position over 35 cycles, which is only about 35 x 5.5x10^-6, i.e. ~0.02%.
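
              In numbers (same assumptions as above):

                  # Per-position error probability after 35 cycles with a
                  # proofreading enzyme (5.5e-6 per base per cycle).
                  print(1 - (1 - 5.5e-6) ** 35)   # ~1.9e-4, i.e. ~0.02%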

              FWIW.



              • #8
                Originally posted by cientista_carioca View Post
                enzyme error rate: 5.5x10^-6
                base accuracy: 1 - 5.5x10^-6
                likelihood of correct amplicon: accuracy^length (304) = 0.989077
                likelihood of correct amplicon after 35 cycles: 0.680865

                I thought that this error rate was way too high! Can you imagine doing variant calling with such an error rate?

                When I try to calculate the total number of amplicon molecules that do not have an error in my library at the end, I get stuck. I know that the input DNA is 50 ng in the reaction, so using a simple calculation considering 3.3 pg = 1 genome, I should have around 15,100 copies of my target at the beginning of the reaction. Theoretically, the calculations above show me that over 30% of my product will contain an error, not to mention sequencing errors! I know that in practice I don't see that; it is at most about 1% from library prep and sequencing error combined. Where am I mistaken?
                If the error rate is 5.5*10^-6 per base, then the probability of an error-free copy is (1 - 5.5*10^-6)^304, which is about 0.998329. That's significantly better per generation.

                If you run 35 cycles of PCR, you're not going to get 35 doublings of DNA, because 1) each cycle is less than 2-fold even early in the reaction, and 2) reactants run out. Consider that 50 ng * 2^35 ≈ 1.7 kg of DNA.

                If you know the yield of the PCR reaction, you can work out how many actual replications you're getting, on average.
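
                For example, a rough sketch (the 2 ug yield figure here is just an assumed illustration, not a number from this thread):

                    # Effective doublings from input vs. yield, then the expected
                    # error-free fraction per amplicon.
                    import math

                    per_base_error = 5.5e-6
                    amplicon_len = 304
                    input_ng = 50
                    yield_ng = 2000        # hypothetical 2 ug yield

                    doublings = math.log2(yield_ng / input_ng)          # ~5.3, not 35
                    per_copy_ok = (1 - per_base_error) ** amplicon_len  # ~0.9983
                    print(f"~{doublings:.1f} effective doublings, "
                          f"~{per_copy_ok ** doublings:.1%} error-free amplicons")
                    # ~5.3 doublings and ~99.1% error-free -- close to the ~1%
                    # error seen in practice.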



                • #9
                  It's important to note here that with every cycle, only the newly synthesized strands can pick up additional errors. So (correct rate)^(cycles) is not quite accurate; the actual error-free fraction is higher, and increasingly so as the per-cycle duplication efficiency drops below 100%.
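
                  A sketch of that correction (per-cycle efficiency `eff` is the fraction of molecules actually copied each cycle; 100% is shown just for comparison with the naive figure):

                      # Expected error-free fraction when only newly synthesized
                      # strands can pick up errors.
                      def error_free_fraction(per_copy_ok, cycles, eff=1.0):
                          f = 1.0
                          for _ in range(cycles):
                              # existing molecules keep their status; new copies are
                              # correct only if the template was correct and the
                              # copy was faithful
                              f = (f + eff * f * per_copy_ok) / (1 + eff)
                          return f

                      per_copy_ok = (1 - 1e-4) ** 100
                      print(error_free_fraction(per_copy_ok, 20, eff=1.0))  # ~0.905
                      print(per_copy_ok ** 20)                              # naive: ~0.819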



                  • #10
                    Brian is absolutely correct that replication efficiency drops at every cycle as reagents are consumed. If not, then 35 cycles of 50 ng input would yield ~1.7 kg of product :-).

                    However, Phillip's original query asked to exclude the issue of reagent limitation. And the concept underlying Muller's ratchet, although not directly applicable to PCR, explains why the mutational load increases with each round of replication.



                    • #11
                      This makes sense, thanks for all the answers.

