SEQanswers

Old 12-16-2010, 05:11 AM   #1
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Do extra PCR cycles really increase errors?

So the dogma goes:

Use as few PCR cycles during library construction as possible.

One can imagine various rationales behind this one.
  1. Any bias in the PCR process becomes more pronounced as more cycles are performed.
  2. If your library is very small (e.g., 1 million amplicons) then PCR amplifying it to 1 billion amplicons serves little purpose. You will just end up sequencing those original 1 million amplicons an average of 1,000 times each.
  3. A PCR polymerase has an inherent error rate. The more product strands created, the more errors introduced.

But is #3 even true? Before delving into it, I would like to exclude issues having to do with overrunning the supply of reactants in the PCR. If the amount of final product approaches the total supply of dNTPs in the reaction, I can easily imagine higher levels of misincorporation.

By "errors" here, we mean errors per total bases, right? If the PCR polymerase used is like Taq polymerase it likely has an error rate of about 1 in 10,000 bases polymerized.



Templates:

One thousand 100-base amplicons.

The question: will the error rate per amplicon be higher after 20 cycles than it was after 10 cycles?




Again, presuming reactants are not limiting and each cycle is 100% efficient -- that is, doubling the number of amplicons: after 10 cycles there will be ~1 million (2^10 * 1000) amplicons (~0.1 pg of DNA); after 20 cycles, ~1 billion (2^20 * 1000) amplicons (~0.1 ng of DNA).
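
(A quick back-of-the-envelope check of those figures, as a minimal Python sketch; the ~650 g/mol per base pair of double-stranded DNA used below is an assumption of the sketch, not a number from the thread.)

Code:
# Rough check of the amplicon counts and masses quoted above.
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS = 650.0              # assumed average g/mol per base pair of dsDNA

def mass_grams(n_amplicons, length_bp=100):
    """Mass in grams of n double-stranded amplicons of the given length."""
    return n_amplicons * length_bp * BP_MASS / AVOGADRO

for cycles in (10, 20):
    n = 1000 * 2 ** cycles   # 1,000 starting templates, perfect doubling
    print(f"{cycles} cycles: {n:.3g} amplicons, {mass_grams(n):.3g} g")
# ~1e6 amplicons / ~1.1e-13 g (~0.1 pg) after 10 cycles;
# ~1e9 amplicons / ~1.1e-10 g (~0.1 ng) after 20 cycles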

Doesn't intuition tell us that after 10 additional cycles the errors will have compounded and the overall error rate we might detect by sequencing 1000 of the amplicons will have increased?

Ignoring indels, I don't see that this is the case. Sure, Taq polymerase would tend to misincorporate a base in about 1% of the 100-base product strands that it creates, and that erroneous template will then be amplified each cycle. But the 99% of product strands that did not contain an error will also be amplified each cycle. Meaning, the error rate in your product strands will just be the error rate of the polymerase -- 1 in 10,000 bases, or about 1% of the amplicons in this case.

Don't get me wrong: I have sequenced PCR products. The error rate is way higher than 1 in 10,000 bases.

So what is wrong with my logic?

--
Phillip
Old 12-17-2010, 02:23 AM   #2
Bruins
Member
 
Location: Groningen

Join Date: Feb 2010
Posts: 78

I'm rather new to bioinformatics, so I have little choice other than to go by what I'm taught and told. One of our lab veterans once gave his opinion on PCR prior to sequencing: he said that back in the days when he was using "Taq version 0.1" he hardly saw any bias, and that he didn't expect PCR to cause much trouble now either, even though there's much discussion of PCR duplicate reads and errors introduced by PCR.

That is why I'd be interested to read reactions to Phillip's post.
Old 12-17-2010, 06:26 PM   #3
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509

Hi Phillip,

The problem arises because the errors are cumulative. Once you generate a mutation, it essentially becomes fixed at that fraction of the population, since it's now template for all subsequent rounds of amplification. And every round of amplification generates additional errors, so the fraction of mutant molecules is always increasing.

Calculating the probability should make this clear. Consider a single 100 bp amplicon. The likelihood that a single base replicates faithfully is 1 - 0.0001 (the error rate) = 0.9999. The likelihood that the entire amplicon is copied correctly is 0.9999^100 (the amplicon length) = 0.99. After 10 rounds of PCR, the likelihood of being error-free is 0.99^10 = 0.90 (still pretty good odds), but after 20 cycles that drops to 0.99^20 = 0.82. The remainder will contain one or more errors.
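
The same numbers fall out of a few lines of Python (a minimal sketch, assuming the simplified model above in which every amplicon is the product of a fresh copying event at every cycle):

Code:
# Harold's calculation under the "every strand is re-copied every cycle" model.
error_rate = 1e-4                            # per-base error rate, Taq-like
length = 100                                 # amplicon length in bases

p_correct_copy = (1 - error_rate) ** length  # ~0.99: one faithful copy
for cycles in (10, 20):
    p_error_free = p_correct_copy ** cycles
    print(f"{cycles} cycles: {p_error_free:.2f} error-free, "
          f"{1 - p_error_free:.2f} with at least one error")
# 10 cycles: ~0.90 error-free; 20 cycles: ~0.82 error-free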

-Harold
Old 12-18-2010, 09:28 AM   #4
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

Quote:
Originally Posted by Bruins View Post
I'm rather new to bioinformatics, so I have little choice other than to go by what I'm taught and told. One of our lab veterans once gave his opinion on PCR prior to sequencing: he said that back in the days when he was using "Taq version 0.1" he hardly saw any bias, and that he didn't expect PCR to cause much trouble now either, even though there's much discussion of PCR duplicate reads and errors introduced by PCR.

That is why I'd be interested to read reactions to Phillip's post.
Bias against high-GC and low-GC regions is seen in 2nd-gen sequencing as well as in PCR-based gene synthesis. Genomic PCR tends to behave worse in high-GC regions, though I don't have hard stats for that (I have seen it frequently). A number of papers have used low numbers of PCR cycles for library construction, though I don't know offhand how carefully they looked at GC bias.
Old 12-20-2010, 05:18 AM   #5
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

I wrote:

Quote:
Originally Posted by pmiguel View Post
Sure, Taq polymerase would tend to misincorporate a base in about 1% of the 100-base product strands that it creates, and that erroneous template will then be amplified each cycle. But the 99% of product strands that did not contain an error will also be amplified each cycle. Meaning, the error rate in your product strands will just be the error rate of the polymerase -- 1 in 10,000 bases, or about 1% of the amplicons in this case.
To which Harold replies:
Quote:
Originally Posted by HESmith View Post
The problem arises because the errors are cumulative. Once you generate a mutation, it essentially becomes fixed at that fraction of the population, since it's now template for all subsequent rounds of amplification. And every round of amplification generates additional errors, so the fraction of mutant molecules is always increasing.
Thanks Harold. Very clear.
Using this method, one could calculate a best-case scenario for detection of minor variants in pools of samples.
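
For instance (a rough sketch only: it keeps Harold's simplified model, treats the per-site background as an upper bound, and the cycle counts are just illustrative):

Code:
# Best-case PCR noise floor for minor-variant detection at a single position,
# under the simplified model in which every strand is re-copied every cycle.
def per_site_error_fraction(error_rate, cycles):
    """Upper-bound fraction of strands carrying a PCR error at one given base
    after `cycles` rounds of amplification."""
    return 1 - (1 - error_rate) ** cycles

error_rate = 1e-4                  # per base per copy, Taq-like
for cycles in (10, 20, 30):
    background = per_site_error_fraction(error_rate, cycles)
    print(f"{cycles} cycles: per-site PCR background up to ~{background:.2%}")
# A real minor variant would have to sit well above this background (plus
# sequencing error) to be distinguishable.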

--
Phillip
Old 11-11-2014, 04:52 PM   #6
cientista_carioca
Junior Member
 
Location: San Francisco

Join Date: Aug 2011
Posts: 2
Are these probabilities real?

Quote:
Originally Posted by HESmith View Post
Hi Phillip,

Calculating the probability should make this clear. Consider a single 100 bp amplicon. The likelihood that a single base replicates faithfully is 1 - 0.0001 (the error rate) = 0.9999. The likelihood that the entire amplicon is copied correctly is 0.9999^100 (the amplicon length) = 0.99. After 10 rounds of PCR, the likelihood of being error-free is 0.99^10 = 0.90 (still pretty good odds), but after 20 cycles that drops to 0.99^20 = 0.82. The remainder will contain one or more errors.

-Harold
Hi Harold,

I used your principle above to calculate error rates in a library preparation for sequencing (I am getting a high number of false heterozygous calls and I am trying to investigate why). My calculations are below:

enzyme error rate: 5.5x10^-6
base accuracy: 1 - 5.5x10^-6
likelihood of correct amplicon: accuracy ^ length (304) = 0.989077
likelihood of correct amplicon after 35 cycles: 0.680865

That implies an error rate that seems way too high! Can you imagine doing variant calling with such an error rate?

When I try to calculate the total number of amplicon molecules that do not have an error in my library at the end, I get stuck. I know that the input DNA is 50 ng in the reaction, so using a simple calculation with 3.3 pg = 1 genome, I should have around 15,100 copies of my target at the beginning of the reaction. Theoretically, the calculations above show that over 30% of my product will contain an error, not to mention sequencing errors! In practice I don't see that; the combined library-prep and sequencing error is no more than about 1%. Where am I mistaken?

Best,
Camila.
Old 11-11-2014, 11:14 PM   #7
austinso
Member
 
Location: Bay area

Join Date: Jun 2012
Posts: 77

Quote:
Originally Posted by cientista_carioca View Post
Hi Harold,

I used your principle above to calculate error rates in a library preparation for sequencing (I am getting a high number of false heterozygous calls and I am trying to investigate why). My calculations are below:

enzyme error rate: 5.5x10^-6
base accuracy: 1 - 5.5x10^-6
likelihood of correct amplicon: accuracy ^ length (304) = 0.989077
likelihood of correct amplicon after 35 cycles: 0.680865

That implies an error rate that seems way too high! Can you imagine doing variant calling with such an error rate?

When I try to calculate the total number of amplicon molecules that do not have an error in my library at the end, I get stuck. I know that the input DNA is 50 ng in the reaction, so using a simple calculation with 3.3 pg = 1 genome, I should have around 15,100 copies of my target at the beginning of the reaction. Theoretically, the calculations above show that over 30% of my product will contain an error, not to mention sequencing errors! In practice I don't see that; the combined library-prep and sequencing error is no more than about 1%. Where am I mistaken?

Best,
Camila.
The 0.989 figure is a per-amplicon probability: the chance that one copy comes out with no error anywhere in the 304 bases, so the ~32% after 35 cycles is the chance of at least one error somewhere in the amplicon. For variant calling, though, what matters is the probability of an error at one particular position, which after 35 cycles is only about 35 x 5.5x10^-6, i.e. ~0.02%.
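
In numbers (a small sketch that takes the figures quoted above at face value):

Code:
# Two different probabilities are in play here.
error_rate = 5.5e-6                 # per base, per copying event (quoted above)
cycles = 35

p_amplicon_correct = 0.989077                       # per-copy figure quoted above
p_after_cycles = p_amplicon_correct ** cycles       # ~0.68: no error anywhere in 304 bp
p_error_one_site = 1 - (1 - error_rate) ** cycles   # ~0.00019 (~0.02%): error at one given base

print(f"error-free amplicon after {cycles} cycles: {p_after_cycles:.3f}")
print(f"error at a single position: {p_error_one_site:.6f}")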

FWIW.
Old 11-12-2014, 03:43 AM   #8
Loris
Junior Member
 
Location: uk

Join Date: Dec 2009
Posts: 6

Quote:
Originally Posted by cientista_carioca View Post
enzyme error rate: 5.5x10^-6
base accuracy: 1 - 5.5x10^-6
likelihood of correct amplicon: accuracy ^ length (304) = 0.989077
likelihood of correct amplicon after 35 cycles: 0.680865

That implies an error rate that seems way too high! Can you imagine doing variant calling with such an error rate?

When I try to calculate the total number of amplicon molecules that do not have an error in my library at the end, I get stuck. I know that the input DNA is 50 ng in the reaction, so using a simple calculation with 3.3 pg = 1 genome, I should have around 15,100 copies of my target at the beginning of the reaction. Theoretically, the calculations above show that over 30% of my product will contain an error, not to mention sequencing errors! In practice I don't see that; the combined library-prep and sequencing error is no more than about 1%. Where am I mistaken?
If the error rate is 5.5*10^-6 per base, then the probability of an error-free copy is (1-5.5*10^-6)^304, which is about 0.998329. That's significantly better per generation than the 0.989 above.

If you run 35 cycles of PCR, you're not going to get 35 doublings of DNA, because 1) each cycle is less than 2-fold even early in the reaction, and 2) reactants run out. Consider that 50ng * 2^35 = 1.7 kg of DNA.

If you know the yield of the PCR reaction you can work out how many actual replications you're getting, on average.
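
For example (a sketch; the final amplicon yield used below is a made-up illustration, not a number from this thread, and the ~650 g/mol per base pair of dsDNA is also an assumption):

Code:
import math

error_rate = 5.5e-6
length_bp = 304

# 1) Probability that one copying event yields an error-free amplicon.
p_error_free_copy = (1 - error_rate) ** length_bp
print(p_error_free_copy)                    # ~0.998329, not 0.989

# 2) Effective number of doublings, from starting copies and final yield.
AVOGADRO = 6.022e23
start_copies = 15100                        # ~50 ng genomic input, as above
yield_ng = 500.0                            # hypothetical final amplicon yield
final_copies = (yield_ng * 1e-9) / (length_bp * 650.0 / AVOGADRO)
doublings = math.log2(final_copies / start_copies)
print(doublings)                            # ~27 effective doublings, not 35
print(p_error_free_copy ** doublings)       # ~0.96 error-free under Harold's simple model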
Old 11-12-2014, 08:55 AM   #9
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

It's important to note here that in every cycle, only the newly synthesized strands can pick up additional errors; molecules made in earlier cycles are never rewritten. So (correct rate)^(cycles) is not quite accurate: the actual error-free fraction is higher, and increasingly so as the duplication ratio per cycle drops below 100%.
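
A minimal sketch of that point (the per-cycle efficiency values are illustrative assumptions, and the model simply assumes each new strand is copied from a randomly chosen existing molecule):

Code:
# Only newly synthesized strands can acquire new errors; molecules from earlier
# cycles persist unchanged. Track the error-free fraction of the whole pool.
def error_free_fraction(p_correct_copy, cycles, efficiency=1.0):
    """`efficiency` = new copies made per existing molecule per cycle."""
    f = 1.0
    for _ in range(cycles):
        f = (f + efficiency * f * p_correct_copy) / (1.0 + efficiency)
    return f

p = (1 - 1e-4) ** 100                      # Taq-like polymerase, 100 bp amplicon
print(p ** 20)                             # naive "every strand re-copied": ~0.82
print(error_free_fraction(p, 20, 1.0))     # old templates persist: ~0.90
print(error_free_fraction(p, 20, 0.8))     # 80% efficiency per cycle: ~0.92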
Old 11-12-2014, 12:29 PM   #10
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509

Brian is absolutely correct that replication efficiency drops at every cycle as reagents are consumed. If not, then 35 cycles of 50 ng input would yield ~1.7 kg of product :-).

However, Phillip's original query asked to exclude the issue of reagent limitation. And the concept underlying Muller's ratchet, although not directly applicable to PCR, explains why the mutational load increases with each round of replication.
Old 12-03-2014, 11:11 PM   #11
cientista_carioca
Junior Member
 
Location: San Francisco

Join Date: Aug 2011
Posts: 2

This makes sense, thanks for all the answers.