  • Do extra PCR cycles really increase errors?

    So the dogma goes:

    Use as few PCR cycles during library construction as possible.

    One can imagine various rationales behind this one.
    1. Any bias in the PCR process becomes more pronounced as more cycles are performed.
    2. If your library is very small (e.g., 1 million amplicons) then PCR amplifying it to 1 billion amplicons serves little purpose. You will just end up sequencing those original 1 million amplicons an average of 1000 times each.
    3. A PCR polymerase has an inherent error rate. The more product strands created, the more errors introduced.


    But is #3 even true? Before delving into it, I would like to exclude issues having to do with overrunning the supply of reactants in the PCR. If the amount of final product approaches the total supply of dNTPs in the reaction, I can easily imagine higher levels of misincorporation.

    By "errors" here, we mean errors per total bases, right? If the PCR polymerase used is like Taq polymerase it likely has an error rate of about 1 in 10,000 bases polymerized.



    Templates:

    One thousand 100 base amplicons.

    The question: will the error rate per amplicon be higher after 20 cycles than it was after 10 cycles?




    Again, presuming reactants are not limiting and each cycle is 100% efficient -- that is, doubling the number of amplicons each cycle: After 10 cycles there will be 1 million (2^10 * 1000) amplicons (~0.1 pg of DNA). After 20 cycles there will be 1 billion (2^20 * 1000) amplicons (~0.1 ng of DNA).
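
    A quick back-of-the-envelope check of those numbers in Python (a rough sketch, assuming double-stranded 100 bp amplicons at ~650 g/mol per base pair):

        # Amplicon counts and masses after n perfect doublings.
        AVOGADRO = 6.022e23
        BP_MASS = 650.0        # g/mol per base pair of double-stranded DNA
        amplicon_len = 100     # bp
        start = 1000           # starting templates

        for cycles in (10, 20):
            count = start * 2 ** cycles
            grams = count * amplicon_len * BP_MASS / AVOGADRO
            print(f"{cycles} cycles: {count:.2e} amplicons, {grams * 1e12:.2f} pg")
        # 10 cycles: ~1e6 amplicons, ~0.1 pg; 20 cycles: ~1e9 amplicons, ~0.1 ng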

    Doesn't intuition tell us that after 10 additional cycles the errors will have compounded and the overall error rate we might detect by sequencing 1000 of the amplicons will have increased?

    Ignoring indels, I don't see that this is the case. Sure, Taq polymerase would tend to misincorporate a base in 1% of the product strands that it creates, and that erroneous template will then be amplified each cycle. But the 99% of product strands that did not contain an error will also be amplified each cycle. Meaning, your error rate in your product strands will just be the error rate of the polymerase -- 1 in 10,000 bases, 1% of the amplicons in this case.

    Don't get me wrong: I have sequenced PCR products. The error rate is way higher than 1 in 10,000 bases.

    So what is wrong with my logic?

    --
    Phillip

  • #2
    So I'm rather new to bioinformatics, and I have little choice other than to go by what I'm taught and told. One of our lab veterans once expressed his opinion on PCR prior to sequencing: he said that back in the days when he was using "Taq version 0.1" he hardly saw any bias, and that he didn't expect PCR to cause much trouble now, even though there's much discussion on the subject of PCR duplicate reads and errors introduced by PCR.

    And that is why I'd be interested to read reactions to Phillip's post.



    • #3
      Hi Phillip,

      The problem arises because the errors are cumulative. Once you generate a mutation, it essentially becomes fixed at that fraction of the population since it's now template for all subsequent rounds of amplification. And every round of amplification generates additional errors, so the fraction of mutant molecules is always increasing.

      A quick probability calculation should make this clear. Consider a single 100bp amplicon. The likelihood that a single base replicates faithfully is 1-0.0001 (the error rate) = 0.9999. The likelihood that the entire amplicon is correct is 0.9999^100 (the amplicon size) = .99. After 10 rounds of PCR, the likelihood of being error-free is .99^10 = .90 (still pretty good odds), but after 20 cycles that drops to .99^20 = .82. The remainder will contain one or more errors.
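
      The same arithmetic in a few lines of Python (a sketch that assumes the 1-in-10,000 per-base error rate above and that every molecule is copied once per cycle):

          # Probability that an amplicon is still error-free after n cycles.
          per_base_error = 1e-4      # Taq-like error rate
          amplicon_len = 100

          per_copy_ok = (1 - per_base_error) ** amplicon_len   # ~0.99
          for cycles in (10, 20):
              print(cycles, "cycles:", round(per_copy_ok ** cycles, 3))
          # 10 cycles: ~0.905 error-free; 20 cycles: ~0.819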

      -Harold



      • #4
        Originally posted by Bruins View Post
        So I'm rather new to bioinformatics, and I have little choice other than to go by what I'm taught and told. One of our lab veterans once expressed his opinion on PCR prior to sequencing: he said that back in the days when he was using "Taq version 0.1" he hardly saw any bias, and that he didn't expect PCR to cause much trouble now, even though there's much discussion on the subject of PCR duplicate reads and errors introduced by PCR.

        And that is why I'd be interested to read reactions to Phillip's post.
        Bias against high-GC and low-GC sequences is seen in 2nd-gen sequencing as well as in other PCR-based applications such as gene synthesis. Genomic PCR tends to behave worse in high-GC regions, though I don't have real stats for that (but have seen it frequently). A number of papers have used low numbers of PCR cycles for library construction, though I don't know off-hand how carefully they looked at GC bias.



        • #5
          I wrote:

          Originally posted by pmiguel View Post
          Sure, Taq polymerase would tend to misincorporate a base in 1% of the product strands that it creates, and that erroneous template will then be amplified each cycle. But the 99% of product strands that did not contain an error will also be amplified each cycle. Meaning, your error rate in your product strands will just be the error rate of the polymerase -- 1 in 10,000 bases, 1% of the amplicons in this case.
          To which Harold replies:
          Originally posted by HESmith View Post
          The problem arises because the errors are cumulative. Once you generate a mutation, it essentially becomes fixed at that fraction of the population since it's now template for all subsequent rounds of amplification. And every round of amplification generates additional errors, so the fraction of mutant molecules is always increasing.
          Thanks Harold. Very clear.
          Using this method, one could calculate a best case scenario for detection of minor variants in pools of samples.
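
          For instance, a rough sketch of the per-position PCR-error background that a minor variant would have to rise above (assuming the Taq-like 1-in-10,000 rate and one copy of every molecule per cycle, so this is an upper bound):

              # Upper bound on the fraction of molecules carrying a PCR-induced
              # error at any one particular position after `cycles` cycles.
              def pcr_background(per_base_error, cycles):
                  return 1 - (1 - per_base_error) ** cycles

              for cycles in (10, 20, 35):
                  bg = pcr_background(1e-4, cycles)
                  print(f"{cycles} cycles: ~{bg:.2%} background at a single position")
              # ~0.10%, ~0.20%, ~0.35% -- a minor allele is only safely callable
              # well above these levels.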

          --
          Phillip



          • #6
            Are these probabilities real?

            Originally posted by HESmith View Post
            Hi Phillip,

            A quick probability calculation should make this clear. Consider a single 100bp amplicon. The likelihood that a single base replicates faithfully is 1-0.0001 (the error rate) = 0.9999. The likelihood that the entire amplicon is correct is 0.9999^100 (the amplicon size) = .99. After 10 rounds of PCR, the likelihood of being error-free is .99^10 = .90 (still pretty good odds), but after 20 cycles that drops to .99^20 = .82. The remainder will contain one or more errors.

            -Harold
            Hi Harold,

            I used your principle above to calculate error rates in a library preparation for sequencing (I am getting a high number of false heterozygous calls and I am trying to investigate why). My calculations are below:

            enzyme error rate: 5.5x10^-6
            base accuracy: 1 - 5.5x10^-6
            likelihood of correct amplicon: accuracy^length (304) = 0.989077
            likelihood of correct amplicon after 35 cycles: 0.680865

            I thought that this error rate was way too high! Can you imagine doing variant calling with such an error rate?

            When I try to calculate the total number of amplicon molecules that do not have an error in my library at the end, I get stuck. I know that the input DNA is 50 ng in the reaction, so using a simple calculation considering 3.3 pg = 1 genome, I should have around 15,100 copies of my target at the beginning of the reaction. Theoretically, the calculations above show me that over 30% of my product will contain an error, not to mention sequencing errors! I know that in practice I don't see that; it is at most about 1% from library prep and sequencing error combined. Where am I mistaken?

            Best,
            Camila.



            • #7
              Originally posted by cientista_carioca View Post
              Hi Harold,

              I used your principle above to calculate error rates in a library preparation for sequencing (I am getting a high number of false heterozygous calls and I am trying to investigate why). My calculations are below:

              enzyme error rate: 5.5x10^-6
              base accuracy: 1 - 5.5x10^-6
              likelihood of correct amplicon: accuracy^length (304) = 0.989077
              likelihood of correct amplicon after 35 cycles: 0.680865

              I thought that this error rate was way too high! Can you imagine doing variant calling with such an error rate?

              When I try to calculate the total number of amplicon molecules that do not have an error in my library at the end, I get stuck. I know that the input DNA is 50 ng in the reaction, so using a simple calculation considering 3.3 pg = 1 genome, I should have around 15,100 copies of my target at the beginning of the reaction. Theoretically, the calculations above show me that over 30% of my product will contain an error, not to mention sequencing errors! I know that in practice I don't see that; it is at most about 1% from library prep and sequencing error combined. Where am I mistaken?

              Best,
              Camila.
              The 0.989 figure is the probability that a copy has no error anywhere over the 304 bases (so the ~32% after 35 cycles is the chance of an error somewhere in the amplicon), whereas when you call variants you are looking at the probability of an error at a single position over 35 cycles, which is only about 35 x 5.5x10^-6, i.e. ~0.02%.
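
              In numbers (same assumptions as above):

                  # Per-position error probability after 35 cycles with a
                  # proofreading enzyme (5.5e-6 per base per cycle).
                  print(1 - (1 - 5.5e-6) ** 35)   # ~1.9e-4, i.e. ~0.02%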

              FWIW.



              • #8
                Originally posted by cientista_carioca View Post
                enzyme error rate: 5.5x10^-6
                base accuracy: 1 - 5.5x10^-6
                likelihood of correct amplicon: accuracy^length (304) = 0.989077
                likelihood of correct amplicon after 35 cycles: 0.680865

                I thought that this error rate was way too high! Can you imagine doing variant calling with such an error rate?

                When I try to calculate the total number of amplicon molecules that do not have an error in my library at the end, I get stuck. I know that the input DNA is 50 ng in the reaction, so using a simple calculation considering 3.3 pg = 1 genome, I should have around 15,100 copies of my target at the beginning of the reaction. Theoretically, the calculations above show me that over 30% of my product will contain an error, not to mention sequencing errors! I know that in practice I don't see that; it is at most about 1% from library prep and sequencing error combined. Where am I mistaken?
                If the error rate is 5.5*10^-6 per base, then the probability of an error-free copy is (1 - 5.5*10^-6)^304, which is about 0.998329. That's significantly better per generation.

                If you run 35 cycles of PCR, you're not going to get 35 doublings of DNA, because 1) each cycle is less than 2-fold even early in the reaction, and 2) reactants run out. Consider that 50 ng * 2^35 ≈ 1.7 kg of DNA.

                If you know the yield of the PCR reaction, you can work out how many actual replications you're getting, on average.
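
                For example, a rough sketch (the 2 ug yield figure here is just an assumed illustration, not a number from this thread):

                    # Effective doublings from input vs. yield, then the expected
                    # error-free fraction per amplicon.
                    import math

                    per_base_error = 5.5e-6
                    amplicon_len = 304
                    input_ng = 50
                    yield_ng = 2000        # hypothetical 2 ug yield

                    doublings = math.log2(yield_ng / input_ng)          # ~5.3, not 35
                    per_copy_ok = (1 - per_base_error) ** amplicon_len  # ~0.9983
                    print(f"~{doublings:.1f} effective doublings, "
                          f"~{per_copy_ok ** doublings:.1%} error-free amplicons")
                    # ~5.3 doublings and ~99.1% error-free -- close to the ~1%
                    # error seen in practice.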



                • #9
                  It's important to note here that with every cycle, only the newly synthesized strands can pick up additional errors. So (correct rate)^(cycles) is not quite accurate; the actual error-free fraction is higher, and increasingly so as the per-cycle duplication efficiency drops below 100%.
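
                  A sketch of that correction (per-cycle efficiency `eff` is the fraction of molecules actually copied each cycle; 100% is shown just for comparison with the naive figure):

                      # Expected error-free fraction when only newly synthesized
                      # strands can pick up errors.
                      def error_free_fraction(per_copy_ok, cycles, eff=1.0):
                          f = 1.0
                          for _ in range(cycles):
                              # existing molecules keep their status; new copies are
                              # correct only if the template was correct and the
                              # copy was faithful
                              f = (f + eff * f * per_copy_ok) / (1 + eff)
                          return f

                      per_copy_ok = (1 - 1e-4) ** 100
                      print(error_free_fraction(per_copy_ok, 20, eff=1.0))  # ~0.905
                      print(per_copy_ok ** 20)                              # naive: ~0.819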



                  • #10
                    Brian is absolutely correct that replication efficiency drops at every cycle as reagents are consumed. If not, then 35 cycles of 50 ng input would yield ~1.7 kg of product :-).

                    However, Phillip's original query asked to exclude the issue of reagent limitation. And the concept underlying Muller's ratchet, although not directly applicable to PCR, explains why the mutational load increases with each round of replication.



                    • #11
                      This makes sense, thanks for all the answers.

