**Jon_Keats** · 01-13-2011, 08:36 AM

Someone may have more intelligent things to say and suggest an informatics workaround, but I tend to agree that you are limited by the error rate of the technology.

For example, say you have a 1/1000 error rate and you require three hits at a position to call a mutation (the one expected error plus two, say one read in each direction), and say it's a heterozygous KRAS mutant. That means that in 1000 reads from a diploid tumor you have analyzed 500 cells, and with two reads required above the expected error you have a sensitivity of 1 in 250 cells. Still not bad, but those are generous error rates and I doubt anyone would waste much time using such low-stringency cutoffs, so all the talk of "deep sequencing" with no knowledge of error rates seems very naive.
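The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the back-of-envelope reasoning above, assuming each read samples one allele from a distinct cell and that a heterozygous mutant cell contributes one mutant allele; the function and parameter names are made up for the sketch.

```python
def detection_sensitivity(reads, error_rate, hits_above_error,
                          ploidy=2, mutant_alleles_per_cell=1):
    """Back-of-envelope sensitivity for calling a variant at one position.

    Assumes (simplification) that every read comes from a different allele,
    so `reads / ploidy` cells are sampled.
    """
    expected_error_reads = reads * error_rate
    # Total variant reads demanded before a call is made.
    required_hits = expected_error_reads + hits_above_error
    cells_sampled = reads / ploidy
    # Mutant cells needed to supply the reads above the expected error.
    mutant_cells_needed = hits_above_error / mutant_alleles_per_cell
    fraction_of_cells = mutant_cells_needed / cells_sampled
    return required_hits, cells_sampled, fraction_of_cells


hits, cells, fraction = detection_sensitivity(
    reads=1000, error_rate=1 / 1000, hits_above_error=2
)
print(f"required hits: {hits:.0f}, cells sampled: {cells:.0f}, "
      f"sensitivity: 1 in {1 / fraction:.0f} cells")
```

Raising the number of hits required above the expected error, or using a less generous error rate, worsens the detectable fraction accordingly.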

Not sure if that helps as I pretty much agreed with you verbatim but I get tired of listening to people talk about "deep sequencing" who have not seen a sequencer, have not seen the raw data, and definitely don't know the error rates of the current machines...

I look forward to seeing others' opinions too.

**henry.wood** · 01-14-2011, 01:45 AM

There's a possible wet lab workaround that might go some of the way. It would work best with sequencing PCR products. You can split your sample into several very small pools. If you have something very rare then it will be present in only a portion of them. You can then do your PCR and make your libraries with tags on. There are a few clever tricks around like DNA sudoku http://hannonlab.cshl.edu/dna_sudoku/main.html which will allow very high levels of multiplexing. Then when you do your sequencing, if your SNP is real as opposed to being a sequencing error, all the reads with that SNP will be concentrated in just a few pools.

**lh3** · 01-14-2011, 04:55 AM

I second henry.wood. If you sequence with very small pools to high coverage, it is in theory possible to find a SNP at the 0.01% frequency.

**genlyai** · 01-14-2011, 07:37 AM

Glad to see that I'm not alone in caring about this. Like JK said, the idea of sequencing one locus to significant depth was always one of the selling points of NGS, and it's frustrating to not be able to access that.

Henry and lh3, that idea makes sense. So you're saying, let's say I want to be able to detect something present at 0.01%, and my "normal" threshold for calling a SNP significant is 5%. I would make tiny libraries such that my expected number of amplifiable fragments containing a given site is just 20 for each library, so if the rare allele is present in a lib, we should sequence it 5% of the time that site is sequenced. To get decent odds of seeing something at 0.01%, I would need to make and sequence ballpark [100% / (0.01% * 20) =] 500 such libs (but really a few fold more than that). Does this correctly summarize what you were envisioning?
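The library count above, and the reason for "a few fold more", can be checked with a short calculation. This is a rough sketch that assumes each amplifiable fragment independently carries the rare allele with probability equal to its population frequency; the function names are hypothetical.

```python
def libraries_for_one_expected_hit(target_freq, fragments_per_lib):
    """Number of small libraries at which one rare fragment is expected in total."""
    return 1 / (target_freq * fragments_per_lib)


def prob_at_least_one_rare_fragment(n_libs, target_freq, fragments_per_lib):
    """Chance that at least one library contains the rare allele at all."""
    total_fragments = n_libs * fragments_per_lib
    return 1 - (1 - target_freq) ** total_fragments


target_freq = 0.0001             # 0.01% rare allele
call_threshold = 0.05            # 5% within-library calling threshold
fragments_per_lib = round(1 / call_threshold)   # 20 fragments per library

n_libs = libraries_for_one_expected_hit(target_freq, fragments_per_lib)
print(f"fragments per library: {fragments_per_lib}")
print(f"libraries for one expected hit: {n_libs:.0f}")
for multiple in (1, 2, 3):
    p = prob_at_least_one_rare_fragment(multiple * n_libs, target_freq,
                                        fragments_per_lib)
    print(f"{multiple * n_libs:.0f} libraries -> P(at least one) = {p:.2f}")
```

With exactly 500 libraries the rare allele is captured only about 63% of the time, and roughly three times as many are needed to push that to about 95%, which is consistent with the "few fold more" caveat.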

I'm a bit concerned about the DNA sudoku aspect here. Won't it only work as long as we have perfect sensitivity for our detectable event even once the libraries are multiplexed? So as soon as you pool a few libs, your SNP sensitivity for the pool is below the workable threshold. So it seems like we are going to need a very large number of libraries and sequencing pools to get this to work.

Ok ... I started off enthusiastic there, and then convinced myself that it would be quite unwieldy. Am I thinking about this the wrong way? Let me know what you think.

**james hadfield** · 01-14-2011, 08:18 AM

One thing I hope can make a difference is bi-directional reads on Illumina.

Being able to sequence in F&R over an amplicon should allow very much higher Qscores to be called.

As sequencing read errors are probably higher than incorporation errors, these would be greatly reduced following a bi-directional strategy. The major blocks for ultra-low detection are PCR errors at initial amplification and initial/early cycles of cluster generation. However, these should be random and lower than the 0.1 or 0.01% you have mentioned.
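To give a feel for how much a forward/reverse agreement could raise the effective Q score, here is a minimal sketch. It assumes the two reads make independent errors, that a miscall is equally likely to be any of the three wrong bases, and that all four bases are equally likely a priori; real base callers and overlap-merging tools may combine qualities differently, and PCR errors shared by both reads are not captured.

```python
import math


def phred_to_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


def combined_q_when_reads_agree(q_fwd, q_rev):
    """Posterior quality of a base that both the F and R read call identically."""
    p_fwd = phred_to_prob(q_fwd)
    p_rev = phred_to_prob(q_rev)
    # Both reads wrong AND landing on the same wrong base (1 of 3 choices).
    both_wrong_same_base = p_fwd * p_rev / 3
    both_right = (1 - p_fwd) * (1 - p_rev)
    p_error = both_wrong_same_base / (both_right + both_wrong_same_base)
    return prob_to_phred(p_error)


print(f"Q30 + Q30 agreeing -> about Q{combined_q_when_reads_agree(30, 30):.0f}")
print(f"Q20 + Q30 agreeing -> about Q{combined_q_when_reads_agree(20, 30):.0f}")
```

Under these assumptions two agreeing Q30 calls behave like a single call well above Q60, which is why the remaining errors would be dominated by steps upstream of the read itself.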

**henry.wood** · 01-14-2011, 08:20 AM

Originally posted by genlyai:

To get decent odds of seeing something at 0.01%, I would need to make and sequence ballpark [100% / (0.01% * 20) =] 500 such libs (but really a few fold more than that)

That's the kind of thing I was envisaging. You never said you wanted a cheap and simple solution.

I can't claim to have got to the end of the Sudoku paper without my head spinning slightly, so it may well not be what you're after. Good luck with all those libraries.

**genlyai** · 01-14-2011, 08:34 AM

Originally posted by james hadfield:

One thing I hope can make a difference is bi-directional reads on Illumina.

Being able to sequence in F&R over an amplicon should allow very much higher Qscores to be called.

As sequencing read errors are probably higher than incorporation errors, these would be greatly reduced following a bi-directional strategy. The major blocks for ultra-low detection are PCR errors at initial amplification and initial/early cycles of cluster generation. However, these should be random and lower than the 0.1 or 0.01% you have mentioned.

Good point, and there may be some data out there to address this already.

On the other hand, my intuition is the opposite of yours wrt the contribution of PCR errors. Taq is normally quoted as having an error rate of around 0.01%/base*cyc. After 10-12 cyc, this is in the ballpark of the error rate of the whole process.

As I said, though, the data may well be out there to answer this without resorting to guesswork.
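The Taq estimate above can be reproduced with a quick calculation. This sketch assumes every molecule in the final product has been copied once per cycle, so it is a rough upper estimate (real lineages include fewer copying events on average); the 0.01% per base per cycle figure is the one quoted in the post.

```python
def cumulative_pcr_error(per_base_per_cycle, cycles):
    """Probability that a given base carries a polymerase error after `cycles`
    rounds of copying, assuming one copy event per cycle in the lineage."""
    return 1 - (1 - per_base_per_cycle) ** cycles


taq_rate = 0.0001   # 0.01% per base per cycle, as quoted above
for cycles in (10, 12):
    frac = cumulative_pcr_error(taq_rate, cycles)
    print(f"{cycles} cycles -> about {frac * 100:.2f}% error per base")
```

Both values land near 0.1%, the same order as the 1/1000 sequencing error rate discussed earlier in the thread.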

**genlyai** · 01-14-2011, 08:42 AM

Originally posted by henry.wood:

That's the kind of thing I was envisaging. You never said you wanted a cheap and simple solution.

I can't claim to have got to the end of the Sudoku paper without my head spinning slightly, so it may well not be what you're after. Good luck with all those libraries.

To be fair, the 5% detection threshold could probably be lowered for a well-run process, but we are still talking about 100+ libraries. Doable, but far from simple.

**frozenlyse** · 01-16-2011, 04:21 PM

Originally posted by genlyai:

On the other hand, my intuition is the opposite of yours wrt the contribution of PCR errors. Taq is normally quoted as having an error rate of around 0.01%/base*cyc. After 10-12 cyc, this is in the ballpark of the error rate of the whole process.

At least in the ChIP-seq library kits (the only ones I have hands-on experience with), the library prep PCR uses Phusion, which has a much lower error rate.

**genlyai** · 01-17-2011, 06:21 AM

Originally posted by frozenlyse:

At least in the ChIP-seq library kits (the only ones I have hands-on experience with), the library prep PCR uses Phusion, which has a much lower error rate.

Good point. Is this the case with Illumina's reagent kits?

For that matter, is anyone aware of a study that tries to quantify error arising from each step of the process?

**SeqR&D** · 06-13-2012, 09:48 AM

Nothing is impossible

I say not impossible, and not necessarily reliant on the error rate of sequencing, or of amplification. If you can make a sequencing library that is smart enough to overcome these obstacles, you can attain the currently unattainable. I realize I'm not telling you anything, but if I shared all my ideas maybe you wouldn't come up with different ones.

**Heisman** · 06-13-2012, 01:01 PM

This paper should get the job done: http://www.pnas.org/content/108/23/9530.abstract

**The_Roads** · 06-21-2012, 12:33 PM

i would say most talk of the error rates of ngs grossly overemphasizes the problem. yes there is a conflict rate of >>1% when you compare plain read sequences, but as soon as you introduce any sort of quality filtering in your variant detection, frequencies drop right down.

i agree though that the error rate of the process will determine how "deep" we can see things, but i would say 0.01% would not be unrealistic with some tweaks.

we have done some sequencing of pcr amplified mtdna without doing anything as complex as the paper above where we assembled at 100,000-200,000 fold coverage. using just a medium stringency call quality filter during snp detection you can see a shift in average variant frequencies from 0.04% to 0.15% in control vs mice expressing error prone dna polymerase.

Obviously this does not in itself tell you a whole lot because we do not know how much of each value is error BUT it does demonstrate that even in a system as simple as this you can see some biology at this level of detection above that of the error rate of the overall process.

**krobison** · 06-22-2012, 02:07 AM

A similar paper came out from Sydney Brenner and friends around the same time as the Vogelstein PNAS paper cited above.

http://www.ncbi.nlm.nih.gov/pubmed/21490082

Nucleic Acids Res. 2011 Jul;39(12):e81. Epub 2011 Apr 13.
A method for counting PCR template molecules with application to next-generation sequencing.
Casbon JA, Osborne RJ, Brenner S, Lichtenstein CP.
Source
Population Genetics Technologies Ltd., Babraham Institute, Babraham, Cambridgeshire CB22 3AT, UK.

Abstract
Amplification by polymerase chain reaction is often used in the preparation of template DNA molecules for next-generation sequencing. Amplification increases the number of available molecules for sequencing but changes the representation of the template molecules in the amplified product and introduces random errors. Such changes in representation hinder applications requiring accurate quantification of template molecules, such as allele calling or estimation of microbial diversity. We present a simple method to count the number of template molecules using degenerate bases and show that it improves genotyping accuracy and removes noise from PCR amplification. This method can be easily added to existing DNA library preparation techniques and can improve the accuracy of variant calling.

PMID: 21490082 [PubMed - indexed for MEDLINE] PMCID: PMC3130290
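The general idea of counting template molecules with random tags can be sketched as follows. This is not the procedure from the paper above, just a minimal illustration: reads at one site are assumed to arrive as hypothetical `(tag, base)` pairs, reads sharing a tag are treated as copies of one template, and the `min_reads` and `min_agreement` thresholds are arbitrary choices for the example.

```python
from collections import Counter, defaultdict


def collapse_by_tag(reads, min_reads=2, min_agreement=0.9):
    """Collapse reads sharing a random tag into one consensus base per template.

    `reads` is an iterable of (tag, base) pairs for a single genomic position.
    Tags with too few reads or no clear majority base are discarded.
    """
    bases_by_tag = defaultdict(list)
    for tag, base in reads:
        bases_by_tag[tag].append(base)

    consensus = {}
    for tag, bases in bases_by_tag.items():
        if len(bases) < min_reads:
            continue
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            consensus[tag] = base
    return consensus


def allele_counts(consensus):
    """Count alleles per template molecule rather than per read."""
    return Counter(consensus.values())


reads = (
    [("AAC", "G")] * 10 + [("AAC", "T")]        # one stray error among 11 reads
    + [("GTT", "A")] * 5                        # clean template
    + [("CCA", "G")] * 3 + [("CCA", "A")] * 3   # ambiguous, discarded
    + [("TTG", "T")]                            # singleton, discarded
)
consensus = collapse_by_tag(reads)
print(consensus)
print(allele_counts(consensus))
```

Counting per tag rather than per read means a heavily amplified template is counted once, and an error seen in a minority of one tag's reads is outvoted by the other copies of the same template.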

Ultra-rare variant detection -- impossible?