SEQanswers

Old 01-13-2011, 07:37 AM   #1
genlyai
Member
 
Location: Boston, MA

Join Date: Aug 2009
Posts: 39
Ultra-rare variant detection -- impossible?

What if I want to know whether an oncogene has acquired a single base mutation in a tumor, even if it's only present in 1 of every 10,000 cells? What if I want to know the prevalence of single nucleotide transcription errors for a specific transcript, even errors present at 1 in 20,000?

NGS technologies have an error rate of 1/1000 to 1/100 per base per read. Does this make the above two problems impossible to solve with NGS, even if I get millions of reads covering the regions in question?

I wanted to get the forum's thoughts on the above, in addition to hearing whether there are any publications addressing this, as I've found few. Can we think of any workarounds, either informatic or wet? How should we set thresholds for the minimum frequency at which a variant would be confidently detectable?
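
One way to formalize the threshold question is a rough sketch (my own toy calculation, not from any publication): treat the error reads at a site as Poisson with mean depth × error rate, and find the smallest variant read count that errors alone are very unlikely to reach.

```python
from math import exp

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via the complement of the CDF."""
    term = exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def min_detectable_freq(depth, err_rate, alpha=1e-6):
    """Smallest read count k that errors alone reach with probability < alpha,
    returned as a fraction of depth (a crude 'minimum callable frequency')."""
    lam = depth * err_rate
    k = int(lam)
    while poisson_sf(k, lam) >= alpha:
        k += 1
    return k / depth

# With a 0.1% error rate, even 100,000x depth stays well above 0.01%:
print(min_detectable_freq(100_000, 1e-3))
```

Under these illustrative numbers the floor comes out around 0.15%, an order of magnitude above the 0.01% target, which is the crux of the problem.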

I look forward to anyone else's take,
Genly

(Wasn't sure what subforum to put this post in, so feel free to suggest another.)
Old 01-13-2011, 08:36 AM   #2
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279

Someone may have more intelligent things to say and suggest an informatics workaround, but I tend to agree that you are limited to the error rate of the technology.

For example, say you have a 1/1000 error rate and you require three hits at a position to detect a mutation (the expected error read plus two more, say one read in each direction), and say it's a heterozygous KRAS mutant. In 1000 reads from a diploid tumor you have analyzed 500 cells, and with a 2-read hit requirement you have 1-in-250-cell sensitivity. Still not bad, but those are generous error rates, and I doubt anyone would waste much time using such low-stringency cutoffs, so all the talk of "deep sequencing" with no knowledge of error rates seems very naive.
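
To put a number on how loose that cutoff is, here is a back-of-the-envelope Poisson sketch (my own figures, and an overestimate, since it ignores that the erroneous calls would also have to agree on the same base): with a 1/1000 error rate and 1000 reads, errors at a site are roughly Poisson with mean 1, so the chance that errors alone clear a 3-read cutoff is:

```python
from math import exp

depth, err = 1000, 1 / 1000
lam = depth * err          # expected error reads at one site: Poisson(1)

# P(X >= 3) for X ~ Poisson(lam): errors alone produce 3+ "variant" reads
p_false = 1 - exp(-lam) * (1 + lam + lam**2 / 2)
print(round(p_false, 3))   # ~0.08, i.e. ~8% of error-only sites would pass
```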

Not sure if that helps, as I pretty much agreed with you verbatim, but I get tired of listening to people talk about "deep sequencing" who have never seen a sequencer, have never seen the raw data, and definitely don't know the error rates of the current machines...

I look forward to seeing others opinions too
Old 01-14-2011, 01:45 AM   #3
henry.wood
Member
 
Location: Leeds, UK

Join Date: Apr 2010
Posts: 63

There's a possible wet-lab workaround that might go some of the way; it would work best with sequencing PCR products. You can split your sample into several very small pools. If you have something very rare, then it will be present in only a portion of them. You can then do your PCR and make your libraries with tags on. There are a few clever tricks around, like DNA Sudoku (http://hannonlab.cshl.edu/dna_sudoku/main.html), which allow very high levels of multiplexing. Then, when you do your sequencing, if your SNP is real, as opposed to being a sequencing error, all the reads with that SNP will be concentrated in just a few pools.
Old 01-14-2011, 04:55 AM   #4
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

I second henry.wood. If you sequence with very small pools to high coverage, it is in theory possible to find a SNP at the 0.01% frequency.
Old 01-14-2011, 07:37 AM   #5
genlyai
Member
 
Location: Boston, MA

Join Date: Aug 2009
Posts: 39

Glad to see that I'm not alone in caring about this. Like JK said, the idea of sequencing one locus to significant depth was always one of the selling points of NGS, and it's frustrating to not be able to access that.

Henry and lh3, that idea makes sense. So you're saying, let's say I want to be able to detect something present at 0.01%, and my "normal" threshold for calling a SNP significant is 5%. I would make tiny libraries such that my expected number of amplifiable fragments containing a given site is just 20 for each library, so if the rare allele is present in a lib, we should sequence it 5% of the time that site is sequenced. To get decent odds of seeing something at 0.01%, I would need to make and sequence ballpark [100% / (0.01% * 20) =] 500 such libs (but really a few fold more than that). Does this correctly summarize what you were envisioning?
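
Restating my arithmetic above as a quick sketch (same assumptions: 0.01% allele, 20 amplifiable fragments per library), the break-even count is the 500 figure, and requiring, say, a 95% chance of catching at least one copy gives the "few fold more":

```python
from math import log

rare_freq = 1e-4        # allele present at 0.01%
frags_per_lib = 20      # expected amplifiable fragments covering the site
p_lib = 1 - (1 - rare_freq) ** frags_per_lib   # lib holds >= 1 rare fragment

# Break-even estimate: libraries needed to expect one positive library
print(round(1 / p_lib))                      # ~500

# Libraries for a 95% chance of catching at least one copy
print(round(log(0.05) / log(1 - p_lib)))     # ~1500, "a few fold more"
```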

I'm a bit concerned about the DNA sudoku aspect here. Won't it only work as long as we have perfect sensitivity for our detectable event even once the libraries are multiplexed? So as soon as you pool a few libs, your SNP sensitivity for the pool is below the workable threshold. So it seems like we are going to need a very large number of libraries and sequencing pools to get this to work.

Ok ... I started off enthusiastic there, and then convinced myself that it would be quite unwieldy. Am I thinking about this the wrong way? Let me know what you think.
Old 01-14-2011, 08:18 AM   #6
james hadfield
Moderator
 
Location: Cambridge, UK

Join Date: Feb 2008
Posts: 221

One thing I hope can make a difference is bi-directional reads on Illumina.

Being able to sequence in both forward and reverse directions over an amplicon should allow much higher Q-scores to be called.

As sequencing read errors are probably more frequent than incorporation errors, these would be greatly reduced by a bi-directional strategy. The major blocks for ultra-low detection are PCR errors during the initial amplification and the initial/early cycles of cluster generation. However, these should be random and lower than the 0.1% or 0.01% you have mentioned.
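
For a sense of scale: if the forward and reverse read errors at a base were fully independent (an idealization; real errors are partly systematic), requiring both reads to agree on the same alternate base would push the read-error floor far below the raw rate:

```python
p = 1e-3                    # per-base read error rate in one direction
# Both reads miscalled AND agreeing on the same one of 3 wrong bases:
p_both = p * p * (1 / 3)
print(p_both)               # ~3.3e-7; PCR/cluster errors remain untouched
```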
Old 01-14-2011, 08:20 AM   #7
henry.wood
Member
 
Location: Leeds, UK

Join Date: Apr 2010
Posts: 63

Quote:
Originally Posted by genlyai View Post
To get decent odds of seeing something at 0.01%, I would need to make and sequence ballpark [100% / (0.01% * 20) =] 500 such libs (but really a few fold more than that)
That's the kind of thing I was envisaging. You never said you wanted a cheap and simple solution! I can't claim to have got to the end of the Sudoku paper without my head spinning slightly, so it may well not be what you're after. Good luck with all those libraries.
Old 01-14-2011, 08:34 AM   #8
genlyai
Member
 
Location: Boston, MA

Join Date: Aug 2009
Posts: 39

Quote:
Originally Posted by james hadfield View Post
One thing I hope can make a difference is bi-directional reads on Illumina.

Being able to sequence in both forward and reverse directions over an amplicon should allow much higher Q-scores to be called.

As sequencing read errors are probably more frequent than incorporation errors, these would be greatly reduced by a bi-directional strategy. The major blocks for ultra-low detection are PCR errors during the initial amplification and the initial/early cycles of cluster generation. However, these should be random and lower than the 0.1% or 0.01% you have mentioned.
Good point, and there may be some data out there to address this already.

On the other hand, my intuition is the opposite of yours with respect to the contribution of PCR errors. Taq is normally quoted as having an error rate of around 0.01% per base per cycle. After 10-12 cycles, this is in the ballpark of the error rate of the whole process.
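
Roughly, under the simplifying assumptions (mine, not a published figure) that every molecule is copied each cycle and that a random product molecule was freshly synthesized in about half the cycles of its lineage, that intuition works out as:

```python
mu = 1e-4        # commonly quoted Taq error rate per base per strand copy
cycles = 12

# Expected errors per base carried by a random final molecule:
freq = mu * cycles / 2
print(freq)      # 6e-4 -- same order as the overall process error rate
```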

As I said, though, the data may well be out there to answer this without resorting to guesswork.
Old 01-14-2011, 08:42 AM   #9
genlyai
Member
 
Location: Boston, MA

Join Date: Aug 2009
Posts: 39

Quote:
Originally Posted by henry.wood View Post
That's the kind of thing I was envisaging. You never said you wanted a cheap and simple solution! I can't claim to have got to the end of the Sudoku paper without my head spinning slightly, so it may well not be what you're after. Good luck with all those libraries.
To be fair, the 5% detection threshold could probably be lowered for a well-run process, but we are still talking about 100+ libraries. Doable, but far from simple.
Old 01-16-2011, 04:21 PM   #10
frozenlyse
Senior Member
 
Location: Australia

Join Date: Sep 2008
Posts: 136

Quote:
Originally Posted by genlyai View Post
On the other hand, my intuition is the opposite of yours with respect to the contribution of PCR errors. Taq is normally quoted as having an error rate of around 0.01% per base per cycle. After 10-12 cycles, this is in the ballpark of the error rate of the whole process.

At least in the ChIP-seq library kits (the only ones I have hands-on experience with), the library prep PCR uses Phusion, which has a much lower error rate.
Old 01-17-2011, 06:21 AM   #11
genlyai
Member
 
Location: Boston, MA

Join Date: Aug 2009
Posts: 39

Quote:
Originally Posted by frozenlyse View Post
At least in the ChIP-seq library kits (the only ones I have hands-on experience with), the library prep PCR uses Phusion, which has a much lower error rate.
Good point. Is this the case with Illumina's reagent kits?

For that matter, is anyone aware of a study that tries to quantify error arising from each step of the process?
Old 06-13-2012, 10:48 AM   #12
SeqR&D
Member
 
Location: San Diego

Join Date: Sep 2010
Posts: 26
Nothing is impossible

I say it's not impossible, and not necessarily reliant on the error rate of sequencing or of amplification. If you can make a sequencing library that is smart enough to overcome these obstacles, you can attain the currently unattainable. I realize I'm not telling you anything, but if I shared all my ideas, maybe you wouldn't come up with different ones.
Old 06-13-2012, 02:01 PM   #13
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535

This paper should get the job done: http://www.pnas.org/content/108/23/9530.abstract
Old 06-21-2012, 01:33 PM   #14
The_Roads
Member
 
Location: USA

Join Date: May 2009
Posts: 37

I would say most talk of the error rates of NGS grossly overemphasizes the problem. Yes, there is a conflict rate of >>1% when you compare plain read sequences, but as soon as you introduce any sort of quality filtering in your variant detection, frequencies drop right down.

I agree, though, that the error rate of the process will determine how "deep" we can see things, but I would say 0.01% would not be unrealistic with some tweaks.

We have done some sequencing of PCR-amplified mtDNA, without doing anything as complex as the paper above, where we assembled at 100,000-200,000-fold coverage. Using just a medium-stringency call-quality filter during SNP detection, you can see a shift in average variant frequencies from 0.04% to 0.15% in control mice vs. mice expressing an error-prone DNA polymerase.

Obviously this does not in itself tell you a whole lot, because we do not know how much of each value is error, BUT it does demonstrate that even in a system as simple as this you can see some biology at this level of detection, above the error rate of the overall process.
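
As a toy illustration of the filtering effect described above (invented numbers, not The_Roads's mtDNA data), dropping low-quality base calls before counting can change the apparent variant frequency several-fold:

```python
# (base, phred_quality) observations at one site; 'A' is the reference.
obs = [('A', 38)] * 9940 + [('G', 12)] * 50 + [('G', 35)] * 10

def variant_freq(observations, min_q=0):
    """Fraction of kept bases calling the 'G' variant, after a quality cut."""
    kept = [b for b, q in observations if q >= min_q]
    return kept.count('G') / len(kept)

print(variant_freq(obs))            # raw: 60/10000 = 0.006
print(variant_freq(obs, min_q=30))  # filtered: 10/9950 ~ 0.001
```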
Old 06-22-2012, 03:07 AM   #15
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

A similar paper came out from Sydney Brenner and friends around the same time as the Vogelstein PNAS paper cited above.

http://www.ncbi.nlm.nih.gov/pubmed/21490082
Nucleic Acids Res. 2011 Jul;39(12):e81. Epub 2011 Apr 13.
A method for counting PCR template molecules with application to next-generation sequencing.
Casbon JA, Osborne RJ, Brenner S, Lichtenstein CP.
Source
Population Genetics Technologies Ltd., Babraham Institute, Babraham, Cambridgeshire CB22 3AT, UK.

Abstract
Amplification by polymerase chain reaction is often used in the preparation of template DNA molecules for next-generation sequencing. Amplification increases the number of available molecules for sequencing but changes the representation of the template molecules in the amplified product and introduces random errors. Such changes in representation hinder applications requiring accurate quantification of template molecules, such as allele calling or estimation of microbial diversity. We present a simple method to count the number of template molecules using degenerate bases and show that it improves genotyping accuracy and removes noise from PCR amplification. This method can be easily added to existing DNA library preparation techniques and can improve the accuracy of variant calling.

PMID: 21490082 [PubMed - indexed for MEDLINE] PMCID: PMC3130290
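
The idea in the abstract can be caricatured in a few lines: tag each template molecule with a degenerate-base label before PCR, then collapse reads sharing a tag to a consensus, so an error seen in only one read of a family is outvoted. (A toy sketch with made-up reads, not the paper's implementation.)

```python
from collections import Counter, defaultdict

# Toy reads: (umi_tag, base_at_site). Reads sharing a tag derive from one
# template molecule, so a lone discordant base within a family is an error.
reads = [("ACGT", "A"), ("ACGT", "A"), ("ACGT", "G"),   # lone 'G' = error
         ("TTAG", "G"), ("TTAG", "G"), ("TTAG", "G")]   # real variant

families = defaultdict(list)
for umi, base in reads:
    families[umi].append(base)

# Majority vote within each family gives one call per template molecule:
consensus = [Counter(b).most_common(1)[0][0] for b in families.values()]
print(consensus)   # ['A', 'G'] -- two templates counted, error removed
```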