SEQanswers > Sequencing Technologies/Companies > Oxford Nanopore

10-31-2015, 01:52 PM #1
mido1951
Senior Member
Location: Tunisia
Join Date: May 2014
Posts: 116
minion error rate

How can I find out the error rate of my MinION reads?
How can I distinguish the rates of insertions, deletions and substitutions?
Thanks

10-31-2015, 05:00 PM #2
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,244

You won't unless there is a reference available to compare the reads to.
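Once the reads have been aligned to a reference, one way to split the errors into the three types is to count them from an extended CIGAR string, where `=` marks matching bases, `X` mismatches (substitutions), `I` insertions and `D` deletions relative to the reference. This is a minimal sketch assuming your aligner can emit `=`/`X` instead of the ambiguous `M`; the function name `error_breakdown` is hypothetical, and dividing by the number of alignment columns is only one of several conventions for the denominator.

```python
import re

# CIGAR operations as (length, op) pairs, e.g. "5S90=4X3I3D".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def error_breakdown(cigar: str) -> dict:
    """Per-read substitution/insertion/deletion rates from an extended CIGAR.

    Clips (S/H), skips (N) and padding (P) are ignored. Plain 'M' cannot
    separate matches from mismatches, so it is rejected.
    """
    counts = {"=": 0, "X": 0, "I": 0, "D": 0}
    for length, op in CIGAR_RE.findall(cigar):
        if op == "M":
            raise ValueError("'M' is ambiguous; use an alignment with =/X operations")
        if op in counts:
            counts[op] += int(length)
    aligned = sum(counts.values())  # alignment columns, excluding clipped bases
    if aligned == 0:
        raise ValueError("no aligned columns in CIGAR")
    return {
        "substitution": counts["X"] / aligned,
        "insertion": counts["I"] / aligned,
        "deletion": counts["D"] / aligned,
        "total": (counts["X"] + counts["I"] + counts["D"]) / aligned,
    }

print(error_breakdown("5S90=4X3I3D"))
```

For a whole run, summing the raw `=`/`X`/`I`/`D` counts over all reads before dividing avoids giving a short read the same weight as a long one.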

10-31-2015, 05:07 PM #3
mido1951
Senior Member
Location: Tunisia
Join Date: May 2014
Posts: 116

For now I am trying to find a reference.
If there are other ways to determine the error rate, please tell me.

11-02-2015, 03:04 AM #4
gringer
David Eccles (gringer)
Location: Wellington, New Zealand
Join Date: May 2011
Posts: 750

If there's no reference, a reasonable ballpark (especially for passed reads) is about 10% total error, distributed about 1/3 insertions, 1/3 deletions, and 1/3 SNPs.
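To get a feel for what this ballpark means, or to test an alignment pipeline, you could simulate reads with roughly that profile. This is a toy sketch that assumes errors are independent and spread uniformly along the read, with the total rate counted per reference base; real nanopore errors are not like that (much of the error is systematic, as noted in the next paragraph). `add_errors` is a hypothetical helper.

```python
import random

def add_errors(seq: str, total_rate: float = 0.10, seed: int = 0) -> str:
    """Apply random errors split equally between substitutions, insertions and deletions.

    Assumption: each reference base independently carries an error with
    probability total_rate, which is a simplification of real nanopore reads.
    """
    rng = random.Random(seed)
    bases = "ACGT"
    out = []
    for base in seq:
        if rng.random() >= total_rate:
            out.append(base)  # no error at this position
            continue
        kind = rng.randrange(3)  # 0 = substitution, 1 = insertion, 2 = deletion
        if kind == 0:
            out.append(rng.choice([b for b in bases if b != base]))
        elif kind == 1:
            out.append(base)
            out.append(rng.choice(bases))  # extra base inserted after this one
        # kind == 2: deletion, so nothing is emitted for this base
    return "".join(out)

reference = "ACGT" * 25
print(add_errors(reference, total_rate=0.10, seed=42))
```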

It's possible to remap reads against other reads, but you need to be very careful with that, particularly across homopolymer regions. There's a lot of systematic error in the base calling (not the signal) which causes problems in consensus sequences.
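One commonly used way to stop homopolymer-length disagreements from dominating read-to-read comparisons is homopolymer compression: collapse every run of identical bases to a single base before comparing. This is a sketch of that general trick, not something proposed in this thread, and it throws away genuine homopolymer-length information along with the errors. `homopolymer_compress` is a hypothetical name.

```python
from itertools import groupby

def homopolymer_compress(seq: str) -> str:
    """Collapse each run of identical bases to one base (e.g. AAAC -> AC)."""
    return "".join(base for base, _ in groupby(seq))

a = "GATTTTACAAAAAG"
b = "GATTTACAAAAAAG"  # differs from a only in homopolymer lengths
print(a == b)                                              # False
print(homopolymer_compress(a) == homopolymer_compress(b))  # True
```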

11-02-2015, 01:23 PM #5
mido1951
Senior Member
Location: Tunisia
Join Date: May 2014
Posts: 116

According to my research, the overall error rate of the MinION sequencer is about 25%.
What are homopolymer regions?
Thank you

11-02-2015, 01:28 PM #6
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,244

Homopolymers are long stretches of identical nucleotides (e.g. AAAAAAAAAAAAAAA).
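A run-length scan is enough to list the homopolymers in a read or a reference. In this sketch the helper name `homopolymer_runs` and the default minimum length of 5 are assumptions; 5 is chosen only because it matches the pentamer model discussed later in the thread.

```python
from itertools import groupby

def homopolymer_runs(seq: str, min_len: int = 5):
    """Return (start, base, length) for every run of identical bases >= min_len."""
    runs = []
    pos = 0
    for base, group in groupby(seq):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs

print(homopolymer_runs("TAAAAAAAAGCCCCCT"))  # [(1, 'A', 8), (10, 'C', 5)]
```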

This paper claims an error rate of 38%, but it is early days, and the rate may also depend on the sample and its composition.
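The error rates mentioned in this thread (about 10%, 25% and 38%, plus the 10-15% range quoted later) are easier to compare on the Phred scale used for sequencing quality scores, Q = -10 log10(error rate). A small sketch of the conversion:

```python
import math

def phred(error_rate: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(error rate)."""
    return -10 * math.log10(error_rate)

for rate in (0.10, 0.15, 0.25, 0.38):
    print(f"{rate:.0%} error -> Q{phred(rate):.1f}")
# 10% error -> Q10.0
# 15% error -> Q8.2
# 25% error -> Q6.0
# 38% error -> Q4.2
```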

11-02-2015, 01:49 PM #7
mido1951
Senior Member
Location: Tunisia
Join Date: May 2014
Posts: 116

I want to do a local alignment of MinION reads, but I do not know how to set the parameters for insertion, deletion and match/mismatch.
I want to use affine gap penalties, but that method only penalizes insertions and deletions. If I use affine gap penalties, I also need a function that penalizes substitutions.
Is this a good idea, friends?
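In the usual affine-gap model a separate function for substitutions isn't needed: the match/mismatch score already penalizes substitutions, while the gap-open and gap-extend terms cover insertions and deletions. A minimal sketch of Smith-Waterman local alignment with affine gaps (Gotoh's formulation), returning only the best score; the default scores are arbitrary placeholders, not values tuned for MinION reads.

```python
def local_align_score(a: str, b: str, match=2, mismatch=-3,
                      gap_open=-5, gap_extend=-2) -> int:
    """Smith-Waterman local alignment score with affine gap penalties.

    Treating a as the reference: a gap of length k scores
    gap_open + (k - 1) * gap_extend, for insertions and deletions alike;
    each substitution scores `mismatch`.
    """
    NEG = float("-inf")
    n, m = len(a), len(b)
    # H: best score of a local alignment ending at (i, j)
    # E: ending with b[j-1] opposite a gap (insertion relative to a)
    # F: ending with a[i-1] opposite a gap (deletion relative to a)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG] * (m + 1) for _ in range(n + 1)]
    F = [[NEG] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, E[i][j], F[i][j])  # 0 allows a fresh local start
            best = max(best, H[i][j])
    return best

print(local_align_score("AAAGGGTTT", "AAATTT"))                              # 6
print(local_align_score("AAAGGGTTT", "AAATTT", gap_open=-1, gap_extend=-1))  # 9
```

With the default penalties the three-base gap costs more than it gains, so only one block of three matches is reported; with cheaper gaps both blocks join. How cheap gaps should be relative to mismatches for nanopore reads is something to check empirically against reads with a known reference.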

11-02-2015, 02:54 PM #8
gringer
David Eccles (gringer)
Location: Wellington, New Zealand
Join Date: May 2011
Posts: 750

Quote:
according to my research the overall error rate in the Minion sequencer is about 25%.
When was the sequencing done? The f1000 paper has the most recent error analysis that I'm aware of (total error 10-15%), and that is from sequencing done a couple of months ago, prior to changing to a hexamer model for base calling:

http://f1000research.com/articles/4-1075/v1

11-02-2015, 04:56 PM #9
luc
Senior Member
Location: US
Join Date: Dec 2010
Posts: 278

How high is the rate of "phantasy sequences", that have no resemblance to the reference, with the latest versions? I actually can't find much data of interest to me in the f1000 article. The authors often write about "a run". Are they writing about an average run or the cherry-picked best run they ever encountered?

11-02-2015, 05:19 PM #10
gringer
David Eccles (gringer)
Location: Wellington, New Zealand
Join Date: May 2011
Posts: 750

Quote:
How high is the rate of "phantasy sequences", that have no resemblance to the reference, with the latest versions?
Other people call this "mismatch rate", or alternatively refer to "mapping rate". From the 2D reads, mapping rate was pretty close to 100% for all except two runs (see Figure 6, or section with heading "Proportion of target and control sample"). I'm a bit hesitant to trust this fully, because there's a high chance of false positive matches when you have a high error rate.

Quote:
The authors are writing often about "a run". Are they writing about an average run or the cherry picked best run they ever encountered?
There were 20 runs, each of which tried to stick to the same specific sample preparation and sequencing protocol. You can call those cherry-picked runs if you like, but these were the only runs that the groups did as part of their Phase 1 experiments. Only six runs kept to this protocol without variation; the deviations from the standard protocol are specified in Table S5.

11-02-2015, 07:06 PM #11
ymc
Senior Member
Location: Hong Kong
Join Date: Mar 2010
Posts: 484

Quote:
Originally Posted by gringer
When was the sequencing done? The f1000 paper has the most recent error analysis that I'm aware of (total error 10-15%), and that is from sequencing done a couple of months ago, prior to changing to a hexamer model for base calling:

http://f1000research.com/articles/4-1075/v1
So it's a hexamer model now? How much base-calling accuracy does it gain over the previous pentamer model?

11-02-2015, 07:13 PM #12
gringer
David Eccles (gringer)
Location: Wellington, New Zealand
Join Date: May 2011
Posts: 750

Quote:
Originally Posted by ymc
So now is hexamer model? How much base calling accuracy does it gain over the previous pentamer model?
See Jared Simpson's post here. There's no raw sequence accuracy mentioned there, but in consensus it was a 0.4% improvement.

I don't have results from any comparative sequencing that we've done, because getting *any* sequence is a bit of a challenge for us.

11-05-2015, 05:05 AM #13
jkbonfield
Senior Member
Location: Cambridge, UK
Join Date: Jul 2008
Posts: 143

Consensus improvement isn't the same as per-base improvement. I wonder what the difference is there.
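A toy illustration of why consensus accuracy can be much higher than per-read accuracy, and why systematic errors (like those gringer describes earlier) survive consensus anyway: a column-wise majority vote over reads assumed to be already aligned to the same length, with substitution errors only. The reads are invented for the example; real consensus calling needs a proper multiple alignment that handles indels.

```python
from collections import Counter

def majority_consensus(aligned_reads):
    """Column-wise majority vote over reads already aligned to equal length.

    Ties go to whichever base Counter.most_common lists first.
    """
    columns = zip(*aligned_reads)
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)

def identity(seq, truth):
    """Fraction of positions where seq agrees with truth (equal lengths assumed)."""
    return sum(a == b for a, b in zip(seq, truth)) / len(truth)

truth = "ACGTACGTAC"

# Hypothetical reads, each with one random substitution at a different position.
random_errors = ["ACGAACGTAC", "ACGTACCTAC", "TCGTACGTAC"]
print([identity(r, truth) for r in random_errors])         # [0.9, 0.9, 0.9]
print(identity(majority_consensus(random_errors), truth))  # 1.0

# Hypothetical reads sharing one systematic error (A -> T at position 8).
systematic = ["ACGTACGTTC", "ACGTACGTTC", "ACGAACGTTC"]
print(identity(majority_consensus(systematic), truth))     # 0.9
```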

Also, Jared said it was mainly down to improvements in homopolymers. I wonder whether that is a genuine improvement or simply due to the frequency distribution of different homopolymer lengths, and whether it means they've changed their strategy.

Obviously any homopolymer longer than 5 (now 6) looks like a single event. E.g. TAAAAAAAAG yields TAAAA AAAAA AAAAA AAAAA AAAAA AAAAG. Those AAAAA events all get joined together into a single longer event, which meant the longest homopolymer previously reported was 5. I assume it's now 6.
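This example can be reproduced with a deliberately simplified model: assume the pore reports one event per base step for each overlapping k-mer, and that consecutive identical k-mers cannot be told apart, so they merge into one event. Decoding the merged events then caps a homopolymer at k bases. The helper names are hypothetical, and real base callers are considerably more sophisticated.

```python
from itertools import groupby

def kmer_events(seq: str, k: int = 5):
    """One k-mer per base step through the pore (simplified model)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def distinct_events(seq: str, k: int = 5):
    """Merge consecutive identical k-mers, which give no change in signal."""
    return [kmer for kmer, _ in groupby(kmer_events(seq, k))]

def decode(events):
    """Rebuild a sequence: first k-mer, then the last base of each later k-mer."""
    if not events:
        return ""
    return events[0] + "".join(e[-1] for e in events[1:])

seq = "TAAAAAAAAG"  # eight A's
print(kmer_events(seq))       # ['TAAAA', 'AAAAA', 'AAAAA', 'AAAAA', 'AAAAA', 'AAAAG']
print(distinct_events(seq))   # ['TAAAA', 'AAAAA', 'AAAAG']
print(decode(distinct_events(seq)))       # TAAAAAG   (5 A's with a 5-mer model)
print(decode(distinct_events(seq, k=6)))  # TAAAAAAG  (6 A's with a 6-mer model)
```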

However, an alternative (and obvious) strategy is to observe the event length distributions for non-homopolymers. They're largely random, centred around a particular amount; I'd guess they're either Gaussian or Poisson. Given a homopolymer signal of length L, we can hypothesise lengths of 5, 6, 7, 8, ... and derive the probability of the homopolymer being longer than 5. The error rate would likely be horrendous, but in theory it ought to be better than just saying "5, never more".
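A minimal sketch of that idea, taking the Gaussian option: assume each base step adds an independent Gaussian dwell time, so a merged event covering n identical k-mers has a duration drawn from Normal(n x mean, n x variance), and a homopolymer of length h (h >= k) produces n = h - k + 1 identical k-mers. With a uniform prior over candidate lengths, normalising the likelihoods gives a posterior. The dwell-time numbers are made up, the function names are hypothetical, and the independence assumption is exactly what the next reply disputes.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def homopolymer_length_posterior(duration, mean_dwell, sd_dwell, k=5, max_len=12):
    """Posterior over homopolymer length h given one merged event's duration.

    Assumes independent Gaussian dwell per base step and a uniform prior
    over lengths k..max_len.
    """
    likelihood = {}
    for h in range(k, max_len + 1):
        n = h - k + 1  # identical k-mers merged into this event
        likelihood[h] = normal_pdf(duration, n * mean_dwell, math.sqrt(n) * sd_dwell)
    total = sum(likelihood.values())
    if total == 0:
        raise ValueError("duration is implausible for every candidate length")
    return {h: p / total for h, p in likelihood.items()}

# Made-up numbers: 10 ms mean dwell per step, 3 ms SD, observed event of 40 ms.
post = homopolymer_length_posterior(duration=0.040, mean_dwell=0.010, sd_dwell=0.003)
print(max(post, key=post.get))                              # 8
print(round(sum(p for h, p in post.items() if h > 5), 3))   # P(length > 5)
```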

I have no idea whether the signal is strong enough (i.e. the distribution tight enough) to extract enough accuracy to make this anything better than a wild guess.

11-05-2015, 10:17 AM #14
gringer
David Eccles (gringer)
Location: Wellington, New Zealand
Join Date: May 2011
Posts: 750

Quote:
Originally Posted by jkbonfield
However an alternative (and obvious) strategy is to observe the event length distributions for non-homopolymer. They're largely random, centred around a particular amount. They'll either be gaussian or poisson I'd guess. Given a homopolymer signal of length L we can hypothesise 5, 6, 7, 8, ... lengths and derive the probability of the homopolymer being more than 5. The error rate would likely be horrendous, but in theory it ought to be better than just say "5, never more".
This doesn't work so well in the way that ONT is modelling for base-calling, because the event length actually depends quite strongly on the bases that are found about 20bp upstream from the signal site (where the DNA is being unwound and split). In other words, it's not particularly random.

11-06-2015, 12:18 AM #15
jkbonfield
Senior Member
Location: Cambridge, UK
Join Date: Jul 2008
Posts: 143

Quote:
Originally Posted by gringer
This doesn't work so well in the way that ONT is modelling for base-calling, because the event length actually depends quite strongly on the bases that are found about 20bp upstream from the signal site (where the DNA is being unwound and split). In other words, it's not particularly random.
In that case, it makes it possible to be more accurate than just guessing based on the random distribution, albeit perhaps also (too?) complex to tease out all the correlations.

Either way, I'm curious to know what the per-sequence-base (not per-consensus-base) accuracy is like with a 6-mer model vs a 5-mer model. We have some data of our own, but fundamentally the variability from run to run is high enough that I think it would need a large project to average out that variability and get robust numbers.
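A rough way to see how many runs such a comparison might need: summarise the per-run error rates, then ask how many runs would bring the standard error of the mean down to a target. This sketch assumes runs are independent and that their error rates are roughly normally distributed; the example rates are made up, and `summarize_runs` and `runs_needed` are hypothetical helpers.

```python
import math
from statistics import mean, stdev

def summarize_runs(error_rates):
    """Mean, sample standard deviation and standard error of per-run error rates."""
    if len(error_rates) < 2:
        raise ValueError("need at least two runs to estimate variability")
    m = mean(error_rates)
    sd = stdev(error_rates)
    return m, sd, sd / math.sqrt(len(error_rates))

def runs_needed(sd, target_se):
    """Number of runs so that the standard error of the mean is at most target_se."""
    return math.ceil((sd / target_se) ** 2)

# Made-up per-run error rates, for illustration only.
rates = [0.10, 0.14, 0.12, 0.18, 0.11]
m, sd, se = summarize_runs(rates)
print(f"mean {m:.3f}, sd {sd:.3f}, se {se:.3f}")
print(runs_needed(sd, target_se=0.005))
```

Because the required number of runs grows with the square of the run-to-run SD, halving the target standard error means roughly four times as many runs, which is why a robust 5-mer vs 6-mer comparison could plausibly need a large project.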

11-06-2015, 01:17 AM #16
gringer
David Eccles (gringer)
Location: Wellington, New Zealand
Join Date: May 2011
Posts: 750

Quote:
Originally Posted by jkbonfield
Either way, I'm curious to know what the per-sequence-base (not per consensus base) accuracy is like with a 6mer model vs a 5mer model.
But presumably not curious enough to download Nick Loman's publicly-available data and find out for yourself.

That's perfectly understandable. I'm still trying to find time to do proper signal-level correlations from our mtDNA data from over a year ago.

All times are GMT -8.