SEQanswers

Old 08-15-2015, 07:12 PM   #1
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 823
Re-calling the same event model -- improvements over 10 months

People seem to be interested in the error rate of the MinION. I'd like to put this image up to demonstrate one of the reasons why error rate is a fickle beast to calculate:



This is exactly the same event signal model (combination of current and dwell time inside the pore) re-called at three separate times over the past year. I've selected a small region covering a homopolymer sequence to make the mapping changes more impressive and easier to see. The reference sequence is shown in the middle (at the 0 line), with changes shown above and below the sequence.
Old 09-14-2015, 05:21 PM   #2
ymc
Senior Member
 
Location: Hong Kong

Join Date: Mar 2010
Posts: 497

How do I make sense of this graph? What software was used to generate it?

What are the versions of Flowcell, SQK-MAP, Metrichor for each of them?
Old 09-14-2015, 06:24 PM   #3
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 823

I used a custom R script to generate the graph; it works on a pairwise alignment of two sequences. The reference sequence appears at the 0 line in the graph (the top letters), and any substitutions appear underneath it, colour-coded according to the three types of substitution (purine/pyrimidine, amino/keto, strong/weak). Insertions appear as chartreuse wedges above the reference sequence, and deletions are steel blue triangles that exclude reference sequences.
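The column-by-column classification described above could be sketched roughly as follows. This is not the attached R script, just a minimal Python illustration. The function names are hypothetical, and the grouping assumes each substitution class is named after the property the two bases share under the IUPAC ambiguity codes (R/Y, M/K, S/W); how the original script actually assigns colours is not stated in the post.

```python
# Illustrative sketch only; names and grouping are assumptions, not the
# code from the attached R script.
SUBSTITUTION_CLASS = {
    frozenset("AG"): "purine/pyrimidine",  # IUPAC R: both purines
    frozenset("CT"): "purine/pyrimidine",  # IUPAC Y: both pyrimidines
    frozenset("AC"): "amino/keto",         # IUPAC M: both amino
    frozenset("GT"): "amino/keto",         # IUPAC K: both keto
    frozenset("CG"): "strong/weak",        # IUPAC S: both strong
    frozenset("AT"): "strong/weak",        # IUPAC W: both weak
}


def classify_column(ref_base, query_base):
    """Classify one column of a gapped pairwise alignment ('-' is a gap)."""
    ref_base, query_base = ref_base.upper(), query_base.upper()
    if ref_base == "-":
        return "insertion"   # extra base in the read
    if query_base == "-":
        return "deletion"    # reference base missing from the read
    if ref_base == query_base:
        return "match"
    # Anything outside A/C/G/T (e.g. N) falls through to "other".
    return SUBSTITUTION_CLASS.get(frozenset((ref_base, query_base)), "other")


def classify_alignment(ref_aln, query_aln):
    """Classify every column of two equal-length gapped alignment strings."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have the same length")
    return [classify_column(r, q) for r, q in zip(ref_aln, query_aln)]


if __name__ == "__main__":
    for label in classify_alignment("AAC-GT", "AGCTG-"):
        print(label)
```

Each label could then be mapped to a colour or glyph when plotting, with insertions drawn above the reference line and substitutions below it.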

I've attached the script I used to create an earlier graph with the same appearance (Image 5 in that script).

Flow cell and sequencing kits are obviously the same for all sequences, and were current on 2014-Oct-03: R7.3 flow cell, and I think SQK-MAP003.

I'm not sure about Metrichor; it was just whatever was current at the time. According to the Fast5 files, the first sequence was chimaera v1.2.2, the middle sequence was chimaera v1.6.3, and the third sequence was chimaera v1.14.4 with dragonet v1.14.2.
Attached Files
File Type: gz LC_PK_graphs.r.gz (7.0 KB, 4 views)

Last edited by gringer; 09-14-2015 at 06:32 PM.
Old 09-15-2015, 01:14 AM   #4
ymc
Senior Member
 
Location: Hong Kong

Join Date: Mar 2010
Posts: 497

Thanks for your reply. The third graph doesn't seem to be an obvious improvement over the second one. It seems to me it just substituted one type of error with another type.
Old 09-15-2015, 04:35 AM   #5
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 823

The improvement is that it has detected a single base insertion in the homopolymer region, which is a nice result given that our sample had a single base insertion in that region. There are substitution errors, and the inserted base is incorrect (T instead of A), but it suggests to me that things are moving in the right direction. It also demonstrates that it might be possible to call sequences across long homopolymer regions after all, despite the theoretical model suggesting that there should be no difference in signal between adjacent events in the middle of the region.
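The homopolymer problem can be illustrated with a small sketch. It assumes a simplified model in which the expected current for each event depends only on the k bases in the pore; k = 5 and the example sequence are illustrative assumptions, not values taken from this thread. Under that assumption, consecutive k-mers inside a homopolymer longer than k are identical, so adding one base changes how many identical steps there are but not which current levels are expected.

```python
# Illustrative sketch under a simplified k-mer model; k and the sequence
# below are assumptions made for the example.
def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def indistinguishable_steps(seq, k):
    """Count adjacent k-mer pairs that are identical, i.e. steps where the
    model predicts no change in current level between two events."""
    windows = kmers(seq, k)
    return sum(1 for a, b in zip(windows, windows[1:]) if a == b)


if __name__ == "__main__":
    k = 5
    reference = "GCT" + "A" * 8 + "CGT"   # made-up 8-base homopolymer
    inserted = "GCT" + "A" * 9 + "CGT"    # same region with one extra A

    print(indistinguishable_steps(reference, k))
    print(indistinguishable_steps(inserted, k))
    # The distinct k-mers are the same in both cases, so only the number of
    # events (and their dwell times) could reveal the extra base.
    print(set(kmers(reference, k)) == set(kmers(inserted, k)))
```

That is why detecting the insertion at all is encouraging: in this simplified picture the only evidence for the extra base is an extra event or a longer dwell time at the same current level.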
Old 09-15-2015, 04:48 PM   #6
ymc
Senior Member
 
Location: Hong Kong

Join Date: Mar 2010
Posts: 497

Do you mean the T insertion between 9825 and 9826 is real? I thought you were just re-sequencing a reference sample. Did you actually sequence a sample from the same strain as the reference, but not the same sample?
Old 09-15-2015, 07:03 PM   #7
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 823

It should be an 'A' insertion, but yes, it's real. We were sequencing 4T1 cancer cells, which have a few variants different from the reference sequence. You can see the paper for more details:

http://www.cell.com/cell-metabolism/...2814%2900554-3

ResearchGate link if you don't have direct access to the paper through Cell:

https://www.researchgate.net/publication/270582858
Old 10-15-2015, 04:02 PM   #8
nucacidhunter
Jafar Jabbari
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,179

Quote:
Originally Posted by gringer View Post
People seem to be interested in the error rate of the MinION. I'd like to put this image up to demonstrate one of the reasons why error rate is a fickle beast to calculate:



This is exactly the same event signal model (combination of current and dwell time inside the pore) re-called at three separate times over the past year. I've selected a small region covering a homopolymer sequence to make the mapping changes more impressive and easier to see. The reference sequence is shown in the middle (at the 0 line), with changes shown above and below the sequence.
I think it would be a good idea to declare conflict of interest when praising a platform. Are you involved in MinION Analysis and Reference Consortium (MARC)?
Old 10-17-2015, 04:14 AM   #9
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 823

Quote:
Originally Posted by nucacidhunter View Post
I think it would be a good idea to declare conflict of interest when praising a platform. Are you involved in MinION Analysis and Reference Consortium (MARC)?
Yes, and I've also been part of the MAP since the start, and have mentioned my involvement with MAP previously on SEQanswers. It's silly to repeat that every time I talk about the MinION, because everyone who has access to a MinION sequencer has received some amount of shipping-cost-only flow cells and reagents from Oxford Nanopore.

The only way you're going to find an interest-free analysis is if someone from outside MAP takes some of the publicly available data and does their own analysis on that. Based on how much feedback I've got on the mitochondrial data I released last year (i.e. none), don't get your hopes up on that.

It's also currently impossible to re-call event data without having access to Metrichor, so unless someone from outside MAP writes their own base caller everyone is stuck with what ONT throws at them.

Perhaps our MARC paper will change that, because it's a bit more public and has a lot more pre-analysed and mapped data for other people to look at.

Last edited by gringer; 10-17-2015 at 04:23 AM.
Old 10-17-2015, 05:35 AM   #10
ymc
Senior Member
 
Location: Hong Kong

Join Date: Mar 2010
Posts: 497

Quote:
Originally Posted by gringer View Post
It should be an 'A' insertion, but yes, it's real. We were sequencing 4T1 cancer cells, which have a few variants different from the reference sequence. You can see the paper for more details:

http://www.cell.com/cell-metabolism/...2814%2900554-3

ResearchGate link if you don't have direct access to the paper through Cell:

https://www.researchgate.net/publication/270582858
Does it make sense to use long read technology to study somatic mutations?

I think the Illumina and X10 combo should work better because I have yet to encounter a somatic repeat that can take advantage of the true long read technology.
Old 10-17-2015, 11:02 AM   #11
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 823

Quote:
Originally Posted by ymc View Post
Does it make sense to use long read technology to study somatic mutations?
Yes, because we were able to do a whole-mitochondria run on two amplified 8kb fragments of mitochondrial DNA for about $100 (approximate cost of non-ONT reagents and shipping-cost-only flow cells). Illumina is overkill for mitochondrial sequencing, so it makes sense to use something cheaper when available. Even without barcoding, we can get at least 4 mitochondrial runs done on the MinION by using wash buffer between runs and running for 1-4 hours.

Quote:
Originally Posted by ymc View Post
I think the Illumina and X10 combo should work better because I have yet to encounter a somatic repeat that can take advantage of the true long read technology.
The MinION does a reasonable job with SNPs and small INDELs. It's just not (yet) great for long homopolymers as demonstrated here. I found a few other mitochondrial SNPs that did work well with the MinION, and were supported by IonTorrent sequencing.
Old 10-17-2015, 06:51 PM   #12
nucacidhunter
Jafar Jabbari
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,179

Quote:
Originally Posted by gringer View Post
Yes, and I've also been part of the MAP since the start, and have mentioned my involvement with MAP previously on SEQanswers. It's silly to repeat that every time I talk about the MinION, because everyone who has access to a MinION sequencer has received some amount of shipping-cost-only flow cells and reagents from Oxford Nanopore.

The only way you're going to find an interest-free analysis is if someone from outside MAP takes some of the publicly available data and does their own analysis on that. Based on how much feedback I've got on the mitochondrial data I released last year (i.e. none), don't get your hopes up on that.

It's also currently impossible to re-call event data without having access to Metrichor, so unless someone from outside MAP writes their own base caller everyone is stuck with what ONT throws at them.

Perhaps our MARC paper will change that, because it's a bit more public and has a lot more pre-analysed and mapped data for other people to look at.
Only a subset of MAP participants and ONT paid consultants are involved with MARC and I think this makes it different from ordinary MAPers.
Old 10-18-2015, 12:33 AM   #13
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 823

Quote:
Originally Posted by nucacidhunter View Post
Only a subset of MAP participants and ONT paid consultants are involved with MARC and I think this makes it different from ordinary MAPers.
This comment suggests that MARC is some exclusive club, but it's not. Anyone can be part of MARC, even those outside MAP. There's no fee to pay, and no one cares if members don't say anything on the mailing lists or check in during the meetings.

MARC is no different from the rest of MAP in that ONT will give free-excluding-shipping flow cells to anyone who wants to try out a big experiment and publish a paper or present at a meeting. There is some collective bargaining advantage, but we're still all waiting for flow cells to arrive, and are stuck behind the queue of commercial customers just like everyone else in MAP. Any people who pay $1000 (or $500 in bulk) for each flow cell will get faster access to ONT services than anyone in MARC.

Anyone inside MAP can see the results that MARC is producing (they're on the MAP wiki), and (when I've got a bit of spare time to write) can also see the minutes of the teleconferences that we have.

If anyone else wants to join, just let Ewan Birney know (birney at ebi.ac.uk), and he can add another email address to the mailing list.
Old 10-18-2015, 12:48 AM   #14
nucacidhunter
Jafar Jabbari
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,179

Thanks for providing more info on MARC.
Old 10-18-2015, 01:45 AM   #15
ymc
Senior Member
 
Location: Hong Kong

Join Date: Mar 2010
Posts: 497

"behind the queue of commercial customers" - What does this mean? Does it mean if you pay (how much?), then you can get a box really quick? Can you elaborate? Thanks
Old 10-18-2015, 01:55 AM   #16
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 823

When you join the program, you get discount vouchers for a set of flow cells and reagent kits, and these vouchers are also handed out for various other reasons (e.g. conference talks, publications, popular community wiki posts). The discount is frequently for the total cost of kits, but participants still need to pay for shipping.

Since they opened up the commercial space after the London Calling conference in May, ONT's stance has been that people who pay will get ordering/shipping priority over those who are getting shipping-cost-only kits and flow cells. I expect that this means there are a lot of small labs around the world getting frustrated waiting for their first kits to find out if the platform is useful at all, while big facilities with deep pockets get the lion's share of the sequencing capacity.
Old 01-16-2016, 01:03 PM   #17
sshen
Junior Member
 
Location: new york

Join Date: Dec 2010
Posts: 6

Quote:
Originally Posted by gringer View Post
Yes, and I've also been part of the MAP since the start, and have mentioned my involvement with MAP previously on SEQanswers. It's silly to repeat that every time I talk about the MinION, because everyone who has access to a MinION sequencer has received some amount of shipping-cost-only flow cells and reagents from Oxford Nanopore.

The only way you're going to find an interest-free analysis is if someone from outside MAP takes some of the publicly available data and does their own analysis on that. Based on how much feedback I've got on the mitochondrial data I released last year (i.e. none), don't get your hopes up on that.

It's also currently impossible to re-call event data without having access to Metrichor, so unless someone from outside MAP writes their own base caller everyone is stuck with what ONT throws at them.

Perhaps our MARC paper will change that, because it's a bit more public and has a lot more pre-analysed and mapped data for other people to look at.

Is Metrichor open source? We are actually interested in developing a base caller for MinION data. If someone with experience would like to work together, please let me know. Thanks!
Old 01-17-2016, 10:18 AM   #18
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 823

Quote:
If someone with experience would like to work together, please let me know. Thanks!
Yes, please. I have a lot of experience with the MinION, but unfortunately not much time.

There's quite a difference between open source and free (as in freedom). You can blame Microsoft for blurring the lines with that. A lot of the MinION stuff is "open source", but still closed due to a restrictive license.

However, while the MinKNOW code is almost all Python and available for viewing on the local machine, I haven't seen any Metrichor code. ONT haven't been particularly keen to release their base-caller code, but have given us their rough algorithm; it's enough that a programmer familiar with signal processing could make something that's pretty close to what ONT has done.
Old 01-17-2016, 07:36 PM   #19
sshen
Junior Member
 
Location: new york

Join Date: Dec 2010
Posts: 6

Quote:
Originally Posted by gringer View Post
Yes, please. I have a lot of experience with the MinION, but unfortunately not much time.

There's quite a difference between open source and free (as in freedom). You can blame Microsoft for blurring the lines with that. A lot of the MinION stuff is "open source", but still closed due to a restrictive license.

However, while the MinKNOW code is almost all Python and available for viewing on the local machine, I haven't seen any Metrichor code. ONT haven't been particularly keen to release their base-caller code, but have given us their rough algorithm; it's enough that a programmer familiar with signal processing could make something that's pretty close to what ONT has done.
Glad to know you are interested in such work. I have been reading some articles (http://www.cs.cmu.edu/~dgovinda/pdf/...and-cvpr97.pdf) and thought that the ONT algorithm could be improved. But without knowing a bit more detail about it, it's hard to get anything done.

I don't have much time to code either. But if others are also interested in creating a better base caller, we should form a group. I also have a small amount of funding for this work, which will be used to compensate people for their time.

Sorry if this isn't the right place for such a discussion.

Tags
basecaller, error, nanopore
