SEQanswers

Old 09-25-2014, 05:09 PM   #1
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358
Default Let's talk about ONT nanopore stuff!

Per the request here, it seems time to create this forum! I'm really excited to see where this data goes and when I can get my hands on a MinION!
Old 09-25-2014, 05:42 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by ECO View Post
Per the request here, it seems time to create this forum! I'm really excited to see where this data goes and when I can get my hands on a MinION!
More like "get my hand on a MinION". They're tiny!

That said, it's not entirely clear to me what users are allowed to discuss about results, though I will describe my methodology for evaluating it. I used both the 1D and 2D reads (converted to fastq), and mapped with this command line:

Code:
mapPacBio.sh -Xmx30g k=7 in=reads.fastq ref=reference.fa maxreadlen=1000 minlen=200 idtag ow int=f qin=33 mhist=mhist1.txt idhist=idhist1.txt ehist=ehist1.txt indelhist=indelhist1.txt lhist=lhist1.txt gchist=gchist1.txt qhist=qhist1.txt qahist=qahist1.txt bhist=bhist1.txt out=mapped1.sam minratio=0.15 ignorequality slow ordered maxindel1=40 maxindel2=400 nodisk bs=bs1.sh
Then I pasted the histograms into Excel and examined their scatterplots. This command breaks reads over 1kbp into 1kbp pieces and maps them independently; you can set this higher (up to 6kbp) but the mapping rate drops as the shred length increases. The output is in the same order as the input, so you can determine mapped read length by counting the number of consecutive sam lines with the same read name (the pieces get a name suffix of _1, _2, etc) that map to consecutive genomic positions.

If you run the resulting "bs1.sh" bash shell script and have samtools installed, it will turn the SAM output into a sorted, indexed BAM file ready for IGV.
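
As a rough illustration of the read-length reconstruction described above, here is a minimal Python sketch (it is not part of BBMap). It assumes the shredded pieces keep the parent read name plus a `_1`, `_2`, ... suffix and that the SAM output is in input order, as stated above. The function names, `piece_len`, and the `slack` tolerance are hypothetical choices made for illustration only.

```python
import re
from itertools import groupby

# Assumed naming: shredded pieces are called <read>_1, <read>_2, ...
# A read whose own name already ends in _<digits> would be mis-parsed.
PIECE_SUFFIX = re.compile(r"^(.*)_(\d+)$")


def parse_sam_line(line):
    """Return (parent_read, ref_name, pos, is_mapped) for one SAM record."""
    fields = line.rstrip("\n").split("\t")
    qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
    match = PIECE_SUFFIX.match(qname)
    parent = match.group(1) if match else qname
    is_mapped = not (flag & 0x4)  # SAM flag bit 0x4 means "segment unmapped"
    return parent, rname, pos, is_mapped


def longest_mapped_run(records, piece_len=1000, slack=400):
    """Longest run of consecutive pieces that map next to each other."""
    best = run = 0
    prev = None  # (ref_name, pos) of the previous mapped piece
    for _parent, rname, pos, is_mapped in records:
        if not is_mapped:
            run, prev = 0, None
            continue
        adjacent = (
            prev is not None
            and rname == prev[0]
            and abs(pos - prev[1]) <= piece_len + slack
        )
        run = run + 1 if adjacent else 1
        prev = (rname, pos)
        best = max(best, run)
    return best


def mapped_pieces_per_read(sam_lines):
    """Map each parent read name to its longest run of adjacent mapped pieces."""
    records = (
        parse_sam_line(line)
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    )
    # groupby only merges neighbouring records, which relies on ordered output.
    return {
        parent: longest_mapped_run(list(group))
        for parent, group in groupby(records, key=lambda rec: rec[0])
    }
```

Passing the lines of mapped1.sam to `mapped_pieces_per_read` and multiplying each count by the shred length gives an approximate mapped length per read. Using the absolute position difference is a simplification so that reverse-strand reads are also counted.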

Last edited by Brian Bushnell; 09-25-2014 at 05:44 PM.
Old 09-25-2014, 10:18 PM   #3
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

Quote:
Originally Posted by Brian Bushnell View Post
This command breaks reads over 1kbp into 1kbp pieces and maps them independently; you can set this higher (up to 6kbp) but the mapping rate drops as the shred length increases.
And can you say why it is dropping? Too many errors in the alignment? Breaking the reads into small fragments sounds like a step backwards to me.
Old 09-26-2014, 03:50 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975
Default

So is this a one-off observation?

Can people chime in (with non-specific comments, if they can't talk about specifics) on whether their experience is in line with or unlike the paper above?
Old 09-26-2014, 07:10 AM   #5
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482
Default

By the time people post results the data is already obsolete. The protocols and software change every week if not sooner.
Old 09-26-2014, 07:21 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975
Default

@NextGenSeq: Are you implying that data from one week can't be trusted the next?
Old 09-26-2014, 08:18 AM   #7
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482
Default

No, it is improving every week.

If I were PacBio I would be worried.
Old 09-26-2014, 08:57 AM   #8
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by WhatsOEver View Post
And can you say why it is dropping? Too many errors in the alignment? Breaking the reads into small fragments sounds like a step backwards to me.
The Nanopore reads I've seen have a sort of 'bistable' error model - lower for a while, then higher for a while, then lower for a while, etc. The higher-error mode is harder to map. Breaking the reads into pieces allows mapping the lower-error-mode pieces and discarding the higher-error-mode pieces; the shorter the piece, the more likely it will be entirely within a lower-error-mode region.
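
To make the "shorter pieces are more likely to be clean" argument concrete, here is a toy calculation. It assumes the two error modes behave like a two-state Markov chain with geometrically distributed run lengths. That model, the function name, and every number below are invented for illustration and are not measured from Nanopore data.

```python
def fraction_clean_pieces(piece_len, mean_low_run, mean_high_run):
    """Probability that a piece of `piece_len` bases lies entirely in the
    lower-error mode, under the assumed two-state Markov model."""
    # Long-run fraction of bases that sit in the lower-error mode.
    stationary_low = mean_low_run / (mean_low_run + mean_high_run)
    # Per-base probability of staying in the lower-error mode
    # (a geometric run with mean m leaves with probability 1/m per base).
    stay_low = 1.0 - 1.0 / mean_low_run
    # First base must be low-error, then the next piece_len - 1 bases must stay low.
    return stationary_low * stay_low ** (piece_len - 1)


if __name__ == "__main__":
    # Hypothetical run lengths: 3000 bases low-error, 1000 bases high-error.
    for piece_len in (200, 1000, 6000):
        print(piece_len, round(fraction_clean_pieces(piece_len, 3000, 1000), 3))
```

Under these assumptions the fraction of fully clean pieces falls off exponentially with the shred length, which is consistent with the drop in mapping rate at longer shred lengths, although the real error process may look quite different.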
Old 09-26-2014, 10:09 AM   #9
samanta
Senior Member
 
Location: Seattle

Join Date: Feb 2010
Posts: 109
Default

About a month ago, one of my collaborators asked me to look into Oxford Nanopore, because she was planning a large sequencing project with Illumina+PacBio and wanted to know whether waiting for ONT would save her money. She had heard good things from another colleague about the portability of MinIONs and was curious. I am not involved in the early access program, so I looked for any publicly available information. Based on what I found, I believe the company is advised by incompetent scientists, who are giving the company a bad reputation.


My personal background: I have been working on nanotechnology since 1993, wrote the first (and highly cited) paper on calculating electrical current through small organic molecules in 1995, and worked with the NASA Nanotech group for several years in the early 2000s before moving on to genomics. At NASA, one of my closest collaborators worked on nanopore sequencing and another worked on computational modeling of current flow through the pore. However, I was never directly involved in either of those projects, the main reason being signal quality from the pores. So, the first thing I wanted to find out about ONT is the error rate, because the electrical signal from molecules moving at room temperature tends to get noisy. This is basic quantum (and statistical) physics, which no amount of technology can overcome.


The error rate is very important in deciding about assembly projects. It is definitely possible to do assembly from long erroneous reads, but you will need more reads and that means your costs go up. At the end of the day, my collaborator is interested in comparative costs between various technologies.
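
A back-of-the-envelope sketch of why a higher error rate means more reads: assuming errors are independent between reads and the consensus base is a simple majority vote (both strong simplifications, and systematic errors would break the first one), the coverage needed for a given consensus accuracy can be estimated as below. The function names and the example rates are hypothetical.

```python
from math import comb


def consensus_error(per_read_error, coverage):
    """Probability that more than half of `coverage` independent reads
    are wrong at one position (binomial tail)."""
    threshold = coverage // 2 + 1
    return sum(
        comb(coverage, k)
        * per_read_error ** k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


def coverage_needed(per_read_error, target_error, max_coverage=501):
    """Smallest odd coverage whose majority-vote error is at or below the
    target, or None if max_coverage is not enough."""
    for coverage in range(1, max_coverage + 1, 2):  # odd values avoid ties
        if consensus_error(per_read_error, coverage) <= target_error:
            return coverage
    return None


if __name__ == "__main__":
    # Illustrative per-read error rates only, not measured values.
    for rate in (0.05, 0.15, 0.30):
        print(rate, coverage_needed(rate, 1e-3))
```

Since sequencing cost scales roughly with coverage, the ratio of the returned coverages gives a crude feel for the relative cost between a lower-error and a higher-error technology under these assumptions.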


I tried to find a straight answer for over a month and could not. For example, Michael Schatz, who is involved in the early access program, posted a figure showing 'assembly from nanopore' on Twitter. When I asked him about the error rate, he gave a philosophical answer: 'I do not care, because assembly is possible, as long as there is more signal than noise'. WTF? Based on his slides from a recent conference (see here), he had the numbers, but decided to stonewall. Then I learned that the assembly was done with nanopore+ILMN (hybrid), whereas PacBio assemblies are done with PacBio only. Nor did I get a straight answer about the error rate from Nick Loman, another scientist working closely with the ONT CEO to release data. Those frustrations led me to write this blog post about the company -

http://www.homolog.us/blogs/blog/201...ays-from-here/


The situation seems to have improved somewhat after the company allowed Nick Loman to release his data (check our blog for link), and Michael Schatz posted his slides with the kind of information one needs to make decisions -

http://www.homolog.us/blogs/blog/201...but-the-truth/


Hopefully, others will take a look at the data and come up with an objective answer regarding what is and is not possible. The technology has promise, but the error rate is a critical concern.
__________________
http://homolog.us
Old 09-26-2014, 10:14 AM   #10
samanta
Senior Member
 
Location: Seattle

Join Date: Feb 2010
Posts: 109
Default

Quote:
Originally Posted by Brian Bushnell View Post
The Nanopore reads I've seen have a sort of 'bistable' error model - lower for a while, then higher for a while, then lower for a while, etc. The higher-error mode is harder to map. Breaking the reads into pieces allows mapping the lower-error-mode pieces and discarding the higher-error-mode pieces; the shorter the piece, the more likely it will be entirely within a lower-error-mode region.
That is possibly due to the molecule moving through the pore at different speeds, with the HMM (Viterbi) base-calling calculation fixed at one mode and miscalling in the other.

This thing is definitely a physicist's paradise and would give rise to interesting physics papers, similar to what we used to do on current transport during the early 1990s.
__________________
http://homolog.us
Old 09-26-2014, 11:57 AM   #11
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482
Default

There's a paper in press claiming that using ONT data in combination with Illumina improves assembly quality tenfold.
Old 09-26-2014, 12:23 PM   #12
samanta
Senior Member
 
Location: Seattle

Join Date: Feb 2010
Posts: 109
Default

Tenfold compared to what?

Check page 13 of Michael Schatz's slides I posted here.

http://www.homolog.us/blogs/blog/201...but-the-truth/

Illumina alone - N50=59kbp

Illumina + Nanopore - N50=362kbp

Illumina + PacBio - N50=811kbp

So, my collaborator will lose by going from PacBio to Nanopore. Moreover, the promise of carrying a USB stick into the field does not hold if she also has to carry a 90 kg Illumina machine.
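
For reference, N50 is the contig length such that contigs of that length or longer contain at least half of the assembled bases. A small sketch of that calculation, plus the fold changes implied by the three N50 values above (the function name and the toy contig list are made up for illustration):

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one contig length")


# N50 values in kbp, taken from the slide quoted above.
reported_n50_kbp = {
    "Illumina alone": 59,
    "Illumina + Nanopore": 362,
    "Illumina + PacBio": 811,
}
baseline = reported_n50_kbp["Illumina alone"]
fold_change = {name: value / baseline for name, value in reported_n50_kbp.items()}

if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # toy contig lengths
    for name, fold in fold_change.items():
        print(f"{name}: {fold:.1f}x")
```

On these numbers the hybrid Nanopore assembly is roughly 6 times the Illumina-only N50 and the hybrid PacBio assembly roughly 14 times, so neither ratio is exactly tenfold.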
__________________
http://homolog.us
Old 09-26-2014, 03:29 PM   #13
seqqeq
Junior Member
 
Location: usa

Join Date: Nov 2009
Posts: 3
Default

Quote:
Originally Posted by samanta View Post
That is possibly due to the molecule moving through the pore at different speeds, with the HMM (Viterbi) base-calling calculation fixed at one mode and miscalling in the other.

This thing is definitely a physicist's paradise and would give rise to interesting physics papers, similar to what we used to do on current transport during the early 1990s.
The binary error mode is something I would expect from HMM basecalling. Not all levels are equally well differentiated. In difficult regions, once you get a base wrong, all the following bases have to be consistent with it, so they are wrong as well. You get a string of very wrong calls, and only later recover to consistently correct calls.

Systematic error is likely to be troubling.
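
For readers unfamiliar with the Viterbi decoding mentioned above, here is a minimal, generic sketch for a discrete HMM. It is not ONT's basecaller: the two-state "A"/"C" alphabet, the discretized current levels, and all probabilities are invented purely to show the mechanics.

```python
import math


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete HMM, in log space."""
    scores = [
        {s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}
    ]
    back = [{}]  # back[t][s] = best predecessor of state s at step t
    for obs in observations[1:]:
        prev_row = scores[-1]
        row, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prev_row[p] + _log(trans_p[p][s]))
            row[s] = prev_row[best_prev] + _log(trans_p[best_prev][s]) + _log(emit_p[s][obs])
            ptr[s] = best_prev
        scores.append(row)
        back.append(ptr)
    # Trace back from the best final state to recover the whole path.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical toy model: two "bases" and three discretized current levels.
STATES = ("A", "C")
START = {"A": 0.6, "C": 0.4}
TRANS = {"A": {"A": 0.7, "C": 0.3}, "C": {"A": 0.4, "C": 0.6}}
EMIT = {
    "A": {"low": 0.5, "mid": 0.4, "high": 0.1},
    "C": {"low": 0.1, "mid": 0.3, "high": 0.6},
}

if __name__ == "__main__":
    print(viterbi(["low", "mid", "high"], STATES, START, TRANS, EMIT))
```

Because each call is chosen to be consistent with the best path through its neighbours, an ambiguous level can pull a whole stretch of states along with it. With k-mer states, where most transitions have probability zero, that coupling is much stronger than in this two-state toy.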
Old 09-26-2014, 04:52 PM   #14
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482
Default

Quote:
Originally Posted by samanta View Post
Tenfold compared to what?

Check page 13 of Michael Schatz's slides I posted here.

http://www.homolog.us/blogs/blog/201...but-the-truth/

Illumina alone - N50=59kbp

Illumina + Nanopore - N50=362kbp

Illumina + PacBio - N50=811kbp

So, my collaborator will lose by going from PacBio to Nanopore. Moreover, the promise of carrying a USB stick into the field does not hold if she also has to carry a 90 kg Illumina machine.
Versus a 2 ton PacBio instrument?

Anyway, read the paper when it comes out. I can't post further info about it.

There is data, not yet publicly available, showing over 99% accuracy for ONT data aligned to reference genomes.
Old 09-26-2014, 05:50 PM   #15
robp
Member
 
Location: Stony Brook, NY

Join Date: Aug 2013
Posts: 13
Default

Quote:
Originally Posted by samanta View Post
That is possibly due to the molecule moving through the pore at different speeds, with the HMM (Viterbi) base-calling calculation fixed at one mode and miscalling in the other.

This thing is definitely a physicist's paradise and would give rise to interesting physics papers, similar to what we used to do on current transport during the early 1990s.
I also think an algorithm for dealing with this would give rise to a very interesting CS paper. I'd be willing to bet that changes in molecular speed affect the resulting signal in detectable ways, and that modifying the underlying HMM to account for this is possible. ONT base-calling definitely seems like an interesting computational problem.
Old 09-26-2014, 08:53 PM   #16
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by robp View Post
I also think an algorithm for dealing with this would give rise to a very interesting CS paper. I'd be willing to bet that changes in molecular speed affect the resulting signal in detectable ways, and that modifying the underlying HMM to account for this is possible. ONT base-calling definitely seems like an interesting computational problem.
I agree - it seems plausible to address some of the purported deficiencies in the current Nanopore system through primarily computational means.

As an unrelated side-note, Illumina's NextSeq systems, in my testing, give vastly inferior output compared to HiSeq or MiSeq (and the data was certified by Illumina as being in-spec). I believe this may largely be due to the software; improved base-calling software may be able to substantially improve the output of NextSeq, or of other new platforms. That said, a market-dominant company releasing a new product that is undeniably inferior to prior products indicates to me that sequencing companies have good reason to support alternatives, if they desire better data.
Old 09-27-2014, 08:58 AM   #17
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975
Default

Quote:
Originally Posted by Brian Bushnell View Post
As an unrelated side-note, Illumina's NextSeq systems, in my testing, give vastly inferior output compared to HiSeq or MiSeq (and the data was certified by Illumina as being in-spec). I believe this may largely be due to the software; improved base-calling software may be able to substantially improve the output of NextSeq, or of other new platforms. That said, a market-dominant company releasing a new product that is undeniably inferior to prior products indicates to me that sequencing companies have good reason to support alternatives, if they desire better data.
That is an interesting observation.

Once sequencing becomes a commodity, the finer points of how one got the sequence become moot. I find it striking how parallel everything seems to be between microarrays in the early 2000s and HTS now.

An attractive price point (nicely slotted between the MiSeq and HiSeq) and a strong sales push help seal the deal on NextSeq in most places.
Old 09-27-2014, 10:58 AM   #18
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356
Default

Quote:
Originally Posted by samanta View Post
The situation seems to have improved somewhat after the company allowed Nick Loman to release his data (check our blog for link), and Michael Schatz posted his slides with the kind of information one needs to make decisions -

http://www.homolog.us/blogs/blog/201...but-the-truth/
Just for the record, you do not need permission from Oxford Nanopore to release data after you self-certify the "burn-in", and I did not seek it. One of the reasons I took until September to release anything is that there were teething troubles on the laboratory side in getting sufficient yields of full 2D reads. Now that the protocol is sorted out, these full 2D yields are much better with the R7.3 chemistry, and this will become the standard for Illumina-style 'passing filter (PF)' reads for nanopore. Full 2D reads mean that the fragment chemistry is working, with a hairpin and hairpin motor successfully ligated. This controls the speed of the complement strand, which has a huge effect on accuracy.

Some indications of full 2D performance can be seen in Figure 2 at:

http://biorxiv.org/content/early/2014/09/26/009613.2

And of course the data is fully available (including the underlying signal measurements).
Old 09-27-2014, 11:15 AM   #19
robp
Member
 
Location: Stony Brook, NY

Join Date: Aug 2013
Posts: 13
Default

Quote:
Originally Posted by nickloman View Post
Just for the record, you do not need permission from Oxford Nanopore to release data after you self-certify the "burn-in", and I did not seek it. One of the reasons I took until September to release anything is that there were teething troubles on the laboratory side in getting sufficient yields of full 2D reads. Now that the protocol is sorted out, these full 2D yields are much better with the R7.3 chemistry, and this will become the standard for Illumina-style 'passing filter (PF)' reads for nanopore. Full 2D reads mean that the fragment chemistry is working, with a hairpin and hairpin motor successfully ligated. This controls the speed of the complement strand, which has a huge effect on accuracy.

Some indications of full 2D performance can be seen in Figure 2 at:

http://biorxiv.org/content/early/2014/09/26/009613.2

And of course the data is fully available (including the underlying signal measurements).
Hi Nick! First, thanks for getting the manuscript out there as a pre-print. Also, thanks for making the data available in all its glory. Do you know whether ONT's basecaller is publicly available, or is it currently proprietary software? I'm interested in learning more about how it works, but apart from the fact that it "uses an HMM" and the reference to the Timp paper from a couple of years ago, there doesn't seem to be much in the way of details.
Old 09-28-2014, 05:13 AM   #20
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356
Default

Quote:
Originally Posted by robp View Post
Hi Nick! First, thanks for getting the manuscript out there as a pre-print. Also, thanks for making the data available in all its glory. Do you know whether ONT's basecaller is publicly available, or is it currently proprietary software? I'm interested in learning more about how it works, but apart from the fact that it "uses an HMM" and the reference to the Timp paper from a couple of years ago, there doesn't seem to be much in the way of details.
Hi robp-- Sadly the base caller is proprietary software and I am not aware of any documentation about how it works. It would be great if someone hot on HMMs and the Viterbi algorithm could try and implement a reference open-source base caller to serve as a foundation for improvements. Some more details about how the nanopore base caller works might be gleaned from the FAST5 files.