SEQanswers

Old 05-04-2017, 03:00 AM   #1
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
ONT PromethION for single cell human WGS?

Hi @all,
we are currently struggling with the pros and cons of buying a PacBio Sequel vs an ONT PromethION for our analyses (human de novo single cell WGS). Ideally we are looking for something with which we can analyse structural variants and SNPs (so something with ~100 Gb+ (>30x) output).
Unfortunately, we're getting extremely diverse information about the specifications of the two systems. Hence, I'd like to ask users of each system about their experience (I have therefore posted a similar question in the PacBio forum).

My information on the promethion so far:
1) Accuracy of >95%
Is this a consensus accuracy from assembly/mapping? As that would be coverage dependent, it is meaningless for me... What about individual read (1D or 2D) accuracy? I have looked briefly into the available data on GitHub, which looks more like 50-70% accuracy for single reads (but I might just have been unlucky with my selected genomic regions...)

2) I got information of 3-11 Gb per flow cell at ~$900 per flow cell
So I would need 9-33 flow cells per genome => ~$8k-30k per genome?

3) Price ~150k
Is installation, training, etc. included?

4) New developments
I'm a little concerned by the rapid release of new protocols and chemistries. I understand that this is a system under development, but how big are the differences between releases? It seems to me like every tiny improvement triggers a new release.
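The back-of-envelope arithmetic in point 2 can be sketched directly. This is a rough check only: the 3-11 Gb yield range is the one quoted above, the ~900 flow cell price is assumed to be USD, and failed runs are not budgeted for.

```python
def flow_cells_needed(target_gb, yield_gb_per_cell):
    """Flow cells for a target yield, rounded up (ceiling division)."""
    return -(-target_gb // yield_gb_per_cell)

TARGET_GB = 100        # ~30x of a human genome
COST_PER_CELL = 900    # USD per flow cell (assumed, as quoted above)

best = flow_cells_needed(TARGET_GB, 11)   # optimistic 11 Gb/flow cell
worst = flow_cells_needed(TARGET_GB, 3)   # pessimistic 3 Gb/flow cell

print(best, worst)                                  # 10 34
print(best * COST_PER_CELL, worst * COST_PER_CELL)  # 9000 30600
```

Rounding up rather than dividing exactly gives 10-34 flow cells instead of 9-33, since a partial flow cell still costs full price.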

I would be grateful for any information on any of my four points.
As the MinION and PromethION use the same pore and the same chemistry, information on either would be appreciated.

Thanks!
Old 05-04-2017, 03:48 AM   #2
nucacidhunter
Senior Member
 
Location: Iran

Join Date: Jan 2013
Posts: 1,080

The first thing to consider is the need to amplify the single-cell genome: currently, even the best methods yield amplified DNA of only around 10 kb, which limits the benefit of obtaining long reads on any platform. Unfortunately, the methods that output the longest DNA are prone to chimera formation, which will affect detection of structural variants.

1- PromethION is in early-access (α) release with limited production capacity (it cannot run 48 flow cells yet), and they only shipped flow cells a couple of weeks ago. I have not seen any non-ONT released data. The accuracy mentioned is for the best runs, not the average run, and they are abandoning 2D reads due to litigation.

2- There are different plans for consumables, and the more one buys the cheaper it gets, but the lowest price point requires a purchase of over a million dollars. The length limitation of the input DNA also means lower sequencing output.

3- To my knowledge, installation cost is additional and can be significant depending on location. The system also requires purchasing, or having access to, substantial high-speed computing hardware.

4- This is a disadvantage: just as one gets the hang of working with one chemistry, another one comes out, resulting in inconsistent and non-comparable data.
Old 05-04-2017, 06:30 AM   #3
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215

Thanks @nucacidhunter for your input.

I also thought that the length of our DNA fragments would be a major limitation, but I have actually heard that it probably isn't. This is because it makes no difference to the lifetime of the pore whether you sequence 1x 10 kb or 10x 1 kb fragments. The "only" thing that requires adjustment is the amount of added adapter (containing the ssDNA leader sequence). Do you have other information?
Old 05-06-2017, 06:34 PM   #4
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 799

You're probably better off getting a GridION X5 if you only want to do a few human genomes. It'll be available in a couple of weeks (but will probably have a substantial shipping delay due to demand exceeding supply), and uses identical flow cells to the MinION (which are fairly reliable and well-tested now).

Output for a properly-prepared sample is currently 5-15Gb, so you'll need about 10 flow cells for a 10X coverage project (consumable cost $300/$500 USD per flow cell depending on capital expenditure).

However, you should have another think about what you want to do, and whether the coverage is overkill. A full-genome structural analysis (looking for chromosome-scale modifications) can be done at 1-3X coverage on a single flow cell. An exome sequencing experiment looking for single-nucleotide variants can also be done on a single flow cell at 10X~30X coverage depending on how an "exome" is defined.
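The coverage figures above are easy to sanity-check by dividing run yield by genome size (the 3.1 Gb human genome size is my assumption here):

```python
def depth(yield_gb, genome_size_gb=3.1):
    """Approximate sequencing depth from run yield; 3.1 Gb human genome assumed."""
    return yield_gb / genome_size_gb

# One flow cell at the 5-15 Gb quoted above:
print(round(depth(5), 1), round(depth(15), 1))   # 1.6 4.8
```

So a single flow cell at current yields does land in the low-single-digit coverage range quoted for chromosome-scale structural analysis.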

I'd recommend doing a pilot run on a MinION first ($1000 per run, no substantial delay in ordering MinIONs or flow cells / reagents), because a single MinION run might be sufficient for your needs.

Old 05-06-2017, 06:56 PM   #5
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 799

Quote:
Originally Posted by WhatsOEver View Post
1) Accuracy of >95%
Is this a consensus acc from assembly/mapping? As this would be coverage dependent, it is meaningless for me... What about individual reads (1D or 2D) accuracy? I have looked briefly into the available data on github which looks more like 50-70% acc for single reads (but I might just have been unlucky with my selected genomic regions...)
Single read modal accuracy for nanopore reads is about 85-90% at the moment; this will increase in the future through software improvements, and reads can be re-called at a later date to incorporate new accuracy improvements. As far as I'm aware, the electrical sensor on the MinION flow cells is the same as the one that was used when the MinION was first released (and giving accuracies of 70%) -- all accuracy improvements have been software and chemistry changes (mostly software).

But if you're doing 10X coverage on known variants, the single read accuracy isn't all that important. There is a bit of systematic bias in the calling (particularly around long homopolymers), which means perfect single-base calling is not possible even at infinite coverage. From an unmethylated whole-genome amplification, consensus calling accuracy for known variants at single nucleotides should be at least q30, and essentially perfect for structural variants (assuming they are covered at all).
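A toy binomial model illustrates why per-read accuracy matters less at depth. It assumes independent errors, so it deliberately ignores the systematic bias described above; it also counts any site where most reads are wrong as a consensus error, which overestimates the error, since wrong reads rarely agree on the same base:

```python
from math import comb

def consensus_error(per_read_error, depth):
    """P(a simple majority vote is wrong at one site), assuming independent
    errors; a simplified toy model, not a real variant-calling error rate."""
    k_min = depth // 2 + 1
    return sum(comb(depth, k)
               * per_read_error**k
               * (1 - per_read_error)**(depth - k)
               for k in range(k_min, depth + 1))

# ~88% single-read accuracy assumed (mid-range of current figures)
for d in (1, 5, 10, 20):
    print(d, f"{consensus_error(0.12, d):.2e}")
```

Even under these crude assumptions, the per-site error at 10x drops to roughly the q30 ballpark mentioned above.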

Old 05-08-2017, 03:25 AM   #6
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215

Thanks for the info @gringer esp on pricing and output.
I would be grateful if you could further comment on the following:

Quote:
Originally Posted by gringer
Single read modal accuracy for nanopore reads is about 85-90% at the moment;
[...]
A full-genome structural analysis (looking for chromosome-scale modifications) can be done at 1-3X coverage on a single flow cell.
Sorry, but based on the data I have seen (which is the NA12878 Human Reference on github produced with the latest(?) R9.4 chemistry), I cannot believe these statements.
For example, let's look at the following reference assemblies from ONT and Pacbio:

ONT mapping: [screenshot not preserved]

Pacbio mapping: [screenshot not preserved]

If you exclude indels from the accuracy calculation, you might be correct with 85%+...
Also, if you look into the literature (https://www.nature.com/articles/ncomms11307 <- yes, it is quite old, but the best I could find...) you get lower values. Maybe it is simply an issue with using bwa as the aligner, but if it isn't the best-performing one, why is the reference consortium using it?!
Concerning chromosomal rearrangements: support from 1-3 reads cannot, imo, truly be considered sufficient support for anything. With both methods you will get artefacts (see pictures) with largely clipped reads that couldn't be mapped elsewhere. On top of the high error rate, you will get numerous false positives. Because I'm working with amplified data, which is of course far from evenly amplified across the whole genome, I would also get numerous false negatives due to insufficient coverage at the calculated 1-3X.
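Whether indels are counted is exactly what moves these percentages around. A small sketch (hypothetical numbers, standard SAM conventions) derives both identity figures from a read's CIGAR string and NM tag:

```python
import re

def identities(cigar, nm):
    """Return (identity incl. indels, identity excl. indels) for one read.
    Assumes NM = mismatches + inserted + deleted bases (SAM convention);
    clipped bases are excluded, as they do not count towards NM."""
    ops = {op: 0 for op in "MIDNSHP=X"}
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        ops[op] += int(length)
    aligned = ops["M"] + ops["="] + ops["X"]   # aligned (match/mismatch) bases
    indels = ops["I"] + ops["D"]
    mismatches = nm - indels
    matches = aligned - mismatches
    return matches / (aligned + indels), matches / aligned

# Hypothetical read: 980 aligned bases, 60 mismatches, 20 inserted bases
incl, excl = identities("500M20I480M", nm=80)
print(f"{incl:.1%} including indels, {excl:.1%} excluding indels")
# 92.0% including indels, 93.9% excluding indels
```

The gap between the two numbers grows quickly as indels dominate the error profile, which is why quoted nanopore accuracies vary so much between sources.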


Quote:
Originally Posted by gringer
An exome sequencing experiment looking for single-nucleotide variants can also be done on a single flow cell at 10X~30X coverage depending on how an "exome" is defined.
True, but exome sequencing works very well with our Agilent/Illumina protocol, and I don't see a real advantage of long reads for WES.


Quote:
Originally Posted by gringer
But if you're doing 10X coverage on known variants, the single read accuracy isn't all that important.
For homozygous variants maybe, for heterozygous probably not: I have two alleles, and amplification probably does not preserve the 50:50 allele ratio. This means I may end up with only 2-4 reads for one allele and 6-8 for the other. With the high error rate, and the variant sitting on the less-amplified allele, I would rather tag such a position as uncertain/low support.
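That worry can be quantified with a binomial sketch. This assumes unbiased 50:50 allele sampling, so real amplification bias can only make the numbers worse:

```python
from math import comb

def p_allele_underrepresented(max_reads, depth, allele_frac=0.5):
    """P(one allele gets at most `max_reads` of `depth` reads),
    assuming unbiased binomial sampling (no amplification bias)."""
    return sum(comb(depth, k)
               * allele_frac**k
               * (1 - allele_frac)**(depth - k)
               for k in range(max_reads + 1))

# At 10x, one allele ends up with <= 3 reads about 17% of the time,
# even before any amplification bias:
print(round(p_allele_underrepresented(3, 10), 3))   # 0.172
```

So even the ideal case leaves a substantial fraction of heterozygous sites thinly covered on one allele at 10x, supporting the point above.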
Old 05-09-2017, 03:02 AM   #7
Ola
Member
 
Location: Sweden

Join Date: Aug 2011
Posts: 30

WhatsOEver, you are correct that PacBio at this point gives a much cleaner alignment, but of course with amplified DNA you will get lower throughput given the limited read numbers. For ONT, throughput should be more or less the same whether you have 5 kb or 25 kb fragments, but you would need to quality-filter the reads and use additional programs such as nanopolish to get good SNP calls. The new 1D^2 kit will improve single-read accuracy significantly, and the latest basecalling gives slightly lower error rates even on older datasets like the NA12878 data in your comparison.

PromethION is expected to give ~50 Gb per flow cell at launch, as each flow cell has more pores and possibly a longer run time compared to the MinION. Price depends on volume, starting at ~$650/flow cell for very large orders, which gives a significantly lower cost/Gb than current Sequel specs. The $135k for the PEAP includes reagents (60 flow cells and kits to run them).
Old 05-09-2017, 04:14 AM   #8
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 799

... I'm not quite sure what happened with my previous response about this....

Here's a plot of sequencing data from our recently published chimeric read paper, taking only the reads that mapped (with LAST) to Ifnb (removing complications about gene isoforms and exon/intron confusion):

[plot not preserved]

In general, the accuracy for nanopore will always be better than what can be found in any published papers. This is certainly the case for any sequencing done prior to May last year on R7.3 (and earlier) flow cells (2D consensus accuracy on R7.3 is similar to 1D single-read accuracy on R9.x), but it is also the case even for runs done this year. For example, homopolymer resolution was not implemented in standard basecallers until about a month ago, so any basecalled data prior to that (which includes my chimeric read paper, and the NA12878 data) is going to have homopolymer issues that are not present in current base-called data. However, the data can be re-called; I notice that the NA12878 github page has a "scrappie" recalling for chromosome 20, which will be closer to the current accuracy. I'll have a go at re-calling the Ifnb data that I used for the above graph and see how much of a difference it makes.

However, it's worth also considering that the nanopore NA12878 data was probably done on unamplified DNA (certainly the more recent runs were, as PCR doesn't work so well over 100kb). For this reason there will be errors in the base-called sequence that are due to unmodelled contexts in the DNA. The PacBio system can only sequence what is modelled (more strictly, it produces a sequence that indicates what DNA bases the true sequence will hybridise to), so almost all of the modifications would be lost. Accuracy is currently higher when using amplified DNA on a nanopore, but this removes the possibility of re-calling in the future when different calling models have been developed to incorporate base modifications.

In any case, it's almost impossible to get anyone to change their view about the utility of the MinION because it's a highly-disruptive technology -- it changes almost everything about how sequencing is done. People will cling to whatever shreds of dissent they can find about nanopore sequencing, and fluidly shift onto anything else that can be found once the issues start disappearing (without a mention of the progress). The homopolymer straws are almost all gone, and the accuracy straws are looking pretty thin. What remains are mostly prejudices against change.

I'm very definitely an Oxford Nanopore fanboy, and recommend spending a little change on a big change. It's not much of a pocket-burn to spend $2000 USD to purchase the MinION and run a few flow cells as a pilot study to work out if it will be suitable for a given purpose. The thing is basically capital-free, and there's no obligation to continue with using it if it turns out to be useless.
Old 05-09-2017, 08:44 AM   #9
seq_bio
Junior Member
 
Location: UK

Join Date: May 2017
Posts: 4

Thanks. I think your point about it being impossible to change people's minds because the tech is disruptive is a tad pessimistic. I think it's natural (even if not always right) to be a bit skeptical about new tech, considering that over-promising and under-delivering is common among NGS companies. The new tech will be embraced once people realize what it does for them.

I think you are suggesting that we use it for projects where you say it works: low-coverage structural variants, methylation, etc. And the low cost appears to be a key part of your argument.
But isn't the right approach to just put out there a convincing enough study? Say, an assembly from purely nanopore reads with high consensus accuracy and long enough reads; then people will be more than keen to move over, as it's cheap, has high throughput, and is accurate. I don't understand your point about accuracy being a thin straw.

https://genomeinformatics.github.io/cliveome/
96.5% consensus accuracy is not really there yet, right? Or 99% after corrections etc.? What am I missing, apart from being one of those impossible to convince?

I also have another question which you might perhaps be able to answer: throughput on the MinION has been improved by moving the DNA faster through the pore, correct? (One would assume this, if anything, makes accuracy worse.) And basecalling improvements come primarily from software: neural networks, machine learning, etc.

So in a way it's like software-defined sequencing, as the hardware is really cheap. Then one would imagine that this algorithm, trained on the platinum reference genome(s), would tell us about new regions.

Is it then likely that these will be different from, say, PacBio's, even though both have high consensus accuracy? Is that a possible outcome? I guess we don't have a study anywhere with a like-for-like comparison where both have 99.99% consensus accuracy?
Old 05-09-2017, 12:15 PM   #10
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 799

Quote:
But isn't the right approach to just put out there a convincing enough study?
Studies exist. Whether or not they're "convincing enough" is entirely up to the readers.

Quote:
I don't understand your point about accuracy being a thin straw.
Accuracy is a thin straw because it is almost entirely a software problem. It's being worked on, and any software improvements can largely be fed back to any R9 run (i.e. since about July last year) to substantially improve accuracy. The biggest issue is that it's really only ONT who is working on base-calling algorithms. I expect a few people who have been studying neural networks and wavelets for the better part of their lives would make light work of the base calling accuracy problem.

Quote:
96.5% consensus accuracy is not really there right or 99% after corrections etc?
Yes, that's correct. The primary goal of the human assemblies is to generate assemblies that are as accurately contiguous as possible, rather than getting high accuracy at the single-base level. Long reads are a huge advantage there, particularly in the hundreds-of-kilobases range.

The current ONT basecalling is trained mostly on bacteriophage lambda and E. coli, which have much simpler unmodelled DNA context. For ONT to be able to correctly call human genomic sequence, they need to add all the possible DNA base modifications into their calling model, and that's going to take quite a long time. Until then, single-base consensus accuracy will be lower than expected even at infinite coverage. It may be that the majority of the systematic [non-homopolymer] base-calling error is associated with modified bases, but we're not going to know that until a sufficiently complete model exists.

Quote:
throughput on the minion has been improved by moving the dna faster through the pore correct - (one would assume this if anything makes accuracy worse
There have only been a couple of shifts in base calling speed, but at each step the accuracy was at least as good as it was for slower speeds. ONT made sure of this in their internal tests of base calling, and they have given me a plausible explanation for why accuracy might improve with sequencing speed. The explanation is that the DNA wants to move a lot faster, so a lot of effort is put in on the chemistry side of things to slow everything down, and there's much more chance for DNA to wiggle around while it's being ratcheted through at slower speeds. Move the DNA faster and the transitions between bases become more obvious because there's less movement/noise at each step.

The base transition speed has remained at 450 bases per second for at least the last 6 months, but flow cell yield has increased about 5 times since then. The majority of those yield fixes have been in software, and primarily around recognising when a pore is blocked and making sure the current at that specific pore is reversed quickly to unblock before the blockage becomes permanent. There have been some issues with sequencing kits as well due to reagents drying out in the freezer. It seems strange to think that yield improvements have been realised mostly by software fixes, but that is actually the case.
Old 05-09-2017, 01:00 PM   #11
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 799

Quote:
Originally Posted by seq_bio View Post
Is it then likely that they will be different to those from say Pacbio ? even though both have a high consensus accuracy ? Is that a possible outcome ? I guess we don't have a study anywhere with a like for like comparison where both have 99.99% consensus accuracy ?
I could imagine that there will be some point in the future where both PacBio and Nanopore have perfect consensus accuracy (at, say, 20X coverage). In this case, the distinction will be in the things unrelated to accuracy, which is where (from my perspective) Nanopore wins on all counts.

Because PacBio is inherently a model-based approach to sequencing (i.e. by hybridisation), it's impossible to use it to detect things that we don't know about yet. Was that delay in hybridisation time due to a sugar, a deamination, or a methylation? How would PacBio detect abasic sites in a DNA sequence? What about pyrimidine dimers? I can imagine a situation where PacBio might introduce different chemistries to detect these different situations, but those chemistries and models need to be there before the observation can be made. This is much less of an issue with Nanopore, because the electrical signal has a lot more information in it. As an easy example, abasic sites produce a very characteristic increase in current, which is used by the basecallers to detect the presence of a hairpin (which has abasic sites in its sequence).

Read length is another point where Nanopore is starting to push ahead. Even if PacBio had perfect accuracy, much longer reads will be needed to fully resolve the human genome. End-to-end mapped single-strand nanopore reads of over 800kb have been sequenced (and base-called), and double-strand reads of over a megabase of combined sequence have also been seen. Clive Brown has suggested that photobleaching might be an issue for really long PacBio reads. I don't know if that's true for PacBio, but I do know that it is a common issue for Illumina sequencing, requiring an intensity ramp over the course of a sequencing run. PacBio could probably work around that issue by continually replenishing fluorophores, but at a substantially increased expense.

The other advantage for Nanopore is speed. Just considering read transition time, a 450kb nanopore read with an average base transition time of 450 bases per second would take 1000 seconds (about 17 minutes) to go through the pore. After the motor protein / polymerase is ejected (which can take less than 1/10 of a second), the pore is ready for the next sequence. If all someone were looking for was a read over 100kb, they could run the MinION for less than 10 minutes and have a good chance of finding one (assuming the sample prep was up to the job). Whole contig bacterial assemblies from simple mixes can be sequenced and assembled in about an hour. There are people who have done simulations of diagnostic situations (e.g. Tb detection, antibiotic resistance) with "presence" results produced in less than half an hour, and "absence" results (alongside low-abundance positive controls) established with high likelihood in a few hours. A whole bunch of other things change from impossible to possible when the sequencing speed is considered (for individual reads, for analysis/turnaround time, and for sample preparation time).
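The turnaround arithmetic above reduces to a single division, using the quoted 450 bases-per-second transition speed:

```python
def pore_seconds(read_len_bases, bases_per_second=450):
    """Time one read occupies a pore at a fixed transition speed."""
    return read_len_bases / bases_per_second

print(pore_seconds(450_000))        # 1000.0 seconds, about 17 minutes
print(pore_seconds(100_000) / 60)   # a 100 kb read: under 4 minutes
```

This ignores pore idle time and library loading, so it is a lower bound on wall-clock time rather than an expected run time.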

.... There's always more, but I'll stop there, reiterating my previous point about the disruptive nature of this. Nanopore sequencing changes almost everything about how sequencing is done.
Old 05-09-2017, 02:50 PM   #12
seq_bio
Junior Member
 
Location: UK

Join Date: May 2017
Posts: 4

Thanks for that; it answered a few questions I had. Especially the bit about yield improvements coming from software alone was interesting, making me think of it in terms of software-defined sequencing, similar to software-defined networking (SDN). In terms of your comment on accuracy:

"The primary goal of the human assemblies is to generate assemblies that are as accurately contiguous as possible rather than getting high accuracy at the single base level. Long reads are a huge advantage there, particularly in the hundreds of kilobases range."

If we are looking to create de novo assemblies of high quality, you can't really do that with errors at the single-base level, right? One can certainly make the case that nanopore data can be used to improve existing platinum genomes while tolerating a higher error threshold, as hybrid strategies may be sufficient for most applications.

But as of now, as a standalone system, it's not really ready for human WGS, correct? I think that's what WhatsOEver was interested in, if I understood correctly.
Old 05-09-2017, 08:33 PM   #14
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695

For some reason a lot of posts in this thread are getting moderated... I'm not really sure why, but as a result, I'm not sure everyone is seeing everyone else's replies. I've unblocked the affected posts, so you may want to look back through the thread so you can be sure you are on the same page. I'll try to resolve this.

To add my two cents: personally, I like long Illumina reads because, as a developer, they are so much easier to work with. And they are so accurate! It's straightforward to make a good variant-caller for Illumina reads (well, let's restrict this to the HiSeq 2500 platform with 150bp reads, to be precise; some of their other platforms are much more problematic) in terms of SNPs, deletions, and short insertions.

However, if you want to call SVs or deal with repeats, the balance changes and you need long reads. We bought a PromethION, but I don't think it's installed yet. We also have a couple of Sequels, and recently transitioned production over to them because they now consistently meet or beat the RS II in quality and cost/yield metrics. But we mainly use PacBio for assembly, and for that purpose we aim for 100x coverage, which can achieve 99.99%+ accuracy for a haploid (nominally 99.999%, but I don't remember the exact numbers as measured). I'm not really sure what you would get at 30x for a repetitive diploid (though I should note that we also use low-depth PacBio for phasing variants called by Illumina on diploids).

High-error-rate long reads are great for variant phasing, particularly in conjunction with a second library of low-error-rate short reads. And they are great for SV/CNV calling, particularly since PacBio has a vastly lower coverage bias compared to Illumina (I'm unaware of the ONT bias rate, but I think it's similarly low). I'm less convinced about the utility of PacBio/ONT as a one-stop shop for all variant calls when cost (and thus depth) is a consideration. In particular, PacBio's error mode is not completely random, despite what is often stated, but it is random enough to make self-correction possible and useful for achieving high accuracy (again, given sufficient coverage). But for SNPs and short indels alone, without phasing, you can currently get better results for less money with an Illumina HiSeq 2500. For human sequencing (in which, unlike bacterial sequencing, the reagent costs outweigh the library-prep costs) it seems prudent to pursue a dual-library approach, with short and long reads on different platforms. In that case you don't need to pick a single platform that's optimal for everything.
Old 05-09-2017, 11:16 PM   #15
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 799

Quote:
Originally Posted by seq_bio View Post
If we are looking to create de novo assemblies of high quality, you can't really do that with errors at the single base level right ?
That's correct. There's [currently] systematic error in ONT reads which means resolution with perfect single-base accuracy is not possible. Canu and Nanopolish do really well at fixing consensus errors, but they can't fix them to 100% accuracy. I'm optimistic that this can be dealt with in software (allowing existing runs to be re-called), but it's not there yet.

For de-novo assembly, nanopore works really well when it is used for initial scaffolding, and base call errors are cleaned up by mapping Illumina reads to the assembly and correcting based on those alignments. I used that approach for correcting mitochondrial sequence from a low-yield nanopore run done in July last year.

But, there is also substantial systematic error in Illumina sequencing. Illumina cannot properly resolve tandem repeat regions (such as in centromeric regions) where the repeat unit length is greater than the read length. I've got an example of this in my London Calling poster, where a 20kb repeat region was collapsed to 2kb in the current Illumina-based reference assembly. Whether or not such errors are included in definitions of "high-quality assemblies" is up to the person making the definition.
Old 05-09-2017, 11:49 PM   #16
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 799

Quote:
Originally Posted by Brian Bushnell View Post
for SNPs and short indels alone, without phasing, you can currently get better results for less money with Illumina HiSeq 2500.
This depends a lot on how you define cost. Nanopore is currently getting about 5-10 gigabases per 2-day run for experienced labs with careful sample prep, with some groups getting 10-15 Gb. Internal testing at ONT has higher yield, but that is less useful for people who want to know what they can achieve themselves. Assuming 10 Gb per run, that's $100 per gigabase using the most expensive flow cell option ($900 USD + $100 reagents), which I think is in the realm of HiSeq / MiSeq. With the cheapest bulk flow cell option ($500 USD + $100 reagents), it's $60 per gigabase, which is nipping at the toes of HiSeq and NextSeq at currently-available flow cell costs and yields. That ignores the advantage conferred by long reads, which is substantial when considering things like isoform detection for cDNA-Seq.
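The per-gigabase figures above come straight from dividing consumables cost by yield (prices and the 10 Gb yield assumption as quoted):

```python
def usd_per_gb(flow_cell_usd, reagents_usd, yield_gb):
    """Consumables cost per gigabase for one run."""
    return (flow_cell_usd + reagents_usd) / yield_gb

print(usd_per_gb(900, 100, 10))   # 100.0 (most expensive flow cell option)
print(usd_per_gb(500, 100, 10))   # 60.0 (cheapest bulk option)
```

The same function makes it easy to see how sensitive the comparison is to yield: a 15 Gb run at bulk pricing drops the cost to $40/Gb.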

However, what I have found that most labs care about (certainly small labs) is the minimum cost of sequencing. A MinION purchase gives you a couple of flow cells to play around with, which means that the sequencing cost is effectively capital-free. Factoring in additional reagents and training time, an initial pilot study can be done with the MinION starting from nothing in a basic lab (with pipettes and centrifuges) for about $2000 USD, with delivery of MinION and flow cells happening within a couple of weeks. After that, it's no more than $1000 per run, with results that can be analysed within a few minutes of the run starting.
Old 05-10-2017, 12:21 AM   #17
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695

I won't contest any of that, since you're better-informed with Nanopore sequencing costs than I am (so far, I think we got them all for free). However, I would still prefer short Illumina reads for SNP, deletion, or short insertion calling. But the OP's question is which platform would be ideal for all variant calling for minimal cost. I can't answer that directly, because I don't think that it is currently possible to do accurate variant-calling covering SNPs, indels, CNVs, and SVs, using a single platform. But I would suggest that Illumina should be part of the equation, for now.

Last edited by Brian Bushnell; 05-10-2017 at 12:24 AM.
Brian Bushnell is offline   Reply With Quote
Old 05-10-2017, 12:37 AM   #18
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

Quote:
Originally Posted by gringer View Post
Studies exist. Whether or not they're "convincing enough" is entirely up to the readers.
Mhh, the discussion about ONT vs others seems to me a little like Apple vs Samsung: either you hate it or you love it.

Quote:
Originally Posted by gringer View Post
Accuracy is a thin straw because it is almost entirely a software problem.
Can't be, because this would imply that you already know that the signal differences between all bases with all modifications are large enough to distinguish them from the noise within the system, which, as you already mentioned, you don't. There might be a point where a new technical innovation is required to improve accuracy; a simple example would be the new pore ONT is now using. Besides that, I fully agree with your statements on software development and accuracy.

Quote:
Originally Posted by gringer View Post
The current ONT basecalling is trained mostly on bacteriophage lambda and E. coli, which have much simpler unmodelled DNA context. For ONT to be able to correctly call human genomic sequence, they need to add all the possible DNA base modifications into their calling model, and that's going to take quite a long time. Until then, single-base consensus accuracy will be lower than expected even at infinite coverage. It may be that the majority of the systematic [non-homopolymer] base-calling error is associated with modified bases, but we're not going to know that until a sufficiently complete model exists.
That is an extremely valuable piece of information for me - Thanks!


Quote:
Originally Posted by Brian Bushnell View Post
For human sequencing (in which, unlike bacterial sequencing, the reagent costs outweigh the library-prep costs) it seems like it might be prudent to pursue a dual-library approach, with short and long reads on different platforms. In that case you don't need to pick a single platform that's optimal for everything.
Why do you think so for human (or, more generally, diploid/polyploid) WGS? For human data, we are currently unable to do whole-genome reconstruction with short reads alone, even using the existing reference. If we create a scaffold of our genome of interest with long reads, we would still be unable to map the short reads accurately. As an example: how would a dual-platform approach help me resolve highly repetitive regions of the genome such as the MHC or the mucins?

Quote:
Originally Posted by seq_bio View Post
But as of now, as a standalone system, it's not really ready for human WGS, correct? I think that's what WhatsOEver was interested in, if I understood correctly.
True, and that is my conclusion from this discussion as well. We would probably be able to do CNV calling and RNA-Seq, but for identifying SNPs at the whole-genome level it is not ready yet. I think our next step must be a test run on the MinION, as suggested, to see how well our libraries are represented in the data.
WhatsOEver is offline   Reply With Quote
Old 05-10-2017, 12:44 AM   #19
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

Quote:
Originally Posted by Brian Bushnell View Post
I won't contest any of that, since you're better-informed with Nanopore sequencing costs than I am (so far, I think we got them all for free). However, I would still prefer short Illumina reads for SNP, deletion, or short insertion calling. But the OP's question is which platform would be ideal for all variant calling for minimal cost. I can't answer that directly, because I don't think that it is currently possible to do accurate variant-calling covering SNPs, indels, CNVs, and SVs, using a single platform. But I would suggest that Illumina should be part of the equation, for now.
It's actually variant calling at the whole-genome level for minimal cost that is the critical part for me. I'm totally fine with our existing Illumina-Agilent-WES variant calling pipeline.
WhatsOEver is offline   Reply With Quote
Old 05-10-2017, 12:53 AM   #20
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 799
Default

Quote:
Originally Posted by WhatsOEver View Post
Can't be, because this would imply that you already know that the signal differences between all bases with all modifications are large enough to distinguish them from the noise within the system, which you don't as you already mentioned.
Sensor noise is negligible in comparison to the shift from one base to another. All known base modifications produce a large current shift in the signal. Distinguishing between the two different pyrimidines (i.e. C/T) is probably one of the most difficult things at the moment, because their chemical structures are so similar.

But there's a whole lot of context that isn't included in the current models. The current basecallers typically only look at the absolute signal level, pay limited (if any) attention to the change in signal from the previous value(s), and don't account for base transition time (except for calling homopolymers). I had a look at event information a couple of years ago for a single read, and in spite of being overwhelmed by the amount of information there was, I found a lot of suggestions that base calling could be improved by looking beyond the single base that was sitting in the middle of the pore at the time the signal was read.
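As an illustration only (this is not how any ONT basecaller actually works), the kind of extra context described above — signal deltas and dwell times alongside absolute levels — can be sketched as a simple feature extraction over segmented events. The event values here are made up for the example:

```python
def event_features(events):
    """events: list of (mean_current_pA, dwell_time_s) tuples, one per
    segmented event. Returns per-event feature tuples of
    (absolute level, change from previous event, dwell time)."""
    feats = []
    prev_level = None
    for level, dwell in events:
        # Delta from the previous event captures the transition context
        # that a purely level-based model ignores.
        delta = 0.0 if prev_level is None else level - prev_level
        feats.append((level, delta, dwell))
        prev_level = level
    return feats

# Three hypothetical events (current in pA, dwell in seconds):
events = [(80.1, 0.004), (95.3, 0.002), (60.7, 0.009)]
for level, delta, dwell in event_features(events):
    print(level, round(delta, 1), dwell)
```

A real model would of course condition on a window of surrounding events (a k-mer context) rather than a single previous value, but the point stands: the raw signal carries more usable information than the absolute level alone.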
gringer is offline   Reply With Quote