Hi there,
I know this question comes up every now and then and is probably hard to answer, but we have no sysadmin at hand with enough NGS experience, and we need to spend some money.
We need to / would like to increase our NGS throughput quite significantly. Currently, the only suitable sequencer for us seems to be the new NovaSeq, because: HiSeqs will no longer be shipped from the middle of the year, so support for their chemistries will probably also end sooner rather than later. PacBio seems a little risky since Roche withdrew its support. Genia/Oxford are not real options, as they are still in some kind of alpha/beta stage.
We want to sequence something in the range of 40 human genomes per month at 30x, so we will have something like 20 TB of data to process per month. Because variant calling is probably the most computationally expensive part, there is no real need to consider anything else here (transcriptome, methylome, etc.), is there? So the main question is: what kind of infrastructure do we need for this? Is a cluster really required, or would something with lower maintenance demands also suffice? We can also use the HPC at the local university occasionally, so we could run the computationally heaviest tasks there and do the rest on our local "whatever". Is this realistic, or are we going to spend more time sending data around than analyzing it?
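To put rough numbers on the "sending data around" question, here is a back-of-envelope sketch. The 20 TB and 40 genomes per month are the figures from above; the link speeds (1 and 10 Gbit/s) and the assumption of a fully saturated link are purely illustrative, not measured values for our site or the university HPC.

```python
def transfer_hours(data_tb: float, link_gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `data_tb` terabytes (decimal, 1 TB = 1e12 bytes)
    over a link of `link_gbit_per_s` Gbit/s.

    `efficiency` is a hypothetical fudge factor (0..1] for protocol overhead
    and shared bandwidth; 1.0 means the link is fully saturated.
    """
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 3600


MONTHLY_TB = 20          # from the estimate above
GENOMES_PER_MONTH = 40   # from the estimate above

# Average data volume per genome implied by those two numbers.
per_genome_gb = MONTHLY_TB * 1000 / GENOMES_PER_MONTH

print(f"Data per genome: {per_genome_gb:.0f} GB")
for gbit in (1, 10):  # assumed link speeds, for illustration only
    hours = transfer_hours(MONTHLY_TB, gbit)
    print(f"{gbit:>2} Gbit/s: {hours:.1f} h to move one month of data")
```

Under these assumptions, a month of data is on the order of two days of continuous transfer at 1 Gbit/s and a few hours at 10 Gbit/s; real throughput will be lower, so the `efficiency` parameter can be reduced to model that.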
Any ideas are highly appreciated!
Btw: We are in Germany and work with tumor samples from human patients. Hence, data protection is something we need to consider carefully at every step. Cloud computing is therefore probably not an option, not even a private cloud (maybe if the data is guaranteed to stay in Germany, but I'm not aware of a company that can make such a guarantee).
Thanks for reading and for any comments.