I realize that hardware recommendations are a bit of an oldie but goodie; however, most of the threads I found are several years old and are likely less useful now that the technology has moved on.
Also, my question is more narrowly focused: I'm specifically interested in suggestions on I/O transfer speed, since it is generally acknowledged to be one of the major bottlenecks for NGS analysis.
To explain, my group is exploring the development of a centralized bioinformatics compute and storage cluster to support several labs doing NGS analysis at our university. At this point we have a decent idea of what we need on the compute side. However, storage is the more expensive of the two, so I'm seeking suggestions on I/O transfer speed, since it is one of the major cost factors for the drives we are considering.
Currently, we're exploring a 3-4 tiered approach where the tiers would be:
- (fastest) : local storage to the compute nodes that would be used solely for the temp files that are generated by GATK
- (faster) : NAS drive for recently sequenced samples (or those that are currently being (re-)analyzed)
- (slower) : Secondary NAS drive for legacy files/samples, e.g. FASTQs for samples that already have BAMs generated
- (slowest): TBD if we include this, but optical/tape/other for archival purposes
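To make the tiering above concrete, here is a minimal sketch of how a placement rule might look. The class names, the flags on `FileInfo`, and the rule for files that do not yet have a BAM are all assumptions made for illustration; they are not part of any existing tool.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LOCAL_SCRATCH = "fastest: node-local storage for GATK temp files"
    ACTIVE_NAS = "faster: NAS for recent or in-analysis samples"
    LEGACY_NAS = "slower: secondary NAS for legacy files/samples"
    ARCHIVE = "slowest: optical/tape/other (TBD)"


@dataclass
class FileInfo:
    # All fields are hypothetical flags used only for this sketch.
    is_temp: bool               # temp file generated during a GATK run
    in_active_analysis: bool    # sample is new or being (re-)analyzed
    has_bam: bool               # a BAM has already been generated
    archive_only: bool = False  # kept purely for archival purposes


def choose_tier(f: FileInfo) -> Tier:
    """Map a file to one of the four tiers described in the list above."""
    if f.is_temp:
        return Tier.LOCAL_SCRATCH
    if f.archive_only:
        return Tier.ARCHIVE
    # Assumption: anything without a BAM yet still counts as active work.
    if f.in_active_analysis or not f.has_bam:
        return Tier.ACTIVE_NAS
    return Tier.LEGACY_NAS
```

For example, a fastq whose sample already has a BAM and is not being re-analyzed would land on the secondary NAS under these rules.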
I've seen others suggest this framework or something similar; however, it's not clear to us how fast each tier actually needs to be, given the current state of technology as of December 2014. Our IT staff has recommended an Isilon system from EMC; while it is a strong product, the cost is prohibitive, and we're curious whether anyone can recommend alternatives.
If so, we'd greatly appreciate it.
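For anyone willing to weigh in, the back-of-envelope arithmetic we're trying to fill in looks roughly like the sketch below. The function names are made up for illustration, and every number in the usage note is a placeholder rather than a measurement from our setup.

```python
def required_throughput_mb_s(concurrent_jobs: int, per_job_mb_s: float) -> float:
    """Aggregate throughput a shared tier must sustain if every job
    streams at per_job_mb_s at the same time (a worst-case assumption)."""
    return concurrent_jobs * per_job_mb_s


def transfer_time_hours(file_size_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to move one file at a given sustained throughput.
    Uses decimal units: 1 GB = 1000 MB."""
    seconds = file_size_gb * 1000 / throughput_mb_s
    return seconds / 3600
```

With placeholder inputs, `required_throughput_mb_s(20, 50.0)` gives the aggregate MB/s a tier would need for 20 simultaneous jobs at 50 MB/s each, and `transfer_time_hours(100, 100.0)` gives the time to move a 100 GB file at 100 MB/s. What we don't know is which per-job rates are realistic for each tier.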