Hello,
I'm a system admin for a handful of SOLiDs and I'm curious to know what other people are doing as far as IT related issues with SOLiD. We've tried a few novel things in our environment with some success, but it'd be nice to hear from others who are managing these.
We currently have all our machines write directly to a central NFS server (a high-performance clustered storage solution) rather than copying data after primary analysis completes. A side effect is that our instruments can access results for any run they have ever performed (with the noted exception that run data in the instrument database was not preserved during the 2 -> 3 upgrade). Whether that is good or bad remains to be determined. This also allows us to recover the results disk space and enlarge the images volume on the instrument, which is nice.
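For anyone curious what that looks like in practice, the setup is just an NFS mount in place of the local results disk. The hostname and paths below are illustrative, not our actual layout:

```shell
# /etc/fstab on each instrument (hostname and paths are examples only).
# hard,intr so primary analysis blocks rather than corrupts on a storage hiccup;
# large rsize/wsize help with the mostly-sequential write pattern.
storage.example.org:/solid/results  /data/results  nfs  rw,hard,intr,rsize=32768,wsize=32768  0 0
```

Tune the mount options to your own storage; the main point is that the instrument writes results over the wire from the start, so there is no post-run copy step at all.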
Secondary analysis on instrument has been a challenge. We've made attempts at using Global SETS to move it off instrument with little success. We've played with increasing the core count on an instrument by adding nodes (via a VLAN to our data center), and that seems promising (a 35x2 run against the full human genome completes in ~5 days with 104 cores). All real analysis has been done on our existing compute farm and infrastructure using Corona Lite.
We've considered using the VLAN approach to move all compute nodes off instrument to help address heat issues in the lab where these reside.
Any feedback would be appreciated. We are doing things in a non-standard way in an attempt to make the instruments more manageable. It'd be nice if an instrument could notify an external service when primary analysis was complete, for instance. If anyone else has had luck making SOLiD more automated, manageable, and scalable, I'd love to hear what you are doing.
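Lacking a built-in hook, the notification could be faked from the storage side: since everything lands on central NFS anyway, a small watcher can poll a run directory for whatever file primary analysis writes last and then ping a service. A rough sketch of the idea; the sentinel filename and endpoint URL here are made up, so substitute whatever your pipeline actually produces:

```python
#!/usr/bin/env python
"""Poll a run's results directory for a completion marker, then notify
an external service. MARKER and ENDPOINT are assumptions, not anything
the SOLiD software provides -- adjust both to your environment."""
import os
import time
import urllib.request

MARKER = "primary.complete"                   # hypothetical sentinel file
ENDPOINT = "http://lims.example.org/notify"   # hypothetical service URL


def run_is_complete(run_dir, marker=MARKER):
    """True once the sentinel file appears in the run directory."""
    return os.path.exists(os.path.join(run_dir, marker))


def notify(run_dir, endpoint=ENDPOINT):
    """POST the run directory name to the notification service."""
    req = urllib.request.Request(endpoint, data=run_dir.encode())
    urllib.request.urlopen(req)


def watch(run_dir, interval=300):
    """Block until the run completes, then fire the notification."""
    while not run_is_complete(run_dir):
        time.sleep(interval)
    notify(run_dir)
```

Polling over NFS is crude compared with a real event from the instrument, but it runs entirely off-instrument, which fits the goal of keeping the machines unmodified.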
Thanks,
griznog