#1
griznog
Junior Member
Location: Massachusetts
Join Date: Apr 2009
Posts: 2
Hello,
I'm a system admin for a handful of SOLiDs and I'm curious to know what other people are doing as far as IT-related issues with SOLiD. We've tried a few novel things in our environment with some success, but it'd be nice to hear from others who are managing these.

We currently have all our machines write directly to a central NFS server (a high-performance clustered storage solution) rather than copying data after primary analysis completes. A side effect is that our instruments can access results for any run they have ever performed (with the noted exception that run data in the instrument database was not preserved during the 2 -> 3 upgrade); whether that is good or bad remains to be determined. This also allows us to recover the results disk space and enlarge the images volume on the instrument, which is nice.

Secondary analysis on instrument has been a challenge. We've made attempts at using Global SETS to move it off instrument, with little success. We've played with increasing the core count on an instrument by adding nodes to it (via a VLAN to our data center), and that seems promising (a 35x2 run against the full human genome completes in ~5 days with 104 cores). All real analysis has been done on our existing compute farm and infrastructure using Corona Lite. We've considered using the VLAN approach to move all compute nodes off instrument to help address heat issues in the lab where these reside.

Any feedback would be appreciated. We are doing things in a non-standard way in an attempt to make the instruments more manageable. It'd be nice if an instrument could notify an external service when primary analysis was complete, for instance. If anyone else has had luck making SOLiD more automated, manageable and scalable, I'd love to hear what you are doing.
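To make the notification idea concrete, here is a minimal sketch of such a hook. The results path, marker filename, and notify URL are all hypothetical; the current instrument software does not actually write a completion marker like this.

Code:
#!/usr/bin/env python
# Minimal sketch of a primary-analysis completion watcher.
# Assumptions (not actual SOLiD behavior): each run writes its results
# under RESULTS_ROOT/<run_name>/ and the appearance of a marker file
# means "primary analysis done". The notify URL is hypothetical.
import os
import time
import urllib.request

RESULTS_ROOT = "/mnt/solid_results"                   # central NFS export (hypothetical)
MARKER = "primary.analysis.complete"                  # assumed marker file name
NOTIFY_URL = "http://lims.example.org/run_complete"   # hypothetical endpoint

seen = set()
while True:
    for run in os.listdir(RESULTS_ROOT):
        marker_path = os.path.join(RESULTS_ROOT, run, MARKER)
        if run not in seen and os.path.exists(marker_path):
            try:
                # Tell the external service which run finished.
                urllib.request.urlopen(NOTIFY_URL + "?run=" + run, timeout=10)
                seen.add(run)
            except Exception:
                pass  # retry on the next sweep
    time.sleep(60)  # poll; inotify does not see remote changes over NFS

Thanks,

griznog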
#2
Rick Westerman
Location: Purdue University, Indiana, USA
Join Date: Jun 2008
Posts: 1,104
No good feedback here, but I concur:

Quote:
#3
Member
Location: Sydney, Australia
Join Date: Jul 2009
Posts: 13
Unfortunately not much feedback here either, but I am interested in how you connect these machines together.
Quote:
At the moment the network speeds available at our site make dumping the images directly to our data centre via NFS unfeasible.
#4
griznog
Junior Member
Location: Massachusetts
Join Date: Apr 2009
Posts: 2

Quote:
Since posting this thread we've had some very good interaction with ABI, and the roadmap for v3.5 of the instrument software appears to address many of our issues, so once we upgrade to 3.5 later this year we'll revert to the export model rather than using central NFS. Given the roadmap shown to us, I would withhold recommending central NFS storage until we've seen how well the new software handles exporting.

griznog
#5
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285

Quote:
I would love to hear about any successes with using some type of workflow system (Kepler, etc.) to automate not only SOLiDs but also other NGS technologies, since the big problem for us is having a mix of technologies (and workflows/applications) that are constantly being developed and updated.
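As a toy illustration of the kind of dispatch layer a workflow system would provide (this is not Kepler; every command name and path below is a placeholder, not a real tool for these platforms):

Code:
#!/usr/bin/env python
# Toy dispatch layer in the spirit of a workflow system: one table maps
# each technology to its pipeline steps, so updating a workflow means
# editing the table, not the plumbing. All commands are placeholders.
import subprocess

PIPELINES = {
    "solid":    [["corona_lite_map", "{run}"],
                 ["corona_lite_pairing", "{run}"]],
    "illumina": [["make_fastq", "{run}"],
                 ["align_fastq", "{run}"]],
}

def run_pipeline(technology, run_dir):
    """Run each step of the technology's pipeline in order; stop on failure."""
    for step in PIPELINES[technology]:
        cmd = [arg.format(run=run_dir) for arg in step]
        if subprocess.run(cmd).returncode != 0:
            raise RuntimeError("step %r failed for %s" % (cmd, run_dir))

run_pipeline("solid", "/data/runs/solid0123_example")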
#6
Junior Member
Location: NY, NY
Join Date: Sep 2009
Posts: 6
This is somewhat related to the above. I am with PSSC Labs (www.pssclabs.com). We are working to develop a SOLiD Offline Cluster. All of the information provided above is great. It gives me a much better understanding of the computing needs of the cluster than any of my discussions with AB.
I had a few questions. Do any of you have experience running any AB-developed application over InfiniBand or other high-speed network interconnects? Is there a maximum number of cores beyond which the AB software will no longer scale, or at which the performance gain of adding more nodes becomes negligible?

Thank you
#7
Rick Westerman
Location: Purdue University, Indiana, USA
Join Date: Jun 2008
Posts: 1,104

Quote:
If we consider the first program -- mapping -- then there is a maximum number of cores. Basically the mapping program is broken down into 6 sub-programs:

1) Map the read file to each chromosome. The natural core limit on this is the number of chromosomes.
2) Collect the map information into one overall file -- limit of 1 core.
3) Do a per-chromosome re-mapping for the optimal matches.
4-6) Gather the mapping back into one overall file with statistics and an index.

Overall rather inefficient. Some of the other ABI programs do seem to take the number of cores into account. Also, one could see a way to split the read file into parts and map those parts against the chromosomes; a sketch of that idea follows below.

New AB software is due out "soon". Maybe it will be more efficient.
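A minimal sketch of that read-splitting idea (the mapper command, file names, and chunk count are all placeholders, not the actual ABI tools):

Code:
#!/usr/bin/env python
# Split the reads into chunks and map every (chunk, chromosome) pair in
# parallel, so the usable core count is no longer capped at the number
# of chromosomes. "map_reads" is a placeholder, not a real ABI command.
import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor

CHUNKS = ["reads.part%02d.csfasta" % i for i in range(16)]   # pre-split reads
CHROMOSOMES = ["chr%s" % c for c in list(range(1, 23)) + ["X", "Y"]]

def map_one(pair):
    chunk, chrom = pair
    out = "%s.%s.map" % (chunk, chrom)
    # One independent mapping job per (read chunk, chromosome) pair.
    subprocess.run(["map_reads", chunk, chrom + ".fa", "-o", out], check=True)
    return out

# 16 chunks x 24 chromosomes = 384 independent jobs; run up to 104 at once.
with ThreadPoolExecutor(max_workers=104) as pool:
    outputs = list(pool.map(map_one, itertools.product(CHUNKS, CHROMOSOMES)))
# The per-pair outputs still need the collect/gather steps (2 and 4-6 above).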
#8
Senior Member
Location: SEA
Join Date: Nov 2009
Posts: 197
Interesting info! Especially the NFS bit.

How about cost-effective solutions for analysis? I am trying to build an offline cluster with the minimum specs to do the analysis. I am thinking not all labs would have the budget for a cluster computer that just collects dust when they are done with the analysis. What's the lowest-spec machine that a SOLiD user has managed to get away with? Has anyone done any benchmarking?
#9
Rick Westerman
Location: Purdue University, Indiana, USA
Join Date: Jun 2008
Posts: 1,104

Quote:
Or if you want to high-ball it, then share $100,000+ machines with other people. This is what we do. Seriously, you really should set a budget and then buy within it; that is generally the best bet when purchasing computer equipment.
#10
Senior Member
Location: SEA
Join Date: Nov 2009
Posts: 197

Quote:
Oftentimes when you have a super HPC you think less about algorithm speedups anyway.

I managed to find this desktop benchmark for de novo assembly by CLCBIO: http://www.clcngs.com/2009/11/new-be...ovo-assembler/
Tags
solid, system administration |