06-12-2009, 07:45 AM   #1
griznog
Junior Member
Location: Massachusetts
Join Date: Apr 2009
Posts: 2

SOLiD from an IT perspective

Hello,

I'm a system admin for a handful of SOLiDs and I'm curious to know what other people are doing as far as IT related issues with SOLiD. We've tried a few novel things in our environment with some success, but it'd be nice to hear from others who are managing these.

We currently have all our machines write directly to a central NFS server (a high-performance clustered storage solution) rather than copying data after primary analysis completes. A side effect is that our instruments can access results for any run they have ever performed (with the noted exception that run data in the instrument database was not preserved during the 2 -> 3 upgrade); whether that is good or bad remains to be determined. This also allows us to recover the results disk space and enlarge the images volume on the instrument, which is nice.
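
The plumbing for that is nothing special; conceptually it is just a per-run directory on the central volume symlinked into the path the instrument software expects to write to. A minimal sketch of the idea (the paths and run name below are made up, not our actual layout):

Code:
#!/usr/bin/env python
# Sketch only: point a run's results directory at central NFS storage so the
# instrument "writes locally" but the data lands on the central volume.
import os

CENTRAL = "/mnt/central/solid/results"   # hypothetical NFS mount point
LOCAL = "/data/results"                  # hypothetical instrument results path

def stage_run(run_name):
    central_dir = os.path.join(CENTRAL, run_name)
    local_link = os.path.join(LOCAL, run_name)
    os.makedirs(central_dir, exist_ok=True)  # create the run dir on central storage
    if not os.path.islink(local_link):
        os.symlink(central_dir, local_link)  # instrument path now points at NFS

if __name__ == "__main__":
    stage_run("solid0123_20090612_run01")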

Secondary analysis on the instrument has been a challenge. We've made attempts at using Global SETS to move it off-instrument, with little success. We've played with increasing the core count on an instrument by adding nodes (via a VLAN to our data center), and that seems promising (a 35x2 run against the full human genome completes in ~5 days with 104 cores). All real analysis has been done on our existing compute farm and infrastructure using Corona Lite.

We've considered using the VLAN approach to move all compute nodes off instrument to help address heat issues in the lab where these reside.

Any feedback would be appreciated. We are doing things in a non-standard way in an attempt to make the instruments more manageable. It'd be nice if an instrument could notify an external service when primary analysis was complete, for instance. If anyone else has had luck making SOLiD more automated, manageable and scalable I'd love to hear what you are doing.
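
In the meantime I could live with a dumb watcher on the storage side that notices when primary analysis finishes and pokes something downstream. A rough sketch of what I mean (the marker filename, addresses, and paths are placeholders, not anything the instrument actually writes):

Code:
#!/usr/bin/env python
# Sketch: poll the central results volume for a run-completion marker and send
# a mail when a new one appears. The marker name is a placeholder for whatever
# your primary analysis leaves behind when it finishes.
import os
import time
import smtplib
from email.mime.text import MIMEText

RESULTS = "/mnt/central/solid/results"  # hypothetical NFS mount
MARKER = "primary.analysis.complete"    # hypothetical completion marker

def notify(run_name):
    msg = MIMEText("Primary analysis complete for %s" % run_name)
    msg["Subject"] = "SOLiD run finished: %s" % run_name
    msg["From"] = "solid-watch@example.org"
    msg["To"] = "pipeline@example.org"
    server = smtplib.SMTP("localhost")
    server.send_message(msg)
    server.quit()

def main():
    seen = set()
    while True:
        for run in os.listdir(RESULTS):
            if run not in seen and os.path.exists(os.path.join(RESULTS, run, MARKER)):
                notify(run)
                seen.add(run)
        time.sleep(300)  # check every five minutes

if __name__ == "__main__":
    main()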

Thanks,

griznog
06-12-2009, 01:08 PM   #2
westerman
Rick Westerman
Location: Purdue University, Indiana, USA
Join Date: Jun 2008
Posts: 1,104

No good feedback here but I concur:

Quote:
If anyone else has had luck making SOLiD more automated, manageable and scalable I'd love to hear what you are doing.
We have only one SOLiD and thus do not have the problems that griznog has. Nevertheless, I find the lack of automation irritating, as well as the lack of scalability.
07-19-2009, 05:17 PM   #3
OneManArmy
Member
Location: Sydney, Australia
Join Date: Jul 2009
Posts: 13

Unfortunately not much feedback here either, but I am interested in how you connect these machines together.

Quote:
Originally Posted by griznog
We currently have all our machines write directly to a central NFS server (a high-performance clustered storage solution) rather than copying data after primary analysis completes.
What is the speed of the network connection between your SOLiDs and this central NFS server? The images are acquired in Windows, so do yours write to a Samba share on the onboard cluster which maps to an NFS mount?

Unfortunately, at the moment the network speeds available at our site make dumping the images directly to our data centre via NFS unfeasible.
07-19-2009, 05:51 PM   #4
griznog
Junior Member
Location: Massachusetts
Join Date: Apr 2009
Posts: 2

Quote:
Originally Posted by OneManArmy
What is the speed of the network connection between your SOLiDs and this central NFS server? The images are acquired in Windows, so do yours write to a Samba share on the onboard cluster which maps to an NFS mount?

Unfortunately, at the moment the network speeds available at our site make dumping the images directly to our data centre via NFS unfeasible.
Each SOLiD has a 1 Gbps uplink to an aggregate switch, which then has a 10 Gbps connection to storage (via about 3 switch hops and one router hop). It's not ideal for latency, but performance seems reasonable, and in simple benchmarks the central storage was at least as good as the head node storage for single clients and vastly better for multiple clients. Note that we were only using this for results. I have used it for images on one instrument when a failure in the MD1000 left us without an images directory for a few days, but I don't consider that a good test of central images storage because of the short duration of usage.
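
The benchmarks were nothing sophisticated: time a large sequential write to the head node's local disk and to the NFS mount, first from one client and then from several at once, and compare MB/s. Something along these lines (the paths are made up):

Code:
#!/usr/bin/env python
# Crude sequential-write test: push N megabytes to a target directory, fsync,
# and report MB/s. Run it against local disk and the NFS mount and compare.
import os
import sys
import time

def write_test(target_dir, size_mb=1024, block_kb=1024):
    path = os.path.join(target_dir, "bench.tmp")
    block = b"\0" * (block_kb * 1024)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(size_mb * 1024 // block_kb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually hit storage
    elapsed = time.time() - start
    os.unlink(path)
    return size_mb / elapsed  # MB/s

if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/central/scratch"
    print("%.1f MB/s to %s" % (write_test(target), target))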

Since posting this thread we've had some very good interaction with ABI and the roadmap for v3.5 of the instrument software appears to address many of our issues so once we upgrade to 3.5 later this year we'll revert back to the export model rather than using central NFS. Given the roadmap shown to us, I would withhold recommending central NFS storage until we've seen how well the new software handles exporting.

griznog
07-19-2009, 06:39 PM   #5
nilshomer
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by griznog
Each SOLiD has a 1 Gbps uplink to an aggregate switch, which then has a 10 Gbps connection to storage (via about 3 switch hops and one router hop). It's not ideal for latency, but performance seems reasonable, and in simple benchmarks the central storage was at least as good as the head node storage for single clients and vastly better for multiple clients. Note that we were only using this for results. I have used it for images on one instrument when a failure in the MD1000 left us without an images directory for a few days, but I don't consider that a good test of central images storage because of the short duration of usage.

Since posting this thread we've had some very good interaction with ABI and the roadmap for v3.5 of the instrument software appears to address many of our issues so once we upgrade to 3.5 later this year we'll revert back to the export model rather than using central NFS. Given the roadmap shown to us, I would withhold recommending central NFS storage until we've seen how well the new software handles exporting.

griznog
We rely on copying the primary data (after color calling) over to NFS volumes, which allows us to have lots of cheap storage. The most recent runs are then stored on a fast distributed file system (Lustre) while alignment, variant calling, structural variant detection, and all other downstream analyses are completed. We then copy back all the results and intermediate files that need to be archived to the NFS servers. A lot of this is "human-automated", whereby a human has to initiate the transfer, the secondary analysis, and the final archiving.
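
Minus the human, the cycle boils down to something like the sketch below (the paths and the analysis command are placeholders for our in-house setup, not anything ABI ships):

Code:
#!/usr/bin/env python
# Sketch of the staging cycle described above: stage primary data (csfasta/qual)
# from the cheap NFS archive onto Lustre, run secondary analysis there, then
# copy results and intermediates back for archiving. All paths are placeholders.
import subprocess

NFS_ARCHIVE = "/archive/solid"   # hypothetical cheap NFS storage
LUSTRE = "/lustre/scratch"       # hypothetical fast scratch space

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

def process_run(run_name):
    src = "%s/%s/primary/" % (NFS_ARCHIVE, run_name)
    work = "%s/%s/" % (LUSTRE, run_name)
    run(["rsync", "-a", src, work])                    # stage in
    run(["run_secondary_analysis.sh", work])           # placeholder for the pipeline
    run(["rsync", "-a", work,
         "%s/%s/analysis/" % (NFS_ARCHIVE, run_name)])  # archive results back
    run(["rm", "-rf", work])                           # free the scratch space

if __name__ == "__main__":
    process_run("solid0123_20091019_run42")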

I would love to hear any successes with using some type of workflow system (Kepler etc.) in automating not only SOLiDs but also other NGS technology, since the big problem for us is having a mix of technologies (and workflows/applications) that are constantly being developed/updated.
10-08-2009, 09:31 PM   #6
pssclabs
Junior Member
Location: NY, NY
Join Date: Sep 2009
Posts: 6

This is somewhat related to the above. I am with PSSC Labs (www.pssclabs.com). We are working to develop a SOLiD Offline Cluster. All of the information provided above is great. It gives me a much better understanding of the computing needs of the cluster than any of my discussions with AB.

I had a few questions. Do any of you have experience running any AB-developed applications over InfiniBand or other high-speed network interconnects?

Is there a maximum number of cores where the AB software will no longer scale? Or is the performance gain from adding more nodes negligible beyond a certain point?

Thank you
10-13-2009, 02:03 PM   #7
westerman
Rick Westerman
Location: Purdue University, Indiana, USA
Join Date: Jun 2008
Posts: 1,104

Quote:
Originally Posted by pssclabs

Is there a maximum number of cores where the AB software will no longer scale?
There are a handful of ABI software packages out there -- e.g., Mapping, SNP calling, Transcriptome -- which often stand alone although they may share programs.

If we consider the first program -- Mapping -- then there is a maximum number of cores. Basically the mapping program is broken down into 6 sub-programs:

1) Map the read file to each chromosome. The natural core limit on this is the number of chromosomes.

2) Collect the map information into one overall file -- limit of 1 core.

3) Do a per-chromosome re-mapping for the optimal matches.

4-6) Gather back the mapping into one overall file with statistics and an index.

Overall rather inefficient. Some of the other ABI programs do seem to take into account the number of cores. Also one could see a way to split the read file into parts and map those parts against the chromosomes.
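
To make the core limit concrete: step 1 is embarrassingly parallel but capped at one job per chromosome, and splitting the reads as well would lift that cap. A toy sketch of the scheduling only (the mapper command and file names are stand-ins, not the actual ABI programs):

Code:
#!/usr/bin/env python
# Toy illustration of the scheduling, not the ABI pipeline itself.
# As shipped, step 1 runs one mapping job per chromosome, so at most ~24 cores
# stay busy for human. Splitting the read file into chunks gives
# (chromosomes x chunks) independent jobs, which scales to many more cores.
from multiprocessing import Pool
import subprocess

CHROMOSOMES = ["chr%d" % i for i in range(1, 23)] + ["chrX", "chrY"]
READ_CHUNKS = ["reads_part%02d.csfasta" % i for i in range(8)]  # hypothetical split

def map_job(args):
    chrom, chunk = args
    # "map_reads" is a stand-in for the actual mapping program
    subprocess.check_call(["map_reads", "--ref", chrom + ".fa",
                           "--reads", chunk,
                           "--out", "%s.%s.ma" % (chunk, chrom)])

if __name__ == "__main__":
    jobs = [(c, r) for c in CHROMOSOMES for r in READ_CHUNKS]  # 24 x 8 = 192 jobs
    with Pool(processes=104) as pool:  # e.g. the 104-core setup mentioned earlier
        pool.map(map_job, jobs)
    # steps 2 and 4-6 (merging, statistics, indexing) remain serial bottlenecks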

New AB software due out "soon". Maybe it will be more efficient.
11-26-2009, 01:52 AM   #8
KevinLam
Senior Member
Location: SEA
Join Date: Nov 2009
Posts: 197

Interesting info, especially the NFS bit!

How about cost-effective solutions for the analysis?
I am trying to build an offline cluster with the minimum specs needed to do the analysis. I suspect not all labs would have the budget for a cluster that just collects dust once the analysis is done.

What's the lowest-spec machine that a SOLiD user has managed to get away with?
Has anyone done any benchmarking?
11-30-2009, 09:08 AM   #9
westerman
Rick Westerman
Location: Purdue University, Indiana, USA
Join Date: Jun 2008
Posts: 1,104

Quote:
Originally Posted by KevinLam
Interesting info, especially the NFS bit!

How about cost-effective solutions for the analysis?
I am trying to build an offline cluster with the minimum specs needed to do the analysis. I suspect not all labs would have the budget for a cluster that just collects dust once the analysis is done.

What's the lowest-spec machine that a SOLiD user has managed to get away with?
Has anyone done any benchmarking?
I doubt anyone will bother benchmarking the lowest-spec machine, since such a task would be boring and, IMHO, not much use. Basically, just grab an x86-64-based computer with 12 GB of memory and 500 GB of disk space. About $2500 from Dell. That would work. It might be slow. It might run out of disk space eventually. But if you want to low-ball it, the above should be OK.

Or if you want high-ball then share $100,000+ machines with other people. This is what we do.

Seriously, you really should set a budget and then buy within that. That is generally the best bet when purchasing computer equipment.
11-30-2009, 11:52 PM   #10
KevinLam
Senior Member
Location: SEA
Join Date: Nov 2009
Posts: 197

Quote:
Originally Posted by westerman
I doubt anyone will bother benchmarking the lowest-spec machine, since such a task would be boring and, IMHO, not much use. Basically, just grab an x86-64-based computer with 12 GB of memory and 500 GB of disk space. About $2500 from Dell. That would work. It might be slow. It might run out of disk space eventually. But if you want to low-ball it, the above should be OK.

Or if you want high-ball then share $100,000+ machines with other people. This is what we do.

Seriously, you really should set a budget and then buy within that. That is generally the best bet when purchasing computer equipment.
Actually, I think benchmarking cost-effective machines can be very exciting!
Often, when you have a super HPC system, you think less about algorithmic speedups.

Anyway, I managed to find this desktop benchmark for de novo assembly by CLCBIO:
http://www.clcngs.com/2009/11/new-be...ovo-assembler/