SEQanswers

10-12-2011, 08:21 PM   #1
Kennels
Senior Member
 
Location: Sydney

Join Date: Feb 2011
Posts: 149
Advice for setting up a CPU cluster

Hi,

We've been working with NGS data on a desktop PC with an AMD Phenom II X6 processor and 16 GB RAM, running Ubuntu Linux. This was put together rather easily, but now we are looking to create a simple cluster of nodes. We are not looking to do anything fancy, and would be more than happy to have duplicate towers with the same specs, connected somehow. It will just be a local network.
Our main computations at the moment are localized assembly of genes (AMOS, velvet) and alignments using various software (bwa, bowtie, blast, smalt), and we are OK with limiting any particular analysis to one node.

We would like to keep the box we've been using, but if we were to create a cluster:

1. Do we have to buy some kind of special cluster hardware and set up from scratch? Or can we just build identical boxes and connect them somehow?

2. What sort of software should we use to connect the nodes? Given that a lot of NGS software still doesn't support MPI, should we consider MPI, or just some kind of LAN/switch connection between the nodes/towers?

3. If we consider MPI, can the extra nodes have a different architecture (number of processors, motherboard, amount of RAM, etc.) from the master node?

We've started to do some research, but if someone experienced could give some quick advice that would help us greatly!

Thanks in advance!
10-13-2011, 04:46 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,082

Quote:
Originally Posted by Kennels View Post
Hi,

We would like to keep the box we've been using, but if we were to create a cluster:

1. Do we have to buy some kind of special cluster hardware and set up from scratch? Or can we just build identical boxes and connect them somehow?
No special hardware is needed. You will connect the nodes/computers you buy using Ethernet as your interconnect (there are other options, but since you are probably on a tight budget this will be perfectly fine). Plan to purchase a good-quality switch (do not buy a cheap desktop Ethernet switch; get something beefier).
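To see why the switch matters, here is a rough back-of-the-envelope sketch of how long it takes to move one dataset between a node and shared storage. The 80% link efficiency and the 10 GB file size are assumptions for illustration, not measurements.

```python
def transfer_seconds(size_gb, link_gbit_per_s=1.0, efficiency=0.8):
    """Rough time to copy `size_gb` gigabytes over a network link.

    `efficiency` is an assumed fraction of the nominal line rate actually
    achieved once protocol overhead, disk speed and contention are included.
    """
    size_gbit = size_gb * 8  # gigabytes -> gigabits
    return size_gbit / (link_gbit_per_s * efficiency)


# Hypothetical 10 GB FASTQ file over Fast Ethernet (100 Mbit/s) and gigabit Ethernet.
for link in (0.1, 1.0):
    print(f"{link:g} Gbit/s: {transfer_seconds(10, link):.0f} s")
```

If several nodes pull data from the same storage box at once, they share its uplink to the switch, so per-node transfer times grow accordingly.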

Quote:
Originally Posted by Kennels View Post
2. What sort of software should we use to connect the nodes? Given that a lot of NGS software still doesn't support MPI, should we consider MPI, or just some kind of LAN/switch connection between the nodes/towers?
Take a look at http://www.rocksclusters.org/wordpress/. Rocks is a cluster distribution that provides the operating system along with queuing software (SGE/PBS), so this is what you would install on your cluster. Plan to spend some time coming up to speed on the finer points of Linux clusters if you have not done this sort of thing before.
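As a rough sketch of what day-to-day use of an SGE queue could look like, the hypothetical helper below produces one job script per sample, using SGE's `#$` embedded-option syntax. The parallel environment name `smp`, the file names, and the `bwa mem` command line are placeholders; check which parallel environments your installation defines with `qconf -spl`.

```python
def sge_job_script(sample: str, reference: str, threads: int = 4, pe_name: str = "smp") -> str:
    """Return the text of an SGE job script that aligns one sample.

    `pe_name` is a placeholder: it must match a parallel environment defined
    on your cluster (list them with `qconf -spl`).
    """
    lines = [
        "#!/bin/bash",
        f"#$ -N bwa_{sample}",           # job name shown by qstat
        "#$ -cwd",                        # run in the directory qsub was called from
        # Request `threads` slots; they land on one node only if the PE's
        # allocation_rule is $pe_slots.
        f"#$ -pe {pe_name} {threads}",
        f"bwa mem -t {threads} {reference} {sample}.fq > {sample}.sam",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical usage: print one script; save it as bwa_sample1.sh and run `qsub bwa_sample1.sh`.
print(sge_job_script("sample1", "ref.fa"))
```

Because each job stays on one node, this fits software that has no MPI support: the queue simply farms independent jobs out to whichever nodes are free.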

Quote:
Originally Posted by Kennels View Post
3. If we consider MPI, can the extra nodes have a different architecture (number of processors, motherboard, amount of RAM, etc.) from the master node?


We've started to do some research, but if someone experienced could give some quick advice that would help us greatly!

Thanks in advance!
You can build heterogeneous clusters, but you may want to keep things simple by using identical nodes. You will want to get some kind of network-attached storage (NAS), or you could build a NAS box yourself (search online for hardware options; for the software, one option is FreeNAS, http://www.freenas.org/). Again, this is a component that deserves special attention, since your data (and valuable analysis) are going to reside on this storage.
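A toy sketch of why a mixed (heterogeneous) cluster is workable: the queuing system only has to find a node with enough free cores and memory for each job. The first-fit rule, the node specs, and the job sizes below are made up for illustration; SGE and PBS use their own, more elaborate scheduling policies.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cores: int
    free_mem_gb: int
    jobs: list = field(default_factory=list)


@dataclass
class Job:
    name: str
    cores: int
    mem_gb: int


def place_jobs(jobs, nodes):
    """First-fit placement; a job is never split across nodes.

    Returns {job name: node name}, with None for jobs no node can hold.
    """
    placement = {}
    # Largest-memory jobs first, so they get a chance at the big nodes.
    for job in sorted(jobs, key=lambda j: j.mem_gb, reverse=True):
        placement[job.name] = None
        for node in nodes:
            if node.free_cores >= job.cores and node.free_mem_gb >= job.mem_gb:
                node.free_cores -= job.cores
                node.free_mem_gb -= job.mem_gb
                node.jobs.append(job.name)
                placement[job.name] = node.name
                break
    return placement


nodes = [Node("desktop", 6, 16), Node("bignode", 12, 64)]
jobs = [
    Job("bwa_s1", 4, 8),
    Job("bwa_s2", 4, 8),
    Job("blast", 2, 4),
    Job("velvet", 4, 48),
    Job("huge_assembly", 8, 128),
]
result = place_jobs(jobs, nodes)
print(result)
```

Here `huge_assembly` gets `None`: no single node has 128 GB, and pooling memory across nodes does not help a job that cannot be split.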

Plan to have a data backup solution of some kind. If you are going to do this as a serious business, you need to be prepared for some sort of failure (hardware or software) and be able to recover your cluster and your data from it.
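One simple safeguard, whatever backup approach you choose, is to check that the backup actually matches the source. A minimal sketch, assuming the backup is a plain directory copy; `verify_backup` is a hypothetical helper, and tools such as rsync (with its `--checksum` option) can perform a similar comparison.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large FASTQ/BAM files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {str(p.relative_to(root)): sha256_of(p) for p in sorted(root.rglob("*")) if p.is_file()}


def verify_backup(source: Path, backup: Path) -> list:
    """Return relative paths that are missing from, or differ in, the backup."""
    src, dst = build_manifest(source), build_manifest(backup)
    return [name for name, digest in src.items() if dst.get(name) != digest]


# Demo on throwaway directories: notes.txt was never copied to the backup.
with tempfile.TemporaryDirectory() as tmp:
    src, bak = Path(tmp, "data"), Path(tmp, "backup")
    src.mkdir()
    bak.mkdir()
    (src / "reads.fq").write_text("@r1\nACGT\n+\nIIII\n")
    (src / "notes.txt").write_text("run 1\n")
    (bak / "reads.fq").write_text("@r1\nACGT\n+\nIIII\n")
    problems = verify_backup(src, bak)
print(problems)  # ['notes.txt']
```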

Finally, before you go overboard, consider the overall power requirements. A cluster in a small space can put out significant heat, so give some thought to cooling (if needed).
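A back-of-the-envelope sketch for the power and cooling estimate: essentially all the electrical power a cluster draws ends up as heat, and 1 W corresponds to about 3.412 BTU/h. The per-device wattages below are assumptions; measure your own machines (for example, with a plug-in power meter) before planning circuits or air conditioning.

```python
WATTS_TO_BTU_PER_HOUR = 3.412  # 1 W of electrical load is about 3.412 BTU/h of heat


def cluster_load(node_watts, n_nodes, other_watts=0.0, volts=230.0):
    """Return (total watts, heat in BTU/h, current in amps) for the whole setup.

    Assumes a power factor of about 1, so amps = watts / volts.
    """
    total_w = node_watts * n_nodes + other_watts
    return total_w, total_w * WATTS_TO_BTU_PER_HOUR, total_w / volts


# Hypothetical: 4 nodes at 300 W each, plus ~150 W for the switch and NAS box.
watts, btu_h, amps = cluster_load(300, 4, other_watts=150)
print(f"{watts:.0f} W, {btu_h:.0f} BTU/h, {amps:.1f} A at 230 V")
```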
10-13-2011, 10:38 AM   #3
arolfe
Member
 
Location: 02119

Join Date: Jul 2011
Posts: 29

GenoMax gave you good advice. For storage, you might consider Gluster, which lets you aggregate storage space from a set of servers into a single filesystem. This might simplify your storage issues and be a cheaper solution.
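A rough sketch of how much usable space a GlusterFS volume gives you depending on whether files are replicated. It assumes, as Gluster does when you create a replicated volume, that consecutive bricks in the list form a replica set, and that each set holds only as much as its smallest brick; the brick sizes are hypothetical.

```python
def usable_capacity_tb(brick_sizes_tb, replica=1):
    """Approximate usable space of a GlusterFS volume.

    replica=1: plain distributed volume; capacities simply add up (but losing
               one server makes the files stored on it unavailable).
    replica=N: each file lives on N bricks; consecutive groups of N bricks form
               replica sets, each limited by its smallest brick.
    """
    if replica < 1 or len(brick_sizes_tb) % replica:
        raise ValueError("number of bricks must be a multiple of the replica count")
    return sum(
        min(brick_sizes_tb[i:i + replica])
        for i in range(0, len(brick_sizes_tb), replica)
    )


bricks = [2, 2, 2, 2]  # hypothetical: one 2 TB disk in each of four servers
print(usable_capacity_tb(bricks))             # 8
print(usable_capacity_tb(bricks, replica=2))  # 4
```

Files are spread across replica sets by hashing their names, so in practice the sets fill unevenly; treat these figures as upper bounds.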

Also think about whether you can use fewer machines, each with 2 or 4 multicore processors. Aggregating your disk and memory into fewer machines gives you more resources when a job needs huge amounts of memory and can't be split across nodes.
10-13-2011, 03:38 PM   #4
Kennels
Senior Member
 
Location: Sydney

Join Date: Feb 2011
Posts: 149

Thanks for the replies; they are a great help.
10-14-2011, 01:43 AM   #5
ashwatha
Member
 
Location: India

Join Date: Jul 2011
Posts: 14

No additional hardware is needed. You could consider installing a Hadoop cluster - it simply involves unpacking some tarballs and setting up some config details. The good thing here is that there are already some bioinformatics frameworks (e.g. Crossbow) that can leverage an underlying Hadoop cluster.

http://www.michael-noll.com/tutorial...-node-cluster/
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
http://hadoop.apache.org/
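Crossbow runs on Hadoop's MapReduce model. To illustrate that model (this is not Crossbow's actual code), here is a small local Python simulation that counts aligned reads per reference sequence from SAM lines: the mapper emits `(reference, 1)` pairs, Hadoop sorts and groups them by key, and the reducer sums each group. With Hadoop Streaming, the mapper and reducer would instead be separate scripts reading lines from stdin; the SAM records here are made up.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit (reference name, 1) for each mapped SAM record."""
    for line in lines:
        if line.startswith("@"):        # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x4:                  # FLAG bit 0x4 = read unmapped
            continue
        yield rname, 1


def reducer(sorted_pairs):
    """Sum counts per key; input must be sorted by key, as Hadoop's shuffle guarantees."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def run_locally(lines):
    # On a cluster, many mappers run in parallel and the framework does this sort.
    return dict(reducer(sorted(mapper(lines))))


sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr2\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t16\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
counts = run_locally(sam)
print(counts)  # {'chr1': 2, 'chr2': 1}
```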
All times are GMT -8.