**GenoMax** · 10-13-2011, 03:46 AM

Originally posted by Kennels

Hi,

We would like to keep the box we've been using, but if we were to create a cluster:

1. Do we have to buy some kind of special cluster hardware and set it up from scratch? Or just build identical boxes and connect them somehow?

No special hardware is needed. You will connect the nodes/computers you buy using Ethernet as your interconnect (there are other options, but since you are probably on a tight budget this will be perfectly fine). Plan to purchase a good-quality switch: do not buy a cheap desktop Ethernet switch, get something beefier.

Originally posted by Kennels

2. What sort of software should we use to connect the nodes? Given that a lot of NGS software still doesn't support MPI, should we consider MPI, or just some kind of LAN/switch connection between the nodes/towers?

Take a look at http://www.rocksclusters.org/wordpress/. This would be the operating system plus queuing software (SGE/PBS) that you will be installing on your cluster. Plan to spend some time getting up to speed on the finer points of Linux clusters if you have not done this sort of thing before.
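To give a feel for what working with a queuing system looks like, here is a minimal sketch in Python that builds the text of an SGE job script. The directives (`-N`, `-cwd`, `-j y`, `-pe`, `-l h_vmem`) are standard Grid Engine options, but the parallel environment name `smp`, the job name, and the bowtie index and file names are assumptions for illustration. Parallel environments are defined by whoever administers the cluster, so check yours with `qconf -spl`.

```python
def sge_job_script(job_name, command, slots=4, mem_per_slot="4G", pe_name="smp"):
    """Return the text of a Grid Engine (SGE) job script.

    pe_name is an assumption: parallel environment names are site-specific.
    """
    lines = [
        "#!/bin/bash",
        f"#$ -N {job_name}",              # job name shown in qstat
        "#$ -cwd",                        # run from the submission directory
        "#$ -j y",                        # merge stderr into stdout
        f"#$ -pe {pe_name} {slots}",      # request this many slots on one node
        f"#$ -l h_vmem={mem_per_slot}",   # memory limit per slot
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: a multithreaded aligner run that uses the granted slots.
# $NSLOTS is set by SGE inside the job; hg19 and reads.fq are placeholder names.
print(sge_job_script("align_sample1", "bowtie -p $NSLOTS hg19 reads.fq > sample1.map"))
```

You would save the output to a file and hand it to `qsub`. Since most NGS tools are multithreaded rather than MPI-aware, a single-node parallel environment like this is usually what you end up requesting.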

Originally posted by Kennels

3. Can the extra nodes have a different architecture (number of processors, motherboard, amount of RAM, etc.) from the master node if we consider MPI?

We've started to do some research, but if someone experienced could give some quick advice that would help us greatly!

Thanks in advance!

You can build heterogeneous clusters, but you may want to keep things simple by using identical nodes. You will want to get some kind of network-attached storage, or you could build a NAS box yourself (google for hardware options; for software, see http://www.freenas.org/). Again, this is a component you will want to pay special attention to, since your data (and valuable analysis) are going to reside on this storage.

Plan to have a data backup solution of some kind. If you are going to do this as a serious business, you need to be prepared for some sort of failure (hardware or software) from which you must be able to recover your cluster and your data.

Finally, before you go overboard, consider overall power requirements. A cluster in a small space can start putting out significant heat, so give some thought to cooling (if needed).
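For a rough back-of-the-envelope check, nearly all the electrical power a cluster draws ends up as heat, and 1 W is about 3.412 BTU/h. The sketch below uses made-up numbers (8 nodes at 400 W each, plus 300 W for switch and storage, on a 120 V supply), so substitute measured or nameplate figures for your own hardware.

```python
WATTS_TO_BTU_PER_HOUR = 3.412  # 1 W of electrical load is about 3.412 BTU/h of heat


def cluster_heat_load(node_count, watts_per_node, extra_watts=0.0):
    """Return (total watts, heat output in BTU/h) for a small cluster."""
    total_watts = node_count * watts_per_node + extra_watts
    return total_watts, total_watts * WATTS_TO_BTU_PER_HOUR


# Hypothetical figures, for illustration only.
watts, btu = cluster_heat_load(node_count=8, watts_per_node=400, extra_watts=300)
amps = watts / 120  # current draw on an assumed 120 V supply
print(f"{watts:.0f} W, about {btu:.0f} BTU/h of heat, {amps:.1f} A at 120 V")
```

Comparing the BTU/h figure with the rating of the room's air conditioning, and the amps with what the circuit can deliver, tells you quickly whether cooling or power will be the first limit you hit.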

**arolfe** · 10-13-2011, 09:38 AM

GenoMax gave you good advice. For storage, you might consider Gluster, which lets you aggregate storage space from a set of servers into a single filesystem. This might simplify your storage issues and be a cheaper solution.

Also think about whether you can use fewer machines, each with 2 or 4 multicore processors. Aggregating your disk and memory into fewer machines gives you more resources when a job needs huge amounts of memory and can't be split across nodes.
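A toy illustration of that point, with invented numbers: two layouts with the same total memory (128 GB) and cores (32), and a hypothetical 48 GB, 8-thread job that has to run on a single node.

```python
def fits_on_single_node(job_mem_gb, job_cores, node_mem_gb, node_cores):
    """True if a job that cannot be split across nodes fits on one node."""
    return job_mem_gb <= node_mem_gb and job_cores <= node_cores


# Hypothetical layouts: name -> (memory per node in GB, cores per node).
layouts = {
    "8 small nodes": (16, 4),
    "2 large nodes": (64, 16),
}
job_mem_gb, job_cores = 48, 8  # e.g. a memory-hungry assembly job (assumed size)

for name, (node_mem_gb, node_cores) in layouts.items():
    ok = fits_on_single_node(job_mem_gb, job_cores, node_mem_gb, node_cores)
    print(f"{name}: job {'fits' if ok else 'does not fit'}")
```

The aggregate resources are identical, but only the layout with larger nodes can run the job at all.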

**Kennels** · 10-13-2011, 02:38 PM

Thanks for the replies, they are a great help.

**ashwatha** · 10-14-2011, 12:43 AM

No additional hardware is needed. You could consider installing a Hadoop cluster - it simply involves unpacking some tarballs and setting up some config details. The good thing here is that there are already some bioinformatics frameworks (e.g. Crossbow) that can leverage an underlying Hadoop cluster.

Running Hadoop On Ubuntu Linux (Multi-Node Cluster)

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

Crossbow: Whole Genome Resequencing Analysis in the Clouds

http://bowtie-bio.sourceforge.net/crossbow/index.shtml

Apache Hadoop

http://hadoop.apache.org/
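To show the kind of job a Hadoop cluster runs, here is a minimal map/reduce sketch in Python that counts aligned reads per reference sequence from SAM lines. It is only an illustration of the model and not how Crossbow itself is implemented. With Hadoop Streaming, each function would be wrapped in a small script that reads stdin and prints tab-separated key/value lines; here the sort-and-shuffle step that Hadoop performs between the two phases is simulated locally with `sorted`. The sample records are invented.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'reference<TAB>1' for every aligned read in SAM-formatted lines."""
    for line in lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[2] == "*":   # '*' marks an unaligned read
            continue
        yield f"{fields[2]}\t1"


def reducer(lines):
    """Sum the counts for each key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


# Invented, truncated SAM records, just enough columns for the example.
sam_lines = [
    "@HD\tVN:1.0",
    "read1\t0\tchr1\t100",
    "read2\t0\tchr2\t5",
    "read3\t4\t*\t0",
    "read4\t16\tchr1\t7",
]
for result in reducer(sorted(mapper(sam_lines))):
    print(result)
```

The appeal for a small cluster is that Hadoop handles splitting the input across nodes and collecting the results, so the per-record logic stays this simple.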

Advice for setting up a cpu cluster

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News