  • Advice for setting up a cpu cluster

    Hi,

    We've been working with NGS data on a desktop PC with an AMD Phenom II X6 processor and 16GB RAM, running Ubuntu Linux. This was put together rather easily, but now we are looking to create a simple cluster of nodes. We are not looking to do anything fancy, and would be more than happy to have duplicate towers with the same specs, connected over a local network.
    Our main computations at the moment are localized assembly of genes (AMOS, velvet) and alignments using various software (bwa, bowtie, blast, smalt), and we are OK with limiting any particular analysis to one node.

    We would like to keep the box we've been using, but if we were to create a cluster:

    1. Do we have to buy some kind of special cluster hardware and set up from scratch? Or just build identical boxes and connect them somehow?

    2. What sort of software should we use to connect the nodes? Given that a lot of the NGS software still doesn't support MPI, should we consider MPI, or just some kind of LAN/switch connection between the nodes/towers?

    3. Can the extra nodes have a different architecture (number of processors, motherboard, amount of RAM etc.) from the master node if we consider MPI?

    We've started to do some research, but if someone experienced could give some quick advice that would help us greatly!

    Thanks in advance!

  • #2
    Originally posted by Kennels View Post
    Hi,

    We would like to keep the box we've been using, but if we were to create a cluster:

    1. Do we have to buy some kind of special cluster hardware and set up from scratch? Or just build identical boxes and connect them somehow?
    No special hardware is needed. You will connect the nodes/computers you buy using ethernet as your interconnect (there are other options but since you are probably on a tight budget this will be perfectly fine). Plan to purchase a good quality switch (do not buy a cheap desktop ethernet switch but get something more beefy).

    Originally posted by Kennels View Post
    2. What sort of software should we use to connect the nodes? Given that a lot of the NGS software still doesn't support MPI, should we consider MPI, or just some kind of LAN/switch connection between the nodes/towers?
    Take a look at http://www.rocksclusters.org/wordpress/. This would be the operating system/queuing software (SGE/PBS) that you will be installing on your cluster. Plan to spend some time coming up to speed on the finer points of Linux clusters if you have not done this sort of thing before.
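
As a rough illustration of how a queuing system handles an analysis that stays on one node, here is a minimal Python sketch that writes a Grid Engine (SGE) submission script. The helper name `sge_job_script`, the parallel environment name `smp`, and the file names are assumptions for illustration only; the parallel environments available depend on how your own cluster is configured.

```python
def sge_job_script(name: str, slots: int, command: str, pe: str = "smp") -> str:
    """Return the text of a Grid Engine submission script.

    `pe` is the parallel environment name. "smp" is a placeholder and must
    match an environment that exists on the cluster.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#$ -N {name}",          # job name shown in qstat
        "#$ -cwd",                # run from the submission directory
        "#$ -j y",                # merge stderr into stdout
        f"#$ -pe {pe} {slots}",   # request `slots` slots in environment `pe`
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # NSLOTS is set by Grid Engine at run time to the number of granted slots.
    # ref.fa and sample1.fastq are hypothetical file names.
    script = sge_job_script(
        "bwa_sample1",
        6,
        "bwa mem -t $NSLOTS ref.fa sample1.fastq > sample1.sam",
    )
    print(script)
```

The printed text can be saved to a file and submitted with `qsub`. Whether all requested slots land on one machine depends on the allocation rule of the parallel environment, so that is worth checking before relying on it.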

    Originally posted by Kennels View Post
    3. Can the extra nodes have a different architecture (number of processors, motherboard, amount of RAM etc.) from the master node if we consider MPI?


    We've started to do some research, but if someone experienced could give some quick advice that would help us greatly!

    Thanks in advance!
    You can build heterogeneous clusters, but you may want to keep things simple by using identical nodes. You will want to get some kind of network attached storage, or you could build a NAS box yourself (search for hardware options; the software can be FreeNAS, http://www.freenas.org/). Again, this is a component that you would want to pay special attention to, since your data (and valuable analysis) are going to reside on this storage.

    Plan to have a data backup solution of some kind. If you are going to do this as a serious business then you need to be prepared for some sort of failure (hardware/software) from which you need to be able to recover your cluster and your data.

    Finally, before you go overboard, consider overall power requirements. A cluster in a small space can start putting out significant heat, so give some thought to cooling (if needed).
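
To put a number on the power and cooling point, here is a back-of-the-envelope Python sketch. The per-node wattage and the extra load for the switch and storage are made-up figures for illustration; measure or look up the real draw of your hardware. The only fixed fact used is the conversion of 1 W to about 3.412 BTU/h.

```python
WATTS_TO_BTU_PER_HOUR = 3.412  # 1 W of electrical load is about 3.412 BTU/h of heat


def cluster_heat_load(nodes: int, watts_per_node: float, extra_watts: float = 0.0):
    """Return (total_watts, btu_per_hour) for a small cluster.

    `extra_watts` covers shared equipment such as the switch and NAS.
    """
    total_watts = nodes * watts_per_node + extra_watts
    return total_watts, total_watts * WATTS_TO_BTU_PER_HOUR


if __name__ == "__main__":
    # Hypothetical figures: 4 towers at 250 W each plus 100 W of shared gear.
    watts, btu = cluster_heat_load(nodes=4, watts_per_node=250, extra_watts=100)
    print(f"{watts:.0f} W draw, roughly {btu:.0f} BTU/h of heat to remove")
```

Comparing the BTU/h figure with the rating of the room's air conditioning gives a quick sanity check before buying more nodes.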

    • #3
      Genomax gave you good advice. For storage, you might consider Gluster, which lets you aggregate storage space from a set of servers into a single filesystem. This might simplify your storage issues and be a cheaper solution.

      Also think about whether you can use fewer machines, each with 2 or 4 multicore processors. Aggregating your disk and memory into fewer machines gives you more resources when a job needs huge amounts of memory and can't be split across nodes.
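
The trade-off above can be sketched in a few lines of Python. The memory sizes and the 40 GB job are hypothetical numbers chosen only to show the effect: when an analysis must run on a single node, the job needs to fit in one machine's RAM, no matter how much memory the cluster has in total.

```python
from typing import List, Sequence


def nodes_that_fit(job_mem_gb: float, node_mem_gb: Sequence[float]) -> List[int]:
    """Return indices of nodes with enough RAM to run the job on a single node."""
    return [i for i, mem in enumerate(node_mem_gb) if mem >= job_mem_gb]


if __name__ == "__main__":
    many_small = [16, 16, 16, 16]  # four towers with 16 GB each (64 GB in total)
    one_large = [64]               # the same total RAM in a single machine
    job = 40                       # hypothetical assembly needing 40 GB

    print("small nodes that can run it:", nodes_that_fit(job, many_small))
    print("large nodes that can run it:", nodes_that_fit(job, one_large))
```

With the four small towers the list of usable nodes is empty, while the single larger machine can take the job.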

      • #4
        Thanks for the replies, they are a great help.

        • #5
          No additional hardware is needed. You could consider installing a Hadoop cluster - it simply involves unpacking some tarballs and setting up some config details. The good thing here is that there are already some bioinformatics frameworks (e.g. Crossbow) that can leverage an underlying Hadoop cluster.
