  • Hardware requirements for multi purpose NGS Data analyses

    Hi,

    Since I only found quite old answers to the broad question "What hardware is needed for NGS analyses?", I thought I'd give it another try.

    Our department is thinking about buying a server for the analysis of NGS data. The projects we work on are quite different: metagenomics/transcriptomics, de novo assembly of eukaryotic genomes (fungi/fish), RAD-Tag for population studies, phylogenomics, analysis of RNA-Seq data, and amplicon sequencing analysis (e.g. with QIIME).

    The optimal solution for us would be to buy a server that can handle multiple simultaneous tasks, with lots of space (RAM and HDD) and cores.

    If money is not an issue, what do you think would be a good configuration for that?

    thanks for your thoughts and ideas,

    Philipp

  • #2
    If money is not an issue, then you do not need to ask this question.

    See some recent threads about hardware:

    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


    • #3
      Well, you are right. I just wanted to make sure that people say what they think about their optimal configuration without having to worry about money. In fact, we have no budget yet for our server...
      Thank you for pointing me to some more recent threads about the topic!
      Last edited by sinnafoch; 10-15-2014, 07:21 AM.


      • #4
        Are you looking for a true server (i.e. one that would reside in a server room and not under a desk)? How many people would be using this machine (simultaneously)? What kind/type of job loads are you expecting?


        • #5
          A small cluster with at least one high memory node is probably what would be most efficient if you’re talking about a whole department with very different tasks.

          If it were me I’d maybe look at something like 4 servers all with 2x2640v3 as the CPU choice. Two would have 128 GB of RAM, one 512 GB of RAM and one 1 TB of RAM. Plus, I’d attach a data storage device that would hold all the data in a large RAID. I would avoid going for a single huge server, like some 4x4800 machine with 2 TB of RAM. In my experience things don’t work out well when too many people are using the same machine. Some of those genome assembly programs (looking at you, AllPaths-LG) will use as much memory as they can get their hands on, so if a couple of smallish jobs start up, things could go south for everyone.

          I would set up torque to manage the job scheduling since most people are used to that, and you could do things like enforce that only one or maybe two jobs could run on your high memory machines at a time (i.e., one 16-core job or two 8-core jobs allowed on the machine at once). Then let the small machines soak up the single CPU jobs or things like bowtie runs that don’t need much RAM. You may also want a cheap head node just for logging on, submitting qsub jobs, doing script editing and file transfers. Something like 2x2603v3 with 32 GB of RAM would be fine.
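
          To make that placement policy concrete, here is a minimal Python sketch of the idea. It is not how torque itself decides things, just a toy first-fit model. The names (Node, Job, place, build_cluster, the node labels) are made up for illustration, and 16 cores per node is an assumption based on two 8-core CPUs per server. High memory nodes are capped at two jobs, and small jobs land on the small-RAM nodes first.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cores: int
    ram_gb: int


@dataclass
class Node:
    name: str
    cores: int
    ram_gb: int
    max_jobs: int  # hypothetical per-node cap on concurrent jobs
    jobs: list = field(default_factory=list)

    def free_cores(self):
        return self.cores - sum(j.cores for j in self.jobs)

    def free_ram(self):
        return self.ram_gb - sum(j.ram_gb for j in self.jobs)

    def can_run(self, job):
        # A job fits only if the job cap, cores and RAM all allow it.
        return (len(self.jobs) < self.max_jobs
                and job.cores <= self.free_cores()
                and job.ram_gb <= self.free_ram())


def build_cluster():
    # Assumption: 16 cores per node (2 sockets x 8 cores).
    return [
        Node("small1", 16, 128, max_jobs=16),
        Node("small2", 16, 128, max_jobs=16),
        Node("himem512", 16, 512, max_jobs=2),   # at most two jobs at once
        Node("himem1024", 16, 1024, max_jobs=2),
    ]


def place(job, nodes):
    """Put the job on the smallest-RAM node that can run it.

    Returns the node name, or None if the job has to wait in the queue.
    """
    for node in sorted(nodes, key=lambda n: n.ram_gb):
        if node.can_run(job):
            node.jobs.append(job)
            return node.name
    return None


if __name__ == "__main__":
    cluster = build_cluster()
    for job in [Job("bowtie", 4, 16), Job("assembly", 16, 400)]:
        print(job.name, "->", place(job, cluster))
```

          Trying the smallest-RAM node first keeps the big-memory machines free for the assemblers that actually need them. In a real setup the scheduler's own node and queue configuration would enforce this.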

          I’ve recommended these guys a few times on this site, siliconmechanics.com. My lab worked with them before and I know others who have also been happy. My current workstation is from thinkmate, and they were good too, but I don’t know anyone that has bought their rack mountable servers.


          • #6
            Originally posted by GenoMax:
            Are you looking for a true server (i.e. one that would reside in a server room and not under a desk)? How many people would be using this machine (simultaneously)? What kind/type of job loads are you expecting?
            Yeah, it would be nice to clarify the question of how many people are expected to simultaneously use this thing. When you said it was for a whole department, I assumed we might be talking about a dozen or so people wanting things running 24/7, but if that’s really only going to be 3-4, my recommendations would change. Though if it’s only 3-4, I wonder why the department would do this rather than the labs themselves.


            • #7
              Thank you for the suggestions. I will try to clarify a little bit what we have in mind:
              I think around 10 people would work on the server(s). Of course not everybody will submit jobs all the time, but it might happen sometimes. As I said, the research topics are quite different and so is the software used. What I have learned from my own experience and other threads in the forum is that not all software is written in the most memory-efficient way or able to multithread properly. Please correct me if I am wrong.
              Therefore it sounds reasonable to me to have a system that is able to deal with many different tasks (in terms of what they need computationally; RAM, CPUs,...).
              In terms of job loads I can't really tell yet...


              • #8
                When planning this sort of thing, it is often useful to get your local area representatives for hardware companies in to go over options with you. For example, our local Dell representative is always willing to bring in their experts on storage systems, virtualization systems, and general hardware. They'll sit down with us, go over the sorts of computing needs we have, and then offer their suggestions and various pricing options for their hardware. Lenovo and others will readily offer the same consultations.

                Nowadays, for example, there is often no need for mission-dedicated server boxes when a single high-end system with virtualization can easily handle multiple setups for you.

                As already mentioned though, much of what is done with NGS and other genomic data can be best handled in a parallel compute environment, so clusters can be very valuable assets.

                You might also want to contact some companies about using cloud-based resources and simply leasing what you need. Numerous companies offer cloud-based virtual servers and/or clusters, and will include complete system backup and redundancy, so you don't have to deal with those issues on your own hardware. I frequently hear from colleagues nowadays who have stopped investing in their own hardware and have moved their labs to completely virtual compute systems. Cloud-based systems are great for collaborative projects, as all you need to do is share access to the project, not move files back and forth with people.

                One downside may be where the data can be allowed to live - not everyone wants, or is even allowed, to upload their data or keep their projects entirely in virtual storage.

                If you use Illumina data, you can check their BaseSpace cloud computing offering to see how that sort of virtual compute environment could work for you - https://basespace.illumina.com/home/sequence.
                Last edited by mbblack; 10-15-2014, 08:42 AM.
                Michael Black, Ph.D.
                ScitoVation LLC. RTP, N.C.


                • #9
                  @sinnafoch: If it is not apparent by now let me just say that there is no correct/right answer to the question you had originally posed. You have guidelines to go by in the discussion above but much of your decision process will depend on local conditions that are hard for us outsiders to understand.

                  The IT expertise you have at hand (administering big servers/a cluster is not a simple task, would take valuable time away from an experimental biologist, and could lead to a security risk), preferences for hardware vendors at your institution, and the tolerances you need to adhere to in regulatory/data integrity terms would all have to be factored into your decision. It is easy to overlook data backup strategies, so I do want to point out that you need to include those in your list of considerations.
