  • Hardware requirements - analysis only - current recommendations?

    Hi everyone!

    I am trying to figure out what would be a good system for our NGS analysis needs - we are pretty new to working with the data, but it has become quite clear that we need a dedicated system for analysis.

    We do not do sequencing ourselves, only analysis (mapping, resequencing, RNA-seq, variant calling) of the data provided. We work exclusively with eukaryotic genomes. We want to use open-source tools to begin with, but may want to explore commercial software at some stage, so the hardware should support this and be upgradeable.

    Outsourcing the analysis is often a problem - either due to confidentiality issues or lack of capacity in core facilities, or simply because we come up with some interesting analyses for an old data set at a later date - which is why we like to do the analysis ourselves.

    For now, we are definitely talking small scale - a few runs per year for various purposes, parallelisation not required, so I am thinking a separate dedicated work station.

    Just what would be a reasonable setup?

    How many cores? RAM? Internal memory/HD? Long-term data storage will be on a network drive, so there is less pressure on an internal HD, space-wise.

    If I could choose, I'd prefer OSX as an operating system to a Linux distribution.

    Any thoughts, community?

    Cheers,
    TabeaK
    Last edited by TabeaK; 11-06-2012, 05:42 AM.

  • #2
    If you are willing to purchase a real "workstation" (i.e. not a high-end gaming system) then pretty much anything that fits your budget in that category would likely be OK. You should max out the RAM (choosing slightly slower CPUs, if you need to make a trade-off).

    If you are set on using OSX then your options are limited to the Mac Pros (probably not a good idea to go the "hackintosh" route for a serious workstation). Though most NGS software packages install and work on OS X, be ready to spend additional time (compared to a Linux distro) making some of the packages work.

    • #3
      Yeah...I had kind of forgotten the hardware limits when it comes to OSX...

      Well, I can live with a Linux station if I have to, I just like the convenience of OSX.

      The "pretty much anything that fits in your budget" is stumping me a little bit. Budget wise, we should be reasonably flexible, but what I do not wanna do is completely overspend and have something completely out of dimension.

      I'll have to give our IT guys at least a wish list of processor/cores, RAM and internal memory - and specifying that is giving me a bit of a hard time right now.

      • #4
        Originally posted by TabeaK View Post
        Yeah...I had kind of forgotten the hardware limits when it comes to OSX...

        Well, I can live with a Linux station if I have to, I just like the convenience of OSX.
        If you have the money, you could certainly get a Mac Pro. As of now they do go up to 2 hexa-core Xeons (12 total cores/24 threads), 64GB RAM and 8 TB disk (you are looking at $10K USD at that point and that may translate to same amount in Euro the way prices in Europe seem to work).

        Originally posted by TabeaK View Post
        The "pretty much anything that fits in your budget" is stumping me a little bit. Budget wise, we should be reasonably flexible, but what I do not wanna do is completely overspend and have something completely out of dimension.

        I'll have to give our IT guys at least a wish list of processor/cores, RAM and internal memory - and specifying that is giving me a bit of a hard time right now.
        Let me put in a disclaimer that my answer below is only a general guideline. I am sure others would dispute the recommendations below:

        Since you said that you are planning to analyze only a few runs per year ...

        You can start with a single-CPU machine (a Core i7 or Xeon would have 4 cores) with 32 GB RAM (get more if you are planning de novo assemblies on largish genomes; alternatively, you could possibly get away with 16 GB). Plan to use network storage for long-term data, but for the actual analysis you should try to provision enough local disk space (4 TB or more, RAID 5 or better if possible), with your OS/programs on a solid-state drive (if possible). You are going to be bandwidth-limited by the Ethernet connection (1 Gb/s?) if you plan to use network storage exclusively for data processing.
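To put a rough number on the Ethernet point, here is a minimal back-of-the-envelope sketch. The 100 GB data set size and the 80% link efficiency are assumptions picked purely for illustration, not measurements, and the function name is made up for this example.

```python
def transfer_time_seconds(size_gb: float, link_gbit_per_s: float,
                          efficiency: float = 0.8) -> float:
    """Estimate how long it takes to move size_gb gigabytes over a network link.

    size_gb          data volume in gigabytes (assumed figure, decimal GB)
    link_gbit_per_s  nominal link speed in gigabits per second
    efficiency       assumed fraction of the nominal speed actually achieved
    """
    size_gbit = size_gb * 8  # 1 byte = 8 bits
    return size_gbit / (link_gbit_per_s * efficiency)


if __name__ == "__main__":
    # Compare a 1 Gb/s and a 10 Gb/s link for a hypothetical 100 GB run.
    for link in (1.0, 10.0):
        minutes = transfer_time_seconds(100, link) / 60
        print(f"100 GB over {link:g} Gb/s: roughly {minutes:.1f} min")
```

Every pass over the raw data from network storage pays this cost again, which is the argument for copying the run to local disk once before processing.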

        Depending what vendor your IT prefers you should be able to configure a workstation to get a rough idea online.
        Last edited by GenoMax; 11-06-2012, 10:23 AM.

        • #5
          Thanx GenoMax, you are an absolute star!

          This is extremely helpful. And while I'd love to get my hands on a fancy Mac Pro, the second option is probably more realistic considering the anticipated levels of usage. And while money is not that tight, I still have to justify the expense...

          What I take from this is that RAM is WAY more important than processing power - at least in a single workstation, right? I doubt we'll be doing de novo assembly in the near future - that is probably something we'd outsource and then work on the contigs we get back or something...

          Anyway, I'll use your suggestion as a basis to work from, thanks a million again!

          • #6
            You need to balance RAM and processor power. For sequence analysis, additional CPU cores are generally more important than a few hundred more MHz.

            Rough specs for a recommendation would be:
            2x Xeon E5-2665, 8 cores, 2.4 GHz (or any other 8-core CPU with a higher clock speed)
            64 GB of RAM minimum, more if at all possible (128 GB would be good).
            256 GB or bigger SSD (for /tmp or local scratch disk, or 2x in RAID if you've the money)
            4x 3 TB HDD in RAID (local storage unless you have a really fast NAS)
            10 Gb network card if your network and NAS support it, otherwise 2+ x 1 Gb Ethernet
            A current 64-bit Linux version

            If you're interested in some of the CUDA- or OpenCL-accelerated algorithms for sequence analysis then an NVIDIA graphics card with 3 GB+ of RAM would be worth considering. Though if it's a choice between a graphics card and more system RAM, I'd get the RAM.

            With a machine like the above you're looking at being able to run programs with 16 threads, or up to 16 different chromosomes simultaneously through something like samtools or other low-memory pipelines, assuming you've got enough bandwidth to your NAS or use a cache on the local disk to keep the CPUs fed. With 64 GB of RAM, though, you're only looking at 4 GB per core. If possible it would be better to get double that (128 GB total): with 8 GB per core you should be able to run either 16 copies or 16 threads of many of the common sequencing apps without having to worry about running out of memory.
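The RAM-per-core arithmetic and the "one job per chromosome" idea can be sketched roughly as follows. This is only an illustration: the helper names, the BAM file name and the chromosome names are hypothetical. The samtools call assumes samtools is on the PATH and the BAM file is coordinate-sorted and indexed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List


def ram_per_core_gb(total_ram_gb: float, cores: int) -> float:
    """RAM available per core if every core runs one job."""
    return total_ram_gb / cores


def max_parallel_jobs(total_ram_gb: float, cores: int, ram_per_job_gb: float) -> int:
    """Number of jobs that fit at once, limited by cores and by RAM."""
    return min(cores, int(total_ram_gb // ram_per_job_gb))


def per_chromosome_commands(bam_path: str, chromosomes: List[str]) -> List[List[str]]:
    """Build one 'samtools view' command per chromosome (hypothetical output names)."""
    return [
        ["samtools", "view", "-b", "-o", f"{chrom}.bam", bam_path, chrom]
        for chrom in chromosomes
    ]


def run_all(commands: List[List[str]], workers: int) -> List[int]:
    """Run the commands concurrently and return their exit codes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


if __name__ == "__main__":
    print(ram_per_core_gb(64, 16), "GB per core with 64 GB")
    print(ram_per_core_gb(128, 16), "GB per core with 128 GB")
    # Assumed example: each job needs about 8 GB.
    print(max_parallel_jobs(64, 16, 8), "jobs fit in 64 GB")
    print(max_parallel_jobs(128, 16, 8), "jobs fit in 128 GB")
    # Commands are only printed here; pass them to run_all() to execute them.
    for cmd in per_chromosome_commands("sample.bam", ["chr1", "chr2"]):
        print(" ".join(cmd))
```

Threads are enough here because each worker only waits on an external process. The worker count passed to `run_all` would normally be the value returned by `max_parallel_jobs`.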

            128GB RAM would also be enough for you to try some smaller assembly jobs, or with some of the more modern low memory apps some fairly big organisms. The additional RAM gives you a lot more options on what you can do and how you can do it.

            If you've got a really decent NAS that isn't being used by a lot of other machines you could probably discard the local HDD and just run analyses off the NAS. If the NAS has a lot of users or can't sustain >300 MB/s reads then it may be worth getting the HDD and copying your data onto it before you start processing it.
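One rough way to check whether a mounted share gets anywhere near that figure is to time a sequential read of a large file, as in the sketch below. The 300 MB/s target comes from the paragraph above. The function names and the 8 MB block size are assumptions for illustration. Note that a file read recently may be served from the operating system's cache and look much faster than the storage really is, so test with a large file that has not been touched lately.

```python
import os
import tempfile
import time


def read_throughput_mb_per_s(path: str, block_size: int = 8 * 1024 * 1024) -> float:
    """Time a sequential read of the whole file and return MB/s (decimal MB)."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as handle:
        while True:
            chunk = handle.read(block_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return total_bytes / 1e6 / elapsed


def meets_target(mb_per_s: float, target_mb_per_s: float = 300.0) -> bool:
    """True if the measured rate reaches the target read rate."""
    return mb_per_s >= target_mb_per_s


if __name__ == "__main__":
    # Demo on a small temporary file; point the function at a large file
    # on the NAS mount for a meaningful number.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(16 * 1024 * 1024))
    try:
        rate = read_throughput_mb_per_s(tmp.name)
        print(f"{rate:.0f} MB/s, meets 300 MB/s target: {meets_target(rate)}")
    finally:
        os.remove(tmp.name)
```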

            The SSD can either be used like the HDD for storing your working files (the bandwidth you'll get out of it should keep those cores running optimally if your datasets aren't too large), or it can be very useful for dumping intermediate stages and tmp files from your pipeline.
            Last edited by aeonsim; 11-06-2012, 01:08 PM.

            • #7
              Excellent aeonsim! Thanks.

              Gonna compile a wish list now and visit our IT department...

              • #8
                Originally posted by TabeaK View Post
                Yeah...I had kind of forgotten the hardware limits when it comes to OSX...

                Well, I can live with a Linux station if I have to, I just like the convenience of OSX.
                Personally, I'd recommend getting a Linux server-class machine and then connecting to that from your OSX machine. All of our analysis is done using remote console sessions on processing servers, and you can use a relatively low-powered desktop/laptop as the front end. You can even run graphical programs from the remote machine using X11.app if you need to. This also allows multiple people to work on the processing system at once.

                • #9
                  I like Simon's suggestion. My work desktop is some little Dell desktop that can be had for well under $1000 USD, while the machines I run almost all my analyses on (a pair of 8-core Dell servers and a small Scyld Beowulf cluster) are downstairs in a server closet. Remote desktop, terminal or X11 sessions are then used to access those machines.

                  That way you could still use a much less expensive OS X machine for all your interface work, final tertiary analysis and figure preparation (get an i7 Mac mini for your personal machine, for example). And, as mentioned, anyone else who needs to use the analytical box(es) can readily do so.
                  Michael Black, Ph.D.
                  ScitoVation LLC. RTP, N.C.

                  • #10
                    If it's doable, the approach described by Simon & mbblack is definitely worth considering, especially since it can allow you to plug the server into a rack next to the NAS, minimizing any network issues.

                    SSH, MacFuse, and Mac X11 allow you to treat any server on a local network pretty much like your desktop.

                    • #11
                      Excellent suggestion! We are quite a big organisation, so it is possible some server architecture already exists that may be sufficient for my needs...hmmm...gonna investigate, thanks!

                      • #12
                        Oh, and you guys rock!

                        • #13
                          If you work at a university, you may already have a cluster you could gain access to. If bioinformatics is new to your university, it's probably mostly used by engineers.

                          • #14
                            Good comment! I am not at a university, but we still have a rather large campus here, where someone may have a suitable setup already...even if it is primarily used for a different purpose.

                            • #15
                              And if you want to buy a Mac Pro, I highly suggest not buying it new from Apple. That computer is now effectively 3 years old and horribly overpriced.

                              However, a reasonable value is the 12-core 2.66 GHz model on Apple's refurbished site.

                              Alternatively, to use OSX, a 15" MBP (or iMac) plus a Linux server with the basic configuration aeonsim mentioned would work great. I highly recommend taking a look at Silicon Mechanics for a Linux box.
