
  • Workstation for Bioinformatic Analysis

    Hello Everyone,

    My mentors and my colleagues are trying to build a workstation that will serve as an analysis and data storage machine for our bioinformatics research. Our research currently deals with RNA-Seq data sets that we hope to analyze on this workstation. The eukaryotic systems that we are working with are fish, flies, and mice.

    We will trim and quality-check the data on this workstation with tools like Trimmomatic and FastQC. Assembly will then be done on cloud solutions like iPlant, Galaxy, or Galaxy on Amazon AWS. We may also do some analysis on the workstation using edgeR, DESeq, and other RNA-Seq analysis tools.

    We have talked about installing our own Galaxy version on the workstation as well.

    We hope to run Matlab and R on either Linux (possibly virtualized) or Windows with emulated Unix via Cygwin. We will also try logging into the machine remotely through VNC.

    Our budget is $5K to $6K.

    We are thinking of going with the following:

    2x Intel Xeon with 8 cores each, or anything newer that you would recommend?
    One of our collaborators recommended just going with a 16-core i7.

    How much RAM should we go with, and which motherboard?

    I was thinking somewhere between 32 and 64 GB, with the option to upgrade to 128 GB of RAM in the future.

    What specs would you guys and gals recommend for the workstation? Thank you in advance.

    I am trying to use this as our base: http://ark.intel.com/products/series...ct-Family#@All

  • #2
    Judging by Intel's official prices, 15-core chips are out of your price range. If you get dual E5-2660, E5-2665, or E5-2670, you will get more cores at higher clock speeds than a single 15-core chip or even 12-core chip, at a lower price - under $3000.

    These 8-core chips are Sandy Bridge; they're great for bioinformatics. The 15-core chips are Ivy Bridge which is newer but doesn't perform significantly better in integer applications, which includes most bioinformatics programs.

    Dual ECC 4x8GB memory kits (64GB total; for dual Sandy Bridge server chips, you need a minimum of 8 modules to get maximal performance) should be around $800 or less. That should leave you, on most motherboards, at least 8 free slots so you could easily upgrade to 128GB. These should be 1600MHz or more (PC3 12800). For more information you can browse Newegg:
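To make the module-count reasoning concrete, here is a minimal Python sketch. It assumes 4 DDR3 channels per socket for this CPU family and a hypothetical dual-socket board with 16 DIMM slots; neither figure is stated in the thread, so check the actual board manual. The bandwidth figure is the theoretical peak, not a measured value.

```python
# Assumptions (not from the thread): 4 DDR3 channels per socket and a
# hypothetical dual-socket board with 16 DIMM slots.
SOCKETS = 2
CHANNELS_PER_SOCKET = 4
DIMM_SLOTS = 16
DIMM_SIZE_GB = 8
PC3_12800_MB_S = 12800  # theoretical peak of one DDR3-1600 channel


def memory_plan(dimms_installed):
    """Summarize capacity, free slots, and channel coverage.

    Assumes the DIMMs are spread evenly, one per channel first.
    """
    channels = SOCKETS * CHANNELS_PER_SOCKET
    populated = min(dimms_installed, channels)
    return {
        "capacity_gb": dimms_installed * DIMM_SIZE_GB,
        "free_slots": DIMM_SLOTS - dimms_installed,
        "channels_populated": populated,
        "peak_bandwidth_gb_s": populated * PC3_12800_MB_S / 1000,
    }


for count in (4, 8, 16):
    print(count, "DIMMs:", memory_plan(count))
```

With 4 modules only half the channels are populated, which is why 8 modules is the minimum for full memory bandwidth on a dual-socket system under these assumptions.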

    ...or call up crucial.com to talk to an actual person with knowledge of RAM.

    Then you need some hard drives, which are cheap, so 2 to 4 drives of size 2TB or 3TB (at $100 to $200 each) would be good in some kind of RAID configuration, unless you already have fast network-attached storage, in which case you could get by with only a single HDD.

    And you have to budget for a few more things, like a case (or rack), motherboard, and power supply. I don't know much about these in the server realm.

    FYI, our cluster nodes are mainly dual E5-2670 with 128GB RAM and either 2TB or 4TB of local disk, running Linux, and they are quite nice to work on. We do all pre- and post-processing, mapping, and assemblies on them. Large assemblies often need to run on nodes with more RAM, though.

    I'm not sure it's a good idea to plan on running Windows + Cygwin; I find Cygwin to be much more difficult to use than real Linux, and there is also probably some performance reduction. Most bioinformatics tools are aimed at Linux so I would suggest either Linux + Windows dual-boot (if you need Windows applications) or just Linux. Linux also has the advantage of being a lot cheaper than Windows. Personally, I'm OS-agnostic; my BBTools package (which does things like trimming, mapping, normalization, and other assembly-related bioinformatics functionality) is written in Java and works equally well in Linux or Windows. But ultimately you will need to run some things that require Linux, and are possibly difficult to install, or compile, or are buggy and unstable; running emulated Linux can only make these factors worse.
    Last edited by Brian Bushnell; 05-24-2014, 09:35 AM.



    • #3
      Just so you’re aware, Intel’s timeline has the Haswell E5s (2600 v3) being released in Q3 of this year, which probably means wide availability by Q4. So if you can wait, you’ll see a pretty significant boost in performance. For the lower-end 2600s, we’ll see 10 cores at the same or a slightly higher clock rate. Haswell-EP will also bring DDR4 memory, which will improve speed and capacity.

      However, what you’re doing doesn’t sound so computationally intensive, so if time is an issue, you should be fine with Ivy Bridge (2600s v2). For what you’re talking about I’d recommend:

      2 x E5-2640 v2 (16 cores at 2.0GHz base)
      64 GB of RAM (as an 8x8GB configuration)

      Your hard drive situation will depend on data storage needs. Personally, I’d go with a small RAID5 if you have some network-attached storage that provides the main location for data and you just pull stuff down to your server temporarily and put the final results back. If that’s not the case, you may want two different types of RAID, such as a RAID0 (or RAID5) for your working data and a RAID1 for archive.
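As a rough guide to the trade-offs between the RAID levels mentioned above, here is a minimal Python sketch of usable capacity and the number of drive failures each level tolerates. The function name `raid_summary` and the example drive counts are illustrative only; the formulas are the standard ones for arrays of identical drives and ignore filesystem and controller overhead.

```python
def raid_summary(level, drives, drive_tb):
    """Return (usable_tb, failures_tolerated) for identical drives.

    Covers only the levels discussed in the thread (0, 1, 5).
    """
    if level == 0:          # striping: all capacity, no redundancy
        if drives < 2:
            raise ValueError("RAID0 needs at least 2 drives")
        return drives * drive_tb, 0
    if level == 1:          # mirroring: every drive holds a full copy
        if drives < 2:
            raise ValueError("RAID1 needs at least 2 drives")
        return drive_tb, drives - 1
    if level == 5:          # striping with one drive's worth of parity
        if drives < 3:
            raise ValueError("RAID5 needs at least 3 drives")
        return (drives - 1) * drive_tb, 1
    raise ValueError(f"unsupported RAID level: {level}")


# Example layouts using 3 TB drives (hypothetical drive counts).
for level, drives in [(0, 2), (1, 2), (5, 4)]:
    usable, tolerated = raid_summary(level, drives, 3)
    print(f"RAID{level} x{drives}: {usable} TB usable, "
          f"survives {tolerated} drive failure(s)")
```

Note that RAID0 loses everything if any single drive fails, which is why it only makes sense for working data that can be regenerated or re-downloaded.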

      I think you’ll want Linux as the operating system, then just put Windows in a virtual machine if you must. If this computer’s primary job is to be the informatics workhorse for several labs, you’ll need it booted to Linux basically 24-7. Or you could just give up on Windows for this machine and get a cheap Windows laptop to use when you actually need Windows. If you want IT support, you could probably install RedHat, but you have to check with them and understand that it’s not cheap. Otherwise I’m very happy on Ubuntu, and I have a Mac laptop for when I want to use things like Word or Photoshop.



      • #4
        Hmmm, looks like the E5-2650 V2 is actually a much better deal than the Sandy Bridge (V1) processors - lower power and higher performance than the E5-2665 at a lower price. Or, as mentioned above, you could go with the E5-2640 v2, and save around $800 (on the pair). The E5-2650 V2 would be about 30% faster, though, while making the whole system maybe 15% more expensive, so it's not a bad deal.
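The "not a bad deal" claim can be checked with a quick performance-per-dollar calculation. This Python sketch uses only the rough estimates quoted above (about 30% faster, about 15% more expensive for the whole system); the function name is made up for illustration.

```python
def perf_per_dollar_gain(speedup, cost_increase):
    """Relative change in performance per dollar.

    speedup and cost_increase are fractions, e.g. 0.30 for 30%.
    """
    return (1 + speedup) / (1 + cost_increase) - 1


# Rough estimates from the post: ~30% faster, ~15% more expensive.
gain = perf_per_dollar_gain(0.30, 0.15)
print(f"Performance per dollar changes by {gain:.1%}")
```

Under those estimates the faster chips give roughly 13% more performance per dollar at the system level, so the upgrade pays for itself if the budget allows it.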



        • #5
          Thank you guys for the great suggestions. Based on http://processormatch.intel.com, I am trying to find a compatible motherboard on Newegg, as I can't seem to find one on Amazon that isn't out of stock. If you could kindly point us to where to find compatible motherboards, I would really appreciate it.

          Will something like this be possible on the motherboard:

          2x E5-2650 v2 $2600 total
          64 GB of 1600 DDR3 of RAM (8x8) with the option of another (8 x 8) for a total of 128 GB of RAM (if this is possible. I would go with the former) (1000-1200 dollars)
          USB 3.0
          Firewire (maybe)
          Motherboard(700 dollars?)

          Graphics Card PCI-E (200-500 dollars, if we do some 3D protein modeling)
          RAID 0/1/5 Hard Drives (2 TB) with other drives being a few TBs as well. (600 dollars)

          Then Case, Cooling, DVD+/-RW/keyboard/mice/2 wide screen monitors ($800)
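For reference, the estimates in this list can be added up against the stated $5K to $6K budget. This is a minimal Python sketch using only the figures given above; USB 3.0 and Firewire are left out because no price was given for them, and the labels are shortened for readability.

```python
# (low, high) estimates in dollars, taken from the list above.
parts = {
    "2x E5-2650 v2": (2600, 2600),
    "64 GB DDR3-1600 (8x8)": (1000, 1200),
    "Motherboard": (700, 700),
    "Graphics card": (200, 500),
    "Hard drives": (600, 600),
    "Case, cooling, DVD, peripherals, monitors": (800, 800),
}
BUDGET_LOW, BUDGET_HIGH = 5000, 6000

low = sum(lo for lo, _ in parts.values())
high = sum(hi for _, hi in parts.values())

print(f"Estimated total: ${low} to ${high}")
print(f"Headroom under ${BUDGET_HIGH}: ${BUDGET_HIGH - high} to ${BUDGET_HIGH - low}")
```

The low end comes to $5900 and the high end to $6400, so the build only fits the budget with the cheaper graphics card and RAM pricing.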

          I will definitely share the information regarding the newer Intel processors being released this fall/winter (Q3/Q4). However, we do need the computers by the end of June, both to back everything up and to continue our research.

          Yes, we will definitely dual boot Linux and Windows or just leave it with Linux (Ubuntu or RedHat). The reason for having Windows is that there is some other software that only runs on Windows, as far as we know.



          • #6
            I don't know if I'd spend a lot of money on the video card. You have to get something, because these CPUs lack integrated graphics, but unless you will be unable to upgrade the machine later (which is typical at institutions with a lot of bureaucracy), I would pass on an expensive one until you actually have something you need it for. Something like a GeForce 750 should be fine as long as it has 2 video outputs of the type you need (HDMI/DisplayPort). DVI output can also drive an HDMI monitor with a cheap (<$10) adapter. Anyway, a card of that level is perfectly 3D-capable. The more important issue is drivers, and I'm suggesting Nvidia because I have heard better things about their Linux drivers than ATI/AMD's, but I do not know much about either; I use Windows and just connect to Linux compute nodes with PuTTY - the actual nodes themselves don't have any graphics capability at all.

            Also, I can't offer you any advice on server-class motherboards. You might be better off asking in the forums of a tech site like TechReport where a lot of people build computers, and some of the members maintain servers and clusters for companies. Also, many people there run Linux with various video cards so they'll know more about the state of drivers.



            • #7
              [QUOTE=Zapages;141165]Thank you guys for the great suggestions. Based on http://processormatch.intel.com, I am trying to find a compatible motherboard on Newegg, as I can't seem to find one on Amazon that isn't out of stock. If you could kindly point us to where to find compatible motherboards, I would really appreciate it.

              Will something like this be possible on the motherboard:

              2x E5-2650 v2 $2600 total
              64 GB of 1600 DDR3 of RAM (8x8) with the option of another (8 x 8) for a total of 128 GB of RAM (if this is possible. I would go with the former) (1000-1200 dollars)
              USB 3.0
              Firewire (maybe)
              Motherboard(700 dollars?)

              Graphics Card PCI-E (200-500 dollars, if we do some 3D protein modeling)
              RAID 0/1/5 Hard Drives (2 TB) with other drives being a few TBs as well. (600 dollars)

              Then Case, Cooling, DVD+/-RW/keyboard/mice/2 wide screen monitors ($800)[/QUOTE]

              Absolutely, if you have the budget, get the 2650 v2s. I was just trying to keep it in roughly the same ballpark as the system I got myself for about $5000 without the hard drives, which was with the 2630 v1s (though I had more drives and more RAM).

              As for a mother board, you could do this: http://www.newegg.com/Product/Produc...-349-_-Product

              Personally, I’d stay with Supermicro boards. Building your own dual-processor workstation isn’t so common, so you’ll have limited choices. When I got my server I went through Thinkmate, and the price really wasn’t that different from a self-build, except they charge too much for hard drives and only offer enterprise drives (personally, consumer drives are good enough for me). For example, you could configure this http://www.thinkmate.com/system/hpx-xs8-2460 with 2x 2650s, 8x8GB of RAM, no OS, and a 1TB hard drive for about $5200, which isn’t much different than the total build cost you have above. In either case you buy the hard drives for the RAID(s) from Newegg or some other vendor. It might save you $500 to build it yourself, but you won’t have a single point of contact for any warranty issues, plus you’re liable for any damage during the build. I’ve built PCs, but it’s one thing to risk bending a pin on a $100 CPU and another to do the same on a $1200 CPU, twice!

              [QUOTE=Zapages;141165]Yes, we will definitely dual boot Linux and Windows or just leave it with Linux (Ubuntu or RedHat). The reason for having Windows is that there is some other software that only runs on Windows, as far as we know.[/QUOTE]
              Yes, there is some. I just want to caution against the dual-boot strategy, since that necessarily means only one OS or the other can be active at a time. With Linux as your main OS and Windows as a guest in a virtual machine, the Linux host can stay on all the time for the bulk of the work, and Windows can come and go without interrupting potentially long-running jobs in Linux.
              Last edited by Wallysb01; 05-24-2014, 02:32 PM.



              • #8
                I am going to offer a different opinion.

                If you are at an academic institution then consider checking contract pricing for a workstation from Dell/HP (or whoever) with your purchasing department. At times you will get much better pricing/inclusive 3 year warranties and a single point of contact, should you need warranty service.

                Some of the components may not even be available in the retail channels, which is another consideration when trying to buy retail. Buying individual components may save you some money, but you would be dealing with separate vendors if there is a need to get warranty service.

                Focus on getting the most RAM and a reasonable set of storage (RAID 5 or better). Getting slightly slower/older CPUs would add at most an hour or two to the compute time, which you may not notice on overnight runs. A good data backup solution is a must if you care about preserving your analyses until the data is published.



                • #9
                  If you are able to co-locate the hardware in a proper server room then I would recommend going with a real rack server. There is no need to keep a new shiny toy under the desk if it can be located in a temperature controlled/backup power supplied secure facility.

                  Use a regular PC/Mac to access the server remotely and do whatever work you need to do that way.



                  • #10
                    The newly released Ubuntu Orange Box might be nice for bioinfo stuff, though it's $12k, so not quite in your price range.

                    Inside the Orange Box, you'll find ten Intel micro-servers powered by Ivy Bridge i5-3427U CPUs. Each mini-server has four cores, Intel HD Graphics 4000, 16GBs of DDR3 RAM, a 128GB SSD root disk, and a Gigabit Ethernet port. The first computer also includes a Centrino Advanced-N 6235 Wi-Fi Adapter, and 2TB Western Digital hard drive. These are all connected in a cluster with a D-Link Gigabit switch. Put it all together and you get a 40-core, 160GB RAM, 1.2TB SSD cluster in a box.
                    But you could build something similar with fewer components yourself. The gigabit switch is the key to getting many cores while avoiding the premium pricing of Xeons.
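The cluster totals in the quoted description are simply the per-node specs multiplied by the node count, which also makes it easy to size a smaller self-built version. A minimal Python sketch, using the per-node figures from the quote (the function name and the 4-node example are hypothetical):

```python
def cluster_totals(nodes, cores_per_node, ram_gb_per_node, ssd_gb_per_node):
    """Aggregate resources of a cluster of identical nodes."""
    return {
        "cores": nodes * cores_per_node,
        "ram_gb": nodes * ram_gb_per_node,
        "ssd_gb": nodes * ssd_gb_per_node,
    }


# Per-node figures as quoted for the Orange Box: 4 cores, 16 GB RAM, 128 GB SSD.
orange_box = cluster_totals(10, 4, 16, 128)
print("Orange Box:", orange_box)

# A hypothetical smaller build with the same kind of node.
print("4-node build:", cluster_totals(4, 4, 16, 128))
```

Keep in mind that these are aggregate numbers: a single job can only use one node's 16 GB of RAM unless the software is written to run across nodes over the network.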



                    • #11
                      Seek advice for setting up a core for NGS analysis.

                      Dear friends,

                      I am going to set up a core for RNA-seq and ChIP-seq data analysis and plan to buy a computer or a workstation. Honestly, I am not good at computing, as I am a biologist. I would deeply appreciate any advice on what kind of computer I should buy. Regards, Peng Liu



                      • #12
                        Originally posted by pliu
                        Dear friends,

                        I am going to set up a core for RNA-seq and ChIP-seq data analysis and plan to buy a computer or a workstation. Honestly, I am not good at computing, as I am a biologist. I would deeply appreciate any advice on what kind of computer I should buy. Regards, Peng Liu
                        Have you checked internally to see if the central IT department (I assume there is one at your institution) would be willing to help with your needs? You would be much better off using a hosted server that someone is professionally managing, which will leave you time to get your core off the ground (which is going to be enough work already).



                        • #13
                          Thanks a lot and I will check it.



                          • #14
                            I second GenoMax; if you’re not particularly computationally inclined, managing your own workstation is maybe more than you should try to do.

                            Often what is possible is that you buy a computer (it would probably be of the rack-mountable server type) to be placed in a common computing room that IT or your university cluster uses, and you pay them some monthly/yearly fee to manage it.

                            If you’re not doing whole genome/transcriptome assembly, you should be fine with some sort of dual Intel Xeon E5 setup with 128 GB of RAM and some RAID system for data storage. Your IT department probably has some preferred vendors you could talk to them about, but I’ve been pretty happy with these two:

                            For over 25 years, Thinkmate has been a leader in building the highest quality custom servers and workstations for business, research and government.


                            I’d go with the E5-2600 v3 family: something like 2x E5-2630 v3, 128GB of RAM, and 4x3TB HDDs in a 1U or 2U rack-mount server would probably fit the bill at a reasonable price ($5-6K).

                            And if you go that route, I’d suggest buying a direct or network attached storage (DAS or NAS) system to hold your data locally in your lab as well as on the out of lab server (good 4 bay NAS/DAS can be up to $2K with the drives).

                            EDIT: I guess I was looking at some bad prices on 2600 v3s, so I’ve edited the above to go back to recommending those. This looks like a decent price (I’d just bump up to the 2630 and 128GB of RAM, then talk to them about the hard drive prices): http://www.siliconmechanics.com/i573...xeon-e5-1u.php
                            Last edited by Wallysb01; 10-09-2014, 09:12 AM.



                            • #15
                              In addition to the above-mentioned vendors, I can recommend these guys:



                              Not affiliated with the company, but I have known them for nearly ten years ;-)

