  • Feedback on workstation for bioinformatics

    Dear all,

    Following a discussion on a good workstation for bioinformatics work...

    We are working on bacterial, plasmid and viral genomes. The analyses we currently run at small scale, and plan to scale up (to hundreds of bacterial genomes), are mapping, de novo assembly, genome alignments, pan/core genome analyses, SNP-based comparisons, pairwise BLAST, etc.

    I approached our IT department with basic specs for a workstation, based on messages I found in this forum and in the forums of the programs we work with.

    Below is the configuration a vendor suggested to the IT team, and I have now been asked to check whether it looks good enough. Since my knowledge of hardware is poor, I would be glad to get some feedback here.

    I have already asked for the OS to be changed to Linux. I first thought of a dual-boot system, but the more I read and think about it, the more I think this should be mainly a Linux (Bio-Linux?) system, with a Windows VM if needed. I cannot think of any bioinformatics program that works on Windows but not on Linux. We are still mostly working within Windows (with a Linux VM), but I guess we should switch to Linux for good now.

    Thanks in advance!

    Base Unit HP Z840 Workstation
    Packaging HP Single Unit Packaging
    Chassis HP Z840 1125W (1450W/200V) 90% Eff Chass
    Operating System 10*Pro 64 downgrade to Win7 Pro 64 *
    Add-On Selection Operating System Load to PCIe
    Recovery Media Windows 7 Pro 64-bit OS DVD+DRDVD
    Processor Intel Xeon E5-2630v3 2.4 1866 8C 1stCPU
    Processor 2 Intel Xeon E5-2630v3 2.4 1866 8C 2ndCPU
    System Memory 128GB DDR4-2133 (16x8GB) 2CPU RegRAM
    Graphics Card NVIDIA NVS 310 1GB 1st GFX
    Internal Storage 01 HP Z Turbo Drive 256GB PCIe 1st SSD
    Internal Storage 01 4TB 7200 RPM SATA 1st HDD
    Internal Storage 02 4TB 7200 RPM SATA 2nd HDD
    Internal Storage 03 4TB 7200 RPM SATA 3rd HDD
    Internal Storage 04 4TB 7200 RPM SATA 4th HDD
    Internal Storage 05 4TB 7200 RPM SATA 5th HDD
    Optical Device 1 9.5mm Slim SuperMulti DVDRW 1st ODD
    Media Card Reader HP 15-In-1 Media Card Reader
    Warranty HP 3/3/3 Warranty
    Country Kit HP Z840 Country Kit
    Add-On Selection HP Dual Processor Air Cooling Kit

  • #2
    I use an HP Z640 for analysis of human NGS data. Though it was not designed optimally, it is well suited to our current needs.
    That being said, a Linux OS is a good choice; the flavor (Ubuntu, CentOS, Red Hat) depends on your comfort level and preference. We do run one Windows-only application, NextGENe, but we set it up in a VM rather than dual-booting, as dual-boot was rather difficult with Windows.
    The type of workstation you need really depends on your data and the applications used (are they memory-dependent or processor-intensive?). I too am getting ideas from others more experienced, but you're off to a good start.

    • #3
      My only word of caution regarding your Linux OS is to choose it carefully.

      I currently use Ubuntu in a VM and I've had issues getting certain programs to compile correctly (e.g. CASAVA, bcl2fastq from Illumina). Unfortunately, my IT department then built our Linux workstation with Ubuntu, so I'm having to revisit a lot of the same problems when I install software. On the plus side, they're usually problems for which I've already identified solutions, but it can get frustrating.

      I have a second VM that uses Red Hat and I haven't had any problems with it. Others may have more informed experience with that OS, though.

      • #4
        Is 128 GB the max RAM for this model? I almost wonder if you should drop the second CPU and get more RAM, if you are going to be doing a lot of de novo assemblies (I am assuming the configuration has been maxed out for your budget).

        • #5
          I like CentOS for the operating system.
          Definitely not Windows, under any condition.
          You should fire your IT team for proposing Windows.
          Red Hat Enterprise Linux is basically the same thing as CentOS.
          CentOS is the community version of Red Hat Enterprise Linux.

          The DVD drive and the media reader are not necessary, but I suppose the cost of having them is minimal relative to the cost of the system.

          I don't see the utility of the professional graphics card for next-generation sequencing, but then again if you have the budget it won't do any harm. The money spent on the graphics card could be spent on doubling the RAM.

          • #6
            "The money spent on the graphics card could be spent on doubling the RAM."
            I should really check on that.

            "Is 128 GB the max RAM for this model? I almost wonder if you should drop the second CPU and get more RAM"
            Eh, a colleague suggested that I ask for more processors... The max RAM seems to be 512 GB. But it is already considered a very unusual purchase in our institute, so I did not want to push the specs too far. If I understand correctly, there is room to upgrade it later if necessary.

            About the budget: I actually just gave the IT people a basic configuration, roughly 128 GB RAM and 16 cores, based on suggestions I've seen in the forum, without getting into many other details. This machine is what the vendor suggested.

            I thought about Bio-Linux as the OS...

            • #7
              I have 48 cores on my institute's server, and it is constantly overloaded.
              Luckily, I have access to thousands of cores on an external computing cluster.
              I do work nearly exclusively with eukaryotic NGS data, though.

              You can certainly use all 16 cores, once you discover the joys of parallel processing.

              It's just a question of how patient you are, and what turnaround you want. The more cores, the more samples you can process in parallel, and the faster you can process individual samples when parallelization is possible.
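The sample-level parallelism described above can be sketched with a worker pool, assuming each sample is processed independently of the others. `process_sample` and the sample names are hypothetical stand-ins; in a real pipeline the function would typically launch a mapper or assembler through `subprocess.run`, which is why threads are enough here (the heavy work happens in the child processes).

```python
from concurrent.futures import ThreadPoolExecutor


def process_sample(sample_id: str) -> str:
    """Hypothetical stand-in for one per-sample job (e.g. a mapping or assembly run).

    A real version would usually call an external tool with subprocess.run().
    """
    return f"{sample_id}: done"


def run_samples(sample_ids: list[str], max_parallel: int) -> dict[str, str]:
    """Run independent samples concurrently, at most max_parallel at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map preserves input order, so results line up with sample_ids
        results = pool.map(process_sample, sample_ids)
        return dict(zip(sample_ids, results))


if __name__ == "__main__":
    # Hypothetical batch on a 16-core workstation, one core per sample
    samples = [f"genome_{i:03d}" for i in range(1, 41)]
    print(run_samples(samples, max_parallel=16)["genome_001"])
```

The pool size is the knob: it trades how many samples run at once against how many threads each individual tool can be given.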

              • #8
                @sebl appears to belong to a lab (not a core facility?), and even though the prediction of hundreds of samples sounds interesting, it may be a while before the lab runs that many (reagent costs add up quickly if you are really going to run hundreds of samples, even bacterial ones). If there really are hundreds of samples, then using a central compute facility becomes economical and effective.

                @blancha: You can't be the only user on your local server if it has 48 cores and still stays busy. If you are the only user, then you must be analyzing hundreds of samples a week to keep all those cores busy.

                • #9
                  @GenoMax: Indeed.

                  Also, once we set up an analysis pipeline, it usually does not matter if it takes one more day to finish, as long as the computer is able to process it in the end.

                  I agree that for really large datasets we may need a bioinformatics core facility or similar. But we are not there yet.

                  • #10
                    @GenoMax, I currently have 16 human exosome RNA-Seq samples to reprocess. I'm taking 4 cores per sample for the TopHat runs.
                    4 * 16 = 64 cores
                    I've already exhausted my 48 cores. A TopHat run with one core would just take far too long.
                    And, yes, there is a proteomics web application running on the same server, so I have to be careful not to overload the server completely. I actually keep just 38 cores for my NGS pipelines, and leave the other 10 free for other uses.
                    I also have another project with 6 samples to reprocess that has been sitting for the past 2 days in the queue of the computing cluster I also use, either because the cluster is overloaded or because the scheduler is malfunctioning again.

                    It doesn't take hundreds of samples to use 48 cores.
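The core arithmetic in this post can be checked with a small helper. This is a rough sketch assuming every sample takes about the same time and holds its cores for the whole run; `concurrency_plan` is a hypothetical name, not part of any tool.

```python
import math


def concurrency_plan(n_samples: int, cores_per_sample: int, usable_cores: int) -> tuple[int, int, int]:
    """Return (cores to run all samples at once, samples that fit at once, waves needed)."""
    cores_needed = n_samples * cores_per_sample
    samples_at_once = usable_cores // cores_per_sample
    if samples_at_once == 0:
        raise ValueError("a single sample needs more cores than are usable")
    # Each "wave" is one batch of samples running side by side
    waves = math.ceil(n_samples / samples_at_once)
    return cores_needed, samples_at_once, waves


# Figures from the post: 16 samples x 4 cores each, 38 of 48 cores kept for NGS
print(concurrency_plan(16, 4, 38))  # (64, 9, 2)
# The same batch on the proposed 16-core workstation
print(concurrency_plan(16, 4, 16))  # (64, 4, 4)
```

Under these assumptions the 38-core budget finishes the batch in two rounds, while a 16-core machine would need four.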
                    Granted, I should probably switch from TopHat to a faster aligner, but it's the only program in my pipeline that I have always been able to count on for giving reliable results. The researchers also still insist on using Cuffdiff, despite my best efforts to convince them to switch to featureCounts and DESeq2.

                    None of this is really relevant to @sebl, since he has already said that turnaround is not an issue. But one can never really have too many cores. For most bioinformatics programs, the speedup is often roughly proportional to the number of cores available.
                    Last edited by blancha; 12-08-2015, 01:00 PM.
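How close a single run gets to that proportional speedup depends on how much of the program actually runs in parallel. Amdahl's law, a standard model and not something measured in this thread, gives the ideal speedup on n cores when a fraction p of the single-core runtime parallelizes perfectly: S(n) = 1 / ((1 - p) + p / n). The p values below are illustrative assumptions, not measurements of TopHat or any other tool.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's law: ideal speedup on n_cores when parallel_fraction of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)


# Illustrative (assumed) parallel fractions for a single multithreaded run
for p in (0.99, 0.90):
    print(p, [round(amdahl_speedup(p, n), 1) for n in (1, 4, 16, 48)])
```

With p = 0.90, 48 cores give less than a 9x speedup on one run. Processing separate samples side by side behaves more like p close to 1, at least until disk I/O or memory becomes the bottleneck.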

                    • #11
                      @blancha: Sounds to me like your processes are I/O bound (not surprising) or memory limited. How much RAM is available per core? As you said our discussion is not relevant to @sebl's question though.

                      • #12
                        Our local server, at our institute, has 580 GB of shared memory.
                        So, RAM is generally not an issue.

                        On the Compute Canada cluster, each core requested comes with 2.7 GB RAM, which is generally sufficient.
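A back-of-the-envelope comparison of memory per core, assuming memory is split evenly across cores. This is only an average: on a shared-memory machine a single job, such as a de novo assembly, can use far more than its per-core share, which a per-core allocation on a cluster does not allow. `ram_per_core` is a hypothetical helper.

```python
def ram_per_core(total_ram_gb: float, n_cores: int) -> float:
    """Average RAM per core if memory were shared evenly."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return total_ram_gb / n_cores


# Local server from this post: 580 GB shared across 48 cores
print(round(ram_per_core(580, 48), 1))  # 12.1
# Proposed HP Z840 from the first post: 128 GB across 2 x 8 cores
print(ram_per_core(128, 16))  # 8.0
```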

                        Yes, there is a lot of I/O.

                        I should probably switch to a more efficient pipeline.
                        I should use STAR or Brian's BBMap, but TopHat has just been my workhorse for years.
                        I can't wean the researchers off Cuffdiff, mainly because they always want the isoform data, which they end up discarding anyway.

                        Even without TopHat or Cuffdiff, some steps monopolize a processor. For example, I had to run bedtools genomecov on dozens of samples last week. I took 42 processors at the same time, which then paralyzed the proteomics web interface running on the same server. I had to reset the queue settings to use only 38 cores.
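Capping how many jobs run at once, so that other services on the server keep some cores, can be sketched as a fixed-size worker pool. The dummy job is a hypothetical stand-in for one single-threaded command per sample, like the per-sample bedtools genomecov runs described above; a real setup would launch the tool via `subprocess.run` or rely on the queue settings, as in the post. The helper names are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def job_budget(total_cores: int, reserved_cores: int) -> int:
    """Cores left for pipeline jobs after reserving some for other services (at least 1)."""
    return max(1, total_cores - reserved_cores)


def run_capped(n_jobs: int, max_workers: int) -> int:
    """Run n_jobs dummy single-threaded jobs and return the peak number running at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def dummy_job(_: int) -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for the real per-sample command
        with lock:
            running -= 1

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(dummy_job, range(n_jobs)))  # list() surfaces any job exceptions
    return peak


budget = job_budget(total_cores=48, reserved_cores=10)  # 38, as in the post
print(budget, run_capped(n_jobs=42, max_workers=budget) <= budget)
```

However many jobs are submitted, the pool size bounds how many run simultaneously, so 42 queued samples never take more than the 38 reserved cores.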

                        Anyway, I'm sorry to have hijacked @sebl's thread, but there can just never be too many cores, either to process multiple samples together, or to process one sample in parallel threads.

                        • #13
                          No problem. You keep the thread active, so I may get more replies from people.

                          What about Bio-Linux as OS? Any cons that I should be aware of?

                          Thanks again.

                          • #14
                            Originally posted by sebl
                            What about Bio-Linux as OS? Any cons that I should be aware of?
                            Stick with a standard OS (CentOS, Ubuntu, etc.) and install apps as necessary to keep things flexible. Leave the systems administration to someone whose job description reflects that.

                            • #15
                              Originally posted by blancha
                              but there can just never be too many cores, either to process multiple samples together, or process one sample in parallel threads.
                              I am not 100% convinced about that, but I am more patient and do have access to significant resources.

                              It sounds like you have a quad-socket server, which would be at the unaffordable end for @sebl. I have generally found BBMap best for my needs, and since I work mostly with a cluster, there is no point in assigning more cores to a job than a physical server has, since the scheduler (and in turn the admins) doesn't like it.
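The rule of thumb above, never asking for more threads than one machine has, amounts to a clamp. This sketch assumes a multithreaded (shared-memory) tool that runs inside a single node, not an MPI program that can span nodes; `threads_for_job` is a hypothetical helper, and the node size is passed in explicitly rather than detected.

```python
def threads_for_job(requested: int, cores_per_node: int) -> int:
    """Clamp a job's thread count to the cores of one node (and to at least 1).

    Threads beyond the node's cores cannot run in parallel, and, per the post,
    schedulers and admins discourage such requests.
    """
    return max(1, min(requested, cores_per_node))


# A 64-thread request on a 16-core workstation such as the proposed Z840
print(threads_for_job(64, 16))  # 16
```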
