
  • VM specs for NGS bioinformatics

    I am in the process of building a VM for NGS data analysis of human DNA-seq data. The specs are below:

    Linux RedHat OS
    128GB RAM
    2 processors with 6 cores each
    500GB C drive RAID 10
    3 TB D drive

    Any suggestions?

    I am concerned about the cores and RAM, but it's a start. Thank you.

  • #2
    Hmm a VM with 128G of RAM?

    I assume you are referring to a real server config and not a VM.



    • #3
      It is specified as either a stand-alone server or a VM, but I believe the VM is the better option if it is feasible. What would you recommend? Thank you.



      • #4
        Are we referring to the same thing? To me VM = Virtual Machine.



        • #5
          Yes, you are correct: VM is Virtual Machine. Thank you.



          • #6
            It doesn't make much sense to me to run a virtual machine on your own server, among other reasons because of the overhead.

            Commercial service providers often offer virtual machines on their own servers at a lower cost than a dedicated server.
            I haven't seen a commercial company offering a virtual machine with those specifications, though.
            Even a dedicated server from a third party with those specifications might be difficult to find at a reasonable cost.

            As far as I understand from your post, you would want to buy your own server, and run programs directly on the host operating system, without using a virtual machine.



            • #7
              The main reasons for a Virtual Machine would be that it is more scalable, costs less to upgrade, and I think is easier for our hospital to implement and maintain. I have read different posts about hardware requirements and am just trying to work out what is best suited. Currently, I use an Ubuntu 14.04 machine with 64GB RAM, a Xeon E5-2630 8-core CPU, and a 1TB HDD, and that doesn't seem like it's enough, but maybe I'm wrong. Thank you.

              What do you mean by overhead?
              Last edited by cmccabe; 11-17-2015, 09:36 AM.



              • #8
                Basically, your programs will run slower on the virtual machine than if you run them directly on your host operating system.

                As far as the specifications are concerned, my only recommendation would be to go for a rack server instead of a tower server, if at all possible. Of course, that implies you have the room for a rack server, as well as the staff to manage the rack server, which is not possible in many settings. If you don't have the staff or the room for a rack server, another possibility is using a third-party computing cluster. In Canada, we're lucky to have free computing time made available to researchers through Compute Canada. Again, this may not be possible in all countries.

                I had a long discussion on this subject with a colleague who also wanted to build his own set-up. Here were my recommendations.

                1. Establish your needs first, before trying to determine the appropriate specifications for the server. For example, how many flowcells per month? For how long should the data be stored?

                2. Can you host the server with a third party? That eliminates the cost of keeping staff and reduces the operating complexity.

                3. If you build your own server, go for a rack server, if at all possible. This is the cheapest and most scalable option. However, it requires a room for the server, and staff. The cost of keeping competent staff may exceed the cost of operating the server.

                4. If you must go for a tower server, be aware that this option is appropriate mainly for an individual laboratory, and may not suffice to serve the needs of an entire institute.



                • #9
                  It's a bit off-subject, but Dell produced this interesting document on building a Linux cluster for next-generation sequencing analysis.

                  It's very technical, and it's about a cluster, not a tower server, but it's still a good read for anyone wanting to build their own platform.



                  • #10
                    I would add that 3.5 TB doesn't seem like very much.
                    Again, it depends on your needs.
                    How many flowcells per month? For how long will the data be stored on the machine?

                    It also depends on the configuration of your server.
                    Can storage be added at a later date, if needed?



                    • #11
                      @cmccabe: You can probably run a VM with the specs you originally listed but that would mean you would want server hardware underneath that would be several times more powerful (unless you plan to run only one VM, which would not make sense). Beyond a certain ceiling (in terms of RAM/sockets) the cost of such hardware escalates rapidly.

                      If your IT is serious about building this right, send them this way.



                      • #12
                        We currently run Ion Torrent sequencing on a Proton. The estimate of 3.6 TB was based on that data (180 samples per year, each sample is ~20GB). Our IT department, myself included, is new to this type of data, with only a couple of years of experience. There are plans to move to a NextSeq, so we have already inquired about increasing the storage to roughly 9-12 TB. I work in Chicago at a small 300-bed children's hospital, but NGS is ordered a lot, so I am trying to get a better idea. A VM seems like a good option, but it seems like it's all in the configuration. Thank you.
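
The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a rough illustration, not a sizing tool: the function names are made up for this example, 1 TB is taken as 1000 GB, and it counts raw sample data only (no intermediate files, analysis output, or backups, which would add to the total).

```python
def yearly_storage_tb(samples_per_year, gb_per_sample):
    """Raw data volume generated per year, in TB (assuming 1 TB = 1000 GB)."""
    return samples_per_year * gb_per_sample / 1000


def years_until_full(capacity_tb, samples_per_year, gb_per_sample):
    """Years before a volume of capacity_tb fills up at a constant sample rate."""
    return capacity_tb / yearly_storage_tb(samples_per_year, gb_per_sample)


if __name__ == "__main__":
    # Figures quoted in the post: 180 samples per year at ~20 GB each.
    per_year = yearly_storage_tb(180, 20)
    print(f"Raw data per year: {per_year:.1f} TB")

    # The 9-12 TB range mentioned in the post, at the same (Proton) rate.
    for capacity in (9, 12):
        years = years_until_full(capacity, 180, 20)
        print(f"{capacity} TB lasts about {years:.1f} years at this rate")
```

The per-sample size would need to be re-measured after a move to a different instrument, since the 20 GB figure comes from the Proton data.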



                        • #13
                          I would recommend 10-20TB of storage as a minimum; you'll fill it up in no time at all. You'll need to consider backup too (even external hard disks if you're on a tight budget!).



                          • #14
                            I am looking more into Linux clusters as they seem to be a better overall fit. If designed correctly, they will be a good fit for the lab today and allow for growth. Thank you.



                            • #15
                              Cluster administration is a non-trivial task so make sure you have someone willing to take that on.
