  • Planning computing budget for Exome-seq data analysis

    Our lab is planning a project that aims to analyze up to 100 human biopsy exome-sequencing datasets over the next couple of years. I hope I can get some feedback here.

    Since the data will likely come with higher coverage, we are thinking of upgrading our current setup. We are also interested in parallelizing the primary analytical pipeline to leave more time for downstream analysis (variant filtering, statistical testing in R); see the sketch at the end of this post.

    1. Is it sensible to look for a desktop workstation with 2x6 cores, 96GB RAM, and 2x1.5TB 7200RPM drives that will serve us well for at least the next few years?

    2. Is ~USD 8,000 enough if we decide to purchase it in mid-2012?

    Thanks!
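    To make the sample-level parallelisation concrete, here is a minimal sketch, assuming one worker per core and a hypothetical `exome_pipeline.sh` wrapper that runs the primary per-sample steps (alignment, sorting, calling); the script name and sample naming scheme are placeholders, not an actual pipeline:

    ```python
    #!/usr/bin/env python
    # Minimal sketch: run the primary per-sample pipeline in parallel,
    # one sample per core, via a hypothetical wrapper script.
    import multiprocessing
    import subprocess

    def process_sample(sample):
        # exome_pipeline.sh is a placeholder for the real per-sample steps
        # (alignment -> sorted BAM -> variant calls).
        subprocess.check_call(["./exome_pipeline.sh", sample])
        return sample

    if __name__ == "__main__":
        samples = [f"sample{i:03d}" for i in range(1, 101)]  # up to 100 exomes
        with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
            for done in pool.imap_unordered(process_sample, samples):
                print(f"finished {done}")
    ```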

  • #2
    Not so long ago I put together a system with 240GB RAM and 32 cores (4x8) for just over $9,000: a 2x1TB RAID for the OS drive and 3x3TB for data, plus two backups.
    It's based on the TYAN S8812 (http://www.tyan.com/product_SKU_spec...&SKU=600000186), and I love it.

    • #3
      That will be fine, though I'd vote to drop to 48GB of RAM, add more storage in a RAID configuration, and increase the number of CPUs so you can multi-task. Most NGS tools do not leverage huge amounts of RAM but are very I/O- and compute-intensive.

      • #4
        I agree with Jon: for that number of cores, 48GB of RAM should be sufficient (it is for us). We have three 4x4-core machines with 48GB RAM each and can comfortably push 48 exomes a week through our pipeline (we don't tend to run more than one sample per core).
        Last edited by Bukowski; 11-22-2011, 03:46 AM.
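        For what it's worth, the arithmetic behind that throughput is easy to check. A quick sketch; only the machine and core counts come from the post above, and the per-sample turnaround is my assumption for illustration:

        ```python
        # Back-of-the-envelope check of the quoted throughput.
        machines = 3
        cores_per_machine = 4 * 4             # 4 sockets x 4 cores each
        slots = machines * cores_per_machine  # one sample per core -> 48 slots
        days_per_sample = 7                   # assumed single-core turnaround
        exomes_per_week = slots * 7 / days_per_sample
        print(f"{slots} slots -> ~{exomes_per_week:.0f} exomes/week")  # ~48
        ```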

        • #5
          I'll disagree, a bit, with Jon, who said:
          "Most NGS tools do not leverage huge amounts of RAM but are very I/O- and compute-intensive."
          I'll agree with the former (I/O-intensive) but disagree with the latter (CPU-intensive). While many tools do offer parallel multi-CPU capability, some do not, and even those that are parallel will often have portions of their code/pipeline that become single-CPU. Note that Bukowski runs "one sample per core" or, if I am reading his comment correctly, single-CPU programs (albeit on multiple samples at a time).

          Personally, I much prefer high-memory machines over high-core machines. One way of looking at this: while a program will merely take extra time to complete when it runs into CPU limits, once a program runs into a memory limit it will never complete, and I don't want to be in the latter situation. On the other hand, I do a lot of de novo work, and those programs tend to be memory-intensive, so go with what the human exome people say.

          I do think that your disk space (2x1.5TB 7200RPM) is rather wimpy. Let's assume no RAID: you get, at best, 3,000GB, or about 30GB per sample for 100 samples. That seems small, especially since that 3TB is not really 3TB after filesystem overhead, and it shrinks further if you go with a fast RAID setup. On the other hand, you can always easily buy more disks.
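          Putting rough numbers on that (the ~7% formatting/filesystem overhead is an illustrative assumption):

          ```python
          # Rough usable-space estimate for 2x1.5TB drives with no RAID.
          raw_gb = 2 * 1500                    # two 1.5TB drives
          overhead = 0.07                      # assumed filesystem/format overhead
          usable_gb = raw_gb * (1 - overhead)  # ~2790GB
          samples = 100
          print(f"~{usable_gb:.0f}GB usable, ~{usable_gb / samples:.0f}GB per sample")
          # RAID1 mirroring would halve this again, to ~14GB per sample.
          ```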

          Getting back to the second part of your original question: as per 'dsenalik', your USD 8,000 budget should do just fine. Maybe just plan on spending that and see what you can purchase in mid-2012.

          • #6
            Hard drive prices have gone up recently because flooding in Thailand halted production and caused shortages. If you don't need all that storage space right away, consider adding more later as you need it, although some reports say prices will stay elevated for six months to a year. You may still find some drives that haven't gone up yet; if you can, get them now.

            E.g., http://news.cnet.com/8301-13924_3-57...?tag=mncol;txt, or just google it.

            • #7
              Originally posted by westerman:
              "I'll agree with the former (I/O-intensive) but disagree with the latter (CPU-intensive). While many tools do offer parallel multi-CPU capability, some do not, and even those that are parallel will often have portions of their code/pipeline that become single-CPU. Note that Bukowski runs 'one sample per core' or, if I am reading his comment correctly, single-CPU programs (albeit on multiple samples at a time)."
              I will clarify a little! The work is I/O-intensive, and we've had the best success optimising our pipeline by increasing I/O performance. We parallelise where possible: if the system is not saturated with one sample per core, processes are threaded to fill core capacity, either by taking advantage of a tool's built-in threading or by splitting jobs more naively across more cores (see the sketch below). I gave the sample/core example to give an idea of the turnaround we can achieve with this setup.
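              As an illustration of that naive splitting, here is a minimal sketch that fans a single-threaded tool out across chromosomes, one process per core; the `varcall.sh` wrapper, file names, and region list are hypothetical stand-ins for whatever region-aware tool you actually run:

              ```python
              # Naive within-sample parallelism: run a single-threaded, region-aware
              # tool once per chromosome, then merge the per-region outputs afterwards.
              import multiprocessing
              import subprocess

              REGIONS = [f"chr{c}" for c in list(range(1, 23)) + ["X", "Y"]]

              def call_region(region):
                  # Hypothetical wrapper: the tool reads sample.bam for one region only.
                  subprocess.check_call(
                      ["./varcall.sh", "sample.bam", region, f"sample.{region}.vcf"])
                  return region

              if __name__ == "__main__":
                  with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
                      for done in pool.imap_unordered(call_region, REGIONS):
                          print(f"done: {done}")
              ```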

              I would not recommend our setup for assembly, either; having done some, all I have ever wanted in that situation is 'moar RAM'.
