
  • Buying multicore pc for RNASeq de novo assembly

    Hello everyone,

    I would like your opinion on the hardware specs of a multicore PC that my lab wants to buy for analysis of RNASeq data.
    The analysis will include conversion from .bcl to FASTQ, and then to .bed files, for human samples. Our data was produced on an Illumina HiSeq 1500, and the size of the files for this run ranges from 100GB to 800GB.

    I have searched other topics and articles, and my conclusion is to go for:
    16 cores with a high clock speed (MHz),
    512GB of RAM,
    6TB of HDD storage.

    Linux as the operating system, specifically CentOS.

    Thanks!

  • #2
    Go for it. For the listed applications this should be fully adequate.

    The size of the files seems a little odd. Are you saving images from your runs (that is the likely explanation)? There is really no need to do that any longer. You will save a ton of space by not doing that (unless a HiSeq 1500 can't do real-time image analysis; I am not familiar with the 1500).



    • #3
      HiSeq will produce raw FASTQ files in the range of 100-800GB (total for one lane / run). The image files are in the order of 1-5TB.

      On the computer selection aspect, you'll get a better price/performance ratio by choosing a computer with lots of processing cores and ignoring raw clock speed. Two 4-core processors will be cheaper than one 8-core processor, and a 3.2GHz processor will be cheaper than a 4.7GHz processor.
      Last edited by gringer; 11-26-2013, 09:19 AM.



      • #4
        Since mebes referred to converting bcl files to fastq it is perhaps safe to infer that mebes is referring to raw data folder sizes.

        We mostly do rapid runs on multiple HiSeq 2500s, but on a HiSeq 2000, 2 x 100 bp runs generate no more than 600G of data (no images, before conversion to fastq) per flowcell.

        Since the 1500 runs a single flowcell, 800 G may be possible for a 2 x 150 bp run. Once mebes responds we will know.
        Last edited by GenoMax; 11-26-2013, 09:45 AM.



        • #5
          Thank you for answering, and yes, I was referring to raw data folder sizes.

          Any other suggestions or warnings from anyone?

          Thanks in advance!



          • #6
            The 6TB of disk space will do for two or three full HiSeq runs; after that you will be out of space. Given the price of 512 GB of memory, your system is already expensive. I would add more disk space: at least 2 TB per run. Also, where and how are you planning to back up your data? A good backup can mitigate HDD concerns.
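
The sizing rule in this post can be checked with a few lines of arithmetic. This is only a rough sketch: the function names are made up for illustration, and the 2 TB per run default is the budget suggested above, not a measured figure.

```python
def runs_that_fit(disk_tb: float, tb_per_run: float = 2.0) -> int:
    """Number of full runs that fit on a disk when budgeting tb_per_run each."""
    if tb_per_run <= 0:
        raise ValueError("tb_per_run must be positive")
    return int(disk_tb // tb_per_run)


def disk_needed_tb(n_runs: int, tb_per_run: float = 2.0) -> float:
    """Disk space (in TB) to budget for n_runs runs kept on local storage."""
    if n_runs < 0:
        raise ValueError("n_runs must be non-negative")
    return n_runs * tb_per_run


if __name__ == "__main__":
    # The proposed 6 TB disk, at the suggested 2 TB per run.
    print(runs_that_fit(6.0))
    # Space to budget if ten runs should stay on disk at once.
    print(disk_needed_tb(10))
```

At 2 TB per run, the proposed 6 TB holds three runs, which matches the "two or three" estimate; anything kept longer than that needs either more disk or a plan for moving data off the machine.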



            • #7
              If you want a personal option for storage, you could have a go at the backblaze 3.0 pod:

              Get all of the latest cloud storage news and insights from Backblaze - the leading independent cloud storage provider.


              You can order the 4U pod from 45Drives, to which you then add your own SATA hard drives:



              If you want the most reliable system, the hot-swappable drives can be set up as 3 banks of RAID6 (dual-parity) combined into either a single logical volume, or three separate volumes. With 45 4TB drives, that will give you 144TB of storage space:
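
A quick way to sanity-check capacity figures like this one, as a sketch only: RAID6 gives up two drives' worth of space per bank for parity. The function name and the `spares_per_bank` parameter are hypothetical, and the post does not say how the 144TB figure was derived, so the hot-spare layout below is an assumption.

```python
def raid6_usable_tb(total_drives: int, drive_tb: float, banks: int,
                    spares_per_bank: int = 0) -> float:
    """Usable capacity of equal-sized RAID6 banks, before filesystem overhead.

    Each bank loses two drives' worth of space to dual parity, plus any
    drives set aside as hot spares (an assumed layout, not from the post).
    """
    if banks <= 0 or total_drives % banks != 0:
        raise ValueError("drives must divide evenly across banks")
    per_bank = total_drives // banks
    data_drives = per_bank - 2 - spares_per_bank
    if data_drives < 1:
        raise ValueError("not enough drives per bank for RAID6")
    return banks * data_drives * drive_tb


if __name__ == "__main__":
    # 45 x 4TB drives in 3 banks of 15, dual parity only.
    print(raid6_usable_tb(45, 4, 3))
    # Same layout, with one hot spare held back per bank (assumption).
    print(raid6_usable_tb(45, 4, 3, spares_per_bank=1))
```

With dual parity alone, 3 banks of 15 drives come to 156TB. The 144TB quoted above is what you get if one more drive per bank is held back, for example as a hot spare. Filesystem overhead would reduce either number further.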



              Disclaimer: I have not yet convinced any of my clients to install one of these systems at their workplace; I just really like the look of the system.
              Last edited by gringer; 11-27-2013, 10:44 AM.



              • #8
                Sobriety time. Here are all the reasons why you SHOULD NOT roll your own backblaze storage pod.


                Unless one is working with irreplaceable samples, it probably does not make sense for individual labs to store data long term (for core facilities it is a business decision based on the SLA). You can submit a copy to SRA/EBI and have them store it long term.

                Mebes: I have assumed so far that you are an individual lab looking to purchase this hardware. If you are a core then you should never put all your eggs in one basket. You would want to have identical systems as backup if you expect to process tens of flowcells a month.
                Last edited by GenoMax; 11-28-2013, 10:12 AM.



                • #9
                  Most of their "why not" arguments seem to be that the pods are too slow and require actual people to manage them:

                  This is cheap storage, not fast storage and certainly not highly-available storage. It carries a far higher operational and administrative burden than storage arrays traditionally sold into the enterprise.
                  It's a bit one-sided to post that URL without the very next one in the series by the same people:

                  After the last blog post explaining all of the sensible reasons for why you should never build a backblaze pod it's time now to talk about why we did decide to build one.


                  We are using the backblaze pod plus NAS appliance software from www.openfiler.com to build a “last resort” storage pool for scientific data that is not valuable enough to spend lots of money on a more traditional storage solution yet large enough in terabyte terms to represent a significant time-risk should an event occur that would require all this data be re-downloaded again via the internet.

                  We see this $12,000 appliance as a simple hedge against interrupting ongoing research activities. Totally worth it.
                  It's a good idea to consider whether the ongoing cost of high-performance storage is worth it. I prefer the idea of treating bioinformatics computers as another piece of laboratory equipment. You shouldn't expect them to be working every second of the day, and you should be prepared for failure (e.g. repeating experiments if there's a failure before you can submit your read data to SRA).
                  Last edited by gringer; 11-28-2013, 12:31 PM.



                  • #10
                    Originally posted by gringer
                    It's a bit one-sided to post that URL without the very next one in the series by the same people:

                    After the last blog post explaining all of the sensible reasons for why you should never build a backblaze pod it's time now to talk about why we did decide to build one.
                    I knew you would post the other

                    As the blog post said, the title was a "tongue in cheek attempt" to get one's attention.

                    I would not recommend building something like this unless you have access to a proper server room infrastructure.

                    Originally posted by gringer
                    It's a good idea to consider whether the ongoing cost of high-performance storage is worth it. I prefer the idea of bioinformatics computers as another piece of laboratory equipment. You shouldn't expect them to be working every second of the day, and you should be prepared for failure (e.g. repeating experiments if there's a failure before you can submit your read data to SRA).
                    We process hundreds of flowcells each year, so it is totally worth it to have access to high-performance storage (it is also used for analysis by tens of users simultaneously).

                    If you only have a couple of machines (and don't run them round the clock) then you are absolutely right.



                    • #11
                      We will use external HDDs for now because we have one machine (Illumina HiSeq 1500), and we are actually building our bioinformatics department now, so any advice from more experienced scientists in the field is always welcome.

                      How did you go about purchasing your workstations?
                      What companies should we come in contact with?
                      Are you happy with yours?

                      Thanks in advance!



                      • #12
                        Originally posted by mebes
                        We will use external HDDs for now because we have one machine (Illumina HiSeq 1500) and we are actually building our bioinformatics department now so any advice from more experienced scientists in the field is always welcome.
                        I approve of the external HDD option, because it forces the researchers to acknowledge the reality of data storage, data access, and data failure. Biologists are used to storing their biological samples, so why not hard drives as well? When you need to access your sample data you take it out of storage, put it into the analysis machine (e.g. using a hotplug SATA dock), do some processing and result generation, then take it out and store it again when finished.
