  • computation power requirement for sequencing analysis

    Hi,

    Our facility may get some funding from our institute to purchase new hardware for sequencing data analysis.

    What are the minimum requirements for reasonably efficient analysis of sequencing data?

    Thanks,
    Slny

  • #2
    If most of what you're doing is aligning reads back to a reference genome, a highly-threaded sequence aligner will be quickest with a bunch of cores. If you're doing genome assembly, most of the algorithms don't have much room for parallelization and you'd probably improve performance with a high clock speed.

    These days, a decent workstation with 4-8 cores and 1-2 GB RAM per core can do almost everything. If you need more than that, you probably have a specific purpose in mind and specific hardware requirements.

    • #3
      Hi,

      Polyatail has good points there. What I want to add is that some applications (e.g., alignment) are CPU-intensive, so faster and more numerous cores help a lot; other applications (e.g., de novo assembly) are memory-intensive: if you do not have enough memory, the program will not run. To assemble a mammalian genome, you may need over 100 GB of memory.

      Douglas

      • #4
        Memory is a key point. You can always wait twice as long for a job to complete because you have one-half of the processors you need. But if you have half of the memory you need then your job will never complete. If you have to make a choice then go for the larger memory.


        Disk speed (and, if you are using a cluster, interconnect speed) is also important. Bioinformatics uses large data sets, so skimping on I/O speed will slow your programs down. Two-tier storage is a good idea: fast but small storage for the actual computation, and slow but large storage for long-term archiving.


        At Purdue we share computing resources with many other disciplines. I am always telling the pure computational people that their needs do not match mine. We just had a planning meeting for our next shared compute cluster. Most people seem to be holding out for the 48-core, 96 GB per node machines, but I would prefer the 24-core, 192 GB per node machines at about the same price. Those extra cores are not going to do me as much good as the extra memory.
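As a rough illustration of the cores-versus-memory trade-off described above, the sketch below counts how many jobs each of the two node configurations could run at once. The node sizes come from the post; the two job profiles (`read alignment`, `de novo assembly`) and their core and memory figures are hypothetical assumptions chosen only to show the arithmetic, not measurements of any real tool.

```python
def max_concurrent_jobs(cores, ram_gb, job_cores, job_ram_gb):
    """Number of jobs a node can run at once, limited by cores or by memory."""
    by_cores = cores // job_cores
    by_memory = int(ram_gb // job_ram_gb)
    return min(by_cores, by_memory)


# Node configurations quoted in the post above: (cores, RAM in GB).
NODES = {
    "48 cores / 96 GB": (48, 96),
    "24 cores / 192 GB": (24, 192),
}

# Hypothetical job profiles (assumptions, not benchmarks): (cores, RAM in GB).
JOBS = {
    "read alignment (4 cores, 8 GB)": (4, 8),
    "de novo assembly (8 cores, 120 GB)": (8, 120),
}


def compare():
    """Return {(node, job): concurrent job count} for every combination."""
    results = {}
    for node_name, (cores, ram_gb) in NODES.items():
        for job_name, (job_cores, job_ram_gb) in JOBS.items():
            results[(node_name, job_name)] = max_concurrent_jobs(
                cores, ram_gb, job_cores, job_ram_gb
            )
    return results


if __name__ == "__main__":
    for (node, job), count in compare().items():
        print(f"{node:18} {job:36} -> {count} concurrent job(s)")
```

Under these assumed profiles, the 48-core node runs more alignment jobs at once but cannot fit the assembly job at all, while the 24-core node runs fewer alignment jobs but can still complete the assembly.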

        • #5
          At Purdue we share computing resources with many other disciplines. I am always telling the pure computational people that their needs do not match mine. We just had a planning meeting for our next shared compute cluster. Most people seem to be holding out for the 48-core, 96 GB per node machines, but I would prefer the 24-core, 192 GB per node machines at about the same price. Those extra cores are not going to do me as much good as the extra memory.
          Couldn't agree more. Especially if your storage is lacking and those 48 cores are pushing out a lot of data (e.g. sequence alignment writing BAM files), I/O will be a problem. With 192 GB, you'd even have the option of using a ramfs for scratch. Personally though, I'm holding out for the 1024-core, 16 TB per node machines. All signs seem to suggest that I'll be holding out for a while.
          Last edited by polyatail; 06-03-2011, 07:55 AM. Reason: you're -> your (don't laugh, it happens)
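A minimal sketch of the RAM-backed scratch idea, assuming a Linux node where `/dev/shm` is a memory-backed tmpfs mount (tmpfs rather than ramfs, which is an assumption about the system and not something stated in the thread). The helper name `pick_scratch_dir` is hypothetical. It falls back to the ordinary temp directory when the RAM-backed location is missing, not writable, or too small.

```python
import os
import shutil
import tempfile


def pick_scratch_dir(needed_bytes, ram_dir="/dev/shm"):
    """Return ram_dir if it is usable and has enough free space,
    otherwise fall back to the system's default temp directory."""
    if os.path.isdir(ram_dir) and os.access(ram_dir, os.W_OK):
        # disk_usage reports total, used and free bytes for the filesystem.
        if shutil.disk_usage(ram_dir).free >= needed_bytes:
            return ram_dir
    return tempfile.gettempdir()


if __name__ == "__main__":
    # Ask for 10 GiB of scratch; the size is an arbitrary example value.
    base = pick_scratch_dir(10 * 1024**3)
    with tempfile.TemporaryDirectory(dir=base) as scratch:
        print("writing intermediate files under", scratch)
```

Anything written to a memory-backed directory counts against the node's RAM, so the space request should leave room for the analysis itself.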

          • #6
            It depends on what you mean by analyzing sequence data. Are you starting with raw reads, or are you only performing tertiary analysis, with a core facility doing the initial heavy lifting for you? In other words, which part of the pipeline from raw read files to finished data are you looking at? Also, how much data are you handling at any one time (do you need to run multiple analyses simultaneously)?

            For working with raw read output, we went with one of the Penguin clusters pre-configured for use with our ABI SOLiD system (but Penguin makes clusters for any sort of use or configuration). The "base" machine from Penguin for ABI data is a 5-node Scyld Beowulf cluster (head node + 4 compute nodes). Each node has a pair of 4-core Xeons and 24 GB RAM. The whole cluster shares storage space on a ~26 TB RAID 5 array (i.e. for data storage, scratch and temp files). Thus far, it's proving to be a decent little pre-packaged cluster.

            For end-point analyses like differential sequence determination, I also have an R/Bioconductor and Partek machine with dual 4-core AMD CPUs and 32 GB RAM (your basic off-the-shelf Dell small server).

            Don't neglect file storage needs: whatever you get will need a decent amount of disk space, both to keep data files and for temp and working files.
            Michael Black, Ph.D.
            ScitoVation LLC. RTP, N.C.

            • #7
              I would go with a small/medium sized cluster and then go for large jobs in the cloud instead.

              From my point of view it is way too expensive to size the hardware for peak requirements. At least in my department, we cannot afford to have clusters where we only use 1-5% of the capacity on a regular basis.

              rgds
              Mads

              • #8
                Originally posted by MadsAlbertsen
                I would go with a small/medium sized cluster and then go for large jobs in the cloud instead.

                From my point of view it is way too expensive to size the hardware for peak requirements. At least in my department, we cannot afford to have clusters where we only use 1-5% of the capacity on a regular basis.

                rgds
                Mads
                Cloud is an option to consider. Just be sure of your available bandwidth for what you plan to move back and forth. I know for the setup at my Institute, we realized that we (currently at least) simply do not have a "fat enough" pipe to the outside world.
                Michael Black, Ph.D.
                ScitoVation LLC. RTP, N.C.
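To put a number on the bandwidth concern above, here is a back-of-the-envelope sketch for estimating how long an upload to a cloud provider would take. The function name `transfer_hours`, the 80% link efficiency default, and the example data sizes and link speeds are all illustrative assumptions and do not come from the thread.

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.8):
    """Estimated hours to move data_gb (decimal gigabytes) over a link
    rated at link_mbps (megabits per second).

    efficiency is the assumed fraction of the rated speed actually
    achieved after protocol overhead and contention.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    bits = data_gb * 8e9                      # gigabytes -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical scenarios: (data size in GB, link speed in Mbit/s).
    for data_gb, link_mbps in [(100, 100), (1000, 100), (1000, 1000)]:
        hours = transfer_hours(data_gb, link_mbps)
        print(f"{data_gb:5} GB over {link_mbps:4} Mbit/s: ~{hours:.1f} h")
```

If the estimate for a typical run is measured in days rather than hours, the link rather than the compute is likely to be the limiting factor for a cloud-based workflow.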
