  • Denovo assembly system resources

    Hi,

    Hope someone can help me out with an IT/Systems question.

    I currently process FASTQ files using Trinity for assembly, and this takes roughly 4 hours per sample. I have noticed that throughout this time CPU use is almost 100%, whilst RAM usage maxes out at around 70%.

    I am using a standalone workstation with 2 six-core processors and 96 GB RAM. I currently have access to 5 of these, and they are all used independently. This is the system I inherited from my predecessor, so I am open to change should it increase throughput.

    My question is....

    Would creating a small Beowulf-style cluster from four of the workstations increase the available system resources and perhaps speed up my assembly and processing time?

    I am not overly familiar with the IT infrastructure side of this, so any advice would be appreciated.

    Thanks in advance.

  • #2
    I wouldn't have thought so. You require all the reads to assemble the genome, so splitting this across a cluster without a shared/distributed memory model doesn't fit the assembly paradigm, which is why most people use a big box with lots of RAM.

    See:



    • #3
      Hi Bukowski,

      Thanks for your reply.

      If we were to cluster the machines and apply a shared/distributed memory model would I likely see an increase in processing speeds due to higher memory/available cores?

      Sorry if this is a naive question but I need to find a way of increasing throughput if at all possible. Appreciate the advice.



      • #4
        It sounds like your best bet is to keep doing things in an embarrassingly parallel manner, which is what you're currently doing by running samples independently on separate machines. I may have misinterpreted your original request, but the short answer is no.

        If you build a cluster, you get a job scheduler. The best thing about that is that you stop having to worry about manually managing the jobs - when one finishes on a machine, the scheduler just starts the next one in the queue. That's the benefit to you of building a cluster from your machines.
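As a rough illustration of what the scheduler buys you, here is a minimal Python sketch of a queue that hands each waiting job to whichever machine frees up first. It is not a real scheduler, only a simulation: `simulate_queue` is a hypothetical helper, the batch of 12 samples is an assumed example, and the 4 hours per sample and 5 machines are the figures from the first post.

```python
import heapq


def simulate_queue(job_hours, n_machines):
    """Simulate a simple job queue over identical machines.

    Each job, in queue order, starts on the machine that becomes free
    earliest. Returns (schedule, makespan), where schedule is a list of
    (job_id, machine_id, start_hour, finish_hour) tuples.
    """
    # Min-heap of (time the machine becomes free, machine id).
    machines = [(0.0, m) for m in range(n_machines)]
    heapq.heapify(machines)

    schedule = []
    for job_id, hours in enumerate(job_hours):
        free_at, machine = heapq.heappop(machines)
        finish = free_at + hours
        schedule.append((job_id, machine, free_at, finish))
        heapq.heappush(machines, (finish, machine))

    makespan = max((finish for _, _, _, finish in schedule), default=0.0)
    return schedule, makespan


# Assumed example batch: 12 samples at about 4 hours each on 5 workstations.
schedule, makespan = simulate_queue([4.0] * 12, n_machines=5)
for job_id, machine, start, finish in schedule:
    print(f"sample {job_id:2d} -> machine {machine}  {start:4.1f}h to {finish:4.1f}h")
print(f"whole batch finishes after {makespan} hours")
```

Each sample still takes its 4 hours; the queue only removes the idle gaps between one run finishing and someone starting the next by hand.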

        I also didn't spot you were using Trinity, so I'm going to assume that you're doing transcriptome assemblies - Trinity is already using the resources in the machine efficiently, so the run time you see is just the run time. Provided it's not maxing out the memory, it matters not a jot if your CPU utilisation is high - all you care about in terms of performance is that it's not swapping out to disk.
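One way to check the swapping point, assuming the workstations run Linux (the thread does not say), is to compare the `SwapTotal` and `SwapFree` fields of `/proc/meminfo` before and during a run. The sketch below is only that: the helper names are hypothetical and the `SAMPLE` text is made-up illustration data, not output from the machines in question.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields):
    """Swap currently in use, in kB (0 if the swap fields are missing)."""
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


def read_swap_used_kb(path="/proc/meminfo"):
    """Read the live value on a Linux machine."""
    with open(path) as handle:
        return swap_used_kb(parse_meminfo(handle.read()))


# Made-up sample in the /proc/meminfo layout, for illustration only.
SAMPLE = """MemTotal:       98304000 kB
MemAvailable:   29491200 kB
SwapTotal:       8388608 kB
SwapFree:        8388608 kB
"""

print(swap_used_kb(parse_meminfo(SAMPLE)), "kB of swap in use")
```

Swap use that stays flat during an assembly suggests memory is not the limit; swap use that keeps growing while Trinity runs is a hint that it is.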

        Your process is CPU bound, not memory bound. The only benefit you would gain from a cluster with a shared memory architecture is more available RAM, and that doesn't solve your apparent issue, which isn't to do with RAM.

        https://github.com/trinityrnaseq/tri...g-Requirements suggests you need 256GB of RAM in a machine - but I don't know what organism you're working on or how many reads you have in a sample.

        You might want to look at end of run profiling:

        Trinity RNA-Seq de novo transcriptome assembly. Contribute to trinityrnaseq/trinityrnaseq development by creating an account on GitHub.


        This might give you more of an idea where the bottleneck is.



        • #5
          Perfect.

          Thanks for the comprehensive and helpful response. Stops me wasting any more time looking into this.

          Thanks,
          Sanderson.
