  • Denovo assembly system resources

    Hi,

    Hope someone can help me out with an IT/Systems question.

    I currently process FASTQ files using Trinity for assembly, and this takes roughly 4 hours per sample. I have noticed that throughout this time CPU use is almost 100%, whilst RAM usage maxes out at around 70%.

    I am using a standalone workstation with two six-core processors and 96 GB of RAM. I currently have access to five of these, and they are all used independently. This is the system I inherited from my predecessor, so I am open to change should it increase throughput.

    My question is....

    Would creating a small Beowulf-style cluster from four of the workstations increase the available system resources and perhaps speed up my assembly and processing time?

    I am not overly familiar with the IT infrastructure side of this, so any advice would be appreciated.

    Thanks in advance.

  • #2
    I wouldn't have thought so. You require all the reads to assemble the genome, so splitting this across a cluster without a shared/distributed memory model doesn't fit the assembly paradigm, which is why most people use a big box with lots of RAM.

    See:



    • #3
      Hi Bukowski,

      Thanks for your reply.

      If we were to cluster the machines and apply a shared/distributed memory model, would I be likely to see an increase in processing speed from the larger pool of memory and available cores?

      Sorry if this is a naive question but I need to find a way of increasing throughput if at all possible. Appreciate the advice.



      • #4
        It sounds like your best bet is just doing things in an embarrassingly parallel manner, which is what you're currently doing. I may have misinterpreted your original request, but the short answer is no.

        If you build a cluster, you get a job scheduler, and the best thing about that is that you no longer have to manage the jobs manually: when one finishes on a machine, the scheduler just starts the next one in the queue. That's the benefit to you of building a cluster from your machines.
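
        To make the scheduler point concrete, here is a minimal sketch of a first-come, first-served queue, where each job goes to whichever machine frees up first. The function name `schedule` and the workload (12 samples at about 4 hours each on 5 workstations) are assumptions for illustration only; this is not how any particular scheduler is implemented.

```python
import heapq


def schedule(job_hours, n_machines):
    """Greedy FIFO queue: each job goes to whichever machine frees up first."""
    # Min-heap of (time at which the machine becomes free, machine index).
    free_at = [(0.0, m) for m in range(n_machines)]
    heapq.heapify(free_at)

    assignments = []
    for job, hours in enumerate(job_hours):
        start, machine = heapq.heappop(free_at)  # earliest available machine
        end = start + hours
        assignments.append((job, machine, start, end))
        heapq.heappush(free_at, (end, machine))

    # Makespan: the time at which the last job finishes.
    makespan = max((end for _, _, _, end in assignments), default=0.0)
    return assignments, makespan


# Assumed workload: 12 samples at ~4 hours each on 5 independent workstations.
assignments, makespan = schedule([4.0] * 12, 5)
print(f"all samples done after {makespan} hours")
```

        Note that each individual job still takes its 4 hours. The queue only keeps every machine busy without manual intervention; it does not make a single assembly any faster.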

        I also didn't spot that you were using Trinity, so I'm going to assume that you're doing transcriptome assemblies. Trinity is already using the resources in the machine efficiently, so the run time you see is just the run time. Provided it's not maxing out the memory, it matters not a jot if your CPU utilisation is high. All you care about in terms of performance is that it's not swapping out to disk.

        Your process is CPU-bound, not memory-bound. The only thing a cluster with a shared memory architecture would give you is more memory, and that doesn't solve your apparent issue, which isn't to do with RAM.
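
        As a rough illustration of that reasoning, the sketch below classifies a run from its observed utilisation. The function name `classify_bottleneck` and the 90% "busy" threshold are arbitrary assumptions, and the absence of swapping in the example call is assumed rather than reported in this thread.

```python
def classify_bottleneck(cpu_pct, ram_pct, swapping, busy=90.0):
    """Rough rule of thumb; the 90% 'busy' threshold is an arbitrary assumption."""
    if swapping:
        # Swapping to disk is the case that really hurts performance.
        return "memory-bound (swapping)"
    if ram_pct >= busy:
        return "memory pressure"
    if cpu_pct >= busy:
        return "cpu-bound"
    return "neither saturated"


# Figures from the original post: CPU at almost 100%, RAM peaking around 70%.
# swapping=False is an assumption; the thread does not report swap activity.
print(classify_bottleneck(cpu_pct=100.0, ram_pct=70.0, swapping=False))
```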

        https://github.com/trinityrnaseq/tri...g-Requirements suggests you need 256 GB of RAM in a machine, but I don't know what organism you're working on or how many reads you have in a sample.

        You might want to look at end of run profiling:

        Trinity RNA-Seq de novo transcriptome assembly. Contribute to trinityrnaseq/trinityrnaseq development by creating an account on GitHub.


        This might give you more of an idea where the bottleneck is.



        • #5
          Perfect.

          Thanks for the comprehensive and helpful response. Stops me wasting any more time looking into this.

          Thanks,
          Sanderson.
