SEQanswers

03-15-2019, 12:52 AM   #1
sanderson83
Junior Member
 
Location: UK

Join Date: Mar 2019
Posts: 3
Denovo assembly system resources

Hi,

Hope someone can help me out with an IT/Systems question.

I currently process fastq files using Trinity for assembly, and this takes roughly 4 hours per sample. I have noticed that throughout this time CPU use is almost 100%, whilst RAM usage maxes out at around 70%.

I am using a standalone workstation with two six-core processors and 96 GB of RAM. I currently have access to five of these, and they are all used independently. This is the system I inherited from my predecessor, so I am open to change should it increase throughput.

My question is....

Would creating a small Beowulf-style cluster from four of the workstations give me more system resources and perhaps speed up my assembly and processing time?

I am not overly familiar with the IT infrastructure side of this, so any advice would be appreciated.

Thanks in advance.
03-15-2019, 04:28 AM   #2
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 385

I wouldn't have thought so. You require all the reads to assemble the genome, so splitting this across a cluster without a shared/distributed memory model doesn't fit the assembly paradigm, which is why most people use a big box with lots of RAM.

See:
https://ieeexplore.ieee.org/document/6165266
https://www.hpc.informatik.uni-mainz...nome-assembly/
03-26-2019, 05:42 AM   #3
sanderson83
Junior Member
 
Location: UK

Join Date: Mar 2019
Posts: 3

Hi Bukowski,

Thanks for your reply.

If we were to cluster the machines and apply a shared/distributed memory model, would I be likely to see an increase in processing speed from the extra memory and cores?

Sorry if this is a naive question, but I need to find a way of increasing throughput if at all possible. I appreciate the advice.
03-26-2019, 09:09 AM   #4
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 385

It sounds like your best bet is just doing things in an embarrassingly parallel manner, which is what you're currently doing. I may have misinterpreted your original request, but the short answer is no.

If you build a cluster, you get a job scheduler, and the best thing about that is that you no longer have to manage the jobs manually: when one finishes on a machine, the scheduler just starts the next one in the queue. That's the benefit to you of building a cluster from your machines.
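As a rough illustration of what a scheduler's queue buys you, here is a minimal Python sketch that simulates handing each queued sample to whichever machine frees up first. It is not how any real scheduler is implemented; the function name `simulate_queue` and the batch of 12 samples are made up for the example, and the 4 hours per sample and 5 workstations are just the figures from the first post.

```python
import heapq


def simulate_queue(job_hours, n_machines):
    """Simulate a FIFO queue: each job goes to the machine that frees up first.

    Returns (makespan_hours, assignments), where assignments[i] lists the
    job indices that ran on machine i.
    """
    # Min-heap of (time the machine becomes free, machine index).
    free_at = [(0.0, m) for m in range(n_machines)]
    heapq.heapify(free_at)
    assignments = [[] for _ in range(n_machines)]
    makespan = 0.0
    for job, hours in enumerate(job_hours):
        start, machine = heapq.heappop(free_at)  # earliest-free machine
        finish = start + hours
        assignments[machine].append(job)
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, machine))
    return makespan, assignments


# Hypothetical batch: 12 samples at ~4 hours each on 5 workstations.
makespan, assignments = simulate_queue([4.0] * 12, 5)
print(makespan)     # 12.0
print(assignments)  # [[0, 5, 10], [1, 6, 11], [2, 7], [3, 8], [4, 9]]
```

The total wall-clock time is the same as if you had started each job by hand the moment a machine became free; the scheduler only removes the manual babysitting, it does not make an individual assembly faster.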

I also didn't spot that you were using Trinity, so I'm going to assume that you're doing transcriptome assemblies. Trinity is already using the machine's resources efficiently, so the run time you see is just the run time. Provided it's not maxing out the memory, it matters not a jot if your CPU utilisation is high; all you care about in terms of performance is that it's not swapping out to disk.

Your process is CPU-bound, not memory-bound. A cluster with a shared-memory architecture would only give you more RAM, which doesn't solve your apparent issue, because the bottleneck isn't RAM.
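If you want to check the "not swapping out to disk" part yourself, here is a small Python sketch, assuming the workstations run Linux. It parses text in the format of `/proc/meminfo` and reports how much swap is in use, using the `SwapTotal` and `SwapFree` fields. The helper names and the sample numbers are invented for illustration; on a real machine you would pass in the contents of `/proc/meminfo` while Trinity is running.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Field:" prefix
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(text):
    """Return swap currently in use (kB), computed as SwapTotal - SwapFree."""
    info = parse_meminfo(text)
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


# Made-up sample in /proc/meminfo format (not measured on any real machine).
SAMPLE = """\
MemTotal:       98765432 kB
MemAvailable:   29629629 kB
SwapTotal:       8388604 kB
SwapFree:        8388604 kB
"""

print(swap_used_kb(SAMPLE))  # 0
```

A value that stays at or near zero for the whole run is consistent with the job not being short of RAM; a value that climbs during assembly would suggest the opposite.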

https://github.com/trinityrnaseq/tri...g-Requirements suggests you need 256 GB of RAM in a machine, but I don't know what organism you're working on or how many reads you have in a sample.

You might want to look at end of run profiling:

https://github.com/trinityrnaseq/tri...time-Profiling

This might give you more of an idea where the bottleneck is.
03-27-2019, 06:16 AM   #5
sanderson83
Junior Member
 
Location: UK

Join Date: Mar 2019
Posts: 3

Perfect.

Thanks for the comprehensive and helpful response. Stops me wasting any more time looking into this.

Thanks,
Sanderson.
All times are GMT -8.