SEQanswers

SEQanswers > Sequencing Technologies/Companies > 454 Pyrosequencing




Old 07-14-2010, 01:26 PM   #1
smg283
Junior Member
 
Location: Hilo , HI

Join Date: May 2009
Posts: 6
Memory Usage in Newbler 2.3

I am using Newbler 2.3 on an 8-core workstation with 64 GB of memory (running Ubuntu). During my Newbler assemblies, memory usage seems to be limited to ~20 GB. Is there some reason Newbler won't use all of the available memory? CPU usage is not the limiting factor; the processors can be running at 10% and memory usage still won't go above ~22 GB.
Old 07-15-2010, 05:24 AM   #2
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,167

Why do you think Newbler needs more than 22 GB of RAM? Is it performing a lot of swap operations? In my experience Newbler is fairly memory efficient.
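One quick way to answer the swap question on a Linux box like the one described above is to read /proc/meminfo. A minimal Python sketch (the field names are standard Linux kernel ones; nothing here is Newbler-specific):

```python
# Report total RAM and swap in use by parsing /proc/meminfo (Linux only).
def meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, _, rest = line.partition(":")
            info[key.strip()] = int(rest.split()[0])  # values are in kB
    return info

m = meminfo()
swap_used_kb = m["SwapTotal"] - m["SwapFree"]
print(f"RAM total:   {m['MemTotal'] / 1024**2:.1f} GiB")
print(f"swap in use: {swap_used_kb / 1024:.1f} MiB")
```

If swap in use stays near zero while Newbler runs, the assembler simply does not need more than the ~20 GB it has allocated.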
Old 07-15-2010, 11:37 AM   #3
smg283
Junior Member
 
Location: Hilo , HI

Join Date: May 2009
Posts: 6

This is during the "reading flowgrams" step, when it is generating output; it is the most time-intensive step of my assemblies (these are eukaryote-sized assemblies). Both CPU and memory are available, so it should be able to go faster, but something seems to be holding it back. CPU usage is very low (~10% for each processor), and memory always tops out at ~20-21 GB, even though that is only about a third of the available memory. With the Celera assembler you can change the memory limits when compiling the source; otherwise your maximum memory usage is capped. I didn't know whether Newbler had similar limits.
Old 07-16-2010, 05:49 AM   #4
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,167

Quote:
Originally Posted by smg283
Is there some reason newbler wont use all the memory available ...?
Yes, and it's the same reason any computer program may not use all available memory: it does not need to. Newbler has loaded all the data it needs and created all of its required data structures, and that totals (in your case) 20-22 GB.

Quote:
There is both available CPU and memory, so it should be able to go faster, but something seems to be holding it back.
Not all algorithms are perfectly parallelizable. Embarrassingly parallel problems can be split into completely independent threads, fully utilizing all available CPUs; a BLAST search is an example, since each query sequence can be searched against the database independently of all other queries. Tightly coupled problems cannot be completely separated: one part of the problem may depend on the result of some other part, so it cannot start until that part is finished. If there are not enough independent parts of the problem able to run concurrently, some of your CPUs will sit idle. This is just a fact of life in computer science. Genome assembly is a complex problem with many stages, some of which are more easily parallelized than others. Next time you run Newbler, look at the CPU usage during the "Detangling alignments" step. That process is essentially single-threaded: one CPU will be utilized at 100% while all the rest sit idle.
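To make the distinction concrete, here is a small Python sketch (illustrative only; `score` is a stand-in toy function, not real BLAST). The first loop's items are independent and can be farmed out to a worker pool; the second chains each step on the previous result, so it runs serially no matter how many cores are free:

```python
from concurrent.futures import ThreadPoolExecutor

def score(query):
    """Stand-in for scoring one query against a database."""
    return sum(ord(base) for base in query)

queries = ["ACGT", "TTGA", "CCGA", "GATC"]

# Embarrassingly parallel: every query is independent, so a pool of
# workers can process all of them concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    independent = list(pool.map(score, queries))

# Tightly coupled: each step consumes the previous step's result, so no
# step can start before the one before it finishes, regardless of cores.
chained = 0
for q in queries:
    chained = score(q) ^ chained  # depends on the running value `chained`
```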

None of your observations sound unusual or unexpected to me for a program like Newbler.
Old 07-16-2010, 08:20 AM   #5
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

See Amdahl's Law.
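For reference, Amdahl's Law says the best possible speedup with n processors is 1 / ((1 - p) + p/n), where p is the fraction of the run that can be parallelized. A quick Python illustration (the 90% figure is an arbitrary example, not measured from Newbler):

```python
def amdahl_speedup(p, n):
    """Upper bound on speedup when a fraction p of the work parallelizes over n CPUs."""
    return 1.0 / ((1.0 - p) + p / n)

# Even if 90% of an assembly stage parallelized perfectly, 8 cores could
# speed it up by at most ~4.7x, and no number of cores could beat 10x.
print(round(amdahl_speedup(0.90, 8), 2))       # 4.71
print(round(amdahl_speedup(0.90, 10**9), 2))   # 10.0
```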
Old 11-09-2010, 07:46 AM   #6
martin2
Member
 
Location: Prague, Czech Republic

Join Date: Nov 2010
Posts: 40

Quote:
Originally Posted by smg283
This is during the "reading flowgrams" step, when it is generating output; it is the most time-intensive step of my assemblies (these are eukaryote-sized assemblies). Both CPU and memory are available, so it should be able to go faster, but something seems to be holding it back. CPU usage is very low (~10% for each processor), and memory always tops out at ~20-21 GB, even though that is only about a third of the available memory. With the Celera assembler you can change the memory limits when compiling the source; otherwise your maximum memory usage is capped. I didn't know whether Newbler had similar limits.

Try forcing it to run on only a single CPU core. Is it newbler or gsRunProcessor that writes into some of the log files that it will split the memory between the forked threads? I forgot ... And there is also a "-m" command-line switch to force in-memory computation, if my memory serves me right. ;-)