SEQanswers

Old 02-11-2009, 10:14 PM   #1
rjsb
Junior Member
 
Location: København

Join Date: Jan 2009
Posts: 1
Default System requirements linux comp. for off-machine assembly/analysis

Dear all,

I (we) would like to assemble 454 reads (50-100 Mb, possibly 200-400 Mb) on a computer separate from the actual FLX sequencing machine. For this, Roche proposes a 64-bit dual-processor (dual x86 CPU) computer with 8 GB of RAM running Linux (brochure, October 2008).

1. Is this requirement still valid?
2. Should I instead get a quad-core computer with 8 or 16 GB of memory?
3. Are there other people running the Roche 454-software off-machine?

thanks,

Richard
Old 02-12-2009, 07:57 AM   #2
cariaso
Member
 
Location: Wageningen, the Netherlands

Join Date: Jan 2008
Posts: 31
Default Requirements seem to have changed

BioTeam has recently been setting up a decent-sized off-rig analysis cluster for a client. The 454 software comes with a script, valTool.sh, which reports whether your rig is large enough. I was quite surprised to see that it wanted 16 GB on the master node, since we were following the same guidelines you mentioned. We had plenty of extra nodes, but all were configured with 8 GB of RAM. During initial testing, one of these 8 GB machines was thrashing hard. Long before base calling ever finished, we found some MPI environment variables that allowed the work to run across the cluster quite quickly, but I'd be very wary of a single-node analysis rig with only 8 GB.

Other notable requirements include :
  • Master linux kernel >= 2.6.9-34 smp 64b
  • Disk space accessible from Master >= 1TB available
  • Compute nodes require >= 4GB RAM and same CPU/ARCH/OS specs as head/master
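For anyone without the 454 software at hand, the thresholds listed above can be checked with a short script. This is only a rough sketch and not a reimplementation of valTool.sh: the function names, the `x86_64` architecture string, and the 10% RAM margin (a nominal 16 GB machine reports slightly less than 16 GiB) are my own assumptions.

```python
import os
import platform
import re
import shutil

# Thresholds quoted in this post; they are NOT read from valTool.sh.
MIN_MASTER_RAM_GB = 16
MIN_DISK_TB = 1
MIN_KERNEL = (2, 6, 9, 34)
RAM_MARGIN = 0.9  # assumption: tolerate memory reserved by firmware/kernel


def parse_kernel(release):
    """Turn a release string such as '2.6.9-34.ELsmp' into (2, 6, 9, 34)."""
    parts = [int(n) for n in re.findall(r"\d+", release)[:4]]
    return tuple(parts + [0] * (4 - len(parts)))


def check_host(ram_gb, disk_free_tb, kernel_release, machine):
    """Return a list of failed requirements; an empty list means a pass."""
    problems = []
    if ram_gb < MIN_MASTER_RAM_GB * RAM_MARGIN:
        problems.append(f"RAM {ram_gb:.1f} GB < {MIN_MASTER_RAM_GB} GB")
    if disk_free_tb < MIN_DISK_TB:
        problems.append(f"free disk {disk_free_tb:.2f} TB < {MIN_DISK_TB} TB")
    if parse_kernel(kernel_release) < MIN_KERNEL:
        problems.append(f"kernel {kernel_release} older than 2.6.9-34")
    if machine != "x86_64":  # assumption: 64-bit x86 is reported as x86_64
        problems.append(f"architecture {machine} is not 64-bit x86")
    return problems


def probe_local(path="/"):
    """Collect the four values for the machine this runs on (Linux only)."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_free = shutil.disk_usage(path).free
    return (ram_bytes / 2**30, disk_free / 10**12,
            platform.release(), platform.machine())
```

On a Linux box, `check_host(*probe_local("/data"))` would list whatever falls short, where `/data` stands in for wherever your run data lives.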


cariaso@BioTeam.net

Last edited by cariaso; 02-12-2009 at 10:44 AM. Reason: base calling, not assembly
Old 02-12-2009, 08:18 AM   #3
Tom Bair
Member
 
Location: Iowa

Join Date: Oct 2008
Posts: 28
Default

The last time I ran top while using gsMapper or gsAssembler, it was only using 1 core. The image analysis/base caller for Titanium is MPI/multi-core aware, but I don't think the other tools are, so the only thing that will help you is the additional memory. On the other hand, we are using an 8-core, 32 GB machine for image analysis/base calling, which takes ~14 hours per full plate. So you may want to take that into consideration if it is in your plans.

Slow to post so adding:

I think cariaso is talking about base calling, not assembly. I think. FLX is easy either way; it is just Titanium that taxes everything.

Last edited by Tom Bair; 02-12-2009 at 08:20 AM. Reason: Slow to post
Old 02-12-2009, 10:46 AM   #4
cariaso
Member
 
Location: Wageningen, the Netherlands

Join Date: Jan 2008
Posts: 31
Default

True, I did mean base calling. Corrected. It seems I've been doing too many assemblies this week.
Old 04-03-2009, 10:52 AM   #5
cdwan
Junior Member
 
Location: Boston

Join Date: Apr 2009
Posts: 6
Default runAssembly run times

I didn't see any examples of run times for various sizes of assembly, so I thought I would post some here. Apologies if this isn't the right place.

We're running Roche's "runAssembly" wrapper, version 2.0.00.20.

The interesting discovery that prompts this post is the "-large" flag. If you provide this flag to runAssembly, it "shortcuts some of the computationally expensive tasks" in the algorithm.

Here are some runtimes, for single threads running on dedicated x86_64 linux machines with 8GB of RAM.

1 data directory: 9.5M "seeds". 15 min. 9 min with LARGE flag.
2 data directories: 14M "seeds". 31 min. 21 min with LARGE flag.
3 data directories: 23M "seeds". 85 min. 21 min with LARGE flag.
4 data directories: 31M "seeds". still running. 30 min with LARGE flag.
...
10 data directories: 78M "seeds". killed. 42 min with LARGE flag.

These are sequences from a prokaryote. Your mileage may vary.
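To put numbers on the difference, here is a small sketch that works out the speedup and the cost per million seeds from the run times listed above. The function names are made up for illustration, and `None` marks the default runs that never finished. Without the flag the cost climbs from roughly 1.6 to 3.7 minutes per million seeds, while with the flag it stays at or below 1.5.

```python
# (data directories, million seeds, minutes without flag, minutes with flag)
# None = the run without the flag did not finish.
RUNS = [
    (1, 9.5, 15, 9),
    (2, 14, 31, 21),
    (3, 23, 85, 21),
    (4, 31, None, 30),
    (10, 78, None, 42),
]


def speedup(default_min, large_min):
    """Ratio of default to flagged run time, or None if unknown."""
    if default_min is None:
        return None
    return default_min / large_min


def minutes_per_million(seeds_m, minutes):
    """Run time divided by seed count (in millions), or None if unknown."""
    if minutes is None:
        return None
    return minutes / seeds_m


def fmt(value):
    """Format a number to two decimals, or 'n/a' for missing values."""
    return "n/a" if value is None else f"{value:.2f}"


for dirs, seeds, default_min, large_min in RUNS:
    print(
        f"{dirs:>2} dirs, {seeds:>4}M seeds: "
        f"speedup {fmt(speedup(default_min, large_min))}x, "
        f"default {fmt(minutes_per_million(seeds, default_min))} min/M, "
        f"flagged {fmt(minutes_per_million(seeds, large_min))} min/M"
    )
```

With only three finished default runs this is a trend, not a scaling law, so treat it as a way to read the table rather than a prediction for other data sets.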
Old 04-21-2009, 07:17 AM   #6
erimar77
Junior Member
 
Location: Saint Louis, MO

Join Date: Apr 2009
Posts: 2
Default

Quote:
Originally Posted by cdwan View Post
10 data directories: 78M "seeds". killed. 42 min with LARGE flag.
What do you mean by "killed"? Did the software fail? I've had newbler assembler fail with large amounts of data as well.
Old 04-21-2009, 07:34 AM   #7
hlu
Member
 
Location: Branford, Connecticut

Join Date: Jan 2009
Posts: 32
Default

De novo assembly of a large genome, 50 to 100 Mb, puts you in the insect range, beyond fungal genomes.

It requires lots of memory. A machine with 8 GB of memory is not enough.

We are using a 4-core, 64-bit machine with 32 GB of memory. It works for GS Assembly of fungal genomes, but insect assembly is tough. A fungal runAssembly of 1 run takes only 1 or 2 hours on this machine, but an insect assembly I did earlier on 35 FLX runs took about 10 days to finish.

The -large flag for GS Assembly helps with speed. Still, I would prefer a beefy machine with as much memory as possible.

Assembly is a memory-hog computation.
Old 04-21-2009, 07:35 AM   #8
cdwan
Junior Member
 
Location: Boston

Join Date: Apr 2009
Posts: 6
Default

Quote:
Originally Posted by erimar77 View Post
What do you mean by "killed"? Did the software fail? I've had newbler assembler fail with large amounts of data as well.
We have no idea whether it would have succeeded eventually or not. It seemed to be progressing - slowly - through the all vs. all comparison stage. We ran out of time to mess with it.