SEQanswers

Old 09-29-2009, 06:57 AM   #1
peromhc
Senior Member
 
Location: Durham, NH

Join Date: Sep 2009
Posts: 108
Bioinformatics Computer: List your specs

Hi All,

In reading the forums, it seems like many people have questions that come down to computing power: how much RAM, how many processors, how long analyses take. I suspect that as NGS becomes more mainstream, a lot of labs will be trying to build workstations to handle the work. I myself am building a computer for de novo assembly of a eukaryotic transcriptome from Solexa data, and have toiled over its configuration.

So rather than start another of those "how much RAM" threads, it might be interesting and useful for people to describe the computer they run their analyses on. For instance, my current build includes:

PROJECT: de novo assembly of a rodent transcriptome
PLATFORM: Solexa 100 bp paired-end
PROGRAMS USED: Velvet, ABySS

MOTHERBOARD: Tyan S7016, dual-socket Xeon 5500 series, 18 DIMM slots
CPU: two Xeon E5520 (8 cores total)
RAM: 72 GB total (18 x 4 GB sticks)

It seems like this covers the basics and allows for useful comparison. This type of thread could be really useful if enough people replied.

In addition to workstation-type configurations, it would be really interesting to see how many people are using supercomputers or large clusters to do assemblies.
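For concreteness, here is roughly the job this box is being built for -- a sketch only, with file names, insert length, and k-mer size as placeholders:

velveth asm_dir 31 -fastq -shortPaired reads_interleaved.fastq   # hash the reads at k=31 (this index is what eats the RAM)
velvetg asm_dir -ins_length 200 -exp_cov auto                    # build the de Bruijn graph and write contigs

# the same assembly in ABySS:
abyss-pe name=transcriptome k=31 in='reads_1.fastq reads_2.fastq'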
Old 09-29-2009, 11:06 AM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Quote:
... it would be really interesting to see how many people are using supercomputers or large clusters to do assemblies...
That would be me. Of course I do not have the cluster all to myself all of the time, but it is handy to have it when I need it.
Old 09-29-2009, 11:54 AM   #3
peromhc
Senior Member
 
Location: Durham, NH

Join Date: Sep 2009
Posts: 108
Cluster structure?

Quote:
Originally Posted by westerman View Post
That would be me. Of course I do not have the cluster all to myself all of the time, but it is handy to have it when I need it.
Westerman, care to tell me about your cluster? How many nodes, how much RAM per node? Are you running analyses in parallel, etc.?

Matt
Old 09-29-2009, 12:57 PM   #4
What_Da_Seq
Member
 
Location: RTP

Join Date: Jul 2008
Posts: 28

Also a dual quad-core Xeon (8 cores total), 32 GB RAM, Red Hat 5, Novoalign.
Old 09-29-2009, 11:27 PM   #5
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258

It depends a lot on what kind of analysis we are doing.
We run the standard Illumina pipeline on a 4 x quad-core Xeon with 32 GB RAM (HP DL580 G5). We use that also for standard ChIP-seq analysis and bwa alignments. We are going to cluster that server with the former IPAR module (2 x quad-core Xeon + 16 GB, now running FreeBSD 8 + ZFS for tests).
We run other tasks (motif discovery, statistical analysis, ...) on a small cluster (3 Sun X4150, 64 GB RAM, 4 x 6-core Xeon) which is shared with other groups in the institute...
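(For reference, our bwa step is just the stock single-end workflow -- a sketch, with the reference and file names as placeholders:)

bwa index ref.fa                                     # one-off: index the reference
bwa aln -t 8 ref.fa lane1.fastq > lane1.sai          # map one lane on 8 threads
bwa samse ref.fa lane1.sai lane1.fastq > lane1.sam   # emit alignments as SAM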

d
Old 09-30-2009, 02:59 AM   #6
mads b
Junior Member
 
Location: Copenhagen

Join Date: Aug 2009
Posts: 4

We run the standard Illumina pipeline on
2 x quad-core Xeon 5550, 32 GB RAM.

Software: CLC bio Genomics Workbench.

This configuration allows up to 7 parallel de novo or reference assemblies to finish in 10-15 min each.
Old 09-30-2009, 03:14 AM   #7
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258

Quote:
Originally Posted by mads b View Post
Software: CLC bio Genomics Workbench.
I've tried a full demo, but I found it very slow at importing and analyzing data... Can you share your impressions of CLC GW? Which genomes/applications do you use it for?

d
Old 09-30-2009, 05:41 AM   #8
mads b
Junior Member
 
Location: Copenhagen

Join Date: Aug 2009
Posts: 4

I am in general very satisfied with the program.

I am at present analyzing bacterial genomes of 2-4 megabases sequenced as 38 bp single-end Illumina reads (an Aspergillus genome of 35 megabases is in the GA at the moment).

I just tested time consumption on a file of 8.3 million reads: import time 3 min 50 sec (remember to use the import function under "high throughput seq" in the toolbox). Many files can be imported simultaneously (if you are systematic with the process... ;-) ).

De novo assembly took 9 minutes and created 107 contigs. This satisfies me (but you always want it faster, of course).

Because of the graphical interface, GW might be slower than other programs. But as a non-bioinformatician I really get a lot of help from the graphical interface.
Old 09-30-2009, 05:42 AM   #9
mads b
Junior Member
 
Location: Copenhagen

Join Date: Aug 2009
Posts: 4

And by the way, I am running Windows 7. Don't know whether it makes any difference...
Old 09-30-2009, 05:49 AM   #10
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258

Quote:
Originally Posted by mads b View Post
I just tested time consumption on a file of 8.3 million reads: import time 3 min 50 sec (remember to use the import function under "high throughput seq" in the toolbox). Many files can be imported simultaneously (if you are systematic with the process... ;-) ).
De novo assembly took 9 minutes and created 107 contigs. This satisfies me (but you always want it faster, of course).
Because of the graphical interface, GW might be slower than other programs. But as a non-bioinformatician I really get a lot of help from the graphical interface.
Mmm, I've tried it for ChIP-seq analysis of mouse samples: importing 1 lane (15 million reads, 36 bp) + aligning to the reference + ChIP analysis = 6 hours + drained RAM + a crash...
I don't think the GUI or Windows makes the difference (it's Java, after all).

d
Old 09-30-2009, 06:32 AM   #11
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Quote:
Originally Posted by peromhc View Post
Westerman, care to tell me about your cluster? How many nodes, how much RAM per node? Are you running analyses in parallel, etc.?

Matt
Being at a university, I have access to a couple of different clusters. One has 4 boxes with 16 cores each and either 32 GB or 64 GB of RAM -- in other words, 64 cores total. We recently purchased more cores, although with less memory per core. My other cluster also has 64 cores, with 128 GB per box. If required (and if I can jump through the hoops to set it up), the university has a Condor pool with far more cores than even I could use (thousands).

And yes, analyses are run in parallel as much as possible. I find the major problem is that files have to be handled individually: at that point the analysis often comes down to one CPU reading from and writing to one disk.
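(To illustrate, the crude but effective pattern is one process per file, e.g. with xargs -- a sketch, names are placeholders; it stops helping once a single file on a single disk is the unit of work:)

ls lane*.fastq | xargs -P 16 -I{} sh -c 'bwa aln ref.fa {} > {}.sai'   # 16 concurrent one-file jobs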
Old 09-30-2009, 07:31 AM   #12
pssclabs
Junior Member
 
Location: NY, NY

Join Date: Sep 2009
Posts: 6
Bioinformatics Computer: List your specs

We have been discussing different hardware specifications with various next-generation sequencing companies. There appears to be no standard configuration, but one thing is certain: the computing demands will increase. The bottlenecks are the usual suspects, disk I/O and network backplane, and unfortunately the cost of resolving them is tremendous. We are trying to develop a "building block" approach that will grow with the computing demands over time.

I think your basic configuration is correct, although the memory does seem to be overkill. But that may be because of your own application's needs.
Old 10-01-2009, 01:15 PM   #13
peromhc
Senior Member
 
Location: Durham, NH

Join Date: Sep 2009
Posts: 108

So am I correct in assuming that none of you using clusters for your analyses rely heavily on Velvet? As far as I can tell, it is a single-node, shared-memory program, so lots of RAM in one box matters more to it than lots of nodes.
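A sketch of the cluster-friendly route I have been looking at instead -- ABySS can spread the de Bruijn graph across nodes with MPI (process count and k-mer are placeholders):

abyss-pe np=16 k=31 name=asm in='reads_1.fastq reads_2.fastq'   # np=16 runs the graph stage as 16 MPI ranks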
Old 10-04-2009, 06:27 PM   #14
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Quote:
Originally Posted by peromhc View Post
So rather than start another of those "how much RAM" threads, it might be interesting and useful for people to describe the computer they run their analyses on. For instance, my current build includes:
Main server:

PROJECT: de novo assembly and alignment of bacterial genomes, N-way comparative SNP analysis, transcriptomes
PLATFORM: Illumina 36 bp PE, Illumina 80 bp MP, 454 FLX, 454 Titanium
PROGRAMS USED: Velvet, SHRiMP, Nesoni
CPU: 2 x quad-core Xeon 5482 (8 cores, 1600 FSB)
RAM: 64 GB total (16 x 4 GB sticks)

Workstations:

PROJECT: everything bacterial
PLATFORM: Illumina 36 bp PE, Illumina 80 bp MP, 454 FLX, 454 Titanium
PROGRAMS USED: CLC Genomics Workbench
CPU: 1 x quad-core Intel Core 2 (4 cores, 1333 FSB)
RAM: 16 GB total (4 x 4 GB sticks)
Old 10-09-2009, 10:21 AM   #15
The_Roads
Member
 
Location: USA

Join Date: May 2009
Posts: 37

We've been using CLC GWB for a while now; v3.6.5 is fantastic. I'd agree with the comment above: I don't think you should underestimate the benefit of putting a biologist in the driving seat with software that has a good GUI, like GWB. Initially we worked with in-house computing groups and command-line NGS assemblers, and it was very inefficient. Each time there were questions or ideas for alternative assemblies or analyses, there was wait time until the appropriate people were available and computing time could be found; it just wasn't competitive. Clearly we gave up before pushing through the learning curve, but I'd say that unless you are a large institute with full-time access to a good number of dedicated, well-trained specialists, a GUI is the way to go.

As for specs, we use single quad-core Dell T5400s with extra HDDs and 32 GB RAM. Lack of fast direct storage is our problem, but overall it works fine for us.

Has anyone got plans to use Illumina IPARs as assemblers once they come offline? I'm hoping the storage array will solve our storage problem (well, for a while at least...).