SEQanswers

Old 04-06-2010, 08:55 AM   #1
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381
Computing power for in-house Next Gen analysis

With the cost of outsourcing the standard bioinformatics needed for next-gen data quickly reaching dizzying heights (I was just quoted 1,000 Euros for clustering and contig assembly of one cDNA library), I wonder if anyone can offer advice on a decently powerful setup for in-lab use?

The trade-off between computing time and accumulating outsourcing costs is important, yet I would not be too upset if contig assembly and clustering of a cDNA library took a week on our own machine.

Does anyone have a powerful setup of their own, short of an expensive cluster system, that can handle the basic necessities, e.g. alignments, BLASTs, and SNP searches?

Thanks

Jack

Biology
Dalhousie University
Canada
Old 04-06-2010, 07:50 PM   #2
msincan
Member
 
Location: Maryland

Join Date: Dec 2009
Posts: 19

I am also very interested in learning about this. If there are no contributions, let's start an effort to come up with our own answer.
Old 04-07-2010, 07:01 PM   #3
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 198

You can probably buy a 24 GB RAM desktop for 2.5k USD, install Linux on it, and work on that.
If you are tech-savvy, you can string a couple of these desktops together with Rocks to build a Beowulf-type cluster.
Old 04-09-2010, 12:14 PM   #4
raela
Member
 
Location: Ithaca, NY

Join Date: Apr 2010
Posts: 39

I looked into this about two months ago. The minimum recommendations I was given were a 64-bit processor, at least 8 cores (so 2x quad-core), at least 16 GB RAM, and at least a 1 TB hard drive. I put one together with only 8 GB RAM to start out with, and the total cost was about $1.7k.
Old 04-09-2010, 11:46 PM   #5
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

If you're only talking about analysing a relatively small number of experiments for a single lab then I don't think you need anything too fancy. The main requirement is lots of memory (16 GB is a pretty cost-effective option these days), which will in turn require a 64-bit OS to make use of it. Most tools work most easily under Linux, so that would probably be the way to go.

In terms of CPU you don't actually get much advantage from having lots of cores. Very few mapping or assembly tools are multi-threaded, so they're only going to occupy a single core, and memory constraints will probably prevent you from running multiple jobs in parallel.

The other thing I'd look into is making sure you can back up whatever data you create. It's surprising how quickly your storage will fill up. A large, fast local disk is a must, and then mirror it to an external storage system so you don't lose anything if data is lost from the main system.
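To illustrate the mirroring idea, here is a minimal sketch (the paths are assumptions, and it assumes rsync is installed) that copies the local data disk to an external or network-mounted backup volume; something like this can be run nightly from cron.

```python
# Minimal backup-mirror sketch; /data and /mnt/backup are assumed paths.
import subprocess

SRC = "/data/sequencing/"          # fast local disk holding run data
DST = "/mnt/backup/sequencing/"    # external or network-mounted backup volume

# -a preserves permissions and timestamps; --delete keeps the mirror exact,
# so drop it if you want the backup to retain files deleted locally.
subprocess.run(["rsync", "-a", "--delete", SRC, DST], check=True)
```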
Old 07-23-2010, 02:48 AM   #6
aleferna
Senior Member
 
Location: sweden

Join Date: Sep 2009
Posts: 121
In-house analysis

We recently bought a 24-thread (dual 6-core Xeon with HT), 32 GB RAM server for mapping (6,000 EUR). I've been able to look at Solexa data without much trouble. One important choice was putting in an 8-disk SATA RAID 5 array; this makes a huge difference because I can get up to 600 MB/s read speed.

Now I only use the cluster when I need to verify alignments with high-sensitivity BLAT.
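For anyone wanting to sanity-check their own array, here is a rough sketch of a sequential-read benchmark; the file path is an assumption, and the file should not already be in the page cache or the figure will be inflated.

```python
# Rough sequential-read throughput estimate over an assumed large file.
import time

PATH = "/data/large_run.fastq"   # any multi-GB file on the RAID volume (assumed)
CHUNK = 1 << 20                  # read in 1 MiB blocks

total = 0
start = time.time()
with open(PATH, "rb") as fh:
    while True:
        block = fh.read(CHUNK)
        if not block:
            break
        total += len(block)

elapsed = time.time() - start
print(f"read {total / 1e9:.1f} GB at {total / elapsed / 1e6:.0f} MB/s")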
Old 03-04-2011, 05:01 AM   #7
dglemay
Member
 
Location: California

Join Date: Feb 2011
Posts: 16

If you don't have node-locked licenses, another possibility is to "rent" time on machines with a lot of RAM through Amazon Web Services.
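As a hedged illustration only (not a specific recommended workflow): with the boto3 library and an AWS account already configured, launching a high-memory instance looks roughly like this; the AMI ID and instance type are placeholders.

```python
# Sketch of launching a high-memory EC2 instance with boto3 (assumes
# credentials and region are already configured; AMI and type are placeholders).
import boto3

ec2 = boto3.resource("ec2")
instances = ec2.create_instances(
    ImageId="ami-xxxxxxxx",      # placeholder Linux AMI
    InstanceType="r5.4xlarge",   # example memory-optimised type (~128 GB RAM)
    MinCount=1,
    MaxCount=1,
)
print("launched", instances[0].id)
```

Remember to stop or terminate the instance when the job is done, since billing is per hour of uptime.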
Old 03-04-2011, 06:06 AM   #8
karve
Member
 
Location: Colorado

Join Date: Feb 2011
Posts: 12

Costing is a deeply slippery subject in computing in general. I've done it often in my 30 years in the business, and you really can make the numbers come out the way you want. The infamous jargon phrase "total cost of ownership" (TCO) is infamous for a reason; check out the Wikipedia article on where it comes from.

For this particular industry, here is a pertinent blog entry I came across last month:

http://www.politigenomics.com/2009/1...computing.html

The money quote (hahahahahahaha) there is:

Using the entire cost of the Dell workstation (even though you require less than 25% of its computational capacity), the break even point is about 14 genomes. It would take about 1.5 years (about half the expected life of IT hardware) at current throughput to sequence 14 genomes with a single Illumina GA IIx. At data rates expected in January 2010, it would take less than a year to break even.

You must of course consider the source. I don't mean to be mean to the author, David Dooling, who reads like a really decent human being, but he is the LIMS and IS man at the St. Louis School of Medicine.

My experience: I'm learning here, so I'm making many false steps in data downloads/re-downloads and running/re-running, and that immediately points to doing it in-house for me. When I actually got my act together, the computing for SNP generation on a 100 Mb mouse chromosome on a 4-node, 10-core cluster took 3 hours, plus about 1 hour of index prep time, 5 hours of effective 3 Mbps bandwidth for the data download, and about 30 GB of storage. Oh, and the temperatures on the 4 PCs went up to an average of 64 C. (I should be able to work something out from that; next run I'll put a Kill-A-Watt meter on it to see what power they actually draw.)

Unquestionably, while I'm learning, in-house is the way to go. Once I'm productionized (will this field ever become productionized?), EC2/Amazon/Azure (not Google's cloud computing concept, which is quite different and cuts a task off if any instance takes more than 30 seconds of elapsed time) will be very strong candidates again.

I am very pro Amazon-style cloud computing; computing power as a utility feels like the way to go, but the fact that they threw WikiLeaks off their cloud deeply worried me!

I was reasonably pro outsourcing software development, on a project-by-project basis, to the right setup in India.

For this stuff - for me, not yet.
Old 03-04-2011, 06:46 AM   #9
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482

On a computer with 16GB of RAM and 4 fast processors it takes 2 days to assemble a single HiSeq lane of data to the reference human genome.
Old 03-04-2011, 08:39 AM   #10
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 198

Quote:
Originally Posted by JackieBadger
With the cost of outsourcing the standard bioinformatics needed for next-gen data quickly reaching dizzying heights (I was just quoted 1,000 Euros for clustering and contig assembly of one cDNA library), I wonder if anyone can offer advice on a decently powerful setup for in-lab use?

The trade-off between computing time and accumulating outsourcing costs is important, yet I would not be too upset if contig assembly and clustering of a cDNA library took a week on our own machine.

Does anyone have a powerful setup of their own, short of an expensive cluster system, that can handle the basic necessities, e.g. alignments, BLASTs, and SNP searches?

Thanks

Jack

Biology
Dalhousie University
Canada

My 2 cents:
If it's de novo transcriptomics you are outsourcing, I can understand the pricing and the difficulties. If I could, I would outsource it myself; there are so many ways you can tweak it to get the most out of your data.

To me the machine price is 'cheap', as it won't be for the sole purpose of said project; it can always double for something else.
Development time and manpower costs are always the killer in the cost equation.
Old 03-04-2011, 07:13 PM   #11
frozenlyse
Senior Member
 
Location: Australia

Join Date: Sep 2008
Posts: 136

Quote:
Originally Posted by NextGenSeq
On a computer with 16GB of RAM and 4 fast processors it takes 2 days to assemble a single HiSeq lane of data to the reference human genome.
I assume you mean align, not assemble? And if you use one of the BWT-based aligners (Bowtie/BWA/SOAP2) you could probably do that in 2 hours, not 2 days.
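For example, here is a minimal sketch of such an alignment with BWA's classic aln/sampe workflow; the reference path, read file names, and thread count are assumptions, and it presumes bwa is installed.

```python
# Sketch of aligning one paired-end lane with BWA (aln/sampe workflow);
# reference and read file names are placeholders.
import subprocess

REF = "human_g1k_v37.fasta"                       # reference genome (assumed path)
R1, R2 = "lane1_1.fastq.gz", "lane1_2.fastq.gz"   # one HiSeq lane, paired-end

subprocess.run(["bwa", "index", REF], check=True)  # one-off index build

# Align each mate separately, then pair the hits into a single SAM file.
for reads in (R1, R2):
    with open(reads + ".sai", "wb") as out:
        subprocess.run(["bwa", "aln", "-t", "4", REF, reads],
                       stdout=out, check=True)

with open("lane1.sam", "wb") as sam:
    subprocess.run(["bwa", "sampe", REF, R1 + ".sai", R2 + ".sai", R1, R2],
                   stdout=sam, check=True)
```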
Old 03-05-2011, 01:12 AM   #12
csoong
Member
 
Location: Connecticut

Join Date: Jun 2009
Posts: 74

With a single HiSeq lane already yielding 100M reads of 2x100 bp, in-house resources seem unlikely to keep up with the growth in data. I vote for cloud solutions until some revolutionary computing architecture becomes available (a quantum computer?).
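As a rough back-of-the-envelope sketch (assuming "100M reads of 2x100 bp" means 100 million read pairs):

```python
# Back-of-the-envelope data volume for one lane (assumptions noted above).
pairs = 100_000_000
read_len = 100
bases = pairs * 2 * read_len                 # 20 billion bases per lane

# FASTQ stores roughly one byte per base for sequence and one for quality,
# plus read headers, so ~2.5 bytes per base uncompressed is a fair estimate.
fastq_gb = bases * 2.5 / 1e9
print(f"{bases / 1e9:.0f} Gbases, roughly {fastq_gb:.0f} GB of uncompressed FASTQ")
```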
Old 03-05-2011, 07:27 AM   #13
BaCh
Member
 
Location: Germany

Join Date: May 2008
Posts: 79

Quote:
Originally Posted by csoong
With a single HiSeq lane already yielding 100M reads of 2x100 bp, in-house resources seem unlikely to keep up with the growth in data. I vote for cloud solutions until some revolutionary computing architecture becomes available (a quantum computer?).
Which does not come exactly cheap either. The biggest instance Amazon currently offers has ~68 GiB of RAM and will hurt you at USD 2.28 per hour. A machine of your own with that much memory can be had for ~10k EUR, or perhaps less.

Especially in the "learning" phase, where you try out different programs, analyses, etc., the cloud prices quickly add up, not to speak of data transfer to and from the cloud.

Once you have an SOP (standard operating procedure) for a certain kind of data, then it might be OK.
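A crude break-even sketch from the figures above (treating EUR and USD as roughly comparable and ignoring power, admin time, and transfer fees):

```python
# Crude cloud-vs-own-hardware break-even estimate from the figures above.
machine_cost = 10_000      # ~cost of an in-house machine with similar RAM
hourly_rate = 2.28         # USD/hour for the largest EC2 instance cited

hours = machine_cost / hourly_rate
print(f"~{hours:,.0f} instance-hours to break even "
      f"(~{hours / 24 / 30:.1f} months of continuous use)")
```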

B.
Old 03-05-2011, 12:54 PM   #14
karve
Member
 
Location: Colorado

Join Date: Feb 2011
Posts: 12

One other consideration: if you can build your own computer(s), the hardware cost for a top-end set of clustered PCs is quite a lot less; if you can overclock and unlock not-so-duff cores, you get even more bang for the buck; and if you watch for the deals that come along on the various online sites, there's yet another advantage. I added a 6-core AMD Phenom II, 16 GB of RAM, a motherboard, and a 2 TB disk, cannibalizing existing bits for the rest, for USD 420.
Old 03-06-2011, 04:08 AM   #15
csoong
Member
 
Location: Connecticut

Join Date: Jun 2009
Posts: 74

High-throughput sequencing technologies may follow most other once-high-throughput technologies, such as microarrays, in that biology labs will eventually have no interest in dealing with the nitty-gritty statistics and computing issues.
Old 03-06-2011, 04:59 PM   #16
karve
Member
 
Location: Colorado

Join Date: Feb 2011
Posts: 12

Yet another viewpoint (you can guess I take an interest in this, right? :-) ): I took a look at the pricing of a standard configuration for what's called an entry-level system, a quad-CPU Intel 6-core rack server (single motherboard, 4 CPU sockets): an HP (Hewlett-Packard) ProLiant entry-level server, 4 x Xeon X7460 2.66 GHz, rack-mounted, with the Red Hat operating system.

It comes in at somewhere between 16,000 and 25,000 USD (and no drives! and only 16 GB of memory, and only DDR2 at that):

http://computers.pricegrabber.com/se...m91898999.html

The wide spread in the price (16k to 25k) alone makes me think hard about this. Then I checked the benchmarking at a standard place; the rating comes in at 18,000: http://www.cpubenchmark.net/multi_cpu.html

Upthread I described my setup costs. My overclocked, multi-core, single-socket motherboard setup comes in at a base benchmark of 5,800 (excluding overclocking and core restoration, and excluding the case, power supply, and other cannibalized parts) for 420 USD (add on the exclusions and call it 500).

So to get to an 18,000 benchmark, just multiply by 3; from the parallelism aspect, the same software I use is what would run on the ProLiant, and neither I nor they have any special magic sauce. But to overestimate, let's call it multiply by 4.

I'm kinda just doing the math, and for a similar benchmark to the ProLiant system I'd come in at 2,000 USD; 2,000, not 20,000! (And in my case you'd get redundancy, 48 GB of extra memory, and 6 terabytes of disk thrown in for free!)

The numbers really are so far off, it's weird! Not that this surprises me; computing is like any other commercial field.

Once I understand this domain better, say 3 months time, I'd love to set up some head to head tests.

Let us know what you go for.

Last edited by karve; 03-06-2011 at 05:19 PM. Reason: round up my costs to make the numbers memorable
Old 03-06-2011, 05:35 PM   #17
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381

We ended up going for a 36 GB, dual quad-core setup, which came to around $5k.
It performs de novo transcriptome assemblies and SNP searches in under an hour, so it suits our purposes very well.
For de novo whole-genome assemblies one would need to purchase one of the more powerful cluster setups, I imagine.

The trickiest part of this whole equation, as mentioned, is in fact the bioinformatics. A true learning curve, to say the least!
Old 03-06-2011, 07:01 PM   #18
frozenlyse
Senior Member
 
Location: Australia

Join Date: Sep 2008
Posts: 136

Quote:
Originally Posted by JackieBadger
It performs de novo transcriptome assemblies and SNP searches in under an hour, so it suits our purposes very well.
Whoa, what organism are you doing transcriptome assembly on in an hour? And how many reads?
Old 03-07-2011, 12:21 AM   #19
BaCh
Member
 
Location: Germany

Join Date: May 2008
Posts: 79

Quote:
Originally Posted by karve
I'm kinda just doing the math, and for a similar benchmark to the ProLiant system I'd come in at 2,000 USD; 2,000, not 20,000! (And in my case you'd get redundancy, 48 GB of extra memory, and 6 terabytes of disk thrown in for free!)
You are comparing apples and oranges there.

On the one hand: what you pay for on the bigger servers is the ability to have, e.g., anything between 1 and 8 CPUs with anything between 2 and 8 cores each and up to ~400 GiB RAM (HP) or even 1 TiB RAM (Dell) *on* *one* *machine*. No cluster, no MPI, no kludge of any kind to distribute your jobs across several machines; it's all there, ready to be easily accessed by a program.

On the other hand: a typical server built from "home-grade" hardware has, as its first limitation, the amount of RAM you can plug in. The motherboards for home use that I know of have at most 6 memory slots, which can be fitted with modules of 4 GiB each, adding up to 24 GiB. That is already plenty for quite a few applications, and if one does not need more than a quad-core and this amount of RAM, then such a machine will come in at something like 2k.

That being said: the configuration the original author ended up choosing (dual quad-core with 36 GiB for USD 5k) seems quite reasonable to me. Not too shabby, and if it turns out to be too small, it can certainly be upgraded.

B.

PS: I too would be interested in what organism you can assemble transcripts for de novo in an hour ... how many reads, and which program?
Old 03-07-2011, 03:16 AM   #20
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381

Quote:
Originally Posted by frozenlyse
Whoa, what organism are you doing transcriptome assembly on in an hour? And how many reads?

A couple of species of skate ... around 250 MB for each transcriptome.