SEQanswers

Old 11-18-2011, 12:15 AM   #1
zxyeo
Junior Member
 
Location: Singapore

Join Date: Jun 2011
Posts: 6
Planning computing budget for Exome-seq data analysis

Our lab is planning a project that aims to analyze exome-sequencing data from up to 100 human biopsies over the next couple of years. I hope to get some feedback here.

Considering that the data are likely to come with higher coverage, we are thinking of upgrading our current setup. We are also interested in parallelizing the primary analytical pipeline in order to save more time for downstream analysis (variant filtering, statistical testing using R).

1. Is it sensible to look for a desktop workstation with 2x6 cores, 96GB RAM and 2x1.5TB 7200RPM disks that would serve us well for at least the next few years?

2. Is ~USD8000 enough if we decide to purchase it in mid-2012?

Thanks!
Old 11-19-2011, 02:08 PM   #2
dsenalik
Carrot Scientist
 
Location: Madison WI USA

Join Date: Nov 2009
Posts: 42

Not so long ago I put together a system with 240GB of RAM and 32 cores (4x8) for just over $9000. It has a 2x1TB RAID for the OS drive and 3x3TB for data plus two backups.
It is based on the TYAN S8812 http://www.tyan.com/product_SKU_spec...&SKU=600000186 and I love it.
Old 11-20-2011, 08:14 AM   #3
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279

That will be fine. I'd vote to drop to 48GB RAM, add more storage in a RAID configuration, and increase the number of CPUs so you can multi-task. Most NGS tools do not leverage huge amounts of RAM but are very I/O and compute intensive.
Old 11-22-2011, 02:43 AM   #4
Bukowski
Senior Member
 
Location: UK

Join Date: Jan 2010
Posts: 390

I agree with Jon: for that number of cores, 48GB of RAM should be sufficient (it is for us). We have three 4x4-core machines with 48GB RAM each and can comfortably push 48 exomes a week through our pipeline (we don't tend to push more than one sample per core).

Last edited by Bukowski; 11-22-2011 at 02:46 AM.
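
As a back-of-envelope illustration of the numbers above (three 4x4-core machines, 48 exomes a week), the sketch below works out the implied per-core rate and applies it to the 12-core, 100-sample case from the first post. The function names are hypothetical, and linear scaling with core count is an assumption that ignores I/O and pipeline differences, so treat the result as a rough guide only.

```python
import math


def exomes_per_core_week(exomes_per_week, machines, cores_per_machine):
    """Per-core weekly throughput implied by a reported setup."""
    return exomes_per_week / (machines * cores_per_machine)


def weeks_needed(n_samples, cores, rate_per_core_week):
    """Whole weeks to process n_samples, ASSUMING throughput scales
    linearly with core count (I/O limits are ignored here)."""
    return math.ceil(n_samples / (cores * rate_per_core_week))


# Reported setup: 3 machines x 16 cores, 48 exomes per week.
rate = exomes_per_core_week(48, 3, 16)
print(rate)

# Hypothetical 2x6-core workstation and 100 samples from the first post.
print(weeks_needed(100, 12, rate))
```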
Old 11-22-2011, 08:49 AM   #5
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

I'll disagree, a bit, with Jon who said
Quote:
Most NGS tools do not leverage huge amounts of RAM but are very I/O and compute intensive.
I'll agree with the former (I/O intensive) but disagree with the latter (CPU intensive). While many tools do offer parallel multi-CPU capability, some do not, and even those that are parallel will often have portions of their code/pipeline that become single-CPU. Note that Bukowski uses "one sample per core" or, if I am reading his comment correctly, single-CPU programs (albeit on multiple samples at a time).

Personally I much prefer high-memory machines over high-core machines. One way of looking at this is that a program that runs into CPU limits will take extra time to complete, whereas a program that runs into a memory limit will never complete. I don't want to be in the latter situation. On the other hand, I do a lot of de novo work and those programs tend to be memory intensive, so go with what the human exome people say.

I do think that your disk space is rather wimpy: 2x1.5TB 7200RPM. Let's assume no RAIDing, so you get at best 3000GB, or about 30GB per sample. That seems small, especially since that 3TB is not really 3TB after disk overhead, and it is even smaller if you go with a fast-RAID system. On the other hand, you can always easily buy more disks.
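
The per-sample figure is simple arithmetic; a minimal sketch follows. The function name and the `usable_fraction` parameter are hypothetical, and the 0.5 value is only an assumed example (two mirrored disks), not a measured overhead.

```python
def per_sample_gb(n_disks, disk_tb, n_samples, usable_fraction=1.0):
    """Disk space per sample in GB (1 TB taken as 1000 GB).

    usable_fraction is an ASSUMED allowance for filesystem or RAID
    overhead; 1.0 means the full raw capacity is available.
    """
    raw_gb = n_disks * disk_tb * 1000
    return raw_gb * usable_fraction / n_samples


# Best case from the post: 2 x 1.5TB, no RAID, 100 samples.
print(per_sample_gb(2, 1.5, 100))  # 30.0

# Assumed example: the two disks mirrored, so half the raw space is usable.
print(per_sample_gb(2, 1.5, 100, usable_fraction=0.5))  # 15.0
```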

Getting back to the second part of your original question, as per 'dsenalik' your USD$8,000 budget should do just fine. Maybe just plan on spending that and seeing what you can purchase in the middle of 2012.
Old 11-22-2011, 09:57 AM   #6
biznatch
Senior Member
 
Location: Canada

Join Date: Nov 2010
Posts: 124

Hard drive prices have gone up recently because flooding in Thailand has halted production and caused shortages. If you don't need all that storage space right away, maybe add more later as you need it, although some reports say prices will stay elevated for 6 months to a year. If you can still find drives whose prices haven't gone up yet, get them now.

E.g. http://news.cnet.com/8301-13924_3-57...?tag=mncol;txt or just Google it.
Old 11-23-2011, 12:22 AM   #7
Bukowski
Senior Member
 
Location: UK

Join Date: Jan 2010
Posts: 390

Quote:
Originally Posted by westerman View Post
I'll disagree, a bit, with Jon who said I'll agree with the former (I/O intensive) but disagree with the latter (CPU intensive). While many tools do offer parallel multi-cpu capability some do not and even those which are parallel will often have portions of their code/pipeline which become single-CPU. Note that Bukowski uses "one sample per core" or, if I am reading his comment correctly, single-CPU programs (albeit on multiple samples at a time.)
I will clarify a little! The work is I/O intensive. We've had the best success optimising our pipeline by increasing I/O performance. We parallelise where possible, so if systems are not saturated (one sample per core), processes are threaded to fill core capacity where possible, either by taking advantage of built-in threading or by splitting jobs more naively across more cores. I gave the sample/core example to give an idea of the turnaround we can achieve with the setup.

I would not recommend our setup for assembly either. Having done some, all I have ever wanted in that situation is 'moar RAM'.
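
For readers wondering what "splitting jobs more naively across more cores" might look like, here is a minimal sketch that runs one external command per sample with a bounded number of workers. It is not the pipeline described above: `build_command`, `run_sample`, and `run_all` are hypothetical names, and the command is a placeholder that only prints the sample name where a real single-threaded tool would go.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_command(sample):
    """Placeholder command (ASSUMPTION): stands in for a real
    single-threaded per-sample pipeline step."""
    return [sys.executable, "-c", f"print('processed {sample}')"]


def run_sample(sample):
    """Run the per-sample command and return (sample, its stdout)."""
    result = subprocess.run(
        build_command(sample), capture_output=True, text=True, check=True
    )
    return sample, result.stdout.strip()


def run_all(samples, max_workers=None):
    """Dispatch one external process per sample, at most max_workers
    at a time (defaults to the number of cores)."""
    workers = max_workers or os.cpu_count() or 1
    # Threads only wait on the child processes; the real work happens
    # in the external commands, each of which occupies its own core.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_sample, samples))


if __name__ == "__main__":
    print(run_all(["sample_01", "sample_02", "sample_03"], max_workers=2))
```

Capping `max_workers` at the core count mirrors the "one sample per core" rule of thumb; on an I/O-bound pipeline a lower cap may work better, which is something to measure rather than assume.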