SEQanswers

Old 08-25-2010, 12:51 PM   #1
DZhang
Senior Member
 
Location: East Coast, US

Join Date: Jun 2010
Posts: 177
Default Computer Hardware: CPU vs. Memory

Hi, I am planning to build a computer for next-gen analysis with a tight budget. The main application is de novo assembly, re-sequencing, and RNA-seq.

I can choose either two AMD 8-core CPUs (16 cores total) with 16G memory or one AMD 8-core CPU with 24G memory. My question is whether I should invest in # of cores or memory capacity in this case.

Thank you,
Douglas
Old 08-25-2010, 12:57 PM   #2
jwfoley
Senior Member
 
Location: Stanford

Join Date: Jun 2009
Posts: 179
Default

Definitely more memory. A lot of people are writing terrible code that wastes tons of memory, and it's better to run programs slowly than not be able to run them at all.
Old 08-25-2010, 01:07 PM   #3
john_mu
Member
 
Location: Stanford, CA

Join Date: May 2010
Posts: 88
Default

Yes, more memory. Sequencers are only going to spit out more reads.

Also, more memory usually means programs can potentially run faster.
__________________
SpliceMap: De novo detection of splice junctions from RNA-seq
Download SpliceMap Comment here
Old 08-25-2010, 01:12 PM   #4
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542
Default

Also look at the max memory the motherboard can hold, because you'll probably want to add more memory later (e.g. using consumables budget or next year's money).
Old 08-25-2010, 01:26 PM   #5
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,168
Default

You say "main application" but then list three different applications that have very different requirements. To be fair, resequencing and RNA-Seq share a lot of requirements, a primary one being mapping reads to a reference. Mappers do not require a ton of memory but can be sped up (in a nearly linear fashion) by adding CPUs. As john mentions, sequencers are going to be spitting out more reads, but if your pipeline involves mapping those reads to a reference, more memory won't do you much good at all, while doubling the # of CPUs sure will.

On the other hand, de novo assembly is a memory pig, and most algorithms are not highly threaded, meaning additional CPUs will not provide much benefit for this application.

You really need to define your requirements better. What specific programs do you think you'll be using? What are their resource requirements to perform projects sized similarly to yours?
Old 08-25-2010, 01:31 PM   #6
DZhang
Senior Member
 
Location: East Coast, US

Join Date: Jun 2010
Posts: 177
Default

Thank you all for the great suggestions and comments. Ideally I should build two machines, one for de novo assembly and the other for mapping. Due to the limited budget, I will choose somewhere in between. It is a great suggestion to choose a mainboard with upgrade potential!
Old 08-25-2010, 02:15 PM   #7
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323
Default

I'd also suggest adding an SSD to use as scratch space. Then you need cheap 1 TB drives
to store your data (SATA-II 7k2 should be fine).
Let us know what machine(s) you end up getting.
__________________
-drd
Old 08-25-2010, 02:49 PM   #8
mnkyboy
Member
 
Location: Seattle, WA

Join Date: Mar 2009
Posts: 87
Default

You might be better off using AWS (the cloud).
Old 08-25-2010, 09:05 PM   #9
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258
Default

Quote:
Originally Posted by DZhang View Post
Hi, I am planning to build a computer for next-gen analysis with a tight budget. The main application is de novo assembly, re-sequencing, and RNA-seq.

I can choose either two AMD 8-core CPUs (16 cores total) with 16G memory or one AMD 8-core CPU with 24G memory. My question is whether I should invest in # of cores or memory capacity in this case.

Thank you,
Douglas
More memory... but I would check that the disk I/O is fast and efficient.
Old 08-26-2010, 12:19 AM   #10
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275
Default

Quote:
Originally Posted by DZhang View Post
Hi, I am planning to build a computer for next-gen analysis with a tight budget. The main application is de novo assembly, re-sequencing, and RNA-seq. I can choose either two AMD 8-core CPUs (16 cores total) with 16G memory or one AMD 8-core CPU with 24G memory. My question is whether I should invest in # of cores or memory capacity in this case.
De novo assembly needs more RAM, while re-sequencing (read mapping) and RNA-seq (read mapping + analysis) require less RAM and more CPU.

Frankly, the difference between 16GB and 24GB RAM is not that much, and won't help with de novo too much. More important is the RAM per core: your choices are 1 GB/core (x16) or 3 GB/core (x8).
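
To make the per-core arithmetic explicit, here is a minimal sketch. The helper name `gb_per_core` is hypothetical, and it assumes every core is running a job at the same time and that memory is shared evenly, which real workloads will not do exactly.

```python
def gb_per_core(total_ram_gb: float, cores: int) -> float:
    """RAM available to each core if all cores are busy and share memory evenly."""
    return total_ram_gb / cores


# The two options from the original question: (total RAM in GB, total cores).
configs = {
    "2 x 8-core, 16 GB": (16, 16),
    "1 x 8-core, 24 GB": (24, 8),
}

ram_per_core = {
    name: gb_per_core(ram_gb, cores) for name, (ram_gb, cores) in configs.items()
}

for name, value in ram_per_core.items():
    print(f"{name}: {value:.1f} GB/core")
```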

I assume you are working on large genomes for which you have references, like human or mouse? In that case I think you will be doing much more read mapping than de novo, so one would think more cores is better, but 1 GB/core is a bit low for mapping to large genomes, so you may have idle CPUs anyway! So the 24GB RAM would probably be my choice in the end.

The issue of a fast disk subsystem is a crucial one, and it usually gets ignored. A good RAID controller or smart use of Linux md software RAID with multiple 7200rpm spindles should be enough on your tight budget. But remember, if your disks are slow, you can't get data into RAM fast, and processes wait on I/O a lot, especially when there are so many cores competing for disk I/O! More RAM helps here too, for disk cache etc.

As an aside, does your institute or partner institute have access to a HPC facility where you can get some CPU allocation?
Old 08-26-2010, 04:54 AM   #11
adamdeluca
Member
 
Location: Iowa City, IA

Join Date: Jul 2010
Posts: 95
Default

Look at the motherboard, because what you want is expandability. Boards for the AMD 6100 typically come with 1, 2 or 4 processor sockets and have 8, 16 or 32 memory slots respectively. Processor slots need to be populated with identical cpus (not all need to be filled), and memory slots should be populated in groups of 4 identical sticks.

Your 24GB option is likely a 1P board with 4x4GB + 4x2GB, and thus would fill all of your CPU and memory slots (no expandability without throwing away components).

The 16GB configuration would likely be a 2p board with 8x2GB or 4x4GB and thus would leave 8 or 12 open memory slots (room to grow).
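
The slot counting above can be sketched as follows. The figure of 8 memory slots per socket comes from the board description in this post; the function names and the exact stick mixes are illustrative assumptions, not a statement about any particular vendor's board.

```python
SLOTS_PER_SOCKET = 8  # from the post: 1, 2 or 4 sockets -> 8, 16 or 32 slots


def total_gb(sticks):
    """Total memory for a list of (count, size_gb) stick groups."""
    return sum(count * size_gb for count, size_gb in sticks)


def free_slots(sockets, sticks):
    """Memory slots left empty on a board with the given number of sockets."""
    total = sockets * SLOTS_PER_SOCKET
    used = sum(count for count, _ in sticks)
    if used > total:
        raise ValueError("more sticks than slots")
    return total - used


# (sockets on the board, stick groups) for the configurations discussed above.
options = {
    "1P, 4x4GB + 4x2GB": (1, [(4, 4), (4, 2)]),
    "2P, 8x2GB": (2, [(8, 2)]),
    "2P, 4x4GB": (2, [(4, 4)]),
}

for name, (sockets, sticks) in options.items():
    print(f"{name}: {total_gb(sticks)} GB, {free_slots(sockets, sticks)} slots free")
```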
Old 08-27-2010, 12:06 AM   #12
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 197
Default

Quote:
Originally Posted by jwfoley View Post
Definitely more memory. A lot of people are writing terrible code that wastes tons of memory, and it's better to run programs slowly than not be able to run them at all.
AGREED.
I am curious, though: if I were to use an SSD as swap, would I be in a sweet zone for $$ vs speed?
But I guess it's a moot question, since for some reason I can't find programs that let you choose whether to write to disk or use RAM.
Old 08-27-2010, 12:17 AM   #13
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275
Default

Quote:
Originally Posted by KevinLam View Post
AGREED.
I am curious though if I were to use a SSD as a swap would I be in a sweet zone for $$ vs speed?
An SSD is only marginally faster than a HDD when compared to RAM. A good RAID array of HDDs still beats a single SSD too (for throughput, not latency though).

HDD ~ 75 MB/s
SSD ~ 300 MB/s
RAM ~ 10000 MB/s (!)
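
To put those rough figures in perspective, here is a small sketch of how long one sequential pass over a working set would take at each rate. The 24 GB working set is an assumption chosen to match the machine under discussion, and the calculation ignores latency, caching and random access entirely.

```python
# Rough sustained throughput figures quoted above, in MB/s.
THROUGHPUT_MB_PER_S = {"HDD": 75, "SSD": 300, "RAM": 10000}

WORKING_SET_GB = 24  # assumption: one full pass over 24 GB of data


def seconds_to_stream(size_gb: float, mb_per_s: float) -> float:
    """Seconds to move size_gb sequentially at mb_per_s (taking 1 GB = 1000 MB)."""
    return size_gb * 1000 / mb_per_s


stream_times = {
    device: seconds_to_stream(WORKING_SET_GB, rate)
    for device, rate in THROUGHPUT_MB_PER_S.items()
}

for device, seconds in stream_times.items():
    print(f"{device}: {seconds:.1f} s per pass")
```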

Quote:
but I guess it's a moot question since for some reason I can't find programs that allow you to choose to write to disk or use RAM.
Just use your SSD as your virtual memory / swap disk?

Some software is now being intelligently written to exploit RAM/HDD tradeoff, for example this read mapper: Syzygy
Old 08-27-2010, 12:23 AM   #14
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 197
Default

Quote:
Originally Posted by Torst View Post
An SSD is only marginally faster than a HDD when compared to RAM. A good RAID array of HDDs still beats a single SSD too (for throughput, not latency though).

HDD ~ 75 MB/s
SSD ~ 300 MB/s
RAM ~ 10000 MB/s (!)



Just use your SSD as your virtual memory / swap disk?

Some software is now being intelligently written to exploit RAM/HDD tradeoff, for example this read mapper: Syzygy
Well, SSDs vary in speed as well, and while you have a point about SATA HDD RAID,
you can just as easily have SATA SSD RAID.
4 x SSD would give ~1200 MB/s by your numbers,
only 8.33x slower than RAM!

BTW, your URL is not formatted properly; it went to some weird site:
http://www.nicta.com.au/research/res...Q0MDAzJmFsbD0x

Last edited by KevinLam; 08-27-2010 at 12:27 AM.
Old 08-27-2010, 12:28 AM   #15
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275
Default

Quote:
Originally Posted by KevinLam View Post
Well SSDs vary in speeds as well and while you have a point about SATA HDD RAID.
You can easily have SATA SSD RAID.
4 x SSD would have ~ 1.2 GB/s by your numbers
Yes you can have SSD RAID of course, and there are plenty of people with Enterprise budgets to do so - but I can't afford it!

The other issue is that 4xSSD RAID0 = 1.2 GB/s = 9.6 Gbit/sec. Even SATA3 is only 6.0 Gbit/sec, so you have to start investing in more expensive interconnects like 10GigE, multiple FC, etc. And have a PCIe bus and CPU<->BUS connection that can cope too!
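
The unit conversion behind that comparison, as a minimal sketch. It uses the ~300 MB/s per-SSD figure from earlier in the thread and assumes ideal RAID0 scaling with no controller overhead; the constant and function names are made up for illustration.

```python
SSD_MB_PER_S = 300       # per-drive figure quoted earlier in the thread
N_DRIVES = 4             # 4 x SSD in RAID0, assuming ideal scaling
SATA3_LINK_GBIT = 6.0    # nominal SATA3 line rate in Gbit/s


def mb_per_s_to_gbit_per_s(mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mb_per_s * 8 / 1000


array_gbit = mb_per_s_to_gbit_per_s(SSD_MB_PER_S * N_DRIVES)
exceeds_single_link = array_gbit > SATA3_LINK_GBIT

print(f"array: {array_gbit:.1f} Gbit/s, single SATA3 link: {SATA3_LINK_GBIT} Gbit/s")
print("aggregate exceeds one SATA3 link:", exceeds_single_link)
```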

My point is that it is still a long way from RAM throughput and latency (SSD = micro/milliseconds, RAM = nanoseconds).
Old 08-27-2010, 06:23 AM   #16
jwfoley
Senior Member
 
Location: Stanford

Join Date: Jun 2009
Posts: 179
Default

Quote:
Originally Posted by KevinLam View Post
AGREED.
I am curious though if I were to use a SSD as a swap would I be in a sweet zone for $$ vs speed?
but I guess it's a moot question since for some reason I can't find programs that allow you to choose to write to disk or use RAM.
As others noted, this is probably done at the OS level (and the OS is probably Linux if you're building a server this powerful, so it should be easy), but an SSD is way slower than RAM.

Just remember that even the least efficient software is designed to work on someone's machine, so there's an upper limit on how much RAM you'll ever need to be able to run a program. That limit might be on the order of tens of gigabytes (I've heard 40-50 for certain well-known pipelines). But I don't think there's a reason to complement that with SSDs, because they're definitely not going to buy you any additional speed as virtual memory.
Old 09-22-2010, 04:52 AM   #17
Markiyan
Senior Member
 
Location: Cambridge

Join Date: Sep 2010
Posts: 115
Default Storage performance notes.

PS:
SSDs HATE random writes in small blocks (they want to write in 64, 128 or 256 KB blocks), but are OK with random reads.
Hard disks struggle with random reads/writes, and also with simultaneous reading/writing from multiple threads.
RAID5 is TERRIBLE for random writing (4-5 times slower than RAID10).
My suggestion for a high-performance system
(each group on a separate physical HDD or RAID array):
[System+Software]
[SWAP]
[SCRATCH]
[Input data]
[Output data]
or at least:
[System+Software]
[SWAP+SCRATCH]
[All data]
Remember that having input and output data on the same RAID0 array is always slower than having them on separate disks without RAID
(in a memory-constrained situation).
If working with tons of small files (phd_dir), order them by inode number and then read them sequentially; this will be a lot faster (10-20X) than random I/O on the same HDD.
Use symlinks to facilitate data processing/organisation.
If you want to use RAID, use RAID10 on a GOOD controller (Adaptec) with at least 0.25-1 GB of onboard cache and its own I/O CPU. The performance gains with cheap onboard controllers (without cache) are often negative, so use separate disks if you can't afford proper RAID.
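
The inode-ordering tip for directories full of small files could look roughly like this. It is only a sketch: the function names are made up, it handles a single flat directory, and it assumes inode order is a reasonable proxy for on-disk layout, which depends on the filesystem. The speedup quoted above is not something this code guarantees.

```python
import os
from typing import List


def files_in_inode_order(directory: str) -> List[str]:
    """Return paths of regular files in `directory`, sorted by inode number."""
    with os.scandir(directory) as entries:
        files = [
            (entry.inode(), entry.path)
            for entry in entries
            if entry.is_file(follow_symlinks=False)
        ]
    files.sort()  # tuples sort by inode number first
    return [path for _, path in files]


def read_all_in_inode_order(directory: str) -> int:
    """Read every file once, in inode order, and return the total bytes read."""
    total = 0
    for path in files_in_inode_order(directory):
        with open(path, "rb") as handle:
            total += len(handle.read())
    return total
```

Calling `read_all_in_inode_order("phd_dir")` would walk a directory like the one mentioned above; replace the body of the loop with whatever per-file processing you actually need.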