#1
Senior Member
Location: East Coast, US | Join Date: Jun 2010
Posts: 177
Hi, I am planning to build a computer for next-gen analysis on a tight budget. The main applications are de novo assembly, re-sequencing, and RNA-seq.
I can choose either two AMD 8-core CPUs (16 cores total) with 16 GB of memory, or one AMD 8-core CPU with 24 GB of memory. My question is whether I should invest in number of cores or in memory capacity in this case. Thank you, Douglas
#2
Senior Member
Location: Stanford | Join Date: Jun 2009
Posts: 181
Definitely more memory. A lot of people are writing terrible code that wastes tons of memory, and it's better to run programs slowly than not be able to run them at all.
#3
Member
Location: Stanford, CA | Join Date: May 2010
Posts: 88
Yes, more memory. Sequencers are only going to spit out more reads.
Also, more memory usually means programs can run faster.
__________________
SpliceMap: De novo detection of splice junctions from RNA-seq | Download SpliceMap | Comment here
#4
Peter (Biopython etc)
Location: Dundee, Scotland, UK | Join Date: Jul 2009
Posts: 1,543
Also look at the max memory the motherboard can hold, because you'll probably want to add more memory later (e.g. using consumables budget or next year's money).
#5
Senior Member
Location: USA, Midwest | Join Date: May 2008
Posts: 1,178
You say "main application" but then list three different applications with very different requirements. To be fair, resequencing and RNA-Seq share a lot of requirements, a primary one being mapping reads to a reference. Mappers do not require a ton of memory but can be sped up (in a nearly linear fashion) by adding CPUs. As john mentions, sequencers are going to be spitting out more reads, but if your pipeline involves mapping those reads to a reference, more memory won't do you much good at all, whereas doubling the number of CPUs sure will.

On the other hand, de novo assembly is a memory pig, and most assembly algorithms are not highly threaded, meaning additional CPUs will not provide much benefit for this application. You really need to define your requirements better. What specific programs do you think you'll be using? What are their resource requirements for projects sized similarly to yours?
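One cheap way to get that near-linear mapping speedup in practice is to run one mapping job per input file in parallel. A minimal sketch using `xargs -P`; the `lane*.fastq` names are made up for illustration, and a placeholder `echo` stands in for the real mapper invocation (bwa, bowtie, etc.):

```shell
# Run up to 8 jobs at once, one per lane file.
# Replace the echo with your actual mapper command line.
ls lane*.fastq | xargs -P 8 -I{} sh -c 'echo "mapping {}"'
```

With a real mapper you would also combine this with the mapper's own threading flag, balancing jobs-in-parallel against threads-per-job.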
#6
Senior Member
Location: East Coast, US | Join Date: Jun 2010
Posts: 177
Thank you all for the great suggestions and comments. Ideally I would build two machines: one for de novo assembly and the other for mapping. Given the limited budget, I will choose somewhere in between. It is a great suggestion to choose a motherboard with upgrade potential!
#7
Senior Member
Location: 41°17'49"N / 2°4'42"E | Join Date: Oct 2008
Posts: 323
I'd also suggest adding an SSD to use as scratch space. Then you just need cheap 1 TB drives to store your data (SATA-II 7200 rpm should be fine). Let us know what machine(s) you end up getting.
__________________
-drd
#8
Member
Location: Seattle, WA | Join Date: Mar 2009
Posts: 87
You might be better off using AWS (the cloud).
#9
Senior Member
Location: 45°30'25.22"N / 9°15'53.00"E | Join Date: Apr 2009
Posts: 258
Quote:
#10
Senior Member
Location: The University of Melbourne, AUSTRALIA | Join Date: Apr 2008
Posts: 275
Quote:
Frankly, the difference between 16 GB and 24 GB of RAM is not that much, and won't help with de novo assembly too much either. More important is the RAM per core: your choices are 1 GB/core (x16) or 3 GB/core (x8). I assume you are working on large genomes for which you have references, like human or mouse? In that case I think you will be doing much more read mapping than de novo assembly, so one would think more cores is better, but 1 GB/core is a bit low for mapping to large genomes, so you may have idle CPUs anyway! So the 24 GB of RAM would probably be my choice in the end.

The issue of a fast disk subsystem is a crucial one, and it usually gets ignored. A good RAID controller, or smart use of Linux md software RAID with multiple 7200 rpm spindles, should be enough on your tight budget. But remember: if your disks are slow, you can't get data into RAM fast, and processes wait on I/O a lot, especially when there are so many cores competing for disk I/O! More RAM helps here too, for disk cache etc.

As an aside, does your institute or a partner institute have access to an HPC facility where you can get some CPU allocation?
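The GB-per-core arithmetic above is easy to check on any running Linux box. A small sketch (assumes the standard Linux `free` and `nproc` commands, which are not from the thread itself):

```shell
# Total RAM in GB and logical core count, then the ratio discussed above.
mem_gb=$(free -g | awk '/^Mem:/ {print $2}')
cores=$(nproc)
echo "${mem_gb} GB over ${cores} cores = $((mem_gb / cores)) GB/core"
```

Note `free -g` rounds down to whole gigabytes, so this is only a ballpark figure.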
#11
Member
Location: Iowa City, IA | Join Date: Jul 2010
Posts: 95
Look at the motherboard, because what you want is expandability. Boards for the AMD 6100 typically come with 1, 2, or 4 processor sockets and have 8, 16, or 32 memory slots respectively. Processor sockets need to be populated with identical CPUs (not all sockets need to be filled), and memory slots should be populated in groups of 4 identical sticks.

Your 24GB option is likely a 1P board with 4x4GB + 4x2GB, and thus would fill all of your CPU and memory slots (no expandability without throwing away components). The 16GB configuration would likely be a 2P board with 8x2GB or 4x4GB, and thus would leave 8 or 12 open memory slots (room to grow).
#12
Senior Member
Location: SEA | Join Date: Nov 2009
Posts: 203
Quote:
I am curious, though: if I were to use an SSD as swap, would I be in a sweet spot for $$ vs speed? But I guess it's a moot question, since for some reason I can't find programs that let you choose whether to write to disk or use RAM.
__________________
http://kevin-gattaca.blogspot.com/
#13
Senior Member
Location: The University of Melbourne, AUSTRALIA | Join Date: Apr 2008
Posts: 275
Quote:
HDD ~ 75 MB/s
SSD ~ 300 MB/s
RAM ~ 10000 MB/s (!)

Quote:

Some software is now being intelligently written to exploit the RAM/HDD tradeoff, for example this read mapper: Syzygy
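If you want to see where your own disks fall on that scale, a sequential-write test with `dd` is a quick sanity check. A sketch (the file name and size are arbitrary; `conv=fdatasync` forces the data to actually reach the disk rather than sitting in the page cache, so the reported rate is honest):

```shell
# Write 256 MB sequentially; dd reports the throughput on stderr when done.
dd if=/dev/zero of=ddtest.bin bs=1M count=256 conv=fdatasync
rm -f ddtest.bin
```

This measures only sequential writes; random I/O, which matters just as much for assembly scratch space, needs a dedicated tool.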
#14
Senior Member
Location: SEA | Join Date: Nov 2009
Posts: 203
Quote:
You can easily have SATA SSD RAID. 4 x SSD would give ~1200 MB/s by your numbers, only 8.33x slower than RAM!

By the way, your URL is not formatted properly; it went to some weird site: http://www.nicta.com.au/research/res...Q0MDAzJmFsbD0x

Last edited by KevinLam; 08-27-2010 at 01:27 AM.
#15
Senior Member
Location: The University of Melbourne, AUSTRALIA | Join Date: Apr 2008
Posts: 275
Quote:
The other issue is that 4 x SSD RAID0 = 1.2 GB/s = 9.6 Gbit/s. Even SATA3 is only 6.0 Gbit/s, so you have to start investing in more expensive interconnects like 10GigE, multiple FC, etc., and you need a PCIe bus and a CPU<->bus connection that can cope too! My point is that it is still a long way from RAM throughput and latency (SSD = micro/milliseconds, RAM = nanoseconds).
#16
Senior Member
Location: Stanford | Join Date: Jun 2009
Posts: 181
Quote:
Just remember that even the least efficient software is designed to work on someone's machine, so there's an upper limit on how much RAM you'll ever need to run a given program. That limit might be on the order of tens of gigabytes (I've heard 40-50 GB for certain well-known pipelines). But I don't think there's a reason to complement that with SSDs, because they're definitely not going to buy you any additional speed as virtual memory.
#17
Senior Member
Location: Cambridge | Join Date: Sep 2010
Posts: 116
PS:

SSDs HATE random writes in small blocks (they want to write in 64/128/256 KB blocks), but are OK with random reads. Hard disks struggle with random reads/writes, and also with simultaneous reading/writing from multiple threads. RAID5 is TERRIBLE for random writing (4-5 times slower than RAID10).

My suggestion for a high-performance system (each group on a separate physical HDD or RAID array):
[System+Software] [SWAP] [SCRATCH] [Input data] [Output data]
or at least:
[System+Software] [SWAP+SCRATCH] [All data]

Remember that having input and output data on the same RAID0 array is always slower than having them on separate disks without RAID (in a memory-constrained situation).

If working with tons of small files (phd_dir), order them by inode number and then read them sequentially; this will be a lot faster (10-20x) than random I/O on the same HDD. Use symlinks to facilitate data processing/organisation.

If you want to use RAID, use RAID10 on a GOOD controller (Adaptec) with at least 0.25-1 GB of onboard cache and its own I/O CPU. The performance gains with cheap onboard controllers (without cache) are often negative... so use separate disks if you can't afford a proper RAID controller.
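The inode-ordering trick above is a one-liner on Linux. A sketch, assuming a hypothetical directory `phd_dir` of trace files (`ls -i` prints each file's inode number, which roughly tracks on-disk layout, so sorting by it approximates sequential reads):

```shell
# Read files in inode order instead of name order for near-sequential I/O.
# Assumes file names without spaces, as is typical for trace files.
ls -i phd_dir | sort -n | awk '{print $2}' \
  | (cd phd_dir && xargs cat > /dev/null)
```

On a cold cache this ordering mostly avoids the head seeks that make name-ordered reads so slow on spinning disks.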