SEQanswers

Old 12-10-2010, 03:45 AM   #1
jdjax
Member
 
Location: Denmark

Join Date: Dec 2010
Posts: 23
Question Server hardware and OS

Hello

I would like to pose a question to computer scientists and other researchers using NGS about the server hardware, OS and software for NGS analysis.

In a few months I will be working with RNA-seq data from an Illumina GAII. I am going to align the reads against the available reference BAC libraries and ESTs, annotate candidate genes, and find SNPs for further plant breeding experiments. I am also going to build a database so that the RNA-seq data and my analysis results are available for future use.

My supervisor has asked me to handle purchasing, setting up and maintaining the server for this project. The IT director likes Dell servers and has mentioned getting another Dell for this project, but I just cringe at the thought of running Linux on a Dell.
I personally want a Sun Fire x64 server with the Solaris OS, mainly because of ZFS.

Considering the RNA-seq analysis software and the storage/backup of hundreds of GB -- which server hardware and OS would work best?

I want to thank you in advance for your opinion.
__________________
jdjax
Ph.d. Student
Aarhus University
Old 12-10-2010, 05:34 AM   #2
ffinkernagel
Senior Member
 
Location: Marburg, Germany

Join Date: Oct 2009
Posts: 110
Default

Yeah, we all want a decent file system, with checksums.
I haven't had much luck with OpenSolaris on third-party hardware, though that won't be a problem with a machine from Sun. Keep in mind that you'll need a service agreement.

RAID, in any variation, is no replacement for backups, though.

Don't underestimate the difficulty of getting analysis software written for Linux running on Solaris -- it can be done, but I would not call it trivial.
Potentially you can run a virtual Linux on it (via Xen), but don't skimp on the RAM: ZFS supposedly likes a big cache, and you have to statically assign the RAM between the two systems.
Old 12-10-2010, 07:09 AM   #3
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

I would get Linux servers for computation. As ffinkernagel implied, getting analysis software running on Solaris can be either a pain or impossible. We have both Solaris and Linux (CentOS/Red Hat) machines, and there is simply some software we cannot run on Solaris. The reverse is true only in a few edge cases.

The file system is a different matter and, arguably, should be decoupled from the compute machines. For the file system you will want to consider both fast access for computing and longer-term slow storage (as well as archival media). We use both a BlueArc system (expensive at $18K for 7.5 TB, but it handles everything we throw at it) and Sun "thumpers" (aka X4500s; 48 TB using ZFS, and relatively cheap).


Personally, unless you are running a really big set of machines, I would let someone else (your IT guys or the "cloud") handle the hardware and base OS software. Instead, concentrate on the science and the analysis of the NGS data. There are enough headaches in that.
Old 12-10-2010, 07:12 AM   #4
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258
Default

Quote:
Originally Posted by jdjax View Post
My supervisor has asked me to handle purchasing, setting up and maintaining the server for this project. The IT director likes Dell servers and has mentioned getting another Dell for this project, but I just cringe at the thought of running Linux on a Dell.
I personally want a Sun Fire x64 server with the Solaris OS, mainly because of ZFS.

Considering the RNA-seq analysis software and the storage/backup of hundreds of GB -- which server hardware and OS would work best?

I want to thank you in advance for your opinion.
I have direct experience with Nexenta CP 3 + ZFS. The question is: why do you need ZFS? If you need it for dedup, consider that most Illumina data won't be duplicated; only some of your analysis output will be. If you need it for compression, consider that most of the data will already be in a compressed format. If you need it for FS healing, that is great, but a RAID 6 may be enough. I do run Nexenta + ZFS for a Galaxy instance, where there's a high chance of having duplicated big text files. ZFS is not the best-performing FS (unless you are able to tune it very well). Also, I had to spend some additional time building 64-bit apps for NGS.
We have an HP server + an MSA disk array (RAID 6) + Ubuntu 10.10, and it works great!
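For reference, the ZFS features being weighed here map onto a handful of admin commands. This is only a hedged sketch: the pool name "tank" and the disk names are made up, and the `run` wrapper echoes each command instead of executing it when the ZFS tools are not installed.

```shell
# Sketch of the ZFS features discussed above: healing via checksummed scrubs,
# transparent compression, and block-level dedup. Pool and disk names are
# hypothetical; "run" prints the command instead of executing it when ZFS
# is not installed, so the sketch is safe to read through anywhere.
run() {
    if command -v zpool >/dev/null 2>&1; then "$@"; else echo "would run: $*"; fi
}
run zpool create tank raidz2 sdb sdc sdd sde   # double parity, like RAID 6
run zfs set compression=on tank                # transparent compression
run zfs set dedup=on tank                      # block-level deduplication
run zpool scrub tank                           # re-verify every block checksum
```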

d
Old 12-12-2010, 11:38 PM   #5
jdjax
Member
 
Location: Denmark

Join Date: Dec 2010
Posts: 23
Default

Thanks for your input. I appreciate your help.
__________________
jdjax
Ph.d. Student
Aarhus University
Old 12-13-2010, 05:38 AM   #6
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323
Default

My 2 cents,

It seems you have some experience with Unix. I'd suggest you take care of the project yourself. Yes, it is going to be more work for you, but you will have full root access to the hardware. If there is any problem, you can blame yourself.

It is a pity that most of the scientific hardware these days is only tested on Windows, I mean, Linux.

Here is another alternative to the single machine approach:

Hardware

+ 24U rack -- consider a 42U if you are planning to expand
+ 1 APC (2U)
+ 4 × 8-core, 32 GB, 1U machines, each with 2 × 1 TB 10k rpm SATA-II drives (16 GB would be enough); people seem to like HP -- any other suggestions?

+ 1 Gb network switch (1U) -- what do you guys use for this?
+ 1 × 2-core, 8 GB, 1U machine (storage server)
+ 20 TB external disk storage (check the suggestions in this thread)
+ power strip (1)
+ something else?

Software

+ Your favorite Linux distro on the machines
+ GApipeline
+ NFS
+ SAMBA (storage server)
+ ...

The GAII Windows box dumps data via Samba onto the external storage.
You can compute the GERALD step on the external storage, or move the necessary
bits (lane by lane) to the local 1 TB disks if you want to speed things up.
Disable ELAND and keep only the GApipeline stats. Compute the alignments with BWA.
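A minimal sketch of that lane-by-lane flow, using made-up /tmp paths in place of the Samba mount and the local 1 TB scratch disk; the alignment assumes BWA's short-read aln/samse workflow, and those calls are skipped when bwa is not installed.

```shell
# Stage one lane from the (Samba-mounted) storage onto fast local scratch,
# then align it. All paths and file names here are placeholders.
mkdir -p /tmp/storage/run_42 /tmp/scratch
LANE=s_1_sequence.txt
: > "/tmp/storage/run_42/$LANE"                # stand-in for real lane data
cp "/tmp/storage/run_42/$LANE" /tmp/scratch/   # move the work onto local disk
cd /tmp/scratch
if command -v bwa >/dev/null 2>&1; then
    bwa index ref.fa                           # one-off reference indexing
    bwa aln -t 8 ref.fa "$LANE" > lane1.sai    # 8 threads per node
    bwa samse ref.fa lane1.sai "$LANE" > lane1.sam
fi
```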

Once you have this up and running, explore installing and setting up a job scheduler:
SGE seems to be the favorite out there (PBS is good too).
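As a rough illustration, an SGE submission for one alignment job might look like the following; the job name, slot count, memory limit, and the commented-out alignment command are all placeholders.

```shell
# Write a minimal SGE job script; the "#$" lines are scheduler directives.
cat > align.sh <<'EOF'
#!/bin/bash
#$ -N align_lane1         # job name
#$ -cwd                   # run from the submission directory
#$ -pe smp 8              # request 8 slots on one node
#$ -l h_vmem=4G           # memory limit per slot
echo "aligning on $(hostname) with ${NSLOTS:-1} slots"
# bwa aln -t "${NSLOTS:-1}" ref.fa lane1.txt > lane1.sai   # hypothetical
EOF
# Submit it only if SGE is actually installed:
if command -v qsub >/dev/null 2>&1; then qsub align.sh; fi
```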
__________________
-drd
Old 12-13-2010, 10:21 AM   #7
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

All respect to Drio; however, I want to reiterate one of his sentences, changing it to re-emphasize my point:

Quote:
Originally Posted by drio View Post
It seems you have some experience with Unix. I'd suggest you take care of the project yourself. Yes, it is going to be more work for you, but you will have full root access to the hardware. If there is any problem, you can blame yourself.
Taking his last sentence, I will rewrite it a bit: "If there is any problem, you can spend hours fixing it by yourself".

There is no doubt that there is a lot of fun and a lot of control in building and running your own systems, and you can learn a lot of computer geekiness that will help out in the future. But if you have competent IT staff who listen to your needs and are responsive in fixing problems, then by all means let them do the work and handle the headaches. Concentrate on what you are good at -- NGS analysis.

There are so many computer-related tasks that we no longer do ourselves -- e.g., running Cat-5 or fiber wiring throughout our building, the network routing to get our machines out to the internet, mail and web servers -- why should setting up a compute and disk cluster be any different? Why should we who focus on bioinformatics do the grunt work? Instead, let the seriously knowledgeable people spend the time to make everything run smoothly. If you want the experience of networking, building machines, and setting up servers, then doing this at home is a low-cost, low-pressure way of learning -- no one will be yelling at you in the middle of the night to get the cluster back up and running.

On the other hand if your supervisor is saying "let's set up our own cluster because we can do it better than they can and, by the way, here is the $$$$ to build it properly", well then, just take the money and have fun. Just don't expect to get much sleep.


Going back to Drio's specifications: I'd recommend two 8-core machines with 64+ GB each over four lower-memory machines. A program can always run longer if there is not enough compute power, but it can rarely be made to run at all if there is not enough memory.

The 42U cabinet seems like overkill to me; I'd stick with 24U, although, as usual, if money is no object then why not 42U? I'd just spend the $$$ elsewhere. Drio's recommendation takes up about 12U. If you do start expanding, you can always buy another rack.

A proper server will have OOB (out-of-band) management capability. Hook the OOB ports to a separate switch for redundancy; it doesn't need to be 1 Gb.

Carefully spec out the UPS (what Drio is calling the APC -- a brand name) so that it will cover your power requirements. Give yourself lots of extra capacity.

HP, Dell, IBM: it doesn't matter much as long as you buy from their server lines. Equally important is to get a 3-year, or better yet 5-year, service contract up front and paid for; that way you will not have to worry about failures. Plan on a 5-year time to obsolescence for your cluster.

Heating, cooling, power outlets and noise: while a 24U or 42U rack will easily fit in a lab, the heat given off may surprise you, and putting it into a back corner of the lab may not work. Be prepared. Check to make sure that you have adequate and dedicated power outlets. Multiple smaller racks can help with the heating/cooling issue since, in theory, you can spread them around the lab.
Old 12-13-2010, 12:08 PM   #8
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323
Default

Good discussion.

I see your point, and it makes sense. But since this is a single GAII machine and jdjax seems
to have some Unix and systems skills, this would be a great opportunity for expanding his knowledge while keeping full control over the different elements of the pipeline (the informatics only, of course).

Sorry about the APC; that was the brand of the last UPS I used.

Can you elaborate more on your network switches? For what he is doing, a typical 1 Gb switch would be
fine, but I'd like to know what people use when there are more sequencers.
__________________
-drd
Old 12-13-2010, 12:59 PM   #9
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Quote:
Originally Posted by drio View Post
Can you elaborate more on your network switches? For what he is doing, a typical 1 Gb switch would be
fine, but I'd like to know what people use when there are more sequencers.
On the switches? I haven't paid much attention; they are pretty generic. The 454, which does not have a built-in cluster, is plugged into a Linksys 1 Gb switch which then goes into the wall. The SOLiD (in a different room), which does have a vendor-supplied cluster, has some sort of switch for that cluster; probably a Dell-rebranded one, since the cluster is made of Dell computers. I just plugged the cable in without looking hard at it.


The "wall" (Purdue-provided networking) is probably Cisco intra- and inter-building.

Our compute cluster -- which we built some time ago from bits and pieces -- has a 48-port TrendNet TEG-448WS 1 Gb switch for the main traffic and a D-Link DGS-1016D for the OOB traffic.

Our purchased-by-us-but-run-by-Purdue-IT shared cluster has 10 Gb switches, useful for flinging around the data from a SOLiD run, especially to a BlueArc storage system; 10 Gb is less useful for the 454 runs. At one time I knew the brand name of the 10 Gb switch (Purdue advertised it as the "first academic 10 Gb cluster" and gave out T-shirts with the vendors on them to those of us who helped build the cluster; however, my shirt gave up the ghost a while back).


A 1 Gb switch will be good enough for jdjax's project; I doubt 10 Gb would be useful. I also think that almost any brand-name switch will work.
Old 12-13-2010, 11:45 PM   #10
jdjax
Member
 
Location: Denmark

Join Date: Dec 2010
Posts: 23
Default

Thank you for all your input. There is a cold server room in the basement where the server will be placed. Money is not too much of a problem; we have about $60,000 to spend on this, including service packages.

The IT guys only have experience with Windows, so they will not be able to help me with a Linux/Unix OS. Also, the majority of the network and the other servers are Dells, so I am not sure how helpful the IT guys will be. The IT director did mention that he can take care of all the hardware, but the software I will have to take care of myself.

I have basic knowledge of Solaris, and I have worked with Ubuntu on my desktop. I am assuming that the Ubuntu Server OS will be similar, so to save myself from future headaches I can just go with Ubuntu.

Most of my experience is with software; hardware is the one portion of the system where I am lacking. Thank you for all your suggestions; I have been spending a lot of time 'Googling' all the terms listed so I have a better idea what you are referring to.

If you have any more suggestions or comments, please post them. Thanks again.
__________________
jdjax
Ph.d. Student
Aarhus University
Old 12-14-2010, 04:22 AM   #11
jdjax
Member
 
Location: Denmark

Join Date: Dec 2010
Posts: 23
Default

UMM..... I would appreciate your help again.

The IT guy wants to have Red Hat Linux or SUSE Linux on the server, BUT from looking over this section of the forum there seem to be a lot of problems with these OSes.

What are your thoughts and concerns!?!

Thanks.

The IT guy and the server specialist he likes suggested these specs for the server:

HP DL585G7 6176SE 4P 64GB ICE
* 4 × 12-core AMD CPUs
* 2.3 GHz per core
* 104 GB memory
* 12 MB L3 cache
* 2 × 146 GB 15K SAS drives
* P410 Smart Array with 1 GB FBWC
* 4 × 1 Gbit NIC ports

Thanks again.
__________________
jdjax
Ph.d. Student
Aarhus University

Last edited by jdjax; 12-14-2010 at 04:26 AM. Reason: adding more information
Old 12-14-2010, 04:53 AM   #12
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323
Default

Looks good... and expensive?

The drives are pretty small. Not that it will be a problem for what you are trying to do right now, but having a local scratch area can be very useful.

On the other hand, with 104 GB of RAM you will get a 52 GB ramdisk (/dev/shm). That's very useful too.
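For example, pointing sort's temporary directory at the /dev/shm tmpfs keeps external-sort spill files in RAM. A small self-contained sketch (the input file here is just a generated stand-in for real data):

```shell
# Use the /dev/shm tmpfs as scratch space for an external sort.
seq 1000000 | shuf > positions.txt               # stand-in for a big text file
sort -n -T /dev/shm positions.txt > positions.sorted.txt
head -n 3 positions.sorted.txt                   # prints 1, 2, 3 (one per line)
```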
__________________
-drd
Old 01-14-2011, 06:18 PM   #13
amitra
Junior Member
 
Location: galveston

Join Date: May 2009
Posts: 8
Default

Just wanted some comments on RAID arrays created using solid-state disks.

I got around 547 MB/s consistent sequential read throughput with two Crucial C300 SSDs in RAID 0.
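For anyone wanting to reproduce that kind of number, a crude sequential-throughput check can be done with dd (a sketch only; a cached re-read inflates the figure, and dedicated tools like bonnie++ give more honest results). Building the RAID 0 itself would be something like the commented mdadm line, with hypothetical device names.

```shell
# Crude sequential write/read throughput check; dd reports the rate on stderr.
# mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc  # hypothetical RAID 0
dd if=/dev/zero of=ddtest.bin bs=1M count=64 conv=fdatasync   # write test (64 MB, flushed)
dd if=ddtest.bin of=/dev/null bs=1M                           # read test (likely cached)
```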