SEQanswers

01-02-2015, 10:39 AM   #1
eb0906
Junior Member

Location: North Dakota

Join Date: Jan 2015
Posts: 3
Hardware for NGS analysis - GPU vs CPU?

Hi all,

Our small core lab purchased two Dell Precision T7610 tower workstations, each equipped with one Intel Xeon E5-2687W v2 eight-core 3.4 GHz processor (Turbo, 25 MB cache), 64 GB of 1866 MHz DDR3 RAM, a 1 GB NVIDIA Quadro K600 video card, a 256 GB solid-state drive, two 1 TB SATA drives, a DVD-RW drive, a 10 Gb network adapter, and an NVIDIA Tesla K20C compute processor.

I am a novice user, but some initial thoughts I have are:

1) Do we have enough RAM to support multiple (2-3) RNA-seq analyses? For example, alignments, mapping, differential expression analysis, etc.

2) Do we need an additional CPU? (Assuming we will be analyzing at least 2 RNA-seq experiments at any given time and will have additional users (2-3) logged on and trying to analyze their own data.)

3) It is my understanding that the greatest limiting factor in computational requirements for NGS analysis is I/O. At this point, is there any advantage to having a GPU versus CPU when it comes to NGS analysis?
01-02-2015, 11:36 AM   #2
GenoMax
Senior Member

Location: East Coast USA

Join Date: Feb 2008
Posts: 7,079

It is tricky to give meaningful answers to these kinds of questions, since the actual workflow will vary from time to time and it is hard for outsiders to completely understand how your lab and users operate on a daily basis.

But here goes.

#1. Probably. Depending on memory usage, you may have to limit the number of jobs that can run at a given time. If you work with small genomes, it may not be a big problem.

#2. If you do get an additional CPU, you should also look into getting more RAM, at least for one of the two machines (hopefully the RAM slots are not maxed out; otherwise you will need to discard some memory sticks to make room for higher-capacity ones). 2 x 1 TB is not much storage and is not going to be enough to support multiple users (hopefully you have other storage available over the network).

#3. At this time, GPU computing is unlikely to offer any practical benefit in your case.
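
For point #1, here is a minimal Python sketch of capping the number of simultaneous jobs by memory when there is no job scheduler. The function names, the 8 GB reserve, and the per-job memory figure are illustrative assumptions, not measured values; you would need to estimate per-job memory for your own data.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def max_concurrent_jobs(total_gb, per_job_gb, reserve_gb=8.0):
    """Estimate how many jobs fit in RAM at once.

    reserve_gb is held back for the OS and file cache (assumed value).
    """
    usable = total_gb - reserve_gb
    if per_job_gb <= 0 or usable < per_job_gb:
        return 1  # still allow a single job, although it may swap
    return int(usable // per_job_gb)


def run_limited(commands, max_jobs):
    """Run each command (a list of argv strings), at most max_jobs at a time.

    Returns the exit codes in the same order as the commands.
    """
    with ThreadPoolExecutor(max_workers=max(1, max_jobs)) as pool:
        return list(pool.map(lambda argv: subprocess.run(argv).returncode, commands))


# Example: a 64 GB workstation with jobs assumed to need about 20 GB each.
print(max_concurrent_jobs(64, 20))
```

A proper queuing system does this better, but a wrapper like this at least keeps several large jobs from starting at the same moment.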
01-02-2015, 12:09 PM   #3
eb0906
Junior Member

Location: North Dakota

Join Date: Jan 2015
Posts: 3

Thanks, Genomax!

You are right; it is hard to anticipate workflows.

1) It's interesting that you mention we probably have enough RAM. Currently, one of my colleagues is running cuffdiff on 16 C. elegans samples (15M reads/sample), and it looks like it's stalling at the 'Processing Loci' step with 98% of the memory in use. Is this typical? This is our first time using these workstations for RNA-seq analysis, so we are not sure what to expect with processing time.

2) I agree, and yes, we do have additional server space, 20 TB local and 110 TB on the network.
01-02-2015, 12:36 PM   #4
GenoMax
Senior Member

Location: East Coast USA

Join Date: Feb 2008
Posts: 7,079

Quote:
Originally Posted by eb0906
1) It's interesting that you mention we probably have enough RAM. Currently, one of my colleagues is running cuffdiff on 16 C. elegans samples (15M reads/sample), and it looks like it's stalling at the 'Processing Loci' step with 98% of the memory in use. Is this typical? This is our first time using these workstations for RNA-seq analysis, so we are not sure what to expect with processing time.
Is there anything else running on the system (what OS are you running, BTW)? On a single server without a job queuing system, you (or a sysadmin) are going to have to keep an eye on things, since resource-constrained jobs can slow everything to a crawl or, in the worst case, lead to a hung or non-responsive server.

With newer UNIX/Linux distros, just looking at free memory (in top or a similar tool) is not enough. The OS normally uses otherwise idle RAM as cache and releases it as needed, so a low "free" figure on its own does not indicate a problem. If the system starts using a large amount of swap space, then there may be a problem. How much swap is configured on your machines, and have you looked at the swap usage?
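
To make the free-versus-cached distinction concrete, here is a rough Python sketch that reads the standard Linux /proc/meminfo fields and reports how much memory is really available (free plus reclaimable cache) and how much swap is in use. The function names are made up for illustration, and the fallback used when MemAvailable is missing (as on older kernels) is only an approximation.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: kB} dictionary."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def memory_summary(info):
    """Return the available-memory and swap-used fractions."""
    total = info["MemTotal"]
    # Buffers and page cache can be reclaimed, so count them as available
    # when the kernel does not report MemAvailable itself.
    reclaimable = info.get("Buffers", 0) + info.get("Cached", 0)
    available = info.get("MemAvailable", info["MemFree"] + reclaimable)
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    return {
        "available_fraction": available / total,
        "swap_used_fraction": swap_used / swap_total if swap_total else 0.0,
    }


# On a Linux machine:
#     with open("/proc/meminfo") as fh:
#         print(memory_summary(parse_meminfo(fh.read())))
```

A small available fraction combined with a large and growing swap fraction is the pattern to worry about; a small "free" number alone is not.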
01-02-2015, 01:35 PM   #5
eb0906
Junior Member

Location: North Dakota

Join Date: Jan 2015
Posts: 3

The OS is Red Hat Enterprise Linux, and it's a single server with no job queuing system (as far as I know; I have not personally run anything yet).

This is what my colleague sent for the current run:
top - 15:26:10 up 3 days, 5:09, 7 users, load average: 19.31, 19.18, 19.27
Tasks: 392 total, 4 running, 388 sleeping, 0 stopped, 0 zombie
Cpu(s): 51.7%us, 26.1%sy, 0.0%ni, 22.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65919692k total, 65459552k used, 460140k free, 5228k buffers
Swap: 25001980k total, 18036092k used, 6965888k free, 292000k cached

PID  USER PR NI VIRT  RES SHR  S %CPU   %MEM TIME+   COMMAND
5673 usr  20 0  78.8g 60g 1960 S 1016.5 96.9 4879:22 cuffdiff

Is the above helpful? This is all new to me.
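
One way to read the summary lines above is to work out the ratios directly. The Python sketch below does that arithmetic on the numbers from the top output; the function names are made up for illustration, and the core count of 8 comes from the hardware description in the first post.

```python
import re

# Summary lines copied from the top output in the post above.
TOP_SNIPPET = """\
top - 15:26:10 up 3 days, 5:09, 7 users, load average: 19.31, 19.18, 19.27
Mem: 65919692k total, 65459552k used, 460140k free, 5228k buffers
Swap: 25001980k total, 18036092k used, 6965888k free, 292000k cached
"""


def kb_fields(line):
    """Return {label: kB} for every 'NNNk label' pair on a top summary line."""
    return {label: int(num) for num, label in re.findall(r"(\d+)k (\w+)", line)}


def summarize(top_text, physical_cores):
    mem, swap, load1 = {}, {}, None
    for line in top_text.splitlines():
        if line.startswith("Mem:"):
            mem = kb_fields(line)
        elif line.startswith("Swap:"):
            swap = kb_fields(line)
        elif "load average:" in line:
            load1 = float(line.split("load average:")[1].split(",")[0])
    if not mem or not swap or load1 is None:
        raise ValueError("expected Mem:, Swap: and load average lines")
    # Old-style top prints the page-cache size ("cached") on the Swap line.
    reclaimable = mem.get("buffers", mem.get("buffer", 0)) + swap.get("cached", 0)
    return {
        "reclaimable_kb": reclaimable,
        "reclaimable_fraction": reclaimable / mem["total"],
        "swap_used_fraction": swap["used"] / swap["total"],
        "load_per_core": load1 / physical_cores,
    }


report = summarize(TOP_SNIPPET, physical_cores=8)
for key, value in report.items():
    print(key, round(value, 4))
```

On these numbers, buffers plus cache come to under 0.3 GB of the 64 GB, and roughly 72% of the swap is in use, which suggests the memory is genuinely exhausted rather than merely cached.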
01-02-2015, 04:41 PM   #6
cmbetts
Senior Member

Location: Bay Area

Join Date: Jun 2012
Posts: 118

You might need to mask rRNA and other abundant RNA species. I've had similar issues with Cufflinks hanging at this step when processing human RNA-Seq data on a very similarly built workstation. Building a GTF of rRNA from the UCSC RepeatMasker table to use with the -M flag fixed it right up for me. I couldn't find the original thread where I found the solution, but this one seems pretty similar:

http://seqanswers.com/forums/showthread.php?t=12458
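
As a rough illustration of the rRNA mask described above, the Python sketch below turns rows of a UCSC RepeatMasker (rmsk) table dump into GTF lines. It assumes a tab-separated dump with the standard rmsk column order and that rRNA repeats carry repClass "rRNA"; the function name and the gene_id scheme are made up for illustration, so check the output against your own assembly before using it.

```python
# Column order of the UCSC rmsk table (tab-separated dump).
RMSK_COLUMNS = [
    "bin", "swScore", "milliDiv", "milliDel", "milliIns",
    "genoName", "genoStart", "genoEnd", "genoLeft", "strand",
    "repName", "repClass", "repFamily", "repStart", "repEnd", "repLeft", "id",
]


def rmsk_to_gtf(lines, rep_classes=("rRNA",)):
    """Yield one GTF exon line for each rmsk row whose repClass is wanted."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the '#bin ...' header row
        row = dict(zip(RMSK_COLUMNS, line.rstrip("\n").split("\t")))
        if row.get("repClass") not in rep_classes:
            continue
        start = int(row["genoStart"]) + 1  # UCSC starts are 0-based, GTF is 1-based
        end = int(row["genoEnd"])
        gene_id = f'{row["repName"]}_{row["genoName"]}_{start}'
        attrs = f'gene_id "{gene_id}"; transcript_id "{gene_id}";'
        yield "\t".join([
            row["genoName"], "rmsk", "exon", str(start), str(end),
            ".", row["strand"], ".", attrs,
        ])


# Typical use (file names are placeholders):
#     with open("rmsk.txt") as src, open("rRNA_mask.gtf", "w") as dst:
#         for gtf_line in rmsk_to_gtf(src):
#             dst.write(gtf_line + "\n")
```

The resulting file is what you would hand to the -M flag mentioned above.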
01-02-2015, 04:44 PM   #7
cmbetts
Senior Member

Location: Bay Area

Join Date: Jun 2012
Posts: 118

Of course Google finds it for me right after I post. I'm not sure if it's a STAR-specific issue, but I was using STAR as my aligner when I ran into the problem, and I found the solution on their message boards:
https://groups.google.com/forum/#!ms...U/RXnlXBr5oHYJ

Tags
cpu requirement, gpu, ngs data analysis

All times are GMT -8.