SEQanswers

Old 06-18-2012, 12:06 PM   #1
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,982
Intel Xeon Phi co-processors - Supercomputer on a chip

Intel announced Xeon Phi co-processors with a claimed teraflop of performance in a PCIe form factor.

http://blogs.intel.com/technology/20...nd-innovation/

Any code that runs on a standard Xeon is supposed to run on this.
GenoMax is offline   Reply With Quote
Old 08-28-2013, 10:27 AM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Purdue is building a cluster with Xeons plus the Xeon Phi. Impressive statistics, but all for non-bioinformatics programs (if you discount modeling as being bioinformatics). The Phi (for those who do not know) is basically a 1 GHz, 60-core, 4-thread-per-core x86 chip with 8 GB of very fast on-board memory, optimized for floating point. So if I had a program that could work with 240 threads then the Phi would be great. But I am racking my brain for a bioinformatics program that could actually use 240 threads without creating I/O bottlenecks. Any ideas?

Oh yes, going back to GenoMax's post, the presenters at the seminar I went to today did say that the same code will run on the 8-core Xeon chip and the 60-core Phi processor.
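
To make the 240-thread question concrete, here is a minimal sketch of the kind of compute-bound, embarrassingly parallel kernel the Phi is built for -- plain C with OpenMP, with made-up array sizes and a toy all-vs-all distance calculation standing in for real work. In principle the same source builds for the host Xeons or, assuming the Intel toolchain and its -mmic flag, natively for the Phi.

Code:
/* toy_allpairs.c -- compute-bound all-vs-all kernel (illustration only).
 * Host build:        icc -O3 -fopenmp toy_allpairs.c -o toy_host
 * Native Phi build:  icc -O3 -fopenmp -mmic toy_allpairs.c -o toy_mic
 * (sizes below are invented; a real workload would stream data in)     */
#include <stdio.h>
#include <math.h>
#include <omp.h>

#define N   4000      /* number of profiles (made-up)  */
#define LEN 1024      /* values per profile (made-up)  */

static float data[N][LEN];

int main(void)
{
    double checksum = 0.0;

    /* dummy values so the compiler cannot optimise the work away */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < LEN; j++)
            data[i][j] = (float)((i * 31 + j) % 97);

    /* every (i,j) pair is independent, so the outer loop spreads over
       however many hardware threads exist: a few dozen on a dual 8-core
       Xeon node, up to 240 on a 60-core / 4-thread Phi                 */
    #pragma omp parallel for schedule(dynamic) reduction(+:checksum)
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++) {
            float d = 0.0f;
            for (int k = 0; k < LEN; k++) {
                float diff = data[i][k] - data[j][k];
                d += diff * diff;
            }
            checksum += sqrtf(d);
        }

    printf("max threads: %d, checksum: %g\n",
           omp_get_max_threads(), checksum);
    return 0;
}

The point is simply that the code has to expose far more independent, in-memory work than most read-by-read bioinformatics tools do today; anything that touches the disk per iteration will stall long before 240 threads are busy.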
westerman is offline   Reply With Quote
Old 08-28-2013, 11:07 AM   #3
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,982
Default

What kind of storage infrastructure are they planning to use?

We need the new optical interconnects (1.6 terabits/s) promised by Intel now.
GenoMax is offline   Reply With Quote
Old 08-28-2013, 11:41 AM   #4
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Going into specifics: each node has two 8-core Xeons with 64 GB of memory plus two 60-core Xeon Phi coprocessors. The nodes are connected to a 1.6 PB Lustre scratch file system via QDR InfiniBand, quoted at 18 GB/sec throughput. If I got my information down correctly, that 18 GB/sec is not shared with other nodes. And yes, the brochure has the big 'B' implying bytes instead of bits -- Wikipedia says QDR at 12x is 96 Gbit/sec actual, uni-directional, which works out to only about 12 GB/sec.

So it would take, oh, not that long to load a HiSeq lane's worth of data -- but then what would the Phi chips do with it? We need software!
westerman is offline   Reply With Quote
Old 08-28-2013, 01:34 PM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,982
Default

The storage would likely be mounted through an InfiniBand switch, so an individual node would not see anywhere close to the 18 GB/s figure, since that bandwidth would be shared.

Did you get any details about how the Xeon Phi processors show up on the node? Are they going to appear to the OS as "16 + 120" cores? If not, is there some other piece of software that lets the on-board Xeons talk to the co-processors?

As with graphics cards, we are going to be limited by the bandwidth of the PCIe bus.

Last edited by GenoMax; 08-28-2013 at 01:37 PM.
GenoMax is offline   Reply With Quote
Old 08-28-2013, 03:27 PM   #6
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286
Default

Quote:
Originally Posted by GenoMax View Post
As with the graphics cards we are going to be limited by the bandwidth of the PCI-E bus.
PCIe v3 x16 is 15.75 GB/s. For reference, Sandy Bridge QPI is 64 GB/s. So it is still pretty damn good. However, I still don't think there is much in bioinformatics that will make great use of 60 slow cores (and very little RAM) versus 16 fast cores (and potentially lots of RAM). Even things like BLAT require about 2 GB per process. Bowtie or BWA would probably hit that GB/s bandwidth limit before saturating 60 cores, but I might be wrong about that.
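
Just to put numbers on that, here is a quick back-of-envelope (the 100 GB batch size below is an invented illustration, not a real workload): the PCIe hop is noticeable for bulk transfers but hardly fatal; it is the small per-core memory and scattered I/O that hurt more.

Code:
/* back_of_envelope.c -- transfer times at the link speeds quoted above */
#include <stdio.h>

int main(void)
{
    const double batch_gb = 100.0;   /* hypothetical chunk of data, GB   */
    const double pcie_gbs = 15.75;   /* PCIe 3.0 x16, per the post above */
    const double qpi_gbs  = 64.0;    /* Sandy Bridge QPI, per the post   */

    printf("PCIe 3.0 x16: %5.1f s\n", batch_gb / pcie_gbs);  /* ~6.3 s */
    printf("QPI:          %5.1f s\n", batch_gb / qpi_gbs);   /* ~1.6 s */
    return 0;
}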
Wallysb01 is offline   Reply With Quote
Old 08-28-2013, 04:32 PM   #7
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162
Default

This mostly seems like Intel's answer to NVIDIA's Tesla series, which has been out for a while now, only you won't need to tweak your code as much. I know our engineering department has discussed installing a few of these in our cluster, but they already have a bunch of nodes with Tesla cards and, as far as I know, the only people who use them are doing computational modeling.

The Phi seems to be another leap in hardware like the Tesla was, but the software just isn't there yet to take full advantage of it. My suspicion is that by the time a compelling piece of software comes out to use that much processing firepower, the sequencing realm will have advanced to the point that it might not matter.
mcnelson.phd is offline   Reply With Quote
Old 08-28-2013, 11:21 PM   #8
rhinoceros
Senior Member
 
Location: sub-surface moon base

Join Date: Apr 2013
Posts: 372
Default

Wouldn't this be great for things like Blast?
rhinoceros is offline   Reply With Quote
Old 08-29-2013, 03:12 AM   #9
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,982
Default

I keep forgetting that the Phi is ultimately a co-processor and, as such, is going to be limited to some extent.
GenoMax is offline   Reply With Quote
Old 08-29-2013, 04:12 AM   #10
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162
Default

Quote:
Originally Posted by rhinoceros View Post
Wouldn't this be great for things like Blast?
Yes, but BLAST and a lot of similar applications (HMM searches, BLAT, etc.) that would be suited to this chip have already been ported to the Tesla cards, and they still don't get significant usage.

These chips will ultimately find a use in bioinformatics, but I don't expect anything amazing. The problem in bioinformatics isn't really a lack of processing power, but a lack of well-designed software and algorithms that can readily take advantage of that power to do something better and more useful than the previous options.
mcnelson.phd is offline   Reply With Quote
Old 09-03-2013, 10:26 AM   #11
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Back from an extended vacation weekend. In response to GenoMax's comment about InfiniBand (IB) speeds, this is what I am getting from the technical people. They talk about our old cluster and our new cluster. The major differences between the two are (1) the Phi co-processors and (2) all nodes now having 64 GB of memory instead of a mixture of 32 GB and 64 GB.

Quote:
The old cluster's IB fabric is connected at 56 Gbps, but is oversubscribed at 2:1 - if every node is using every IB port in the old cluster at full rate, they'll only get 50% of the maximum bandwidth.

The new cluster's peak is only 40 Gbps, but is not oversubscribed - meaning that
every node could fully use its IB connection and still get line rate.

This applies to the IB fabric in general, including access to the storage space. For access to storage, however, the effective performance will be limited by the underlying storage. The new cluster's scratch storage goes up to 20 GB/sec in aggregate.
So ... as always, there is a bottleneck, but it seems like InfiniBand is not it. Rather, the storage unit would be the culprit, especially if I were hitting it from multiple nodes simultaneously. Or, as some of you mentioned, the PCIe bus.

We still need a good bioinformatics use case for the Phi co-processors. I am not sure if Blast is it, though, since it seems like it would be I/O-limited or memory-limited.

Really, what we need to consider is: which bioinformatics programs are compute-bound? Or where could we trade I/O and memory for compute? As mcnelson.phd said, bioinformatics is generally not limited by CPU power.
westerman is offline   Reply With Quote
Old 09-04-2013, 05:59 AM   #12
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

And just to reinforce what the Phi needs in terms of programming, here is a quote from a Dr. Dobb's article.
Quote:
Succinctly put, the single key concept to understand about Intel Xeon Phi is that the program must express sufficient parallelism and vector capability to achieve high performance. Measurements presented in this tutorial suggest that the application or offload region must use at least 120 concurrent threads of execution.
While I would like to have a reason to use Purdue's new Phi-enabled cluster, I just don't see any bioinformatics program taking advantage of it.
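
For what it's worth, the structure the article is asking for looks something like the sketch below (plain C; the sizes are invented and the kernel is generic rather than a real bioinformatics workload): a parallel outer loop to feed the 120+ threads, and a vectorisable inner loop for the Phi's 512-bit vector units. The "#pragma omp simd" form is OpenMP 4.0; the Intel compiler's older "#pragma simd" would be the equivalent.

Code:
/* parallel_plus_vector.c -- "sufficient parallelism and vector capability" */
#include <stdio.h>
#include <omp.h>

#define ROWS 20000    /* outer-loop trip count: needs to comfortably exceed
                         the ~120 threads the article mentions (made-up)   */
#define COLS 512      /* inner, vectorised loop (made-up)                  */

static float a[ROWS][COLS], b[ROWS][COLS], out[ROWS];

int main(void)
{
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++) {
            a[i][j] = (float)(i + j);
            b[i][j] = (float)(i - j);
        }

    #pragma omp parallel for               /* thread-level parallelism */
    for (int i = 0; i < ROWS; i++) {
        float sum = 0.0f;
        #pragma omp simd reduction(+:sum)  /* vector-level parallelism */
        for (int j = 0; j < COLS; j++)
            sum += a[i][j] * b[i][j];
        out[i] = sum;
    }

    printf("out[0]=%g out[%d]=%g (threads=%d)\n",
           out[0], ROWS - 1, out[ROWS - 1], omp_get_max_threads());
    return 0;
}

Most aligners and assemblers simply are not structured this way, which is a big part of why I don't expect the Phi nodes to help them much.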
westerman is offline   Reply With Quote
Old 09-04-2013, 06:33 AM   #13
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,982
Default

You should still see a nice boost to your normal workload once the new cluster becomes operational.

On a different note, if you have not used the HiSeq Analysis Software (HAS) before, this could be a good test for your cluster. HAS is designed to basically take over all hardware resources on a node (note: a minimum of 48 GB of RAM is needed) and run WGS/enrichment workflows at the maximum speed possible on the hardware in question (HAS will spawn tens, perhaps hundreds in your case, of alignment threads automatically).

If your admins are willing to install HAS (currently supplied as an RPM on iCom) and you have access to a flowcell with human samples (the ISAAC aligner only has hg19 indexes; I have not tried to build others yet), give it a whirl. We have got it working under LSF, though Illumina does not officially support that.

PS: I am not sure whether the Phi processors can access the RAM on the node (i.e. memory other than their own local RAM). If they can't, then this may not be worth pursuing.

Last edited by GenoMax; 09-04-2013 at 06:44 AM.
GenoMax is offline   Reply With Quote
Old 09-04-2013, 08:08 AM   #14
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Quote:
Originally Posted by GenoMax View Post
You should still see a nice boost to your normal workload once the new cluster becomes operational.
Well, the way clusters work here at Purdue is that we researchers buy portions of them. In other words, I will not get an increase in compute capacity (except for testing purposes) unless we spend money. The price of the nodes is not high, but purchasing is, of course, always a balancing act. The new cluster is 2-3x faster (not counting the Phi co-processors) than what I currently use, but it has less memory (64 GB versus 128 GB), so it is arguably not a good purchase, especially since a lot of my work seems to be memory-limited.

If I could make the argument that the new cluster would increase compute speeds by a significant amount by using the Phi, then the purchasing decision would be much easier. As it is, we will probably wait until 2014, when we hope that Central IT (the people running the clusters) will decide that a slower but large-memory cluster would be a Good Thing To Have. Of course, who gets onto the "let's have some bragging rights" supercomputing top-100 list that way? :-(


Another way to get onto the new cluster would be to write an internal grant proposal to port some bioinformatics software to the Phi co-processors. But I am just not coming up with a good idea of what software, if any, could be profitably ported.


Quote:
On a different note if you have not used HiSeq Analysis Software (HAS) before then this could be a good test for your cluster. ...
Correct me if I am wrong, but it appears that HAS is only good for hg19. With only one human sample coming through our lab in the last five or so years, I always look for the magical "uncharacterized plant and animal genomes supported" sticker before getting excited about a piece of software.

Quote:
PS: I am not sure if the Phi processors can access the RAM on the node (i.e. not the local RAM). If they can't then this may not be worth pursuing.
As far as I can tell the Phi processors can access node memory, but only via the "slow" PCIe bus. The recommendation is to transfer ~8 GB of the 64 GB of main memory into the Phi's own memory, let the Phi work on that 8 GB, and then transfer the results back to the node's memory.
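
For reference, that round trip looks roughly like the sketch below using the Intel compiler's offload pragmas (Language Extensions for Offload). This is only a sketch: the buffer size is invented, it needs icc to build, and the exact clause syntax should be checked against Intel's documentation before anyone relies on it.

Code:
/* offload_sketch.c -- move a slice of host memory to the Phi, work on it
 * there, copy the result back.  In practice the slice would be sized to
 * fit the card's ~8 GB and the host would loop over slices of its 64 GB. */
#include <stdio.h>
#include <stdlib.h>

#define N (256 * 1024 * 1024)   /* 1 GB of floats -- made-up slice size */

int main(void)
{
    float *buf = (float *)malloc((size_t)N * sizeof(float));
    if (!buf) return 1;

    for (size_t i = 0; i < N; i++)
        buf[i] = (float)(i & 1023);

    /* copy across the PCIe bus, run on the coprocessor, copy back */
    #pragma offload target(mic:0) inout(buf : length(N))
    {
        #pragma omp parallel for
        for (size_t i = 0; i < N; i++)
            buf[i] = buf[i] * buf[i] + 1.0f;
    }

    printf("buf[42] = %g\n", buf[42]);
    free(buf);
    return 0;
}

The data crosses the PCIe bus twice per slice, so unless the work done on the card is substantial compared to the transfer, the offload buys nothing -- which is the crux of this whole discussion.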
westerman is offline   Reply With Quote
Old 09-04-2013, 09:23 AM   #15
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,982
Default

Quote:
Originally Posted by westerman View Post
Correct me if I am wrong but it appears that HAS is only good for HG19. With only one human sample coming through our lab in the last 5 or so years I always look for the magical "un-characterized plant and animal genomes supported" sticker on software before getting excited about the software.
Isaac (http://bioinformatics.oxfordjournals...ent/29/16/2041) was presumably designed for human data but should be usable for other genomes. Illumina has released pre-built indexes only for hg19, but indexes can be built for other genomes (I have not tried it yet). The Isaac aligner in HAS uses 32-mer seeds.
GenoMax is offline   Reply With Quote
Old 09-04-2013, 10:00 PM   #16
Dario1984
Senior Member
 
Location: Sydney, Australia

Join Date: Jun 2011
Posts: 166
Default

Quote:
Originally Posted by GenoMax View Post
Isaac (http://bioinformatics.oxfordjournals...ent/29/16/2041) was possibly designed for human data but should be usable for other genomes. Illumina has released pre-built indexes only for Hg19 but indexes can be built for other genomes (I have not tried it yet). Isaac aligner in HAS uses 32-mer seeds.
Another PhD student discovered that it doesn't work for the sheep genome, which has '|' characters in its chromosome names.

e.g.

>gi|406684590|gb|CM001582.1| Ovis aries breed Texel chromosome 1, whole genome shotgun sequence
ATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCCC ...

I don't see the point of variant calling for a draft genome, though.
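
If renaming the contigs is acceptable, one possible workaround (untested -- whether the aligner then behaves is exactly what would have to be verified) is to rewrite the headers before building the index, for example turning every '|' into '_'. A minimal filter in C, with a hypothetical input file name:

Code:
/* fix_fasta_headers.c -- replace '|' in FASTA header lines with '_'.
 * usage:  ./fix_fasta_headers < sheep.fa > sheep.renamed.fa
 * (any downstream files that refer to the original sequence names
 *  would have to be renamed to match)                              */
#include <stdio.h>

int main(void)
{
    char line[1 << 16];

    while (fgets(line, sizeof line, stdin)) {
        if (line[0] == '>')               /* header line: rewrite name */
            for (char *p = line; *p; p++)
                if (*p == '|')
                    *p = '_';
        fputs(line, stdout);              /* sequence lines unchanged  */
    }
    return 0;
}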
Dario1984 is offline   Reply With Quote
Old 06-03-2015, 10:17 PM   #17
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415
Default

So, is anyone using the Phi in bioinformatics these days? We do have a great number of compute-bound processes (blastx, for example) in a production pipeline that could be sped up.

I do wonder about its utility for typical I/O- and memory-bound bioinformatics applications such as NGS data processing, though.
colindaven is offline   Reply With Quote
Old 06-04-2015, 06:31 AM   #18
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Quote:
Originally Posted by colindaven View Post
So, is anyone using the Phi in bioinformatics these days? We do have a great number of compute-bound processes (blastx, for example) in a production pipeline that could be sped up.

I do wonder about its utility for typical I/O- and memory-bound bioinformatics applications such as NGS data processing, though.
I haven't found a good use for the Phi co-processors. As you imply, I/O- and memory-bound problems are not a good match.
westerman is offline   Reply With Quote

Tags
hpc, intel phi co-processors, parallel apps
