  • Intel Xeon Phi co-processors - Supercomputer on a chip

    Intel announced Xeon Phi co-processors with a claimed teraflop of performance in a PCIe form factor.



    Any code that runs on a standard Xeon is supposed to run on this.

  • #2
    Purdue is building a cluster with Xeons plus the Xeon Phi. Impressive statistics, but all for non-bioinformatics programs (if you discount modeling as being bioinformatics). The Phi (for those who do not know) is basically a 1 GHz, 60-core, 4-threads-per-core x86 chip with 8 GB of very fast on-board memory, optimized for floating point. So if I had a program that could work with 240 threads, then the Phi would be great. But I am racking my brain for a bioinformatics program that could actually use 240 threads without creating I/O bottlenecks. Any ideas?
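    For anyone wondering what "working with 240 threads" would look like in practice, here is a minimal native-mode sketch (my own code, not from the Purdue docs, and assuming Intel's compiler with its -mmic and -openmp flags):

        /* Minimal OpenMP sketch: saturate the Phi's 60 cores x 4 hardware
           threads with a trivially parallel floating-point loop.
           Hypothetical build: icc -mmic -openmp threads240.c */
        #include <omp.h>
        #include <stdio.h>

        int main(void) {
            omp_set_num_threads(240);      /* 60 cores * 4 threads/core */
            double sum = 0.0;
            #pragma omp parallel for reduction(+:sum)
            for (long i = 0; i < 240000000L; i++)
                sum += (double)i * 1e-9;   /* keep the FP units busy */
            printf("max threads = %d, sum = %g\n",
                   omp_get_max_threads(), sum);
            return 0;
        }

    The hard part, of course, is not spawning 240 threads but feeding them real bioinformatics work without stalling on I/O.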

    Oh yes, going back to GenoMax's post, the presenters at the seminar I went to today did say that the same code will run on the 8-core Xeon chip and the 60-core Phi processor.

    • #3
      What kind of storage infrastructure are they planning to use?

      We need the new optical interconnects (1.6 terabits/s) promised by Intel now.

      • #4
        Going into specifics: each node has two 8-core Xeons with 64 GB of memory plus two 60-core Xeon Phi coprocessors. The nodes are connected to a 1.6 PB "Lustre" scratch file space via QDR InfiniBand at 18 GB/s throughput. If I got my information down correctly, that 18 GB/s is not shared with other nodes. And yes, the brochure has the big 'B', implying bytes instead of bits -- Wikipedia says QDR at 12x is 96 Gbit/s actual, unidirectional.

        So it would take, oh, not that long to load a HiSeq lane's worth of data -- but then what would the Phi chip do with it? We need software!
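        As a back-of-envelope check (the lane size is my assumption, not from the brochure): a compressed HiSeq lane might run ~50 GB, and 50 GB / 18 GB/s is roughly 3 seconds, so getting data onto a node really is not the bottleneck.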

        • #5
          The storage would likely be mounted through an InfiniBand switch, so an individual node would not see anywhere close to the 18 GB/s speed, since that bandwidth would be shared.

          Did you get any details about how the Xeon Phi processors show up on the node? Are they going to appear to the OS as "16 + 120" cores? If not, is there some other piece of software that allows the on-board Xeons to talk with the co-processors?

          As with the graphics cards we are going to be limited by the bandwidth of the PCI-E bus.
          Last edited by GenoMax; 08-28-2013, 01:37 PM.

          • #6
            Originally posted by GenoMax View Post
            As with the graphics cards we are going to be limited by the bandwidth of the PCI-E bus.
            PCIe v3 x16 is 15.75 GB/s. For reference, Sandy Bridge QPI is 64 GB/s, so PCIe is still pretty damn good. However, I still don't think there is much, bioinformatically, that will make great use of 60 slow cores (and very little RAM) versus 16 fast cores (and potentially lots of RAM). Even things like BLAT require about 2 GB per process. Bowtie or BWA would probably hit the GB/s bandwidth limitation before soaking 60 cores, but I might be wrong about that.
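            For scale (my arithmetic, not from the spec sheets): even refilling the Phi's entire 8 GB of on-card memory over PCIe v3 x16 takes only about 8 / 15.75, or roughly half a second, so the bus hurts mainly if an application keeps re-shipping its working set instead of streaming it through once.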

            • #7
              This mostly seems like Intel's answer to the Tesla series from NVIDIA that have been out for a while now, only you won't need to tweak your code as much. I know our engineering department has discussed installing a few of these in our cluster, but they already have a bunch of nodes that include Tesla chips and as far as I know the only people who use them are doing computational modeling.

              The Phi seems to be another leap in hardware like the Tesla was, but the software just isn't there yet to take full advantage of it. My suspicion is that by the time a compelling piece of software comes out to use that much processing firepower, the sequencing realm will have advanced to the point where it might not matter.

              • #8
                Wouldn't this be great for things like BLAST?

                • #9
                  I keep forgetting that Phi is ultimately a co-processor and as such is going to be limited to an extent.

                  • #10
                    Originally posted by rhinoceros View Post
                    Wouldn't this be great for things like BLAST?
                    Yes, but BLAST and a lot of similar applications (HMMER, BLAT, etc.) that would be suited for this chip have already been ported to the Tesla chips, and they still don't get significant usage.

                    These chips will ultimately find a use in bioinformatics, but I don't expect anything amazing. The problem in bioinformatics isn't really a lack of processing power; it's a lack of well-designed software and algorithms that can easily and readily take advantage of that power to do something better and more useful than the previous options.

                    • #11
                      Back from an extended vacation weekend. In response to GenoMax's comment about InfiniBand (IB) speeds, this is what I am getting from the technical people. They talk about our old cluster and our new cluster; the major differences between the two are (1) the Phi co-processors and (2) all nodes now having 64 GB of memory instead of a mixture of 32 GB and 64 GB.

                      The old cluster's IB fabric is connected at 56 Gbps but is oversubscribed at 2:1 -- if every node in the old cluster used every IB port at full rate, each would only get 50% of the maximum bandwidth.

                      The new cluster's peak is only 40 Gbps, but it is not oversubscribed -- meaning that every node can fully use its IB connection and still get line rate.
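                      (Worked out: under full load the old fabric effectively delivers 56 / 2 = 28 Gbps per node, which is actually worse than the new cluster's non-oversubscribed 40 Gbps.)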

                      This is for the IB fabric in general, including access to the storage space. For storage access, however, effective performance will be limited by the underlying storage; the new cluster's scratch space delivers up to 20 GB/s in aggregate.

                      So, as always, there is a bottleneck, but it seems InfiniBand is not it. Rather, the storage unit would be the culprit, especially if I were hitting it with multiple nodes simultaneously. Or, as some of you mentioned, the PCIe bus.

                      We still need a good bioinformatics use case for the Phi co-processors. I am not sure BLAST is it, though, since it seems like it would be I/O-limited or memory-limited.

                      Really, what we need to consider is: which bioinformatics programs are compute-bound? Or where could we trade off I/O and memory against compute requirements? As mcnelson.phd said, bioinformatics is generally not limited by CPU power.

                      • #12
                        And just to reinforce what the Phi needs in terms of programming, here is a quote from a Dr. Dobb's article.
                        Succinctly put, the single key concept to understand about Intel Xeon Phi is that the program must express sufficient parallelism and vector capability to achieve high performance. Measurements presented in this tutorial suggest that the application or offload region must use at least 120 concurrent threads of execution.
                        While I would like to have a reason to use Purdue's new Phi-enabled cluster, I just don't see any bioinformatics program taking advantage of it.
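
                        For the curious, this is roughly what that programming model looks like. A hedged sketch using Intel's compiler-specific offload pragmas (LEO); the kernel, names, and sizes are mine, not from the article:

                            /* The offload region must be both threaded (OpenMP)
                               and vectorizable, per the advice quoted above.
                               Requires Intel's icc; plain gcc ignores these pragmas. */
                            #include <omp.h>

                            #define N (1 << 24)

                            __attribute__((target(mic)))   /* also compile for the Phi */
                            void saxpy(float a, const float *x, float *y, long n) {
                                #pragma omp parallel for    /* spread over ~240 HW threads */
                                for (long i = 0; i < n; i++)
                                    y[i] += a * x[i];       /* inner loop vectorizes */
                            }

                            void run_on_phi(float a, float *x, float *y) {
                                /* Ship x and y over PCIe, run on the card, copy y back. */
                                #pragma offload target(mic) in(x : length(N)) inout(y : length(N))
                                saxpy(a, x, y, N);
                            }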

                        • #13
                          You should still see a nice boost to your normal workload once the new cluster becomes operational.

                          On a different note, if you have not used HiSeq Analysis Software (HAS) before, this could be a good test for your cluster. HAS is designed to basically take over all hardware resources on a node (note: a minimum of 48 GB of RAM is needed) and run WGS/enrichment workflows at the maximum speed possible on the hardware in question (HAS will generate tens, perhaps hundreds in your case, of alignment threads automatically).

                          If your admins are willing to install HAS (currently supplied as an RPM on iCom) and you have access to a flowcell with human samples (the Isaac aligner only has hg19 indexes; I have not tried to build others yet), give it a whirl. We have got it working on LSF, though Illumina does not officially support that.

                          PS: I am not sure if the Phi processors can access the RAM on the node (i.e. not the local RAM). If they can't then this may not be worth pursuing.
                          Last edited by GenoMax; 09-04-2013, 06:44 AM.

                          • #14
                            Originally posted by GenoMax View Post
                            You should still see a nice boost to your normal workload once the new cluster becomes operational.
                            Well, the way clusters work here at Purdue is that we researchers buy portions of them. In other words, I will not get an increase in compute ability (except for testing purposes) unless we spend money. The price of the nodes is not high, but purchasing decisions are, of course, always a balancing act. The new cluster is 2-3x faster (not counting the Phi co-processors) than what I currently use, but it has less memory (64 GB versus 128 GB); therefore it is arguably not a good purchase, especially since a lot of my work seems to be memory-limited.

                            If I could make the argument that the new cluster would increase compute speeds by a significant amount via the Phi, then the purchasing decision would be much easier. As it is, we will probably wait until 2014, when we hope that Central IT (the people running the clusters) will decide that a slower but large-memory cluster would be a Good Thing To Have. Of course, nobody gets into the "let's have some bragging rights" supercomputing top-100 list that way. :-(


                            Another way to get onto the new cluster would be to write an internal grant proposal to port some bioinformatics software to the Phi coprocessors. But I am just not getting a good idea of what software, if any, could be profitably ported.


                            On a different note if you have not used HiSeq Analysis Software (HAS) before then this could be a good test for your cluster. ...
                            Correct me if I am wrong, but it appears that HAS is only good for hg19. With only one human sample coming through our lab in the last 5 or so years, I always look for the magical "un-characterized plant and animal genomes supported" sticker on software before getting excited about it.

                            PS: I am not sure if the Phi processors can access the RAM on the node (i.e. not the local RAM). If they can't then this may not be worth pursuing.
                            As far as I can tell, the Phi processors can access node memory, but only via the "slow" PCIe bus. The recommendation is to transfer ~8 GB of the node's 64 GB of main memory to the Phi's memory, let the Phi work on that 8 GB, and then transfer the results back to the node's memory.
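
                            That pattern maps naturally onto the offload pragmas' persistence clauses. A hedged sketch (again Intel-compiler-specific LEO syntax; the buffer, function names, and sizes are illustrative, not from any documentation I was given):

                                /* Ship a big buffer to the Phi once, reuse it across many
                                   offloads, and only copy results back at the end. */
                                __attribute__((target(mic)))
                                void process_chunk(char *buf, long n, int pass);

                                void run_passes(char *buf, long n, int npasses) {
                                    /* First transfer: allocate card memory, keep it resident. */
                                    #pragma offload_transfer target(mic:0) \
                                            in(buf : length(n) alloc_if(1) free_if(0))

                                    for (int pass = 0; pass < npasses; pass++) {
                                        /* Reuse the resident copy; nothing is re-sent over PCIe. */
                                        #pragma offload target(mic:0) \
                                                nocopy(buf : length(n) alloc_if(0) free_if(0))
                                        process_chunk(buf, n, pass);
                                    }

                                    /* Final transfer: copy results back, free the card buffer. */
                                    #pragma offload_transfer target(mic:0) \
                                            out(buf : length(n) alloc_if(0) free_if(1))
                                }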

                            • #15
                              Originally posted by westerman View Post
                              Correct me if I am wrong, but it appears that HAS is only good for hg19. With only one human sample coming through our lab in the last 5 or so years, I always look for the magical "un-characterized plant and animal genomes supported" sticker on software before getting excited about it.
                              Isaac (http://bioinformatics.oxfordjournals...ent/29/16/2041) was possibly designed for human data but should be usable for other genomes. Illumina has released pre-built indexes only for hg19, but indexes can be built for other genomes (I have not tried it yet). The Isaac aligner in HAS uses 32-mer seeds.
