First Intel® Cluster Ready Certified Xeon® Clusters

April 7th, 2009 12:08 pm
Posted by Arend Dittmer

Our first benchmarks with the "real world" applications ANSYS and LS-DYNA show impressive performance numbers for our new Relion servers, which are based on Intel Nehalem CPUs. To make it as easy as possible for our HPC customers to take advantage of these performance gains, we will also be offering Intel Cluster Ready (ICR) certified Nehalem clusters running Penguin Computing's cluster management solution, Scyld ClusterWare.

We at Penguin really like the ICR certification concept. It instills confidence in customers who are unsure whether their applications will run on an HPC cluster "out of the box". Customers who might otherwise be deterred by the seeming complexity of Linux clusters are more likely to purchase a cluster when they see that it has undergone a certification process covering all the software and hardware components their applications require. The fact that a company like Intel is behind this certification initiative adds a lot of credibility.

While ICR increases the overall demand for Linux clusters, the program still allows for differentiation. Scyld ClusterWare, for example, has a unique single system image architecture: it offers a single process space across all systems in the cluster. It also replaces remote execution services with a process migration mechanism, which ensures consistency of the application environment. ICR accommodates this functionality and still meets the objective of ensuring interoperability of clusters and applications.

ICR, though, is still relatively new, and I would be lying if I said that there is no room for improvement. One of the key obstacles to rolling it out on a bigger scale has been the issue of variation. Typically no two clusters we ship are identical, and while the tool has made great progress in accommodating the variance of hardware configurations, there are still many manual steps involved that keep us from applying the certification by default on all Intel based clusters. The other issue, inherent to certifying against many applications, is that each application has unique requirements, so the set of packages needed to satisfy all of them becomes very broad. The Java runtime (JRE), for example, is required by relatively few applications, yet ICR requires the installation of JRE on every compute node. An alternative to a single broad certification may be certification against applications of a certain category or for a specific industry vertical, which would cut down on cluttering systems with packages that are not needed.
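
To illustrate the point about broad requirements, here is a minimal sketch in Python. The application names, categories, and package lists are entirely hypothetical and are not taken from the ICR specification. The sketch only shows how the union of requirements across all applications compares with the union for a single category.

```python
# Hypothetical application requirements; names, categories, and packages
# are illustrative assumptions, not part of the ICR specification.
REQUIREMENTS = {
    "cfd_solver": {"category": "engineering", "packages": {"mpi", "libstdc++"}},
    "crash_sim": {"category": "engineering", "packages": {"mpi", "libgfortran"}},
    "workflow_gui": {"category": "life_sciences", "packages": {"jre", "mpi"}},
}


def required_packages(requirements, category=None):
    """Return the union of packages needed by all applications,
    or only by applications in the given category."""
    needed = set()
    for app in requirements.values():
        if category is None or app["category"] == category:
            needed |= app["packages"]
    return needed


# One broad certification has to cover every application.
broad = required_packages(REQUIREMENTS)

# A category-specific certification covers only that category's applications.
engineering = required_packages(REQUIREMENTS, "engineering")

# Packages that the broad certification forces onto engineering-only clusters.
print(sorted(broad - engineering))
```

In this made-up data set, a cluster certified only for the engineering category would not need the JRE, whereas a single broad certification pulls it in for every node.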

Comments

Comment from skillerne
Time April 29, 2009 at 6:03 am

Great input on areas of improvement for ICR. What can be done by Intel to remove the "manual steps" required per your notes in the blog?


Author Info


Arend Dittmer is Director of Product Management at Penguin Computing and has over ten years of experience in the field of Linux clustering.