How I've experienced changes in HPC…

September 23rd, 2010 12:21 pm
Posted by Christopher Heller

Since this is the first in a series of blog posts I intend to write about HPC and Intel® Cluster Ready (ICR), I’ll start with my own story: a bit about my background, and a short history of how Intel Cluster Ready has changed HPC for me.

How it started and evolved

Coming into HPC in 2005 was a great learning experience for me, as most of my earlier technical experience had been networking-related. This included a position where I used WAP-related technologies in the early "Wild West" days of 802.11 wireless networking to span great distances, providing cheap broadband to rural commercial interests and saving them the cost of ISDN lines and the even more cost-prohibitive T1s.

So, I started in 2005 by working for a year with one of the Intel HPC teams as a contractor, before officially joining Intel in 2006. Among my first tasks at Intel was to build a cluster. At that time this was accomplished by working through a how-to "recipe" document containing a great many steps to manually set up the head (master) node, then cloning that node with “dd” (which was slow) or kick-start (much better), and then continuing to follow the steps to create a cluster "by hand." This method did not scale well beyond ~8 nodes, as there was much room for operator error.

The next "evolution" I experienced was using some home-grown setup scripts from a predecessor and writing some of my own. These were primitive setup scripts that left a lot to be desired, but they were still easier than the “by hand” method. Among these scripts was a relatively simple cluster health check called "cluster-checker.sh." Given a correct hostfile, this script would go through the compute nodes and check for simple problems such as failed pings, broken rsh/ssh connectivity, and a few others.
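To give a feel for what such a check does, here is a minimal sketch in Python. It is not the original "cluster-checker.sh," which is not reproduced here; the function names, the hostfile format (one host name per line, with "#" comments), and the use of a plain TCP connection to port 22 as a stand-in for an ssh test are all my assumptions for illustration.

```python
import socket
import subprocess
from pathlib import Path


def read_hostfile(path):
    """Return host names from a hostfile (assumed: one per line, '#' comments)."""
    hosts = []
    for line in Path(path).read_text().splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            hosts.append(name)
    return hosts


def ping_ok(host, timeout_s=2):
    """True if a single ICMP echo succeeds (uses the Linux ping flags -c/-W)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def port_open(host, port=22, timeout_s=2.0):
    """True if a TCP connection to host:port succeeds (22 is the ssh default).

    This only shows that something is listening; it does not attempt a login.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def check_cluster(hostfile, ping=ping_ok, port_check=port_open):
    """Run every check on every host and return {host: {check_name: passed}}."""
    report = {}
    for host in read_hostfile(hostfile):
        report[host] = {"ping": ping(host), "ssh": port_check(host)}
    return report


def format_report(report):
    """Render one line per host: 'node: OK' or 'node: FAILED check1, check2'."""
    lines = []
    for host, checks in report.items():
        failed = [name for name, passed in checks.items() if not passed]
        status = "OK" if not failed else "FAILED " + ", ".join(failed)
        lines.append(host + ": " + status)
    return "\n".join(lines)
```

Calling print(format_report(check_cluster("hostfile"))) would print one status line per node. The checks are passed in as parameters so that further tests can be swapped in or added without touching the loop. Even so, a script like this only tells you that a node is reachable, which is a large part of why it left a lot to be desired.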

Another useful find was OSCAR - Open Source Cluster Application Resources. What was really neat about OSCAR was that the developers had engineered a setup framework that greatly simplified the setup of a cluster, in most cases. This seemed like absolute luxury at the time. After I joined Intel, I wanted to be a part of this group that had helped make my job easier. (My later experience with NPACI ROCKS was similar as it was also an easier way to set up and configure a cluster.)

Developing Intel Cluster Ready

The research and development work associated with the Intel® Cluster Checker tool was an exciting time! We had to ensure that:

  • The tool enforced the Intel Cluster Ready Specification with the --compliance flag.
  • The wellness checks covered a wide range of functions and could execute in acceptable times.

This required making some tests fairly generic so that they would apply to a large range of hardware/OS/clusterware solutions. The process also included a campaign to encourage ISVs to register their applications as Intel Cluster Ready, verifying that their applications will run successfully on any certified Intel Cluster Ready system. As a result of these registrations, critical libraries and binaries were added to the software images, which allowed the registered applications to “just run.”

Making it easier

With all of these changes, the process of building out and troubleshooting an HPC cluster became significantly easier. These processes encouraged us to further enhance the recipes we had developed over the years, which became the “Intel Cluster Ready Reference recipes.” Developing these recipes was key to better understanding what it meant for the software stack to truly be Intel Cluster Ready compliant, and to seeking out the best ways to automate these recipes in cooperation with the commercial provisioning vendors.

We're proud to say that since we began this endeavor just 3 years ago, a number of provisioning vendors have come to offer ICR-compliant clusterware solutions for OEMs to sell to their partners. Equally important, we are very proud of our OEM and system partners who have modified their own clusterware stacks to align with Intel Cluster Ready. They are the ones who primarily benefit from Intel Cluster Checker. For many, it has become a critical tool, from engineering a cluster recipe (software stack) to checking that stack in manufacturing. This ensures that the customer gets a certified Intel Cluster Ready system, and the customer can then use the Intel Cluster Checker tool to test cluster wellness or to diagnose issues with their cluster system, which simplifies the entire process.

This goes full circle when the customer calls the OEM or provisioning vendor for support and poses the question, “Is your system Intel Cluster Ready?” If so, the process becomes much easier with a tool that can establish the functionality/wellness of a cluster, including diagnosing many common problems we see when dealing with clusters on the system side.

We have seen great success in working with the provisioning vendors and OEMs on compliance with Intel Cluster Ready. This is in no small part due to the features and functionality provided by Intel Cluster Checker. While its primary purpose lies in enforcing the Intel Cluster Ready specification, it has expanded into a tool with more than 100 tests that deal with consistency, performance, and correct configuration. With the auto-configuration tool, it can easily handle heterogeneous clusters, perform single-node and network performance tests, and much more.

It has changed HPC for me. Are you ready for it to change HPC for you?

Learn more about Intel Cluster Ready, and check back for future blogs on related HPC topics!


Author Info
Christopher Heller


Christopher Heller is an HPC solutions engineer for volume High Performance Compute clusters in the Software and Services Group at Intel. He has been with the Intel® Cluster Ready program since 2007. Christopher served for several years on the OSCAR core team and supports many Intel Cluster Ready partners in the ecosystem. Christopher joined Intel in June 2006.