June 12th, 2009 3:27 pm
Posted by Brock Taylor
Tags: HPC, Intel Cluster Ready, updates
When does the software running on a system become long in the tooth (that is, too old, if you aren't familiar with the phrase)? The question is amplified for clusters because the total solution involves more driver instances and more components than a single server or workstation does. Network drivers can play a more vital role in cluster performance, and some systems include additional drivers for high performance fabrics. Clusters also commonly run provisioning middleware, system monitoring utilities, and job management software. From the moment the system software is installed, the clock begins to tick on the age of its components, and it is usually not long before at least one component in the software stack releases a newer version. With any system, deciding when to update is an important question, one that requires weighing many factors such as system stability and correctness. So, when does the availability of a newer version of a component warrant an update to a cluster's software stack?
In my experience, when and why to update the software stack varies widely from cluster to cluster. On systems designed to squeeze out every ounce of performance, updates are probably applied frequently to roll in the latest driver improvements. For instance, if a NIC driver update decreases small-message latency by 10%, there is likely to be great interest in deploying that driver on clusters running applications that are sensitive to interconnect latency. Clusters meant for maximum uptime probably lock down their software for longer periods, favoring stability over incremental improvements. Intel Cluster Ready solutions might favor this approach, because Intel Cluster Ready verifies functionality and performance at deployment time and provides the means to monitor that the cluster remains in a working, stable state. If the software has not been updated, it should still function the same as it did at deployment time. Security is yet another reason to update software. Clusters that are exposed to the "outside world" may need to stay current with operating system security patches, and we know that security patches can be released frequently.
The latest-and-greatest, or frequent-update, model, however, may present bigger hurdles to the cluster user who is not necessarily an expert on the systems side. The question is not "can software be updated after the initial deployment?" but rather "is updating easy enough for all users and administrators to do?" Updating software on a cluster requires following the same mechanisms that were used to build the cluster in the first place. Software is properly added or updated through the provisioning middleware (assuming, of course, that the cluster was built with provisioning middleware), but will non-experts know how to do this? Furthermore, adding or changing parts of the software stack can affect compliance with Intel Cluster Ready (ICR). If a user breaks compliance, the likely result is application failures, support calls, and frustration. Cluster experts may jump on updates immediately, but updates also need to be simple and approachable for non-experts.
So, at some point, the answer to the question, "is my software too old?" is yes. My next post will look at the task of updating an Intel Cluster Ready solution after its initial deployment and highlight important steps in that process.