May 26th, 2009 9:21 am
Posted by Gary Tyreman
Tags: cluster management, improvement, innovation, Intel Cluster Ready, operations, Univa UD
In this space there are a few truths. One is that cluster software and hardware do not remain static. Hardware fails, software is updated and the cluster therefore changes. Beyond provisioning and monitoring, required functionality includes automated package management, improved configuration management and a cluster-oriented operational toolset. (I’ll post additional thoughts in the future about package and configuration management.)
Over the past six years I have discussed management aspects of clusters with a broad cross-section of cluster administrators (large and small), each with their own experiences and views. It was this feedback and perspective that inspired my discussions with Intel and Dell about ISV enablement that ultimately paved the way for Intel Cluster Ready in 2007.
The purpose of the Intel Cluster Ready program was essentially to “level the playing field” for three constituents:
• ISVs who had to deal with too many incompatible stacks,
• End-users who faced too many vendor-specific products,
• The HPC ecosystem that wasted time solving the same problem again and again.
In this post I want to highlight what I consider a “circular irony”: managing a cluster requires cluster ‘management’. That is, cluster management is more than provisioning, scheduling and packaging some pre-existing tools, and although middleware vendors don’t seem to get that, there is hope.
It has always struck me as ironic that the various cluster management packages available today rely on a somewhat narrow set of tools that are positioned as comprising all of cluster “management”. Missing are tools that let operators actually do something, that is, effect changes to the cluster, and I don’t mean monitoring or staring at consoles. It is, after all, a fact that operating costs typically exceed the capital outlay over the life of the cluster. Sites are generally left to create their own tools or to muddle along, forgoing the “network effect” of productized systems software.
Increasing Efficiency of Strategic Assets
High Performance Technical Computing (HPTC) has become linked to an organization’s value chain. Clusters are the dominant architecture in HPTC, and these environments have become a strategic element of the product or service, often underpinning an organization’s competitive advantage. For such a strategic asset, one could make a very strong case for investing in greater efficiency and value!
And Intel Cluster Ready is helping turn that hope into reality.
A key benefit of the Intel Cluster Ready program is that it has allowed Univa engineers to focus on the countless yet-to-be-solved software aspects of the management and operations of a cluster by providing a replicable baseline from which to work.
Cluster users – engineers, scientists, “quants,” analysts and researchers – are interested in the outcome of the science or the software aspects of the cluster. An admin or operator of a cluster is primarily tasked with supporting that: maintaining uptime and handling hardware and software changes as required in support of system use. This involves adding, removing and updating software, installing security patches and replacing failed hardware.
It’s in this stratum that cluster systems software has the greatest room for improvement, and thanks to Intel Cluster Ready (ICR), Univa is able to focus on it.
After all, it’s about time.