The Irony of Cluster Management

May 26th, 2009 9:21 am
Posted by Gary Tyreman

In this space there are a few truths. One is that cluster software and hardware do not remain static. Hardware fails, software is updated and the cluster therefore changes. Beyond provisioning and monitoring, required functionality includes automated package management, improved configuration management and a cluster-oriented operational toolset. (I’ll post additional thoughts in the future about package and configuration management.)

Over the past six years I have discussed management aspects of clusters with a broad cross-section of administrators of clusters large and small, each with their own experiences and views. It was this feedback and perspective that inspired my discussions with Intel and Dell about ISV enablement that ultimately paved the way for Intel Cluster Ready in 2007.

The purpose of the Intel Cluster Ready program was essentially to “level the playing field” for three constituents:

•    ISVs who had to deal with too many incompatible stacks,
•    End-users who faced too many vendor-specific products,
•    The HPC ecosystem that wasted time solving the same problem again and again.

In this post I want to highlight what I consider a “circular irony”: managing a cluster requires cluster ‘management’. That is, cluster management is more than provisioning, scheduling and packaging some pre-existing tools, and although middleware vendors don’t seem to get that, there is hope.

It has always struck me as ironic that the various cluster management packages available today rely on a somewhat narrow set of tools that are positioned as comprising all of cluster “management”. Missing are tools that let operators actually do something, that is, effect changes to the cluster, and I don’t mean monitoring or staring at consoles. It is, after all, a fact that operating costs typically exceed the capital outlay over the life of the cluster. Sites are generally left to create their own tools or to muddle along, forgoing the “network effect” of productized systems software.

Increasing Efficiency of Strategic Assets

High Performance Technical Computing (HPTC) has become linked to an organization’s value chain. Clusters are the dominant architecture in HPTC and these environments have become a strategic element of the product or service, often underpinning an organization’s competitive advantage. For such a strategic asset, one could make a very strong case for developing greater efficiency and value!

And Intel Cluster Ready is helping turn that hope into reality.

A key benefit of the Intel Cluster Ready program is that it has allowed Univa engineers to focus on the countless yet-to-be-solved software aspects of the management and operations of a cluster by providing a replicable baseline from which to work.

Cluster users – engineers, scientists, “quants,” analysts and researchers – are interested in the outcome of the science or the software aspects of the cluster. An admin or operator of a cluster is primarily tasked with supporting that: maintaining uptime and handling hardware and software changes as required in support of system use. This involves adding, removing and updating software, installing security patches and replacing failed hardware.

It’s in this stratum that cluster systems software has the greatest room for improvement and, thanks to ICR, Univa is able to focus on it.

After all, it’s about time.



Author Info
Gary Tyreman


Gary Tyreman brings more than 20 years of executive software experience to his role as the President and CEO of Univa Corporation. Gary leads corporate development and fundraising activities and is the architect of Univa's data center optimization strategy, which couples the strategic addition of Grid Engine expertise with Univa's innovative and industry-leading integrated cloud computing management products. Gary has established Univa as a top multi-national competitor and has expanded the markets the company serves. Prior to taking the position as CEO, Gary spent three years as Univa's Senior Vice President of Products and Alliances.

At Univa UD, Gary is Vice President and General Manager of the High-Performance Computing Division. In this role he oversees all aspects of the company's HPC business, including strategic planning, engineering, marketing, sales and business development. He also directs the growth of the company's online open source community.

Prior to joining Univa UD in 2008, Gary was Vice President and Business Manager for Platform Computing's HPC division. During nearly five years there, he led the company's business planning, innovation and product management efforts while marshaling a team that developed some of the industry's most popular software.

Tyreman was among the first in the industry to recognize the emerging entry-level user in the HPC space and was responsible for developing a vision for how to simplify running applications off the shelf, a key to unlocking value among organizations new to HPC. He worked with Intel Corp. to develop his innovations, which were taken into account when Intel announced the Intel Cluster Ready program last year, making it easier to design, build, sell, program, acquire and deploy clusters built with Intel components.

Prior to his tenure at Platform Computing, Tyreman held a variety of executive positions in product management and marketing in technology growth companies, including Hummingbird, Delano and Itemus.

Gary is actively involved in the standards community and has held key positions in the X Consortium (X.org) and Open Grid Forum.