Capacity vs Capability Clusters

June 16th, 2009 9:35 am
Posted by Douglas Eadline

Does your HPC cluster need 10,000 (or more!) cores? Probably not.

Everyone in the high-performance computing industry watches the Top500 List. Twice a year, the world's fastest computers (mostly clusters) are ranked by how well they run a single very large benchmark program (the High-Performance Linpack, or HPL).

The Top500 List is an interesting competition that measures a great deal of computing muscle, but it also helps track the history of HPC systems; the type and number of processors, the operating system, the amount of memory, and the system architecture are all detailed for each system on the list.

The ranking extends back to 1993, when the list began. As a matter of fact, x86 clusters have only recently joined the list. The fastest x86 cluster in November 2008 used 51,200 cores to run the benchmark.

While these levels of computing are heroic, they don't reflect how most HPC systems are actually built and used.

So, how many cores does an average HPC program use? It depends on who you ask, but there seem to be three distinct types of cluster systems:

  • Small (64 processor cores or less)
  • Large (over 64 cores but below 10,000 cores)
  • Staggering (over 10,000 cores)

While 10,000 cores is an exciting number to visualize, clusters that use the smallest number of cores actually represent the largest segment of the HPC market.

In general, users who run smaller programs often share a cluster with other users. Conversely, the mammoth programs that use thousands of cores often consume every core in the cluster. To differentiate between these two types of usage, clusters are often classified as either a Capacity or a Capability system.

Capacity clusters are the most common and are used to deliver a certain amount of "computing capacity" to the end users. For instance, a capacity cluster may support hundreds of users running any number of programs. These programs require far fewer cores than the total number available in the cluster. In these types of clusters, all the compute resources are managed by a job scheduler, which determines which programs (jobs) run on the cluster, and when.
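To make that concrete, here is a minimal sketch of what a job submission might look like on a capacity cluster, assuming a PBS/Torque-style batch scheduler (the script, program name, and core counts are hypothetical; your site's scheduler and syntax may differ):

    #!/bin/bash
    # Hypothetical job script for a capacity cluster (PBS/Torque-style directives assumed).
    #PBS -N small_job              # a name for the job
    #PBS -l nodes=4:ppn=8          # request 4 nodes with 8 cores each (32 cores total),
    #                              # a small slice of the whole cluster
    #PBS -l walltime=02:00:00      # tell the scheduler how long the job may run

    cd $PBS_O_WORKDIR              # start in the directory the job was submitted from
    mpirun -np 32 ./my_program     # run the (hypothetical) MPI program on the 32 cores granted

The script is handed to the scheduler with something like "qsub small_job.sh"; the job then waits in the queue until 32 cores become free, while hundreds of other users' jobs do the same.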

A Capability cluster, by contrast, is designed to be capable (hence the name) of running large, groundbreaking programs that were previously not possible to run. These systems usually push the limits of cluster technology because large numbers of machines must work together for long periods of time. A capacity cluster, on the other hand, can tolerate the failure of a node and continue running the other users' programs, because no single job spans the entire machine.

Chances are, if you are using a cluster, it is a capacity system. If that is the case, the Top500 might interest you, but it probably has very little to do with your performance. If you are one of the few high-end capability users, however, the Top500 is just for you.




Author Info


Dr. Douglas Eadline has worked with parallel computers since 1988 (anyone remember the Inmos Transputer?). After co-authoring the original Beowulf How-To, he continued to write extensively about Linux HPC clustering and parallel software issues. Much of Doug's early experience has been in software tools and application performance. He has been building and using Linux clusters since 1995. Doug holds a Ph.D. in Chemistry from Lehigh University.