**July 23rd, 2009 12:50 pm**

*Posted by Douglas Eadline*

**Tags:** BLAS, HPL, linear algebra, Linpack, Math Kernel Library, Top500

*Ever wonder what the Top500 list actually measures?*

Every six months the largest computers in the world are ranked as to how many floating point operations per second (FLOPS) they can perform. The results are tabulated on the Top500 list. The actual benchmark is called HPL, which stands for High Performance Linpack. The Linpack benchmark was designed to measure the floating point performance of various systems and the HPL version is designed to run on parallel computers like clusters.

The Linpack problem is something you may have seen in high school. Given a set of linear equations of the form,

3x + 2y - z = 1

2x - 2y + 4z = -2

-x + (1/2)y - z = 0

Solve for *x*,*y*, and *z*. If you recall further, a system of linear equations can be generalized as the following:

A ⋅ x = b

Where *A* is a square matrix of the coefficients (the *3*, *2*, etc. values in the above equations), *x* is a vector of the "unknowns" (*x*, *y*, *z*), and *b* is a vector of the "answers" (1, -2, 0). The Linpack benchmark solves for the unknowns. The size of the test is the number of equations and is often referred to as **N**. If **N** were 10, then *A* would be a 10x10 matrix, *x* would be a list of 10 unknowns, and *b* would be a list of 10 answers.
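To make this concrete, here is a minimal sketch of solving the little system above by Gaussian elimination with back substitution (the `solve` helper is illustrative; real Linpack/HPL does this via an LU factorization tuned for performance):

```python
def solve(A, b):
    """Solve A.x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Build the augmented matrix [A | b] so row operations apply to both.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot.
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    # Back-substitute to recover the unknowns, last equation first.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(M[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (M[row][n] - tail) / M[row][row]
    return x

# The high school example: 3x + 2y - z = 1, 2x - 2y + 4z = -2, -x + (1/2)y - z = 0
print(solve([[3, 2, -1], [2, -2, 4], [-1, 0.5, -1]], [1, -2, 0]))
# x = 1, y = -2, z = -2 (up to floating point rounding)
```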

In terms of memory, if **N** were 10,000, a desktop computer would need about 1GB of memory. If **N** were 1,000,000, then 7.5 TB of memory would be needed. Enter the cluster, where HPL is designed to distribute these really large problems over the nodes. The grunt work on each node is actually done by the BLAS library, which stands for Basic Linear Algebra Subprograms. These routines are used to solve the subproblem given to each node. At various points, the nodes must exchange information using MPI (Message Passing Interface).
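Those memory figures follow from the matrix itself: assuming the usual double-precision (8 bytes per entry) arithmetic, the N×N matrix *A* dominates the storage. A quick back-of-the-envelope sketch:

```python
def hpl_matrix_bytes(n):
    # Each matrix entry is an 8-byte double; the N x N matrix A dominates,
    # with n * n entries (the vectors x and b add only a negligible 2 * n).
    return 8 * n * n

for n in (10_000, 1_000_000):
    nbytes = hpl_matrix_bytes(n)
    print(f"N = {n:>9,}: {nbytes:,} bytes = {nbytes / 2**30:,.2f} GiB")
# N = 10,000 works out to roughly 0.75 GiB ("about 1GB" with workspace),
# and N = 1,000,000 to roughly 7,450 GiB, i.e. about 7.3 TiB.
```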

There are various checks in the program to make sure the calculations are correct. HPL also creates random data for each size problem. To keep things fair, the random data is the same for each value of **N**. The BLAS library is available in many forms. There are several optimized versions that can make a huge difference in performance. For instance, on Intel processors, Intel offers the Math Kernel Library (MKL) that contains very fast hand optimized math routines (including the BLAS libraries).
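What do those BLAS routines actually do? The workhorse for HPL-style solves is dense matrix multiplication (the Level-3 BLAS routine DGEMM). Here is a naive sketch of the computation it performs; optimized BLAS libraries like MKL get their speed by blocking this loop nest for cache and using vector instructions, not by computing anything different:

```python
def dgemm_naive(A, B):
    """Naive C = A * B, the computation the BLAS DGEMM routine optimizes."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]          # hoist A[i][k] out of the inner loop
            for j in range(p):
                C[i][j] += a * B[k][j]
    return C

print(dgemm_naive([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
```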

In addition to problem size, there are many other tunable parameters for the HPL benchmark. A single run can take hours or days, so getting a good "HPL number" can take a long time. It also requires the entire cluster, something that often disappoints regular users. Many applications use HPL-style math, so the benchmark is very relevant for some users. In other cases, where users' codes are not solving *dense linear equations*, the benchmark offers a good historical measure of HPC progress, but little insight into their application performance. Next time you hear HPL, BLAS, Linpack, MKL, and other funny-sounding acronyms, you will know it is just a bunch of software for solving big versions of those little high school math problems.
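For the curious, the FLOPS number itself comes from a fixed formula rather than from counting actual operations: HPL credits a run with 2/3·N³ + 2·N² floating point operations and divides by the wall-clock time. A sketch, using made-up example numbers (the N = 100,000 run and one-hour time below are hypothetical, not a real result):

```python
def linpack_flops(n):
    # HPL's nominal operation count for solving an n x n dense system:
    # 2/3 * n^3 for the LU factorization plus 2 * n^2 for the triangular solves.
    return (2.0 / 3.0) * n**3 + 2.0 * n**2

# Hypothetical run: N = 100,000 finishing in one hour of wall-clock time.
n, seconds = 100_000, 3_600.0
print(f"{linpack_flops(n) / seconds / 1e9:.1f} GFLOPS")
```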


**Douglas Eadline**

Dr. Douglas Eadline has worked with parallel computers since 1988 (anyone remember the Inmos Transputer?). After co-authoring the original Beowulf How-To, he continued to write extensively about Linux HPC clustering and parallel software issues. Much of Doug's early experience has been in software tools and application performance. He has been building and using Linux clusters since 1995. Doug holds a Ph.D. in Chemistry from Lehigh University.