October 8th, 2009 3:45 pm
Posted by Douglas Eadline
Tags: cloud, grid, HPC, InfiniBand, Top500, virtualization
Are clouds a good place to build HPC clusters?
The use of virtualization and multi-core processors has made cloud computing an option for many users. The ability to buy cloud time as you need it and not purchase hardware is certainly attractive from a financial standpoint. The concept is not new and has its roots in time-shared mainframes and grid computing. One might assume that the vast amount of computing resources in clouds makes them ideal candidates for HPC clustering. Unfortunately, it is not as simple as collecting cores.
One of the issues facing clouds is I/O. Basically, I/O is often not predictable or repeatable. From a storage standpoint, read and write times can be fast, but not consistently so. In terms of messages between servers, most clouds do not support high-performance interconnects and similarly make no guarantees as to latency or bandwidth consistency. While grids paid attention to certain HPC performance guarantees in terms of I/O, clouds, in order to offer ease of use, have declined to make such guarantees. Unless a cloud has been specifically designed for HPC, the user cannot expect consistent or high performance. There are two papers that discuss this very idea. The first paper looks at "Benchmarking Amazon EC2 for High-performance Scientific Computing" and the second paper asks, "Can Cloud Computing Reach The TOP500?" Both papers conclude that the cloud is not mature enough for HPC applications.
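One rough way to see the storage side of this for yourself is to time the same write many times and look at the spread rather than the average. The sketch below is only an illustration and is not taken from either paper; the function names (time_writes, summarize), the 1 MB payload, and the 20 repeats are all assumptions chosen for the example. It uses the coefficient of variation (standard deviation divided by mean) as a simple consistency measure.

```python
import os
import statistics
import tempfile
import time


def time_writes(directory, size_bytes=1 << 20, repeats=20):
    """Return the wall-clock time (seconds) of each of `repeats` synced writes.

    Hypothetical helper: payload size and repeat count are arbitrary defaults.
    """
    payload = os.urandom(size_bytes)
    samples = []
    for _ in range(repeats):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            start = time.perf_counter()
            with os.fdopen(fd, "wb") as handle:
                handle.write(payload)
                handle.flush()
                os.fsync(handle.fileno())  # ask the OS to push data to the device
            samples.append(time.perf_counter() - start)
        finally:
            os.remove(path)
    return samples


def summarize(samples):
    """Summarize timings; cv (stdev / mean) is the consistency measure."""
    mean = statistics.mean(samples)
    stdev = statistics.pstdev(samples)
    return {
        "mean": mean,
        "min": min(samples),
        "max": max(samples),
        "cv": stdev / mean if mean else 0.0,
    }


if __name__ == "__main__":
    # Point this at the storage you want to test; a temp directory is used here.
    with tempfile.TemporaryDirectory() as scratch:
        stats = summarize(time_writes(scratch))
    print(stats)
```

Running this on a local disk and again on a cloud instance gives two cv values to compare. A low mean with a high cv is the "fast, but not always fast" behavior described above. This is a quick check, not a substitute for a real I/O benchmark.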
The limitations of the cloud become more apparent when one looks a little deeper at HPC applications. First, many applications rely on user-space communication (i.e., high-performance MPI programs transfer data directly from one node to another without using kernel services). Such close-to-the-wire operation runs counter to the virtualization model. Second, as reported in the first paper mentioned above, the performance of OpenMP applications was reduced by 7-21% when running in the EC2 cloud.
Recently, Penguin Computing began offering POD (Penguin on Demand) for HPC cloud computing. The POD cloud offers both Ethernet and InfiniBand connections between nodes, thus providing a dedicated high-performance computing environment. This service can be considered a specialized HPC cloud.
There are some other important issues to consider with cloud computing: security and reliability. When data leaves your domain over the Internet, it is virtually impossible to guarantee 100% security. If your organization can live with this situation, using the cloud may be an option. If, on the other hand, you need to keep a tight rein on your data, then you may not want to be injecting it into the cloud. The other issue is reliability. If your day-to-day operations are based on using a cloud, then a contingency plan is a must. Interruptions in Internet traffic due to congestion or hardware failures can be common in some areas. In addition, the cloud provider may have issues (even go out of business) and thus not meet the service requirements.
I believe the cloud is an interesting model, but it is not a real solution for HPC (in its current form). My issue with clouds is that they are often categorized as "grid-like" and then are somehow (incorrectly) considered "HPC-like." The cloud offers utility computing as the grid promised, but has pushed the application layer further away from the hardware. HPC practitioners spend a lot of time making sure the application is as close to the hardware as possible. At this point in time, HPC in the cloud is more of a curiosity than a solution. When examining HPC benchmarks, it becomes clear that clouds are not the best means to provide HPC cycles. Whether efforts like POD can meet HPC users' needs in the cloud is still unknown.
To be fair, there are some HPC applications that lend themselves to clouds quite well (i.e., those that do not require predictable I/O). Folding@home and SETI@home are two good examples. These applications could easily run in a cloud (in a sense they do run in the Internet cloud). Keep in mind they have been designed to work in a robust distributed fashion and are not virtualized. Clouds can be enticing and even enabling for some applications, but remember a collection of servers (in the cloud or in a rack) does not a cluster make.