June 24th, 2009 9:51 am
Posted by Douglas Eadline
Tags: HPC, HPN-SSH, openssh, Pittsburgh Supercomputing Center, rsh, scp, sftp, ssh
Using a high performance SSH on your cluster may clear some bottlenecks.
There is an ongoing debate in many cluster circles. Much like the vi vs. emacs wars, the rsh (remote shell) vs. SSH (secure shell) decision always results in a lively discussion. Before we dig into high performance SSH, let's set some background. First, when we refer to SSH, we mean the OpenSSH package included in virtually all Linux distributions. Second, the SSH protocol gives users a secure way to log in to clusters and to transfer data to them. Although user programs are run through the batch scheduler, at some point the scheduler or MPI starter must remotely start processes on cluster nodes, and at that point either rsh or SSH is used.
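To make that launch step concrete, here is a minimal Python sketch of what a launcher does: build one `rsh` or `ssh` command per node and start them all. The function names `build_launch_commands` and `launch` and the node names are hypothetical; real schedulers and MPI starters are far more elaborate. The only SSH option used is the standard OpenSSH `BatchMode` setting.

```python
import subprocess
from typing import List

def build_launch_commands(nodes: List[str], command: str,
                          remote_shell: str = "ssh") -> List[List[str]]:
    """Return one argv list per node that runs `command` on that node."""
    if remote_shell not in ("rsh", "ssh"):
        raise ValueError(f"unsupported remote shell: {remote_shell}")
    prefix = [remote_shell]
    if remote_shell == "ssh":
        # BatchMode=yes makes ssh fail rather than prompt for a password,
        # which is what an unattended launcher needs.
        prefix += ["-o", "BatchMode=yes"]
    # Both rsh and ssh accept "<shell> host command".
    return [prefix + [node, command] for node in nodes]

def launch(nodes: List[str], command: str, remote_shell: str = "ssh") -> List[int]:
    """Start the remote command on every node at once, then wait for all."""
    procs = [subprocess.Popen(argv)
             for argv in build_launch_commands(nodes, command, remote_shell)]
    return [p.wait() for p in procs]
```

For example, `launch(["node01", "node02"], "hostname")` would run `hostname` on both (hypothetical) nodes and return their exit codes. Swapping `remote_shell="rsh"` changes only the command prefix, which is why the rsh vs. SSH choice is often a one-line setting in a launcher.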
What Is Wrong With rsh?
There is nothing particularly wrong with rsh other than that it was not really designed for HPC and clusters. rsh is part of the Berkeley remote commands (or utilities), also known as the "r" commands. These commands allow remote actions on another machine. When they were designed, network computing was a more trusted environment than it is today, and as such the "r" commands transmit everything in clear text. In addition, rsh must bind a privileged source port from the reserved range 512 to 1023, which gives it at most 512 ports and therefore caps the number of simultaneous rsh sessions from one host at roughly 512 (fewer if a second port is used for the stderr channel). This limit can be a problem for large clusters. rsh can also be surprisingly hard to configure properly due to the confounding issues of PAM, TCP wrappers, firewalls, etc. You may wish to consult the Beowulf Mailing List archives for some lively discussion on this topic.
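The port arithmetic behind that limit can be sketched in a few lines of Python. This is an illustration only: the range comes from the privileged-port convention, while the exact search order and end points vary slightly between rresvport() implementations, so treat the counts as upper bounds.

```python
# Privileged ("reserved") ports lie below 1024. rsh's client library
# binds its source port from the upper half of that space.
RESERVED_TOP = 1023
RESERVED_BOTTOM = 512

def rsh_source_ports():
    """Candidate source ports for rsh, tried from the top down."""
    return list(range(RESERVED_TOP, RESERVED_BOTTOM - 1, -1))

def max_concurrent_sessions(ports_per_session: int) -> int:
    """Upper bound on simultaneous rsh sessions from one client host."""
    return len(rsh_source_ports()) // ports_per_session

print(len(rsh_source_ports()))      # 512 candidate ports
print(max_concurrent_sessions(1))   # 512 sessions, one port each
print(max_concurrent_sessions(2))   # 256 if stderr needs a second port
```

Because every rsh session on a host competes for the same small pool, a launcher fanning out to hundreds of nodes from one machine can exhaust it. An SSH client has no such requirement by default.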
Because rsh (along with rcp and telnet) lacks encryption, its use is discouraged on the open Internet and even within LANs. A secure protocol called SSH was developed in 1995 by Tatu Ylönen while he was working at the Helsinki University of Technology in Finland. (Note: Finland is also the birthplace of Linux.) In addition to being secure, SSH did not have rsh's privileged port limit, which made it more suitable for cluster computing.
What Is Wrong With SSH?
OpenSSH is the de facto standard for secure communication across the Internet. Using SSH, you can securely log in to any computer that supports the SSH protocol (provided you have an account). Within a cluster, security between nodes is not a major concern, but any time passwords travel as plain text, administrators get nervous. Beyond security (authentication and encryption), OpenSSH offers much better usability and utility than rsh: it is easier to set up, manage, and debug.
In clusters and HPC, data is often moved securely using SFTP (secure FTP) and SCP (secure copy). These services are part of the standard OpenSSH package, so both authentication and transfers are secure. One problem HPC users have with OpenSSH is that on today's networks, SSH-based transfers are quite slow. It is often assumed that encryption limits the performance, but with today's networks that is usually not the case. The rate-limiting factor is the "receive window size," and in standard OpenSSH it is quite small. Because the sender can have at most one window of unacknowledged data in flight per round trip, throughput can never exceed the window size divided by the round-trip time, no matter how fast the network is.
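A short Python calculation shows why a small window hurts most on long, fast paths. The 64 KiB window and 100 ms round-trip time below are hypothetical values chosen for illustration, not measured OpenSSH settings; the formulas are the standard window/RTT bound and the bandwidth-delay product.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when one window can be in flight per RTT."""
    return window_bytes * 8 / rtt_seconds

def window_needed_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: the window required to keep the link full."""
    return link_bps * rtt_seconds / 8

# Hypothetical example: a 64 KiB window on a 100 ms round trip.
cap = max_throughput_bps(64 * 1024, 0.100)
print(f"{cap / 1e6:.1f} Mb/s")    # 5.2 Mb/s, regardless of link speed

# Window needed to fill a 1 Gb/s link with the same 100 ms round trip.
need = window_needed_bytes(1e9, 0.100)
print(f"{need / 1e6:.1f} MB")     # 12.5 MB
```

Since SSH runs its own per-channel flow-control window on top of TCP, the smaller of the two windows sets the ceiling, so enlarging only the TCP buffers does not help if the SSH window stays small.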
High Performance SSH
Because SSH is a widely available protocol, researchers at the Pittsburgh Supercomputing Center decided to "patch" (change the underlying source code of) the OpenSSH package. Their efforts have resulted in the HPN-SSH project. Their modifications have transformed OpenSSH from a slow but reliable protocol into a fast and reliable protocol that is much more suitable for HPC users.
Essentially, the HPN-SSH team made the receive window adjustable, and as a result they were able to increase performance significantly. Once they had opened that throughput bottleneck, however, they noticed that the encryption step had become rate limiting. Fortunately, they found a way to parallelize the encryption step and use the multiple cores in today's processors to improve performance. The results of their work are shown in Figure One below. (The figure is reproduced from the paper "High Speed Bulk Data Transfer Using the SSH Protocol" by Chris Rapier and Benjamin Bennett of the Pittsburgh Supercomputing Center.)
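Why can encryption be split across cores at all? One common answer is counter (CTR) mode, where each keystream block depends only on the key, a nonce, and the block's index, so different blocks can be computed independently. The Python sketch below assumes a CTR-style cipher and is not HPN-SSH's code. Its `keystream_block` is a toy SHA-256-based stand-in for AES, not real AES, and must not be used for actual encryption. It only shows that splitting the counter range among workers gives the same result as computing it sequentially.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # bytes per keystream block, matching AES's block size

def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Toy stand-in for AES(key, nonce || counter). NOT real AES and NOT
    # secure; it only models that each block depends on its counter alone.
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()[:BLOCK]

def keystream(key: bytes, nonce: bytes, first: int, count: int) -> bytes:
    """Keystream blocks first .. first+count-1, concatenated."""
    return b"".join(keystream_block(key, nonce, first + i) for i in range(count))

def ctr_xor_sequential(key: bytes, nonce: bytes, data: bytes) -> bytes:
    nblocks = (len(data) + BLOCK - 1) // BLOCK
    ks = keystream(key, nonce, 0, nblocks)
    return bytes(d ^ k for d, k in zip(data, ks))

def ctr_xor_parallel(key: bytes, nonce: bytes, data: bytes, workers: int = 4) -> bytes:
    if not data:
        return b""
    nblocks = (len(data) + BLOCK - 1) // BLOCK
    # Give each worker a contiguous range of counter values.
    per = (nblocks + workers - 1) // workers
    ranges = [(start, min(per, nblocks - start)) for start in range(0, nblocks, per)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the chunks line up.
        parts = pool.map(lambda r: keystream(key, nonce, r[0], r[1]), ranges)
    ks = b"".join(parts)
    return bytes(d ^ k for d, k in zip(data, ks))
```

In CPython the global interpreter lock limits the real speedup of this toy, so the point is the data dependency rather than the timing: a chained mode such as CBC encryption cannot be split this way, because each block needs the previous ciphertext block.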
Figure One: Throughput of HPN-SSH versus SSH 4.6 along a 1Gb/s transatlantic path
In the figure, SSH/AES is the result for standard OpenSSH using AES encryption. HPN-SSH/AES is the result for the improved HPN-SSH with encryption, and HPN-SSH/NONE is the result without encryption (that is, after an encrypted authentication session, the rest of the transfer was carried out unencrypted).
As you can see, the results are quite impressive. If you use OpenSSH for cluster data transfers, take a look at HPN-SSH; it may be just what you need to get things moving a bit quicker.