July 20th, 2009 8:33 am
Posted by Douglas Eadline
Tags: BMC, FQDN, hostname, IP address, IPMI
The simple act of locating a node may not be so simple
When people look at racks of cluster nodes, they often ask, "How do you tell which node is which?" It is a very good question. It turns out that there are internal and external ways to identify cluster nodes. The external method is for physically locating the node in a rack. If a node is having problems, the system administrator may need to attach a monitor or look at the case lights. Without help, identifying a node can be somewhat difficult. In the early days, system administrators often had to know the physical location of the node in a rack, then go to the rack and count nodes until the right one was found. In some cases, each node was labeled with a sequential ID number so it could be easily located.
New servers often have a "trouble" or identification light that is highly visible. The identification light is controlled through IPMI (Intelligent Platform Management Interface), an out-of-band (OOB) node access method that talks with a baseboard management controller (BMC). The BMC is independent of the system CPUs and is powered as soon as the node is plugged into main power. The system administrator can turn the light on from the BMC management console or press a switch directly on the node.
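As a rough illustration of turning the light on remotely, the sketch below builds an `ipmitool` command line that asks a BMC to light the identification LED for a fixed number of seconds. It assumes `ipmitool` is installed, that the BMC accepts IPMI over LAN (`-I lanplus`), and that the password is supplied in the `IPMI_PASSWORD` environment variable (`-E`). The BMC hostname `node08-bmc` and the user `admin` are made-up placeholders, not values from any particular cluster.

```python
import subprocess


def identify_command(bmc_host, user, seconds=60):
    """Build the ipmitool argument list that lights the identify LED."""
    return [
        "ipmitool",
        "-I", "lanplus",      # IPMI v2.0 over LAN
        "-H", bmc_host,       # address of the BMC, not of the node itself
        "-U", user,
        "-E",                 # read the password from IPMI_PASSWORD
        "chassis", "identify", str(seconds),
    ]


def identify_node(bmc_host, user, seconds=60):
    """Run the command; raises CalledProcessError if ipmitool fails."""
    subprocess.run(identify_command(bmc_host, user, seconds), check=True)


if __name__ == "__main__":
    # Hypothetical BMC name and user, shown only to print the command line.
    print(" ".join(identify_command("node08-bmc", "admin", 120)))
```

Keeping the command construction separate from the call that executes it makes it easy to inspect or log the exact command before it is sent to a BMC.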
When a problem is detected on a node, the system administrator is usually notified in some fashion, either through the BMC software or through some other node monitoring software. The administrator may then turn on the identification light to locate the server in the rack, and obvious things like cables can then be checked. Other methods of identification were used in the past when IPMI was not available. Powering down the node was one option, so that it could be identified by its "no lights."
Internally, nodes are identified using software. All nodes have a unique network name and address, often assigned when the node is provisioned. The network address, or IP address, is unique for each node. The network name can be a fully qualified domain name (FQDN), a "nickname", or both. For instance, a node can appear in the /etc/hosts file as follows (IP address, FQDN, nickname):
10.1.0.72 node08.somedomain.com node08
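To make the three fields concrete, here is a minimal sketch that splits one /etc/hosts style line into its IP address, first name, and remaining aliases. The function name `parse_hosts_line` is invented for this example, and the sketch simply treats the first name on the line as the FQDN, as in the entry above.

```python
def parse_hosts_line(line):
    """Split one hosts-file line into IP address, FQDN, and aliases."""
    # Drop anything after a '#' comment marker, then surrounding whitespace.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None  # blank or comment-only line
    fields = line.split()
    ip, names = fields[0], fields[1:]
    return {
        "ip": ip,
        "fqdn": names[0] if names else None,
        "aliases": names[1:],
    }


entry = parse_hosts_line("10.1.0.72 node08.somedomain.com node08")
print(entry["ip"], entry["fqdn"], entry["aliases"])
```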
Once the node is booted, it is identified by its IP address (network address) and hostname (nickname). Hostnames are often of the form "node001" or similar so that they can be easily identified. Some hostnames are designed to encode both the node and its rack location, such as "n24r12", which means "node 24 in rack 12".
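A naming scheme like "n24r12" is easy to decode in software. The sketch below assumes hostnames follow exactly the pattern "n", node number, "r", rack number. The function name `parse_location` is invented here, and a real site may well use a different convention.

```python
import re

# Assumed pattern: "n<node>r<rack>", for example "n24r12".
LOCATION_RE = re.compile(r"^n(\d+)r(\d+)$")


def parse_location(hostname):
    """Return (node, rack) as integers, or None if the name does not match."""
    match = LOCATION_RE.match(hostname)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


node, rack = parse_location("n24r12")
print(f"node {node} in rack {rack}")
```

Names that do not follow the scheme, such as "node001", return `None` rather than a guess.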
As clusters get larger, identifying nodes both externally and internally becomes important. Translating from an internal IP address to a physical location in a rack is an important and sometimes overlooked "feature" for the first-time cluster administrator. Just as important is being able to convert from the rack location to the node IP address. Often a simple label with both the IP address and the node's unique Ethernet address (often called the MAC address) is the best solution. Through good planning and an understanding of the physical and network layout, locating troubled nodes can be one of the easier parts of your job.
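The two-way translation between IP address and rack location can be kept in a small inventory table. The sketch below is only an illustration: every address, MAC, rack, and slot number in it is made up, and a real cluster would load this data from its provisioning records.

```python
# Hypothetical inventory: (IP address, MAC address, rack, position in rack).
INVENTORY = [
    ("10.1.0.72", "00:00:5e:00:53:08", 1, 8),
    ("10.1.0.73", "00:00:5e:00:53:09", 1, 9),
    ("10.1.1.10", "00:00:5e:00:53:1a", 2, 1),
]

# Build one lookup table per direction.
location_by_ip = {ip: (rack, slot) for ip, _mac, rack, slot in INVENTORY}
ip_by_location = {(rack, slot): ip for ip, _mac, rack, slot in INVENTORY}
mac_by_ip = {ip: mac for ip, mac, _rack, _slot in INVENTORY}

# IP address -> physical location, and back again.
rack, slot = location_by_ip["10.1.0.72"]
print(f"10.1.0.72 is in rack {rack}, position {slot}")
print(f"rack 2, position 1 is {ip_by_location[(2, 1)]}")
```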