<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cluster Connection &#187; Douglas Eadline</title>
	<atom:link href="http://www.clusterconnection.com/author/deadline/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.clusterconnection.com</link>
	<description>Simplify HPC. Share the knowledge.</description>
	<lastBuildDate>Fri, 30 Dec 2011 21:23:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<item>
		<title>Do More Cores Mean Fewer Nodes?</title>
		<link>http://www.clusterconnection.com/2009/10/does-more-cores-mean-less-nodes/</link>
		<comments>http://www.clusterconnection.com/2009/10/does-more-cores-mean-less-nodes/#comments</comments>
		<pubDate>Fri, 09 Oct 2009 18:19:06 +0000</pubDate>
		<dc:creator>Douglas Eadline</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[HPC]]></category>
		<category><![CDATA[nodes]]></category>
		<category><![CDATA[Processor Cores]]></category>
		<category><![CDATA[revenue]]></category>

		<guid isPermaLink="false">http://www.clusterconnection.com/?p=1479</guid>
		<description><![CDATA[Packing cores into a node means fewer servers are needed, but the market is still growing. Ever since the shift to multi-core processors began, I have had a nagging question - Do more cores mean fewer nodes? I have wrestled with this question and finally realized that there is no simple answer. I should [...]]]></description>
			<content:encoded><![CDATA[<p><em>Packing cores into a node means fewer servers are needed, but the market is still growing.</em></p>
<p>Ever since the shift to multi-core processors began, I have had a nagging question - <em>Do more cores mean fewer nodes?</em> I have wrestled with this question and finally realized that there is no simple answer. I should preface this post by stating that I am talking about HPC and not the general marketplace, where multi-core and virtualization are all the rage.</p>
<p>To understand why I ask this, consider that back when Linux clustering began, the single-core Pentium Pro and DEC Alpha were the two processors of choice. Large clusters were maybe 64 or 128 nodes (often tower cases), which translated into 64 or 128 cores. Today you can easily pack 128 cores into 16 nodes - not even a full rack chassis. Given this trend, is the node count for clusters getting smaller?</p>
<p>From my anecdotal evidence, I seem to notice several trends. First, the HPC market, after a downturn, seems to be growing again and is projected to reach $11.7 billion by 2012 (<a href="http://www.hpcwire.com/topic/systems/IDC-HPC-Will-Resume-Growth-After-Dipping-in-2009-38620187.html">IDC</a>). This revenue is up from a $9.6 billion figure for 2008. Thus, the hunger for nodes is increasing and not decreasing, which raises a further question. <em>Are node counts increasing or are more people buying clusters?</em> (That is, instead of a few people buying larger clusters, are a lot of people buying smaller, manageable clusters?) For the marketing types out there, maybe you know the answer. If not, that is a good question to ask. Leave a comment and give us a clue.</p>
<p>Second, from my experience there is plenty of HPC work to go around. Whether nodes are in a large data center cluster or in a small local blade system, it seems the cores are busy. Perhaps we are seeing the rise of the <em>Closet Cluster? </em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.clusterconnection.com/2009/10/does-more-cores-mean-less-nodes/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Cluster Bookshelf</title>
		<link>http://www.clusterconnection.com/2009/10/the-cluster-bookshelf/</link>
		<comments>http://www.clusterconnection.com/2009/10/the-cluster-bookshelf/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 22:46:44 +0000</pubDate>
		<dc:creator>Douglas Eadline</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Becker]]></category>
		<category><![CDATA[beowulf]]></category>
		<category><![CDATA[books]]></category>
		<category><![CDATA[Gropp]]></category>
		<category><![CDATA[reviews]]></category>
		<category><![CDATA[Sterling]]></category>

		<guid isPermaLink="false">http://www.clusterconnection.com/?p=1579</guid>
		<description><![CDATA[An all too short list of HPC cluster books Every so often I am asked about books on Beowulf/HPC clustering. The good news is there are books available at several levels. The bad news is no one book covers everything because the number of topic areas is so vast. To make your search easier, I [...]]]></description>
			<content:encoded><![CDATA[<p><em>An all too short list of HPC cluster books<br />
</em></p>
<p>Every so often I am asked about books on Beowulf/HPC clustering. The good news is there are books available at several levels. The bad news is no one book covers everything because the number of topic areas is so vast. To make your search easier, I have surveyed all of the cluster books I know about. I have not included some that are out of print or don't really discuss HPC. Some of the books are dated, but still contain good general information. Others are just plain old, but are included here for completeness. If you know of any other books, please add a comment. My list is as follows:</p>
<p><a href="http://www.phy.duke.edu/~rgb/Beowulf/beowulf_book.php">Engineering a Beowulf-Style Compute Cluster</a> by Robert G. Brown<br />
A freely available book (PDF) that discusses designing and building Beowulf-style clusters. Robert Brown is a longtime contributor to the Beowulf/HPC cluster community.</p>
<p><a href="http://my.safaribooksonline.com/0131448536">Building Clustered Linux Systems</a> by Robert W. Lucke<br />
A very good overview of cluster computing methods and hardware. The book covers a wide range of options, but does not dive too deep into any one approach. It is somewhat Hewlett-Packard focused, as the author works for HP. (648 pages) ISBN: 0-13-144853-6</p>
<p><a href="http://www.sun.com/x64/ebooks/hpc.jsp">HPC For Dummies</a> (an ebook) by Douglas Eadline<br />
A very short introduction to HPC, but not for dummies either! The ebook is available for free from Sun Microsystems after registration. The book provides background and best practices. (Authored by yours truly.)</p>
<p><a href="http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&amp;tid=9947">Beowulf Cluster Computing with Linux</a>, Second Edition by William Gropp, Ewing Lusk and Thomas Sterling<br />
Updated edition, now edited by William Gropp and Ewing Lusk (in addition to Sterling). Provides a good but high-level view of Linux clustering. This edition includes ROCKS and OSCAR coverage plus other important issues. (504 pages) ISBN 0-262-69292-9</p>
<p><a href="http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&amp;tid=8681">Beowulf Cluster Computing with Linux</a> by Thomas Sterling<br />
The next book after the original "How to Build a Beowulf" by Tom Sterling (see below). Coverage is much expanded, but now very high level as chapters are written by key players in the cluster community. (536 pages) ISBN 0-262-69274-0</p>
<p><a href="http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&amp;tid=8682">Beowulf Cluster Computing with Windows</a> by Thomas Sterling<br />
Approximately 75% of the material is the same as the Linux book by Sterling (no file system coverage). Newer Windows releases such as Windows HPC Server 2008 make this a bit dated. If you need Windows clusters, you may want to start here. (488 pages) ISBN 0-262-69275-9</p>
<p><a href="http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&amp;tid=3898">How to Build a Beowulf</a> by Thomas Sterling, John Salmon, Donald J. Becker and Daniel F. Savarese<br />
The first book on Beowulf cluster computing. Published in 1999, it is now quite dated. Both hardware and software have moved past the text. It does provide good coverage of the issues facing the cluster builder. (261 pages) ISBN 0-262-69218-X</p>
<p><a href="http://www.amazon.com/exec/obidos/tg/detail/-/0138997098/ref=pd_cps_eb_1/104-1154502-8274348?v=glance&amp;s=books">In Search of Clusters</a> by Gregory Pfister<br />
A good (and very technical) book about the advantages of cluster architectures. Also provides a very detailed analysis of programming models. This book seems to be out of print. Worth reading if you can find a copy. It does not cover Linux Beowulf clusters. (608 pages) ISBN 0138997098</p>
<p><a href="http://oreilly.com/catalog/9780596005702/">High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI</a> by Joseph D. Sloan<br />
O'Reilly's second attempt at a Linux cluster book. Many feel this second attempt has missed the mark (again). Also note that the OpenMosix Project officially closed as of March 1, 2008. (367 pages) ISBN: 0-596-00570-9</p>
<p><a href="http://www.redbooks.ibm.com/abstracts/sg246041.html">Linux HPC Cluster Installation</a> An IBM Redbooks publication<br />
This document focuses on xCAT (xCluster Administration Tools) for installation and administration. All nodes and components of the cluster, such as compute nodes and management nodes, are installed with xCAT. Dated material.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.clusterconnection.com/2009/10/the-cluster-bookshelf/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Clustering in the Cloud</title>
		<link>http://www.clusterconnection.com/2009/10/clustering-in-the-cloud/</link>
		<comments>http://www.clusterconnection.com/2009/10/clustering-in-the-cloud/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 22:45:49 +0000</pubDate>
		<dc:creator>Douglas Eadline</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[grid]]></category>
		<category><![CDATA[HPC]]></category>
		<category><![CDATA[InfiniBand]]></category>
		<category><![CDATA[Top500]]></category>
		<category><![CDATA[virtualization]]></category>

		<guid isPermaLink="false">http://www.clusterconnection.com/2009/10/clustering-in-the-cloud/</guid>
		<description><![CDATA[Are clouds a good place to build HPC clusters? The use of virtualization and multi-core processors has made cloud computing an option for many users. The ability to buy cloud time as you need it and not purchase hardware is certainly attractive from a financial standpoint. The concept is not new and has its [...]]]></description>
			<content:encoded><![CDATA[<p><em>Are clouds a good place to build HPC clusters?</em></p>
<p>The use of virtualization and multi-core processors has made cloud computing an option for many users. The ability to buy <em>cloud</em> time as you need it and not purchase hardware is certainly attractive from a financial standpoint. The concept is not new and has its roots in time-shared mainframes and grid computing. One might assume that the vast amount of computing resources in clouds makes them ideal candidates for HPC clustering. Unfortunately, it is not as simple as collecting cores.</p>
<p>One of the issues facing clouds is I/O. Basically, I/O is often not predictable or repeatable. From a storage standpoint read and write times can be fast, but not always fast. In terms of messages between servers, most clouds do not support high performance interconnects and similarly make no guarantees as to latency or bandwidth consistency.  While grids paid attention to certain HPC performance guarantees in terms of I/O, clouds, in order to offer ease of use, have declined such guarantees. Unless a cloud has been specifically designed for HPC, the user cannot expect consistent and/or high performance. There are two papers which discuss this very idea. The first paper looks at <a href="http://www.usenix.org/publications/login/2008-10/openpdfs/walker.pdf">Benchmarking Amazon EC2 for High-performance Scientific Computing</a> and the second paper asks, <a href="http://www.cs.utexas.edu/users/pauldj/pubs/uchpc09.pdf">Can Cloud Computing Reach The TOP500?</a>. Both papers conclude that the cloud is not mature enough for HPC applications.</p>
<p>The limitations of the cloud become more apparent when one looks a little deeper at HPC applications. First, many applications rely on <em>user space</em> communication (i.e. high performance MPI programs transfer data directly from one node to another without using kernel services.) Such a <em>close to the wire</em> operation runs counter to the virtualization model. Secondly, as reported in the first paper (above), the performance of OpenMP applications was reduced by 7-21% when running in the EC2 cloud.</p>
<p>Recently, Penguin Computing began offering POD (Penguin on Demand) for HPC cloud computing. The POD cloud offers both Ethernet and InfiniBand connections between nodes, thus providing a dedicated high performance computing environment. This service can be considered a specialized HPC cloud.</p>
<p>There are two other important issues to consider with cloud computing -- security and reliability. When data leaves your domain over the Internet, it is virtually impossible to guarantee 100% security. If your organization can live with this situation, using the cloud may be an option. If, on the other hand, you need to keep a tight rein on your data, then you may not want to be injecting it into the cloud. The other issue is reliability. If your day-to-day operations are based on using a cloud, then a contingency plan is a must. Interruptions in Internet traffic due to congestion or hardware failures can be common in some areas. In addition, the cloud provider may have issues (even go out of business) and thus not meet the service requirements.</p>
<p>I believe the cloud is an interesting model, but it is not a real solution for HPC (in its current form). My issue with clouds is that they are often categorized as "grid like" and then are somehow (incorrectly) considered "HPC like." The cloud offers utility computing, as grid promised, but has pushed the application layer further away from the hardware. HPC practitioners spend a lot of time making sure the application is as close to the hardware as possible. At this point in time, HPC in the cloud is more of a curiosity than a solution. When examining HPC benchmarks, it becomes clear that clouds are not the best means to provide HPC cycles. Whether efforts like POD can meet the HPC users' needs in the cloud is still unknown.</p>
<p>To be fair, there are some HPC applications that lend themselves to clouds quite well (i.e., those that do not require predictable I/O). <a href="http://folding.stanford.edu/">Folding@home</a> and <a href="http://setiathome.berkeley.edu/">SETI@home</a> are two good examples. These applications could easily run in a cloud (in a sense they do run in the Internet cloud). Keep in mind they have been designed to work in a robust distributed fashion and are not virtualized. Clouds can be enticing and even enabling for some applications, but remember a collection of servers (in the cloud or in a rack) does not a cluster make.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.clusterconnection.com/2009/10/clustering-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Support, Why Do I Need Cluster Support?</title>
		<link>http://www.clusterconnection.com/2009/10/support-why-do-i-need-cluster-support/</link>
		<comments>http://www.clusterconnection.com/2009/10/support-why-do-i-need-cluster-support/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 22:45:30 +0000</pubDate>
		<dc:creator>Douglas Eadline</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Beowulf Mailing List]]></category>
		<category><![CDATA[bug fixes]]></category>
		<category><![CDATA[dependencies]]></category>
		<category><![CDATA[ICR]]></category>
		<category><![CDATA[support]]></category>
		<category><![CDATA[updates]]></category>

		<guid isPermaLink="false">http://www.clusterconnection.com/?p=1536</guid>
		<description><![CDATA[Supporting a successful HPC cluster takes time and money, take your pick Many of the HPC people I know are what you would call rugged individualists. They have been around since the beginning and were responsible for moving the market/community along when commodity HPC was less than fashionable. This group consists mostly of developers, implementers, [...]]]></description>
			<content:encoded><![CDATA[<p><em>Supporting a successful HPC cluster takes time and money, take your pick</em></p>
<p>Many of the HPC people I know are what you would call <em>rugged individualists</em>. They have been around since the beginning and were responsible for moving the market/community along when commodity HPC was less than fashionable. This group consists mostly of developers, implementers, and administrators. Many of these people developed, by way of discussion on the <a href="http://www.beowulf.org/mailman/listinfo/beowulf" class="broken_link">Beowulf Mailing List</a>, the best practices used today. The Beowulf Mailing List is a true resource if there ever was one. A <em>newbie</em> can ask a question and get polite (and lengthy) answers from list members. The list holds a large amount of open community knowledge because all the HPC plumbing is open source. Unhindered discussions can take place at any level between any number of people.</p>
<p>There is also the false notion that open source software is "free as in beer." This idea is not quite true because software, unlike a toaster, has a usage cost. Once you install, configure, and study any software, you have already made an "investment" in the package. Continued use furthers this investment. The size of the investment is up to you. And, because the software is open, in theory you can fix any problem. Thus, you can decide how much time to invest in a particular software package before it becomes "expensive" to you. At some point, the cost (or dilution) of your time may come into play. Spending weeks fine-tuning a single application at the expense of other responsibilities is probably not going to play out very well.</p>
<p>In terms of support responsibilities, many clusters have minimal issues once they are configured correctly. There are, of course, hardware failures, but in general, once everything is booted, things often work quite well. There are two areas that need attention, however. The first is software updates. Updates are needed for several reasons, including security fixes, bug fixes, and new features. These types of updates are usually easy to manage unless they have dependencies, which means there may be a whole raft of packages that need updating. If you don't get the dependencies right, then there can be problems with the entire cluster.</p>
<p>The other issue is local integration. This is what I consider the "last mile problem" for clusters. Very often local file system issues need to be worked out and managed in addition to creating job submission policies. There is usually some end-user assistance needed as well as questions on how to compile and submit jobs to the queue. Of course, "Why is my job sitting in the queue?" is probably the question that gets asked the most.</p>
<p>If your job includes time for installing, integrating, and updating software and you happen to be one of those rugged individualist HPC people, then you probably have no interest in professional support. If, on the other hand, you are new to clustering (or Linux) and already have many responsibilities, then you may want to consider using professional support services. As with all open source software, the choice is yours. In terms of commercial software, or commercial support of open software, there are many options. In any case, purchasing support for business critical applications is always a good idea, as is the use of an <a href="http://software.intel.com/en-us/cluster-ready/">Intel Cluster Ready</a> (ICR) solution. By adhering to the ICR specification, support is much easier -- for you and/or a vendor. That is, a reference platform allows you and your vendors to <em>work from the same page</em>. Without a common framework, support vendors and others in your organization may have to decipher/debug how you configured the cluster.</p>
<p>In conclusion, cluster support can be commercial or it can be institutional. In either case, there is a cost. If you do it on your own, it will cost time, and if you hire a consultant or company, it will cost money. To supplement either effort, there is a large amount of information on the web that can be useful when identifying and solving problems. Support is an important part of any successful HPC cluster; just ask the old-timers. They figured it out, so you don't have to.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.clusterconnection.com/2009/10/support-why-do-i-need-cluster-support/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Skinny on Solid State Disks</title>
		<link>http://www.clusterconnection.com/2009/09/the-skinny-on-solid-state-disks/</link>
		<comments>http://www.clusterconnection.com/2009/09/the-skinny-on-solid-state-disks/#comments</comments>
		<pubDate>Tue, 22 Sep 2009 22:54:20 +0000</pubDate>
		<dc:creator>Douglas Eadline</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[IOPS]]></category>
		<category><![CDATA[JEDEC]]></category>
		<category><![CDATA[NAND flash]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[SSD]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://www.clusterconnection.com/?p=1475</guid>
		<description><![CDATA[What you need to know about the latest trend in storage hardware The Solid State Drive (SSD) has had a break-out year. Unlike traditional mechanical hard disk drives that use spinning platters with movable read/write heads, SSDs have no moving parts. An SSD is made entirely out of a special type of flash memory -- [...]]]></description>
			<content:encoded><![CDATA[<p><em>What you need to know about the latest trend in storage hardware</em></p>
<p>The Solid State Drive (SSD) has had a break-out year. Unlike traditional mechanical hard disk drives that use spinning platters with movable read/write heads, SSDs have no moving parts. An SSD is made entirely out of a special type of flash memory -- the same kind of NAND flash memory found in thumb drives and memory sticks. Overall, SSDs are faster, quieter, and more energy efficient, but less dense than the traditional spinning platter drive.</p>
<p>The staple of the storage industry has been the mechanical drive, where I/O rates are limited by the physical motion of spinning disks. Unlike semiconductor trends, the sustained write rate has barely doubled, from 50 to 90 MB/second, over the past 8-10 years. All this is about to change, as the use of flash memory will allow storage to take advantage of a semiconductor growth curve similar to that of processors and memory.</p>
<p>Perhaps the most important feature offered by SSDs is the read and write performance. The IOPS (I/Os per second) rate for an SSD is usually two to five times that of a traditional mechanical hard drive. When reading, performance is mostly constant because the seek time is virtually instantaneous and does not depend on the physical location of the data on a platter. As a result, file fragmentation has almost no impact on read performance. In addition, because there are no moving parts, SSDs use as little as one-fifth the power of a mechanical drive. Another interesting feature is the SSD failure mode. Most SSD failures tend to happen when writing. In contrast, mechanical drives tend to have most failures when reading. Thus, once data is written, it is more likely it can be read from a failed SSD.</p>
<p>SSDs do suffer from “degradation” over time that results in reduced performance and limited lifetimes (i.e., NAND memory supports only a limited number of write/erase cycles). Vendors have taken this into account and include <em>wear leveling</em> algorithms in SSDs that spread writes evenly over the entire device.</p>
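<p>To make the wear leveling idea concrete, here is a minimal sketch in Python. It is a toy model, not how any particular vendor's controller works: the class name, the block counts, and the "pick the least-worn free block" policy are all assumptions chosen for illustration.</p>

```python
# Hypothetical toy model of wear leveling; names and sizes are illustrative only.
class WearLeveledFlash:
    """Maps logical blocks onto whichever free physical block is least worn."""

    def __init__(self, physical_blocks):
        self.erase_counts = [0] * physical_blocks  # wear per physical block
        self.data = [None] * physical_blocks       # contents per physical block
        self.mapping = {}                          # logical block to physical block

    def write(self, logical_block, payload):
        in_use = set(self.mapping.values())
        # Assumes at least one spare physical block is always available.
        free = [b for b in range(len(self.data)) if b not in in_use]
        # Choose the free physical block with the fewest erases so far.
        target = min(free, key=lambda b: self.erase_counts[b])
        self.erase_counts[target] += 1             # each write costs one erase cycle
        self.data[target] = payload
        # Remap the logical block; its old physical block becomes free again.
        old = self.mapping.get(logical_block)
        if old is not None:
            self.data[old] = None
        self.mapping[logical_block] = target

    def read(self, logical_block):
        return self.data[self.mapping[logical_block]]


flash = WearLeveledFlash(physical_blocks=4)
for i in range(8):
    flash.write(0, "version %d" % i)  # rewrite the same logical block 8 times

print(flash.erase_counts)  # wear is spread over all four physical blocks
print(flash.read(0))       # the latest data is still what gets read back
```

<p>Even though the same logical block is rewritten eight times, no single physical block absorbs all of the wear. Real devices also have to deal with garbage collection and rarely changing ("static") data, which this sketch ignores.</p>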
<p>If you are interested in exploring SSDs, there are some key points to consider. First, SSDs are not the best solution in every case. Currently, their capacity is much less than that of traditional mechanical drives and as such may not be suitable for some of the large HPC data sets. In addition, the cost per MB is higher, and SSDs are more susceptible to data loss from energy and power surges.</p>
<p>Second, the very fast read times offered by SSDs have made them a good candidate for improving OLTP (Online Transaction Processing) systems where frequently read tables and indexes can be accelerated. Check both the read and write IOPS, as there is usually a big difference between these values. Typically, the read speed is 10 times the write speed, resulting in asymmetric performance. In terms of clusters, using SSDs for NFS mounts or read-only data may be helpful.</p>
<p>Third, because SSDs are new, questions still remain about how much of that speed they can deliver for the long haul (due to degradation). Typically, an SSD will show an initial decrease in performance and then level off. Even with a performance drop over time, SSD drives are almost always faster than traditional hard drives. The JEDEC standards organization plans to publish two standards by the end of this year for SSD endurance metrics.</p>
<p>Finally, pay particular attention to “write endurance”; this number should pertain to random writes. For instance, an Intel® X25-E Extreme 64 GB SATA Solid-State Drive is rated for 2 petabytes of lifetime random writes.</p>
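<p>A quick back-of-the-envelope calculation helps put a write endurance rating in perspective. The sketch below uses the 2 petabyte and 64 GB figures quoted above with decimal units; the 500 GB per day workload is purely a hypothetical number picked for illustration, so substitute your own.</p>

```python
# Rough endurance arithmetic using decimal units (1 GB = 10**9 bytes).
GB = 10**9
PB = 10**15

rated_lifetime_writes = 2 * PB  # rating quoted for the example drive
capacity = 64 * GB              # capacity of the example drive
daily_writes = 500 * GB         # hypothetical workload, an assumption

# How many times the whole drive could be rewritten within the rating.
full_drive_writes = rated_lifetime_writes // capacity

# How long the rating lasts at the assumed daily write volume.
days = rated_lifetime_writes / daily_writes
years = days / 365

print(full_drive_writes)  # complete drive rewrites
print(days)               # days of service at the assumed workload
print(round(years, 1))    # the same figure in years
```

<p>Under these assumptions the rating works out to roughly 31,000 full-drive rewrites, or about 11 years at 500 GB of random writes per day. A heavier write workload shortens that proportionally.</p>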
<p>In terms of software, one difficulty facing the industry is the set of legacy assumptions built into file systems. These assumptions will need to be challenged in order to take advantage of SSD technology. For instance, user applications and file systems will need to account for the asymmetric read/write performance of SSDs. Many computer applications rely on synchronous patterns of read/write operations, wherein a given write or update must be completed and the write confirmed before additional application read requests can be issued. With SSDs this process may need to be reconsidered.</p>
<p>There is no doubt that SSDs are the future of storage. Indeed, SSDs are even changing the way we compute. For example, CAE (Computer Aided Engineering) applications can use the speed advantage of SSDs in their out-of-core algorithms. The power of semiconductor manufacturing technology combined with the speed of NAND flash memory is about to make the storage market stand still!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.clusterconnection.com/2009/09/the-skinny-on-solid-state-disks/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Future File Systems: Btrfs and ZFS</title>
		<link>http://www.clusterconnection.com/2009/09/future-file-systems-btrfs-and-zfs/</link>
		<comments>http://www.clusterconnection.com/2009/09/future-file-systems-btrfs-and-zfs/#comments</comments>
		<pubDate>Sat, 19 Sep 2009 02:51:34 +0000</pubDate>
		<dc:creator>Douglas Eadline</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Btrfs]]></category>
		<category><![CDATA[File Systems]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ZFS]]></category>

		<guid isPermaLink="false">http://www.clusterconnection.com/?p=1421</guid>
		<description><![CDATA[The prediction is in. What file system will move us into the future? The thirst for better file system technology is not new to the Unix/Linux world. There is a rich history of trying to optimize the balance between the storage system and the process improvements in a demanding user environment. The efforts to build [...]]]></description>
			<content:encoded><![CDATA[<p><em>The prediction is in. What file system will move us into the future?</em></p>
<p>The thirst for better file system technology is not new to the Unix/Linux world. There is a rich history of trying to optimize the balance between the storage system and the process improvements in a demanding user environment. The efforts to build a better file system are quite numerous and are built on the efforts of many people. For example, Kirk McKusick’s original Berkeley Fast File System improved on the original V7 release. Stephen Tweedie’s ext3 took ideas from the database logs and Margo Seltzer’s LFS and added them to Linux’s implementation of UFS -- Ted Ts’o’s ext2. In the meantime, DEC released Megasafe, SGI released XFS, and Sun released ZFS, all to the wild. And now Oracle has developed Btrfs for Linux.</p>
<p>So why do we as users care? There had better be a good reason to change a file system, because a new file system usually means converting to and trusting a new format with your data. Thus, any new format or change must provide a compelling reason or solve a big problem. Otherwise what is "good enough and works" is often better than that which is "new and fancy."</p>
<p>If you follow the details of file systems development, then this quick update may not be of interest to you. For the rest of us, who just choose whatever file system the installer offers, you may want to read further because changes are afoot.</p>
<p>If you are like me, you probably are running Linux with the ext3 file system. There is nothing wrong with ext3, as it is a stable, robust, and standard Linux file system. And, one other thing, it is old. Even if you are running the newer ext4, you are still running a 30-year-old file format that is more than a little short on features.</p>
<p>There are those that believe ext4 is going to be the end of the line and a switch over to Btrfs is very likely. Btrfs (pronounced "butter-F-S") is being developed by Chris Mason at Oracle. It is an open source project that has recently been added to the Linux kernel (as of 2.6.29) as experimental code.</p>
<p>Btrfs is based on several ideas, including B-trees (balanced search trees in which each node can hold many keys, not binary trees; this is where the <em>btr</em> in Btrfs comes from) and "copy-on-write" or COW. While I won't go into the details, B-trees and COW allow for some new features that would be difficult in the ext* line of file systems. (If you want to learn more about the technical details of Btrfs, see <a href="http://lwn.net/Articles/342892/">A short history of btrfs</a> on lwn.net.) Some of the new features include file-system snapshots, checksumming, online defragmentation, compression, extents, resizing, and more. In particular, Btrfs allows one thing that has been difficult to achieve in the past -- optimizing both access time and disk space.</p>
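<p>For readers who have not met copy-on-write before, the following Python sketch shows why it makes snapshots cheap. It is only a toy in-memory model built on my own assumptions (the class and method names are made up) and says nothing about how Btrfs or ZFS actually lay out data on disk.</p>

```python
# Hypothetical toy model of copy-on-write; not the Btrfs on-disk design.
class CowStore:
    """Snapshots share data blocks with the live view until a file is rewritten."""

    def __init__(self):
        self.blocks = {}   # block id to contents, never modified once written
        self.next_id = 0
        self.live = {}     # file name to block id for the current view

    def _new_block(self, content):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = content
        return block_id

    def write(self, name, content):
        # Never overwrite in place: allocate a new block and repoint the name.
        self.live[name] = self._new_block(content)

    def snapshot(self):
        # A snapshot copies only the small name table, not the data itself.
        return dict(self.live)

    def read(self, name, view=None):
        table = self.live if view is None else view
        return self.blocks[table[name]]


store = CowStore()
store.write("results.dat", b"run 1")
snap = store.snapshot()               # cheap: no data blocks are copied
store.write("results.dat", b"run 2")  # the old block stays intact for the snapshot

print(store.read("results.dat"))        # current contents
print(store.read("results.dat", snap))  # contents as of the snapshot
```

<p>Because a write never modifies an existing block, the snapshot keeps seeing the old data for free. A real file system also has to reclaim blocks that no snapshot references any longer, which this sketch leaves out.</p>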
<p>The fact that Oracle sponsors Btrfs has lead to some concern. Recently, Oracle purchased Sun Microsystems which has been developing the ZFS file system for many years. ZFS is similar to Btrfs (it uses COW) and provides many of the same features, but it is very different in its internal implementation. ZFS will also "run" under Linux using <a href="http://zfs-on-fuse.blogspot.com/">Fuse</a>. Mason and other have assured the community that Btrfs is important to Oracle and they will continue development. In addition, the open source nature of Btrfs ensures that it cannot be "taken away" now or in the future.</p>
<p>There is plenty more to consider, and I suggest reading <a href="http://www.linux-mag.com/id/7308/">Linux Don't Need No Stinkin' ZFS: BTRFS Intro &amp; Benchmarks</a> by my friend Jeff Layton. The consensus seems to be that Btrfs is destined to become the default Linux file system within two years. ZFS, on the other hand, must overcome some licensing issues before it can even make it into the Linux kernel for testing. Your next Linux install may offer a new and better (or "btr") file system than in the past.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.clusterconnection.com/2009/09/future-file-systems-btrfs-and-zfs/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Computing To Compete</title>
		<link>http://www.clusterconnection.com/2009/09/computing-to-compete/</link>
		<comments>http://www.clusterconnection.com/2009/09/computing-to-compete/#comments</comments>
		<pubDate>Sat, 19 Sep 2009 02:47:08 +0000</pubDate>
		<dc:creator>Douglas Eadline</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[aerodynamics]]></category>
		<category><![CDATA[Council on Competitiveness]]></category>
		<category><![CDATA[industry]]></category>

		<guid isPermaLink="false">http://www.clusterconnection.com/?p=1522</guid>
		<description><![CDATA[The Optimized Mudflap And Other HPC Success Stories Welcome to the world of industrial HPC. Today, we will consider a small part of those large trucks that spend most of their day crisscrossing our highways. At highway speeds, anything that moves through the air has an aerodynamic cost. Pushing a big box takes more energy [...]]]></description>
			<content:encoded><![CDATA[<p><em>The Optimized Mudflap And Other HPC Success Stories</em></p>
<p>Welcome to the world of industrial HPC. Today, we will consider a small part of those large trucks that spend most of their day crisscrossing our highways. At highway speeds, anything that moves through the air has an aerodynamic cost. Pushing a big box takes more energy than pushing a round ball, which is why better aerodynamics means less energy and lower costs. Almost all trucks have some kind of mudflaps to prevent road dirt and debris from hitting the truck. Midsized truck maker Kenworth wondered how much it costs to move those mudflaps through the air. To answer the question, they turned to HPC, where they were able to determine that trimming and tapering the mudflaps can cut about $400 from a typical truck's annual gas bill. This amount adds up quickly when you have a fleet of 1000 trucks. And, based on the mudflap success, Kenworth has started using HPC to help increase the efficiency of their truck designs, thus saving customers even more money. You can read more about these efforts in <a href="http://money.cnn.com/2009/02/19/technology/fortt_kenworth.fortune/index.htm">Heavy-duty Computing</a> from Fortune Magazine.</p>
<p>If mudflaps don't pique your interest, but saving money with HPC does, then you may be interested in learning more about <a href="http://www.compete.org/">The Council on Competitiveness</a> (CoC). Who or what is the CoC? They are a group of corporate CEOs, university presidents, and labor leaders committed to enhanced U.S. competitiveness in the global economy. One of their main focus areas is HPC. That is correct. Not only can HPC dock bio-molecules, design jets, and find oil, it can also help many companies save money and be more competitive. The CoC is a nonpartisan, nongovernmental organization based in Washington, D.C. The Council shapes the debate on competitiveness by bringing together business, labor, academic, and government leaders to evaluate economic challenges and opportunities. <a href="http://www.compete.org/about-us/initiatives/hpc">The High Performance Computing Initiative</a> is intended to stimulate and facilitate wider usage of HPC across the private sector to propel productivity, innovation, and competitiveness. Click the link to find out more about how other companies have cashed in on HPC. Yours could be next.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.clusterconnection.com/2009/09/computing-to-compete/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why Is InfiniBand (and other interconnects) So Fast?</title>
		<link>http://www.clusterconnection.com/2009/08/why-is-infiniband-and-other-interconnects-so-fast/</link>
		<comments>http://www.clusterconnection.com/2009/08/why-is-infiniband-and-other-interconnects-so-fast/#comments</comments>
		<pubDate>Wed, 26 Aug 2009 17:16:12 +0000</pubDate>
		<dc:creator>Douglas Eadline</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[10 Gigabit Ethernet]]></category>
		<category><![CDATA[Gigabit Ethernet]]></category>
		<category><![CDATA[InfiniBand]]></category>
		<category><![CDATA[kernel]]></category>
		<category><![CDATA[MPI]]></category>
		<category><![CDATA[Myrinet]]></category>
		<category><![CDATA[Open-MX]]></category>
		<category><![CDATA[TCP]]></category>
		<category><![CDATA[UDP]]></category>
		<category><![CDATA[user-space]]></category>

		<guid isPermaLink="false">http://www.clusterconnection.com/2009/08/why-is-infiniband-and-other-interconnects-so-fast/</guid>
		<description><![CDATA[The advent of user-space protocols provides a fast way to move data The above title is misleading because "fast" can mean many different things. In the case of HPC, "fast" means whatever it takes to keep the cores busy! In a previous post, I mentioned four parameters that are used to define an interconnect (throughput, [...]]]></description>
			<content:encoded><![CDATA[<p><em>The advent of user-space protocols provides a fast way to move data</em></p>
<p>The above title is misleading because "fast" can mean many different things. In the case of HPC, "fast" means whatever it takes to keep the cores busy! In a previous <a href="/2009/07/cluster-interconnects-messaging-rate/">post</a>, I mentioned four parameters that are used to define an interconnect (throughput, latency, N/2, and messaging rate). Of course, applications are the best way to evaluate an interconnect.</p>
<p>The most popular interconnects for HPC are Ethernet (GigE and 10-GigE), InfiniBand, and Myrinet. (At this point, many people lump Myrinet into the 10 GigE category as it supports the standard protocol as well as the Myricom protocols.) Each of these interconnects is used in both mainstream and HPC applications, but one usage mode sets HPC applications apart from almost all others.</p>
<p>When interconnects are used in HPC, the best performance comes from a "user space" mode. Communication over a network normally takes place through the kernel (i.e., the kernel manages, and in a sense guarantees, that data will get to where it is supposed to go). This communication path, however, requires memory to be copied from the user's program space to a kernel buffer. The kernel then manages the communication. On the receiver node, the kernel will accept the data and place it in a kernel buffer. The buffer is then copied to the user's program space. The excess copying adds to the latency for a given network. In addition, the kernel must process the TCP/IP stack for each communication. For applications that require low latency, the extra copying from user program space to kernel buffer on the sending node, and then from kernel buffer to user program space on the receiving node, can be very inefficient.</p>
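<p>The through-kernel path described above can be seen in a few lines of Python. This is only a sketch of the general idea: it uses a local socket pair rather than a real network, so no TCP/IP processing is involved, and the function name <tt>kernel_round_trip</tt> is made up for this example. What it does show is the two copies, user buffer to kernel buffer on send and kernel buffer to user buffer on receive.</p>

```python
import socket


def kernel_round_trip(payload):
    """Send bytes through a kernel-managed socket pair and read them back."""
    sender, receiver = socket.socketpair()
    try:
        # Copy 1: the payload goes from our buffer into a kernel buffer.
        # Keep the payload small; a large one could block in this
        # single-threaded sketch once the kernel buffer fills.
        sender.sendall(payload)
        sender.shutdown(socket.SHUT_WR)  # tell the receiver we are done
        chunks = []
        while True:
            # Copy 2: data goes from the kernel buffer into a new user buffer.
            chunk = receiver.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sender.close()
        receiver.close()


print(kernel_round_trip(b"hello from user space"))
```

<p>Every <tt>sendall</tt> and <tt>recv</tt> here is a system call, and each one crosses into the kernel and copies the data. A user space protocol aims to remove exactly those crossings and copies from the critical path.</p>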
<p>To improve latency, many vendors of high performance interconnects use a "user space" protocol instead of the kernel. <em>Figure One</em> illustrates this difference. The solid lines indicate a standard Ethernet MPI connection. Note that the communication passes through the kernel on both send and receive. Interconnects like Myrinet and InfiniBand provide a low latency user space protocol that does not use the kernel or incur any TCP/IP overhead. Instead, it moves data from the memory of one process to the memory of the other process (dashed lines). The fast interconnects also provide a TCP and UDP layer so that they can be used with regular through-kernel network services as well (e.g., to run NFS).<br />
<center><img class="aligncenter size-full wp-image-1378" title="kernel-user-space" src="/wordpress/wp-content/uploads/2009/07/kernel-user-space.png" alt="kernel-user-space" width="608" height="378" /><br />
<em>Figure One: Kernel Space vs User Space Transfer</em></center></p>
<p>Special libraries must be used to access the user space protocol. Users generally do not write code at this level. Instead, virtually all MPI libraries support either the <a href="http://www.openfabrics.org/">Open Fabrics Enterprise Distribution</a> OpenIB interface or the Myrinet MX interface. Users need to relink their applications to a "user space" MPI library to improve performance. Some MPI libraries (e.g., <a href="http://software.intel.com/en-us/articles/intel-mpi-library/">Intel MPI</a> and <a href="http://www.open-mpi.org/">Open MPI</a>) allow run-time selection of the actual interconnect and thus avoid relinking or recompiling codes.</p>
<h3>What about Ethernet?</h3>
<p>In the past, almost all user space implementations were done for high speed (i.e., expensive) networks. Users of Ethernet were confined to using kernel-based (TCP/IP) MPI implementations. There are now three Linux projects that bring user-space communications to Ethernet. The first and oldest is the <a href="http://www.disi.unige.it/project/gamma/">Genoa Active Message MAchine</a> or GAMMA. GAMMA is famous for achieving latencies of less than 10 μseconds over GigE. It does require a patch to the Ethernet driver and only supports certain Intel Ethernet chip-sets. Results have been impressive.</p>
<p>Another optimized communication protocol is <a href="http://software.intel.com/en-us/articles/intel-direct-ethernet-transport/">Intel® Direct Ethernet Transport</a> (DET), which works by providing a uDAPL-like InfiniBand interface over GigE. uDAPL is the User Direct Access Programming Library, which defines a single set of user APIs for all RDMA-capable transports. DET includes a kernel module and a uDAPL library for Ethernet and will work on almost any Ethernet NIC. It can be linked with any software requiring the uDAPL library.</p>
<p>A newer and popular effort is the <a href="http://open-mx.gforge.inria.fr/">Open-MX</a> project. Open-MX is based on the Myrinet MX protocol. Essentially, any software that links to the Myricom MX library should be able to link with Open-MX. Currently, Open MPI, MPICH2, and the PVFS2 file system have all been shown to work with Open-MX. While Open-MX will work with almost all GigE and 10-GigE chip-sets without modifying drivers, it does require kernel 2.6.15 or higher to work. Depending on the chip-set, Open-MX latencies as low as 10 μseconds for GigE have been reported.</p>
<p>In terms of 10 Gigabit Ethernet, the processor overhead needed to keep the pipe full and manage TCP/IP communications has become quite excessive. To offload work from the processor, the <a href="http://en.wikipedia.org/wiki/IWARP">iWARP</a> protocol can be used. iWARP-enabled hardware allows TCP/IP-based Ethernet to address the three major sources of networking overhead -- transport (TCP/IP) processing, intermediate buffer copies, and application context switch overhead.</p>
<p>From the user's perspective, user-space protocols are hidden under the MPI layer. Thus, there is almost no programming price to pay for better performance. If your cluster has InfiniBand or Myrinet, chances are you are already running in user-space, but it is always good to check. Ask your system administrator or consult your cluster documentation. And, stay out of the kernel!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.clusterconnection.com/2009/08/why-is-infiniband-and-other-interconnects-so-fast/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Why I like Intel® Cluster Ready</title>
		<link>http://www.clusterconnection.com/2009/08/why-i-like-intel-cluster-ready/</link>
		<comments>http://www.clusterconnection.com/2009/08/why-i-like-intel-cluster-ready/#comments</comments>
		<pubDate>Thu, 20 Aug 2009 19:50:20 +0000</pubDate>
		<dc:creator>Douglas Eadline</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[framework]]></category>
		<category><![CDATA[ICR]]></category>
		<category><![CDATA[ISV]]></category>

		<guid isPermaLink="false">http://www.clusterconnection.com/2009/08/why-i-like-intel-cluster-ready/</guid>
		<description><![CDATA[No, I'm not a marketing droid, although I could play one on TV I know what you are thinking. Here comes the Intel Cluster Ready (ICR) pitch, blah, blah, blah. First, this is not a product pitch, because ICR is not a product. Second, I might prefer it was "Production Cluster Ready" or something similar [...]]]></description>
			<content:encoded><![CDATA[<p><em>No, I'm not a marketing droid, although I could play one on TV</em></p>
<p>I know what you are thinking. Here comes the <a href="http://software.intel.com/en-us/cluster-ready/">Intel Cluster Ready</a> (ICR) pitch, blah, blah, blah. First, this is not a product pitch, because ICR is not a product. Second, I might prefer it was "Production Cluster Ready" or something similar that did not have a vendor moniker on it, but, hey, Intel put a lot of work into this framework (and continues to do so). They deserve the recognition. Finally, the real reason I am writing about it is that I think it is a really good idea.</p>
<p>Allow me to elaborate on the "good idea" part. Suppose you are an ISV (Independent Software Vendor) and a client comes to you and says, "Hey, I got one of those cluster things, I want to buy your software for it." At first, you think, maybe we should wait and see if there are more customers who need this before we spend a lot of time creating a cluster version. Then more customers come to you with the same request. You now see a market opportunity because these "cluster things" cost less than big-iron supercomputers, so you figure you can sell more software.</p>
<p>But, there is a problem. Each cluster seems to be a little different. They all have the same basic stuff, but there are enough details for the devils to make your life miserable. There are issues such as which MPI, interconnect, compiler, libraries, kernel, and Linux distribution are in use. In addition, you found that many times you ended up solving "cluster problems" for your customers, and you don't want to be in the cluster support business. As an ISV, you may start to wonder if these "cluster things" will ever work out. Other ISVs see what you are going through and decide it is too much of a mess to jump into. The last thing you want is a customer that thinks your software is broken because it won't run on their handcrafted "cluster thing."</p>
<p>Now suppose a vendor who had a vested interest in HPC said, "Let's put together a framework (not a strict specification) that allows ISVs to run codes on clusters with minimal hassle." And that, my fellow HPC mavens, is why I think Intel ICR is a really good idea. So do these <a href="http://software.intel.com/en-us/articles/intel-cluster-ready-participating-vendors/#systems">ISVs</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.clusterconnection.com/2009/08/why-i-like-intel-cluster-ready/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Increase Your Top Mojo</title>
		<link>http://www.clusterconnection.com/2009/08/increase-your-top-mojo/</link>
		<comments>http://www.clusterconnection.com/2009/08/increase-your-top-mojo/#comments</comments>
		<pubDate>Thu, 20 Aug 2009 19:49:11 +0000</pubDate>
		<dc:creator>Douglas Eadline</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[multi-core]]></category>
		<category><![CDATA[top]]></category>

		<guid isPermaLink="false">http://www.clusterconnection.com/2009/08/increase-your-top-mojo/</guid>
		<description><![CDATA[The top monitoring tool can be used in very simple ways to generate useful information Top is one of the most used and feature-rich Linux monitoring utilities. Many users, however, are not aware of the many useful ways top can be employed. Instead, they simply run the top command and watch [...]]]></description>
			<content:encoded><![CDATA[<p><em>The top monitoring tool can be used in very simple ways to generate useful information</em></p>
<p>Top is one of the most used and feature-rich Linux monitoring utilities. Many users, however, are not aware of the many useful ways top can be employed. Instead, they simply run the <tt>top</tt> command, watch the display, and enter <tt>q</tt> or CTRL-C when they are done. There is much more that top can do. An example of a top screen is shown below in <em>Figure One</em>.</p>
<div id="attachment_1440" class="wp-caption aligncenter" style="width: 495px"><img class="size-full wp-image-1440" title="top-default1" src="/wordpress/wp-content/uploads/2009/08/top-default1.png" alt="Figure One: Default top screen" width="485" height="274" /><p class="wp-caption-text">Figure One: Default top screen</p></div>
<p>The top display shows some general statistics for the machine and then, in the lower window, the running processes. By default, the processes are sorted by <tt>%CPU</tt>. You can also sort by memory or time usage by entering <tt>M</tt> and <tt>T</tt> respectively. You can move back to the <tt>%CPU</tt> view by entering <tt>P</tt>. Of course, you can get further help by pressing <tt>h</tt> or consulting the man page.</p>
<p>The first thing most people don't know is that you can vary the update delay using the <tt>-d</tt> argument. The standard delay between updates is 3 seconds. If you want to watch a remote system without generating a large amount of traffic or constant screen updating, you can enter a longer delay value. For instance:</p>
<pre>$ top -d 30</pre>
<p>will update every 30 seconds. If you need an update in the middle of the delay, just hit the space key. In a similar way, you can have top run once and record its output by entering the following:</p>
<pre>$ top -b -n 1 &gt;top.out</pre>
<p>The <tt>-b</tt> is for batch mode (no input) and the <tt>-n</tt> is for the number of iterations. This command might be useful for getting a quick "snapshot" of a remote node using ssh or rsh. You can also look at individual processes or users. For instance:</p>
<pre>$ top -p 21864</pre>
<p>will show you only process 21864 and</p>
<pre>$ top -u deadline</pre>
<p>will show only deadline's jobs. This type of command can be useful if, for example, you want to see what your job is doing on a particular node. For instance, if I wanted to know what my jobs were doing on node024, then I could enter:</p>
<pre>$ ssh deadline@node024 top -b -n 1 -u deadline</pre>
<p>The resulting output would be something like:</p>
<pre>top - 12:51:57 up 47 days, 14:30,  0 users,  load average: 2.07, 2.02, 2.00
Tasks:  66 total,   3 running,  63 sleeping,   0 stopped,   0 zombie
Cpu(s): 76.5% us,  0.7% sy,  0.0% ni, 22.6% id,  0.0% wa,  0.1% hi,  0.1% si
Mem:   2056056k total,   538728k used,  1517328k free,    40644k buffers
Swap:  1959920k total,        0k used,  1959920k free,   369464k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6561 deadline  18   0 44396 4516  800 R   97  0.2  25970:31 mds
 6562 deadline  25   0 44396 4496  800 R   95  0.2  25970:27 mds
 6556 deadline  16   0  5324 1040  884 S    0  0.1   0:00.00 bash
 6558 deadline  17   0  5324 1040  884 S    0  0.1   0:00.00 bash</pre>
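<p>Because batch mode produces plain text, it is easy to post-process. The following Python sketch assumes the column layout shown above (twelve columns, with <tt>%CPU</tt> in the ninth); other versions of top may order the columns differently. The function name <tt>parse_top_batch</tt> and the 50% threshold are made up for this example.</p>

```python
SAMPLE = """\
Tasks:  66 total,   3 running,  63 sleeping,   0 stopped,   0 zombie

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6561 deadline  18   0 44396 4516  800 R   97  0.2  25970:31 mds
 6562 deadline  25   0 44396 4496  800 R   95  0.2  25970:27 mds
 6556 deadline  16   0  5324 1040  884 S    0  0.1   0:00.00 bash
 6558 deadline  17   0  5324 1040  884 S    0  0.1   0:00.00 bash
"""


def parse_top_batch(text):
    """Return one dict per process row of `top -b -n 1` style output."""
    rows = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if not in_table:
            # Skip the summary area until the column header appears.
            if fields[:2] == ["PID", "USER"]:
                in_table = True
            continue
        if len(fields) >= 12:
            rows.append({
                "pid": int(fields[0]),
                "user": fields[1],
                "cpu": float(fields[8]),
                "command": fields[11],
            })
    return rows


busy = [row["pid"] for row in parse_top_batch(SAMPLE) if row["cpu"] > 50]
print(busy)
```

<p>In practice you would feed the function the text captured from the ssh command above instead of the built-in sample.</p>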
<p>There are some other simple top commands that can be useful. Since almost all processors are multi-core these days, it is helpful to see all the cores in the display. For instance, in <em>Figure One</em> there are four processes running on a quad-core processor. If you want to see the individual cores, press <tt>1</tt>. As an example, the display in <em>Figure Two</em> is the same as <em>Figure One</em>, but now showing the cores.</p>
<div id="attachment_1427" class="wp-caption aligncenter" style="width: 495px"><img class="size-full wp-image-1427" title="top-show-cores" src="/wordpress/wp-content/uploads/2009/08/top-show-cores.png" alt="Figure Two: Top showing all cores" width="485" height="274" /><p class="wp-caption-text">Figure Two: Top showing all cores</p></div>
<p>Another thing users often want to know is whether their processes are moving from core to core. You can monitor this with top by entering <tt>f</tt> to change the display fields. This panel allows you to toggle the possible display fields that top uses. In this case, we will turn on the "Last used cpu" field, which indicates the last core on which the process was running. <em>Figure Three</em> is an example of this display for the same processes running in <em>Figure Two</em> above.</p>
<div id="attachment_1428" class="wp-caption aligncenter" style="width: 495px"><img class="size-full wp-image-1428" title="top-show-cores-last-proc" src="/wordpress/wp-content/uploads/2009/08/top-show-cores-last-proc.png" alt="Figure Three: Top with &quot;last cpu&quot; column &quot;P&quot;" width="485" height="274" /><p class="wp-caption-text">Figure Three: Top with &quot;last cpu&quot; column &quot;P&quot;</p></div>
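<p>If you want the same "last used cpu" information in a script, Linux exposes it in <tt>/proc/PID/stat</tt>, where field 39 is the CPU number the process last ran on (see the proc man page). The sketch below is a minimal Python example; the function names are made up, and it assumes a Linux <tt>/proc</tt> file system.</p>

```python
def last_cpu_from_stat(stat_line):
    """Extract field 39 (last CPU) from the text of /proc/PID/stat."""
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split after the final ")" before splitting on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 is rest[36].
    return int(rest[36])


def last_cpu_of(pid="self"):
    """Read the last-used CPU for a process; Linux only."""
    with open("/proc/{}/stat".format(pid)) as handle:
        return last_cpu_from_stat(handle.read())
```

<p>Calling <tt>last_cpu_of(6561)</tt> every few seconds and comparing the values is a simple way to see whether a job is being moved between cores.</p>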
<p>As you can imagine there are many more things top can display. We cannot go into all the details here. Consulting the man page will get you started on more top options. Just remember, you probably just learned enough here to impress your friends with some fancy <em>top mojo</em>!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.clusterconnection.com/2009/08/increase-your-top-mojo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

