What exactly is a cluster, anyway?

Every administrator has a handy definition for what a cluster is -- the problem is that no two definitions are alike. We cut through the confusion with a look at the types of clusters available, and the rationale behind choosing to cluster.

August 1998

Abstract

Clustering is the hallmark of enterprise computing environments. This month, Rawn launches a series of articles on clustering technology for several popular computing environments. In this first installment, he describes the reasons for clustering, and the types of clustering solutions available. (2,300 words)

Clustering brings the promise of enterprise-safe computing to your network. It links computers into a system that is easier to manage, more available, and easier to grow. In the coming months, we'll take a romp through the world of clustering. We'll look into the various forms clusters assume in the enterprise computing environment -- from simple disk-sharing clusters to the earthquake-proof fault-tolerant clusters of massive enterprise servers; from a software mechanism to unite users and resources, to the automatic distribution of the processing load or the contents of an entire server to another system. This month, we begin with the various cluster categories and a general overview of their respective benefits.

A cluster -- also commonly known as a server farm -- is a group of computers working together to share resources or workload. That's the simplest and broadest definition. Honestly, there are all kinds of clusters, from disk-sharing clusters to fully redundant fault-tolerant operating systems and hardware. Thus, the word cluster actually denotes a whole family of technologies under a common moniker.

Clustering is almost the hallmark of an enterprise server or operating system. Windows NT, for example, is still not considered up to par for an enterprise operating system due to its limited capabilities in clustering, fault-tolerance, and multiprocessing. Still, you can build simpler NT clusters for resource sharing that go well beyond what a single-machine server offers.

Clustering isn't new. I once worked with a fault-tolerant computer from Stratus, a leading minicomputer vendor in the '80s that is still in existence today. This machine had two of everything -- CPUs, RAM, disks, built-in uninterruptible power supplies (UPSs), and even a separate monitoring system to keep track of the operations of the two processors. To Stratus's credit, the system came with a detailed service and system administration contract that was partially built into the hardware. The monitoring system kept track of the performance of all parts on the machine; if any part reported a risk of failure, it would dial out through an internal modem to an 800 number in the service department of Stratus. The system administrators would of course be informed, but sometimes the service personnel would come around with a new part for the machine before its administrators even knew about the problem. Now that's what service is all about in enterprise computing! Administrators coming up from the PC server side are consistently amazed when I tell that story. Heck, some are still amazed (and surprisingly doubtful) when I tell them that even a low-end Unix server in high use can run for hundreds of days without any downtime.

Digital Equipment Corp. (DEC) was one of the pioneers of clustering with its VMS (Virtual Memory System) operating system. The race at the time was to build a group of VAX minicomputers that could serve a large number of users effectively. DEC was trying to keep up in processing power with its mainframe competition, at a much more attractive price, of course.

In fact, the matter of price was probably one of the primary reasons clusters were created. Rather than build massive, expensive computers with tightly integrated hardware and software components, the theory was to connect several smaller computers into a tight software environment to accomplish the same end. Since then, clusters have evolved to replace even the high end in networked enterprise computing platforms.

Why cluster at all?
Clustering can enhance system and network administration, user and application availability, and programming. In general, a cluster of computers can provide a high degree of availability. With a cluster of two or more servers, you should be able to take one of your machines down for maintenance without greatly affecting user logins or use of the system. The gain in availability can be substantial. For example, 99 percent availability means that your cluster can run 24 hours a day, 365 days a year with a total downtime of about 88 hours (roughly three and a half days), whether from planned or unplanned maintenance. For most companies this is entirely satisfactory. But some business-critical systems (such as those used by banks, stock exchanges, some healthcare and medical applications, military guidance and monitoring, or air traffic control) require a higher degree of availability. In these cases, 99.999 percent availability allows only about five minutes of downtime a year. This level of high availability depends on a combination of system hardware, the operating system, the networking system, and direct system administration.
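To check the arithmetic behind availability figures like these, the short Python sketch below converts an availability percentage into the downtime it permits over a year. It assumes a 365-day (8,760-hour) year and ignores leap years; the function name is purely illustrative. At 99 percent it works out to about 87.6 hours, and at 99.999 percent to about 5.3 minutes.

```python
# Convert an availability percentage into the downtime it allows per year.
# Assumes a 365-day year (8,760 hours); leap years are ignored.
HOURS_PER_YEAR = 365 * 24


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the maximum hours per year a system may be down."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * HOURS_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        hours = annual_downtime_hours(pct)
        print(f"{pct}% available: {hours:.2f} hours ({hours * 60:.1f} minutes) down per year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why the jump from 99 to 99.999 percent demands so much more from hardware, software, and administration.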

For the network administrator, a cluster of machines centralizes resources and, often, sysadmin tools, saving time and work in managing them. You can address resources or machines individually or in groups, depending on the quality of the available administration software. Although you're still responsible for multiple machines, clustering makes it easier to fix or bypass problems, often without seriously affecting the overall performance of the cluster. Users may see somewhat slower service while a node is out, but they don't lose access altogether when a single server goes down.

Security can also be easier to manage in a cluster than across several independent servers. Since a cluster appears as a single machine, there is one environment to monitor and manage. The various users and storage systems are located in the same cluster environment, making it easier to track down problems.

It's also fairly easy to add capacity or processing power to a cluster, by adding another machine or node. Of course, most clusters have their limits, but adding a new node doesn't usually require you to completely reconfigure the environment or the new machine. With some clustering software, all you have to do is plug the new machine into the cluster, and the operating system and software environment are automatically installed onto the new node.

For programmers, a cluster can mean more work. Parallel computing requires a different programming style than the sequential programming most programmers use. On the other hand, a parallel computer can run multiple calculations at the same time on several processors, sometimes cutting an application's runtime from days to hours or from hours to minutes. And, of course, running an application on a redundant system makes it far more likely that it will keep running even if a minor earthquake hits the building or the power systems go down.
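As a rough illustration of that difference in programming style, here is a minimal Python sketch that splits one calculation (a sum of squares, chosen only because the answer is easy to check) into independent chunks, runs the chunks concurrently, and combines the partial results. It uses the standard library's `concurrent.futures` on a single multiprocessor machine rather than across cluster nodes; the helper names are hypothetical, and on a real cluster the chunks would be handed to separate machines by a message-passing layer or job scheduler instead.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def sum_of_squares(lo: int, hi: int) -> int:
    """Sequential kernel: sum i*i for lo <= i < hi."""
    return sum(i * i for i in range(lo, hi))


def split_range(n: int, parts: int) -> list[tuple[int, int]]:
    """Split range(0, n) into `parts` contiguous (lo, hi) chunks of near-equal size."""
    base, extra = divmod(n, parts)
    chunks, lo = [], 0
    for k in range(parts):
        hi = lo + base + (1 if k < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def parallel_sum_of_squares(n: int, parts: int, executor_factory=ProcessPoolExecutor) -> int:
    """Partition the work, run each chunk in its own worker, then combine.

    Separate processes sidestep CPython's global interpreter lock for
    CPU-bound work; ThreadPoolExecutor runs the same decomposition but
    will not speed up this kind of calculation.
    """
    los, his = zip(*split_range(n, parts))
    with executor_factory(max_workers=parts) as pool:
        return sum(pool.map(sum_of_squares, los, his))


if __name__ == "__main__":
    n = 2_000_000
    expected = sum_of_squares(0, n)  # plain sequential loop
    assert parallel_sum_of_squares(n, 4) == expected
    assert parallel_sum_of_squares(n, 4, ThreadPoolExecutor) == expected
    print("partitioned and sequential results agree")
```

The sequential version is one loop; the parallel version must decide how to partition the work, ship each piece to a worker, and merge the answers. That bookkeeping is the extra coding the paragraph refers to.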

Types of clusters
Practically any system with two or more similar components, or any system that shares one resource or another in concert, has been labeled a cluster. Let's eliminate some of the more obvious marketing-speak and secondary categories right now.

First, a single, central server exporting a drive to several client computers is not a cluster. Although the clients are sharing a resource (the drive space), clustering describes cooperation among servers, not among the clients they serve. If, on the other hand, you have two or more servers connected together to share drives with these clients, you can consider it a cluster.

A multiprocessor server is also not a cluster. In fact, it's the very opposite of a cluster. Rather than connecting several independent machines together, it builds a single system out of multiple processors, potentially with processing power comparable to that of a cluster.

A network of servers that you must administer manually on a regular basis to share resources can hardly be called a cluster. A cluster usually includes some form of software or hardware integration that handles sharing automatically. Though you may have to configure the cluster during installation and adjust it over time, you shouldn't have to intervene routinely, the way you would with manual tape backups.

A system in which user information is distributed across the network isn't really a cluster either. That's more of a namespace, distributed or otherwise. It only becomes a cluster if the operating system uses this information to automatically assign resources to the user. For example, the Network Information Service (NIS) password system by itself doesn't work as a clustering system. However, if more than one server is involved, NIS combined with the automounter, which mounts each user's files from whichever server holds them, can be considered a cluster.

Finally, distributed computing environments like the recently announced Jini and JavaSpaces system from Sun go beyond clustering. In a true distributed computing environment, there is little or no distinction between the member machines in the system. Applications run and automatically move around machines as handled by the distributed operating system services. These systems are on a higher plane of existence than clustering. We won't go into the distributed space in this series, but we will include a variant that can be considered a type of cluster.

You can see why it's a bit difficult to describe a cluster. It's one of those conceptual terms that means different things to different people. An experienced system administrator can tell you what it is, but his or her definition will probably differ from the definition offered by another administrator working in a different computing environment. Which puts us back where we started, with the original question: What is a cluster?

Clusters fall into several categories, but the following should give you a general idea of those available:

A group of servers that balance the processing load or user load by using a central server or router that assigns the load to different servers. This can be achieved in one of several ways: an initial or central server determines the load of other servers and sends the new request to the least loaded server; a router or cluster manager assigns the new load to secondary servers on a pseudo-random or fixed algorithm or based on network address; a central server assigns a request to a particular server based on the request type or user preference information.
A group of servers that act as a central system, coalescing individual resources, in whole or in part, for use by clients. This is done in one of two ways. Either the individual resources are organized into a structure that appears as a single virtual resource (for example, the Network File System, NFS), or they are pooled in no particular order and assigned jobs as they become available (for example, a printer pool or a modem pool).
A group of servers that execute the exact same application at the same time in parallel across the servers. This is done primarily in fault-tolerant, redundant, or replicated systems to make sure that the correct results are produced as required, even if one server fails.
A group of servers that execute parts of the same application across the servers to make computing faster. This is parallel or distributed computing in its pure form and is so much more advanced than simple clustering that it is almost beyond it. The difference between true distributed computing environments and cluster-based distributed computing lies in how the distributed servers are interfaced together. If the servers are in a completely seamless environment where individual node identity isn't an issue for the programmer or administrator, you have a true distributed-computing environment. A series of machines that have a distributed space but that also have to be managed or identified individually would be called a multicomputer: a type of cluster. A parallel database partitioned across several machines is usually considered a cluster.
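The dispatch policies described for the first category can be sketched in a few lines of Python. This is a minimal, hypothetical dispatcher, not any vendor's product: it assumes each server reports its current number of active requests, and all class and function names are invented for illustration. Hashing the client address, rather than choosing at random, keeps a given client on the same server, which helps when a session holds state on that server.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    active_requests: int = 0  # load figure each server reports to the dispatcher


def least_loaded(servers: list[Server]) -> Server:
    """Central server picks whichever node reports the lowest current load."""
    return min(servers, key=lambda s: s.active_requests)


def pseudo_random(servers: list[Server], rng=None) -> Server:
    """Router picks a node pseudo-randomly, ignoring load."""
    return (rng or random).choice(servers)


class RoundRobin:
    """Router cycles through the nodes in a fixed order."""

    def __init__(self, servers: list[Server]):
        self._servers = list(servers)
        self._next = 0

    def pick(self) -> Server:
        server = self._servers[self._next]
        self._next = (self._next + 1) % len(self._servers)
        return server


def by_client_address(servers: list[Server], client_ip: str) -> Server:
    """Router hashes the client's network address so a client keeps hitting the same node.

    SHA-256 is used instead of Python's built-in hash(), which is randomized
    per process for strings and would remap clients after a restart.
    """
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


def by_request_type(routes: dict[str, Server], request_type: str, default: Server) -> Server:
    """Central server maps each request type (hypothetical labels) to a dedicated node."""
    return routes.get(request_type, default)


if __name__ == "__main__":
    farm = [Server("web1", 3), Server("web2", 1), Server("web3", 4)]
    rr = RoundRobin(farm)
    print("least loaded:", least_loaded(farm).name)
    print("round robin: ", [rr.pick().name for _ in range(4)])
    print("by address:  ", by_client_address(farm, "192.0.2.7").name)
```

Only the least-loaded policy needs live feedback from the servers; the others can run in a simple router, which is why they are cheaper to deploy but blind to an overloaded node.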

Even within a category, the actual implementation can vary in complexity. For example, it's simple to spread Web requests across a group, or farm, of Web servers by having the name server hand out a different server's IP address, in rotation, for each lookup: a common practice known as round-robin DNS. At the other extreme, Citrix WinFrame and MetaFrame for multiuser NT systems let you carefully plan how to balance the processing load across a cluster of servers, based on user load, application load, system load, CPU utilization, and disk access utilization.
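The two ends of that spectrum can be contrasted in a short Python sketch. The first class mimics round-robin DNS by rotating the order of the address records it returns, since clients typically connect to the first address in the answer. The second part is a generic weighted score over three of the metrics listed above; the weights, the 0.0 to 1.0 normalization, and all names are assumptions for illustration, and this is not Citrix's actual balancing algorithm. The addresses come from the 192.0.2.0/24 documentation range.

```python
from collections import deque


class RoundRobinDNS:
    """Returns all address records for one name, rotating their order per query.

    Clients usually try the first address, so successive clients are spread
    across the servers; the rotation knows nothing about actual server load.
    """

    def __init__(self, addresses: list[str]):
        self._records = deque(addresses)

    def answer(self) -> list[str]:
        reply = list(self._records)
        self._records.rotate(-1)  # the next query starts with the next address
        return reply


# Hypothetical weights over metrics normalized to 0.0 (idle) .. 1.0 (saturated).
WEIGHTS = {"users": 0.3, "cpu": 0.4, "disk": 0.3}


def load_score(metrics: dict[str, float]) -> float:
    """Collapse several load metrics into one number; lower means less busy."""
    return sum(weight * metrics[name] for name, weight in WEIGHTS.items())


def pick_server(farm: dict[str, dict[str, float]]) -> str:
    """Choose the server whose weighted load score is lowest."""
    return min(farm, key=lambda name: load_score(farm[name]))


if __name__ == "__main__":
    dns = RoundRobinDNS(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
    for _ in range(4):
        print("DNS answer:", dns.answer())

    farm = {
        "nt1": {"users": 0.9, "cpu": 0.2, "disk": 0.1},
        "nt2": {"users": 0.2, "cpu": 0.5, "disk": 0.3},
    }
    print("weighted choice:", pick_server(farm))
```

The rotation costs almost nothing and needs no cooperation from the servers. The weighted approach requires every server to report fresh measurements, but it can steer work away from a machine that is busy in ways a simple rotation cannot see.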

Moving on
Rather than making do with a short list of the different types of implementations, we'll go through several of these types in more focused articles in the coming months. The articles in this series will span Unix, VMS, and Windows environments, and we'll try to cover interoperation between the respective clusters. We'll also cover directory software systems, management software tools, hardware systems, networking methodologies, and service contracts.

In the meantime, we'd like your comments on particular products or clustering technology that you use in your environment or want to use, and why you think others should consider using them too.

Resources

Sun has a mechanism for clustering its line of enterprise-level servers http://www.sun.com/clusters
Hewlett-Packard is a leader in clustering technology for both its HP-UX line and its NT servers http://www.hp.com/netserver/products/cluster/index.htm
Whitepaper discussing how Digital's OpenVMS clustering works http://www.openvms.digital.com/openvms/whitepapers/ci_connect/ciconfig_webpage_contents.html
Compaq's other acquisition, Tandem, is also a leader in clustering in its NonStop server line and ServerNet technology http://www.servernet.com/
Microsoft's pages dedicated to discussing clustering technology for Windows NT and its upcoming MS Cluster Server http://www.microsoft.com/ntserverenterprise/deployment/faq/clustering/default.asp
Full listing of previous Connectivity columns in SunWorld including columns on DSL http://www.sunworld.com/common/swol-backissues-columns.html#connectivity

About the author
Rawn Shah is chief analyst for Razor Research Group, covering WAN and MAN networking technology and network-centric computing. He has expertise in a wide range of technologies including ATM, DSL, PC-to-Unix connectivity, PC network programming, Unix software development, and systems integration. He helped found NC World magazine in December 1996, and has led the charge for deploying network-centric computing in the corporate world. Reach Rawn at rawn.shah@sunworld.com.

URL: http://www.sunworld.com/swol-08-1998/swol-08-connectivity.html