When client/server goes critical, vendors cluster together
To rival and beat mainframe reliability, Unix vendors push clustering for high availability, scalability, and performance
Lately, Unix hardware vendors can't stop talking about clusters. And now, even Microsoft is getting into the act. Earlier this month, Sun expanded clustering capabilities to its line of Ultra Enterprise servers, based on the 64-bit UltraSPARC CPUs. We tell you why users are favoring clusters and catch you up on rival offerings from Hewlett-Packard, IBM, and DEC, and on Microsoft's yet-to-be-shipped Wolfpack clustering technology. (2,000 words)
The same principle is at work in the Unix server market today with clustering software. Clustering software enables two or more servers -- typically symmetric multiprocessing (SMP) systems with multiple CPUs -- to work together as a single logical system, even though they are physically separate. A high-speed interconnect bus allows near-instantaneous data transfers between systems.
Because multiple SMP systems can be tied together into a single computing complex, clusters provide users with scalability and performance beyond the range a commercial system vendor typically offers. But that's not why most users go the clustering route.
The first reason users adopt clusters is high availability, said Jean Bozman, Unix research manager for International Data Corp.'s (IDC) Unix and Advanced Operating Environments Service (Palo Alto, CA).
Vendors who push clustering for performance and scalability reasons may, in fact, be trying to make up for lower performance in their regular system configurations. The real benefit is that by enabling two or more servers to operate in clusters, Unix server vendors are providing their users the type of uptime and service levels once associated only with mainframes. And Unix vendors aren't alone; in its pursuit of capturing a greater share of enterprise client/server applications, Microsoft Corp. and some of its hardware OEMs are also moving forward with a clustering solution for the Windows NT Server operating system.
Both Unix and NT server vendors are chasing the same target: mission-critical client/server applications that can't afford major service disruptions. "There's a demand for mission-critical client/server systems," said John Mann, vice president, client/server, at The Yankee Group Inc., a Cambridge, MA, market research and consulting firm. "But client/server systems with 78 percent availability are nothing like a mainframe that runs at 99.9 percent availability. For client/server to go mission-critical, it's got to have more robustness. Clusters get you part of the way there."
For the time being, Unix server vendors have an edge over NT on the clustering front, as no Microsoft-developed products are yet shipping. Although Unix systems have become increasingly reliable over the past few years, as a general rule they still don't match the availability and reliability of IBM's MVS/390 systems. With clustering, however, Unix vendors can meet or beat mainframe reliability.
For instance, CNA Insurance Inc.'s Personal Lines Division in Reading, PA, pulled the plug on its mainframe system and installed a pair of Hewlett-Packard HP 9000 Unix-based servers running Multi-Computer/ServiceGuard (MC/ServiceGuard), HP's clustering software. The mainframe used to take two hours or more to reboot after a system fault, said Monty Mohanty, CNA's director of business technology. With MC/ServiceGuard, the Unix-based HP 9000 systems resume operations in minutes.
High availability (HA) through clustering shouldn't be confused with fault tolerance. There's a subtle but important difference between the two. IDC's Bozman explains: "High availability means that the system detects a problem, interrupts service for, say, 45 seconds or so while it rolls over the workload to another processor, and continues. There is an interruption in service. With a fault-tolerant system, there's no interruption. Absolutely none."
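The detect-and-rollover cycle Bozman describes can be sketched in miniature. The Python sketch below is purely illustrative (all class and variable names are hypothetical, and real HA packages implement this with kernel-level heartbeats over the cluster interconnect): a monitor watches the primary node's heartbeat and promotes the standby once the heartbeat goes stale.

```python
import time

class Node:
    """A cluster node that records a heartbeat timestamp while healthy."""
    def __init__(self, name):
        self.name = name
        self.last_heartbeat = time.monotonic()
        self.active = False

    def heartbeat(self):
        self.last_heartbeat = time.monotonic()

class ClusterMonitor:
    """Rolls the workload over to the standby when the primary goes silent."""
    def __init__(self, primary, standby, timeout=1.0):
        self.primary, self.standby, self.timeout = primary, standby, timeout
        primary.active = True

    def check(self):
        # If the primary hasn't heartbeated within the timeout,
        # deactivate it and promote the standby node.
        if time.monotonic() - self.primary.last_heartbeat > self.timeout:
            self.primary.active = False
            self.standby.active = True
        return self.standby if self.standby.active else self.primary

# Simulate the primary going silent.
a, b = Node("node-a"), Node("node-b")
monitor = ClusterMonitor(a, b, timeout=0.1)
first = monitor.check()       # primary still healthy
time.sleep(0.2)               # primary stops heartbeating
second = monitor.check()      # workload rolls over to the standby
```

The brief window between the last heartbeat and the promotion is exactly the service interruption that distinguishes HA from true fault tolerance.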
But users pay a stiff price for the zero service interruption that fault-tolerant systems, such as those from Tandem Computer Inc. and Stratus Computer Inc., provide. Many users don't want to pay extra for applications that don't absolutely require 100-percent uptime.
Jeff DeHaven, director of information services for @Home, said the start-up cable-based Internet Service Provider (ISP) needed a highly available server platform as the back end of its new Internet and Web services. The company, headquartered in Mountain View, CA, will launch its high-speed Internet access services in Hartford, CT, Chicago, and Orange County, CA, next month, then expand nationwide over the next six to nine months.
DeHaven said he ruled out traditional fault-tolerant systems in favor of Unix clusters for several reasons. First, fault-tolerant systems carry premium price tags; exactly how much more depends on the application and configuration. Second, many fault-tolerant architectures use a "hot standby" approach, meaning a fully duplicate system sits idle and does no real work until the primary system fails. That idle hardware accounts, in part, for their higher cost; in a cluster, by contrast, all systems do useful work. Third, traditional fault-tolerant systems are proprietary, so vendors have users locked in. Finally, fault-tolerant vendors don't have a history with the Internet, so their offerings of Web software, development tools, Java, and the like are slim.
DeHaven said @Home, a joint venture of TCI, Cox Cable, and other cable TV companies, wanted a Unix-based, open systems platform for which development tools and equipment for Internet access would be easily available. After a six-month evaluation period, the company chose two Sun Microsystems Ultra Enterprise 4000 servers and Sun's new Solstice-HA software. Each 4000 has four 167-MHz UltraSPARC CPUs.
Sun, which formally introduced new clustering products on October 8, has offered SPARCcluster-HA and SPARCcluster-PDB (for Parallel Database) for its line of SuperSPARC-based SPARCserver 1000 and SPARCcenter 2000 enterprise servers for more than a year. Those products provide HA features, including fault detection and rollover for pairs of like Sun SPARC-based systems only.
Sun has now expanded clustering capabilities to its line of Ultra Enterprise servers, based on the 64-bit UltraSPARC CPUs. These new "Smart Clusters" products, including Solstice-HA and Ultra Enterprise-PDB, are turnkey HA solutions for hardware and software failures. Solstice-HA supports highly-available computing, NFS, and Web services, while PDB supports specific parallel databases, including Oracle Parallel Server, Informix OnLine XPS, and Sybase MPP.
Mark Nagaitis, product line manager for Sun's server group, said the company has laid out a road map of clustering enhancements through 1998. The first products, Solstice-HA and Ultra Enterprise-PDB, will be available for pairs of systems next month. For the first time, Sun permits clusters based on pairing different systems, including the SPARCserver 1000, the SPARCcenter 2000, and any member of the Ultra Enterprise server family.
In the first quarter of next year, Sun will expand support for up to four-way and eight-way clusters, and in early 1998, 16-way clusters. Since a single top-of-the-line Ultra Enterprise 6000 Server can support up to 30 CPUs, Sun's clusters will eventually support configurations up into the hundreds of CPUs, a space now occupied only by massively parallel processing (MPP) systems.
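Sun's own numbers make that claim easy to check: a 16-way cluster of fully configured Ultra Enterprise 6000s would top out at 480 processors, comfortably "into the hundreds of CPUs." A quick back-of-the-envelope calculation:

```python
# Back-of-the-envelope check on Sun's planned maximum cluster size.
nodes_per_cluster = 16   # 16-way clusters planned for early 1998
cpus_per_node = 30       # maximum CPUs in a single Ultra Enterprise 6000
max_cpus = nodes_per_cluster * cpus_per_node
print(max_cpus)  # 480
```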
Sun hopes to distinguish itself from its competitors in the HA/cluster market by marketing its HA clusters as packaged solutions. "Traditionally, our competitors have supplied users with a platform and a set of APIs to which they have to write scripts and do their own customization," Nagaitis said. "Sun's `Smart Cluster' solutions package Sun-developed and Sun-supported scripts that do the fault probes. There's no customization involved."
Sun is something of a laggard in this market, analysts say. Although its SPARCcluster products have been selling for over a year, it's still not a market or technology leader. "Sun is coming from behind in this market," said Bill Moran, senior analyst at D.H. Brown Associates Inc., Port Chester, NY.
For instance, HP, IBM, and Digital Equipment Corp. (DEC) already support larger clusters than Sun. HP's MC/ServiceGuard works with its line of HP 9000 HP-UX-based Unix business servers. This software facility detects errors or failures on one node in a cluster and automatically transfers the workload or applications to another active node in the same cluster. MC/ServiceGuard supports clusters of up to eight HP 9000s, which can be a mixture of uniprocessor and SMP systems. HP 9000 SMP systems support up to four CPUs.
IBM's clustering technology has its roots in that vendor's legacy days, said Dave Tarek, product manager for IBM's RS/6000 Scalable Processor (SP) systems. It currently offers the System/390 Parallel Sysplex for mainframe-based HA applications.
For the Unix-based RS/6000 line, IBM's clustering software, called High Availability Clustering Multiprocessor (HACMP), provides error and fault detection for system software, hardware, and applications. HACMP currently supports up to eight nodes, each of which can be an SMP system with up to four CPUs. Tarek said IBM will deliver larger clusters over the next year, including 16-way and larger configurations. "We already have 128-way clusters working in the lab with no problem," he said.
Under the code-name "Phoenix," IBM is also developing cross-platform clustering technology that will enable developers to build highly available mission-critical applications using clusters based on IBM OS/2, Windows NT, AIX, or other Unix operating systems.
DEC (Maynard, MA) has one of the longest track records with clustering technology, dating back to its VAXcluster software for its VAX line of commercial minicomputers. Earlier this year, DEC moved clustering capabilities to its line of RISC-based Unix servers, the AlphaServer family, with its TruCluster software. TruCluster provides highly available, scalable database and compute services, recovery and failover, and highly available Web services. Up to eight nodes can be linked together in a TruCluster environment, with each node being an SMP system that supports up to four CPUs.
Even smaller Unix server vendors are getting into the clustering act to offer their customers HA solutions. Data General (Westboro, MA) offers SMP servers built on Intel CPUs that run either its DG-UX Unix operating system or Microsoft NT Server. In both cases, DG resells clustering software from Veritas Software, a Mountain View, CA, software company best known for its Unix filesystem and volume management packages. Veritas' VxServerSuite is one of the few cross-platform HA software suites that support both Unix and NT servers. The suite provides highly available compute, database, Web, and NFS services.
Microsoft itself is developing clustering technology for NT under the code name "Wolfpack." A number of PC server vendors have signed on, including Compaq, DEC, HP, Intel, NCR, and Tandem. No Microsoft NT clustering products are currently shipping, although the company has demonstrated a pair of NT SMP servers using the "Phase 1" Wolfpack clustering technology. This base-level technology provides error detection and quick rollover for pairs of like NT servers. It is slated to ship to OEMs in the fourth quarter of this year. Beyond two-node clusters, users will have to wait or look elsewhere, such as to Veritas' offerings. Microsoft has not committed to ship larger than two-node clusters until sometime in 1998, and then only to beta.
In the meantime, some OEMs in the Wolfpack team are coming out with proprietary NT clusters of their own, said Donna Scott, research director, System and Network Management Service, Gartner Group (Stamford, CT). These include NCR's LifeKeeper and other offerings.
Scott anticipates that when the first Wolfpack-based NT clusters appear next year, some vendors and users will begin to migrate. However, those NT users who require clusters larger than two nodes will stick with vendor-proprietary solutions until sometime in 1999, when the first Wolfpack clusters larger than two nodes are currently scheduled to appear.
Although NT and Unix clusters will compete, each targets different market segments, Scott said. "Unix clusters predominate at the high end," she said. "Between 40 and 50 percent of all high-end Unix systems are shipped in clusters, and are used for large database applications, or to run enterprise applications such as SAP. NT won't have comparable features to what Unix clusters offer today until 1999."
NT clusters, both those available today and in the future, are aimed at "low-end, and more mainstream" applications, Scott said, such as departmental applications or retail point-of-sale (POS).
Whatever it delivers with Wolfpack, and whenever it ships, Microsoft could have a strong impact on the HA cluster market, if only because of the emerging strength of NT as an enterprise server platform. "Microsoft could have a huge advantage if they had (HA clusters) today," said D.H. Brown's Moran.
In the meantime, Microsoft's absence leaves the mission-critical client/server arena wide open for Unix servers. So chalk up HA clusters as one more advantage of Unix servers over NT in the enterprise.
About the author
Philip J. Gill is a free-lance writer and editor who specializes in Unix, open systems, and Internet technologies. Reach Philip at email@example.com.