
RAID basics, Part 3: Understanding the implementation of hardware- and software-based solutions

What are the benefits and drawbacks of each?

By Chuck Musciano

August 1999

In this continuing series on RAID storage, Chuck moves beyond the various types of RAID and looks at the physical systems that actually implement these storage systems in the real world: conventional disks, software-based RAID, and high-speed RAID controller hardware. (2,500 words)

For the past two months, we've explored the most common and popular methods of RAID disk storage. These methods use redundant arrays of inexpensive disks to create large, virtual storage devices that offer differing amounts of reliability and performance. With the advent of massive databases and online storage needs measured in terabytes, RAID has become the standard way to create and manage the huge disk farms required to store all that data.

RAID comes in many flavors; in this series, we've covered the most common:

RAID 0
This version of RAID storage concatenates multiple physical disks together to create a single virtual drive. Data may be striped across the disks to improve performance. This form of RAID is great for managing large virtual storage areas with some performance improvement, but offers no additional reliability.

RAID 1
RAID 1 is another name for mirroring, or creating a realtime copy of every physical device. Data written to the mirrored pair is copied to both physical devices. If one of the devices fails, the other picks up the slack so that the host system is never aware that a drive has gone bad. In its pure form, RAID 1 provides the ultimate in redundancy, but offers no performance improvements or virtual device management.

RAID 0+1
Combining RAID 0 and RAID 1 yields the best of both worlds: complete mirroring for redundancy with striping and concatenation for large volume management and performance improvement. The only downside is cost, since RAID 0+1 (like RAID 1) needs two bytes of raw storage for every byte of data stored.

RAID 3
RAID 3 reduces the cost of redundant storage by using parity instead of mirroring to protect against disk failure. Drives are grouped into sets; one drive is designated the parity drive for the set and contains parity data computed from the other drives. If any single drive fails, the data stored on it can be recreated from the parity data. Costs are reduced, since fewer drives are needed to implement redundant storage, but performance is hindered by the parity computation and the bottlenecks caused by a single dedicated parity drive.
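The parity computation itself is just a bytewise exclusive OR across the data drives, which is why any single lost drive can always be rebuilt from the survivors. Here's a minimal sketch in Python; the function names and the four-drive layout are illustrative, not any vendor's implementation:

```python
from functools import reduce

def parity(blocks):
    """Compute the parity block as the bytewise XOR of all data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, byte_group)
                 for byte_group in zip(*blocks))

def rebuild(surviving_blocks, parity_block):
    """Recreate a lost block: XOR the surviving blocks with the parity block."""
    return parity(surviving_blocks + [parity_block])

# Four data drives and one dedicated parity drive (the RAID 3 layout)
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
p = parity(data)

# Drive 2 fails; its contents come back from the other drives plus parity
assert rebuild([data[0], data[1], data[3]], p) == b"CCCC"
```

The same XOR arithmetic underlies RAID 5; only the placement of the parity blocks differs.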

RAID 5
RAID 5 is a refinement of RAID 3 that uses the concept of parity but distributes the parity data across all the drives in the RAID set. Bottlenecks induced by the parity drive are eliminated and costs are still reduced, but performance can still be a problem.

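The rotation of parity across the set can be pictured as a simple mapping from stripe number to disk. A toy sketch in Python, assuming a left-symmetric-style rotation (real controllers vary in the exact layout they use):

```python
def parity_disk(stripe, ndisks):
    """RAID 5 rotates parity: each successive stripe places its parity
    block on a different disk, so no single drive becomes a bottleneck."""
    return (ndisks - 1 - stripe) % ndisks

# With 5 disks, successive stripes place parity on disks 4, 3, 2, 1, 0, 4, ...
assert [parity_disk(s, 5) for s in range(6)] == [4, 3, 2, 1, 0, 4]

# Over any full rotation, every disk carries exactly one stripe's parity
assert sorted(parity_disk(s, 5) for s in range(5)) == [0, 1, 2, 3, 4]
```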
In exploring these different RAID models, we gave little consideration to the physical implementation of the storage subsystem. Now that we understand how these different RAID types work in theory, we need to drill down one layer into our storage subsystem and see how to implement RAID in the real world.

JBOD
One way to create a large storage subsystem is to not use RAID at all. Such systems often go by the moniker of "just a bunch of disks," or JBOD.

JBOD is the way we used to do things back in the olden days, when normal systems maintenance included loading coal into the machine hopper and adjusting the processor drive belts. To create a JBOD storage subsystem, you take a bunch of disks and attach them to your system, usually via one or more SCSI controllers. Individual drives may have some smarts, including an on-drive cache, but there is little other hardware support for your storage subsystem.

Drive management in a JBOD environment occurs at the system level. You use conventional tools to format, partition, and mount the physical drives. Users then take advantage of the file systems you've mounted. For any sort of realistic performance management, users, especially database administrators, will need to know exactly where every file system is created and mounted, so that they don't create applications that saturate one drive or SCSI controller.

For large installations, JBOD is simply not an option. If you work in a small shop on a shoestring budget, however, JBOD can help meet some of your disk storage needs. In these cases, you'll be relying on your applications and databases to provide redundancy and recovery tools, along with robust backups to help you through serious drive failures.

The one advantage of a JBOD installation is that it forces you to become intimately familiar with your drives, controllers, and disk performance. Once you have fully grasped the inner workings of disk storage, a later transition to a more sophisticated RAID subsystem will be much easier, since you will understand how to balance loads across multiple controllers, detect disk hot spots, and recognize application usage patterns among your disk drives. In addition, a JBOD architecture is the basis on which software RAID solutions are constructed. Learning how to make JBOD work effectively will help you make a software-based RAID configuration work better as well.


Software RAID
With that in mind, let's address software-based RAID. This variety of RAID implementation begins with a large number of disk drives (a JBOD solution) and adds system-level software that implements one or more flavors of RAID atop those drives. This system software is usually in the form of a kernel-level device driver that creates virtual storage devices from the physical drives, coupled with some sort of management console used to configure and administer the virtual storage devices. The software may be bundled as part of the operating system (like Sun's Online DiskSuite) or a third-party product (like the Veritas Volume Manager).

Let's look at the latter product as an example. Veritas Volume Manager begins by discovering and presenting to you every physical device it finds on your system. You then encapsulate those devices, moving them under the complete control of Veritas. From this point forward, you will not be using the devices in a traditional fashion; instead, you use Veritas to configure and present the devices to you in a controlled manner.

Once encapsulated, you can break up each physical device into one or more subdisks -- a contiguous area of storage on the device. These subdisks are then assembled into larger units of storage called plexes. One or more plexes, in turn, are combined to create a volume, which is then presented to Unix as a disk device. As far as Unix is concerned, a Veritas volume is a slice on a physical disk drive, ready to have a file system placed upon it and mounted for use.
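The subdisk/plex/volume hierarchy can be pictured with a small Python model. The class names echo the Veritas terms, but this is purely an illustration of the containment relationships, not the product's API:

```python
class Subdisk:
    """A contiguous region of storage on one physical device."""
    def __init__(self, device, offset, length):
        self.device, self.offset, self.length = device, offset, length

class Plex:
    """A unit of storage assembled from one or more subdisks."""
    def __init__(self, subdisks):
        self.subdisks = subdisks
        # Simplest case: subdisks concatenated end to end
        self.length = sum(sd.length for sd in subdisks)

class Volume:
    """One or more plexes, presented to Unix as a single disk slice."""
    def __init__(self, plexes):
        self.plexes = plexes

# One device -> one subdisk -> one plex -> one volume (the simplest case)
sd = Subdisk("c1t0d0", offset=0, length=2048)
vol = Volume([Plex([sd])])
assert vol.plexes[0].length == 2048
```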

In the simplest case, a physical device can be converted to a single subdisk, which is used to create a plex, which becomes a volume that supports a file system. You've done little here except to add complexity and overhead. The benefits of Veritas Volume Manager become evident when you combine multiple subdisks to create a plex that supports some flavor of RAID.

For example, you might create subdisks on five separate physical devices and combine them into a single plex. Veritas can then use that plex in a RAID 0 mode, striping data across the five subdisks, or in a RAID 5 mode, distributing data and parity across the subdisks. A RAID 5 plex can only be turned directly into a volume, creating a file system that can tolerate the loss of one of the five drives from which its subdisks are derived. A RAID 0 plex, however, can be associated with another RAID 0 plex of the same size to create a pair of mirrored plexes. This pair can be turned into a volume that has full mirrored redundancy.
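The striping arithmetic behind a RAID 0 plex is straightforward: logical blocks are dealt round-robin across the subdisks in fixed-size stripe units. A hypothetical sketch in Python; the 64-block stripe unit is an assumption for the example, not a Veritas default:

```python
def stripe_map(block, nsub, stripe_unit=64):
    """Map a logical block number to (subdisk, offset) in a RAID 0 plex.
    Consecutive stripe_unit-sized chunks land on successive subdisks."""
    stripe, within = divmod(block, stripe_unit)
    return stripe % nsub, (stripe // nsub) * stripe_unit + within

# Five subdisks: the first 64 blocks go to subdisk 0, the next 64 to
# subdisk 1, and so on; the sixth chunk wraps back to subdisk 0
assert stripe_map(0, 5) == (0, 0)
assert stripe_map(64, 5) == (1, 0)
assert stripe_map(5 * 64, 5) == (0, 64)
```

A large sequential read thus keeps all five spindles busy at once, which is where the RAID 0 performance gain comes from.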

Veritas offers other features, such as dirty region logs that minimize RAID synchronization time, and various performance monitoring options that let you identify and eliminate bottlenecks in your virtual devices. It is important to remember, however, that everything Veritas does is done in software, using host CPU cycles to compute parity, mirror data, and recover from drive failures. This overhead is usually small, typically just a few percent of the overall system usage. Still, it represents cycles that could be put to better use running applications instead of managing storage. It also means that some additional disk I/O is occurring so that Veritas can query and update device status as it manages your volumes, adding to the bandwidth on your disk channels. Again, the overhead is small but measurable.

The biggest concern with software RAID is that almost all of the configuration burden is left with the system administrator. A reliable RAID volume is not assembled by slapping together just any set of subdisks; you need to carefully select and combine subdisks to ensure that you are not saturating controllers or inadvertently creating single points of failure in your disk subsystem. For example, any two plexes can be associated to create a mirrored volume, but if the subdisks in the first plex are from the same physical device as those in the other, the loss of that device will ruin both plexes and the mirrored volume will fail.
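That single-point-of-failure check is easy to picture in code. A hypothetical sketch in Python, representing each plex simply by the set of physical devices backing its subdisks (the Solaris-style device names are illustrative):

```python
def safe_to_mirror(plex_a, plex_b):
    """Two plexes can safely mirror each other only if no physical device
    contributes subdisks to both; a shared device is a single point of
    failure that would take out both halves of the mirror at once."""
    return not (set(plex_a) & set(plex_b))

# Plexes backed by entirely separate devices: a sound mirror
assert safe_to_mirror(["c1t0d0", "c1t1d0"], ["c2t0d0", "c2t1d0"])

# Both plexes draw subdisks from c1t1d0: losing that one drive
# would ruin both plexes, so this mirror is unsafe
assert not safe_to_mirror(["c1t0d0", "c1t1d0"], ["c1t1d0", "c2t0d0"])
```

A software RAID product will generally let you build the unsafe configuration anyway, which is exactly why the burden falls on the administrator.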

The administration of software RAID systems requires discipline and an eye for detail. For large configurations, good naming conventions and solid documentation are invaluable. Standards help: I've used Veritas Volume Manager to manage 450 drives in one disk farm, and the only way to stay sane was to carve up disks and create volumes using standard-sized chunks of storage.

Software RAID is less expensive than a hardware solution, but you'll make up the difference in administrative costs. Still, it can be a good solution, especially for small installations. If you are deploying a server with, say, six drives, which you intend to mirror into three logical volumes, software RAID is a perfect way to create that configuration.

Hardware RAID
Hardware RAID systems take all the processing that products like Veritas Volume Manager do using the host processor and move it into dedicated processors separate from the server. Typically, a hardware RAID device has a large number of physical drives in it, along with a processor, a lot of firmware, and some amount of cache memory. The device is then attached to the host machine via SCSI or fibre channel technology. The host system sees a number of devices on the attached RAID system that it treats as physical disk drives. These devices are actually logical devices created from the physical drives and managed by the internal processor in the RAID system.

Many vendors market hardware RAID solutions, from low-end devices from companies like CLARiiON and Box Hill to high-end systems from EMC and Sun. While pricing and architecture differ dramatically between vendors, the basic architectural components -- drives, processor, and cache -- are the same in all devices.

When looking for a hardware RAID system, it is important to understand the internal architecture that unites the disks and processor. There is nothing magic inside a RAID system, and you'll most likely find either a number of SCSI chains sporting from five to seven drives each, or a direct-to-disk fibre channel that attaches the drives in one large bus. The processor accepts I/O requests from the server via the appropriate SCSI or fibre channel connection, maps the request to one or more drives in the system, collects the results from the drives, and passes the results back to the host system.

In most high-performance RAID systems, you'll find a generous amount of cache memory. Writes are placed into the cache and an acknowledgement is sent back to the server as quickly as possible; the data is copied from the cache to the drives at some later point in time. Reads are staged into the cache as they are delivered to the server, and data may be read ahead in the hopes that future reads can be satisfied directly from cache. The more cache in the system, the greater the chance that all I/O operations can be initially handled from cache, resulting in maximum throughput from the server's perspective.
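The write-back and read-staging behavior described above can be reduced to a toy cache model. This Python sketch is a conceptual illustration only, not any controller's firmware logic:

```python
class WriteBackCache:
    """Toy model of a RAID controller's cache: writes are acknowledged
    once they land in cache, and dirty blocks are destaged to disk later."""

    def __init__(self):
        self.cache = {}      # block number -> data
        self.dirty = set()   # blocks written but not yet on disk
        self.disk = {}       # the backing physical storage

    def write(self, block, data):
        self.cache[block] = data
        self.dirty.add(block)        # ack now; the disk write is deferred

    def read(self, block):
        if block in self.cache:      # cache hit: no disk I/O at all
            return self.cache[block]
        data = self.disk.get(block)  # miss: stage the block in from disk
        self.cache[block] = data
        return data

    def flush(self):
        for block in self.dirty:     # destage all dirty blocks to disk
            self.disk[block] = self.cache[block]
        self.dirty.clear()
```

The model also makes the risk in the next paragraph concrete: between `write` and `flush`, the data exists only in cache memory.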

Cache introduces its own risks: a power failure at the wrong moment will corrupt data held in cache that has not yet been written to disk. Good systems have battery backup on their cache to avoid this problem, and often mirror the cache to guard against memory hardware failures.

All hardware RAID systems have some sort of configuration tool that lets you create volumes within the unit. Almost all support RAID 0, 1, 0+1, and 5, with varying levels of control regarding how internal drives are mapped into different volumes. In general, hardware systems offer less flexibility than software RAID solutions, but provide more error checking and configuration control to keep you from making configuration mistakes.

A final difference between hardware- and software-based systems is the ease with which you can remove and replace failed drives. Most hardware systems have drives that are hot swappable, meaning that they can be spun down and removed while the system continues to run. This is a glaring problem with software-based RAID, which is generally implemented atop JBOD disk farms whose drives are simply plugged into a shared bus. Pulling a drive requires shutting down the entire bus, which in turn usually requires a brief system outage. If you cannot tolerate downtime to replace failed drives, you should consider only hardware-based RAID systems.

Obviously, hardware-based RAID is more expensive than software-based systems. If you have the budget to buy a hardware solution, however, then you ought to do it. The improved performance and generally higher reliability are well worth the investment.

Mixing and matching
It is also possible to have the best of both worlds by mixing hardware and software RAID systems. In this configuration, you use a hardware RAID system to present redundant devices to your host system, and then use a software solution on the system to further configure these devices for additional redundancy or performance.

For example, you might attach two hardware RAID devices to your system using independent fibre channel connections. Each RAID device contains some number of hot-swappable disk drives, with a heavily cached controller managing all the devices. You configure the RAID systems to represent a number of RAID 0 devices to the server, relying on the cache to make the I/O fast and the hot-swappable drives to make the system manageable.

On the server, you then use a software RAID solution to marry pairs of logical drives presented by the RAID systems to create mirrors. Pure mirrors do not require a lot of server overhead -- certainly not as much as, say, software-based RAID 5. What you wind up with is a RAID 0+1 configuration, with the 0 handled in hardware and the 1 handled in software.

Why go to all this trouble? What you've created is a mirrored high-performance disk system that can not only tolerate the loss of a drive at the hardware RAID level, but can survive the loss of an entire RAID unit at the server level. The need for this is not as far-fetched as it sounds: while the RAID unit is probably not going to burst into flames, it is entirely possible that someone could kick a cable out of the wall or that a power problem might take down one of the units. If you cabled the RAID units into different controllers on different boards in your backplane, you can even tolerate the loss of a controller board without taking your server down.

Such a solution is obviously expensive, but if uptime is critical to your installation, it may be well worth the cost. The lesson is that there is no one right solution to any storage problem, and that an intelligent combination of hardware and software systems is often the right thing for your site.

Take some time to write out in pencil the right configuration for your site. Next month, we'll close our series by addressing the last piece of the puzzle: how to connect all these devices to your systems without bottlenecking on all that I/O.




About the author
Chuck Musciano started out as a compiler writer on a mainframe system before shifting to an R&D job in a Unix environment. He combined both environments when he helped build Harris Corporation's Corporate Unix Data Center, a mainframe-class computing environment running all-Unix systems. Along the way, he spent a lot of time tinkering on the Internet and authored the best-selling HTML: The Definitive Guide. He is now the chief information officer for the American Kennel Club in Raleigh, NC, actively engaged in moving the AKC from its existing mainframe systems to an all-Unix computing environment.



[(c) Copyright Web Publishing Inc., an IDG Communications company]
