Storage beyond RAID
An introduction to network-attached storage and storage area networks
It seems that RAID isn't enough to satisfy the growing corporate need for storage accessibility and reliability. RAID, which began in SCSI-based systems, is now being used with other disk protocols such as Fibre Channel, breaking the limitations SCSI imposes on both the number of physical drive units and the distance separating them. Furthermore, vendors have found ways to take storage directly to the network with systems like network-attached storage and storage area networks. Rawn follows up last month's column about RAID storage technology with this look at some of the newer technologies and storage systems available for your network. (2,300 words)
The two current darlings of enterprise storage management are most widely known by their acronyms. SAN stands for storage area network, a scheme in which several storage systems are connected together into a single network. Private channels are then linked directly to bus adapters or storage controllers on hosts that need to access them. NAS stands for network-attached storage, in which each storage system is directly connected to the LAN the hosts are on, and its drives are exported using NFS or CIFS. Each technology is implemented very differently. The SAN storage system requires each connecting host to have a fairly expensive host bus adapter with Fibre Channel (FC) or SCSI; however, this provides high-bandwidth links directly to the storage system. In NAS, the host need only have an Ethernet card, which can communicate with the NAS unit using standard network protocols such as TCP/IP. The similarities and differences between the two technologies are summarized in Table 1.
Table 1. SAN and NAS compared

| SAN | NAS |
| --- | --- |
| Large storage capacities (in the terabyte range) | Large storage capacities (in the terabyte range) |
| Storage system may have internal RAID | Storage system may have internal RAID |
| Supports hot-swap disks | Supports hot-swap disks |
| Provides access to multiple servers | Provides access to multiple servers |
| Can be a hierarchical storage system with tapes, optical drives, etc. | Can be a hierarchical storage system with tapes, optical drives, etc. |
| Can have hubs, switches, and routers between storage system and host connection (with SCSI or FC) | Can have hubs, switches, and routers between storage system and host connection (with Ethernet, WAN lines, etc.) |
| Can provide failover at storage-system level | Can provide failover at storage-system level |
| Can provide failover at switch/router level with some networks | Can provide failover at host bus adapter level with some cards |
| Can provide high-bandwidth connections (up to 2 Gbps -- duplex Fibre Channel) | Can provide high-bandwidth connections (up to 1 Gbps -- Gigabit Ethernet) |
| Each connection is dedicated per host (a full 2 Gbps to each host) | Each connection is not normally dedicated per host (1 Gbps is shared between hosts) |
| Requires host bus adapter card for connection (expensive) | Requires LAN network card for connection (cheaper) |
| Host bus adapter is intelligent, and has large on-board cache (smarter and faster processing) | LAN adapter is simple and usually has no significant on-board RAM (not so smart) |
| Low-overhead storage system communications protocol (SCSI or FC) | High-overhead network and application protocols (NFS or CIFS over UDP or TCP over IP) |
| Always on separate storage network (no unnecessary network traffic) | Can be on same network as hosts or on separate private network (can have unnecessary network traffic) |
| Does not really have an internal operating system (less overhead) | Has its own internal operating system (more overhead but possibly smarter operation) |
| Does not have its own internal filesystem (filesystem is host-dependent) | Has its own internal filesystem (filesystem is NAS-dependent) |
| Software security dependent upon host | Internal software and system security |
| High overhead to connect additional servers | Low overhead to connect additional servers |
You can see from Table 1 that each technology has merit and can be used for different purposes. When you need fast online storage such as that required by digital video services, large databases, search engines, busy FTP sites, and the like, a SAN model may be appropriate. When you need mass storage but cost is more important than speed, NAS is probably the better choice. SANs also require more management because they employ a separate network cabling system (two networks are harder to manage than one).
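The bandwidth tradeoff from Table 1 is easy to quantify: each SAN host gets its own dedicated link, while NAS hosts divide one shared pipe. A back-of-envelope sketch using the Table 1 figures (the host counts are hypothetical, chosen only for illustration):

```python
# Per-host bandwidth: dedicated SAN links vs a shared NAS pipe (Table 1 figures)
san_link_gbps = 2.0      # duplex Fibre Channel, dedicated to each host
nas_pipe_gbps = 1.0      # Gigabit Ethernet, shared by all attached hosts

for hosts in (1, 4, 16):
    nas_share = nas_pipe_gbps / hosts
    print(f"{hosts:2d} hosts: SAN {san_link_gbps} Gbps each, "
          f"NAS about {nas_share:.3f} Gbps each")
```

The crossover is obvious: the SAN figure stays constant as servers are added, while the per-host NAS figure shrinks, which is why the article steers bandwidth-hungry workloads toward SAN.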
Vendors such as EMC, Compaq, IBM, Data General/Clariion, Sun, StorageTek, and many others are working on SAN products from host bus adapters to storage systems. On the NAS front, two vendors are most prominent: Network Appliance and Auspex. The multibillion dollar storage market has sufficient room for both product types, as evidenced by the numerous startup companies working on components and products for both technologies.
Fibre Channel surfing
Fibre Channel (FC) must be one of the most interesting developments in the storage market. It provides a new connectivity, signaling, and framing protocol system that can be deployed as a network. Although FC isn't particular to storage networks, it has been received with open arms in that arena. FC itself is a communications protocol like Ethernet, Token-Ring, FDDI, or ATM. It can be used to deliver any sort of packet from SCSI to IP. Did I just say SCSI over FC? Yes, one storage access protocol over another. If you need to understand the differences between SCSI and FC, Ron Levine's feature article, "Fibre Channel vs. SCSI: Which is more advantageous for your storage area network?" offers an excellent overview (see Resources).
When most people talk about FC, however, they really mean Fibre Channel Arbitrated Loop (FC-AL), a particular wiring scheme for FC that uses optical fiber cabling. FC-AL is a loop network just like Token-Ring and FDDI. It provides bandwidth of up to 100 MBps (a 1.0625-Gbps signaling rate) in half-duplex mode, or twice that in full-duplex mode. In terms of raw speed, that matches Gigabit Ethernet. The 1-Gbps rate is limited only by current technology; next, we'll see 4-Gbps FC networks.
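The two rates quoted for FC-AL can be reconciled with a little arithmetic: the loop signals at 1.0625 Gbaud, but its 8b/10b line coding carries only 8 data bits in every 10 line bits, which is where the nominal 100-MBps payload figure comes from. A quick check (the constants are the standard FC-AL figures):

```python
# Reconcile FC-AL's signaling rate with its quoted payload bandwidth
signal_rate_gbaud = 1.0625      # raw line rate, billions of symbols per second
efficiency = 8 / 10             # 8b/10b coding: 8 data bits per 10 line bits

payload_gbps = signal_rate_gbaud * efficiency   # usable data bits per second
payload_mbps = payload_gbps * 1000 / 8          # megabytes per second

print(f"Payload: {payload_gbps:.3f} Gbps = {payload_mbps:.2f} MBps")
# Payload: 0.850 Gbps = 106.25 MBps -- the nominal "100 MBps" FC-AL rate
```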
Like Ethernet, FC has hubs and switches. An FC loop can have up to 127 separate addresses for devices. Compared to SCSI's 7-device limit, this makes a lot of sense and saves dollars when it comes to the multidisk enclosures of storage systems. Theoretically, it's possible to network up to three levels of addresses, which gives you a maximum device limit of around 16 million. FC loops can connect with two forms of optical cabling: multimode fiber (50- or 62.5-micron core diameter) or single-mode fiber (9-micron core). Similar to the cabling used for ATM networks, the multimode cables allow a maximum device distance of up to 1 km, while single-mode can go up to 10 km without any signal repeaters.
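The device-count figures fall out of FC's three-byte address: one 8-bit level covers a single loop (127 usable loop addresses), and the full 24-bit address space is where the "around 16 million" ceiling comes from:

```python
# FC addressing: three 8-bit levels form a 24-bit address
loop_addresses = 127            # usable device addresses on one arbitrated loop
address_levels = 3
full_address_space = 2 ** (8 * address_levels)

print(loop_addresses, full_address_space)   # 127 16777216
```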
Although FC is fast becoming de rigueur for high-end storage systems, it's also being considered for creating clusters of servers. The devices themselves are already available; vendors simply need to develop the device drivers and OS software to create FC-based clusters. Some of these technologies, such as Intel's NGIO (Next Generation Input/Output) and the Compaq/HP/IBM-led Future I/O architecture for PC systems, can use FC for clustering as well as for storage connections.
The next step in storage system evolution involves the concept of extended data availability and protection (EDAP). Created by the RAID Advisory Board in 1997, EDAP introduces a classification system for the resilience of the entire storage system and not just disk-based storage. The participating vendors decided that they needed to agree upon a common description for their storage systems which they could use in their literature (mostly in brochures, proposals, and articles like this one).
EDAP defines the properties of disk systems and controllers according to the level of reliability they can provide. There are two sets of definitions, one for disk systems and the other for array controllers, but the definition for each is the same; it's just the context and the acronyms that differ. Table 2 lists these acronyms and their associated reliability-level definitions.
Table 2. EDAP classification criteria

Failure resistant disk system (FRDS):
1. Protection against data loss and loss of access to data due to disk failure
2. Reconstruction of failed disk contents to a replacement disk
3. Protection against data loss due to a "write hole"
4. Protection against data loss due to host and host I/O bus failures
5. Protection against data loss due to component failure
6. FRU monitoring and failure indication

Failure resistant disk system plus (FRDS+): Items 1 to 6, and
7. Disk hot swap
8. Protection against data loss due to cache component failure
9. Protection against data loss due to external power failure
10. Protection against data loss due to a temperature-out-of-operating-range condition
11. Component and environmental failure warning

Failure tolerant disk system (FTDS): Items 1 to 11, and
12. Protection against loss of access to data due to device channel failure
13. Protection against loss of access to data due to controller failure

Failure tolerant disk system plus (FTDS+): Items 1 to 13, and
14. Protection against loss of access to data due to host and host I/O bus failures
15. Protection against loss of access to data due to external power failure
16. Protection against loss of data access due to FRU replacement
17. Disk hot spare

Failure tolerant disk system plus plus (FTDS++): Items 1 to 17, and
18. Protection against data loss and loss of access to data due to multiple disk failures in an FTDS+

Disaster tolerant disk system (DTDS): Items 1 to 16, and
19. Protection against loss of data access due to zone failure (at most 1 km distance between zones)

Disaster tolerant disk system plus (DTDS+): Items 1 to 15, and
20. Long-distance protection against loss of data access due to zone failure (at least 10 km distance between zones)
There are several terms here that need to be explained. First of all, a write hole is an error condition that can arise when a fault elsewhere in the system, such as a brownout or a physical disconnection, interrupts a write operation in progress. Since a write across multiple drives requires careful coordination, a disruption partway through leaves the data inconsistent. The storage system should be able to recover from the disruption and complete the write operation properly.
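The hazard is easiest to see with RAID-style XOR parity: if a data block is rewritten but the system fails before the matching parity update, the stripe can no longer correctly reconstruct a lost disk. A toy sketch (pure Python, with made-up block values; real controllers work on whole sectors):

```python
from functools import reduce

def parity(blocks):
    """XOR parity across the data blocks of one stripe."""
    return reduce(lambda a, b: a ^ b, blocks)

stripe = [0b1010, 0b0110, 0b1100]   # data blocks on three disks
p = parity(stripe)                  # parity disk holds the XOR of all data

# Interrupted write: block 0 is updated, but the failure happens
# before the parity disk is updated -- the "write hole".
stripe[0] = 0b0001
assert parity(stripe) != p          # stripe is now internally inconsistent

# If disk 1 then fails, rebuilding it from the stale parity
# silently returns the wrong data:
reconstructed = p ^ stripe[0] ^ stripe[2]
print(reconstructed == 0b0110)      # False -- the rebuilt block is corrupt
```

This is why the EDAP criteria call out write-hole protection explicitly: the controller must log or replay the interrupted update so data and parity are brought back into agreement.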
An FRU is a field replaceable unit. This can be a single drive, a power supply, a cooling fan, the little magic disk elf, etc. Practically any physical unit that can be replaced as a whole by a system administrator without involving overly complicated removal procedures, or software magic, is an FRU.
Finally, zone is a term used when you have wide area or non-local distribution of your disk system. In such a case, you may have a storage system in one room in one building, and a second one down the street or even across the country. The catch is that on the software side and from a server point of view, they have to be part of the same storage system or network. Zones come into play in storage area networks when you have huge storage systems for your servers and mainframes that need redundancy or separation for technical, security, or business reasons.
There are basically three main classes, as follows:
Failure resistant -- FR systems protect against the loss of data caused by the failure of a component such as a disk, although access to the data may be interrupted while the failure is repaired
Failure tolerant -- FT systems will maintain the consistency of the data even when a failure has occurred and will continue to be operational while the failure is being fixed
Disaster tolerant -- DT systems will continue to be operational even if all the devices at one particular physical location have failed, since these devices are backed up with an identical storage system at another physical location
You can probably guess which of the above is the most expensive and which is the least.
The acronyms for EDAP can get confusing, so just keep in mind that in general DT > FT > FR, and the plusses simply mean "greater protection." Hence, the EDAP classes in Table 2 can be thought of as
DTDS+ > DTDS > FTDS++ > FTDS+ > FTDS > FRDS+ > FRDS (for disk systems); and DTAC+ > DTAC > FTAC++ > FTAC+ > FTAC > FRAC+ > FRAC (for array controllers).
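That ranking can be sketched as a small ordered enumeration (a hypothetical helper for illustration only, not part of any EDAP tooling; the class names come from Table 2):

```python
from enum import IntEnum

class DiskSystemClass(IntEnum):
    """EDAP disk-system classes, ranked least to most protective."""
    FRDS = 1            # failure resistant
    FRDS_PLUS = 2
    FTDS = 3            # failure tolerant
    FTDS_PLUS = 4
    FTDS_PLUS_PLUS = 5
    DTDS = 6            # disaster tolerant
    DTDS_PLUS = 7

# IntEnum members compare by rank, so "DT > FT > FR" holds directly:
assert DiskSystemClass.DTDS_PLUS > DiskSystemClass.FTDS > DiskSystemClass.FRDS
print(max(DiskSystemClass).name)    # DTDS_PLUS
```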
EDAP takes over the role the different levels in RAID were supposed to play. The classification system is still brand new and we'll begin seeing high-end storage vendors provide this level of support in the coming months. EDAP should transform storage networking, both SAN and NAS, as we know it, providing reliability once only available with mainframe storage systems.
Looking several years down the road, there are even more possibilities for storage systems. The Future I/O Group, which is looking at creating a new peripheral interconnect system to replace the PCI buses in today's servers, may blur the distinction between SAN and NAS. The group is proposing the use of the Internet Protocol version 6 (IPv6) on top of hardware protocols like Ethernet and Fibre Channel as the communications system between the controller and storage units. The merits of this method are debatable, but it nevertheless takes an evolutionary step in the general system architecture of computers.
About the author
Rawn Shah is an independent consultant based in Tucson, AZ. He has written for years on the topic of Unix-to-PC connectivity and has watched many of today's existing systems come into being. He has worked as a system and network administrator in heterogeneous computing environments since 1990.