Unix tape services: Get on track
Can you provide mainframe-style tape management in the Unix world?
Tape services for Unix can't quite compare to those in the mainframe world, but that doesn't mean Unix tape management isn't possible. Chuck covers all the bases, from the software tools -- backup systems and tape management systems -- to the hardware options, even helping you to calculate the number of tape drives you'll need. (2,300 words)
It seems that another great tradition is dying, falling prey to the inexorable advance of technology. While it may seem trivial to some, I openly mourn the passing of one of the great visual clichés of the cinema, used to represent computing in every movie made until the early '90s. I am speaking, of course, of the 9-track tape drive.
There is no other visual effect that matches the allure of the 9-track tape drive. As the reels whirl and the tape floats up and down in the vacuum columns, every viewer knows big problems are being solved, numbers are being crunched, and progress is being made. As scientists pace anxiously in white lab coats and teletypes chatter, nothing builds dramatic tension more than the sight of those spinning tapes.
Movies today offer meager replacements for this visual device, the most annoying being the bizarre video display that emits typing sounds as characters are displayed. This is a poor substitute for those magnificent drives, the ultimate symbol of mainframe computing and big iron. Just a glimpse of a tape drive is enough to let you know that the job, whatever it may be, is well in hand.
Unix on the big screen
When those drives appear on the silver screen, it's a sure bet that no one in the audience is suddenly convinced that a Solaris system is on the job. While mainframes and tapes are forever cemented together, tapes are probably the last thing that comes to mind when you think of Unix systems. And for good reason: Unix tape services are simply lousy.
Don't misunderstand. Unix supports tapes -- all sorts of formats and drives. On most systems, any application can open the tape device and read and write data all day long. Users can use utilities like mt to rewind, position, and mark tapes. Backups are (or should be) written to tape every day.
Regrettably, we've just covered all the tape services you get with Unix. While fully functional, tapes under Unix are barely usable. There is no native support for tape management, cataloging, archiving, duplication, rotation, or mounting. Everything is a manual operation, and support for large tape libraries is nonexistent.
Mainframe tape services
Tape services in the mainframe world are an entirely different story. From their earliest days, mainframes have had to deal with volumes of data that grossly exceeded the capacity of their expensive, small disk drives. The development of tape technology in the mainframe world wasn't driven by a need for backups or disaster recovery. Instead, it was required just to hold all the data accessed by the system on a daily basis. The focus was not only on reading and writing tapes quickly, but on cataloging and storing those tapes simply and easily. Typical data centers had thousands of tapes hanging in racks, each serialized and retrievable by an operator in a matter of minutes.
Mainframe operating systems allow users to assign a unique number to a tape (the volser, or volume serial number). This volser, along with information about the user and access privileges for the tape itself, is maintained in a central system catalog. On many systems, tapes and disk volumes are stored in the same catalog and can be used by applications almost interchangeably. A request for a certain volser might mount a tape or a disk filesystem, with the application none the wiser.
Once created, a tape might reside in the data center indefinitely. Depending on user specifications, the tape could be rotated off site after a certain period of time, where it might eventually be scratched and made available for reuse. It's a trivial operation to create a tape and indicate that it be retained for seven years (complying with an IRS regulation, for example). Years later, a request for that volser will mount the desired tape, with the data intact and ready for use. A similar request in most Unix shops results in frantic phone calls to the operator, followed by hopeless rummaging through boxes of old tapes kept in a spare closet.
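To make the catalog idea concrete, here is a minimal sketch in Python of a volser-keyed catalog with retention periods. It is not how any mainframe or commercial product implements its catalog; the class names, fields, and the day-based retention rule are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Volume:
    """One cataloged tape (hypothetical record layout)."""
    volser: str          # volume serial number, unique per tape
    owner: str           # user who created the tape
    created: date        # date the tape was written
    retain_days: int     # how long the data must be kept
    location: str = "library"  # e.g. "library" or "offsite"

    def expires(self) -> date:
        # Date after which the tape may be scratched and reused.
        return self.created + timedelta(days=self.retain_days)


class TapeCatalog:
    """Central catalog keyed by volser."""

    def __init__(self):
        self._volumes = {}

    def register(self, volume: Volume) -> None:
        # Refuse duplicate volsers so a tape is never cataloged twice.
        if volume.volser in self._volumes:
            raise ValueError(f"volser {volume.volser} already cataloged")
        self._volumes[volume.volser] = volume

    def lookup(self, volser: str) -> Volume:
        # A mount request starts by finding the volume record.
        return self._volumes[volser]

    def scratch_candidates(self, today: date) -> list:
        # Volsers whose retention period has run out as of 'today'.
        return sorted(
            v.volser for v in self._volumes.values() if v.expires() <= today
        )


catalog = TapeCatalog()
catalog.register(Volume("A00001", "payroll", date(1998, 1, 15), retain_days=7 * 365))
catalog.register(Volume("A00002", "scratchpad", date(1998, 1, 15), retain_days=30))
print(catalog.scratch_candidates(date(1998, 6, 1)))
```

With these sample entries, only the 30-day tape is eligible for reuse by June; the seven-year tape stays cataloged and can still be found by its volser.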
Because tape operations have always been an integral part of mainframe computing, mainframe discipline surrounds and supports such services in traditional data centers. It's possible to provide mainframe-like tape services within Unix, and it starts with that same discipline. While implementing tape services in Unix also requires a serious commitment to high-end hardware and software tools, your efforts will never succeed if you don't first commit to providing robust, manageable tape services that meet your users' needs.
If your Unix environment supports users who have never used mainframe tape services, it's most likely that their needs will center around disaster recovery and backup-and-restore services. In these cases, you'll need to focus on an integrated backup and tape management system that can provide easy, automatic data backup.
If you have users who started on a mainframe and migrated to Unix, however, you have a much bigger problem on your hands. These users will want to spool data to tape on a regular basis. They'll want to create tapes in specific formats and ship them to other data centers. They'll want off-site storage and a variety of retention schemes. And they'll still want strong disaster recovery and backup-and-restore services.
First, the software solutions
All those mainframe tape services weren't bundled into the operating system from day one. Like everything else, they were integrated into the system as third-party packages that grew and evolved over the years.
Tape support for Unix systems is available in two different kinds of tools: backup systems and tape management systems. Backup systems focus on creating and managing system backups, and integrate tape support as part of an overall backup package. If your user base only needs backup services, these packages may work for you. Tape management systems exist only to manage, catalog, and support tape services. If you need to support more traditional, general-purpose tape usage, a complete tape management package may be a better choice. Remember that these two products aren't mutually exclusive. You'll always need a backup system; you may want to add extra tape support as your users' needs expand over time.
Selecting a backup package is a complete discussion in itself (which we'll have sometime next year), so we'll focus on the tape aspects of the backup system. A good tool, such as Veritas NetBackup, supports all sorts of tape options, including variable tape retention periods, multiple tape pools from which to select tapes, automatic tape duplication, and automatic management of off-site tape rotation.
The purpose of your backup system is to completely automate backups. If you install a system that requires operator intervention for even the simplest backups, the element of human error will compromise your backup plans. Selecting a tool that can work with a variety of tape systems and automate almost all tape-related tasks frees your operators to do more important work while backups hum along in the background. In general, you should expect your operators to bulk-load tapes into your tape units, and have the hardware and software do the rest of the work, including figuring out which tape gets mounted when.
A pure tape management tool goes beyond backup support to extend tape services to general users. Using a media management tool such as Veritas Media Librarian or IBM's ADSM, users can request tape mounts, manage catalogs, and perform tape-related activities without involving your operators. These tools extend beyond simple tape management and often allow management of optical drives and other removable media as well.
For these tools, the goal is flexibility and usability across a wide range of platforms. You'll want both command line and graphical interfaces to the tool, so tapes can be accessed from scripts and scheduled jobs as well as from a user's desktop. By supporting a broad range of systems, you can amortize your investment in these tools across more systems, making the expense easier to justify when you acquire the tools.
Regardless of which tool you choose, offering improved tape services will require discipline, training, and above all, close interaction with your users to make sure they get the services they need and know how to use them. Nothing is more disastrous than having a user create a tape only to discover years later that it was lost, cataloged incorrectly, or is otherwise inaccessible. These tools only offer the basic layer of services. It's up to you to provide the discipline to make them work effectively, and to marry them with appropriate hardware that can handle your users' data capacities and throughput demands.
On to the hardware solutions
The first step in providing better tape hardware is to replace that aging QIC drive you've got hanging off the back of your box. Technically, you can offer tape services using a single drive and manual tape management, but you'll drive your operators crazy storing, mounting, and cataloging tapes. Your only solution is to buy a tape library.
Tape libraries come in all shapes and sizes, from little five-tape stackers with a single drive to gigantic StorageTek silos holding thousands of tapes and more than a dozen tape drives. Needless to say, you can spend a lot of money on these units, and you'll want to do your homework before investing in a unit. Keep in mind that your tape library will be around longer than most of your systems, so make sure you buy a unit with a lot of expansion and growth capability.
The two principal measures of a tape unit are the number of tapes stored in the unit and the number of drives supported by the unit. Often, this is a tradeoff, with modular units within the library chassis holding either tape storage slots or tape drive bays. You configure the unit with the slots and bays you need to handle your data volume and tape throughput.
While library vendors are more than happy to analyze your needs and recommend a solution, a little back-of-the-envelope arithmetic can quickly approximate a solution. If you have a 400-gigabyte (GB) database that is fully backed up once a week, with daily incremental backups of 40 GB each, a single week of backups will consume 640 GB of storage. If you want to retain four weeks of backups online, you'll need 2.56 terabytes of tape storage available. Before you begin hyperventilating, consider that a single DLT IV tape cartridge holds 35 GB of uncompressed data. Assuming no compression, you'll need 74 cartridges to hold all that data. To leave some headroom for spare tapes, cleaning cartridges, and a bit of growth, a 100-tape library should do the job nicely.
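The same back-of-the-envelope arithmetic can be sketched in a few lines of Python. The function name and parameters are made up for illustration, and the figure of six incremental backups per week is an assumption inferred from the 640 GB weekly total.

```python
import math


def cartridges_needed(full_gb, incremental_gb, incrementals_per_week,
                      weeks_retained, cartridge_gb):
    """Return (weekly GB, total GB retained, cartridges required)."""
    # One full backup plus the week's incrementals.
    weekly_gb = full_gb + incremental_gb * incrementals_per_week
    # Everything that must stay online at once.
    total_gb = weekly_gb * weeks_retained
    # Round up: a partly filled cartridge still occupies a slot.
    cartridges = math.ceil(total_gb / cartridge_gb)
    return weekly_gb, total_gb, cartridges


# 400 GB full, six 40 GB incrementals, four weeks retained, 35 GB per cartridge.
weekly, total, carts = cartridges_needed(400, 40, 6, 4, 35)
print(weekly, total, carts)
```

Changing the retention period or the cartridge capacity shows quickly how sensitive the slot count is to each assumption.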
To determine the number of tape drives you'll need, compute the maximum bandwidth needed to move all your data into the library. If you need to back up that 400 GB of data in two hours, you'll need to move 200 GB per hour, or about 55 megabytes (MB) per second. Because a single DLT 7000 drive streams at 4.5 MB per second, you'll need at least 13 drives running in parallel! Because this is hardly practical, you'll need to either grow that backup window or shift to different drive technology. Shifting to a StorageTek Redwood tape drive, for example, increases throughput to 11 MB per second for each drive and requires only six drives for your backup. Or, growing the backup window to six hours reduces the required throughput to 18.5 MB per second, which can be handled by just five DLT 7000 drives. Other solutions might include switching to hot database backups with longer backup windows, or using extra disk space to make a mirror of the database that can be copied to tape all day long using just a single drive.
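Here is a rough Python sketch of the drive-count calculation, assuming decimal units (1 GB = 1,000 MB) and drives that stream at their rated speed for the whole window. The function names are hypothetical. Note that at 4.5 MB per second per drive, a six-hour window works out to slightly more than four drives' worth of bandwidth, so the sketch rounds up to five.

```python
import math


def required_mb_per_sec(data_gb, window_hours):
    """Sustained rate needed to move data_gb within the backup window."""
    return data_gb * 1000 / (window_hours * 3600)


def drives_needed(data_gb, window_hours, drive_mb_per_sec):
    """Smallest whole number of drives that meets the required rate."""
    rate = required_mb_per_sec(data_gb, window_hours)
    return math.ceil(rate / drive_mb_per_sec)


# Each tuple: (label, window in hours, per-drive MB per second)
for label, hours, speed in [
    ("DLT 7000, 2-hour window", 2, 4.5),
    ("Redwood, 2-hour window", 2, 11),
    ("DLT 7000, 6-hour window", 6, 4.5),
]:
    print(label, drives_needed(400, hours, speed))
```

Real drives rarely stream at rated speed for an entire backup, so treat these counts as a lower bound.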
Once you've done the analysis and made the tradeoffs, you need to look for a unit that supports what you need now and provides for future growth. Many of the larger units can be connected together, allowing you to add capacity at a later date without losing your original investment. Most units also use standard drive form factors, allowing higher-density, faster tape drives to be installed at a later date. Make sure you spend some time looking at the internal cabling of the unit, ensuring that multiple drives aren't sharing a single SCSI bus in a way that might drop overall throughput below your requirements. Finally, choose an interconnection strategy (usually SCSI, Ethernet, or, soon, Fibre Channel) that allows you to connect all your systems easily.
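A quick way to sanity-check the internal cabling is to compare the combined streaming rate of the drives on a bus with what the bus can carry. The Python sketch below assumes a nominal 20 MB per second for a Fast Wide SCSI bus and an 80 percent usable fraction; that fraction is a hypothetical rule of thumb, not a measured figure, and the function name is made up for illustration.

```python
import math


def max_drives_per_bus(bus_mb_per_sec, drive_mb_per_sec, usable_fraction=0.8):
    """How many drives can stream at full speed on one shared bus."""
    # Only part of the nominal bus bandwidth is available for data.
    usable = bus_mb_per_sec * usable_fraction
    # Round down: a drive that can't get its full rate stops streaming.
    return math.floor(usable / drive_mb_per_sec)


FAST_WIDE_SCSI = 20  # nominal MB per second (assumed bus type)

print(max_drives_per_bus(FAST_WIDE_SCSI, 4.5))  # drives rated at 4.5 MB/s
print(max_drives_per_bus(FAST_WIDE_SCSI, 11))   # drives rated at 11 MB/s
```

If a library chains more drives onto one bus than this estimate allows, the drives will compete for bandwidth and the backup window will stretch accordingly.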
One last bit of advice
From a pure enjoyment perspective, make sure you buy a unit with visible moving parts and at least a few displays or flashing lights. Tape libraries are the last remnants of computing equipment that's fun to watch. Like those old 9-track tape drives, it may be that robotic tape libraries will become the next cinematic metaphor for big computers (though watching a robot zip around, grabbing tapes and ejecting cartridges is probably more excitement than most sysadmins can handle). Nevertheless, don't miss out on your share of the fun when you invest in mainframe-class tape support for your Unix systems.
About the author
Chuck Musciano has been running various Web sites, including the HTML Guru Home Page, since early 1994. He serves up HTML tips and tricks to hundreds of thousands of visitors each month. He's been a beta tester and contributor to the NCSA httpd project and speaks regularly on the Internet, World Wide Web, and related topics. Chuck was formerly SunWorld's Webmaster columnist and is currently CIO at the American Kennel Club. Reach Chuck at firstname.lastname@example.org.