The job scheduling juggle
Job scheduling has made the leap from the mainframe to Unix. We list the features and criteria you should consider in choosing your Unix job scheduling tools.
One of the most important tools of the mainframe world, production-quality job scheduling, has arrived for Unix -- bigger and better than it ever was. Chuck compares native Unix job scheduling with mainframe scheduling and looks at just how far Unix job scheduling tools have come. (2,600 words)
Job scheduling is one of the premier services in the mainframe environment. Over the years, mainframe job schedulers have evolved into highly sophisticated tools that can create, track, and synchronize thousands of jobs simultaneously. Scheduling is also among the most important components of any production computing environment, and no mainframe-class Unix shop should be without production job scheduling tools.
Regrettably, native Unix job scheduling is pathetic. Fortunately, several companies have created job scheduling tools for Unix that meet and exceed the standard set by their mainframe cousins. We'll explore what Unix offers, why it's inadequate, how mainframe systems meet the challenge of job scheduling, and what you can expect from a scheduling tool for Unix.
Native Unix scheduling
Unix admins are quick to point out that Unix offers cron, which handles the basics of simple job scheduling. Mainframe folks, on the other hand, fall over laughing when anyone suggests that cron provides adequate scheduling capabilities. And for good reason.
Cron offers exactly one feature: the ability to start a Unix command at a specific time, based upon an adequate set of time and date matching criteria. If you have simple job scheduling requirements, such as kicking off a backup every Sunday at 2 a.m., or cleaning out /tmp each night at midnight, cron will do the trick. If you need any capabilities beyond this, you're out of luck.
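To make that one feature concrete, here is a minimal Python sketch of how a five-field crontab time specification (minute, hour, day of month, month, day of week) can be matched against a moment in time. The function names are hypothetical, and the sketch handles only "*", single numbers, comma lists, and ranges; step values and month or day names are left out. It does follow the usual cron rule that when both day fields are restricted, a match on either one is enough.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Return True if one crontab field ("*", "5", "1-5", "0,30") matches value."""
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if the five-field expression selects the minute given by `when`."""
    minute, hour, dom, month, dow = expr.split()
    # cron numbers Sunday as 0; Python's weekday() numbers Monday as 0.
    cron_dow = (when.weekday() + 1) % 7
    time_ok = (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(month, when.month)
    )
    dom_ok = field_matches(dom, when.day)
    dow_ok = field_matches(dow, cron_dow)
    if dom != "*" and dow != "*":
        day_ok = dom_ok or dow_ok  # both restricted: either one may match
    else:
        day_ok = dom_ok and dow_ok
    return time_ok and day_ok


if __name__ == "__main__":
    # The Sunday 2 a.m. backup and the nightly midnight /tmp cleanup from the text.
    print(cron_matches("0 2 * * 0", datetime(2024, 1, 7, 2, 0)))
    print(cron_matches("0 0 * * *", datetime(2024, 1, 8, 0, 0)))
```

Nothing in this matching logic knows anything about what a job did once it started.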
Cron's biggest problem is that it cannot correlate the execution of one job with the results of another. If your backup job fails, cron doesn't know that it should suspend the job that updates your tape catalogs or deletes yesterday's old files. If the backup finishes early, cron can't move up other jobs that you want run upon completion of the backup. Cron can't start jobs that aren't time-dependent, making it impossible to create a job that runs when a file disappears or when a system resource hits a certain threshold.
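The kind of correlation cron lacks can be sketched in a few lines of Python. This is only an illustration under simple assumptions: jobs run strictly in order, a nonzero exit status means failure, and every job after a failure is skipped. The job names and the OK and FAIL stand-in commands are hypothetical.

```python
import subprocess
import sys

# Stand-in commands for illustration: one exits with status 0, one with status 1.
OK = [sys.executable, "-c", "pass"]
FAIL = [sys.executable, "-c", "raise SystemExit(1)"]


def run_chain(jobs):
    """Run (name, command) pairs in order and skip everything after a failure.

    Returns a list of (name, status) pairs, where status is "ok", "failed",
    or "skipped".
    """
    results = []
    failed = False
    for name, command in jobs:
        if failed:
            results.append((name, "skipped"))
            continue
        completed = subprocess.run(command)
        if completed.returncode == 0:
            results.append((name, "ok"))
        else:
            results.append((name, "failed"))
            failed = True
    return results


if __name__ == "__main__":
    nightly = [
        ("backup", FAIL),
        ("update-tape-catalog", OK),
        ("purge-old-files", OK),
    ]
    for name, status in run_chain(nightly):
        print(name, status)
```

Because each job starts as soon as its predecessor finishes, a backup that ends early automatically moves up the jobs behind it.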
In short, cron is pretty much a fancy alarm clock, waking up at preset times to run a job. It has no real notion of job failure: at best, it mails any output a job produces to the owner of the crontab, and for production jobs that is usually the superuser, not your personal mailbox. There is no way to tell cron to rerun a failed job later, or to run a recovery job automatically when some other job has failed.
The cron configuration file (crontab) is a pain to maintain, and making even minor changes to a job's start time can be error-prone. There are no layered tools to make job creation easier, although the at command is useful for one-shot jobs. All told, cron is simplistic and tedious, and it lacks critical features.
To correct this, a lot of people roll their own job management systems. They use cron to kick off a job controller and create scripts that detect failure conditions, initiate other jobs, and provide some modicum of checkpoint/restart capabilities. While these solutions often work adequately for small job streams, they rarely scale to handle the job loads of a typical mainframe. They also lack sophisticated user interfaces and reporting tools that allow you to keep audit trails of your job streams.
Even worse, a home-grown job scheduler quickly turns into a full-time programming job. As you increase your dependence on the tool, you'll find yourself adding more and more features. The result is usually a hodgepodge of scripts, programs, and Unix utilities that only a few people actually understand. The thought of basing production job streams on this kind of solution should make any good admin cringe.
Mainframe job scheduling
Mainframe job scheduling is the complete opposite of native Unix job scheduling. Tools like CA-7 provide robust scheduling capabilities that can handle huge, complex job streams with ease. Mainframe system programmers are accustomed to high-quality job scheduling as a basic feature of their environment.
Mainframe schedulers provide the ability to group jobs into collections, treating the collection as a single entity whose execution, success, or failure can be tracked and used to trigger other jobs or collections of jobs. You can trigger jobs and job collections using time triggers or nontemporal criteria: the creation of a file, the mounting of a tape, or the shutdown of a database. The job scheduler is aware of almost all activity within the system and can respond accordingly. These capabilities make life much easier for the mainframe system programmer.
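A nontemporal trigger such as file creation can be approximated by polling, as in the Python sketch below. It is an assumption-laden illustration, not how any particular scheduler works: the function name, the polling interval, and the timeout are all hypothetical, and a production tool would more likely be notified by the system than poll for the file.

```python
import os
import time


def run_when_file_appears(path, action, timeout=5.0, poll_interval=0.1):
    """Call action() once `path` exists; give up after `timeout` seconds.

    Returns whatever action() returns, or None if the file never appeared.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return action()
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Hypothetical trigger file dropped by an upstream extract job.
    result = run_when_file_appears(
        "/tmp/extract.done", lambda: "load job started", timeout=1.0
    )
    print(result)
```

The same shape works for other conditions: swap the os.path.exists test for a check that a file has disappeared or that a resource has crossed a threshold.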
Using screen-oriented user interfaces, system operators can track the status of jobs, noting which are running long and which are completing. Using this interface, operators can suspend jobs, delay execution, restart jobs, and track schedule slippage. It's possible to alert an operator if a job exceeds a maximum run time, or if a job has failed to start due to unmet execution criteria. If a user decides that certain jobs should be cancelled one night, or shifted to a different execution window, one phone call to the operator is usually all that's needed to accomplish the change.
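The maximum run time check is easy to picture in code. The Python sketch below runs a command with a time limit and calls an alert function if the limit is exceeded. The function name and the alert callback are hypothetical, and the sketch makes one design choice the text does not require: it stops the overrunning job, whereas a real scheduler might simply alert the operator and let the job continue.

```python
import subprocess
import sys


def run_with_limit(command, max_seconds, alert):
    """Run command; if it exceeds max_seconds, stop it and call alert(message).

    Returns the job's exit status, or None if the job was stopped for
    running too long.
    """
    try:
        completed = subprocess.run(command, timeout=max_seconds)
    except subprocess.TimeoutExpired:
        alert(f"job exceeded {max_seconds}s and was stopped: {command}")
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for a runaway job: sleeps far longer than its one-second limit.
    runaway = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_limit(runaway, 1, print))
```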
The mainframe scheduler also has good reporting tools, creating execution logs and reporting job failure and success. Analyzing these reports over a period of time lets you see trends, such as accounting job streams that take longer and longer or backup jobs that begin to press against the limits of your backup windows. These reports are also a great line of defense against angry users who claim that their jobs don't run on time, or correctly, or even at all. You can also use these reports to find windows of idle time in your schedule, shifting jobs to periods of low usage to improve overall system performance.
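As a rough illustration of the trend analysis these reports make possible, the Python sketch below flags jobs whose recent runs are noticeably longer than their earlier ones. Everything here is an assumption made for the example: the function name, the 25 percent threshold, the split of each history into an older and a newer half, and the sample durations, which are invented rather than taken from any real log.

```python
from statistics import mean


def growing_jobs(history, threshold=1.25):
    """Return the names of jobs whose recent runs average more than
    `threshold` times their earlier runs.

    `history` maps a job name to its run durations in minutes, oldest first.
    Jobs with fewer than four recorded runs are ignored.
    """
    flagged = []
    for job, durations in history.items():
        if len(durations) < 4:
            continue
        half = len(durations) // 2
        early = mean(durations[:half])
        recent = mean(durations[half:])
        if recent > early * threshold:
            flagged.append(job)
    return sorted(flagged)


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = {
        "accounting": [60, 62, 75, 90],
        "backup": [120, 118, 121, 119],
    }
    print(growing_jobs(sample))
```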
Keep in mind that mainframe systems were not born with great job scheduling. These tools are third-party add-ons, and are not part of the base operating systems. Their features have evolved over many years, tuned and shaped by the demands of thousands of system managers. It's easy for mainframe people to be smug about these features, but the real old-timers remember when none of them existed.
Mainframe features in Unix
Fortunately, Unix folks have plenty to brag about, too, since everything that existed in the mainframe world has been ported over to Unix systems, along with a few great features that the mainframes never had. Companies like Platinum Technology, Computer Associates, Unison, and COSbatch have created advanced scheduling solutions for heterogeneous computing environments, letting you manage job schedules on your systems more easily than ever before.
What kind of scheduler is right for your environment? Here are some features and criteria you should consider when testing and comparing these products:
A good scheduler will support nontemporal job triggers: file creation, system alerts, etc. You should be able to constrain jobs to run within certain time windows, or within certain resource usage limits. Runaway jobs should be detected and an operator alerted. You must be able to suspend a job stream, slip a schedule to another time of day, and cancel a single instance of a job without affecting its overall schedule. There should be no limit to the number of jobs you can create, and the system should be as easy to use with 10 jobs as it is with 10,000.
In an interesting role reversal, the best Unix schedulers can handle all your MVS scheduling needs, triggering jobs on your mainframe while retaining schedule control on the Unix systems. More importantly, you can integrate schedules across your systems. The completion of a job on your NT server can trigger a job on the mainframe, which in turn might kick off a job on your Unix box.
Scheduling tools are expensive, and the ability to leverage a single investment across all of your systems makes a lot of sense. You'll also save money in terms of training and support. Once your operator knows how to use the scheduler, you can add new systems to your environment without having to retrain your operators.
As you begin to offer services to a broader range of users, you might find it helpful to deliver a "read only" monitor to your users, letting them observe their job streams without giving them the ability to change anything. This can cut down on "check-in" calls to your operators.
Finally, you'll want command-line and programmatic interfaces that allow applications to interact directly with your scheduler. This way, your custom tools can schedule jobs, check statuses, and manage job streams automatically. This is particularly important when you begin to integrate schedule management into your overall systems management strategy. For example, if your network management system detects a router failure, it might suspend certain data transfer jobs until the outage is repaired.
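The router example might look something like the Python sketch below. The Scheduler class is a hypothetical in-memory stand-in, not the interface of AutoSys or any other product, and the job names, router name, and "via:" tag convention are invented. The point is only the shape of the integration: the network management system reports an event, and a small handler suspends or resumes the jobs that depend on that router.

```python
class Scheduler:
    """Hypothetical in-memory stand-in for a scheduler's programmatic interface."""

    def __init__(self):
        self._jobs = {}  # job name -> {"tags": set of str, "state": str}

    def add_job(self, name, tags=()):
        self._jobs[name] = {"tags": set(tags), "state": "scheduled"}

    def status(self, name):
        return self._jobs[name]["state"]

    def _switch(self, tag, old_state, new_state):
        """Move every job carrying `tag` from old_state to new_state."""
        changed = []
        for name, job in self._jobs.items():
            if tag in job["tags"] and job["state"] == old_state:
                job["state"] = new_state
                changed.append(name)
        return sorted(changed)

    def suspend_tagged(self, tag):
        return self._switch(tag, "scheduled", "suspended")

    def resume_tagged(self, tag):
        return self._switch(tag, "suspended", "scheduled")


def on_router_event(scheduler, router, is_up):
    """Handler a network management system could call when a router changes state."""
    tag = f"via:{router}"
    if is_up:
        return scheduler.resume_tagged(tag)
    return scheduler.suspend_tagged(tag)


if __name__ == "__main__":
    sched = Scheduler()
    sched.add_job("ship-sales-data", tags=["via:router-7"])
    sched.add_job("local-backup")
    print(on_router_event(sched, "router-7", is_up=False))
    print(sched.status("ship-sales-data"), sched.status("local-backup"))
```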
Deciding on a scheduling solution is a difficult task, made all the harder by the fact that several good choices exist for most Unix environments. That said, I'll offer a blatant plug for Platinum Technology's AutoSys product. It isn't perfect, but I've seen it handle huge, complex job loads with a minimum of difficulty. In a feature-by-feature comparison, it outperforms similar products from Unison and Computer Associates. While it may not be the solution for everyone, it's certainly worth your consideration. If nothing else, it has almost all of the capabilities I've outlined above.
None of these scheduling products are cheap. Even in small environments with just a few servers, you can expect to pay somewhere between $10,000 and $20,000 to install a scheduler, with fancier analysis and support tools driving the price even higher. If you're supporting dozens or hundreds of machines, you can easily drive the price into the high five or low six figures.
Still, this is one of the few investments that you'll never regret. I can't overstate the stability it will bring to your production environment. Your administrators' lives will be easier, your users will be happier, and you'll be able to turn your attention to the dozens of other problems confronting your transition to mainframe-class Unix. If nothing else, be thankful that for this one piece of the enterprise computing puzzle, the quality of available Unix tools meets and exceeds that of the mainframe world.
About the author
Chuck Musciano has been running various Web sites, including the HTML Guru Home Page, since early 1994, serving up HTML tips and tricks to hundreds of thousands of visitors each month. He's been a beta tester and contributor to the NCSA httpd project and speaks regularly on the Internet, World Wide Web, and related topics. Chuck is currently CIO at the American Kennel Club. Reach Chuck at firstname.lastname@example.org.