The job scheduling juggle

Job scheduling has made the leap from the mainframe to Unix. We list the features and criteria you should consider in choosing your Unix job scheduling tools.

By Chuck Musciano

July 1998

Abstract

One of the most important tools of the mainframe world, production-quality job scheduling, has arrived for Unix -- bigger and better than it ever was. Chuck compares native Unix job scheduling with mainframe scheduling and looks at just how far Unix job scheduling tools have come. (2,600 words)

In the mainframe world, the fundamental unit of work is the job. A job is roughly equivalent to a Unix shell script, in that it usually contains several job steps, each of which runs a single executable. Jobs are defined using job control language (JCL), describing the files and resources used by each job step. Jobs can be run by submitting the JCL to the system, but are usually scheduled for execution by the system's job scheduler.

Job scheduling is one of the premier services in the mainframe environment. Over the years, mainframe job schedulers have evolved into highly sophisticated tools that can create, track, and synchronize thousands of jobs simultaneously. It is also one of the most important components of any production computing environment, and no mainframe-class Unix shop should be without production job scheduling tools.

Regrettably, native Unix job scheduling is pathetic. Fortunately, several companies have created job scheduling tools for Unix that meet and exceed the standard set by their mainframe cousins. We'll explore what Unix offers, why it's inadequate, how mainframe systems meet the challenge of job scheduling, and what you can expect from a scheduling tool for Unix.

Native Unix scheduling
Unix admins are quick to point out that Unix offers cron, which handles the basics of simple job scheduling. Mainframe folks, on the other hand, fall over laughing when anyone suggests that cron provides adequate scheduling capabilities. And for good reason.

Cron offers exactly one feature: the ability to start a Unix command at a specific time, based upon an adequate set of time and date matching criteria. If you have simple job scheduling requirements, such as kicking off a backup every Sunday at 2 a.m., or cleaning out /tmp each night at midnight, cron will do the trick. If you need any capabilities beyond this, you're out of luck.
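
To make that one feature concrete, here is a minimal Python sketch of how a cron-style time specification can be matched against a clock reading. It is an illustration only, not cron's actual implementation: the function names are my own, and it handles just `*`, single values, comma lists, and ranges. Real cron also supports step values and month and day names, and treats the day-of-month and day-of-week fields as alternatives when both are restricted.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Return True if one crontab field ('*', '5', '1,15', '1-5') matches value."""
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(spec: str, when: datetime) -> bool:
    """Check a five-field spec: minute hour day-of-month month day-of-week."""
    minute, hour, dom, month, dow = spec.split()
    # cron numbers Sunday as 0; isoweekday() numbers it 7, so fold 7 to 0.
    weekday = when.isoweekday() % 7
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(dom, when.day)
        and field_matches(month, when.month)
        and field_matches(dow, weekday)
    )


BACKUP = "0 2 * * 0"     # every Sunday at 2 a.m.
CLEAN_TMP = "0 0 * * *"  # every night at midnight
```

With these definitions, `cron_matches(BACKUP, datetime(1998, 7, 5, 2, 0))` is true, since July 5, 1998 fell on a Sunday, while the same time on Monday the 6th does not match. Notice that nothing in the specification can refer to another job.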

Cron's biggest problem is that it cannot correlate the execution of one job with the results of another. If your backup job fails, cron doesn't know that it should suspend the job that updates your tape catalogs or deletes yesterday's old files. If the backup finishes early, cron can't move up other jobs that you want run upon completion of the backup. Cron can't start jobs that aren't time-dependent, making it impossible to create a job that runs when a file disappears or when a system resource hits a certain threshold.
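
The missing piece is easy to describe in code. The following Python sketch shows the kind of correlation cron lacks: a job runs only if every job it depends on has succeeded. The function name, the status strings, and the convention that a job is a callable returning True on success are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Job = Callable[[], bool]  # a job returns True on success, False on failure


def run_with_dependencies(
    jobs: Dict[str, Job],
    needs: Dict[str, List[str]],
    order: List[str],
) -> Dict[str, str]:
    """Run jobs in the given order, skipping any job whose prerequisites did not succeed."""
    status: Dict[str, str] = {}
    for name in order:
        # A prerequisite blocks this job unless it has already finished "ok".
        blockers = [dep for dep in needs.get(name, []) if status.get(dep) != "ok"]
        if blockers:
            status[name] = "skipped"
            continue
        status[name] = "ok" if jobs[name]() else "failed"
    return status
```

If the backup job fails, a catalog update and a file purge that both list the backup as a prerequisite are marked as skipped rather than run against a bad backup.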

In short, cron is little more than a fancy alarm clock, waking up at preset times to run a job. Its detection of job failure is simplistic. If you're lucky, cron will send you e-mail saying that a job has failed, but more often than not, that e-mail goes to the superuser, not your personal mailbox. There is no way to tell cron to rerun a failed job at a later time, or to automatically run a recovery job if some other job has failed.
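
For comparison, restart and recovery logic of the sort cron cannot express might look like the Python sketch below. Everything here is hypothetical: the function names, the return strings, and the `flaky` helper, which only exists to simulate a job that fails a few times before succeeding.

```python
from typing import Callable, Optional

Job = Callable[[], bool]  # a job returns True on success, False on failure


def run_with_recovery(job: Job, max_attempts: int, recovery: Optional[Job] = None) -> str:
    """Retry a job up to max_attempts times, then fall back to a recovery job."""
    for attempt in range(1, max_attempts + 1):
        if job():
            return f"ok on attempt {attempt}"
    if recovery is not None:
        recovery()
        return "failed, recovery ran"
    return "failed"


def flaky(failures_before_success: int) -> Job:
    """Build a stand-in job that fails a fixed number of times, then succeeds."""
    remaining = [failures_before_success]

    def job() -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return job
```

Calling `run_with_recovery(flaky(2), 3)` returns `"ok on attempt 3"`. A production scheduler would also wait between attempts and record each failure, which this sketch leaves out.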

The cron configuration file (crontab) is a pain to maintain, and making even minor changes to a job's start time can be error-prone. There are no layered tools to make job creation easier, although the at command is useful for one-shot jobs. All told, cron is simplistic and tedious, and lacks critical features.

To correct this, a lot of people roll their own job management systems. They use cron to kick off a job controller and create scripts that detect failure conditions, initiate other jobs, and provide some modicum of checkpoint/restart capabilities. While these solutions often work adequately for small job streams, they rarely scale to handle the job loads of a typical mainframe. They also lack sophisticated user interfaces and reporting tools that allow you to keep audit trails of your job streams.

Even worse, a home-grown job scheduler quickly turns into a full-time programming job. As you increase your dependence on the tool, you'll find yourself adding more and more features. The result is usually a hodgepodge of scripts, programs, and Unix utilities that only a few people actually understand. The thought of basing production job streams on this kind of solution should make any good admin cringe.

Mainframe job scheduling
Mainframe job scheduling is the complete opposite of native Unix job scheduling. Tools like CA-7 provide robust scheduling capabilities that can handle huge, complex job streams with ease. Mainframe system programmers are accustomed to high-quality job scheduling as a basic feature of their environment.

Mainframe schedulers provide the ability to group jobs into collections, treating the collection as a single entity whose execution, success, or failure can be tracked and used to trigger other jobs or collections of jobs. You can trigger jobs and job collections using time triggers or nontemporal criteria: the creation of a file, the mounting of a tape, or the shutdown of a database. The job scheduler is aware of almost all activity within the system and can respond accordingly. These capabilities make life much easier for the mainframe system programmer.
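
A nontemporal trigger such as file creation can be approximated by polling. The Python sketch below is one assumed way to do it, not how any particular mainframe scheduler works: the trigger table, the job name `load_warehouse`, and the flag file `extract.done` are all invented for the example.

```python
import os
import tempfile
from typing import Dict, List, Set


def poll_file_triggers(triggers: Dict[str, str], fired: Set[str]) -> List[str]:
    """Return jobs whose trigger file now exists and that have not fired yet."""
    to_start = []
    for path, job in triggers.items():
        if job not in fired and os.path.exists(path):
            to_start.append(job)
            fired.add(job)  # remember the job so it fires only once
    return to_start


def demo() -> List[List[str]]:
    """Poll before the file exists, after it appears, and once more."""
    with tempfile.TemporaryDirectory() as workdir:
        flag = os.path.join(workdir, "extract.done")
        triggers = {flag: "load_warehouse"}
        fired: Set[str] = set()
        before = poll_file_triggers(triggers, fired)
        open(flag, "w").close()  # the upstream job drops its flag file
        after = poll_file_triggers(triggers, fired)
        again = poll_file_triggers(triggers, fired)
        return [before, after, again]
```

Here `demo()` returns `[[], ["load_warehouse"], []]`: nothing fires until the file appears, and the job fires exactly once.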

Using screen-oriented user interfaces, system operators can track the status of jobs, noting which are running long and which are completing. Using this interface, operators can suspend jobs, delay execution, restart jobs, and track schedule slippage. It's possible to alert an operator if a job exceeds a maximum run time, or if a job has failed to start due to unmet execution criteria. If a user decides that certain jobs should be cancelled one night, or shifted to a different execution window, one phone call to the operator is usually all that's needed to accomplish the change.

The mainframe scheduler also has good reporting tools, creating execution logs and reporting job failure and success. Analyzing these reports over a period of time lets you see trends, such as accounting job streams that take longer and longer or backup jobs that begin to press against the limits of your backup windows. These reports are also a great line of defense against angry users who claim that their jobs don't run on time, or correctly, or even at all. You can also use these reports to find windows of idle time in your schedule, shifting jobs to periods of low usage to improve overall system performance.
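
As a rough illustration of the trend analysis these reports make possible, the Python sketch below fits a straight line to a job's nightly run times and estimates how many more nights remain before the job outgrows its window. The function names are hypothetical, and the linear fit is my own simplifying assumption; real workloads rarely grow so neatly.

```python
import math
from typing import List, Optional


def runtime_slope(durations: List[float]) -> float:
    """Least-squares slope of run time per night; needs at least two runs."""
    n = len(durations)
    mean_x = (n - 1) / 2
    mean_y = sum(durations) / n
    numerator = sum((i - mean_x) * (d - mean_y) for i, d in enumerate(durations))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def nights_until_overrun(durations: List[float], window: float) -> Optional[int]:
    """Estimate nights left before the latest trend exceeds the window, or None if not growing."""
    slope = runtime_slope(durations)
    if slope <= 0:
        return None
    remaining = window - durations[-1]
    return max(0, math.ceil(remaining / slope))
```

For a backup that took 60, 62, 64, and 66 minutes on successive nights, the slope is 2 minutes per night, and `nights_until_overrun([60, 62, 64, 66], 80)` returns 7.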

Keep in mind that mainframe systems were not born with great job scheduling. These tools are third-party add-ons, and are not part of the base operating systems. Their features have evolved over many years, tuned and shaped by the demands of thousands of system managers. It's easy for mainframe people to be smug about these features, but the real old-timers remember when none of them existed.

Mainframe features in Unix
Fortunately, Unix folks have plenty to brag about, too, since everything that existed in the mainframe world has been ported over to Unix systems, along with a few great features that the mainframes never had. Companies like Platinum Technology, Computer Associates, Unison, and COSbatch have created advanced scheduling solutions for heterogeneous computing environments, letting you manage job schedules on your systems more easily than ever before.

What kind of scheduler is right for your environment? Here are some features and criteria you should consider when testing and comparing these products:

Mainframe capabilities

This may seem obvious, but you should expect to have all the fundamental mainframe capabilities in your Unix scheduler. You must be able to create jobs and job collections, create dependencies between those jobs and collections, and specify starting and completion criteria for your jobs.

A good scheduler will support nontemporal job triggers: file creation, system alerts, etc. You should be able to constrain jobs to run within certain time windows, or within certain resource usage limits. Runaway jobs should be detected and an operator alerted. You must be able to suspend a job stream, slip a schedule to another time of day, and cancel a single instance of a job without affecting its overall schedule. There should be no limit to the number of jobs you can create, and the system should be as easy to use with 10 jobs as it is with 10,000.
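
Two of these checks, execution windows and runaway detection, are simple enough to sketch in a few lines of Python. The function names and the half-open window convention are assumptions for illustration, not the behavior of any named product.

```python
from datetime import datetime, time, timedelta


def in_window(now: time, start: time, end: time) -> bool:
    """True if now falls in [start, end); the window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    # A window such as 22:00 to 04:00 spans midnight.
    return now >= start or now < end


def is_runaway(started: datetime, now: datetime, max_runtime: timedelta) -> bool:
    """True if a job has been running longer than its allowed maximum."""
    return now - started > max_runtime
```

A job limited to the 10 p.m. to 4 a.m. window would be allowed to start at 11:30 p.m. but not at noon, and a job started at 2 a.m. with a 90-minute limit would be flagged if it were still running at 4 a.m.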

Heterogeneous scheduling

This is the best feature of Unix job scheduling, setting it apart from mainframe tools: the ability to schedule jobs across multiple Unix platforms, and to even schedule jobs on non-Unix systems, including Windows NT and MVS. While mainframe schedulers are usually bound to their system, Unix schedulers reach out across the network to start jobs on any system in your environment, regardless of the host operating system.

In an interesting role reversal, the best Unix schedulers can handle all your MVS scheduling needs, triggering jobs on your mainframe while retaining schedule control on the Unix systems. More importantly, you can integrate schedules across your systems. The completion of a job on your NT server can trigger a job on the mainframe, which in turn might kick off a job on your Unix box.

Scheduling tools are expensive, and the ability to leverage a single investment across all of your systems makes a lot of sense. You'll also save money in terms of training and support. Once your operator knows how to use the scheduler, you can add new systems to your environment without having to retrain your operators.

Multiple user interfaces

A good scheduler offers a range of interfaces, each suited to a different task. At the very least, you'll want a robust management console that your operators will use to track and manage all your jobs. You may also need a more full-featured interface that your system programmers (or, in some shops, dedicated scheduling administrators) can use to create and edit job streams.

As you begin to offer services to a broader range of users, you might find it helpful to deliver a "read only" monitor to your users, letting them observe their job streams without giving them the ability to change anything. This can cut down on "check-in" calls to your operators.

Finally, you'll want command-line and programmatic interfaces that allow applications to interact directly with your scheduler. This way, your custom tools can schedule jobs, check statuses, and manage job streams automatically. This is particularly important when you begin to integrate schedule management into your overall systems management strategy. For example, if your network management system detects a router failure, it might suspend certain data transfer jobs until the outage is repaired.
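
The router example might be wired up along the lines of the Python sketch below. The `SchedulerClient` class is a hypothetical in-memory stand-in, not the API of any real scheduler, and the event format and job names are likewise invented for the example.

```python
from typing import Dict, List


class SchedulerClient:
    """Hypothetical in-memory stand-in for a scheduler's programmatic interface."""

    def __init__(self, job_routes: Dict[str, str]):
        # job_routes maps each job name to the router its data transfer crosses.
        self.job_routes = dict(job_routes)
        self.state = {job: "scheduled" for job in job_routes}

    def suspend(self, job: str) -> None:
        self.state[job] = "suspended"

    def resume(self, job: str) -> None:
        self.state[job] = "scheduled"


def handle_network_event(client: SchedulerClient, event: Dict[str, str]) -> List[str]:
    """Suspend or resume every job that depends on the router named in the event."""
    affected = sorted(
        job for job, router in client.job_routes.items() if router == event["router"]
    )
    for job in affected:
        if event["type"] == "router_down":
            client.suspend(job)
        elif event["type"] == "router_up":
            client.resume(job)
    return affected
```

When the network management system reports a `router_down` event, only the transfer jobs routed through that device are suspended. A matching `router_up` event puts them back on the schedule.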

Client/server architecture

Modern Unix schedulers use a client/server approach to schedule management. The schedule exists as a separate database on a master server, while agents are installed on every client machine that needs to run jobs. A daemon on the master system monitors the database, sending messages to clients as jobs need to be run. The clients, in turn, send responses to the master as jobs complete, allowing the database to be updated, triggering other jobs. Each agent implements the same set of scheduling operations regardless of the host operating system, making it easy to trigger jobs in one environment based upon conditions in another.
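
The message flow can be simulated in a single Python process. In this sketch the `Master` and `Agent` classes, the host names, and the message fields are all assumptions. A real product would talk over the network and start actual processes, where this agent simply reports a canned result.

```python
from typing import Dict, List, Optional


class Agent:
    """Stand-in for the agent installed on one client machine."""

    def __init__(self, host: str, failing_jobs=()):
        self.host = host
        self.failing_jobs = set(failing_jobs)

    def run(self, job: str) -> Dict[str, str]:
        # A real agent would start a process; here we just report a canned result.
        status = "failed" if job in self.failing_jobs else "success"
        return {"host": self.host, "job": job, "status": status}


class Master:
    """Holds the schedule database and dispatches ready jobs to agents."""

    def __init__(self, agents: List[Agent]):
        self.agents = {agent.host: agent for agent in agents}
        self.database: Dict[str, Dict[str, Optional[str]]] = {}

    def add_job(self, name: str, host: str, after: Optional[str] = None) -> None:
        self.database[name] = {"host": host, "after": after, "status": "pending"}

    def _ready(self) -> List[str]:
        ready = []
        for name, row in self.database.items():
            if row["status"] != "pending":
                continue
            after = row["after"]
            if after is None or self.database[after]["status"] == "success":
                ready.append(name)
        return ready

    def run_schedule(self) -> List[str]:
        """Dispatch jobs until nothing more is ready; return a log of agent replies."""
        log = []
        while True:
            ready = self._ready()
            if not ready:
                return log
            for name in ready:
                reply = self.agents[self.database[name]["host"]].run(name)
                self.database[name]["status"] = reply["status"]
                log.append(f'{reply["job"]}@{reply["host"]}: {reply["status"]}')
```

With an extract job on an NT host, a posting job on the mainframe that follows it, and a report job on a Unix box that follows the posting job, `run_schedule()` dispatches the three in order, each one triggered by the previous agent's reply. If the mainframe job fails, the report job stays pending.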

High availability

Once you commit to a client/server architecture for your scheduler, you can easily extend the architecture to support high availability requirements. Some Unix schedulers can create multiple slave schedulers. These slaves retain duplicates of the scheduling database, but don't actually schedule jobs. If the master system goes offline, one of the slave systems takes over, keeping the schedule running until the master is brought back online. Since scheduling is a fairly lightweight operation, it's possible to use a pair of low-end Unix workstations to act as redundant master/slave scheduling systems, ensuring that your schedule will be running even if one of the systems fails. This also means that you can take one of the systems down for maintenance without interrupting your schedule.
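
One common way to arrange such a takeover is with heartbeats, sketched below in Python. The heartbeat mechanism, the class and function names, and the timeout values are my own assumptions, since products differ in how they detect a dead master.

```python
from typing import Dict, List, Optional


class SchedulerNode:
    """One scheduling system, holding its own copy of the schedule database."""

    def __init__(self, name: str):
        self.name = name
        self.database: Dict[str, str] = {}
        self.last_heartbeat = 0.0  # time of the most recent sign of life, in seconds


def replicate(master: SchedulerNode, slaves: List[SchedulerNode]) -> None:
    """Copy the master's schedule database to every slave."""
    for slave in slaves:
        slave.database = dict(master.database)


def pick_active(
    master: SchedulerNode, slaves: List[SchedulerNode], now: float, timeout: float
) -> Optional[SchedulerNode]:
    """Prefer the master; otherwise use the first slave with a fresh heartbeat."""
    for node in [master] + slaves:
        if now - node.last_heartbeat <= timeout:
            return node
    return None
```

While the master's heartbeat is fresh, `pick_active` returns the master. Once it goes stale, the first live slave takes over with its replicated copy of the schedule, and the master regains control as soon as its heartbeat resumes.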

"What if" modeling

Problems always crop up in a production environment, and it's nice to know in advance how the schedule will be impacted. A good scheduler allows you to ask what-if questions, showing the effect of various changes on your overall schedule. You might want to shift your backup windows, or change the order of certain job streams. A good analysis tool can illustrate the impact on all your other jobs, letting you avoid unforeseen problems before they occur. You can also experiment with alternative schedules, checking to see which one best distributes job loads across your systems. Coupled with heterogeneous scheduling, you might find that altering jobs on one system has a dramatic impact on completely unrelated systems on your network.
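
At its simplest, a what-if question comes down to recomputing start and finish times from job durations and dependencies. The Python sketch below assumes each job has an earliest start time, a fixed duration in minutes, and a list of predecessors, with no circular dependencies. The job names and numbers are invented for the example.

```python
from typing import Dict, List, Tuple

# name -> (earliest start in minutes after midnight, duration in minutes, predecessors)
Plan = Dict[str, Tuple[int, int, List[str]]]


def simulate(jobs: Plan) -> Dict[str, Tuple[int, int]]:
    """Return name -> (start, finish), assuming the dependency graph has no cycles."""
    result: Dict[str, Tuple[int, int]] = {}

    def resolve(name: str) -> Tuple[int, int]:
        if name not in result:
            earliest, duration, preds = jobs[name]
            # A job starts when its own time arrives and all predecessors are done.
            start = max([earliest] + [resolve(p)[1] for p in preds])
            result[name] = (start, start + duration)
        return result[name]

    for name in jobs:
        resolve(name)
    return result


BASELINE: Plan = {
    "backup": (120, 90, []),           # starts at 2 a.m., runs 90 minutes
    "catalog": (0, 15, ["backup"]),    # runs as soon as the backup finishes
    "report": (180, 30, ["catalog"]),  # never before 3 a.m.
}
WHAT_IF: Plan = dict(BASELINE, backup=(60, 90, []))  # move the backup to 1 a.m.
```

In the baseline, the report runs from minute 225 to 255. Moving the backup an hour earlier lets the report run from 180 to 210, a gain of only 45 minutes because the report's own 3 a.m. constraint takes over. That kind of second-order effect is exactly what a what-if tool exists to surface.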

Systems management

Your scheduler should integrate cleanly into your overall systems management strategy. This means that job completion and failure events should be handled as typical network events, with SNMP traps that can be caught and displayed by any SNMP-compliant tool. You'll also want good reporting tools and the ability to capture realtime execution data from your scheduler. Finally, your systems management tools should be able to talk to the scheduler using an open API, letting them alter job schedules based upon traps and events on other systems.

Deciding on a scheduling solution is a difficult task, made all the harder by the fact that several good choices exist for most Unix environments. That said, I'll offer a blatant plug for Platinum Technology's AutoSys product. While it's not perfect, I've seen it handle huge, complex job loads with a minimum of difficulty. In a feature-by-feature comparison, it outperforms similar products from Unison and Computer Associates. While it may not be the solution for everyone, it's certainly worth your consideration. If nothing else, it has almost all of the capabilities I've outlined above.

None of these scheduling products are cheap. Even in small environments with just a few servers, you can expect to pay somewhere between $10,000 and $20,000 to install a scheduler, with fancier analysis and support tools driving the price even higher. If you're supporting dozens or hundreds of machines, you can easily drive the price into the high five or low six figures.

Still, this is one of the few investments that you'll never regret. I can't overstate the stability it will bring to your production environment. Your administrators' lives will be easier, your users will be happier, and you'll be able to turn your attention to the dozens of other problems confronting your transition to mainframe-class Unix. If nothing else, be thankful that for this one piece of the enterprise computing puzzle, the quality of available Unix tools meets and exceeds that of the mainframe world.

Resources

Unison Maestro http://websrvsc.unison.com/marketin/products.html
Platinum AutoSys http://www.platinum.com/products/sysman/asys_ps.htm
COSbatch http://cosbatch.com/
CA Unicenter http://www.cai.com/
"Can mainframers and Unix die-hards get along?" February 1998 SunWorld feature story http://www.sunworld.com/swol-02-1998/swol-02-mainunix.html
"Is mainframe-class availability possible in a Unix environment?" March 1998 SunWorld feature story http://www.sunworld.com/swol-03-1998/swol-03-24x7.html
"Battening down the hatches -- How can you implement mainframe-style access control in Unix?" April 1998 SunWorld feature story http://www.sunworld.com/swol-04-1998/swol-04-unixsecurity.html
"When disaster hits will you be ready? A little planning can make disaster recovery easier with Unix than on the mainframe," May 1998 SunWorld feature story http://www.sunworld.com/swol-05-1998/swol-05-recovery.html

About the author
Chuck Musciano has been running various Web sites, including the HTML Guru Home Page, since early 1994, serving up HTML tips and tricks to hundreds of thousands of visitors each month. He's been a beta tester and contributor to the NCSA httpd project and speaks regularly on the Internet, World Wide Web, and related topics. Chuck is currently CIO at the American Kennel Club. Reach Chuck at chuck.musciano@sunworld.com.

URL: http://www.sunworld.com/swol-07-1998/swol-07-scheduling.html