In the ideal world, system administration happens transparently with seamless transitions through new versions of software, upward compatibility in file formats, and users who happily agree to switch to the tool du jour because it simplifies license management. While that notion is complete fantasy, you can make great strides toward minimizing the pain of effecting change in the environment.
Wrapped to go
Putting tools, utilities, and commercial software inside of execution wrappers provides performance and accounting benefits. A wrapper is just a container that builds up the invocation environment, logs usage records, and fires off the intended executable. Here's a simple wrapper for starting Interleaf:
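    #! /bin/sh
    # Move an existing desktop file aside before Interleaf starts.
    if [ -f "$HOME/desktop" ]
    then
        echo "renaming $HOME/desktop to $HOME/desktop.preileaf"
        mv "$HOME/desktop" "$HOME/desktop.preileaf"
    fi
    # Build the invocation environment, then hand off to the real executable.
    ileaf_home=/interleaf/ileaf5
    PATH=$ileaf_home/bin:$PATH
    export PATH
    exec $ileaf_home/bin/ileaf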
The example above points out a few of the upsides and downsides of using wrappers.
Wrappers should be put into a single directory searched by every user's path. The wrapper home should precede the individual tool homes in the shell search path to ensure that your trappings aren't bypassed. The distinct advantage to consolidating wrappers appears when you use the automounter to manage tool locations. Each shell searches every directory in the path for executables. If some of the directories are automounted, starting a shell initiates a "mount storm" as the tools' directories are accessed and searched. In addition to making shell start-up inordinately long, mount storms may decrease client reliability. Why complete mounts that may never be used, yet leave you exposed to a fileserver crash? Let the wrapper do the dirty work, mounting the tool when needed.
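One way to keep the wrapper directory ahead of the tool homes is to rebuild the search path in a site-wide login script. The following is only a sketch: the directory /tools/wrappers and the function name wrappers_first are assumptions made for illustration, not names from any particular installation.

```shell
#!/bin/sh
# Hypothetical wrapper directory; substitute your site's location.
WRAPPER_DIR=${WRAPPER_DIR:-/tools/wrappers}

# Print a search path with the wrapper directory first and any later
# copy of it removed, so wrappers always shadow the real tools.
#   $1 = wrapper directory, $2 = existing colon-separated path
wrappers_first() {
    dir=$1
    path=$2
    rest=
    old_ifs=$IFS
    IFS=:
    for p in $path
    do
        [ "$p" = "$dir" ] && continue
        rest=${rest:+$rest:}$p
    done
    IFS=$old_ifs
    echo "$dir${rest:+:$rest}"
}

# Typical use from a login script:
#   PATH=`wrappers_first "$WRAPPER_DIR" "$PATH"`; export PATH
```

Because only the wrapper directory sits in the path, the automounted tool directories are touched when a wrapper actually runs a tool, not when a shell starts.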
Turn and burn, no churn
Sysadmins are always asked to perform mutually exclusive tasks while upsetting the fewest number of users. Most frequently, constraints are imposed by software upgrades required to fix bugs, gain access to critical new features, or expand a pool of floating licenses. Before you install, check your facts.
When considering removing a package or requiring some users to switch to a company-standard tool, determine how many people actively use it. Have your wrapper maintain statistics showing who started the tool and how long it was in use. When a vocal minority calls for an upgrade, but the process would affect everyone, don't do it "in place" -- run both old and new versions in parallel, subject to licensing restrictions.
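As a rough illustration of the head count, the sketch below tallies how many times each user started each tool. It assumes log records shaped like those written by the te3270 wrapper later in this column (record type, tool name, login name, then date and host). The function name starts_by_user is hypothetical. Measuring how long a tool was in use would also need timestamps in a form that is easy to subtract, which this sketch does not attempt.

```shell
#!/bin/sh
# Count how many times each user started each tool.
# Assumes records of the form: start <tool> <login> <date...> <host>
#   $1 = log file to read
starts_by_user() {
    awk '$1 == "start" { n[$2 " " $3]++ }
         END { for (k in n) print k, n[k] }' "$1" | sort
}

# Example: starts_by_user /var/wrapper/logs
```

A tool whose tally shows two or three names is a much easier retirement candidate than one started daily by half the building.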
Track floating-license usage to estimate concurrent demand. What are the peaks and the longer-term averages? If you're considering additional licenses, know how many concurrent developers will be in the license queue the week before a milestone. It's valuable to know the average number of instances of a tool as well, so you can differentiate power users from a larger user base.
Wrapping a 3270 terminal emulator, for example, will tell you the number of windows per user. If half your users are single-window watchers while the other half cruise through four at a time, you might have a training problem. A more reasonable explanation for this skewed distribution is widely different workloads, where some users enjoy a 3270 screen-scraping app that powers them through the multiple windows. That fact is something to be filed away: When those single-window users switch into power mode, you're going to see a three- or four-fold increase in demand for 3270 resources, possibly requiring more logical unit (LU) connections to the mainframe or a wider pipe to the 3270 gateway hosts. These observations may save you from a panic-driven need to install a second gateway server and modify your 3270 LU search algorithm later.
Here's an advanced wrapper that logs start and stop times for a 3270 emulator:
    #! /bin/sh
    # Append (>>) so each record is added to the log instead of overwriting it.
    echo start te3270 $LOGNAME `date` `uname -n` >> /var/wrapper/logs
    /tools/network/bin/te3270 "$@"
    status=$?
    echo stop te3270 $LOGNAME `date` `uname -n` >> /var/wrapper/logs
    exit $status
This wrapper logs data locally, so you'd need to comb every desktop or application server to build an enterprisewide view of tool usage. Write the log records to a file that's NFS mounted (note the host name in the output line) and you'll have a central repository.
While this example mentions the te3270 executable by name, you could substitute `basename $0` in each case and make this wrapper generic for every tool in the /tools/network/bin directory. Note that we don't use exec because we want to be able to log the exit time for the wrapper, and we save the status of te3270 so the wrapper returns the same exit status to the shell or invoking process. You want your wrapper to look, feel, and smell like the underlying tool, so it doesn't confuse window managers or pipelines that depend on exit codes.
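A generic version might look something like the sketch below. The function name run_wrapped and the overridable WRAPPER_LOG and TOOL_HOME variables are illustrative assumptions. The log path and tool directory defaults are the ones used in the example above.

```shell
#!/bin/sh
# Hypothetical settings; both can be overridden from the environment.
WRAPPER_LOG=${WRAPPER_LOG:-/var/wrapper/logs}
TOOL_HOME=${TOOL_HOME:-/tools/network/bin}

# Log a start record, run the real tool, log a stop record, and hand
# back the tool's own exit status.
#   $1 = tool name, remaining arguments are passed to the tool
run_wrapped() {
    tool=$1
    shift
    echo "start $tool ${LOGNAME:-unknown} `date` `uname -n`" >> "$WRAPPER_LOG"
    "$TOOL_HOME/$tool" "$@"
    status=$?
    echo "stop $tool ${LOGNAME:-unknown} `date` `uname -n`" >> "$WRAPPER_LOG"
    return $status
}

# An installed wrapper, linked under each tool's name, would end with:
#   run_wrapped "`basename "$0"`" "$@"
#   exit $?
```

With one copy of the script linked under every tool name in the wrapper directory, adding a tool to the accounting scheme is a matter of making one more link.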
Experts will note flaws in the trivial wrapper. First, it doesn't lock the log file, so two or more simultaneous writes may cause corruption. Second, if the wrapper is terminated because the window system crashes or the machine reboots, you end up with an unterminated transaction in your log. A database system would roll back the stop-less-start entry. Replace the echo lines with a piece of isql or a C program that dumps the records into a database, and you've improved the quality of your logging. The final flaw is that the wrapper tracks instances without qualification.
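Short of moving to a database, the locking flaw can be softened with a lock directory, a common shell technique that relies on mkdir either creating the directory or failing. The sketch below is one possible approach, not a complete fix: the function name log_locked, the ".lock" suffix, and the ten-try limit are all illustrative choices, and a wrapper killed while holding the lock leaves a stale directory that someone must clean up.

```shell
#!/bin/sh
# Append one record to a shared log while holding a lock directory.
# mkdir either creates the directory or fails, so only one writer wins.
#   $1 = log file, remaining arguments form the record
log_locked() {
    log=$1
    shift
    lock=$log.lock
    tries=0
    until mkdir "$lock" 2>/dev/null
    do
        tries=`expr $tries + 1`
        [ "$tries" -ge 10 ] && return 1    # give up rather than hang
        sleep 1
    done
    echo "$*" >> "$log"
    rmdir "$lock"
}

# Example: log_locked /var/wrapper/logs start te3270 $LOGNAME `date` `uname -n`
```

The function returns nonzero when it cannot get the lock, so a wrapper can decide whether to skip the record or run the tool anyway.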
Queue this
Want to know how many concurrent instances of tools are running? Try this awk script on /var/wrapper/logs:
    $1 == "start" {
        count[$2]++
        if (count[$2] > concurrent[$2])
            concurrent[$2] = count[$2]
    }
    $1 == "stop" {
        count[$2]--
    }
    END {
        for (tool in concurrent)
            print tool, concurrent[tool]
    }
A counter for each tool is bumped when the start record is seen, and the high-water concurrent mark is updated. When the stop record is encountered, the counter gets decremented. At the end of the list, pairs of tool names and maximum instance counts are printed.
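The same high-water technique answers the windows-per-user question raised earlier for the 3270 emulator. The sketch below keys the counters on the login name in the third field instead of the tool name. It assumes the record layout written by the te3270 wrapper, and the function name peak_per_user is hypothetical.

```shell
#!/bin/sh
# Peak number of simultaneous instances of one tool, per user.
# Assumes records of the form: <start|stop> <tool> <login> <date...> <host>
#   $1 = log file, $2 = tool name to report on
peak_per_user() {
    awk -v tool="$2" '
        $2 != tool { next }
        $1 == "start" {
            count[$3]++
            if (count[$3] > peak[$3])
                peak[$3] = count[$3]
        }
        $1 == "stop" { count[$3]-- }
        END { for (user in peak) print user, peak[user] }
    ' "$1" | sort
}

# Example: peak_per_user /var/wrapper/logs te3270
```

A column of ones and fours in the output is the skewed distribution described above.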
What can you learn from this? Look for economies of scale, where one large resource pool will outperform several smaller environments. The best example is a "build host," a large multiprocessing machine loaded with memory that doubles as a fileserver for your software development seats. Users can compile code on their desktops, but this creates significant network traffic as headers and source files are dragged from fileserver to local host, and object files are written back. Memory may become a constraint when large links are completed or if bursts of compilations contend with developers' toolsets. Eliminate the network traffic and remove the memory bottleneck by running compiles on the fileserver, adding CPU and memory to it to avoid impairing its file service.
How much do you add? Use compiler wrapper statistics to gauge the level of concurrent builds. Set up queuing software to streamline and meter the resources for an average queue depth. Forcing the jobs to run out of a queue clamps the resources they'll use and prevents your resource pooling from becoming a disaster during peak loads. The user advantages? Multiprocessing machines handle parallel versions of make, decreasing time for a complex build. Faster CPUs and more memory cut the compile cycle time. Implement this by changing the wrappers around make, cc, and CC to use rsh or rexec.
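A make wrapper along those lines could be as small as the sketch below. It makes several assumptions worth stating: the host name buildhost and the function name remote_make are hypothetical, the source tree must be visible under the same path on the desktop and the build host, and directory names and arguments are assumed to be free of spaces and shell metacharacters. Queuing is not shown. Classic rsh also reports its own exit status rather than that of the remote command, so a production wrapper would need some way to carry the build status back.

```shell
#!/bin/sh
# Hypothetical build host and remote-shell command; override as needed.
BUILD_HOST=${BUILD_HOST:-buildhost}
RSH=${RSH:-rsh}

# Run make on the build host in the directory the user is sitting in.
# Assumes the same path names the source tree on both machines.
remote_make() {
    $RSH "$BUILD_HOST" "cd $PWD && make $*"
}

# An installed make wrapper would end with:
#   remote_make "$@"
```

Setting RSH=echo prints the command line that would be sent, which is a cheap way to check the wrapper before pointing it at a real build host.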
Stealth users
Wrappers are ideal for tracking usage of applications that are short-lived, or for which coarse-grain data is appropriate. Compilers fit the bill, as they typically run a few minutes per command-line invocation. What about people who fire up Frame and let it run for a month or two at a time? How do you distinguish the junkies from the dabblers?
Pick a sample period, say a four-hour chunk from 1:00 pm to 5:00 pm every work day for a week. During this time, record the average user CPU utilization every 10 minutes using vmstat or sar. Do the math and you'll know how many seconds of CPU time were consumed by user-level activity. If you see user CPU at 45 percent, that's 270 seconds out of 10 minutes on a uniprocessor, but 1,080 seconds on a 4-processor system (CPU utilization is typically shown as an average of all processors). During the same interval, look at the processes of interest and check their accumulated CPU time. Subtract the previous sample from the current one to get the CPU usage during the interval, and divide it by the total user CPU seconds to rank the activity of the monitored processes.
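The arithmetic is simple enough to script. The sketch below reproduces the 45 percent figures from the paragraph above. The function names and the process sample values (120 and 174 accumulated CPU seconds) are made up for illustration, and the results are truncated to whole numbers by shell integer arithmetic.

```shell
#!/bin/sh
# User-level CPU seconds consumed in one sample interval.
#   $1 = average user CPU percent, $2 = interval in seconds, $3 = processors
user_cpu_seconds() {
    echo $(( $1 * $2 * $3 / 100 ))
}

# Percentage of that user CPU time taken by one monitored process.
#   $1 = accumulated CPU seconds at the previous sample
#   $2 = accumulated CPU seconds at the current sample
#   $3 = total user CPU seconds for the interval
process_share() {
    echo $(( ($2 - $1) * 100 / $3 ))
}

user_cpu_seconds 45 600 1     # prints 270
user_cpu_seconds 45 600 4     # prints 1080
process_share 120 174 270     # prints 20 (54 of 270 seconds)
```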
Relative terms are important for an accurate user profile, because not all apps are completely CPU-bound. Driving Frame at full speed may only use 10 percent of the CPU. Seeing a mere 10 percent CPU utilization for the whole system makes it seem close to quiescent, but account for the usage as "Frame 100 percent, other apps 0," and it's clear what's in use during the sample period.
How should you monitor an application process? Run the monitor in the background and have your wrappers write their process IDs into a file that is periodically read for new entries. Alternatively, start the monitoring script right out of the wrapper, making sure it terminates itself when the application under surveillance exits.
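For the start-it-from-the-wrapper approach, a monitor can be little more than a loop that samples until the process disappears. This is a sketch under assumptions: the function name monitor_pid and the output file name are hypothetical, and the format of the time column printed by ps varies between systems, so the samples would need site-specific parsing before any subtraction.

```shell
#!/bin/sh
# Sample a process's accumulated CPU time until it exits.
#   $1 = process ID, $2 = seconds between samples, $3 = output file
# kill -0 sends no signal; it only tests whether the process still exists.
monitor_pid() {
    while kill -0 "$1" 2>/dev/null
    do
        cpu=`ps -o time= -p "$1" 2>/dev/null`
        echo "$1 `date '+%H:%M:%S'` $cpu" >> "$3"
        sleep "$2"
    done
}

# Inside a wrapper, the tool would be started in the background so the
# monitor can be handed its process ID:
#   /tools/network/bin/te3270 "$@" &
#   pid=$!
#   monitor_pid $pid 600 /var/wrapper/cpu.$pid &
#   wait $pid
#   exit $?
```

Because the loop condition tests the application's process ID, the monitor stops on its own at the first sample after the tool exits, and the wrapper still returns the tool's exit status through wait.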
Employ data provided by wrappers to avoid being blindsided by seemingly innocuous requests. Those five new users may sit on the high-power curve and use the same resources as 15 average users. The word processor upgrade that marketing needs to get a bug fixed may render all of the documents on which sales relies unprintable. Navigating your way through this maze of twisty dependencies, all alike in importance, requires solid data to plan and justify changes to the environment. There's a social element, too: Learn what the people you support do during the course of a normal day. Application wrappers provide a peephole into your users' work habits. Your job is to make the environment friendly and conducive to them.
About the author
Hal Stern is an area systems engineer with Sun Microsystems in the Northeast, where he specializes in networking, performance tuning, and kernel hacking. He can be reached at hal.stern@sunworld.com.
URL: http://www.sunworld.com/asm-05-1995/asm-05-sysadmin.html.
Last updated: 1 May 1995.