Originally published in the April 1995 issue of Advanced Systems.

SysAdmin

Think locally, expand globally

Learn to think in local execution terms while paying heed to the pressure of global expansion.

By Hal Stern

Mastering the intricacies of Unix shells will impress both practical users and theoretical computer scientists. Much of the shell's flexibility and power comes from its variable substitution, global wildcard expansion, and alternate command execution modes. For the pragmatic, expansion and substitution let you manipulate command output as easily as any other variable. Theoreticians are quick to point out that any language that allows recursion or self-modifying code -- like taking an arbitrary string and executing it on demand -- produces an exponentially large number of possible programs, not all of which can even be proven to be syntactically correct -- a fact all too clear when you try debugging some of the more obscure shell constructs.

Variably cloudy
Variable substitution is probably the most well-known global expansion method. Variables are referenced using the $var syntax, where var is the name of a variable. Define shell variables on the fly, with no initialization or declarations required:

set suffix=.nsf

Anything appearing after a dollar sign is taken as a variable name, so be sure to separate variables used in string concatenation with curly braces:

set filename=${prefix}.nsf
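To see why the braces matter, consider a hypothetical variable named base whose suffix begins with a character that could legally continue a variable name:

set base=report
echo ${base}1.txt     # prints report1.txt
echo $base1.txt       # error: the shell looks for a variable named base1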

Several built-in variables are initialized in each shell environment, among them $0 (the name of the script or shell), the positional parameters $1, $2, and so on, and $$ (the shell's process ID).

The command line argument vector can be loaded at any time using set to explode a variable or set of words into arguments, for example, set $hostname $*. A script that wants to set default arguments may add them to the command line arguments, reducing the argument parsing step to a single loop over all positional variables. Typically, the marker -- is used to demarcate the passed-in arguments from those set within the script:

set v h -- $*

This fragment sets $1 to v, $2 to h, and so on, with the first command line argument showing up in $4.
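Here's a sketch of that single parsing loop in the Bourne shell; the option letters v and h and the variable names are placeholders, not part of any real script:

set v h -- $*
for arg
do
    case $arg in
    v)  verbose=1 ;;             # default supplied by the script
    h)  show_headers=1 ;;        # another default
    --) ;;                       # marker between defaults and user arguments
    *)  files="$files $arg" ;;   # everything else is treated as a file name
    esac
done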

The C-shell also supports an $argv[1] syntax, modeled on the argv array of a C program, for referencing positional arguments. Define multiword variables by putting the array elements in parentheses:

set speeds = ( slow medium fast )

You can access an element of a multiword variable using its positional subscript; that is, $speeds[2] would be "medium." Here's one case where awk and Perl shine brightly: they support associative arrays whose indices are arbitrary, mnemonic strings. Want to sort statistics by host name? In the C-shell this requires that you build two arrays, one for host names and one for statistics, and remember the ordinal value of each host name. In awk, it's just an array assignment using the host names as subscripts. The following awk example shows how to set an entry in an associative array, and how to iterate over the string subscripts to print the array contents in tabular format:

{ stats[$1] += $2 }
END { for (h in stats)
print h, stats[h] }
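To run it from the shell, wrap the program in single quotes (a construction explained below) and feed it lines of host-name and byte-count pairs; the log file name here is invented:

awk '{ stats[$1] += $2 } END { for (h in stats) print h, stats[h] }' xfer.log | sort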

Perl goes one step further with primitives to extract the keys and value pairs from an associative array. If you think in associative memory terms, try Perl.

Wild thing
Wildcards used for filename matching are a close second to variables in popularity, and they exceed variable substitutions in terms of the confusion they can generate. The four basic wildcard building blocks are part of the shell's list of metacharacters, and they use a derivative of regular expression syntax: * matches any string of characters, including the empty string; ? matches any single character; [...] matches any one of the characters or ranges listed between the brackets; and {...} expands to each member of a comma-separated list of alternatives in turn.

Filename substitution applies to every shell command, including those that don't expect filenames as arguments. To generate a list of files for processing, try echo *; its output is essentially that of ls, delivered on a single line. As mentioned last month, wildcard expansion of filenames is done by the shell before handing arguments off to an executable. Each pattern is expanded independently, with no duplicate suppression across patterns, so you must take care not to name files twice when you're renaming or removing them. For example, the following purges a directory of object files and anything prefixed with old.:

rm old.* *.o

If you have a file named old.o, it matches both expressions and is included in the rm argument list twice. The first encounter removes the file; the second generates an error. Such multiple matches occasionally confuse new users who reach for a suitably large hammer when removing files.
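When in doubt, preview the expansion with echo before committing to rm; the duplicate shows up harmlessly in the output:

echo old.* *.o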

Variable substitution and wildcard expansion are performed first in the shell's evaluation of a command or script line. If no matches for a wildcard are found, you get the error No match, and processing of that command stops. You can prevent an unmatched wildcard expression from adversely affecting script execution by setting the C-shell's nonomatch variable in your .cshrc; you won't get an error on a malformed or lonely wildcard expression. If you want to completely disable global expansion, set the variable noglob. Doing so makes the shell metacharacters behave like any other ASCII character, with no substitution performed. If you're comfortable with single-pass substitution but want finer control over expression evaluation, you have to quote liberally and literally.
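Before reaching for quotes, a short C-shell fragment shows what nonomatch and noglob buy you, assuming no files in the current directory end in .tmp:

set nonomatch
echo *.tmp            # no match, so the literal string *.tmp is echoed instead of an error
set noglob
echo *.nsf            # no expansion at all, even if matching files exist
unset noglob nonomatch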

Don't quote me
Quotations group arguments together, define your intended semantics for strings, and force or delay global expansions. There are three quote forms used in the shells: double quotes ("), single quotes ('), and back quotes or back ticks (`). We'll cover back quotes shortly.

The difference between single and double quotes is simple: Single quotes mean "do not evaluate what's inside" while double quotes imply immediate evaluation of their contents. Both quoting forms treat the result as a single string or argument, a useful construction for passing arguments with embedded white space or shell metacharacters. For example, to pass Hal Stern as a single argument to a script named newuser, enclose the string in double quotes:

newuser -h /home/stern -n "Hal Stern"

When the sequence in double quotes contains variables, they are immediately expanded, but the result is treated like a single string; filename wildcards inside double quotes are not expanded at all (compare rm "*", which looks for a file literally named *, with rm *).

Double quotes make amends for many shell programming shortcuts. Consider testing for a zero-length string using the forms if [ -z $name ] and if [ -z "$name" ]. If $name is an empty string, the first if clause is malformed because the string is nothing more than white space. Surround the variable name with double quotes, as in the second example, and an empty string becomes the argument to the -z test. If you expect to be doing string tests on, or making case labels out of, arguments with embedded white space, use double quotes to treat the string as a monolithic item instead of as multiple tokens.
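A hypothetical fragment of the newuser script mentioned earlier shows both habits at once; quoting keeps the -z test well formed when the argument is missing and keeps a two-word name in one case label:

name="$4"                 # with the invocation shown above, "Hal Stern" lands in $4
if [ -z "$name" ]
then
    echo "usage: newuser -h homedir -n fullname" >&2
    exit 1
fi
case "$name" in
"Hal Stern") echo "welcome back, Hal" ;;
*)           echo "creating an account for $name" ;;
esac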

Single quotes are a bit stronger because they defer global expansion of their contents. They're the perfect vehicle for passing wildcard specifications to utilities like find:

find /home/stern -type f -name '*.doc' -print

The example prints the names of all regular files that end in .doc. Leave the wildcard unquoted here, and you get unpredictable results:

find /home/stern -type f -name *.doc -print

First, *.doc is expanded to match filenames in the current directory. If there are none, you get a No match error; if there is exactly one, you'll send find on a mission to locate exact matches of that single file name. Match more than one file, and you have violated the syntax for a find clause, so find complains before doing any searching.

Nested quotes have their own set of rules. Escape double quotes with a backslash to include them as part of a string, especially one that is enclosed in double quotes. Within single quotes, a set of double quotes has no effect and is left in place, since no substitution occurs. Taking the opposite tack, nesting a single quoted string within double quotes still performs variable and command substitution; the immediate evaluation mode of double quotes supersedes the deferred evaluation of single quotes. When you've mastered the basic quoting forms to defer global expansion, it's time to use back quotes and eval to alter command line execution.
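Before moving on, all of these nesting rules can be checked in a few seconds with echo at a Bourne shell prompt:

echo "home is $HOME, and '$HOME' still expands inside double quotes"
echo 'single quotes leave $HOME and *.doc completely alone'
echo "escaped \"double quotes\" become part of the string"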

Evaluation copy
Back quotes, or accents graves for those who prefer to code with a distinctly continental flair, are the third substitution mechanism. Enclosing something in back quotes means "evaluate this, execute it as a command, and substitute the command's output for the back quoted string." Note that back quotes provide an auxiliary execution mechanism, which is not to be confused with the shell's built-in exec function that does a transfer-of-control to another executable image. Back quotes are best applied when you need to save the output of a command or pipeline for further processing. They also nest, but be aware that embedding one execution inside another is bound to create confusion for the next person who looks at your script.

Here's a simple example that converts any uppercase letters in a filename into lowercase, and then renames the file if it doesn't already exist:

newfn=`echo $1 | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz`
if [ ! -f $newfn ]
then
mv $1 $newfn
fi
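If that fragment is saved as an executable script, called tolower here purely for illustration, a wildcard loop applies it to an entire directory:

for f in *
do
    tolower "$f"
done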

Back quotes produce output that is used as data. What if you have a tool that produces commands? Enter the next execution altered state with an explicit eval command. eval runs the command line parser and execution module over its argument, making it ideal for sourcing commands in the middle of a script. You can always save the output of a configuration tool into a file and source it:

configtool > /tmp/conf$$
source /tmp/conf$$        # csh

However, the more elegant solution is to use eval `configtool` to run the configuration utility and handle its output in line. Get rid of an extra level of command indirection with eval; introduce a level with back quotes.
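The difference is easiest to see with echo standing in for configtool: without eval, the semicolon in the tool's output is just another word; with eval, the parser gets a second pass at it:

`echo 'echo one; echo two'`           # the semicolon is just text: prints "one; echo two"
eval `echo 'echo one; echo two'`      # reparsed: runs two echo commands, printing "one" and "two"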

The most common use of eval is to set terminal characteristics or to tell a remote login session the correct window size. Run rlogin inside an xterm and the remote system may not pick up the right window size, making vi, emacs, and other screen-oriented tools frustrating or even useless. Put the following in your .cshrc to force the remote side to set the appropriate fields in its local line discipline, evaluating the output of the X Window System resize command:

set noglob; eval `resize`; unset noglob

The output of resize is a pair of set commands that turn on noglob and nonomatch, with a pair of stty commands nestled in between to set the row and column sizes of your window. The outermost setting of noglob ensures that eval doesn't doctor the resize command output, while the innermost set commands make sure the actual execution goes off without global expansion.

A variation on eval exists for mathematical functions in the Bourne and C shells. String variables participate in numerical evaluations within the expr construct in the Bourne shell:

count=`expr $count + 1`

Correct use of white space is crucial for proper expression parsing. Without spaces around the plus sign, and assuming a value of "6" for count, expr sees the single argument 6+1 and hands it back unevaluated instead of computing 7. The C-shell syntax is even uglier, using the built-in @ operator:

@ count = $count + 1

The space between @ and the variable name is required. Performing even simple addition is such a chore in the shell that multiword, subscripted variables, whose indices you must compute and track by hand, provide too small a return on the code investment. Associative arrays are a more civilized approach.
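To see why it feels like a chore, here's a counting loop written with expr in the Bourne shell; the directory scan is only an example:

count=0
for f in *
do
    count=`expr $count + 1`
done
echo "$count files"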

Global expansion or substitution mechanisms are as fundamental to the shell as data structures, indirection, and recursion are to C programs. Harness variables, execution flows, and filename matching to parameterize shell script fragments the same way frequently used code gets generalized into a common library. Despite the theoretical claims to the contrary, a liberal dose of global thinking turns your favorite scripts into system administration koans.

About the author
Hal Stern is a Distinguished Systems Engineer with Sun Microsystems, where he specializes in networking, performance tuning, and kernel hacking. He can be reached at hal.stern@sunworld.com.



