Originally published in the December 1993 issue of Advanced Systems.

SysAdmin

Rule, set, and match

Intuitive, gracious mail delivery carries significant value. Learn how to extract this grade of service from sendmail.

By Hal Stern

With mail delivery, you're forced to surmise what users intended to do from the way their mail bounces. Here are two ways to make your postmaster job easier: Establish and conform to uniform rules to send mail to users, and parse every imaginable address that users try to use as a valid destination. With a clearly marked in path and a wide and forgiving out path, you'll make most people happy most of the time.

We descend deeper into the guts of sendmail and decipher address parsing and rewriting rules. Address manipulation is the real power of sendmail. A well-written sendmail configuration will redirect mail to several mail hubs, route mail to any number of delivery agents, enforce standardized user naming for outgoing mail, and hide any interior sysadmin atrocities. We'll examine a sendmail configuration file and review the basics of address parsing and rewriting rules. Sample rules for common problems of name hiding and alias inversion tie theory to practice.

Declaration of independents
Unlike other utilities that do a once-through consumption of their start-up data, sendmail doesn't read its configuration file from top to bottom. The configuration file, sendmail.cf, is more like a piece of source code that defines subroutines and variables. Subroutines are executed in a particular order, and variables control the flow of execution within the code segments. If you think of the sendmail configuration as 90 percent parser and 10 percent magic, you'll have the proper perspective.

To the uninitiated, sendmail.cf is hundreds of lines of data that could be compressed EBCDIC. Don't let the representations intimidate you just because they're written in the language of compiler hounds. Every sendmail.cf file contains definitions, header declarations, and rewriting rules. Like source code, the rules contain logic and subtlety, and form the bulk of the file contents.

Every line has a type that uniquely identifies its purpose. The first character is the type field, followed by space-separated values. Tabs and the pound sign (#) begin comments. Following are the nontrivial types.

D (define a variable). The second character is an alpha variable name, yielding only 52 unique variables. Variables contain a single value, which might be a string containing embedded spaces. For example, to define the variable y as "Big Data Group" use the line DyBig Data Group.

C (define a class). A class is a multivalued variable, like an argument vector in the shell. A class z that contains the names thud, plop, and drip is defined like this: Czthud plop drip.

F (build a class from a file). This type offers a nice separation of control information from the data values. You can modify a configuration without editing dozens of sendmail.cf files if all sendmail configurations use the same shared class-definition files. To extract the contents of class z from a file, use Fz/usr/local/share/silly-hosts.

H (header definition). Don't touch! Vendors go to great pains to make the default headers comply with the Request For Comments (RFCs), especially RFC 822 governing header formats and values. Adding or changing headers may break other mailers.

P (establish message precedence). A concept as clever as it is unused.

M (define a mailer). Once sendmail is done rewriting the recipient addresses, it chooses a mailer to deliver the message. Valid mailers are defined in M-type clauses, which include flags to be passed to the mailer and the additional sendmail rewriting rules that should be used for post-processing of sender and recipient addresses. A typical mailer definition is Muucp, P=/usr/bin/uux, F=msDFMhuU, S=13, R=23, A=uux - -r $h!rmail ($u); it defines a mailer named uucp, gives the program (command) to run it, gives a set of flags specifying how the mailer behaves and what kind of arguments it handles, gives sender and recipient rewriting rules, and gives an argument string. The arguments are filled with the hostname ($h) and username ($u) determined by the address parsing described in detail below.

S (start a ruleset). Line S5 marks the beginning of ruleset 5. Rulesets are groups of rewriting rules. S lines provide the only demarcation between rulesets.

R (a rewriting rule). As with all parsing tools, a rewriting rule contains a pattern on the left side and a transformation rule for input on the right side.

Solaris 2.2 has two new line types.

L (define variable, like a D-type line, but take value from a configuration table). The location and mission of these tables will be discussed later.

G (create a class, like a C-line, using a configuration table instead of a hard-coded value list).

Variables and classes are analogous to global variables and initialized data in C programs. They provide state for the dynamic code execution, which contains the more interesting and complex components. In sendmail, the code is a large and onerous-looking set of rewriting rules.
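To make the line-type scheme concrete, here is a minimal Python sketch that sorts configuration lines by their first character and collects the variables, classes, and ruleset numbers it finds. It is a toy reader, not how sendmail itself loads the file; the function names classify and summarize are made up for illustration, and the sample lines reuse the examples given above.

```python
# Line types described in this column; the descriptions paraphrase the text.
LINE_TYPES = {
    "D": "define a variable",
    "C": "define a class",
    "F": "build a class from a file",
    "H": "header definition",
    "P": "message precedence",
    "M": "define a mailer",
    "S": "start a ruleset",
    "R": "rewriting rule",
    "L": "variable taken from a configuration table (Solaris 2.2)",
    "G": "class taken from a configuration table (Solaris 2.2)",
}


def classify(line):
    """Return (type_char, rest) for one line, or None for blanks and # comments."""
    if not line.strip() or line.startswith("#"):
        return None
    return line[0], line[1:].rstrip("\n")


def summarize(lines):
    """Collect variables, classes, and ruleset numbers from D, C, and S lines."""
    variables, classes, rulesets = {}, {}, []
    for line in lines:
        parsed = classify(line)
        if parsed is None:
            continue
        kind, rest = parsed
        if kind == "D" and rest:
            # Second character is the variable name; the remainder is its value.
            variables[rest[0]] = rest[1:]
        elif kind == "C" and rest:
            # Second character is the class name; the remainder is a word list.
            classes.setdefault(rest[0], []).extend(rest[1:].split())
        elif kind == "S" and rest.strip().isdigit():
            rulesets.append(int(rest))
    return variables, classes, rulesets


sample = [
    "# sample fragment built from the examples in the text",
    "DyBig Data Group",
    "Czthud plop drip",
    "S5",
]
variables, classes, rulesets = summarize(sample)
print(variables)
print(classes)
print(rulesets)
```

The point of the sketch is only that the first character of each line decides how the rest is read; everything interesting happens later, in the R lines.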

Play by the rules
One of sendmail's primary functions is to digest mail addresses and turn them into a triple of (destination machine, destination user, mailer). Building a sendmail configuration is an exercise in interoperability. Rewriting rules are the engine that produces transformations. Rules are organized into sets, with at least one rule in each set. Rulesets structure address parsing the same way subroutines structure data handling, modularizing common routines and dividing complex tasks into manageable elements.

There are three basic ruleset sequences used in normal mail delivery:

The delivery triple of (destination, user, mailer) is determined by first feeding the address into ruleset 3, followed by ruleset 0. The output of ruleset 0 has to be a delivery triple.
The sender's address is given extra pampering on the way out. It's rewritten by ruleset 3, then goes to ruleset 1 and any ruleset specified for sender address rewriting. Sender address rewriting rulesets are specified by clauses like S=13 in the mailer definition lines. Sender addresses end up at ruleset 4 for final mix-down.
Recipient addresses get similar treatment. Rulesets 3 and 2 massage the address, followed by the ruleset named in the recipient address rewriting clause, such as R=24, from the mailer definition, followed by ruleset 4.

Why the effort given to sender and recipient addresses? It's in these extra rulesets that you implement tricks like name hiding and alias inversion. Rulesets 3 and 0 make sure you can turn an address into a destination, while the sender and recipient rules make sure the mail has a nice return address and neatly formatted destination.
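The three sequences can be written down as data. The Python sketch below pulls the S= and R= fields out of the uucp mailer definition quoted earlier and lists the rulesets each kind of address would visit. It assumes ruleset 2 is the recipient counterpart of ruleset 1, in line with the 2,22,4 test sequence used in the debugging section of this column; parse_mailer and ruleset_sequences are illustrative names, not part of sendmail.

```python
def parse_mailer(line):
    """Split an M line into the mailer name and a dict of KEY=value fields."""
    name, *fields = [part.strip() for part in line[1:].split(",")]
    options = {}
    for field in fields:
        key, _, value = field.partition("=")
        options[key.strip()] = value.strip()
    return name, options


def ruleset_sequences(options):
    """List the rulesets visited for each kind of address, in order."""
    return {
        "delivery": [3, 0],
        "sender": [3, 1, int(options["S"]), 4],
        # Assumption: ruleset 2 plays the recipient role that ruleset 1
        # plays for senders.
        "recipient": [3, 2, int(options["R"]), 4],
    }


mailer_line = "Muucp, P=/usr/bin/uux, F=msDFMhuU, S=13, R=23, A=uux - -r $h!rmail ($u)"
mailer_name, mailer_options = parse_mailer(mailer_line)
sequences = ruleset_sequences(mailer_options)
print(mailer_name)
for kind, order in sequences.items():
    print(kind, order)
```

Seen this way, the mailer definition is what ties the generic rulesets to the mailer-specific ones: change S= or R= and you change which of your rules an address passes through.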

Language barriers
Rewriting rules are cryptic, primarily due to the terse notation used for regular-expression pattern matching. The left side of a rule is a pattern, and the right side is a replacement string. Regular expressions provide wildcard matching, while specific tokens, such as strings, are matched literally. Token boundaries are noted by dollar signs. The regular expression syntax is similar to, but not exactly like, that of egrep:

$*	matches zero or more tokens
$+	matches one or more tokens
$-	matches exactly one token
$=x	matches any token in class x

The tokens that match each subexpression are passed to the right side as a vector of variables: $1, $2, and so on. There are other special-purpose tokens like $: which we'll talk about later. You'll also see angle brackets, <>, used to place a "focus" on the parsing. Focus is usually shifted to the destination system part, leaving the username before the focus. Many rules use the focus markers to designate a substring being converted to a potential destination site name.

One note: When an address is handed to a ruleset, each rule is applied to it serially. If an address matches the left side of a rule, it is replaced with the right-side output and the rule is re-executed. Rules may recursively trim down or expand an address as required.

The rule $+@<$-.east> $1@$2 matches one or more things followed by an @ sign, followed by one thing suffixed with .east. The obvious match is stern@sunne.east but polyglot!tim@sunne.east also fits, since any number of tokens, including addressing metacharacters, can appear before the @ sign. Both addresses are rewritten without the .east suffix, which is a simple way to strip off unneeded domain qualifiers.

Rule recursion makes adding tokens more complex. If you add a token in the rewriting phase, make sure the rule's output won't match its input string, or you'll get stuck in a rewriting loop. For example:

$-@<$+.east>	$1@$2.ma.east

looks harmless; it just adds a local name before any trailing .east. But if you give this rule stern@ma.east, you get stern@ma.ma.east back, and the output matches the left side again. You can prevent an endless stream of tokens by keeping patterns inside the focus simple, rarely using patterns that match more than one token. For example, changing the $+.east to $-.east breaks the infinite loop after just one match.
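You can watch the loop happen with a toy matcher. The Python sketch below is a rough imitation of token matching, not sendmail's parser: it understands only $+, $-, $1 through $9, and literal tokens, it splits tokens on ".", "@", and "!" only, and it ignores the focus brackets. The names tokenize, match, and apply_rule are invented for the example, and the limit argument exists solely to stop the runaway case.

```python
import re


def tokenize(text):
    """Split a pattern or an address into tokens (a simplified tokenizer)."""
    return re.findall(r"\$[-+]|\$\d|[.@!]|[^.@!$]+", text)


def match(pattern, tokens):
    """Return the captured token lists if pattern matches all tokens, else None."""
    if not pattern:
        return [] if not tokens else None
    head, rest = pattern[0], pattern[1:]
    if head == "$-":  # exactly one token
        if tokens:
            tail = match(rest, tokens[1:])
            if tail is not None:
                return [tokens[:1]] + tail
        return None
    if head == "$+":  # one or more tokens, shortest match first
        for n in range(1, len(tokens) + 1):
            tail = match(rest, tokens[n:])
            if tail is not None:
                return [tokens[:n]] + tail
        return None
    if tokens and tokens[0].lower() == head.lower():  # literal token
        return match(rest, tokens[1:])
    return None


def apply_rule(lhs, rhs, address, limit=5):
    """Re-apply one rule while it keeps matching, at most `limit` times."""
    pattern, template = tokenize(lhs), tokenize(rhs)
    tokens = tokenize(address)
    steps = [address]
    for _ in range(limit):
        captured = match(pattern, tokens)
        if captured is None:
            break
        rewritten = []
        for item in template:
            if re.fullmatch(r"\$\d", item):
                rewritten.extend(captured[int(item[1]) - 1])
            else:
                rewritten.append(item)
        tokens = rewritten
        steps.append("".join(tokens))
    return steps


looping = apply_rule("$-@$+.east", "$1@$2.ma.east", "stern@ma.east")
fixed = apply_rule("$-@$-.east", "$1@$2.ma.east", "stern@ma.east")
print(looping)
print(fixed)
```

With $+ the address keeps growing until the artificial limit cuts it off; with $- the second pass finds three tokens where the pattern allows one, so the rule stops matching after a single rewrite.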

Ex post facto changes
Given the somewhat frail nature of many sendmail configurations, you might restrict your sendmail.cf hacking to new rulesets. The work you'll want to tackle probably involves cleaning up addresses, which is easily done with new, independent rulesets. When adding your own variables and classes, use uppercase letters since lowercase names are reserved by sendmail.

How do you integrate your work into the mainstream of address processing? Say you create ruleset 11 to clean up some bizarre UUCP connections that leave you with both bad feelings and unworkable return addresses. To insert it in the middle of ruleset 0, use the equivalent of a sendmail subroutine call:

R$+	$:$>11$1	throw at ruleset 11 once

The left side matches everything. The right side starts with the magic $: token, which means "execute this rule only once." This is a mechanism to avoid infinite loops. The $>11 construct says "feed what follows, here the whole address matched as $1, to ruleset 11, and insert its output here." Voila! You've called your own rules with a restricted, one-line change to the main ruleset.

Table the motion
Most variables and classes are defined within the configuration file, which is fine if you use the same configuration in all places. However, many sites require minor variations on each mail hub. The domain name is the most likely candidate requiring customization. If you're adopting NIS+ or DNS, changing domain names is likely to break mail as you install the new naming services. Instead of taking the domain name, or some other variable definition, from a line in the configuration file, it would be nice to pull it out of a local naming service.

Enter the sendmailvars table. New in Solaris 2.2, it contains variable and class definitions that are managed by the name service, outside of individual sendmail.cf files. Entries in the map are standard key-value pairs:

variable	value
class	value value value

The key is the value of a variable or class defined in a G- or L-type line in sendmail.cf. For example, declare variable X using the table-driven L-type line LXdirect-connects.

The value of X in its definition is used as a key into sendmailvars. When X appears in a rewriting rule, its value from sendmailvars will be substituted. This adjunct table lives in /etc/mail/sendmailvars, or in the NIS/NIS+ map of the same name. The default nsswitch.conf setup is to use files first, then NIS+.

You're the mail hacker for a university. You want each department to use the right domain name, but you threaten anyone who touches your sendmail.cf files with a one-way trip to Jurassic Park. Decouple the configuration framework (sendmail.cf file) from the per-site data, which can be put into sendmailvars. Distribute the sendmail.cf file with the $m variable defined as Lmpublic-domain. Create an entry in sendmailvars that defines public-domain. For example: public-domain .cs.ufred.edu. Since this file is different on each hub, you can keep a pristine sendmail.cf file and still allow local variations. If you use NIS+ to disseminate sendmailvars, you can declare an organizationwide default table, and then let selected subdomains override its values with their own sendmailvars tables.
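The indirection is easy to picture as two dictionary lookups. The Python sketch below imitates it under simple assumptions: the table is plain text with one whitespace-separated key and its values per line, and the name-service layers (files, NIS, NIS+) are left out entirely. The hidden-hosts key and the GV line are hypothetical additions to show the class case; parse_sendmailvars and resolve are illustrative names.

```python
def parse_sendmailvars(text):
    """Parse 'key value [value ...]' lines into a dict of key -> list of values."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            table[fields[0]] = fields[1:]
    return table


def resolve(cf_lines, table):
    """Fill in L (variable) and G (class) definitions from the table."""
    variables, classes = {}, {}
    for line in cf_lines:
        if len(line) < 3:
            continue
        kind, name, key = line[0], line[1], line[2:].strip()
        if kind == "L" and key in table:
            variables[name] = " ".join(table[key])
        elif kind == "G" and key in table:
            classes[name] = list(table[key])
    return variables, classes


# The same sendmail.cf lines are shipped to every hub ...
cf_lines = [
    "Lmpublic-domain",
    "GVhidden-hosts",  # hypothetical class entry, for illustration only
]

# ... while each hub supplies its own table contents.
cs_table = parse_sendmailvars(
    "public-domain .cs.ufred.edu\n"
    "hidden-hosts oxygen helium\n"
)

cs_variables, cs_classes = resolve(cf_lines, cs_table)
print(cs_variables)
print(cs_classes)
```

Swap in a different table and the same cf_lines yield a different domain, which is the whole point: the file you guard never changes.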

Stupid sendmail tricks
To gain familiarity with sendmail rulesets, write or modify some yourself. Here are some common problems and some minimal solutions to them.

Forwarding UUCP mail. Your mail hub is a large, centrally managed machine, but for security you've moved your UUCP connections to a smaller machine on the other side of a firewall. How do you hide this fact from your users, so they don't have to deal with clumsy UUCP addresses? Let's say your UUCP machine is called uucpgate. Add a rule to ruleset 0 that catches all traffic directed to hosts of the form name.uucp:

R$-@<$-.uucp>  $#ether $@uucpgate $:$1@$2.uucp

The left side matches anything going to a .uucp host. The right side specifies the ether mailer, denoted by the $# prefix, with uucpgate as the destination host. The username to be passed through, following the $:, is the same pattern that matched the left side. This rule just forwards the mail to another machine for UUCP activity. On the UUCP gateway, it will be handled by the UUCP mailer. A sendmail diehard would then edit the rewriting rules for the UUCP mailer on the UUCP gateway to remove the gateway machine name from outgoing mail.

Alias inversion. Internally, you call him by his login, joebob. But to the outside, he's known as Joseph.Alpert. How can you make outgoing mail obey a first.last name convention? The first step is to define the mappings between first and last name monikers and usernames. Make sure the first.last name aliases point to the mailhost, for example:

Joseph.Alpert:	joebob@mailhub
Ellen.Tishman:	ellen@mailhub

If you look at your NIS maps, you'll see one called mail.byaddr. This is the reverse of the normal NIS aliases map, which uses the left side as a key. In the mail.byaddr map, each alias that has only one target on the right side is represented with that target as a key and the alias as its value. The problem is to get sendmail to take sender names, look them up in mail.byaddr, and rewrite outgoing mail with the public names.
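The inversion itself is simple enough to sketch in a few lines of Python. This is only an illustration of the mapping described above, not the tool that builds the real NIS map; invert_aliases is a made-up name, and the staff alias is a hypothetical entry added to show that multi-target aliases are left out.

```python
def invert_aliases(alias_lines):
    """Map target -> alias for every alias that has exactly one target."""
    inverse = {}
    for line in alias_lines:
        if ":" not in line:
            continue
        alias, targets = line.split(":", 1)
        target_list = [t.strip() for t in targets.split(",") if t.strip()]
        if len(target_list) == 1:
            inverse[target_list[0]] = alias.strip()
    return inverse


aliases = [
    "Joseph.Alpert:\tjoebob@mailhub",
    "Ellen.Tishman:\tellen@mailhub",
    "staff:\tjoebob@mailhub, ellen@mailhub",  # hypothetical list alias, skipped
]
by_addr = invert_aliases(aliases)
print(by_addr)
```

Looking up joebob@mailhub in the result gives back Joseph.Alpert, which is exactly the answer the sender-rewriting rule needs.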

Again, define a new ruleset for this excursion. Let's define ruleset 25, which will be called by the ether mailer for sender address rewriting. The ether mailer definition now looks like Mether, P=[TCP], F=msDFMuCX, S=25, R=21, A=TCP $h. Near the top of the sendmail configuration file, along with the other variable definitions, we'll add a line pointing to mail.byaddr:

DZmail.byaddr

Define ruleset 25 to cross-reference the map named by the Z variable:

S25
R$-<@mailhub>	$:$>3${Z$1@mailhub$}

Let's dissect this slowly. The left side matches users on the local mail hub. The right side starts with the $: token that suppresses further parsing, and then it has a $>3 to feed the results into ruleset 3 for further cleaning. The stuff in the braces is the sendmail equivalent of an NIS lookup, or a ypmatch. $1@mailhub is the key, which is looked up in the map named by variable Z, mail.byaddr. So mail sent by joebob@mailhub matches Joseph.Alpert in the inverted alias file, and the sender's name is replaced with its first.last form in the outgoing mail. This feature is Sun-specific, although the IDA sendmail kit has a similar function that allows you to do an arbitrary lookup in a DBM file. Since NIS maps are stored as DBM files, you get the same flexibility.

Name hiding. You have five machines that all run their own configurations, but you'd like to appear as a single mail hub to the world. The easiest way: Put the machine names into a class:

CVoxygen helium hydrogen neon argon

Then match members of the class in a rewriting rule that removes the host name part of the address:

R$-<@$=V>	$1	strip off @host

The $=V syntax matches any hostname in the list assigned to class V. If the class is likely to grow, or vary from machine to machine, consider putting it into a file, and define the class using a filename:

FV/usr/local/mail/hidden-hosts

Focus on separating configuration data from parsing rules, and you'll have the most flexible configuration files.
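For a feel of what the name-hiding rule does to an address, here is a minimal Python sketch. It approximates the $-<@$=V> pattern by requiring a single-token username (no ".", "!", or "%" inside it) and a host that belongs to the class; real sendmail tokenizing has more separators than that. The function name hide_host is invented for the example.

```python
# Contents of class V, as defined by the CV line above.
HIDDEN_HOSTS = {"oxygen", "helium", "hydrogen", "neon", "argon"}


def hide_host(address, hidden=HIDDEN_HOSTS):
    """Drop @host when host is in the class and the user part is one token."""
    user, sep, host = address.partition("@")
    single_token = bool(user) and not any(c in user for c in ".!%")
    if sep and single_token and host.lower() in hidden:
        return user
    return address


for addr in ["stern@oxygen", "stern@sunne", "polyglot!tim@neon"]:
    print(addr, "->", hide_host(addr))
```

Because the host list is passed in as data, the same function works whether the class comes from a CV line or from a shared file, which mirrors the advice about keeping data apart from rules.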

He is Parser Brown
Implementing address-rewriting rules often requires serious debugging. The best place to start is by running the parser interactively, using the -bt flag to specify address-test mode, and using the -Cpathname flag to specify an alternative configuration file. If you tinker with sendmail.cf, make a copy first and exercise your changes before publishing.

Fire up sendmail in debugging mode using a command line like this:

% /usr/lib/sendmail -bt -C/home/stern/sendmail.cf.new

You'll be prompted to enter a ruleset and an address. Ruleset 3 always runs, and you dictate what to run after that. For the basic "ruleset 3, then ruleset 0" work, use 0 decgate::fred@vmshost.west.

If you want to see how a recipient address is parsed, when your local mailer uses ruleset 22 for recipient rewriting, try this: 2,22,4 hal@east. If a name expands to an alias, you'll be subjected to a large volume of output as each recipient is parsed. Or, get Rob Kolstad's address verifier. It takes an input address and generates the triple (destination, user, mailer) on output. It's available via anonymous ftp from boulder.co.edu:/pub/sendmail.test.shar.

If the inner mechanics of token matching interest you, turn on some of the 21 sendmail debugging flags. Flags can be assigned a level of detail. Try running a test address through sendmail started with the -d21.99 flag, which is the sendmail debugging equivalent of taking a bus to your kitchen.

Everything you do adds cost or value to your organization. Correct, intuitive, gracious mail delivery has significant value. The cost: Learning to extract this level of service from sendmail.

About the author
Hal Stern is an area technology manager with Sun. He can be reached at hal.stern@sunworld.com.

URL: http://www.sunworld.com/asm-12-1993/asm-12-sysadmin.html
Last updated: 1 December 1993