Twisty little passages all automounted alike
A second look at automounter secrets
The real world hollers for scalability, multiple network connections, and lower costs of administration. Your personal life benefits from the time savings of the last item, so this month we'll take a pragmatic view of using the automounter, focusing on environments characterized by multiple network paths between clients and servers, replicated servers sharing access to popular filesystems, and using client-side caching to improve performance. (2,900 words)
Last month, we took a simplistic view of the automounter, the de facto Unix standard for managing a client's NFS configuration. While covering the basics of creating maps, avoiding name space conflicts, and using shorthands like wildcards, the tips and techniques mentioned were best suited for machines with only a few exported filesystems, on a single network, with a single well-known hostname. Simple administrative techniques suffice for small networks, but be wary of extending them into a policy of "one server, one network, one filesystem." As your organization grows, you'll be dealing with several hundred small NFS servers. After you've spent hours trying to figure out where to put the newest user's home directory, and how to even out load imbalances, you'll come to appreciate the quandary faced by managers of large Novell NetWare sites. A thousand pennies are more cumbersome than a $10 bill.
We'll start with the problem of maintaining bandwidth into popular filesystems like the local development tools repository, and seeing how the automounter selects one server from the bunch. We'll refine that view a bit by discussing ways to tweak the selection algorithm when you have a maze of twisty little network passages to your servers. From there, we'll tackle the multi-homed host problem where each NFS server is seen as a little network of twisty names, and see how to optimize the automounter for use with the Solaris CacheFS client-side NFS cache. A bevy of warnings and caveats round out this month's installment on advanced client mount-point management.
The Tessier-Ashpool fileservers
From an NFS client perspective, replicated servers are complete clones of one another. That is, they offer the same filesystem, with the same file layouts, and, ideally, the same file attributes. They may be different kinds of machines, with other NFS duties as well, as long as they provide identical filesystem views for the replicated data set. One host might be a small desktop server offering only /usr/local for example, while another provides /usr/local in addition to home directories and an ftp archive. From the automounter's point of view, the two servers form a replicated server set for /usr/local. Note that the automounter doesn't ensure that the servers actually contain the same filesystem hierarchies; you could replicate across wildly different hosts and end up with wildly confused users. Building and maintaining the replicated hosts is your job; getting the NFS clients connected to one server in the set is up to the automounter.
Why bother with replicated hosts, when you can build a beefy, solar plexus of an NFS server, connected to dozens of networks? There are a variety of performance reasons:
Note that replicated servers are for read-only data, not writable filesystems. NFS does not (currently) support write multicasting, or transactional writes to multiple servers to keep them all updated synchronously. If you want read and write replication to improve reliability, you're looking at a high-availability solution rather than the performance win offered by server replication. In a high-availability pair, the two servers offer exactly the same filesystem, since they share disks. One can take over for the other in the middle of an NFS traffic stream. With server replication, you're buying some protection from a server that crashes before you attempt to access it, but if the server crashes once you've mounted it, the situation is no better than having done the mount from a single source. A more detailed look at the replicated server mounting mechanics makes the distinction a bit clearer.
Finding the Chosen One
Setting up a replicated server set is as simple as naming the hosts in the map entry:
/usr/local    toolbox:/usr/local \
              workhorse:/export/usr/local \
              distbox:/usr/local/sun4
If all of the servers use the same filesystem naming convention, you can simplify the entry by enumerating the server names:
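As a minimal sketch, assume all three servers exported the tools under the same path, /usr/local (the entry above does not actually do this). The Solaris map syntax then accepts a comma-separated host list in front of the shared path:

```text
/usr/local    toolbox,workhorse,distbox:/usr/local
```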
With such a nice assortment of machines, how does the automounter choose one? There's a three-step process:
The subnet address match works in Solaris 2.4 and later releases. In earlier automounter implementations there's no check on subnet addresses, so it's conceivable that a fast server sitting across a few lightly loaded router hops would turn in a faster round trip lap time than a server on the same subnet -- not the best binding from a network perspective, but the best the automounter could choose without further information.
Note that the hard work is done at mount time by the automounter, and once the lucky server is identified, it's mounted and used as if it was the only server named in the map entry. This means that the usual automounted filesystem dependency rules apply: If the server crashes after you've mounted it, your client can get stuck on the non-responsive mount point. In addition to providing better response time for heavily-used volumes, the biggest reliability benefit of replicated volumes is seen when you mount, time out, and unmount the filesystem repeatedly. When you're casually bouncing through volumes, and the automounter is able to unmount the quiescent ones, you reduce your risk of being impaired by a crashed server -- if you go to re-mount the volume after your former server crashed, you'll pick up a different server on the next reference to the filesystem.
Again, once a server has been selected from the list, there's no re-adjustment made by the client. If hundreds of clients all pounce on the same server at once, the server has no way to retroactively say "Those null RPC response times were too optimistic, I'm much busier now with real RPC requests." It's not unusual to see 80 percent or more of all NFS clients using replicated servers binding to the same server, victimizing it for being physically close on the network or temporarily lightly loaded. Fortunately, the Solaris 2.x automounter lets you skew the timings to force a more equitable distribution of NFS clients.
A weighty decision
Consider this scenario: Engineers in your office start arriving at 9 am. Before then, all of the automounted filesystems have been unmounted, the desktops having been left alone overnight. As people begin to reference the popular shared volumes, the automounter sends flurries of null RPCs to the servers listed in replicated server sets. Since there's little sustained load at this point, the fastest server or machine closest to a desktop "wins" each time. An hour later, everyone is bound to the same one or two servers, while the other four or five in your set sit, sadly, squandering the work invested in replicating the data sets.
What's the solution? Break the clients up into groups, and skew the selection process so each group has a preferred server. The automounter provides a server weighting mechanism that kicks in after subnet address matching is performed. Here's a sample automounter map entry with weights assigned:
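In Solaris map syntax a weight is written in parentheses after the host name. A sketch with illustrative weight values (assumptions chosen only to show the notation):

```text
/usr/local    toolbox(2):/usr/local \
              workhorse(1):/export/usr/local \
              distbox:/usr/local/sun4
```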
Weights are used as penalties: the bigger the weight, the less you want to deal with that server. Leaving off the weight sets it to 0, for the highest preference. If all of the servers are on the same network, distbox will be chosen first if it is responsive, then workhorse, and finally toolbox. Network address matching takes precedence over weights, however, so workhorse may be selected if it's the only server on the same network as the client doing the automount.
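The ordering rule can be modeled in a few lines of shell. This is a simplified sketch, not the automounter's actual code. It assumes each candidate is described by a hypothetical three-field record (host name, a 1/0 flag for sharing the client's subnet, and the weight), and it ignores null RPC response times and unresponsive servers.

```shell
#!/bin/sh
# preference_order: read "host same_subnet weight" lines on stdin.
#   same_subnet: 1 if the server shares the client's subnet, else 0
#   weight:      penalty from the map entry (0 when omitted)
# Prints the host names, most preferred first: subnet matches come
# before everything else, and lower weights win within each group.
preference_order() {
    sort -k2,2nr -k3,3n | awk '{ print $1 }'
}

# Example (hypothetical data): all three servers on the client's subnet.
#   printf 'toolbox 1 2\nworkhorse 1 1\ndistbox 1 0\n' | preference_order
```

Feeding it a record set in which only workhorse has the subnet flag set puts workhorse first regardless of its weight, which is the precedence described above.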
There are several algorithms you can apply to divide your clients into groups:
The biggest drawback to server weights is that they're likely to be different for each client, making it hard to use the same automounter map across the board, and close to impossible to put the map under NIS or NIS+ control. To consolidate map management again, use automounter variables for the weights, so a generic map can be used by all clients with the per-client automounter invocation supplying the weight data:
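A sketch of such a generic entry follows. It assumes that the automounter expands variables inside the weight parentheses, and that W1, W2, and W3 stand for the weights of toolbox, workhorse, and distbox in that order; the mapping of names to servers is an assumption.

```text
/usr/local    toolbox($W1):/usr/local \
              workhorse($W2):/export/usr/local \
              distbox($W3):/usr/local/sun4
```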
Change your automounter invocation on the client to define the weight values:
automount -DW1=30 -DW2=10 -DW3=5
You can parameterize this command line as well, running a script to deduce the weights from network numbers or a configuration file that maps client names to group numbers and weights.
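One way to deduce the weights from network numbers is a small lookup function, sketched below. The network numbers, the weight values, and the assumption that W1, W2, and W3 belong to toolbox, workhorse, and distbox respectively are all hypothetical. How the client address is obtained is left to the caller.

```shell
#!/bin/sh
# weights_for_net: print automount -D options for a client IP address.
# Network numbers and weights are hypothetical; the lowest weight marks
# the server that this group of clients should prefer.
weights_for_net() {
    case "$1" in
        192.168.10.*) echo "-DW1=5 -DW2=10 -DW3=30" ;;   # group 1: toolbox
        192.168.20.*) echo "-DW1=30 -DW2=5 -DW3=10" ;;   # group 2: workhorse
        *)            echo "-DW1=30 -DW2=10 -DW3=5" ;;   # default: distbox
    esac
}

# Example use in a client startup script (not run here):
#   automount $(weights_for_net "$CLIENT_IP")
```

Keeping the table in one function means a new subnet needs a one-line change rather than a new map.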
Executable content another way
What do you do if you have the inverse problem, that is, one server with many interfaces, and you don't want to have a plethora of maps containing the "best" interface name for each client? If you put server fred on ten networks, it's going to present itself with names like fred-net1 and fred-net7. Clients that use the name fred may have their NFS requests routed through the network, twisting and turning through a maze of less-than-ideal passages. Assuming you're not solving the problem with DNS (creating multiple IP address records for the same name), you can tackle this one in two ways with the automounter.
First, list all of the server names, as if they formed a replicated server set, and let the automounter find the "best" path using subnet address matching. When every client is on the same subnet as one of the server interfaces, this is the way to go. Note that NFS clients on different subnets will mount from the "default" IP address, that is, the one that gets returned by gethostbyname() on the server. When the subnet address match fails, the automounter uses the null RPC ping data to sort the servers and then uses the IP address in the RPC response packet to identify the server. For a multihomed server, these IP addresses will be the same.
Second, combine the replicated server set listing with server weights to deal with clients on non-local subnets. For example:
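A sketch of such an entry for the multihomed server fred, assuming it exports /usr/local and that clients with no local interface should favor fred-net1; the path and the weight are hypothetical:

```text
/usr/local    fred-net1:/usr/local \
              fred-net7(1):/usr/local
```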
Again, you can substitute variables for the weights to simplify map maintenance and distribution.
Let's add another spin: how do you make this work with CacheFS? A typical automounter master map entry to use CacheFS for an entire indirect map looks like this:
/home auto.home -fstype=cachefs,backfstype=nfs
The Solaris automounter isn't particular about the underlying filesystem type of the mount point -- it defaults to NFS, but it can handle CacheFS as well as HSFS mounts using the -fstype option. The first reference to something in /home completes the CacheFS mount of the front filesystem and the backing NFS filesystem mount. What do you do if you have replicated servers for the back filesystem? CacheFS uses file handles and other host-specific information in the cache, so you'll end up with distinct cache entries for each server you mount from, even if you are accessing the same named files. CacheFS has no way of knowing that two versions of Netscape on different servers are byte-for-byte identical, so it caches them both. Therefore, make sure you enforce some client preferences in the automounter map entries using server weights, so that clients re-bind to the same server whenever possible and maintain their cache "warmth." You'll still rebind to a different server if your first choice is down, and take the hit of repopulating the client-side disk cache, but the cache-miss penalty is far smaller than that of being dead in the water waiting on a crashed NFS server.
Finally, for the truly adventurous, the Solaris automounter supports a new map type called an executable map. The executable map is just a script that is run by the automounter when the map is referenced. The key value is passed to the executable map as an argument, and the automounter expects a valid map entry (key-value pair) in return, or a null return value if the key can't be matched. Make a map executable by setting the execute permission bit and ensuring it is a valid script, starting with #!/bin/sh or something similar. Again, this precludes the use of NIS or NIS+ for the map, since shell scripts don't fit the key-value pair format too well.
Using an executable map, you can merge the weight determination and insertion steps, changing your values dynamically based on network load or NFS usage statistics. Executable maps are tricky, and you don't want to be too elaborate with them because each mount operation is delayed by the time it takes to execute the script. If you look up trend information in your network management database and calculate weights based on the time of day and traffic flow, you might take 4 to 5 seconds to return a map entry. Meanwhile, the user who invoked /usr/local/bin/mosaic is waiting at least that long for the /usr/local mount to occur before mosaic starts its initialization. Simpler is better; faster is better; less configuration and maintenance work is better.
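A deliberately cheap executable map might look like the sketch below. Several things here are assumptions: the key name local (for a hypothetical indirect map), the time-of-day policy, and the weight values. It also assumes, following the Solaris automount(1M) description of executable maps, that the script writes only the entry's contents (not the key) to standard output and prints nothing for an unknown key.

```shell
#!/bin/sh
# Sketch of an executable map; the automounter passes the key as $1.

# pick_weights: hypothetical policy. Argument is the hour (00-23).
# During working hours (09-17) penalize toolbox, otherwise distbox.
pick_weights() {
    case "$1" in
        09|1[0-7]) echo "30 10 5" ;;
        *)         echo "5 10 30" ;;
    esac
}

# map_entry KEY HOUR: print the entry for KEY, or nothing if unknown.
map_entry() {
    key=$1
    hour=$2
    case "$key" in
        local)
            # Split the three weights into $1, $2, $3.
            set -- $(pick_weights "$hour")
            echo "toolbox($1):/usr/local workhorse($2):/export/usr/local distbox($3):/usr/local/sun4"
            ;;
    esac
}

# Only answer when invoked with a key, as the automounter would do.
if [ $# -gt 0 ]; then
    map_entry "$1" "$(date +%H)"
fi
```

The only external command run per lookup is date, which keeps the added mount latency small.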
More songs about building mount tables
We'll close out this month's excursion into the automounter with three basic warnings:
stern    sunrise:/export:&
sue      divi:/export:&
frank    monmouth:/export/home:&
*        wasteland:/export/home:&
With a subdirectory mount structure, the first reference to a filesystem from a server completes an NFS mount, then subsequent references create symbolic links to the common parent directory. The primary problem with subdirectory mounts is that the pathnames used contain the key used for the first mount -- if you reference the filesystems in a different order the next time, you'll get a different set of mount points. It's amazingly confusing and of little value. In the early days of the automounter, NFS mounts consumed precious kernel resources, and reducing the number of mount points made the kernel and users happier. In Solaris, the number of mounts isn't unlimited but it's large enough not to worry about. With NFS over TCP in Solaris 2.5, there's a single TCP connection from the client to each server anyway, so you're not changing much by reducing the number of mount points.
When you run mount from the command line, you are really calling a helper executable, /usr/lib/fs/nfs/mount, documented under mount_nfs. The NFS-specific mount executable understands all of the possible options, parses them, and passes them into the kernel. The automounter doesn't use the helper -- it calls mount(2) directly. Consequently, your version of the automounter may not understand some undocumented or recent additions to the NFS options camp, like the llock option that turns off remote file locking. Generally, these should be reported as bugs to your vendor (the llock problem is currently being fixed by Sun).
To make sure we've thoroughly flogged the NFS horse, next month we'll cover the ins and outs of NFS and CD-ROM drives, some automounter debugging tips, keeping users out of each other's NFS-mounted mail spools, and Sun's proposed WebNFS access paradigm for shuffling files through the Internet.