Sysadmin by Hal Stern

Errno Libretto

The system call return value
is supplemented by the error number,
or errno value.

SunWorld
October  1995

Abstract
How often has a user come to you saying "This tool won't run correctly -- it says 'permission denied' today but didn't yesterday!" How often have you watched "NFS stale file handle" errors scroll by in a console window, worried that something was desperately wrong? We'll look at the various ways in which system calls fail, and the symptoms by which those failures manifest themselves. (3,500 words)


Opera is something I do not appreciate fully. The costumes are exquisite, the music is emotional, but without understanding Italian the plots are hard to follow. Pavarotti could be performing a free-form exploration of the UUCP source code and I would have trouble distinguishing it from Madama Butterfly. Bugs Bunny making a fruit salad on Elmer Fudd's head is the most comprehensible opera I've witnessed. "Wait," my cultured friends tell me, "use the libretto to grasp the story." While it's not quite a set of Cliff Notes, the libretto (text of the opera) helps you build a framework for understanding the action on stage.

What does this have to do with the world of system administration? If the error messages, user questions, system-call errors, and other cryptic failures you encounter sometimes make as much sense as La Traviata, then you need a libretto -- a framework for understanding what the system is trying to tell you. We'll look at the various ways in which system calls fail, and the symptoms by which those failures manifest themselves. Starting with general file permission issues, we'll then dive down into NFS failures, and close with some comments on the importance of vigilance in enforcing system programming guidelines. You may not understand Puccini any better than before, but this kind of libretto is much easier to come by.


Trap defense
System calls represent the boundary between user processes and operating system (kernel) services. When a process executes a system call, the associated wrapper in the libc.so library is called to perform some basic argument checking. If the call is syntactically acceptable, the wrapper executes a privileged instruction to force a trap into the kernel. From there, the operating system takes over by copying arguments, performing extensive checking, and completing the service request. If you dump out the code for a system call in libc.so, you'll see a "ta 8" instruction to issue trap 0x08, which is a system call (see /usr/include/sys/trap.h for trap types):

huey% adb /usr/lib/libc.so.*
_read,4?ia
_read:		st	%o0, [%sp + 0x44]
read+4:		mov	0x3, %g1
read+8:		ta	0x8
read+0xc:	bgeu	read + 0x40
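If you'd rather poke at the trap from the C side, most Unix implementations of this vintage offer a syscall() escape hatch that loads the call number and arguments and issues the trap itself. Here's a minimal sketch, assuming the SYS_getpid constant from <sys/syscall.h> (getpid is a safe call to experiment with):

#include <stdio.h>
#include <sys/syscall.h>

int
main(void)
{
	/* syscall() puts the call number in a register and executes
	   the "ta 8" trap directly, bypassing the libc wrapper */
	int pid = syscall(SYS_getpid);

	printf("pid via raw trap: %d\n", pid);
	return 0;
}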

Nearly every system call returns a single value, ranging from a pointer or an address, such as from shmat() or brk(), to the size of a data transfer from read() and write(), to a standard system type like a UID returned by getuid(). System calls that return integers often use negative return values to flag a failure, but this rule doesn't apply to calls that return addresses, which instead hand back a sentinel such as NULL or (void *) -1 on failure. These simple but inconsistent indicators of success or failure don't give you (and your process) enough information to determine what went wrong and how to repair the situation, so the system call return value is supplemented by the error number, or errno value.

If an exception is encountered while processing the system call, errno is set to one of the values in /usr/include/sys/errno.h. Note that a successful call does not reset errno to zero, so check errno only after a call has indicated failure. Most applications include the errno.h header file, containing the possible values of errno. Insert an extern int errno; declaration in your code, and errno is accessible as an ordinary integer variable.
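The canonical checking pattern, a minimal sketch of the test every call deserves, looks like this; strerror() maps the number to the same text perror() would print:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>

int
main(void)
{
	int fd = open("/no/such/file", O_RDONLY);

	if (fd < 0) {
		/* errno is meaningful only after the call reports failure */
		fprintf(stderr, "open: errno %d (%s)\n",
			errno, strerror(errno));
		return 1;
	}
	return 0;
}

Run against a missing file, it reports errno 2, the same ENOENT you'll meet again in the truss output below.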

In theory, your code should check the value of errno after each system call, including those that should "never" fail like close(), because these system calls can report failures deferred from other requests -- a topic we'll visit later. Of course, not all code does such paranoid checking, and you can't modify commercial applications to make them fit your quality standards ex post facto. So, how do you start tracking down a user issue when all you have is an error message?




Trace amounts
The first thing to do is to become familiar with the various kinds of errors reported back through the errno mechanism. Your best source of information is the introduction to section 2 of the manual pages:

huey% man -s 2 intro

It explains the possible error values and associates them with the cryptic error messages like "address already in use" printed by the perror() library routine. The descriptions aren't exhaustive and some of the errors are entirely non-obvious. Once you have a feel for the target, examine the routine in question (in this case, probproc) using trace or truss:

huey% truss -o /tmp/tr.out probproc -a -X -i90

truss dumps its output into the file named by the -o option. trace, the SunOS 4.1.x equivalent, doesn't follow forks or trace child processes, but truss will chase down a thread of execution until it has exited. Every system call is shown in the truss output, along with the arguments passed (or at least the first few bytes of them), the return value, and the value of errno, if it was set. Here's an edited truss spiel from an attempt to list a non-existent file:

execve("/usr/bin/ls", 0xEFFFFAE0, 0xEFFFFAEC)  argc = 2
open("/usr/lib/libintl.so.1", O_RDONLY, 035737754720) = 3
ioctl(1, TIOCGWINSZ, 0x00024C84)		Err#22 EINVAL
lstat("xyz", 0xEFFFF9A8)			Err#2 ENOENT
_exit(2)

Note that the process opens up the internationalization library, libintl.so.1, a good hint that it was linked with -lintl. ls attempts to get the current window size using the TIOCGWINSZ ioctl(), but gets an "invalid argument" because the example was generated on a dial-in line, not a shelltool or xterm. Searching for the file information on "xyz" returns a "file not found" error, which is printed by ls on its way to a non-zero exit.

Understanding errno isn't purely a serious business. One of the more popular contests at USENIX conferences has been creating new errno names.

Link dink post-shrink
One of the most frustrating exercises performed by system administrators is explaining (calmly) to users why applications that behave routinely on machine A suddenly fail or exhibit strange side effects on machine B. For well-known tools like the C shell, you can wade through .cshrc scripts and find minor environmental differences. But how do you deal with shrink-wrapped code? Use truss to identify the configuration and initialization files opened by the application. On the good machine, grep out the list of files opened and then match it against the same list on the problem machine:

huey% fgrep  'open(' truss.out1 > /tmp/out1
huey% fgrep  'open(' truss.out2 > /tmp/out2
huey% diff /tmp/out1 /tmp/out2

Look for the string "Err#2 ENOENT" signaling a missing file. Double check automounter maps, environment variables, and installation processes that modify files local to each machine, in /etc or /usr/lib, for example. Some applications search for configuration files in several directories, and may find identical files on the two hosts but process them in a different order. Again, checking the sequence of the open() calls and the ENOENT results will tell you if you have a configuration problem.




Also look for EACCES errors, caused by insufficient file or directory permissions. If the file exists but can't be read by the user, ensure that user and group IDs are consistent between the machines in question. Group-readable files aren't effective unless you enforce group membership on all machines at which users may camp.

Here's a nastier version of the same problem: a user is panicking to set up a demo environment. Rather than create new users and their environments, he runs the demo as root, only to have it fail miserably. Even root gets slapped with EACCES violations if the files being accessed are NFS mounted. Over the network, root becomes the anonymous user nobody, and relies on world read and execute permissions to open files and search directories. Any application that works for non-privileged users but fails for root is probably opening configuration or data files over NFS. If you suspect that NFS access is contributing to your problem, locate the filesystem for the file in question using df:

huey% df `dirname /home/stern/gfx/config.common`
Filesystem                 kbytes    used   avail capacity  Mounted on
bigboy:/export/home/stern 1952573  944377  812946    54%   /home/stern

Watch where you drop direct maps for the automounter, and where you use hierarchical maps that may deposit NFS mounts in the middle of someone's home directory. Applications that rely on making backup copies or renaming input data sets using hard links will fail if an NFS mount is introduced into the middle. For example, assume you are mounting home directories using the following hierarchical automounter map:

* \
	/	homeserv:/export/home/&
	/fxdata	dataserv:/export/datasets/fxdata

When /home/stern is mounted, /home/stern/fxdata is picked up from the machine dataserv. So far, so good. But an application may assume that it can create a hard link between files in /home/stern/fxdata and /home/stern/backup, since they appear to be on the same filesystem. The link() system call fails, however, with EXDEV because the hard link would cross volume boundaries.
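A defensive application tests for exactly this failure rather than assuming the link succeeded. Here's a minimal sketch, with hypothetical pathnames, that recognizes the cross-device case:

#include <stdio.h>
#include <errno.h>
#include <unistd.h>

int
main(void)
{
	/* hypothetical pathnames, for illustration only */
	const char *from = "/home/stern/fxdata/run1";
	const char *to = "/home/stern/backup/run1";

	if (link(from, to) < 0) {
		if (errno == EXDEV) {
			/* source and target sit on different mounted
			   filesystems; a robust tool copies instead */
			fprintf(stderr, "%s -> %s crosses a mount point\n",
				from, to);
			return 1;
		}
		perror("link");
		return 1;
	}
	return 0;
}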

Trail of stale crumbs
NFS errors tend to be hard to resolve because you're assigning blame across more than one operating system and host environment. The paragraphs that follow cover some of the common pitfalls.




You have to follow the trail of network crumbs from the client back to the server to resolve server-specific errors. Your first step: get a general feeling for what went wrong using the NFS error number in the console message. NFS uses the standard errno values, so "NFS write error 28" is the same as ENOSPC, namely, the disk is full or the user exceeded his or her disk quota while writing a file. Many NFS errors have obvious explanations: bumping against quotas, filling up a disk, or a disk failure that results in a general I/O error.
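Because those numbers are plain errno values, a few lines of C will translate a console message for you; this sketch (the nfserr name is made up for the example) leans on the standard strerror(3) routine:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* nfserr - print the text behind an "NFS write error N" message */
int
main(int argc, char **argv)
{
	int e = (argc > 1) ? atoi(argv[1]) : 28;	/* default: ENOSPC */

	printf("error %d: %s\n", e, strerror(e));
	return 0;
}

huey% nfserr 28
error 28: No space left on device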

The more difficult error to chase is the stale file handle. NFS file handles encode the server's filesystem ID and the file's inode number to uniquely identify each NFS-mounted file. Each inode also carries an inode generation number used to differentiate files that have re-used an inode. Delete a file, for example, /home/stern/summary, and then create a new file, say /home/stern/report, on the same filesystem. The new file re-uses the inode number of the previously deleted file (assuming no other file creation activity snuck in), but the filesystem increments the inode generation number to distinguish the new file from the old, removed one.

What if a process on an NFS client machine had /home/stern/summary open when it was deleted on the server or by some other client? NFS has no record of open() activity, so it cannot notify the client that one of its open files has been removed. The next time the client sends a request with the file handle for the "summary" file, the NFS server recognizes that the handle contains an inode generation number that no longer matches the current generation, and it returns a stale file handle error. You'll also end up with stale file handles when the inode is no longer valid, for example, if a file is removed but the newly freed inode has not been re-used.
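Putting those pieces together, a file handle and the server's staleness test look roughly like this simplified sketch; real handles are opaque blobs whose exact layout is implementation-specific:

struct fh_sketch {		/* NOT the actual Sun layout */
	long	fsid[2];	/* server filesystem identifier */
	long	inum;		/* inode number of the file */
	long	igen;		/* inode generation number */
};

/* the server's check, in spirit: a handle is stale when its
   generation number no longer matches the inode's current one */
int
fh_is_stale(struct fh_sketch *fh, long current_igen)
{
	return (fh->igen != current_igen);
}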

If you want to watch a network crumble, try restoring an NFS-exported filesystem onto a pristine filesystem without rebooting NFS clients using it. When the new filesystem is created, newfs runs a utility called fsirand to randomly seed the inode generation numbers. During the restore process, files are attached to the first available inode, not necessarily the same inode number they had in the old filesystem. Every client that has an open file handle on the restored filesystem will see stale file handles, since either the inode number or generation number will be mismatched. Clients will hammer away at the network, retrying NFS requests that fail, unable to determine how to fix the stale file handle problems. Your only recourse is to reboot the net-world and let the clients acquire new handles.

How do you associate an NFS error with a client process? First, identify the file in question on the server. In SunOS 4.1.x, the showfh utility takes a file handle and resolves it to a file on the NFS server. However, the RPC daemon used by showfh (rpc.showfhd) isn't started by default, and it frequently times out due to the long search time required to find the inode in question. An easier approach is to use a server-side script called fhfind, written by Sun's Brent Callaghan (creator of the automounter), that takes a file handle and locates the file associated with it. For example, let's say that you're seeing:

NFS write error 28 on server bigboy 1540002 2 a0000 4f77 48df4455 a0000 2 25d1121d

Error 28 is ENOSPC, so you're out of disk space. Running df on the server verifies that problem. Your job: Get the writing client to ease up so you can clean up. On server bigboy, run fhfind to identify the file represented by the file handle:

bigboy# fhfind 1540002 2 a0000 4f77 48df4455 a0000 2 25d1121d
/export/home/stern/summary

Running fhfind can take quite a while, particularly for large filesystems, because it does a find on every file to locate the inode number. On the client reporting the error, use the fuser utility to find the process holding this file open:

huey# fuser /home/stern/summary
/home/stern/summary:  10543o  

We can get more detail via the lsof tool:

huey# lsof /home/stern/summary
COMMAND     PID     USER   FD   TYPE     DEVICE   SIZE/OFF  INODE/NAME
reptool   10543    stern    3w  VREG 0x022000a9        158  68376 /home/stern (bigboy:/export/home/stern)

lsof shows us the file descriptor number used to hold the file open, as well as some information normally included with ps. Look for open files of type VREG in lsof's output; these are regular files. Entries marked with a type of VDIR are current directories, and are probably not the source of your problem.

There is a drawback to this approach: stale file handles can't be found using the fhfind script. Inodes associated with stale file handles either aren't valid, and therefore can't be found by searching the filesystem, or have been re-used by a new file, possibly with a different name. In this case the best tactic is to narrow down the process candidates using lsof to find those with NFS files open:

huey# lsof -N | fgrep VREG

Look for file descriptors (in the FD column) with a w in them, indicating the file has been opened for writing. You don't really need the filename for the stale file handle; it may not even exist at this point. Just take the inode numbers reported by lsof and match them against the inode numbers pulled from the stale file handle error messages on the console. Use this script to convert a file handle into a server inode number:

#! /bin/sh
# fh2inode - convert an NFS file handle to a server inode number
# usage: fh2inode <the eight words of the file handle>
# The fourth word of the handle is the inode number, in hex;
# bc wants uppercase hex digits, hence the tr.
fh=`echo $4 | tr '[a-z]' '[A-Z]'`
echo "ibase=16;$fh" | bc

If the server exports more than one filesystem, you'll need to find the volume associated with the stale handle. The first value in the file handle is a filesystem ID; match it to the mounted filesystem ID values in /etc/mnttab to locate the volume on which you're experiencing an error.

As soon as you've found the process writing to a stale file handle, clean up gently: notify the user first, then kill or restart the process.

close() to the edge
Detecting errors while writing to a file is complex for both NFS and local filesystems. Unix does asynchronous writes; that is, writes are stacked up by the operating system and flushed out periodically. On local disks, the update daemon runs every 30 seconds to force pending writes to disk. With NFS, the kernel threads (Solaris 2.x) or biod processes (SunOS 4.1.x) queue writes locally. What happens if an error occurs during the completion of one of those write() system calls? In short, the error is reported back on the next write() system call or on the call to close(). Over NFS, you're guaranteed to see any errors by the time close() returns, because all pending writes are flushed (converted to synchronous writes) when the file is closed.

How often does your code check the return value from close()? Again, this is an issue for local disks and NFS filesystems, although you're more likely to see problems with NFS since most error checking is done by the server, after the request has been buffered and subsequently flushed by the client. If you fill a disk or exceed a quota, you run the risk of having an NFS write fail undetected unless you check return and errno values from write() and close().
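Here's a minimal sketch of a careful writer; note that it saves errno across the cleanup close() so the caller sees the original failure:

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* save_file - write a buffer, catching errors deferred until close() */
int
save_file(const char *path, const char *buf, int len)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != len) {
		int saved = errno;	/* close() may overwrite errno */

		(void) close(fd);
		errno = saved;
		return -1;
	}
	if (close(fd) < 0)	/* an NFS ENOSPC can surface right here */
		return -1;
	return 0;
}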

The moral of the story is religious enforcement of standards for system programming, paying particular attention to error checking. If you want to become the patron saint of errno, the guiding principle is simple: check the return value of every system call, and check errno whenever that value signals failure.

Rigorous adherence to good system programming practices prevents odd failures due to unexpected input or output conditions. Nobody plans on having their code handle disk overflows, but such deficiencies become clear at the worst possible moment, when the system -- and you -- are under maximum stress. Unresolved error conditions are the ones that cause loss of data, jobs, or your sanity. Keep your users abreast of your system style guide, and you might just have time to appreciate those great operatic moments.

