Errno Libretto
The system call return value
How often has a user come to you saying "This tool won't run correctly -- it says 'permission denied' today but didn't yesterday!" How often have you watched "NFS stale file handle" errors scroll by in a console window, worried that something was desperately wrong? We'll look at the various ways in which system calls fail, and the symptoms by which those failures manifest themselves. (3,500 words)
Opera is something I do not appreciate fully. The costumes are exquisite, the music is emotional, but without understanding Italian the plots are hard to follow. Pavarotti could be performing a free-form exploration of the UUCP source code and I would have trouble distinguishing it from Madama Butterfly. Bugs Bunny making a fruit salad on Elmer Fudd's head is the most comprehensible opera I've witnessed. "Wait," my cultured friends tell me, "use the libretto to grasp the story." While it's not quite a set of Cliff Notes, the libretto (text of the opera) helps you build a framework for understanding the action on stage.
What does this have to do with the world of system administration? If the error messages, user questions, system-call errors, and other cryptic failures you encounter sometimes make as much sense as La Traviata, then you need a libretto -- a framework for understanding what the system is trying to tell you. We'll look at the various ways in which system calls fail, and the symptoms by which those failures manifest themselves. Starting with general file permission issues, we'll then dive down into NFS failures, and close with some comments on the importance of vigilance in enforcing system programming guidelines. You may not understand Puccini any better than before, but such help is easier to find.
Trap defense
System calls represent the boundary between user processes and
operating system (kernel) services. When a process executes a system
call, the associated wrapper in the libc.so library is called
to perform some basic argument checking. If the call is syntactically
acceptable, the wrapper executes a privileged instruction to force a
trap into the kernel. From there, the operating system takes over by
copying arguments, performing extensive checking, and completing the
service request. If you dump out the code for a system call in
libc.so, you'll see a "ta 8" instruction to issue trap 0x08,
which is a system call (see /usr/include/sys/trap.h for trap
types):
huey% adb /usr/lib/libc.so.*
_read,4?ia
_read:          st      %o0, [%sp + 0x44]
read+4:         mov     0x3, %g1
read+8:         ta      0x8
read+0xc:       bgeu    read + 0x40
Nearly every system call returns a single value, ranging from a pointer or an address, such as from shmat() or brk(), to the size of a data transfer from read() and write(), to a standard system type like a UID returned by getuid(). System calls that return integers often use negative return values to flag a failure, but this rule doesn't apply to calls that return addresses, which signal failure with a sentinel such as a null or -1 pointer. Simple, inconsistent indicators of success or failure don't give you (and your process) enough information to determine what went wrong and how to repair the situation, so the system call return value is supplemented by the error number, or errno value.
If an exception is encountered while processing the system call, errno is set to one of the values in /usr/include/sys/errno.h. A successful call leaves errno untouched, so its value is only meaningful when the call you just made has reported a failure. Most applications include the errno.h header file, containing the possible values of errno. Insert an extern int errno; declaration in your code, and errno is accessible as an integer variable.
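Here's a minimal sketch of reading errno after a failed call; the path is made up, and on most systems including errno.h is enough (it may define errno as a macro rather than requiring your own extern declaration):

#include <stdio.h>
#include <string.h>     /* strerror() */
#include <errno.h>      /* errno and the E* constants */
#include <fcntl.h>      /* open() */
#include <unistd.h>     /* close() */

int
main(void)
{
        int fd = open("/no/such/file", O_RDONLY);   /* illustrative path */

        if (fd < 0) {
                /* errno is meaningful only after a call reports failure */
                fprintf(stderr, "open: errno %d (%s)\n", errno, strerror(errno));
                return 1;
        }
        close(fd);
        return 0;
}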
In theory, your code should check the value of errno after each system call, including those that should "never" fail, like close(), because these system calls can report failures deferred from other requests -- a topic we'll visit later. Of course, not all code does such paranoid checking, and you can't modify commercial applications to make them fit your quality standards ex post facto. So, how do you start tracking down a user issue when all you have is an error message?
Trace amounts
The first thing to do is to become familiar with the various kinds of
errors reported back through the errno mechanism. Your best source of
information is the introduction to section 2 of the manual pages:
huey% man -s 2 intro
It explains the possible error values and associates them with the cryptic error messages like "address already in use" printed by the perror() library routine. The descriptions aren't exhaustive and some of the errors are entirely non-obvious. Once you have a feel for the target, examine the routine in question (in this case, probproc) using trace or truss:
huey% truss -o /tmp/tr.out probproc -a -X -i90
truss dumps its output into the file named by the -o option. trace, the SunOS 4.1.x equivalent, doesn't follow forks or trace child processes, but truss will chase down a thread of execution until it has exited. Every system call is shown in the truss output, along with the arguments passed (or at least the first few bytes of them), the return value, and the value of errno, if it was set. Here's an edited truss spiel from an attempt to list a non-existent file:
execve("/usr/bin/ls", 0xEFFFFAE0, 0xEFFFFAEC)  argc = 2
open("/usr/lib/libintl.so.1", O_RDONLY, 035737754720) = 3
ioctl(1, TIOCGWINSZ, 0x00024C84)               Err#22 EINVAL
lstat("xyz", 0xEFFFF9A8)                       Err#2 ENOENT
_exit(2)
Note that the process opens up the internationalization library, libintl.so.1, a good hint that it was linked with -lintl. ls attempts to get the current window size using the TIOCGWINSZ ioctl(), but gets an "invalid argument" because the example was generated on a dial-in line, not a shelltool or xterm. Searching for the file information on "xyz" returns a "file not found" error, which is printed by ls on its way to a non-zero exit.
Understanding errno isn't purely a serious business. One of the more popular contests at USENIX conferences has been creating new errno names.
Link dink post-shrink
One of the most frustrating exercises performed by system administrators is explaining (calmly) to users why applications that behave routinely on machine A suddenly fail or exhibit strange side effects on machine B. For well-known tools like the C shell, you can wade through .cshrc scripts and find minor environmental differences. But how do you deal with shrink-wrapped code? Use truss to identify the configuration and initialization files opened by the application. On the good machine, grep out the list of files opened and then match it against the same list on the problem machine:
huey% fgrep 'open(' truss.out1 > /tmp/out1
huey% fgrep 'open(' truss.out2 > /tmp/out2
huey% diff /tmp/out1 /tmp/out2
Look for the string "Err#2 ENOENT" signaling a missing file. Double check automounter maps, environment variables, and installation processes that modify files local to each machine, in /etc or /usr/lib, for example. Some applications search for configuration files in several directories, and may find identical files on the two hosts but process them in a different order. Again, checking the sequence of the open() calls and the ENOENT results will tell you if you have a configuration problem.
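To see why a run of ENOENT results can be perfectly normal, here's a hedged sketch of the search-path pattern such applications use -- the file names and directories are invented for illustration:

#include <stdio.h>
#include <errno.h>
#include <fcntl.h>

/* Hypothetical search order -- each application defines its own. */
static const char *conf_paths[] = {
        "./app.conf",
        "/etc/app.conf",
        "/usr/lib/app/app.conf",
        NULL
};

/* Return a descriptor for the first configuration file found, or -1.
 * Every miss shows up in truss as an open() returning Err#2 ENOENT. */
int
open_config(void)
{
        int i, fd;

        for (i = 0; conf_paths[i] != NULL; i++) {
                fd = open(conf_paths[i], O_RDONLY);
                if (fd >= 0)
                        return fd;              /* first match wins */
                if (errno != ENOENT)
                        perror(conf_paths[i]);  /* EACCES and friends */
        }
        return -1;
}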
Also look for EACCES errors, caused by insufficient file or directory permissions. If the file exists but can't be read by the user, ensure that user and group IDs are consistent between the machines in question. Group-readable files aren't effective unless you enforce group membership on all machines at which users may camp.
Here's a nastier version of the same problem: a user is panicking to set up a demo environment. Rather than create new users and their environments, he runs the demo as root, only to have it fail miserably. Even root gets slapped with EACCES violations if the files being accessed are NFS mounted. Over the network, root becomes the anonymous user nobody, and relies on world read and execute permissions to open files and search directories. Any application that works for non-privileged users but fails for root is probably opening configuration or data files over NFS. If you suspect that NFS access is contributing to your problem, locate the filesystem for the file in question using df:
huey% df `dirname /usr/lib/gfx/config.common`
Filesystem                 kbytes    used   avail capacity  Mounted on
bigboy:/export/home/stern 1952573  944377  812946    54%    /home/stern
Watch where you drop direct maps for the automounter, and where you use hierarchical maps that may deposit NFS mounts in the middle of someone's home directory. Applications that rely on making backup copies or renaming input data sets using hard links will fail if an NFS mount is introduced into the middle. For example, assume you are mounting home directories using the following hierarchical automounter map:
*       \
        /        homeserv:/export/home/&        \
        /fxdata  dataserv:/export/datasets/fxdata
When /home/stern is mounted, /home/stern/fxdata is picked up from the machine dataserv. So far, so good. But an application may assume that it can create a hard link between files in /home/stern/fxdata and /home/stern/backup, since they appear to be on the same filesystem. The link() system call fails, however, with EXDEV because the hard link would cross volume boundaries.
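Here's a hedged sketch of how an application could catch that case and fall back gracefully; the function and paths are illustrative rather than from any particular product:

#include <stdio.h>
#include <errno.h>
#include <unistd.h>

/* "Back up" a data set with a hard link, as some applications do. */
int
snapshot(const char *src, const char *dst)
{
        if (link(src, dst) == 0)
                return 0;

        if (errno == EXDEV) {
                /* src and dst are on different volumes -- for example, an
                 * automounted NFS filesystem dropped under the home
                 * directory -- so a hard link can never succeed. */
                fprintf(stderr, "%s -> %s crosses a mount point; copy instead\n",
                    src, dst);
                return -1;      /* caller should fall back to a byte copy */
        }
        perror("link");
        return -1;
}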
Trail of stale crumbs
NFS errors tend to be hard to resolve because you're assigning blame
in more than one operating system and host environment. Here are some of
the common pitfalls:
You have to follow the trail of network crumbs from the client back to the server to resolve server-specific errors. Your first step: get a general feeling for what went wrong using the NFS error number in the console message. NFS uses the standard errno values, so "NFS write error 28" is the same as ENOSPC, namely, the disk is full or the user exceeded his or her disk quota while writing a file. Many NFS errors have obvious explanations: bumping against quotas, filling up a disk, or a disk failure that results in a general I/O error. The more difficult one to chase is a stale file handle.
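Because the number in the console message is an ordinary errno value, a few lines of C will translate it back into text; a quick sketch (message wording varies slightly by OS release):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Usage: nfserr 28   ->  "No space left on device" (ENOSPC) */
int
main(int argc, char **argv)
{
        if (argc != 2) {
                fprintf(stderr, "usage: %s errno-value\n", argv[0]);
                return 1;
        }
        printf("%s\n", strerror(atoi(argv[1])));
        return 0;
}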
NFS file handles encode the server's filesystem ID and the file's inode number to uniquely identify each NFS-mounted file. Each inode also contains an inode generation number used to differentiate files that have re-used an inode. Delete a file, for example, /home/stern/summary, and then create a new file, say /home/stern/report on the same filesystem. The new file re-uses the same inode number as the previously deleted file (assuming no other file creation activity snuck in) but increments the inode generation number to distinguish it from the old, removed file.
What if a process on an NFS client machine had
/home/stern/summary open when it was deleted on the server or
by some other client? NFS has no record of open()
activity, so it cannot notify the client that one of its open files has
been removed. The next time the client sends a request with the file
handle for the "summary" file, the NFS server recognizes that the
handle contains an inode generation number that no longer matches the
current generation, and it returns a stale file handle error. You'll
also end up with stale file handles when the inode is no longer valid,
for example, if a file is removed but the newly freed inode has not
been re-used.
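To make the generation-number logic concrete, here is a purely conceptual sketch -- the structures and field names are invented for illustration and are not the actual NFS or kernel definitions:

#include <stddef.h>
#include <errno.h>

/* Conceptual file handle contents -- illustrative only. */
struct fh_sketch {
        long fsid;      /* exported filesystem ID          */
        long inum;      /* inode number on that filesystem */
        long igen;      /* inode generation number         */
};

/* Conceptual server-side inode state. */
struct inode_sketch {
        long igen;      /* bumped each time the inode is re-used */
        int  in_use;    /* zero once the inode has been freed    */
};

/* A server declares a handle stale when the inode is gone or has been
 * re-used for a newer file since the client obtained the handle. */
int
check_handle(const struct fh_sketch *fh, const struct inode_sketch *ip)
{
        if (ip == NULL || !ip->in_use || ip->igen != fh->igen)
                return ESTALE;  /* the client sees "stale file handle" */
        return 0;
}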
If you want to watch a network crumble, try restoring an NFS-exported
filesystem onto a pristine filesystem without rebooting NFS clients
using it. When the new filesystem is created, newfs
runs
a utility called fsirand
to randomly seed the inode
generation numbers. During the restore process, files are attached to
the first available inode, not necessarily the same inode number they
had in the old filesystem. Every client that has an open file handle
on the restored filesystem will see stale file handles, since either
the inode number or generation number will be mismatched. Clients will
hammer away at the network, retrying NFS requests that fail, unable to
determine how to fix the stale file handle problems. Your only
recourse is to reboot the net-world and let the clients acquire new
handles.
How do you associate an NFS error with a client process? First, identify the file in question on the server. In SunOS 4.1.x, the showfh utility takes a file handle and resolves it to a file on the NFS server. However, the RPC daemon used by showfh (rpc.showfhd) isn't started by default, and it frequently times out due to the long search time required to find the inode in question. An easier approach is to use a server-side script called fhfind, written by Sun's Brent Callaghan (creator of the automounter), that takes a file handle and locates the file associated with it. For example, let's say that you're seeing:
NFS write error 28 on server bigboy
1540002 2 a0000 4f77 48df4455 a0000 2 25d1121d
Error 28 is ENOSPC, so you're out of disk space. Running df on the server verifies that problem. Your job: Get the writing client to ease up so you can clean up. On server bigboy, run fhfind to identify the file represented by the file handle:
bigboy# fhfind 1540002 2 a0000 4f77 48df4455 a0000 2 25d1121d
/export/home/stern/summary
Running fhfind can take quite a while, particularly for large filesystems, because it does a find on every file to locate the inode number. On the client reporting the error, use the fuser utility to find the process holding this file open:
huey# fuser /home/stern/summary
/home/stern/summary:    10543o
We can get more detail via the lsof tool:
huey# lsof /home/stern/summary
COMMAND     PID  USER   FD  TYPE      DEVICE  SIZE/OFF  INODE/NAME
reptool   12582 stern    3r VREG  0x022000a9       158  68376 /home/stern (sugar:/export/home/stern)
lsof shows us the file descriptor number used to hold the file open, as well as some information normally included with ps. Look for open files of type VREG in lsof's output, noting that these are regular open files. Entries marked with a type of VDIR are current directories, and are probably not the source of your problem.
There is a drawback to this approach: stale file handles can't be found using the fhfind script. Inodes associated with stale file handles either aren't valid, and therefore can't be found by searching the filesystem, or have been re-used by a new file, possibly with a different name. In this case the best tactic is to narrow down the process candidates using lsof to find those with NFS files open:
huey# lsof -N | fgrep VREG
Look for file descriptors (in the FD column) with a w in them, indicating the file has been opened for writing. You don't really need the filename for the stale file handle; it may not even exist at this point. Just take the inode numbers reported by lsof and match them against the inode numbers pulled from the stale file handle error messages on the console. Use this script to convert a file handle into a server inode number:
#! /bin/sh
# fh2inode - convert NFS file handle to inode
fh=`echo $4 | tr [a-z] [A-Z]`
echo "ibase=16;$fh" | bc
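Fed the handle from the earlier console message, for example, the script uppercases its fourth field (4f77) and hands it to bc, yielding the decimal inode number to match against lsof's INODE column:

huey% fh2inode 1540002 2 a0000 4f77 48df4455 a0000 2 25d1121d
20343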
If the server exports more than one filesystem, you'll need to find the volume associated with the stale handle. The first value in the file handle is a filesystem ID; match it to the mounted filesystem ID values in /etc/mnttab to locate the volume on which you're experiencing an error.
As soon as you've found the process writing to a stale file handle, clean up gently by polling the user, then killing or restarting the process.
close() to the edge
Detecting errors while writing to a file is complex for both NFS and local filesystems. Unix does asynchronous writes, that is, the writes are stacked up by the operating system and flushed out periodically. On local disks, the update daemon runs every 30 seconds to force pending writes to disk. With NFS, the kernel threads (Solaris 2.x) or biod processes (SunOS 4.1.x) queue writes locally. What happens if an error occurs during the completion of one of the write() system calls? In short, the error is reported back on the next write() system call or on the call to close(). You're guaranteed to see any errors by the time close() returns, because all pending writes are flushed (converted to synchronous writes) when the file is closed.
How often does your code check the return value from close()? Again, this is an issue for local disks and NFS filesystems, although you're more likely to see problems with NFS since most error checking is done by the server, after the request has been buffered and subsequently flushed by the client. If you fill a disk or exceed a quota, you run the risk of having an NFS write fail undetected unless you check return and errno values from write() and close().
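Here's a minimal sketch of that defensive pattern -- check the byte count from write() and the return from close(), so a deferred ENOSPC or EDQUOT can't slip past; the function name is mine, not from any library:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>

/* Write a buffer and refuse to ignore deferred errors. */
int
careful_write(int fd, const char *buf, size_t len)
{
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
                fprintf(stderr, "write: %s\n", strerror(errno));
                return -1;
        }
        if ((size_t)n != len) {
                fprintf(stderr, "write: short write (%ld of %lu bytes)\n",
                    (long)n, (unsigned long)len);
                return -1;
        }

        /* The flush at close() is where an NFS server's ENOSPC or EDQUOT
         * finally comes back to the client -- check it. */
        if (close(fd) < 0) {
                fprintf(stderr, "close: %s\n", strerror(errno));
                return -1;
        }
        return 0;
}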
The moral of the story is religious enforcement of standards for system programming, paying particular attention to error checking. If you want to become the patron saint of errno, here are some guiding principles:
- Check the return value of every system call, and examine errno as soon as a failure is reported, before another call can overwrite it. If you're not sure when a value was set, use truss to make sure that the value you're testing was produced by the system call immediately preceding the test.
- Link with -lintl. The perror() library routine looks up error messages in an internationalized library, accessed if libintl.so is linked in. If your system or application sets its locale, you can see Unix error messages in French, German, Italian, or other languages, as sketched below. (And you thought the opera reference was a non sequitur.)
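As a small illustration of that last point, here's a hedged sketch: with the locale taken from the environment (and, on the systems described here, libintl.so linked in), perror() prints whatever translated message catalog is installed. The path is made up for the example:

#include <stdio.h>
#include <locale.h>
#include <fcntl.h>

int
main(void)
{
        setlocale(LC_ALL, "");          /* honor LANG / LC_MESSAGES */

        if (open("/no/such/file", O_RDONLY) < 0)
                perror("open");         /* message appears in the locale's
                                           language if a catalog exists */
        return 0;
}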
Rigorous adherence to good system programming practices prevents odd failures due to unexpected input or output conditions. Nobody plans for their code to handle disk overflows, but these deficiencies become clear at the worst possible moment, when the system -- and you -- are under maximum stress. Unresolved error conditions are the ones that cause loss of data, jobs, or your sanity. Keep your users abreast of your system style guide, and you might just have time to appreciate those great operatic moments.
Resources
fhfind: http:/sunworldonline/ftp/sysadmin/fhfind
fuser: /sunworldonline/swol-09-1995/swol-09-sysadmin.html#fuser
lsof: /sunworldonline/swol-09-1995/swol-09-sysadmin.html#lsof
Madama Butterfly libretto: http://plaza.interport.net/nycopera/education/butterfl.html