The power of /proc
Using proc tools to solve your system problems
Many sysadmins don't realize the wide variety of tools native to Solaris, or the functionality they provide. When they experience problems with applications or trouble at the system level, they can be left at a loss for how to debug and resolve the issues. This month, Peter takes a look at the proc tools and how they can help during difficult times. (3,000 words)
One of the engineers from Bell Labs joined the Sun development team and created the /proc filesystem and the /usr/proc/bin tools. Before this interface was invented, all programs that used kernel state (for instance, ps and top) had to be recompiled with each OS release: they read kernel memory directly, and the memory locations of key variables changed with each release. With the advent of /proc, applications (including systems programs) have a uniform and stable interface into the kernel. The proc tools were actually written to test the /proc filesystem interface, but so many folks at Sun were using them that they were included with Solaris.
Why, as a systems administrator, do you care? Well, the tools that manipulate the proc interface can be very useful in determining system state and debugging application and systems problems. First, let's take a look at /proc and the tools, then we will look at some real-life uses for the tools.
Perhaps you have noticed an odd entry in the output of df -k:
$ df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/proc                      0       0       0     0%    /proc
/proc is mounted by the /etc/rcS.d/S40standardmounts.sh startup script. The mount makes the /proc interface into the kernel available to the system and its applications.
Once mounted, /proc echoes the state of all processes on the system. Consider an abbreviated listing of /proc on Solaris 2.6 (Solaris 7 works the same way):
$ ls -l /proc
total 168
dr-x--x--x   5 root     root         736 Jan 15 17:00 0
dr-x--x--x   5 root     root         736 Jan 15 17:00 1
dr-x--x--x   5 root     root         736 Feb 25 11:13 10258
dr-x--x--x   5 root     root         736 Jan 15 17:00 11
dr-x--x--x   5 jds      staff        736 Mar 17 08:03 11892
dr-x--x--x   5 akane    staff        736 Mar 17 08:32 12032
dr-x--x--x   5 cbertold staff        736 Mar 17 08:44 12098
dr-x--x--x   5 jkelly   staff        736 Mar 17 08:56 12186
dr-x--x--x   5 root     root         736 Mar  9 09:08 12522
dr-x--x--x   5 jds      staff        736 Mar  9 09:08 12524
dr-x--x--x   5 root     root         736 Mar  9 09:10 12540
dr-x--x--x   5 jds      staff        736 Mar  9 09:10 12542
dr-x--x--x   5 spd      staff        736 Mar 17 10:01 12547
dr-x--x--x   5 cbertold staff        736 Mar 17 10:03 12555
dr-x--x--x   5 root     root         736 Mar 17 10:09 12597
dr-x--x--x   5 pbg      staff        736 Mar 17 10:09 12599
dr-x--x--x   5 jds      staff        736 Mar 17 10:19 12660
dr-x--x--x   5 pbg      staff        736 Mar 17 10:25 12670
dr-x--x--x   5 root     root         736 Jan 15 17:00 2
dr-x--x--x   5 root     root         736 Jan 15 17:01 239
dr-x--x--x   5 root     root         736 Jan 15 17:01 241
dr-x--x--x   5 root     root         736 Feb  9 13:50 24515
dr-x--x--x   5 root     root         736 Jan 15 17:00 3
dr-x--x--x   5 root     root         736 Jan 15 17:01 307
Each numerical directory entry in /proc represents the process with a matching process-ID. The owner of the directory entry is the UID of the process, and the group of the directory is likewise the GID of the process. In this way, only the process owner (and root) have primary access to the process information. Looking within a directory, we see:
$ ls -l /proc/12599
total 3543
-rw-------   1 pbg      staff    1794048 Mar 18 20:48 as
-r--------   1 pbg      staff        152 Mar 18 20:48 auxv
-r--------   1 pbg      staff         32 Mar 18 20:48 cred
--w-------   1 pbg      staff          0 Mar 18 20:48 ctl
lr-x------   1 pbg      staff          0 Mar 18 20:48 cwd ->
dr-x------   2 pbg      staff       1056 Mar 18 20:48 fd
-r--r--r--   1 pbg      staff        120 Mar 18 20:48 lpsinfo
-r--------   1 pbg      staff        912 Mar 18 20:48 lstatus
-r--r--r--   1 pbg      staff        536 Mar 18 20:48 lusage
dr-xr-xr-x   3 pbg      staff         48 Mar 18 20:48 lwp
-r--------   1 pbg      staff       1728 Mar 18 20:48 map
dr-x------   2 pbg      staff        544 Mar 18 20:48 object
-r--------   1 pbg      staff       2048 Mar 18 20:48 pagedata
-r--r--r--   1 pbg      staff        336 Mar 18 20:48 psinfo
-r--------   1 pbg      staff       1728 Mar 18 20:48 rmap
lr-x------   1 pbg      staff          0 Mar 18 20:48 root ->
-r--------   1 pbg      staff       1440 Mar 18 20:48 sigact
-r--------   1 pbg      staff       1232 Mar 18 20:48 status
-r--r--r--   1 pbg      staff        256 Mar 18 20:48 usage
-r--------   1 pbg      staff          0 Mar 18 20:48 watch
-r--------   1 pbg      staff       2736 Mar 18 20:48 xmap
Notice the varying permissions on each component of the process's structure. Some components are read-only, some are write-only, and some are a mix. The mode of access to a component is dictated by its functionality. For instance, the file as is the address space (virtual memory) of the process, and is readable and writable. On the other hand, ctl allows manipulation of the process's state, and is therefore only writable. Details about each component and its role are included in the man page (man -s 4 proc). Of most interest is as, because it indicates the relative memory use of the process. It is a relative measure because it includes the memory used by all the shared libraries. Therefore, it is not, for example, an accurate reflection of the memory that would be freed if the process were killed.
Using the /proc tool set
Fortunately, we do not need to deal with the intricacies of /proc directly. Rather, a set of tools is available to do the dirty work for us. With each new release of Solaris, the proc tool set expands. Under Solaris 2.6, the list includes these tools (in /usr/proc/bin):
pcred    pflags   pmap     psig     pstop    ptree    pwdx
pfiles   pldd     prun     pstack   ptime    pwait
Let's look at each of the tools.
pcred prints the effective, real, and saved UID and GID of a process:
$ /usr/proc/bin/pcred 12599
12599:  e/r/suid=500  e/r/sgid=10
pfiles lists all open files (file descriptors represent open files in Unix) associated with the process, as well as any per-process limits on open files:
$ /usr/proc/bin/pfiles 12599
12955:  vi
  Current rlimit: 64 file descriptors
   0: S_IFCHR mode:0620 dev:136,0 ino:88226 uid:500 gid:7 rdev:24,4
      O_RDWR
   1: S_IFCHR mode:0620 dev:136,0 ino:88226 uid:500 gid:7 rdev:24,4
      O_RDWR
   2: S_IFCHR mode:0620 dev:136,0 ino:88226 uid:500 gid:7 rdev:24,4
      O_RDWR
   3: S_IFCHR mode:0666 dev:136,0 ino:88109 uid:0 gid:3 rdev:13,12
      O_RDWR
   4: S_IFREG mode:0600 dev:136,0 ino:456959 uid:500 gid:10 size:24576
      O_RDWR
Descriptors 0, 1, and 2 are part of the standard I/O package (stdin, stdout, and stderr), so those inodes represent entries in /dev/pty. (You might want to practice the following technique on these files to prove to yourself that it works.)
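One quick way to practice is to look at your own shell's descriptors. The fd directory shown in the /proc listing above exists on Solaris as well; on Linux its entries are readable symlinks, so a plain ls shows where each descriptor points (on Solaris, pfiles is the reporting tool):

```shell
# List what this shell's open descriptors point at.  For an
# interactive shell, descriptors 0, 1, and 2 all resolve to the
# controlling terminal device.
ls -l /proc/$$/fd
```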
Determining the file that descriptor 4 points to requires a little detective work. We could just search the entire system for inode number 456959. Unfortunately, inode numbers are only unique per filesystem, so first we need to determine which partition holds the inode in question. We start by searching the /devices entries for one with the matching major and minor device numbers (in the case of descriptor 4, the major number is 136 and the minor number is 0):
$ ls -lR /devices | grep 136
brw-------   1 root     sys      136,  0 Mar  2 11:10 dad@0,0:a
crw-------   1 root     sys      136,  0 Mar  2 11:10 dad@0,0:a,raw
brw-------   1 root     sys      136,  1 Mar  2 11:10 dad@0,0:b
crw-------   1 root     sys      136,  1 Mar  2 11:10 dad@0,0:b,raw
. . .
Then, we determine the logical device name by grepping for the physical device name in the /dev tree, as in:
$ ls -lR /dev | grep dad@0,0
lrwxrwxrwx   1 root     root          46 Mar  2 11:10 c0t0d0s0 -> ../../devices/pci@1f,0/pci@1,1/ide@3/dad@0,0:a
lrwxrwxrwx   1 root     root          46 Mar  2 11:10 c0t0d0s1 -> ../../devices/pci@1f,0/pci@1,1/ide@3/dad@0,0:b
lrwxrwxrwx   1 root     root          46 Mar  2 11:10 c0t0d0s2 -> ../../devices/pci@1f,0/pci@1,1/ide@3/dad@0,0:c
lrwxrwxrwx   1 root     root          46 Mar  2 11:10 c0t0d0s3 -> ../../devices/pci@1f,0/pci@1,1/ide@3/dad@0,0:d
. . .
We know that the appropriate device is c0t0d0s0, because its device path matches that of the major and minor device number we're looking for. On systems with many disks, this process becomes more complex: there can be duplicate device names (dad@0,0, for instance) with different device paths (pci@1f,4000, for instance). In those cases, the greps must be more specific and include the device path as well as the device name. So now we have the correct device, but how do we determine which file is open? First, we locate the mount point for the device in question:
$ df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/proc                      0       0       0     0%    /proc
/dev/dsk/c0t0d0s0    8162157 1639360 6441176    21%    /
fd                         0       0       0     0%    /dev/fd
swap                  546392     296  546096     1%    /tmp
Then we use find, which has an option to locate files with a specific inode number:
$ find / -inum 456959 -mount -print
/var/tmp/Ex0000002849
The "-mount" option prevents find from searching beyond the starting mount point (so it will not cross into other partitions). We have now found the file opened by vi. Unfortunately, it is not the file being edited: when vi edits a file, it first copies it to /var/tmp under a temporary name, and that copy is what it holds open. When the user saves, vi writes the changes back to the original file and deletes the temporary copy (thus a system crash in the middle of a vi session still allows recovery from the temporary copy).
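Once the mount point has been identified from df as shown, the inode hunt can be wrapped in a small helper. This is just a sketch; the find_by_inum name is mine, not a system tool:

```shell
#!/bin/sh
# find_by_inum: given a mount point and an inode number, print the
# matching file.  (Helper name is hypothetical -- not a system tool.)
find_by_inum() {
    # -mount stops find at filesystem boundaries, which matters
    # because inode numbers are only unique within one filesystem.
    find "$1" -mount -inum "$2" -print 2>/dev/null
}

# Usage matching the vi example above:
# find_by_inum / 456959
```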
Now, on to the other /proc commands.
pflags reports the status and tracing flags of the process:
$ /usr/proc/bin/pflags 12599
12599:  -ksh
 /1:    flags = PR_PCINVAL|PR_ORPHAN|PR_ASLEEP [ waitid(0x7,0x0,0xeffff930,0x7) ]
The meanings of the flags can be found in the appropriate ".h" file: /usr/include/sys/procfs.h.
pldd lists all the dynamic libraries that are associated with the process:
$ /usr/proc/bin/pldd 12599
12599:  -ksh
/usr/lib/libsocket.so.1
/usr/lib/libnsl.so.1
/usr/lib/libc.so.1
/usr/lib/libdl.so.1
/usr/lib/libmp.so.2
/usr/platform/sun4u/lib/libc_psr.so.1
pmap lists the process's address space, including sizes of memory segments and the access allowed to each:
$ /usr/proc/bin/pmap 12599
12599:  -ksh
00010000   184K read/exec         /usr/bin/ksh
0004C000     8K read/write/exec   /usr/bin/ksh
0004E000    32K read/write/exec   [ heap ]
EF580000   592K read/exec         /usr/lib/libc.so.1
EF622000    32K read/write/exec   /usr/lib/libc.so.1
EF62A000     8K read/write/exec   [ anon ]
EF680000   448K read/exec         /usr/lib/libnsl.so.1
EF6FE000    40K read/write/exec   /usr/lib/libnsl.so.1
EF708000    24K read/write/exec   [ anon ]
EF750000    16K read/exec         /usr/platform/sun4u/lib/libc_psr.so.1
EF760000    16K read/exec         /usr/lib/libmp.so.2
EF772000     8K read/write/exec   /usr/lib/libmp.so.2
EF790000    32K read/exec         /usr/lib/libsocket.so.1
EF7A6000     8K read/write/exec   /usr/lib/libsocket.so.1
EF7A8000     8K read/write/exec   [ anon ]
EF7B0000     8K read/exec         /usr/lib/libdl.so.1
EF7C0000     8K read/write/exec   [ anon ]
EF7D0000   112K read/exec         /usr/lib/ld.so.1
EF7FA000     8K read/write/exec   /usr/lib/ld.so.1
EFFFC000    16K read/write/exec   [ stack ]
 total    1608K
pstack shows the stack trace for each thread (lightweight process or LWP) in a process. This information can help determine where a process is hung, why it is using up too much memory, and so on:
$ /usr/proc/bin/pstack 12599
12599:  -ksh
 ef5b915c waitid   (7, 0, effff930, 7)
 ef5d40d0 _libc_waitpid (ffffffff, effffa30, 4, 7, ef622e54, 2422c) + 54
 0002422c job_wait (4e000, 0, 52818, 4, ef622e54, 2fa54) + 184
 0002fd04 sh_exec  (31b2, 0, 0, 4e400, 4e000, 0) + c1c
 00027894 ???????? (5174c, 4cf38, 4cf38, 4e400, 4e400, 4d294)
 00027174 main     (4e400, efffff6c, 4e400, 4e400, 4e400, 4e400) + 844
 00015e88 _start   (0, 0, 0, 0, 0, 0) + dc
ptree prints a formatted listing of a process's lineage, with child processes indented beneath their parent. It can show you the whole system's process tree, or just the parents of a given process:
$ /usr/proc/bin/ptree 12599
285   /usr/sbin/inetd -s
  12597 in.telnetd
    12599 -ksh
      12773 /usr/proc/bin/ptree 12599
In this case, the ptree command was started by a ksh shell, which was started by in.telnetd to service an incoming telnet connection. The in.telnetd was, in turn, started by the Internet services daemon, inetd.
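On systems without ptree, the same lineage can be reconstructed by walking parent PIDs with ps. A portable sketch (the lineage function name is mine; the -o pid/ppid/comm format specifiers are standard ps options):

```shell
#!/bin/sh
# lineage: print a process and each of its ancestors, one per line,
# by following the PPID reported by ps.  (Helper name is mine.)
lineage() {
    pid=$1
    while [ "$pid" -gt 0 ] 2>/dev/null; do
        ps -o pid= -o comm= -p "$pid"           # this process
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ') # step up to its parent
    done
}

# lineage $$   # this shell, its parent, and so on up toward init
```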
pwdx prints the current working directory of the process:
$ /usr/proc/bin/pwdx 12599 12599: /export/home/pbg
ptime times the execution of a process with "microstate accounting" for more precision (and more reproducible results) than the time command:
$ /usr/proc/bin/ptime ls
(output from ls)

real        0.013
user        0.004
sys         0.007
As of Solaris 7, a few new and useful proc tools were added. They live in /usr/bin because they are needed by the boot and shutdown scripts.
plimit gets and sets the per-process limits:
$ /usr/bin/plimit 482
974:    -ksh
   resource              current         maximum
  time(seconds)          unlimited       unlimited
  file(blocks)           unlimited       unlimited
  data(kbytes)           unlimited       unlimited
  stack(kbytes)          8192            unlimited
  coredump(blocks)       unlimited       unlimited
  nofiles(descriptors)   64              1024
  vmemory(kbytes)        unlimited       unlimited
pgrep searches for processes matching certain criteria. No more ps | grep pipes!
$ /usr/bin/pgrep tcsh
361
414
416
554
Finally, pkill sends a user-definable signal to one or more processes, based on criteria such as process name or process owner. Not only is pkill useful, but it is responsible in good part for the boot and shutdown performance improvements of Solaris 7.
$ /usr/bin/pkill bad-process
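The pgrep/pkill pair can be exercised safely against a disposable process. This sketch copies sleep to a unique name so the match cannot hit anything unrelated ("procdemo" is an arbitrary name chosen for the example):

```shell
#!/bin/sh
# Exercise pgrep and pkill against a throwaway process with a unique
# name, so the name match cannot catch unrelated processes.
tmp=$(mktemp -d)
cp "$(command -v sleep)" "$tmp/procdemo"   # arbitrary, unique name
"$tmp/procdemo" 60 &                       # disposable target process

pgrep -x procdemo      # -x: exact process-name match; prints its PID
pkill -x procdemo      # send SIGTERM (the default signal) to it
```

The -x flag exists in both the Solaris and Linux implementations; without it, pgrep/pkill match any process whose name merely contains the pattern, which is how accidental mass kills happen.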
Most of the /proc-related commands have a few options, and most accept a list of processes. Check out the man pages for a few more details.
There is also a /proc gotcha to be aware of. From the manual:
These proc tools stop their target processes while inspecting them and reporting the results: pfiles, pldd, pmap, pstack, pwdx. A process can do nothing while it is stopped. Thus, for example, if the X server is inspected by one of these proc tools running in a window under the X server's control, the whole window system can become deadlocked because the proc tool would be attempting to print its results to a window that cannot be refreshed. Logging in from another system using rlogin(1) and killing the offending proc tool would clear up the deadlock in this case.
The following is a real-life example of the utility of the proc tools. A site was having a problem with a daemon. The daemon's job was to accept connections from client machines and allow them to process insurance claims. During testing, the daemon would run for a while and then crash. This type of behavior indicates a resource limit, but what resource? Using the various proc tools during the daemon's execution, we noticed that the number of open files kept climbing, and that the process failed as the count went over 60 or so. The culprit was the file-descriptor limit. Removing the limit removed the problem as well. The proc tools made this problem very easy to solve.
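Watching for a leak like this one is easy to script. On Solaris you would count the descriptors that pfiles reports; the sketch below instead counts entries under /proc/<pid>/fd, a directory present on both Solaris and Linux (the count_fds helper name is mine):

```shell
#!/bin/sh
# count_fds: print the number of open descriptors for a PID by
# listing its fd directory under /proc.  (Helper name is mine.)
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Watch a suspect daemon for a climbing count, e.g. every 5 seconds:
# while :; do printf '%s %s\n' "$(date)" "$(count_fds "$daemon_pid")"; sleep 5; done
```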
A quick note about a useful book. It's actually a repackaging of a series of O'Reilly and Associates books. The result is a combination book and CD. The book is Unix in a Nutshell, System V Edition, and the CD includes that work plus Unix Power Tools, Second Edition; Learning the Unix Operating System, Fourth Edition; Sed & Awk, Second Edition; Learning the vi Editor, Fifth Edition; and Learning the Korn Shell. The CD contents are indexed and in HTML format, making them very convenient to use. The combination is useful for those who haven't bought all the books, or who like to travel light (consultants and the like). There are other book-CD combinations coming that also look to be very useful. Highly recommended. Details are available in Resources.
A couple of notes about the previous columns on patch theory and practice:
Nice article, but Peter really does need to be reminded that other countries other than the USA exist. 1-800-USA-4-SUN? What??
Sorry about that! Any 800-numbers that I mention are, of course, for the US only.
I found a patch that overwrote /dev/null. Sun's response was that all patches (regardless of install notes) should be installed in single-user mode. Not always an easy thing to do, but after getting burnt, this is now mandatory for even the simplest patch.
That makes a nice juxtaposition to another letter:
I don't agree with "If no crash, no patch." If a patch exists it is there to solve a problem. My suggestion is install all recommended and security patches and make that as a routine.
Both of these letters reinforce a point made in the previous columns: patch policy should and will vary according to site needs. Some sites can afford no downtime at all, and therefore patch only when absolutely necessary. Others have planned maintenance windows that afford a perfect opportunity to install the recommended, suggested, Y2K, and security patches on a schedule. A few other important suggestions: always make a backup before making major system changes, make the changes during scheduled downtime (in single-user mode if possible), and reboot when done to verify that all changes take effect and that they have not caused problems with system functions.
Next month Pete's Super Systems will cover the changes to system administration in Solaris 7.
About the author
Peter Baer Galvin is the chief technologist for Corporate Technologies, a systems integrator and VAR. Before that, Peter was the systems manager for Brown University's Computer Science Department. He has written articles for Byte and other magazines and previously wrote Pete's Wicked World, the security column for SunWorld. Peter is co-author of the Operating Systems Concepts textbook. As a consultant and trainer, Peter has taught tutorials on security and system administration and given talks at many conferences and institutions.
If you have technical problems with this magazine, contact email@example.com