Pete's Super Systems by Peter Baer Galvin

The power of /proc

Using proc tools to solve your system problems

April 1999

Many sysadmins don't realize the variety and power of the tools that are native to Solaris. When they run into problems with applications or at the system level, they can be at a loss for how to debug and resolve them. This month, Peter takes a look at the proc tools and how they can help during difficult times. (3,000 words)

Over the past several years, scientists at the Computing Science Research Center of Bell Laboratories have been working on a new, experimental operating system called Plan 9. This system has much in common with Unix, which isn't a coincidence: many members of the team were also involved in the development of Unix at one level or another. The key concept of Plan 9 is that almost everything on the system is treated as if it were a file. For instance, the state of the kernel is determined by viewing and manipulating files in a special directory. These "files" are really interfaces into the kernel. Treating everything as a file simplifies both the kernel and application programs. For example, Plan 9 has no special set of system calls for querying or changing the state of the kernel; applications use the standard file I/O system calls instead.

One of the engineers from Bell Labs joined the Sun development team and created the /proc filesystem and the /usr/proc/bin tools. Before this interface was invented, all programs that used kernel state (for instance, ps and top) had to be recompiled with each OS release, because they read kernel memory directly and the memory locations of key variables changed from release to release. With the advent of /proc, applications (including system programs) have a uniform and stable interface into the kernel. The proc tools were actually written to test the /proc filesystem interface, but so many people at Sun were using them that Sun decided to include them with Solaris.

Why, as a systems administrator, do you care? Well, the tools that manipulate the proc interface can be very useful in determining system state and debugging application and system problems. First, let's take a look at /proc and the tools; then we'll look at some real-life uses for them.

Perhaps you have noticed an odd entry in the output of df -k:

$ df -k              
Filesystem            kbytes    used   avail capacity  Mounted on
/proc                      0       0       0     0%    /proc

/proc is mounted by the /etc/rcS.d/ startup script. The mount makes the /proc interface into the kernel available to the system and its applications.

Once mounted, /proc reflects the state of every process on the system. Consider an abbreviated listing of /proc on Solaris 2.6 (Solaris 7 works the same way):

$ ls -l /proc
total 168
dr-x--x--x   5 root     root         736 Jan 15 17:00 0
dr-x--x--x   5 root     root         736 Jan 15 17:00 1
dr-x--x--x   5 root     root         736 Feb 25 11:13 10258
dr-x--x--x   5 root     root         736 Jan 15 17:00 11
dr-x--x--x   5 jds      staff        736 Mar 17 08:03 11892
dr-x--x--x   5 akane    staff        736 Mar 17 08:32 12032
dr-x--x--x   5 cbertold staff        736 Mar 17 08:44 12098
dr-x--x--x   5 jkelly   staff        736 Mar 17 08:56 12186
dr-x--x--x   5 root     root         736 Mar  9 09:08 12522
dr-x--x--x   5 jds      staff        736 Mar  9 09:08 12524
dr-x--x--x   5 root     root         736 Mar  9 09:10 12540
dr-x--x--x   5 jds      staff        736 Mar  9 09:10 12542
dr-x--x--x   5 spd      staff        736 Mar 17 10:01 12547
dr-x--x--x   5 cbertold staff        736 Mar 17 10:03 12555
dr-x--x--x   5 root     root         736 Mar 17 10:09 12597
dr-x--x--x   5 pbg      staff        736 Mar 17 10:09 12599
dr-x--x--x   5 jds      staff        736 Mar 17 10:19 12660
dr-x--x--x   5 pbg      staff        736 Mar 17 10:25 12670
dr-x--x--x   5 root     root         736 Jan 15 17:00 2
dr-x--x--x   5 root     root         736 Jan 15 17:01 239
dr-x--x--x   5 root     root         736 Jan 15 17:01 241
dr-x--x--x   5 root     root         736 Feb  9 13:50 24515
dr-x--x--x   5 root     root         736 Jan 15 17:00 3
dr-x--x--x   5 root     root         736 Jan 15 17:01 307

Each numeric entry in /proc is a directory representing the process with the matching process ID. The owner of the directory is the UID of the process, and its group is the GID of the process. In this way, only the process owner (and root) has primary access to the process information. Looking within a directory, we see:

$ ls -l /proc/12599
total 3543
-rw-------   1 pbg      staff    1794048 Mar 18 20:48 as
-r--------   1 pbg      staff        152 Mar 18 20:48 auxv
-r--------   1 pbg      staff         32 Mar 18 20:48 cred
--w-------   1 pbg      staff          0 Mar 18 20:48 ctl
lr-x------   1 pbg      staff          0 Mar 18 20:48 cwd -> 
dr-x------   2 pbg      staff       1056 Mar 18 20:48 fd
-r--r--r--   1 pbg      staff        120 Mar 18 20:48 lpsinfo
-r--------   1 pbg      staff        912 Mar 18 20:48 lstatus
-r--r--r--   1 pbg      staff        536 Mar 18 20:48 lusage
dr-xr-xr-x   3 pbg      staff         48 Mar 18 20:48 lwp
-r--------   1 pbg      staff       1728 Mar 18 20:48 map
dr-x------   2 pbg      staff        544 Mar 18 20:48 object
-r--------   1 pbg      staff       2048 Mar 18 20:48 pagedata
-r--r--r--   1 pbg      staff        336 Mar 18 20:48 psinfo
-r--------   1 pbg      staff       1728 Mar 18 20:48 rmap
lr-x------   1 pbg      staff          0 Mar 18 20:48 root -> 
-r--------   1 pbg      staff       1440 Mar 18 20:48 sigact
-r--------   1 pbg      staff       1232 Mar 18 20:48 status
-r--r--r--   1 pbg      staff        256 Mar 18 20:48 usage
-r--------   1 pbg      staff          0 Mar 18 20:48 watch
-r--------   1 pbg      staff       2736 Mar 18 20:48 xmap
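Before turning to the individual components, note that the directory-per-process layout described above is easy to inventory from a script, with no kernel-memory reading involved. The sketch below is in Python, which is an assumption (the column itself uses only shell commands). It treats every all-digit entry in /proc as a process directory whose owner is the process's UID; proc_owners is a hypothetical helper name.

```python
import os
from collections import Counter

# Sketch only: proc_owners is a hypothetical helper, not a Solaris tool.
def proc_owners(proc_root="/proc"):
    """Map each process ID under proc_root to the UID that owns its directory."""
    owners = {}
    with os.scandir(proc_root) as entries:
        for entry in entries:
            if not entry.name.isdigit():
                continue  # not a per-process directory
            try:
                owners[int(entry.name)] = entry.stat(follow_symlinks=False).st_uid
            except OSError:
                continue  # the process exited, or its entry is unreadable
    return owners

if __name__ == "__main__":
    # Count processes per owning UID, much as the owner column of `ls -l /proc` shows.
    for uid, count in Counter(proc_owners().values()).most_common():
        print(uid, count)
```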

Notice the varying permissions on the components of the process's structure. Some components are read-only, some are write-only, and some are read-write; the mode of each is dictated by its function. For instance, the file as is the address space (virtual memory) of the process, and is both readable and writable. On the other hand, ctl is used to manipulate the process's state, and is therefore write-only. Details about each component and its role are in the man page (man -s 4 proc). Of most interest is the size of as, because it indicates the relative memory use of the process. It is only a relative measure because it includes the memory used by all the shared libraries. Therefore, it is not, for example, an accurate reflection of the memory that would be freed if the process were killed.
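The owner permission bits are what separate a read-only component such as cred from the write-only ctl and the read-write as. A small Python helper can classify a component from its mode bits using the standard stat module; on a Solaris system one might pass it os.stat("/proc/<pid>/ctl").st_mode. The helper name owner_access is hypothetical.

```python
import stat

# Sketch only: owner_access is a hypothetical helper for illustration.
def owner_access(mode):
    """Describe the owner's access to a file, given its st_mode bits."""
    readable = bool(mode & stat.S_IRUSR)
    writable = bool(mode & stat.S_IWUSR)
    if readable and writable:
        return "read-write"
    if readable:
        return "read-only"
    if writable:
        return "write-only"
    return "no access"

if __name__ == "__main__":
    # Permission bits copied from the ls -l /proc/12599 listing.
    for name, mode in [("as", 0o600), ("cred", 0o400), ("ctl", 0o200)]:
        print(name, stat.filemode(stat.S_IFREG | mode), owner_access(mode))
```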


Using the /proc tool set
Fortunately, we do not need to deal with the intricacies of /proc directly. Rather, there is a set of tools available to do the dirty work for us. With each new release of Solaris, the proc tool set expands. Under Solaris 2.6, the list includes these tools (in /usr/proc/bin).

pcred   pflags  pmap    psig    pstop   ptree   pwdx
pfiles  pldd    prun    pstack  ptime   pwait

Let's look at each of the tools.

pcred prints the effective, real, and saved UID and GID of a process:

$ /usr/proc/bin/pcred 12599
12599:  e/r/suid=500  e/r/sgid=10
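pcred collapses identical effective, real, and saved IDs into a single e/r/s field. The Python sketch below reproduces that compact form; for the current process it can be fed from os.getresuid() and os.getresgid(), which return (real, effective, saved) and are available on Linux and some other Unix systems but are an assumption here. How pcred prints IDs that differ is also assumed, so the separate-field layout is illustrative only, as are the helper names.

```python
import os

# Sketch only: format_ids and pcred_line are hypothetical helpers.
def format_ids(kind, real, effective, saved):
    """Render one ID triple, collapsing it when all three values agree."""
    if real == effective == saved:
        return f"e/r/s{kind}={effective}"
    # Assumed layout for differing IDs; not copied from real pcred output.
    return f"e{kind}={effective}  r{kind}={real}  s{kind}={saved}"

def pcred_line(pid, uids, gids):
    """uids and gids are (real, effective, saved) triples."""
    return f"{pid}:  {format_ids('uid', *uids)}  {format_ids('gid', *gids)}"

if __name__ == "__main__":
    print(pcred_line(os.getpid(), os.getresuid(), os.getresgid()))
```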

pfiles lists all open files (file descriptors represent open files in Unix) associated with the process, as well as any per-process limits on open files:

$ /usr/proc/bin/pfiles 12955
12955:   vi
  Current rlimit: 64 file descriptors
   0: S_IFCHR mode:0620 dev:136,0 ino:88226 uid:500 gid:7 rdev:24,4
   1: S_IFCHR mode:0620 dev:136,0 ino:88226 uid:500 gid:7 rdev:24,4
   2: S_IFCHR mode:0620 dev:136,0 ino:88226 uid:500 gid:7 rdev:24,4
   3: S_IFCHR mode:0666 dev:136,0 ino:88109 uid:0 gid:3 rdev:13,12
   4: S_IFREG mode:0600 dev:136,0 ino:456959 uid:500 gid:10 size:24576

Descriptors 0, 1, and 2 are standard input, standard output, and standard error. Here all three share the same device and inode, which represent an entry in /dev/pty: the terminal vi is running on. (You might want to practice the following technique on these files to prove to yourself that it works.)
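The detective work starts from two items on each pfiles line: the dev:major,minor pair and the inode number. A Python sketch can pull those out and confirm that descriptors 0, 1, and 2 refer to the same inode. The regular expression assumes the line layout shown above, and parse_pfiles and shared_files are hypothetical names.

```python
import re
from collections import defaultdict

# Sketch only: the helpers below are hypothetical. The pattern matches lines like
#   4: S_IFREG mode:0600 dev:136,0 ino:456959 uid:500 gid:10 size:24576
PFILES_LINE = re.compile(
    r"^\s*(?P<fd>\d+):\s+(?P<type>S_IF\w+)\s+mode:(?P<mode>[0-7]+)\s+"
    r"dev:(?P<major>\d+),(?P<minor>\d+)\s+ino:(?P<ino>\d+)"
)

def parse_pfiles(text):
    """Return one dict per descriptor line in pfiles output."""
    entries = []
    for line in text.splitlines():
        m = PFILES_LINE.match(line)
        if m:
            entries.append({
                "fd": int(m["fd"]),
                "type": m["type"],
                "dev": (int(m["major"]), int(m["minor"])),
                "ino": int(m["ino"]),
            })
    return entries

def shared_files(entries):
    """Group descriptors that refer to the same (device, inode) pair."""
    groups = defaultdict(list)
    for e in entries:
        groups[(e["dev"], e["ino"])].append(e["fd"])
    return {key: fds for key, fds in groups.items() if len(fds) > 1}
```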

Determining which file descriptor 4 points to requires a little detective work. We could just search the entire system for inode number 456959. Unfortunately, inode numbers are unique only within a partition, so first we need to determine which partition the inode is on. For descriptor 4, pfiles reports dev:136,0, that is, major device number 136 and minor device number 0, so we look through /devices for the entry with those numbers:

$ ls -lR /devices | grep 136
brw-------   1 root     sys      136,  0 Mar  2 11:10 dad@0,0:a
crw-------   1 root     sys      136,  0 Mar  2 11:10 dad@0,0:a,raw
brw-------   1 root     sys      136,  1 Mar  2 11:10 dad@0,0:b
crw-------   1 root     sys      136,  1 Mar  2 11:10 dad@0,0:b,raw
. . .
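Rather than grepping ls output, a script can compare device numbers directly, which avoids the false hits a plain grep 136 can produce (a 136 in a date or size column, for example). A minimal Python sketch, assuming os.major() and os.minor() (standard on Unix builds of Python) decode st_rdev the same way ls does; find_device_nodes is a hypothetical name.

```python
import os
import stat

# Sketch only: find_device_nodes is a hypothetical helper.
def find_device_nodes(root, major, minor):
    """List block and character special files under root with the given numbers."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable
            if not (stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode)):
                continue
            if os.major(st.st_rdev) == major and os.minor(st.st_rdev) == minor:
                matches.append(path)
    return sorted(matches)

if __name__ == "__main__":
    # On the system in the article, this would be expected to report
    # the dad@0,0:a block and raw entries.
    print(find_device_nodes("/devices", 136, 0))
```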

Then, we determine the logical device name by grepping for the physical device name in the /dev tree, as in:

$ ls -lR /dev | grep dad@0,0
lrwxrwxrwx   1 root     root          46 Mar  2 11:10 c0t0d0s0 -> ../../devices/pci@1f,0/pci@1,1/ide@3/dad@0,0:a
lrwxrwxrwx   1 root     root          46 Mar  2 11:10 c0t0d0s1 -> ../../devices/pci@1f,0/pci@1,1/ide@3/dad@0,0:b
lrwxrwxrwx   1 root     root          46 Mar  2 11:10 c0t0d0s2 -> ../../devices/pci@1f,0/pci@1,1/ide@3/dad@0,0:c
lrwxrwxrwx   1 root     root          46 Mar  2 11:10 c0t0d0s3 ->
. . .
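The /dev entries are symbolic links into /devices, so this step can also be done by reading the links directly. A minimal Python sketch, assuming the logical disk links live in a single directory such as /dev/dsk; logical_names is a hypothetical helper. Passing more of the physical path as the suffix is how you would disambiguate identical device names on different buses.

```python
import os

# Sketch only: logical_names is a hypothetical helper.
def logical_names(dev_dir, physical_suffix):
    """Return the names in dev_dir whose symlink target ends with physical_suffix.

    Use as much of the physical path as needed to be unambiguous, e.g.
    "/dad@0,0:a" or "pci@1f,0/pci@1,1/ide@3/dad@0,0:a".
    """
    names = []
    for name in sorted(os.listdir(dev_dir)):
        path = os.path.join(dev_dir, name)
        # readlink returns the raw (relative ../../devices/...) target text.
        if os.path.islink(path) and os.readlink(path).endswith(physical_suffix):
            names.append(name)
    return names

if __name__ == "__main__":
    if os.path.isdir("/dev/dsk"):
        print(logical_names("/dev/dsk", "/dad@0,0:a"))
```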

We know that the appropriate device is c0t0d0s0, because it links to dad@0,0:a, the /devices entry with major number 136 and minor number 0. On systems with many disks the process is more complex: the same device name (dad@0,0, for instance) can appear under different device paths (pci@1f,4000, for instance), so the greps must include the device path as well as the device name. Now we have the correct device, but how do we determine which file is open? First, we locate the mount point for the device in question:

$ df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/proc                      0       0       0     0%    /proc
/dev/dsk/c0t0d0s0    8162157 1639360 6441176    21%    /
fd                         0       0       0     0%    /dev/fd
swap                  546392     296  546096     1%    /tmp
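Picking the mount point out of df output is easy to script as well. A Python sketch, assuming the six-column df -k layout shown above and mount points without embedded spaces; mount_point_for is a hypothetical name.

```python
# Sketch only: mount_point_for is a hypothetical helper.
def mount_point_for(df_output, device):
    """Return the "Mounted on" column for device in `df -k` output, or None."""
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) == 6 and fields[0] == device:
            return fields[5]
    return None
```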

Then we use find, which has an option to locate files with a specific inode number:

$ find / -inum 456959 -mount -print

The -mount option prevents find from searching beyond the starting mount point, so it will not search other partitions. We have now found the file opened by vi, but it is likely not the file the user is editing. When vi edits a file, it first copies it to /var/tmp under a temporary name, and that copy is what the descriptor refers to. When vi writes changes back, it writes them to the original file and deletes the temporary copy (so a system crash in the middle of a vi session would allow recovery of the original file).
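The same search can be written without find. Because inode numbers are unique only within one filesystem, the Python sketch below stays on the filesystem that holds the starting directory (the effect of -mount) and checks st_dev as well as st_ino. find_inode is a hypothetical name, and unlike find it does not test the starting directory itself.

```python
import os

# Sketch only: find_inode is a hypothetical helper, a rough analogue of
# `find START -inum INODE -mount -print`.
def find_inode(start, inode):
    """Return sorted paths under start whose inode is `inode`, same filesystem only."""
    start_dev = os.lstat(start).st_dev
    hits = []
    for dirpath, dirnames, filenames in os.walk(start, onerror=lambda err: None):
        kept = []
        for name in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, name)).st_dev == start_dev:
                    kept.append(name)
            except OSError:
                pass
        dirnames[:] = kept  # prune subtrees that live on other filesystems
        for name in kept + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if st.st_ino == inode and st.st_dev == start_dev:
                hits.append(path)
    return sorted(hits)
```

Hard links share an inode, so a file with several names is reported once per name, just as find reports it.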

Now, on to the other /proc commands.

pflags displays the process's /proc status flags:

$ /usr/proc/bin/pflags 12599
12599:  -ksh
  /1:   flags = PR_PCINVAL|PR_ORPHAN|PR_ASLEEP [ waitid(0x7,0x0,0xeffff930,0x7) ]

The meanings of the flags are documented in the corresponding header file, /usr/include/sys/procfs.h.
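When comparing many LWPs or several snapshots, it can help to pull the flag names apart. A Python sketch, assuming the line layout shown above; it only splits out the flag names and any system call in brackets, and leaves their interpretation to procfs.h. parse_pflags_lwp is a hypothetical name.

```python
import re

# Sketch only: parse_pflags_lwp is a hypothetical helper. It expects lines like
#   /1:   flags = PR_PCINVAL|PR_ORPHAN|PR_ASLEEP [ waitid(0x7,0x0,0xeffff930,0x7) ]
PFLAGS_LWP = re.compile(
    r"^\s*/(?P<lwp>\d+):\s+flags = (?P<flags>[\w|]+)"
    r"(?:\s+\[\s*(?P<syscall>\w+)\((?P<args>[^)]*)\)\s*\])?"
)

def parse_pflags_lwp(line):
    """Return the LWP number, its flag names, and any system call shown, or None."""
    m = PFLAGS_LWP.match(line)
    if m is None:
        return None
    return {
        "lwp": int(m["lwp"]),
        "flags": set(m["flags"].split("|")),
        "syscall": m["syscall"],  # None when no [ call(...) ] part is present
        "args": m["args"],
    }
```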

pldd lists all the dynamic libraries that are associated with the process:

$ /usr/proc/bin/pldd 12599
12599:  -ksh

pmap lists the process's address space, including sizes of memory segments and the access allowed to each:

$ /usr/proc/bin/pmap 12599
12599:  -ksh
00010000    184K read/exec         /usr/bin/ksh
0004C000      8K read/write/exec   /usr/bin/ksh
0004E000     32K read/write/exec     [ heap ]
EF580000    592K read/exec         /usr/lib/
EF622000     32K read/write/exec   /usr/lib/
EF62A000      8K read/write/exec     [ anon ]
EF680000    448K read/exec         /usr/lib/
EF6FE000     40K read/write/exec   /usr/lib/
EF708000     24K read/write/exec     [ anon ]
EF750000     16K read/exec         /usr/platform/sun4u/lib/
EF760000     16K read/exec         /usr/lib/
EF772000      8K read/write/exec   /usr/lib/
EF790000     32K read/exec         /usr/lib/
EF7A6000      8K read/write/exec   /usr/lib/
EF7A8000      8K read/write/exec     [ anon ]
EF7B0000      8K read/exec         /usr/lib/
EF7C0000      8K read/write/exec     [ anon ]
EF7D0000    112K read/exec         /usr/lib/
EF7FA000      8K read/write/exec   /usr/lib/
EFFFC000     16K read/write/exec     [ stack ]
 total     1608K
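Totaling a pmap listing ties back to the earlier point that the size of as is only a relative measure: much of the total is program text and shared libraries. A Python sketch, assuming the column layout shown above, separates the anonymous segments ([ heap ], [ anon ], [ stack ]) from file-backed mappings. This is a rough split only, since writable file-backed segments are private to the process, so file_backed_kb is not the same as shared memory. The helper names are hypothetical.

```python
# Sketch only: parse_pmap and summarize are hypothetical helpers.
def parse_pmap(text):
    """Return (address, size_kb, permissions, mapping) for each segment line."""
    segments = []
    for line in text.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 3 or not fields[1].endswith("K"):
            continue  # the "PID: command" header or the total line
        try:
            address = int(fields[0], 16)
        except ValueError:
            continue  # not a segment line
        mapping = fields[3].strip() if len(fields) == 4 else ""
        segments.append((address, int(fields[1][:-1]), fields[2], mapping))
    return segments

def summarize(segments):
    """Split the total into anonymous memory and file-backed mappings."""
    total = sum(size for _, size, _, _ in segments)
    # [ heap ], [ anon ], and [ stack ] have no backing file.
    anonymous = sum(size for _, size, _, m in segments if m.startswith("["))
    return {"total_kb": total, "anonymous_kb": anonymous,
            "file_backed_kb": total - anonymous}
```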

pstack shows the stack trace for each thread (lightweight process or LWP) in a process. This information can help determine where a process is hung, why it is using up too much memory, and so on:

$ /usr/proc/bin/pstack 12599
12599:  -ksh
 ef5b915c waitid   (7, 0, effff930, 7)
 ef5d40d0 _libc_waitpid (ffffffff, effffa30, 4, 7, ef622e54, 2422c) + 54
 0002422c job_wait (4e000, 0, 52818, 4, ef622e54, 2fa54) + 184
 0002fd04 sh_exec  (31b2, 0, 0, 4e400, 4e000, 0) + c1c
 00027894 ???????? (5174c, 4cf38, 4cf38, 4e400, 4e400, 4d294)
 00027174 main     (4e400, efffff6c, 4e400, 4e400, 4e400, 4e400) + 844
 00015e88 _start   (0, 0, 0, 0, 0, 0) + dc
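A blocked process shows the same innermost frames each time it is sampled, so a couple of pstack snapshots taken a few seconds apart are a quick check for a hang. An idle process, like this ksh waiting in waitid, looks the same way, so interpret the result in context. A Python sketch, assuming single-LWP output laid out as above; the helper names and the three-frame comparison depth are arbitrary, hypothetical choices.

```python
# Sketch only: pstack_frames and looks_stuck are hypothetical helpers.
def pstack_frames(text):
    """Return function names from pstack output, innermost frame first."""
    frames = []
    for line in text.splitlines()[1:]:  # skip the "PID:  command" header
        fields = line.split()
        if len(fields) >= 2:
            frames.append(fields[1])  # fields[0] is the program counter
    return frames

def looks_stuck(snapshots, depth=3):
    """True if two or more snapshots all show the same innermost `depth` frames."""
    tops = {tuple(pstack_frames(s)[:depth]) for s in snapshots}
    return len(snapshots) > 1 and len(tops) == 1
```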

ptree prints a formatted listing of a process's lineage, with child processes indented beneath their parent. It can show you the whole system's process tree, or just the ancestors and descendants of a given process:

$ /usr/proc/bin/ptree 12599
285   /usr/sbin/inetd -s
  12597 in.telnetd
    12599 -ksh
      12773 /usr/proc/bin/ptree 12599

In this case, the ptree command was started by a ksh shell, which was started by telnetd in response to an incoming telnet connection. The telnetd was in turn started by the Internet services daemon, inetd.
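The ancestor half of ptree reduces to following parent links. A Python sketch, assuming a hypothetical parent_of mapping from PID to parent PID (on a live system it would be built from each process's /proc entry) and a commands mapping from PID to command line. Unlike ptree, it prints only ancestors, not descendants.

```python
# Sketch only: lineage and format_lineage are hypothetical helpers.
def lineage(pid, parent_of):
    """Return pid's ancestors, oldest first, ending with pid itself."""
    chain, seen = [pid], {pid}
    while chain[-1] in parent_of and parent_of[chain[-1]] not in seen:
        parent = parent_of[chain[-1]]
        chain.append(parent)
        seen.add(parent)  # guards against a process that is its own parent
    return chain[::-1]

def format_lineage(chain, commands):
    """Indent each generation two spaces beneath its parent, as ptree does."""
    return "\n".join(f"{'  ' * depth}{pid} {commands.get(pid, '?')}"
                     for depth, pid in enumerate(chain))
```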

pwdx prints the current working directory of the process:

$ /usr/proc/bin/pwdx 12599  
12599:  /export/home/pbg
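The empty target after cwd -> in the earlier listing of /proc/12599 suggests that the cwd entry cannot be read as link text. Assuming it can still be used as a directory, one way to reproduce pwdx is to change into it and ask for the current directory. A Python sketch under that assumption; pwdx here is a hypothetical function, and inspecting another user's process needs the usual permissions.

```python
import os

# Sketch only: a hypothetical pwdx() that assumes /proc/<pid>/cwd can be
# used as a directory even when its link text is empty.
def pwdx(pid):
    """Return the working directory of pid by changing into /proc/<pid>/cwd.

    The caller's own working directory is saved and restored, so this is
    not safe to use from several threads at once.
    """
    saved = os.open(".", os.O_RDONLY)
    try:
        os.chdir(f"/proc/{pid}/cwd")
        return os.getcwd()
    finally:
        os.fchdir(saved)
        os.close(saved)

if __name__ == "__main__":
    print(f"{os.getpid()}:  {pwdx(os.getpid())}")
```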

ptime times the execution of a process with "microstate accounting" for more precision (and more reproducible results) than the time command:

$ /usr/proc/bin/ptime ls
(output from ls)

real        0.013
user        0.004
sys         0.007
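For comparison, here is what a plain real/user/sys timer looks like in Python. It uses ordinary getrusage() accounting rather than microstate accounting, so it does not have ptime's precision; the function name ptime is borrowed for illustration only and is hypothetical.

```python
import resource
import subprocess
import sys
import time

# Sketch only: a hypothetical ptime() built on getrusage(), NOT microstate accounting.
def ptime(argv):
    """Run argv and report real, user, and system time in seconds.

    RUSAGE_CHILDREN covers every child this process has waited for,
    so time one command at a time.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=False)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "real": real,
        "user": after.ru_utime - before.ru_utime,
        "sys": after.ru_stime - before.ru_stime,
    }

# Example: ptime(["ls"]) or ptime([sys.executable, "-c", "pass"])
```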

As of Solaris 7, several new and useful proc tools were added. They live in /usr/bin because they are needed by the boot and shutdown scripts.

plimit gets and sets the per-process limits:

$ /usr/bin/plimit 974
974:    -ksh
   resource              current         maximum
  time(seconds)         unlimited       unlimited
  file(blocks)          unlimited       unlimited
  data(kbytes)          unlimited       unlimited
  stack(kbytes)         8192            unlimited
  coredump(blocks)      unlimited       unlimited
  nofiles(descriptors)  64              1024
  vmemory(kbytes)       unlimited       unlimited
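Python's standard resource module reads the same kinds of limits, though only for the calling process (resource.prlimit can query another PID, but only on Linux). The sketch below prints a plimit-style table for a subset of the rows above; the file and coredump rows are left out to avoid guessing plimit's block size, and the helper names are hypothetical.

```python
import resource

# Sketch only: LIMITS, fmt, and plimit_rows are hypothetical.
# Each entry: (label, resource, divisor converting bytes to the labelled unit).
LIMITS = [
    ("time(seconds)", resource.RLIMIT_CPU, 1),
    ("data(kbytes)", resource.RLIMIT_DATA, 1024),
    ("stack(kbytes)", resource.RLIMIT_STACK, 1024),
    ("nofiles(descriptors)", resource.RLIMIT_NOFILE, 1),
    ("vmemory(kbytes)", resource.RLIMIT_AS, 1024),
]

def fmt(value, divisor):
    return "unlimited" if value == resource.RLIM_INFINITY else str(value // divisor)

def plimit_rows():
    """(label, current, maximum) for the calling process's own limits."""
    rows = []
    for label, res, divisor in LIMITS:
        soft, hard = resource.getrlimit(res)
        rows.append((label, fmt(soft, divisor), fmt(hard, divisor)))
    return rows

if __name__ == "__main__":
    print(f"   {'resource':<22}{'current':<16}maximum")
    for label, current, maximum in plimit_rows():
        print(f"  {label:<22}{current:<16}{maximum}")
```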

pgrep searches for processes that match given criteria. No more ps | grep pipes!

$ /usr/bin/pgrep tcsh

Finally, pkill sends a signal of your choice to one or more processes, selected by criteria such as process name or process owner. Besides being handy in its own right, pkill is responsible for a good part of the boot and shutdown performance improvements in Solaris 7.

$ /usr/bin/pkill bad-process
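The pgrep/pkill pairing is simply "select by criteria, then act." A Python sketch, assuming a hypothetical processes list of (pid, name) pairs, which on a real system would be built by reading each process's /proc entry. Only name matching is shown, not selection by owner or other criteria, and both function names are hypothetical stand-ins for the real tools.

```python
import os
import re
import signal

# Sketch only: hypothetical pgrep()/pkill() functions, not the Solaris commands.
def pgrep(pattern, processes):
    """Return PIDs whose name matches the regular expression pattern.

    processes is an iterable of (pid, name) pairs. As with pgrep, the
    calling process never matches itself.
    """
    regex = re.compile(pattern)
    me = os.getpid()
    return [pid for pid, name in processes if pid != me and regex.search(name)]

def pkill(pattern, processes, sig=signal.SIGTERM):
    """Send sig to every process pgrep() selects; return the PIDs signalled."""
    signalled = []
    for pid in pgrep(pattern, processes):
        try:
            os.kill(pid, sig)
            signalled.append(pid)
        except ProcessLookupError:
            pass  # exited between the listing and the kill
    return signalled
```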

Most of the /proc-related commands have a few options, and most accept a list of processes. Check out the man pages for a few more details.

There is also a /proc gotcha to be aware of. From the manual:

These proc tools stop their target processes while inspecting them and reporting the results: pfiles, pldd, pmap, pstack, pwdx. A process can do nothing while it is stopped. Thus, for example, if the X server is inspected by one of these proc tools running in a window under the X server's control, the whole window system can become deadlocked because the proc tool would be attempting to print its results to a window that cannot be refreshed. Logging in from another system using rlogin(1) and killing the offending proc tool would clear up the deadlock in this case.

The following is a real-life example of the utility of the proc tools. A site was having a problem with a daemon whose job was to accept connections from client machines and allow them to process insurance claims. During testing, the daemon would run for a while and then crash. This type of behavior suggests a resource limit, but which resource? Using the various proc tools while the daemon ran, we noticed that the number of open files kept climbing, and that the process failed once the count passed 60 or so, close to the default limit of 64 file descriptors. The culprit was the file-descriptor limit. Raising the limit made the problem go away. The proc tools made this problem very easy to solve.
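The diagnosis here came from watching a descriptor count climb toward a limit. A Python sketch of the same check, assuming the per-process fd directory shown in the /proc listing earlier (present on Solaris and on Linux); the function names and the 90 percent threshold are arbitrary, hypothetical choices. Raising the soft limit up to the hard limit needs no privilege; going beyond the hard limit does.

```python
import os
import resource

# Sketch only: open_fd_count, near_fd_limit, and raise_fd_limit are hypothetical.
def open_fd_count(pid="self"):
    """Count entries in /proc/<pid>/fd, one per open descriptor.

    Listing the directory may itself use a descriptor, so compare counts
    taken the same way rather than trusting the absolute number.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))

def near_fd_limit(fraction=0.9):
    """True if this process is using at least `fraction` of its soft limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return False
    return open_fd_count() >= fraction * soft

def raise_fd_limit():
    """Raise the soft descriptor limit to the hard limit (no privilege needed)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and soft < hard:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```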

Useful books
A quick note about a useful book. It's actually a repackaging of a series of O'Reilly and Associates books as a combination book and CD. The book is Unix in a Nutshell, System V Edition, and the CD includes that work plus Unix Power Tools, Second Edition; Learning the Unix Operating System, Fourth Edition; Sed & Awk, Second Edition; Learning the vi Editor, Fifth Edition; and Learning the Korn Shell. The CD contents are indexed and in HTML format, making them very convenient to use. The combination is useful for those who haven't bought all the books, or who like to travel light (consultants and the like). Other book-CD combinations are on the way that also look to be very useful. Highly recommended. Details are available in Resources.

A couple of notes about the previous columns on patch theory and practice:

Nice article, but Peter really does need to be reminded that other countries other than the USA exist. 1-800-USA-4-SUN? What??

Sorry about that! Any 800-numbers that I mention are, of course, for the US only.

I found a patch that overwrote /dev/null. Sun's response was that all patches (regardless of install notes) should be installed in single-user mode. Not always an easy thing to do, but after getting burnt, this is now mandatory for even the simplest patch.

That makes a nice juxtaposition to another letter:

I don't agree with "If no crash, no patch." If a patch exists it is there to solve a problem. My suggestion is install all recommended and security patches and make that as a routine.

Both of these letters reinforce some points made in the previous columns: patch policy should and will vary according to site needs. Some sites can afford no downtime at all, and therefore patch only when absolutely necessary. Others have planned maintenance windows, which afford a perfect opportunity to install the recommended, suggested, Y2K, and security patches on a schedule. Another important suggestion: always make a backup before making major system changes, make the changes during scheduled downtime (in single-user mode if possible), and reboot when done to verify that all changes take effect and have not caused problems with system functions.

Next month Pete's Super Systems will cover the changes to system administration in Solaris 7.



Additional SunWorld resources

About the author
Peter Baer Galvin is the chief technologist for Corporate Technologies, a systems integrator and VAR. Before that, Peter was the systems manager for Brown University's Computer Science Department. He has written articles for Byte and other magazines and previously wrote Pete's Wicked World, the security column for SunWorld. Peter is co-author of the Operating System Concepts textbook. As a consultant and trainer, Peter has taught tutorials on security and system administration and given talks at many conferences and institutions.



[(c) Copyright  Web Publishing Inc., and IDG Communication company]
