|
What's the best way to probe processes?How to measure the resource consumption and process activity on a Unix computer. |
There are several ways to examine individual processes in Solaris, there is actually more information available than you may realize. This month we look at the commands and programming interfaces provided in Solaris 2.5. (3,500 words)
Mail this article to a friend |
Q:
How can I measure the resource consumption and activity of the
processes on my Unix computer?
--Probing in Peshtigo
A: Most people know about the ps command. It gives a high level summary of the status of processes on your system. It has many options, but it is a relatively crude way to figure out what your processes are doing.
Of all the available options, the best performance related summary comes from the BSD version /usr/ucb/ps uax, which collects all the process data in one go, sorts the output by CPU usage, then displays the result. The unsorted versions of ps loop through the processes, printing as they go. This spreads out the time at which the processes are measured, and the last line process measured is significantly later than the first. By collecting in one go, the sorted versions give a more consistent view of the system. A quick look at the most active processes can be obtained easily using this command:
% /usr/ucb/ps uax | head USER PID %CPU %MEM SZ RSS TT S START TIME COMMAND adrianc 333 1.0 8.2 8372 5120 console S 09:28:38 0:29 /usr/openwin/bin/X root 483 0.4 1.4 1016 872 pts/1 O 09:56:36 0:00 /usr/ucb/ps uax adrianc 433 0.3 15.812492 9832 ? S 09:31:47 0:26 /export/framemaker root 240 0.3 5.3 3688 3260 ? S 09:27:22 0:07 //opt/RICHPse/bin/ adrianc 367 0.2 4.2 3472 2620 ?? S 09:28:56 0:00 cmdtool -Wp 603 49 adrianc 484 0.1 0.9 724 540 pts/1 S 09:56:36 0:00 head root 3 0.1 0.0 0 0 ? S 09:25:17 0:02 fsflush adrianc 370 0.1 1.4 980 824 pts/1 S 09:28:57 0:00 /bin/csh adrianc 358 0.1 2.6 2088 1616 console S 09:28:54 0:00 olwm -syncpid 357
This summary immediately tells you who is running the most active processes. The %CPU measure is a time-decayed average of recent CPU usage. %MEM tells you the proportion of the total RAM in your system is in use by each process (it won't add up to 100 percent as some RAM is shared by several processes). SZ is the size of the process address space. It's a good indicator of how much swap space the process needs. In some cases it includes memory mapped devices, so don't be surprised if the X process appears to be huge on an Ultra 1 with Creator framebuffer. RSS is the basis for %MEM, its the amount of RAM in use by the process. TT shows you which "teletype" the user is logged in on. S shows the status. "S" means sleeping, "O" means on-cpu or running, "R" means runnable and waiting for a CPU to become free. START is the time the process started up, and TIME is the total amount of CPU time it has used so far. COMMAND The same basic information is displayed by the well known freeware utilities top and proctool. Solstice Symon and most commercial performance tools also display this data.
|
|
|
|
Where does the data come from?
I'm sure you have noticed the strange entry that pops up in df
when you are checking how much disk space is left.
% df -k Filesystem kbytes used avail capacity Mounted on /dev/dsk/c0t2d0s0 963662 782001 85301 91% / /proc 0 0 0 0% /proc fd 0 0 0 0% /dev/fd /dev/dsk/c0t3d0s0 406854 290066 96448 76% /export/homeIt seems that there is a filesystem called
/proc
, if you
look at it, you find a list of numbers that correspond to the processes
on your system.
% ls /proc 00000 00140 00197 00237 00309 00333 00358 00379 00586 00001 00149 00207 00239 00312 00334 00359 00382 00002 00152 00216 00240 00313 00342 00367 00385 00003 00154 00217 00258 00318 00349 00370 00388Using ls -l you see the owner and size.
% /usr/ucb/ps uax | head USER PID %CPU %MEM SZ RSS TT S START TIME COMMAND adrianc 333 0.6 8.0 8380 4984 console S 09:28:38 1:16 /usr/openwin/bin/X % ls -l /proc/333 -rw------- 1 adrianc staff 8581120 Jul 26 09:28 /proc/333The units in ps are KB, and 8581120/1024 = 8380 as expected. The owner and permissions are used to control access to the processes. The ps command has to be setuid root so be able to show anyone status of all processes, You can only debug or trace processes that you have permissions for.
As you might expect there is a manual page for /proc
,
if you are running Solaris 2.5 you will discover another manual page.
There are a whole bunch of commands that use /proc
,
documented under the proc(1) manual page. The
/proc
programming interface is described in
proc(4). Let's take a look at these commands, which live in
/usr/proc/bin
. Here's an excerpt from the manual page, to
save you looking it up.
SunOS 5.5 Last change: 9 Nov 1994 1 proc(1) User Commands proc(1) NAME proc, pflags, pcred, pmap, pldd, psig, pstack, pfiles, pwdx, pstop, prun, pwait, ptree, ptime - proc tools DESCRIPTION The proc tools are utilities which exercise features of /proc (see proc(4)). Most of them take a list of process- ids (pid); those that do also accept /proc/nnn as a process-id, so the shell expansion /proc/* can be used to specify all processes in the system. pflags print the /proc tracing flags, the pend- ing and held signals, and other /proc status information for each lwp in each process. pcred print the credentials (effective, real and saved UID's and GID's) of each pro- cess. pmap print the address space map of each pro- cess. pldd list the dynamic libraries linked into each process, including shared objects explicitly attached using dlopen(3X). (See also ldd(1).) psig list the signal actions of each process (See signal(5).) pstack print a hex+symbolic stack trace for each lwp in each process. pfiles report fstat(2) and fcntl(2) information for all open files in each process. pwdx print the current working directory of each process. pstop stop each process (PR_REQUESTED stop). prun set each process running (inverse of pstop). pwait wait for all of the specified processes to terminate. ptree print the process trees containing the specified pid's or users, with child processes indented from their respective parent processes. An argument of all digits is taken to be a process-id, oth- erwise it is assumed to be a user login name. Default is all processes. ptime time a command, such as the time(1) com- mand, but using microstate accounting for reproducible precision.
That's already opened up a lot more possibilities. The
/proc
interface is designed to support the process
debugging and analyzing tools in Sun's Workshop development tools.
There are a few tantalizing hints here. Look at the description of
pflags, it mentions tracing. And ptime, what is
microstate accounting? A way to get higher precision measurements? We
need to dig further into the programming interface to find out, but
first we'll take a look at a bundled tool that uses /proc
to trace the system calls made by a process.
Tracing in Solaris 2
The /usr/bin/truss
command has many useful features not
found in the SunOS 4 trace command. It can trace child
processes, and it can count and time system calls and signals. Other
options allow named system calls to be excluded or focused on, and data
structures can be printed out in full. Here is an excerpt showing a
fragment of truss output with the -v option to set verbose
mode for data structures, and an example of truss -c showing
the system call counts.
% truss -v all cp NewDocument Tuning execve("/usr/bin/cp", 0xEFFFFB28, 0xEFFFFB38) argc = 3 open("/usr/lib/libintl.so.1", O_RDONLY, 035737561304) = 3 mmap(0x00000000, 4096, PROT_READ, MAP_SHARED, 3, 0) = 0xEF7B0000 fstat(3, 0xEFFFF768) = 0 d=0x0080001E i=29585 m=0100755 l=1 u=2 g=2 sz=14512 at = Apr 27 11:30:14 PDT 1993 [ 735935414 ] mt = Mar 12 18:35:36 PST 1993 [ 731990136 ] ct = Mar 29 11:49:11 PST 1993 [ 733434551 ] bsz=8192 blks=30 fs=ufs .... % truss -c cp NewDocument Tuning syscall seconds calls errors _exit .00 1 write .00 1 open .00 10 4 close .01 7 creat .01 1 chmod .01 1 stat .02 2 1 lseek .00 1 fstat .00 4 execve .00 1 mmap .01 18 munmap .00 9 memcntl .01 1 ---- --- --- sys totals: .07 57 5 usr time: .02 elapsed: .43
I use truss a great deal to find out what a process is doing, which files are being read and written, and with truss -c you can see how long system calls take to execute on average, and where your system CPU time is coming from.
Who, what, when, how much?
Many processes live very short lives. You cannot see them with ps,
but they may be so frequent that they dominate the load on your system.
The only way to catch them is to ask the system to keep a record of
every process that has run, who ran it, what was it, when did it start
and end, and how much resource did it use. This is done by the system
accounting subsystem. For some reason many administrators seem to have
hang-ups about accounting. Perhaps it has connotations of "big brother
is watching you," or they fear additional overhead. In truth, if Fred
complains that his system is too slow, and the accounting records show
that he spends all his time playing Doom, you should not be too
sympathetic! The overhead of collecting accounting data is always
present. When you turn on accounting, you are just enabling storage of
a few bytes of useful data when a process exits.
Accounting data is most useful when measured over a long period of time. This can be useful on a network of workstations as well as on a single time-shared server. From this you can identify how often programs run, how much CPU time, I/O, and memory each program uses, and what work patterns throughout the week look like. To enable accounting to start immediately, enter the three commands shown below. Check out the section "Administering Security, Performance, and Accounting in Solaris 2" in the Solaris System Administration Answerbook and see the acctcom command. Some crontab entries must also be added to summarize and checkpoint the accounting logs. Collecting and checkpointing the accounting data itself puts a negligible additional load onto the system, but the summary scripts that run once a day or once a week can have a noticeable effect, so they should be scheduled to run out of hours.
# ln /etc/init.d/acct /etc/rc0.d/K22acct # ln /etc/init.d/acct /etc/rc2.d/S22acct # /etc/init.d/acct start Starting process accounting
This is what your crontab file for the adm user should contain.
# crontab -l adm #ident "@(#)adm 1.5 92/07/14 SMI" /* SVr4.0 1.2 */ #min hour day month weekday 0 * * * * /usr/lib/acct/ckpacct 30 2 * * * /usr/lib/acct/runacct 2gt; /var/adm/acct/nite/fd2log 30 9 * * 5 /usr/lib/acct/monacctYou get a daily accounting summary, but the one I like to keep track of is the monthly one stored in
/var/adm/acct/fiscal
. Here is
an excerpt from fiscrpt07 on my home system.
Jul 26 09:30 1996 TOTAL COMMAND SUMMARY FOR FISCAL 07 Page 1 TOTAL COMMAND SUMMARY COMMAND NUMBER TOTAL TOTAL TOTAL MEAN MEAN HOG CHARS BLOCKS NAME CMDS KCOREMIN CPU-MIN REAL-MIN SIZE-K CPU-MIN FACTOR TRNSFD READ TOTALS 26488 16062007.75 3960.11 494612.41 4055.95 0.15 0.01 17427899648 39944 mae 36 7142887.25 1501.73 2128.50 4756.45 41.71 0.71 2059814144 1653 sundgado 16 3668645.19 964.83 1074.34 3802.36 60.30 0.90 139549181 76 Xsun 29 1342108.55 251.32 9991.62 5340.18 8.67 0.03 2784769024 1295 xlock 32 1027099.38 726.87 4253.34 1413.04 22.71 0.17 4009349888 15 fountain 2 803036.25 165.11 333.65 4863.71 82.55 0.49 378388 1 netscape 22 489512.97 72.39 3647.61 6762.19 3.29 0.02 887353080 2649 maker4X. 10 426182.31 43.77 5004.30 9736.27 4.38 0.01 803267592 3434 wabiprog 53 355574.99 44.32 972.44 8022.87 0.84 0.05 355871360 570 imagetoo 21 257617.08 15.65 688.46 16456.60 0.75 0.02 64291840 387 java 235 203963.64 37.96 346.35 5373.76 0.16 0.11 155950720 240 aviator 2 101012.82 22.93 29.26 4406.20 11.46 0.78 2335744 40 se.sparc 18 46793.09 19.30 6535.43 2424.47 1.07 0.00 631756294 20 xv 3 40930.98 5.58 46.37 7337.93 1.86 0.12 109690880 28It looks as if my kids have been using it to play games during the day! The commands reported are sorted by KCOREMIN, which is the product of the amount of CPU time used and the amount of RAM used while the command was active. CPU-MIN is the number of minutes of CPU time. REAL_MIN is the elapsed time for the commands. SIZE-K is an average value for the RSS over the active lifetime of the process. It does not include times when the process was not actually running. In Solaris 2.4 and earlier releases a bug causes this measure to be garbage. HOG FACTOR is the ratio of CPU-MIN to REAL-MIN. A high factor means that this command hogs the CPU whenever it is running. CHARS TRNSFD counts the number of characters read and written. BLOCKS READ counts data read from block devices (basically local disk filesystem reads and writes). The underlying data that is collected can be seen in the acct(4) manual page. The data structure looks like this, its very compact, around 40 bytes.
DESCRIPTION Files produced as a result of calling acct(2) have records in the form defined by lt;sys/acct.hgt;, whose contents are: typedef ushort comp_t; /* pseudo "floating point" representation */ /* 3 bit base-8 exponent in the high */ /* order bits, and a 13-bit fraction */ /* in the low order bits. */ struct acct { char ac_flag; /* Accounting flag */ char ac_stat; /* Exit status */ uid_t ac_uid; /* Accounting user ID */ gid_t ac_gid; /* Accounting group ID */ dev_t ac_tty; /* control tty */ time_t ac_btime; /* Beginning time */ comp_t ac_utime; /* accounting user time in clock */ /* ticks */ comp_t ac_stime; /* accounting system time in clock */ /* ticks */ comp_t ac_etime; /* accounting total elapsed time in clock */ /* ticks */ comp_t ac_mem; /* memory usage in clicks (pages) */ comp_t ac_io; /* chars transferred by read/write */ comp_t ac_rw; /* number of block reads/writes */ char ac_comm[8]; /* command name */ };
Process data structures
There isn't a great deal of data in the accounting record. Let's see
what is available from a process that is still running. Actually, a
neat trick is to open an entry in /proc
and wait until the
process has exited. The fact that the /proc
entry is still
open means that you can still get the data described below. When you
close the /proc
"file" the zombie process will disappear.
These data structures are described in full in the proc(4) manual
page. They are also available in the SE toolkit, so if you want to
obtain the data and play around with it you should look at the code for
ps-ax.se and msacct.se. The interface to
/proc
involves sending ioctl commands. The one that ps
uses is called PIOCPSINFO and this is what you get back.
SunOS 5.5 Last change: 28 Mar 1995 2 proc(4) File Formats proc(4) PIOCPSINFO This returns miscellaneous process information such as that reported by ps(1). p is a pointer to a prpsinfo structure containing at least the following fields: typedef struct prpsinfo { char pr_state; /* numeric process state (see pr_sname) */ char pr_sname; /* printable character representing pr_state */ char pr_zomb; /* !=0: process terminated but not waited for */ char pr_nice; /* nice for cpu usage */ u_long pr_flag; /* process flags */ int pr_wstat; /* if zombie, the wait() status */ uid_t pr_uid; /* real user id */ uid_t pr_euid; /* effective user id */ gid_t pr_gid; /* real group id */ gid_t pr_egid; /* effective group id */ pid_t pr_pid; /* process id */ pid_t pr_ppid; /* process id of parent */ pid_t pr_pgrp; /* pid of process group leader */ pid_t pr_sid; /* session id */ caddr_t pr_addr; /* physical address of process */ long pr_size; /* size of process image in pages */ long pr_rssize; /* resident set size in pages */ u_long pr_bysize; /* size of process image in bytes */ u_long pr_byrssize; /* resident set size in bytes */ caddr_t pr_wchan; /* wait addr for sleeping process */ short pr_syscall; /* system call number (if in syscall) */ id_t pr_aslwpid; /* lwp id of the aslwp; zero if no aslwp */ timestruc_t pr_start; /* process start time, sec+nsec since epoch */ timestruc_t pr_time; /* usr+sys cpu time for this process */ timestruc_t pr_ctime; /* usr+sys cpu time for reaped children */ long pr_pri; /* priority, high value is high priority */ char pr_oldpri; /* pre-SVR4, low value is high priority */ char pr_cpu; /* pre-SVR4, cpu usage for scheduling */ u_short pr_pctcpu; /* % of recent cpu time, one or all lwps */ u_short pr_pctmem; /* % of system memory used by the process */ dev_t pr_ttydev; /* controlling tty device (PRNODEV if none) */ char pr_clname[PRCLSZ]; /* scheduling class name */ char pr_fname[PRFNSZ]; /* last component of exec()ed pathname */ char pr_psargs[PRARGSZ];/* initial characters of arg list */ int pr_argc; /* initial argument count */ char **pr_argv; /* initial argument vector */ char **pr_envp; /* initial environment vector */ } prpsinfo_t;For a multithreaded process it is possible to get the data for each lightweight process separately. There's a lot more useful looking information there, but no sign of the high resolution microstate accounting that
/usr/proc/bin/ptime
(and msacct.se)
display. They use a separate ioctl, PIOCUSAGE.
SunOS 5.5 Last change: 28 Mar 1995 18 proc(4) File Formats proc(4) PIOCUSAGE When applied to the process file descriptor, PIOCUSAGE returns the process usage information; when applied to an lwp file descriptor, it returns usage information for the specific lwp. p points to a prusage structure which is filled by the operation. The prusage structure contains at least the following fields: typedef struct prusage { id_t pr_lwpid; /* lwp id. 0: process or defunct */ u_long pr_count; /* number of contributing lwps */ timestruc_t pr_tstamp; /* current time stamp */ timestruc_t pr_create; /* process/lwp creation time stamp */ timestruc_t pr_term; /* process/lwp termination time stamp */ timestruc_t pr_rtime; /* total lwp real (elapsed) time */ timestruc_t pr_utime; /* user level CPU time */ timestruc_t pr_stime; /* system call CPU time */ timestruc_t pr_ttime; /* other system trap CPU time */ timestruc_t pr_tftime; /* text page fault sleep time */ timestruc_t pr_dftime; /* data page fault sleep time */ timestruc_t pr_kftime; /* kernel page fault sleep time */ timestruc_t pr_ltime; /* user lock wait sleep time */ timestruc_t pr_slptime; /* all other sleep time */ timestruc_t pr_wtime; /* wait-cpu (latency) time */ timestruc_t pr_stoptime; /* stopped time */ u_long pr_minf; /* minor page faults */ u_long pr_majf; /* major page faults */ u_long pr_nswap; /* swaps */ u_long pr_inblk; /* input blocks */ u_long pr_oublk; /* output blocks */ u_long pr_msnd; /* messages sent */ u_long pr_mrcv; /* messages received */ u_long pr_sigs; /* signals received */ u_long pr_vctx; /* voluntary context switches */ u_long pr_ictx; /* involuntary context switches */ u_long pr_sysc; /* system calls */ u_long pr_ioch; /* chars read and written */ } prusage_t; PIOCUSAGE can be applied to a zombie process (see PIOCPSINFO). Applying PIOCUSAGE to a process that does not have micro- state accounting enabled will enable microstate accounting and return an estimate of times spent in the various states up to this point. Further invocations of PIOCUSAGE will yield accurate microstate time accounting from this point. To disable microstate accounting, use PIOCRESET with the PR_MSACCT flag.There is a lot of useful data here. The time spent waiting for various events is a key measure. I summarize it in msacct.se like this:
Elapsed time 3:20:50.049 Current time Fri Jul 26 12:49:28 1996 User CPU time 2:11.723 System call time 1:54.890 System trap time 0.006 Text pfault sleep 0.000 Data pfault sleep 0.023 Kernel pfault sleep 0.000 User lock sleep 0.000 Other sleep time 3:16:43.022 Wait for CPU time 0.382 Stopped time 0.000The other thing to notice is that microstate accounting is not turned on by default. It slows the system down slightly, and while it was on by default up to Solaris 2.3, from Solaris 2.4 onwards it is only collected if you use it. The way CPU time is normally measured is to sample the state of all the CPUs from the clock interrupt 100 times a second. The way microstate accounting works, a high resolution timestamp is taken on every state change, every system call, every page fault, every scheduler change. It doesn't miss anything.
Wrap up
There are a lot more things that can be done with /proc
,
but I hope I have opened your eyes to some of the most useful
performance information available in Solaris 2.
|
Resources
percollator.se
http://www.sun.com/sun-on-net/www.sun.com/percol/Percollator.html
About the author
Adrian Cockcroft joined Sun in 1988, and currently works as a performance specialist for the Server Division of SMCC. He wrote Sun Performance and Tuning: SPARC and Solaris, published by SunSoft Press PTR Prentice Hall.
Reach Adrian at adrian.cockcroft@sunworld.com.
If you have technical problems with this magazine, contact webmaster@sunworld.com
URL: http://www.sunworld.com/swol-08-1996/swol-08-perf.html
Last modified: