Unveiling vmstat's charms
vmstat offers too much information to the uninitiated.
The vmstat command is one of the best known performance utilities. We'll explore the sources of its data, see what extra information is available, and look at a few related commands. (2,600 words)
Q: What do all those columns of data in vmstat mean? How do they relate to the data from mpstat, and where does it all come from?
--statting in Sturgeon Bay
We covered vmstat and the virtual memory system that it monitors in the October 1995 column, "Help! I've lost my memory!" We haven't looked behind the scenes, though, to see what other data is available. Following on from recent columns on the underlying disk data and the process data discussed last month, we'll hunt around the header files provided with Solaris and see what we can find.

First, let's remind ourselves what vmstat itself looks like.
% vmstat 5
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf  pi po fr de sr f0 s0 s2 s3   in   sy   cs us sy id
 0 0 0  72724 25348   0   2   3  1  1  0  0  0  0  1  0   63  362   85  1  1 98
 0 0 0  64724 25184   0  24  56  0  0  0  0  0  0 19  0  311 1112  356  2  4 94
 0 0 0  64724 24796   0   5  38  0  0  0  0  0  0 15  0   92  325  212  0  1 99
 0 0 0  64680 24584   0  12 106  0  0  0  0  0  0 41  0  574 1094  340  2  5 93
 0 1 0  64632 23736   0   0 195  0  0  0  0  0  0 66  0  612  870  270  1  7 92
 0 0 0  64628 22796   0   0 144  0  0  0  0  0  0 59  0  398  764  222  1  8 91
 0 0 0  64620 22796   0   0  79  0  0  0  0  0  0 50  0  255 1383  136  2 18 80
The command printed the first line of data immediately, then a new line every five seconds giving the average rates over that five-second interval. The first line is also an average rate, over the interval that started when the system was booted! The reason is that the numbers are stored by the system as counts of the number of times each event has happened. To get the average over a time interval, you measure the counters at the start and the end, and divide the difference by the time interval. For the very first measure there is nothing to subtract, so you automatically get the count since boot divided by the time since boot. The absolute counters themselves can be seen using another option to vmstat, as shown below.
% vmstat -s
        0 swap ins
        0 swap outs
        0 pages swapped in
        0 pages swapped out
   208724 total address trans. faults taken
    45821 page ins
     3385 page outs
    61800 pages paged in
    27132 pages paged out
      712 total reclaims
      712 reclaims from free list
        0 micro (hat) faults
   208724 minor (as) faults
    44636 major faults
    34020 copy-on-write faults
    77883 zero fill page faults
     9098 pages examined by the clock daemon
        1 revolutions of the clock hand
    27748 pages freed by the clock daemon
     1333 forks
      187 vforks
     1589 execs
  6730851 cpu context switches
 12848989 device interrupts
   340014 traps
 28393796 system calls
   285638 total name lookups (cache hits 91%)
      108 toolong
   159288 user   cpu
   123409 system cpu
 15185004 idle   cpu
   192794 wait   cpu
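To see how such counters become vmstat's columns, here is a toy sketch of the interval arithmetic. This is not vmstat source code, and the second sample is an invented value, chosen just to make the division come out cleanly.

/* rate.c -- toy illustration of turning cumulative kernel counters
 * into the per-second rates that vmstat prints.
 * Compile with: cc -o rate rate.c
 */
#include <stdio.h>

int main(void)
{
    unsigned long prev = 28393796; /* system calls at start of interval */
    unsigned long curr = 28399321; /* invented value five seconds later */
    double interval = 5.0;         /* seconds between the two samples   */

    /* the difference divided by the elapsed time is the sy column;
     * the very first line of vmstat output is the same calculation
     * with prev set to zero and interval set to the time since boot */
    printf("sy = %.0f system calls/sec\n", (curr - prev) / interval);
    return 0;
}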
The other closely related command is mpstat, which shows basically the same data but on a per-CPU basis. Here's some output from a dual-CPU system.
% mpstat 5
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    1   0    4    82   17   43    0    5    1    0   182    1   1   1  97
  2    1   0    3    81   17   42    0    5    2    0   181    1   1   1  97
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    2   0   39   156  106   42    0    5   21    0    30    0   2  61  37
  2    0   0    0   158  106  103    5    4    8    0  1704    3  36  61   0
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0  19   28   194  142   96    1    4   18    0   342    1   8  76  16
  2    0   6   11   193  141   62    4    4   10    0   683    5  15  74   6
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0  22   33   215  163   87    0    7    0    0   287    1   4  90   5
  2    0  22   29   214  164   88    2    8    1    0   304    2   5  89   4
Where does it all come from?
The data is read from the kernel statistics interface. Most of the data is maintained on a per-CPU basis by the kernel and is combined into the overall summaries by the commands themselves. It comes through the same kstat(3K) programming interface that is used to get at the per-disk statistics, which is quite different from the per-process interface I described last month in "Probing Processes". It is also very lightweight; it takes only a few microseconds to retrieve each data structure. The data structures are based on those described in the file /usr/include/sys/sysinfo.h, the system information header file. Of course, all the raw kstat data is directly available to the SE toolkit, which contains a customized vmstat.se script and a hybrid mpvmstat.se.
While the kstat programming interface is stable, the metrics obtained via that interface are not fixed. They vary from one type of hardware to another, from one OS release or patch to another, and are extremely dependent on the particular implementation in use. The metrics obtained by vmstat don't seem to vary much, but they could change without warning. Since both performance and behavior are also implementation dependent, you just have to put up with this problem. Performance tools are amongst the least portable software products available.
Process queues
If we follow through the fields of vmstat, the first group is the one labelled procs: the r, b, and w columns. It is derived from the sysinfo data, for which there is a single global kstat.
typedef struct sysinfo {        /* (update freq) update action          */
        ulong   updates;        /* (1 sec) ++                           */
        ulong   runque;         /* (1 sec) += num runnable procs        */
        ulong   runocc;         /* (1 sec) ++ if num runnable procs > 0 */
        ulong   swpque;         /* (1 sec) += num swapped procs         */
        ulong   swpocc;         /* (1 sec) ++ if num swapped procs > 0  */
        ulong   waiting;        /* (1 sec) += jobs waiting for I/O      */
} sysinfo_t;
As the comments indicate, it is updated once per second. The extra values, runocc and swpocc, are displayed by sar -q and are the occupancy of the queues. Solaris 2 counts the total number of swapped-out idle processes, so if you see any swapped jobs registered here, there is no cause for alarm. sar -q is strange: if the number to be displayed is zero, it displays nothing at all, just white space. This makes it very hard to extract the data to be plotted.
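To show how little code the kstat(3K) interface needs, here is a minimal sketch (my code, not sar's) that reads the global sysinfo kstat and derives sar -q style queue figures from the boot-time totals. Treating the average queue length as runque divided by runocc is my reading of the header comments above.

/* runq.c -- sketch: read the global sysinfo kstat and print
 * sar -q style run queue figures, averaged since boot.
 * Compile on Solaris 2.x with: cc -o runq runq.c -lkstat
 */
#include <stdio.h>
#include <kstat.h>
#include <sys/sysinfo.h>

int main(void)
{
    kstat_ctl_t *kc;
    kstat_t     *ksp;
    sysinfo_t   *si;

    if ((kc = kstat_open()) == NULL) {
        perror("kstat_open");
        return 1;
    }
    /* the one-second accumulations live in the unix:0:sysinfo kstat */
    if ((ksp = kstat_lookup(kc, "unix", 0, "sysinfo")) == NULL ||
        kstat_read(kc, ksp, NULL) == -1) {
        perror("sysinfo kstat");
        return 1;
    }
    si = (sysinfo_t *)ksp->ks_data;

    /* runque is summed once a second, and runocc counts the seconds
     * the queue was non-empty, so their ratio is the average queue
     * length, and runocc/updates is the fraction of time occupied */
    if (si->runocc > 0 && si->updates > 0)
        printf("runq-sz %.1f  %%runocc %.0f\n",
            (double)si->runque / si->runocc,
            100.0 * si->runocc / si->updates);
    (void)kstat_close(kc);
    return 0;
}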
Virtual memory counters
The next part of vmstat lists the free swap space and memory. This is obtained as a kstat from a single global vminfo structure.
typedef struct vminfo {         /* (update freq) update action          */
        longlong_t freemem;     /* (1 sec) += freemem in pages          */
        longlong_t swap_resv;   /* (1 sec) += reserved swap in pages    */
        longlong_t swap_alloc;  /* (1 sec) += allocated swap in pages   */
        longlong_t swap_avail;  /* (1 sec) += unreserved swap in pages  */
        longlong_t swap_free;   /* (1 sec) += unallocated swap in pages */
} vminfo_t;
The only swap number shown by vmstat is swap_avail, which is the most important one. If it ever gets to zero, your system will hang and be unable to start more processes! For some strange reason, sar -r reports swap_free instead, and converts the data into stupid units of 512-byte blocks. The bizarre state of the sar command is one of the reasons we were motivated to create the SE toolkit in the first place!
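Here is a similar sketch for the vminfo kstat. Note the assumption I'm making: as the header comments indicate, these values accumulate once a second just like sysinfo, so a single read gives totals that must be divided by the sysinfo updates count to get an average (vmstat differences two samples over its interval instead).

/* swapavail.c -- sketch: read the vminfo kstat and report average
 * unreserved swap since boot. My code, not vmstat's.
 * Compile with: cc -o swapavail swapavail.c -lkstat
 */
#include <stdio.h>
#include <unistd.h>
#include <kstat.h>
#include <sys/sysinfo.h>

int main(void)
{
    kstat_ctl_t *kc = kstat_open();
    kstat_t     *ksp;
    sysinfo_t   si;
    vminfo_t    vm;

    if (kc == NULL) { perror("kstat_open"); return 1; }
    if ((ksp = kstat_lookup(kc, "unix", 0, "sysinfo")) == NULL ||
        kstat_read(kc, ksp, &si) == -1 ||
        (ksp = kstat_lookup(kc, "unix", 0, "vminfo")) == NULL ||
        kstat_read(kc, ksp, &vm) == -1) {
        perror("kstat");
        return 1;
    }
    /* divide the accumulated page count by the number of one-second
     * updates to get the average, then convert pages to Kbytes */
    if (si.updates > 0)
        printf("avg swap_avail: %lld Kbytes\n",
            (long long)(vm.swap_avail / si.updates) *
            (sysconf(_SC_PAGESIZE) / 1024));
    (void)kstat_close(kc);
    return 0;
}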
Paging counters
This one is per-CPU, and it's also clear that vmstat and sar don't show all of the available information. The states and state transitions being counted were described in detail in my first column on memory issues.
typedef struct cpu_vminfo {
        ulong pgrec;            /* page reclaims (includes pageout)     */
        ulong pgfrec;           /* page reclaims from free list         */
        ulong pgin;             /* pageins                              */
        ulong pgpgin;           /* pages paged in                       */
        ulong pgout;            /* pageouts                             */
        ulong pgpgout;          /* pages paged out                      */
        ulong swapin;           /* swapins                              */
        ulong pgswapin;         /* pages swapped in                     */
        ulong swapout;          /* swapouts                             */
        ulong pgswapout;        /* pages swapped out                    */
        ulong zfod;             /* pages zero filled on demand          */
        ulong dfree;            /* pages freed by daemon or auto        */
        ulong scan;             /* pages examined by pageout daemon     */
        ulong rev;              /* revolutions of the page daemon hand  */
        ulong hat_fault;        /* minor page faults via hat_fault()    */
        ulong as_fault;         /* minor page faults via as_fault()     */
        ulong maj_fault;        /* major page faults                    */
        ulong cow_fault;        /* copy-on-write faults                 */
        ulong prot_fault;       /* protection faults                    */
        ulong softlock;         /* faults due to software locking req   */
        ulong kernel_asflt;     /* as_fault()s in kernel addr space     */
        ulong pgrrun;           /* times pager scheduled                */
} cpu_vminfo_t;
A few of these might need some extra explanation. Protection faults occur when a program tries to access memory it shouldn't: it gets a segmentation violation signal and dumps a core file. hat faults occur only on systems that have a software-managed memory management unit (sun4c and sun4u). Trivia alert: hat stands for hardware address translation.
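These per-CPU structures are not a single global kstat; each processor registers its own, so a tool has to walk the kstat chain and sum them, the way vmstat -s must. Here is a sketch, assuming the layout the SE toolkit relies on: one kstat per CPU under the cpu_stat module, holding a cpu_stat_t that bundles cpu_sysinfo_t, cpu_syswait_t, and cpu_vminfo_t.

/* pagein.c -- sketch: walk the kstat chain and sum the per-CPU
 * cpu_vminfo pagein counts. Compile: cc -o pagein pagein.c -lkstat
 */
#include <stdio.h>
#include <string.h>
#include <kstat.h>
#include <sys/sysinfo.h>

int main(void)
{
    kstat_ctl_t   *kc = kstat_open();
    kstat_t       *ksp;
    cpu_stat_t    cs;
    unsigned long pgpgin = 0;

    if (kc == NULL) { perror("kstat_open"); return 1; }
    /* visit every kstat in the chain, keeping only the cpu_stat ones */
    for (ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
        if (strcmp(ksp->ks_module, "cpu_stat") != 0)
            continue;
        if (kstat_read(kc, ksp, &cs) == -1)
            continue;
        pgpgin += cs.cpu_vminfo.pgpgin;  /* sum across all CPUs */
    }
    printf("pages paged in since boot: %lu\n", pgpgin);
    (void)kstat_close(kc);
    return 0;
}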
I'll skip the disk counters printed by vmstat, as it is just reading the same kstat data as iostat and providing a crude count of the number of operations.
CPU usage and event counters
Let's remind ourselves what vmstat looks like.

% vmstat 5
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf  pi po fr de sr f0 s0 s2 s3   in   sy   cs us sy id
 0 0 0  72724 25348   0   2   3  1  1  0  0  0  0  1  0   63  362   85  1  1 98

The last six columns show the interrupt rate, system call rate, context switch rate, and CPU user, system, and idle time. The per-CPU kstat that these are derived from is the biggest kstat yet, with about sixty values. Some of them are summarized by sar, but there is a lot of interesting information here that is being carefully recorded by the kernel, read by vmstat, and then just thrown away. Look down the comments, and at the end I'll point out some non-obvious and interesting values. The cpu and wait states are arrays holding the four CPU states usr/sys/idle/wait and the three wait states io/swap/pio. Only the io wait state is implemented, so this is actually redundant. I made the mpvmstat.se script display the wait states before I realized that they were always zero :-(.
typedef struct cpu_sysinfo {
        ulong cpu[CPU_STATES];  /* CPU utilization                      */
        ulong wait[W_STATES];   /* CPU wait time breakdown              */
        ulong bread;            /* physical block reads                 */
        ulong bwrite;           /* physical block writes (sync+async)   */
        ulong lread;            /* logical block reads                  */
        ulong lwrite;           /* logical block writes                 */
        ulong phread;           /* raw I/O reads                        */
        ulong phwrite;          /* raw I/O writes                       */
        ulong pswitch;          /* context switches                     */
        ulong trap;             /* traps                                */
        ulong intr;             /* device interrupts                    */
        ulong syscall;          /* system calls                         */
        ulong sysread;          /* read() + readv() system calls        */
        ulong syswrite;         /* write() + writev() system calls      */
        ulong sysfork;          /* forks                                */
        ulong sysvfork;         /* vforks                               */
        ulong sysexec;          /* execs                                */
        ulong readch;           /* bytes read by rdwr()                 */
        ulong writech;          /* bytes written by rdwr()              */
        ulong rcvint;           /* XXX: UNUSED                          */
        ulong xmtint;           /* XXX: UNUSED                          */
        ulong mdmint;           /* XXX: UNUSED                          */
        ulong rawch;            /* terminal input characters            */
        ulong canch;            /* chars handled in canonical mode      */
        ulong outch;            /* terminal output characters           */
        ulong msg;              /* msg count (msgrcv()+msgsnd() calls)  */
        ulong sema;             /* semaphore ops count (semop() calls)  */
        ulong namei;            /* pathname lookups                     */
        ulong ufsiget;          /* ufs_iget() calls                     */
        ulong ufsdirblk;        /* directory blocks read                */
        ulong ufsipage;         /* inodes taken with attached pages     */
        ulong ufsinopage;       /* inodes taken with no attached pages  */
        ulong inodeovf;         /* inode table overflows                */
        ulong fileovf;          /* file table overflows                 */
        ulong procovf;          /* proc table overflows                 */
        ulong intrthread;       /* interrupts as threads (below clock)  */
        ulong intrblk;          /* intrs blkd/prempted/released (swtch) */
        ulong idlethread;       /* times idle thread scheduled          */
        ulong inv_swtch;        /* involuntary context switches         */
        ulong nthreads;         /* thread_create()s                     */
        ulong cpumigrate;       /* cpu migrations by threads            */
        ulong xcalls;           /* xcalls to other cpus                 */
        ulong mutex_adenters;   /* failed mutex enters (adaptive)       */
        ulong rw_rdfails;       /* rw reader failures                   */
        ulong rw_wrfails;       /* rw writer failures                   */
        ulong modload;          /* times loadable module loaded         */
        ulong modunload;        /* times loadable module unloaded       */
        ulong bawrite;          /* physical block writes (async)        */
/* Following are gathered only under #ifdef STATISTICS in source       */
        ulong rw_enters;        /* tries to acquire rw lock             */
        ulong win_uo_cnt;       /* reg window user overflows            */
        ulong win_uu_cnt;       /* reg window user underflows           */
        ulong win_so_cnt;       /* reg window system overflows          */
        ulong win_su_cnt;       /* reg window system underflows         */
        ulong win_suo_cnt;      /* reg window system user overflows     */
} cpu_sysinfo_t;
Some of the numbers printed by mpstat are visible here. The smtx value used to watch for kernel contention is mutex_adenters. The srw value is the sum of the failures to obtain a readers/writer lock. The term xcalls is shorthand for cross-calls; a cross-call occurs when one CPU passes work to another CPU by interrupting it.
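The same chain walk shown earlier yields these contention figures directly. A sketch (mine, not mpstat's source) that prints boot-time totals rather than the per-interval rates mpstat shows:

/* mpsketch.c -- sketch: per-CPU lock contention and cross-call
 * totals since boot, from each cpu_stat kstat.
 * Compile with: cc -o mpsketch mpsketch.c -lkstat
 */
#include <stdio.h>
#include <string.h>
#include <kstat.h>
#include <sys/sysinfo.h>

int main(void)
{
    kstat_ctl_t *kc = kstat_open();
    kstat_t     *ksp;
    cpu_stat_t  cs;

    if (kc == NULL) { perror("kstat_open"); return 1; }
    for (ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
        if (strcmp(ksp->ks_module, "cpu_stat") != 0 ||
            kstat_read(kc, ksp, &cs) == -1)
            continue;
        /* smtx = failed adaptive mutex enters; srw = reader plus
         * writer lock failures; xcal = cross-calls to other CPUs */
        printf("CPU %d: smtx %lu srw %lu xcal %lu\n",
            ksp->ks_instance,
            cs.cpu_sysinfo.mutex_adenters,
            cs.cpu_sysinfo.rw_rdfails + cs.cpu_sysinfo.rw_wrfails,
            cs.cpu_sysinfo.xcalls);
    }
    (void)kstat_close(kc);
    return 0;
}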
Could you do better?
vmstat displays 22 columns of numbers, summarizing more than 100 underlying measures (even more on a multiprocessor). It's good to have a lot of different things summarized together, but the layout of vmstat (and sar) is as much a result of their long history as of deliberate design.
I'm afraid I'm going to end up plugging the SE toolkit again. It's just so easy to get at this data and do things with it. All the kstats can be read by any user, with no need to be setuid root. (This is a key advantage of Solaris 2; other Unix systems read the kernel directly, so you would have to obtain root permissions.)
If you want to customize your very own vmstat, you could either write one from scratch in C using the kstat library, or load the SE toolkit and spend a few seconds hacking at a trivial script. Either way, if you come up with something that you think is an improvement, while staying with the basic concept of one line of output that fits in 80 columns, send it to me. The best ones will be highlighted in a future column, and we'll add them to the next version of SE.
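As a starting point for the C route, here is a skeleton that prints one line per interval from the sysinfo queue counters. It is a sketch only; the real vmstat reads many more kstats than this.

/* myvmstat.c -- skeleton for a home-grown vmstat: sample the sysinfo
 * kstat every five seconds and print the queue length deltas.
 * Compile with: cc -o myvmstat myvmstat.c -lkstat
 */
#include <stdio.h>
#include <unistd.h>
#include <kstat.h>
#include <sys/sysinfo.h>

int main(void)
{
    kstat_ctl_t *kc = kstat_open();
    kstat_t     *ksp;
    sysinfo_t   prev, curr;

    if (kc == NULL || (ksp = kstat_lookup(kc, "unix", 0, "sysinfo")) == NULL)
        return 1;
    (void)kstat_read(kc, ksp, &prev);
    printf("%8s %8s\n", "runque", "swpque");
    for (;;) {
        (void)sleep(5);
        /* a production tool would also call kstat_chain_update(3K)
         * and re-do the lookup in case the kstat chain has changed */
        if (kstat_read(kc, ksp, &curr) == -1 ||
            curr.updates == prev.updates)
            continue;
        /* the queues accumulate once a second, so divide the deltas
         * by the number of one-second updates in the interval */
        printf("%8.1f %8.1f\n",
            (double)(curr.runque - prev.runque) /
                    (curr.updates - prev.updates),
            (double)(curr.swpque - prev.swpque) /
                    (curr.updates - prev.updates));
        prev = curr;
    }
    /* NOTREACHED */
}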
If you are curious about some of these numbers but can't be bothered to write your own SE scripts, you should try out the GUI front end to the raw kstat data that is provided in the SE toolkit as /opt/RICHPse/examples/infotool.se. A sample snapshot is shown above.
About the author
Adrian Cockcroft joined Sun in 1988, and currently works as a performance specialist for the Server Division of SMCC. He wrote Sun Performance and Tuning: SPARC and Solaris, published by SunSoft Press PTR Prentice Hall.
Reach Adrian at adrian.cockcroft@sunworld.com.