What's the holdup?
How is wait time measured? Are the results exaggerated?
Overall CPU time is sometimes reported as user, system, wait, and idle. The concept of wait time is confusing, and in some circumstances an inflated value is reported. Adrian explains what it all means and describes a more accurate measurement method. (1,600 words)
This is one of the most common "frequently asked questions." I'll describe how wait time is measured, what it does and doesn't include, and the problems that occur in current implementations. We've made a change to the way this value is calculated for Solaris 7 that obtains more accurate results.
As I mentioned in my June 1998 column (see Resources section below), CPU time is measured using the clock interrupt, which samples the state of each CPU one hundred times a second. If the CPU is idle, an extra check is made to see whether there are any block disk I/O requests outstanding. If there are, the time is counted as I/O wait rather than idle time. The only possible cause of wait time is normal disk I/O to a block device. Nothing else is counted: no network waits, and no semaphore or database lock waits. If your system is using asynchronous I/O to raw disks, that is also not counted as a cause of CPU wait.
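The per-tick accounting just described can be sketched in a few lines. This is an illustrative simplification, not the actual Solaris kernel code; the function and variable names here are invented for the example.

```python
# Sketch of the 100 Hz clock-tick accounting described above.
# NOT the Solaris kernel source; all names are invented for illustration.

HZ = 100  # clock interrupt rate: each tick charges 1/100 of a second

def classify_tick(cpu_state, blocked_block_io):
    """Classify one clock tick for a single CPU.

    cpu_state: "user", "system", or "idle"
    blocked_block_io: outstanding block-device I/O requests at this tick
    """
    if cpu_state != "idle":
        return cpu_state      # CPU is running code: user or system time
    if blocked_block_io > 0:
        return "wait"         # idle, but block I/O is outstanding
    return "idle"             # truly idle

# One second of samples on a uniprocessor doing a burst of disk reads:
ticks = [("idle", 1)] * 30 + [("system", 0)] * 10 + [("idle", 0)] * 60
counts = {"user": 0, "system": 0, "idle": 0, "wait": 0}
for state, blocked in ticks:
    counts[classify_tick(state, blocked)] += 1

print(counts)  # {'user': 0, 'system': 10, 'idle': 60, 'wait': 30}
```

Note that only the block-I/O queue is consulted: a process sleeping on a network read or a lock leaves the CPU counted as plain idle.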
The block I/O request queue is reported by vmstat in its second column. To show what happens, three copies of find were run on the same disk. The output from both vmstat and iostat is shown below.
Figure 1: Example vmstat and iostat output
% vmstat 5
 procs     memory            page            disk          faults      cpu
 r b w   swap   free  re  mf  pi po fr de sr s3 -- -- --   in   sy  cs us sy id
 0 0 0  12520  11204   0   3   7  4  5  0 12  2  0  0  0   75  615  84  6  3 91
 0 0 0 140328  17144   0  21  66  0  0  0  0 23  0  0  0  183  450 109 16  8 75
 0 1 0 140292  16508   0   0 131  0  0  0  0 46  0  0  0  342  718 150 17 14 68
 0 1 0 140136  15824   0  43  76  0  0  0  0 41  0  0  0  278 2390 181 39 31 29
 0 2 0 140016  15508   0   0  27  0  0  0  0 44  0  0  0  332  981 125 17 12 71
 0 2 0 140016  15332   0   0  42  0  0  0  0 40  0  0  0  202  550 143  8 10 82
 0 1 0 140016  15176   0   0  24  0  0  0  0 41  0  0  0  206  320  87  5  6 90

% iostat -xc 5
                  extended device statistics                        cpu
 device  r/s  w/s   kr/s   kw/s wait actv  svc_t  %w  %b  us sy wt id
 sd3     1.4  0.8    7.3   10.3  0.7  0.1  364.2   1   4   6  3  3 88
                  extended device statistics                        cpu
 device  r/s  w/s   kr/s   kw/s wait actv  svc_t  %w  %b  us sy wt id
 sd3     0.6  2.4    2.4   18.6  0.0  0.0   12.0   0   4   5  1  4 90
                  extended device statistics                        cpu
 device  r/s  w/s   kr/s   kw/s wait actv  svc_t  %w  %b  us sy wt id
 sd3    12.6  1.2  100.6    5.9  0.0  0.4   29.5   0  33  19 14 28 39
                  extended device statistics                        cpu
 device  r/s  w/s   kr/s   kw/s wait actv  svc_t  %w  %b  us sy wt id
 sd3    24.2 16.6  193.6  124.4  0.9  2.2   77.9  26 100  42 32 26  0
                  extended device statistics                        cpu
 device  r/s  w/s   kr/s   kw/s wait actv  svc_t  %w  %b  us sy wt id
 sd3    21.0 18.8  168.0  150.4  0.7  2.4   78.4  33 100  14 18 69  0
You can see that after a few seconds there were one or two blocked processes, shown in the "b" column of vmstat. This corresponds to the idle time moving to wait time as reported by iostat. The two commands weren't synchronized, so the output doesn't match exactly. Also, the first line of output from each command should be ignored, as it's a long-term average.
This output was collected on a uniprocessor system. When there's only one CPU involved, and it's idle, then the CPU must truly be waiting for the I/O to complete. If the I/O completed more quickly, then the CPU could do more work, so wait time is the proportion of the time that the system is I/O bound.
On a multiprocessor system the situation gets more complex. All measurements are made on a per-processor basis, and these values can be seen using mpstat. During the clock interrupt, the number of blocked jobs is totaled across all the CPUs. If this total is nonzero, then when each CPU's state is sampled, every idle CPU is recorded as waiting rather than idle.
Even a single blocked I/O request can change every CPU from idle to "waiting." Because only one CPU can process that request, this is obviously an overestimate of the situation. However, this is the way most Unix systems calculate the value. The resulting mpstat output is shown below for a nine-CPU system running Solaris 2.6 that was idle apart from a single find command. You can see that there is some activity on CPU 9, but that every CPU is now reported as waiting.
Figure 2: Example mpstat output on Solaris 2.6
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0    0    0    8    0    0    0    0     0    0   0  92   8
  1    0   0    0    1    1    6    0    0    0    0     0    0   0  92   8
  8    0   0    0  368  368   12    0    0    0    0     4    0   0  92   8
  9    0 103 2089  201    1  237    1    0    0    0  1395    4   4  92   0
 12    0   0    0    0    0    0    0    0    0    0     0    0   0  92   8
 13    0   0    0    0    0    6    0    0    0    0     0    0   0  92   8
 16    0   0    0    0    0    5    0    0    0    0     0    0   0  92   8
 20    0   0    0    0    0    9    0    0    0    0     0    0   0  92   8
 21    0   0    0    1    1    7    0    0    0    0    29    0   0  92   8
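The inflation follows directly from the algorithm: the blocked counts are summed across all CPUs before any idle CPU is classified. A minimal sketch of that per-tick logic (invented names; not the actual kernel source):

```python
# Sketch of the pre-Solaris 7 multiprocessor accounting: each idle CPU is
# classified against the GLOBAL blocked count. Names invented for the example.

def classify_tick_global(cpu_states, blocked_per_cpu):
    """Classify one clock tick for every CPU using the global blocked total."""
    total_blocked = sum(blocked_per_cpu)  # summed across all CPUs
    result = []
    for state in cpu_states:
        if state != "idle":
            result.append(state)
        elif total_blocked > 0:
            result.append("wait")         # EVERY idle CPU becomes "wait"
        else:
            result.append("idle")
    return result

# Nine otherwise-idle CPUs and a single blocked I/O request:
states = ["idle"] * 9
blocked = [0] * 8 + [1]
print(classify_tick_global(states, blocked))
# All nine CPUs report "wait" for that one request.
```

One outstanding request can be serviced by at most one CPU, yet all nine ticks are charged as wait time, which is exactly the across-the-board wt=92 pattern in Figure 2.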
A work-around for this inflation is to divide the wait time by the number of CPUs and credit the remainder back to idle time. This prevents the gross exaggeration of wait time, but it still isn't accurate.
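Applied to the Figure 2 numbers, the adjustment looks like this. The exact scaling rule is an assumption here (reported wait divided by the CPU count, remainder returned to idle), sketched for illustration:

```python
# Sketch of the divide-by-NCPU work-around described above. The rule assumed:
# adjusted_wait = raw_wait / ncpus, with the remainder credited back to idle.

def adjust_wait(raw_wait_pct, raw_idle_pct, ncpus):
    """Scale down inflated wait time and return (wait, idle) percentages."""
    adjusted_wait = raw_wait_pct / ncpus
    adjusted_idle = raw_idle_pct + (raw_wait_pct - adjusted_wait)
    return adjusted_wait, adjusted_idle

# Each idle CPU in Figure 2 reports wt=92, idl=8 on the nine-CPU system:
wait_pct, idle_pct = adjust_wait(92, 8, 9)
print(round(wait_pct, 1), round(idle_pct, 1))  # 10.2 89.8
```

The total still sums to 100 percent, but a flat division like this has no idea which CPU the I/O actually belonged to, which is why it remains only an approximation.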
Some time ago, while looking at the algorithm, I noticed that the number of blocked I/O requests was already being maintained on a per-CPU basis. This means the per-CPU blocked value could be used in the per-CPU calculation of wait time. I filed a bug, and the fix was accepted as a change for Solaris 2.7, which will be released shortly. With the same workload, the difference can be seen using mpstat. The I/O requests move from one CPU to another, so when examined at five-second intervals the wait time can be spread over several CPUs. However, the overall level of wait time is far lower, because it now accurately represents the number of CPUs that were waiting at each measurement interval. In this case, the command stayed primarily on CPU 5, and only that CPU shows high wait time on this 18-processor system.
Figure 3: Example mpstat output on Solaris 2.7
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0    0    0    7    0    0    0    0    15    0   0   0 100
  1    0   0    0   79    7    6    0    0    0    0     1    0   3   0  97
  4    0   0    0    1    1    2    0    0    0    0     7    0   0   0 100
  5    0  94 8063    2    0  201    1    0    0    0  1257    4   7  89   0
  8    0   0    0    1    1    3    0    0    0    0   306    0   0   0 100
  9    0   0  108    0    0    8    0    0    1    0     2    0   0   1  99
 12   20   0    6    0    0    8    0    0    0    0    28    0   0   0 100
 13    0   0    0    1    1   10    0    0    1    0     0    0   0   0 100
 16    0   0    0    0    0    8    0    0    0    0     1    0   0   0 100
 17    0   0    0    1    1   11    0    0    0    0     4    0   0   0 100
 20    0   0    0    0    0    7    0    0    0    0     0    0   0   0 100
 21    0   0    6  342  342    1    0    0    0    0    63    0   0   0 100
 24    0   0   68  205    5    0    0    0    0    0     0    0   0   0 100
 25    0   0    0    0    0    7    0    0    0    0     0    0   0   0 100
 26    0   0    0    0    0    5    0    0    0    0     1    0   0   0 100
 27    0   0    0    1    1    6    0    0    0    0     4    0   0   0 100
 28    0   0    0    0    0    5    0    0    0    0    17    0   0   0 100
 29    0   0    0    1    1    8    0    0    0    0     3    0   0   0 100
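The Solaris 2.7 change amounts to dropping the global sum and testing each CPU's own blocked count when it is sampled. A minimal sketch, with invented names (this is not the kernel source):

```python
# Sketch of the Solaris 2.7 per-CPU accounting: an idle CPU is charged wait
# time only for its OWN outstanding block I/O. Names invented for the example.

def classify_tick_per_cpu(cpu_states, blocked_per_cpu):
    """Classify one clock tick using each CPU's own blocked count."""
    result = []
    for state, blocked in zip(cpu_states, blocked_per_cpu):
        if state != "idle":
            result.append(state)
        elif blocked > 0:
            result.append("wait")  # only the CPU with pending I/O waits
        else:
            result.append("idle")
    return result

# Nine otherwise-idle CPUs and a single blocked I/O request:
states = ["idle"] * 9
blocked = [0] * 8 + [1]
print(classify_tick_per_cpu(states, blocked))
# Only the last CPU reports "wait"; the other eight stay idle.
```

One blocked request now produces one CPU's worth of wait time per tick rather than nine, which is why only CPU 5 shows significant wait time in Figure 3.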
Wrap up
Even though I've gone over this issue many times, the wait time question keeps coming back. Finally, with a fix in the next release of Solaris, we should have a more useful, accurate measure of the proportion of time during which large multiprocessor systems are I/O bound.
Resources
About the author
Adrian Cockcroft joined Sun Microsystems in 1988, and currently works as a performance specialist for the Server Division of SMCC. He wrote Sun Performance and Tuning: SPARC and Solaris and Sun Performance and Tuning -- Java and the Internet, both published by SunSoft Press/PTR Prentice Hall.
Reach Adrian at adrian.cockcroft@sunworld.com.
URL: http://www.sunworld.com/swol-10-1998/swol-10-perf.html