How busy is the CPU, really?
Unfortunately, CPU usage measurements are often inaccurate. Here's what you can do about it.
CPU usage measurements aren't as accurate as you may think. If you measure the errors directly and analyze them, it becomes clear that CPU usage is significantly underreported, especially at low usage levels. Extrapolating from low workload levels to high ones is a risky business. Adrian describes the tool he built to examine this problem and shows us plots of his results. (2,200 words)
Q: In April's column you said that CPU usage is inaccurate -- but by how much, and does it matter?
A: Error is minimal at high usage levels, but it ranges up to 80 percent or more at low levels. The problem is that usage is underreported, and the range of error increases on faster CPUs. At a real usage level of 5 percent busy, you'll often see vmstat reporting that the system is only 1 percent busy -- underreporting by 80 percent of the true value. You could also look at this as a 400 percent error in the reported value.
As an example of the kind of problem this can cause, consider a system planned to cope with a load of up to 1000 users. If you measure the average process activity of the first 20 users, they appear to use only 1 percent of the system (but in fact use 5 percent). There appears to be sufficient capacity for 2000 users, but really there is only enough for 400. As the total user load increases, the measurement error shrinks, so the amount of CPU used by each user also appears to increase.
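To make the arithmetic concrete, here is a minimal C sketch of the capacity calculation above. The 1 percent and 5 percent figures are the ones from the example; the program simply scales them up to a fully busy system.

#include <stdio.h>

int main(void)
{
    double reported_total = 1.0;  /* first 20 users appear to use 1 percent in total */
    double actual_total   = 5.0;  /* ...but really use 5 percent in total            */
    double users          = 20.0;

    double apparent_capacity = 100.0 * users / reported_total; /* 2000 users */
    double real_capacity     = 100.0 * users / actual_total;   /*  400 users */

    printf("apparent capacity: %.0f users\n", apparent_capacity);
    printf("real capacity:     %.0f users\n", real_capacity);
    return 0;
}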
I built a tool to measure the errors, collected data on a few systems, and plotted the results. I would like to get more data, so the tool has been folded into an updated copy of the process monitoring update bundle. If you like, you can monitor accuracy on your own systems and send me the results. I'll start with a more detailed explanation of the problem, then describe the tool I built, and show you plots of the initial results.
CPU usage measurements
Normally, CPU time is measured by
sampling, 100 times per second, the state of all CPUs at the clock interrupt.
Process scheduling employs the same clock interrupt used to measure CPU usage,
leading to systematic errors in the sampled data. Microstate accounting,
discussed in April's Performance Q&A, is much more accurate than sampled measurements.
To illustrate how errors occur, I'll excerpt the following example from April's column:
Consider a performance monitor that wakes up every 10 seconds, reads some data from the kernel, then prints the results and sleeps. On a fast system, the total CPU time consumed per wake-up might be a few milliseconds. On exit from the clock interrupt, the scheduler wakes up processes and kernel threads that have been sleeping. Processes that sleep consume less than their allotted CPU time quanta and always run at the highest timeshare priority.

On a lightly loaded system there is no queue for access to the CPU, so immediately after the clock interrupt, it's likely that the performance monitor will be scheduled. If it runs for less than 10 milliseconds, it will have completed its task and be sleeping again by the time the next clock interrupt comes along. Now, given that CPU time is allocated based on what is running when the clock interrupt occurs, you can see that the performance monitor could be sneaking a bite of CPU time whenever the clock interrupt isn't looking.
In the diagram below, a process wakes up three times. The first wake-up occurs between clock ticks and is still running when the next tick arrives, so a full 10 milliseconds is charged to the process. The next two wake-ups occur as a result of the clock interrupt scheduling the process; they complete before the subsequent interrupt, so nothing is charged. The true CPU usage, as measured by microstate accounting, is 8.3 + 4.6 + 7.4 = 20.3 ms. The first wake-up is overestimated; the second and third are missed completely.
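As a rough illustration of this effect (not part of the toolkit itself), the following C sketch simulates the three wake-ups in the diagram against a 100 Hz clock. Microstate accounting sums the actual run times, while sampled accounting charges a full tick only when a clock interrupt happens to fire while the process is on the CPU. The burst start times are assumptions chosen to match the diagram.

#include <stdio.h>

#define TICK_MS 10.0

/* One CPU burst: start time and duration, in milliseconds */
struct burst { double start; double len; };

int main(void)
{
    struct burst b[] = {
        { 14.0, 8.3 },   /* straddles the tick at t=20 -> charged a full 10 ms */
        { 30.0, 4.6 },   /* finishes before t=40       -> charged nothing      */
        { 50.0, 7.4 },   /* finishes before t=60       -> charged nothing      */
    };
    int n = sizeof(b) / sizeof(b[0]);
    double sampled = 0.0, microstate = 0.0;

    for (int i = 0; i < n; i++) {
        microstate += b[i].len;              /* what microstate accounting records */
        /* Sampled accounting: one full tick is charged for every clock
         * interrupt that fires while the burst is running.              */
        for (double t = 0.0; t <= 100.0; t += TICK_MS)
            if (t > b[i].start && t < b[i].start + b[i].len)
                sampled += TICK_MS;
    }
    printf("microstate: %.1f ms, sampled: %.1f ms\n", microstate, sampled);
    return 0;
}

It prints a microstate total of 20.3 ms against a sampled total of 10.0 ms: the first burst is charged a whole tick, and the other two are never seen at all.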
CPU usage error checking tool
I've already extended the SE toolkit to include a process class.
This reports the measured CPU usage -- but if microstate accounting is
not enabled for a process, then the value returned is just the same
as the sampled usage. I modified the process class to report sampled
CPU usage as a separate value, and to explicitly set the microstate
accounting flags to enable accurate measurement of every process and
its children.
I used the new programming interface that was introduced in Solaris 2.6; this tool doesn't work on older releases. In Solaris 2.4 to 2.5.1, microstate data is obtained by issuing an ioctl call with the PIOCUSAGE flag. This also automatically turns on microstate data collection. (This interface is still supported but will go away in a future release.) In Solaris 2.6, I obtain data by reading /proc/pid/usage, which no longer requires special permissions, but which also no longer turns on microstate data collection. The data returned is an approximation based on the sampled measurements. To turn on the flags, a control message is written to /proc/pid/ctl, which does require access permissions. To collect data for all the processes on the system, this code must be run as root.
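As a rough sketch of that interface (not the cpuchk.se code itself, which is written in SE), the following C fragment enables microstate accounting for one process by writing a control message to /proc/pid/ctl, then reads its prusage structure from /proc/pid/usage. It assumes the Solaris 2.6 definitions in <procfs.h> (PCSET, PR_MSACCT, prusage_t) and does only minimal error handling.

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <procfs.h>

int main(int argc, char **argv)
{
    char path[64];
    long ctl[2] = { PCSET, PR_MSACCT };   /* control message: set the msacct flag */
    prusage_t pru;
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s pid\n", argv[0]);
        return 1;
    }

    /* Writing the control file needs access permission -- root to cover
     * every process on the system.                                        */
    sprintf(path, "/proc/%s/ctl", argv[1]);
    if ((fd = open(path, O_WRONLY)) == -1 ||
        write(fd, ctl, sizeof(ctl)) != sizeof(ctl)) {
        perror(path);
        return 1;
    }
    close(fd);

    /* The usage file is readable without special permissions */
    sprintf(path, "/proc/%s/usage", argv[1]);
    if ((fd = open(path, O_RDONLY)) == -1 ||
        read(fd, &pru, sizeof(pru)) != sizeof(pru)) {
        perror(path);
        return 1;
    }
    close(fd);

    printf("user   %ld.%09ld s\n", (long)pru.pr_utime.tv_sec, pru.pr_utime.tv_nsec);
    printf("system %ld.%09ld s\n", (long)pru.pr_stime.tv_sec, pru.pr_stime.tv_nsec);
    return 0;
}

The modified SE process class does the equivalent of this for every process it monitors, reporting the sampled and microstate values side by side.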
The tool that collects data is called cpuchk.se, and is loosely based upon pea.se. It compares the sampled and measured data for each interval for each active process, then calculates the error and prints the results. It also calculates the overall CPU usage totals and the total, absolute, and maximum errors. The total error is lower, because positive and negative errors are allowed to cancel each other out. The absolute error is the sum of errors without any cancellation. The maximum is the highest absolute error seen. All errors are calculated relative to the accurately measured result. If you start with the inaccurate sampled result and try to calculate errors, they are much larger -- in some cases, infinite.
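The following C sketch shows the kind of error summary described above, using a handful of made-up measured and sampled values; the exact weighting cpuchk.se uses may differ, but the idea is the same. Per-process errors are taken relative to the accurately measured value, the total error lets positive and negative differences cancel, and the absolute error does not.

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Hypothetical per-process CPU percentages for one interval */
    double meas[] = { 0.150, 2.400, 0.220, 0.030 };  /* microstate (accurate) */
    double samp[] = { 0.000, 0.002, 0.225, 0.035 };  /* sampled               */
    int n = sizeof(meas) / sizeof(meas[0]);

    double sum_meas = 0.0, sum_samp = 0.0, sum_abs = 0.0, max_err = 0.0;

    for (int i = 0; i < n; i++) {
        /* per-process error, relative to the measured value */
        double err = 100.0 * fabs(samp[i] - meas[i]) / meas[i];
        printf("proc %d meas %.3f samp %.3f err %.2f%%\n", i, meas[i], samp[i], err);
        sum_meas += meas[i];
        sum_samp += samp[i];
        sum_abs  += fabs(samp[i] - meas[i]);
        if (err > max_err) max_err = err;
    }

    /* total error allows + and - to cancel; absolute error does not */
    printf("total err %.2f%% abs %.2f%% max %.2f%%\n",
           100.0 * (sum_samp - sum_meas) / sum_meas,
           100.0 * sum_abs / sum_meas,
           max_err);
    return 0;
}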
I ran cpuchk.se using several sample intervals. The interval doesn't seem to affect the results, so I started some long-term data collection on several machines with a 10-minute interval. This only catches long-running processes, but it keeps the load from the cpuchk.se command itself to a minimum. Some sample output data is shown below. The first line shows the time of day, the number of processes, and the number of processes seen for the first time. Subsequent lines show the error for each active process. The last line shows how many processes were totaled. (System processes like sched and fsflush cannot have microstate enabled, so they are excluded.)
00:17:10 cpu time accuracy check proc 45 new 0
pid 1435 meas 0.000 samp 0.000 err 100.00%
pid 316 meas 0.001 samp 0.000 err 100.00%
pid 1438 meas 0.000 samp 0.000 err 100.00%
pid 227 meas 0.001 samp 0.002 err 28.05%
pid 211 meas 0.011 samp 0.008 err 25.80%
pid 226 meas 0.032 samp 0.035 err 7.69%
pid 229 meas 0.018 samp 0.002 err 90.92%
pid 246 meas 0.083 samp 0.060 err 28.00%
pid 318 meas 0.000 samp 0.000 err 100.00%
pid 380 meas 0.143 samp 0.003 err 97.67%
pid 1439 meas 0.000 samp 0.000 err 100.00%
pid 357 meas 0.000 samp 0.000 err 100.00%
pid 518 meas 0.125 samp 0.000 err 100.00%
pid 7376 meas 0.041 samp 0.000 err 100.00%
pid 7377 meas 0.000 samp 0.000 err 100.00%
pid 6276 meas 0.156 samp 0.156 err 0.37%
pid 6262 meas 2.413 samp 0.002 err 99.93%
pid 9199 meas 0.221 samp 0.225 err 1.56%
pid 9200 meas 0.206 samp 0.202 err 2.20%
pid 9209 meas 2.308 samp 2.333 err 1.05%
msac 42 meas 5.763 samp 3.027 err -47.48% abs 48.56% max 100.00%

00:27:11 cpu time accuracy check proc 45 new 0
pid 1435 meas 0.000 samp 0.000 err 100.00%
pid 316 meas 0.001 samp 0.000 err 100.00%
pid 1438 meas 0.000 samp 0.000 err 100.00%
pid 227 meas 0.003 samp 0.002 err 46.53%
pid 211 meas 0.010 samp 0.007 err 35.85%
pid 226 meas 0.032 samp 0.022 err 31.33%
pid 229 meas 0.015 samp 0.002 err 88.77%
pid 246 meas 0.074 samp 0.052 err 30.70%
pid 318 meas 0.002 samp 0.000 err 100.00%
pid 380 meas 0.143 samp 0.000 err 100.00%
pid 1439 meas 0.000 samp 0.000 err 100.00%
pid 357 meas 0.000 samp 0.000 err 100.00%
pid 518 meas 0.122 samp 0.000 err 100.00%
pid 379 meas 0.144 samp 0.126 err 12.00%
pid 7376 meas 0.040 samp 0.000 err 100.00%
pid 7377 meas 0.000 samp 0.000 err 100.00%
pid 6276 meas 0.155 samp 0.155 err 0.00%
pid 6262 meas 2.410 samp 0.003 err 99.86%
msac 39 meas 3.152 samp 0.368 err -88.33% abs 88.33% max 100.00%
Analysis and graphing results
I extracted the measured CPU time and the absolute error from the
output using awk and fed it into a statistics package (S-PLUS from
www.statsci.com). After looking at the data for individual processes
for a while, I decided to concentrate on the summaries for each
measurement interval. First I plotted both of them together in time
sequence, then I plotted error as a function of CPU usage. The
relationship is basically an inverse one, so I fitted and displayed
an inverse relationship line. The systems I monitored were a
SPARCstation 10 with dual 60-MHz CPUs, an E4000 with four 168-MHz
CPUs, an Ultra 1/170, and a Tadpole 85-MHz microSPARC laptop. Not an
ideal mix, but enough to investigate the effect of CPU speed and
workload variations.
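Any statistics package can do this fit; as a rough illustration of the shape of the model, the sketch below fits error = a + b/usage by ordinary least squares after transforming x to 1/usage. The data points are made up for illustration; the real analysis was done in S-PLUS on the cpuchk.se summaries.

#include <stdio.h>

int main(void)
{
    double usage[] = { 0.5, 1.0, 2.0, 5.0, 10.0, 20.0 };    /* percent busy  */
    double error[] = { 90.0, 70.0, 45.0, 25.0, 12.0, 6.0 };  /* percent error */
    int n = sizeof(usage) / sizeof(usage[0]);

    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int i = 0; i < n; i++) {
        double x = 1.0 / usage[i];          /* transform to 1/usage */
        sx  += x;
        sy  += error[i];
        sxx += x * x;
        sxy += x * error[i];
    }
    double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    double a = (sy - b * sx) / n;

    printf("fitted: error = %.2f + %.2f / usage\n", a, b);
    return 0;
}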
The SPARCstation 10 with dual 60-MHz CPUs is a lightly used Web server that runs CPU-intensive batch jobs from cron at regular intervals. The time-based plot shows that it is mostly idle with regular batch jobs.
Figure 1. Discrepancy (error) between reported CPU usage and actual CPU usage measured over time on a SPARCstation 10 with dual 60-MHz CPUs. (CPU usage is denoted by the line, error by the dots.)
Errors show a good fit to the inverse line, probably because the workload doesn't vary much.
Figure 2. Discrepancy (error) in sampled CPU usage reported versus actual CPU usage measured on a SPARCstation 10.
The E4000 with four 168-MHz CPUs is a workgroup server that runs e-mail and NFS services, among other things.
Figure 3. Time-based plot of discrepancy (error) between reported CPU usage and actual CPU usage measured on an E4000 with four 168-MHz CPUs. (CPU usage is denoted by the line, error by the dots.)
The workload mix varies, but the fit is still a reasonable one. The data falls into several distinct curves, but they are close together.
Figure 4. Discrepancy (error) in sampled CPU usage reported versus actual CPU usage measured on an E4000 with four 168-MHz CPUs.
The Ultra 1/170 was running the CDE window system. It included some Web browser screens with animated GIFs and a Java application that started towards the end. The Java application ran a busy/idle loop and consumed about 6 percent of the CPU while reporting less than 0.5 percent. Overall, this period sustained a real usage rate of 8.8 percent with only 1.1 percent reported via sampling.
Figure 5. Discrepancy (error) in reported CPU usage versus actual CPU usage measured over time for the 167-MHz Ultra 1/170. (CPU usage is denoted by the line, error by the dots.)
When we look at the error on this system as a function of the measured usage, it shows several separate clusters of data, each of which could have its own fitted curve. No overall curve could be fitted to this data.
Figure 6. Discrepancy (error) of reported CPU usage versus actual CPU usage measured for the 167-MHz Ultra 1/170.
Finally, on a much slower CPU -- the 85-MHz microSPARC -- the error levels are smaller, as we would expect.
Figure 7. Discrepancy (error) of reported CPU usage versus actual CPU usage measured over time for the 85-MHz microSPARC. (CPU usage is denoted by the line, error by the dots.)
The measured load level was on the low side all the time, and the results are too scattered to obtain a good fit.
Figure 8. Discrepancy (error) of sampled CPU usage reported versus actual CPU usage measured for the 85-MHz microSPARC.
Wrap up
These errors are significant. They may explain why you never seem to be able to scale a workload up from an apparently low usage level as far as you'd expect.
This problem gets worse on faster CPUs and as more CPUs are added to a system. In the future, CPU measurement will be less and less accurate. This problem isn't specific to Solaris 2. It's a generic Unix problem that probably affects other operating systems as well. Not many operating systems support high-resolution measured CPU usage data.
I'm interested to see what the data looks like for more varieties of workload and will be doing some more tests. If you don't mind collecting data and sending it to me, I'd appreciate the input. To get systemwide data, cpuchk.se needs to be run on Solaris 2.6 as root -- so take care, and avoid production systems.
There is not a lot you can do to solve this problem. The sampled data collection is inaccurate, but it is very low overhead. Performance tools that look at per-process CPU usage should use microstate-enabled data. Even on a single system there is no simple calibration that can be applied to correct the errors, as they vary depending upon the workload.
You can download a tar file from the regular SE3.0 download page that contains updated workload and process classes, pea.se and pw.se, cpuchk.se, a new version of the proc.se header file, and the pw.sh script. When you untar it as root, it automatically puts the SE files in the /opt/RICHPse directory, and it puts pw.sh in your current directory.
About the author
Adrian Cockcroft joined Sun Microsystems in 1988,
and currently works as a performance specialist for the Server Division
of SMCC. He wrote
Sun Performance and Tuning: SPARC and Solaris,
published by SunSoft Press
PTR Prentice Hall.
Reach Adrian at adrian.cockcroft@sunworld.com.