Swamped by several competing workloads on the same machine? Learn how you can divvy them up into processor sets
When several workloads compete for CPU time on a large system, you can divide the CPUs into sets and bind each workload to a different set to constrain it. This month Adrian looks at how this works and where it can be used effectively. (1,500 words)
Q: I want to run several very different workloads on the same machine. How can I make sure that one workload doesn't take over all the CPU power and crowd out another one?
A: In the past it was common to use several systems, one for each workload. Nowadays, systems are so powerful and scalable that it's simpler to run everything on one machine at once. A new feature in Solaris 2.6 allows a multiprocessor machine to be partitioned into processor sets, with each workload constrained to use only the processors in a single set.
I mention processor sets in the second edition of my book (which, by the way, is finally available), but I don't go into any detail about how they work or how to use them. In this column I'll explain more about this new feature of Solaris 2.6 and how it can be used. Let's start by taking a look at the manual page for the psrset command, which is all the information that is provided as standard.
Maintenance Commands                                    psrset(1M)

NAME
     psrset - creation and management of processor sets

SYNOPSIS
     psrset -c [ processor_id ... ]
     psrset -d processor_set_id
     psrset -a processor_set_id processor_id ...
     psrset -r processor_id ...
     psrset -p [ processor_id ... ]
     psrset -b processor_set_id pid ...
     psrset -u pid ...
     psrset -q [ pid ... ]
     psrset [ -i ] [ processor_set_id ... ]

DESCRIPTION
     psrset controls the management of processor sets. Processor sets allow the binding of processes to groups of processors, rather than just a single processor. There are two types of processor sets, those created by the user using the psrset command or the pset_create(2) system call, and those automatically created by the system. Processors assigned to user-created processor sets will run only LWPs that have been bound to that processor set, but system processor sets may run other LWPs as well.
A single systemwide processor set is created on a multiprocessor system by default. One reason for implementing processor sets is to provide support for NUMA architecture systems that have groups of processors with fast communications, connected by a slower interconnect. In that case a system processor set would be set up for each group. This feature is not needed on Sun's Enterprise server range, as all processors are on a single fast interconnect.
Initially, all CPUs belong to the system processor set. The system administrator can create additional sets by taking CPUs away from the system set. The kernel uses only the system set for normal operations, although interrupts are handled by processors regardless of which set they belong to; NFS server services, for example, will run only on the system processor set. There will always be at least one CPU left in the system processor set.
If your workload mix includes some NFS service that needs to be constrained, limiting the size of the system processor set is one way to do that. In general, though, the system set should be as large as possible, perhaps shared with one of your regular workloads, so that you don't starve the kernel of CPU time.
Sun's published dual TPC-C and TPC-D result
Sun recently published a fully audited benchmark in which we ran an online transaction processing TPC-C workload on the same machine at the same time as a data warehouse TPC-D workload. This was managed using processor sets. A 16-CPU E6000 was divided into an eight-CPU system processor set and an additional eight-CPU user-created set. A single copy of IBM's DB2 Universal Server database code was used to create two database instances on separate parts of the disk subsystem. When the benchmark was run, the continuous small TPC-C transactions ran at a constant rate, providing good response times to the online users. The large and varied TPC-D transactions were constrained and did not affect the online user response times. The overall throughput is lower than it could have been if the idle time in each set were available to the other workload, but consistent steady-state response times and throughput are a requirement for an audited TPC-C result, and that consistency could not be achieved without using processor sets in this way.
The TPC-C summary is at: http://www.tpc.org/results/individual_results/Sun/sun.ue6000.ibm.es.pdf
The TPC-D summary is at: http://www.tpc.org/results/individual_results/Sun/sun.ue6000.ibm.d.es.pdf
How does it work?
Solaris maintains a queue of jobs that are ready to run on a per-CPU basis; there is no single global run queue. Older versions of Solaris implement processor binding using the pbind command and underlying system calls. A process is bound to a CPU with pbind, but the binding isn't exclusive: other work can also run on that CPU. With psrset, the binding is to a group of CPUs, and it is also exclusive: nothing else will be scheduled to run on that set. It's possible to use pbind within any set to give a further level of control over resource usage.
The way psrset works is to create a kind of virtual machine for scheduling purposes. Once a process is bound to a set, its child processes are also bound to that set, so it is sufficient to bind a shell or startup script for an application. Bindings can only be made if you have root permissions.
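As a rough illustration of that workflow, the Python sketch below assembles psrset command lines from the synopsis in the manual page: -c to create a set, -b to bind a process to it, and -q to query a binding. It only builds and prints the commands; it does not run them. The function names, CPU numbers, set ID, and PID are made-up values for illustration, and on a real system the set ID would be whichever one the system assigns when the set is created.

```python
def create_set_cmd(cpu_ids):
    """Build 'psrset -c processor_id ...' (create a set from these CPUs)."""
    return ["psrset", "-c"] + [str(cpu) for cpu in cpu_ids]


def bind_cmd(set_id, pids):
    """Build 'psrset -b processor_set_id pid ...' (bind processes to a set)."""
    return ["psrset", "-b", str(set_id)] + [str(pid) for pid in pids]


def query_cmd(pids):
    """Build 'psrset -q pid ...' (show which set each process is bound to)."""
    return ["psrset", "-q"] + [str(pid) for pid in pids]


def plan(cpu_ids, set_id, shell_pid):
    """Return the command lines, in order, for binding one startup shell.

    Children of the bound shell inherit the binding, so binding the
    shell (or startup script) is enough for a whole application.
    """
    commands = [
        create_set_cmd(cpu_ids),
        bind_cmd(set_id, [shell_pid]),
        query_cmd([shell_pid]),
    ]
    return [" ".join(cmd) for cmd in commands]


if __name__ == "__main__":
    # Hypothetical values: CPUs 4 and 5, assigned set ID 1, shell PID 1234.
    for line in plan([4, 5], 1, 1234):
        print(line)
```

The commands would have to be run as root, since bindings can only be made with root permissions.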
The system normally keeps a linked list of the online processors, each with its own run queue. When a kernel thread is to be placed on a run queue, the kernel goes through various machinations to decide where the thread should be placed. Normally this is the processor on which the thread last ran, but sometimes the thread changes processors (migrates) for various reasons, such as load balancing.
With processor sets, we can split up the list of processors into disjoint subsets. When you create a processor set, you create a new list with the processors that are in the set. The processors are taken out of the "normal" list of processors that run everything not in the set. Processes assigned to the set run on the processors in the set's list and can migrate between them. Other processes and normal (non-interrupt) kernel threads cannot run on those processors; they no longer have access to them. It's as if the processors have been taken offline. The exception is kernel threads that may be bound to a specific processor for one reason or another, but this is unusual.
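The splitting of the processor list can be pictured with a small toy model. The Python sketch below is not how the kernel implements it; the class and method names are invented for illustration. It keeps one "system" list plus one list per user-created set, moves CPUs out of the system list when a set is created, and refuses to empty the system list.

```python
class ProcessorLists:
    """Toy model: one system list plus disjoint user-created sets."""

    def __init__(self, cpu_ids):
        self.system = list(cpu_ids)   # CPUs that run everything not in a set
        self.sets = {}                # set ID -> list of CPUs in that set
        self._next_id = 1

    def create_set(self, cpu_ids):
        """Move CPUs from the system list into a new set; return its ID."""
        cpu_ids = list(cpu_ids)
        missing = [cpu for cpu in cpu_ids if cpu not in self.system]
        if missing:
            raise ValueError("not in the system set: %r" % missing)
        if len(cpu_ids) >= len(self.system):
            raise ValueError("at least one CPU must stay in the system set")
        self.system = [cpu for cpu in self.system if cpu not in cpu_ids]
        set_id = self._next_id
        self._next_id += 1
        self.sets[set_id] = cpu_ids
        return set_id

    def eligible_cpus(self, bound_set=None):
        """CPUs a process may run on (and migrate between)."""
        if bound_set is None:
            return list(self.system)      # unbound work: system list only
        return list(self.sets[bound_set])  # bound work: that set's list only


if __name__ == "__main__":
    # Mirrors the 16-CPU split described earlier: eight CPUs in each half.
    lists = ProcessorLists(range(16))
    oltp = lists.create_set(range(8, 16))
    print("unbound processes:", lists.eligible_cpus())
    print("processes bound to set", oltp, ":", lists.eligible_cpus(oltp))
```

In this model a process bound to a set never appears on the system list's CPUs, and unbound work never appears on the set's CPUs, which is the "as if taken offline" behavior described above.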
Interrupts are taken on whichever CPU normally takes that interrupt, but any subsequent activity will take place in the system processor set. The mpstat command can be used to see the distribution of interrupts and load over all the CPUs.
% mpstat 5
CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
  0   58   8 1459  822  610 1306  171  242   96  30   609   6  67 27   0
  1   36   8 1750 1094  657 1100  151  238  104  28   717   6  76 18   0
  4   53   7 1518  951  759 1111  155  226   95  29   642   6  69 24   0
  5   25   7 1715 1067  765 1104  178  232  111  23   552   7  65 28   0
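If you want to compare CPUs without reading the columns by eye, output like the sample above is easy to pick apart in a script. The Python sketch below is one possible approach, not a standard tool: it assumes the output has been captured as text, uses the header line to name the columns, and reports the interrupt count (intr) and system time (sys) for each CPU. The function names are invented for illustration.

```python
SAMPLE = """\
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 58 8 1459 822 610 1306 171 242 96 30 609 6 67 27 0
1 36 8 1750 1094 657 1100 151 238 104 28 717 6 76 18 0
4 53 7 1518 951 759 1111 155 226 95 29 642 6 69 24 0
5 25 7 1715 1067 765 1104 178 232 111 23 552 7 65 28 0
"""


def parse_mpstat(text):
    """Return one dict per CPU line, keyed by the header's column names."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":
            header = fields          # a header line starts each report
            continue
        if header is None or len(fields) != len(header):
            continue                 # skip prompts or malformed lines
        rows.append({name: int(value) for name, value in zip(header, fields)})
    return rows


def busiest(rows, column):
    """Return the CPU number with the largest value in the given column."""
    return max(rows, key=lambda row: row[column])["CPU"]


if __name__ == "__main__":
    rows = parse_mpstat(SAMPLE)
    for row in rows:
        print("CPU %d: intr=%d sys=%d%%" % (row["CPU"], row["intr"], row["sys"]))
    print("most interrupts on CPU", busiest(rows, "intr"))
```

Running the same summary before and after creating a processor set would show whether interrupt handling stays on the CPUs that normally take those interrupts.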
Thanks to Andy Tucker for implementing processor sets and supplying some of the explanations above.