Click on our Sponsors to help Support SunWorld
Performance Q & A by Adrian Cockcroft

How does swap space work?

Too much swap space and you waste disk space.
Too little swap and your system will grind to a halt.

April 1996

Too much swap space, and you are wasting your disk. Too little swap space, and your system will grind to a halt, spewing errors on the console. Too few swap disks, and your system may run slowly. The author suggests ways to size and monitor swap space, explains what the measurements mean, and shows how to tell whether swap is a performance bottleneck. (2,700 words)


Q: How much swap space should I allocate, and what is the best configuration for good performance? Why don't vmstat and sar agree on the quantity of swap space? -- swapless in Shawano

First, I had better explain what "swap space" is used for. Swap space, which encompasses RAM and the disk space dedicated to it, holds the virtual memory of the system. Every program you run occupies a certain amount of virtual memory. Once all your virtual memory has been allocated to specific applications, you cannot start new programs, and currently running programs may fail if they try to grow. Physical memory is what you know as the RAM of the system. If you use up all the RAM, your system may run more slowly, but you can still start more programs because the swap space absorbs the excess data. The physical memory contains the current "working set" of virtual memory: the parts of your applications that are actively being used.

You usually configure enough RAM to run your main application, or the parts of it that are active at the same time. You may have a window system with lots of other applications iconized or dormant in the background. These dormant applications gradually get their RAM stolen from them, and their pages migrate to the swap space on disk. When you open up a long-forgotten window, you will notice that it takes a while to respond, and you may hear the disk heads rattling as the application is read back into RAM.

The important thing to realize about swap space is that it is the combined total size of every program running and dormant on the system that matters. When a system runs out of swap space, it can be very difficult to recover. Sometimes you find that there is insufficient swap space left to log in as root or run the commands needed to kill the errant process that is consuming all the swap space.

There are two possible situations to consider. If you are prepared to keep track of your swap space, and administer it regularly, then you can run with "just enough" swap space. If you don't want the hassle and can spare some disk space in return for an easier life, then you should run with "lots" of swap space.

One extra thing to note: swap works differently in Solaris 2 than in other Unix systems, including SunOS 4. Those other systems must always have some swap space, and it must be bigger than RAM, because every program in RAM has its total size reserved on the swap disk in case it needs to be swapped out. Since there are systems with 5 gigabytes (SPARCcenter 2000) or more of RAM, it seems ridiculous that systems that already have huge RAM capacity would need huge swap disks that would probably never be used. Solaris 2 changes the rules by adding the RAM and the swap disk space together. If you can buy enough RAM for your workload, you can run with no swap disk at all! In practice, though, common database applications that are sized to run in a few gigabytes of RAM will still need many gigabytes of disk allocated as swap space.
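The difference between the two accounting schemes can be put as a pair of small functions. This is a simplified model under stated assumptions: sizes are in megabytes, and RAM that the kernel locks down for its own use is ignored, so real Solaris 2 totals will be somewhat lower. The function names are illustrative, not part of any Sun tool.

```python
# Simplified model of the two swap accounting schemes described above.
# Assumptions: sizes in megabytes; RAM locked by the kernel is ignored,
# so real Solaris 2 totals will be somewhat lower.


def sunos4_virtual_memory_mb(ram_mb: int, swap_disk_mb: int) -> int:
    """SunOS 4: every page in RAM is also reserved on the swap disk,
    so the swap disk alone limits virtual memory and must exceed RAM."""
    if swap_disk_mb <= ram_mb:
        raise ValueError("SunOS 4 needs a swap disk bigger than RAM")
    return swap_disk_mb


def solaris2_virtual_memory_mb(ram_mb: int, swap_disk_mb: int) -> int:
    """Solaris 2: RAM and swap disk are added together,
    so a swap disk of zero is allowed."""
    return ram_mb + swap_disk_mb


if __name__ == "__main__":
    ram = 5 * 1024  # a SPARCcenter 2000 with 5 gigabytes of RAM
    print("Solaris 2, no swap disk:", solaris2_virtual_memory_mb(ram, 0), "MB")
    try:
        sunos4_virtual_memory_mb(ram, 0)
    except ValueError as err:
        print("SunOS 4, no swap disk:", err)
```

With 5 gigabytes of RAM and no swap disk, the Solaris 2 model still yields 5,120 megabytes of virtual memory, while the SunOS 4 model rejects the configuration outright.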


Keep track of your own swap space for small desktop systems
Most application vendors can tell you how much swap space their application needs. If you have no idea how much swap space you will need, configure at least 64 megabytes of virtual memory to start with. It's easy to add more later, so don't go overboard. With SunOS 4, your swap partition should be bigger than your RAM and at least 64 megabytes. With Solaris 2, the swap partition size should be the difference between 64 megabytes and the RAM size: a 48-megabyte swap partition with 16 megabytes of RAM, a 32-megabyte swap partition with 32 megabytes of RAM, and no swap partition at all with 64 megabytes or more of RAM. If your application vendor says a Solaris 2 application needs 64 megabytes of RAM and 128 megabytes of swap, this adds up to 192 megabytes of virtual memory. You could configure 96 megabytes of RAM and 96 megabytes of swap instead. If you run out of swap space, make a swap file (I put mine in /swap) or add more RAM. If you are running the CDE window system, or a mixture of OpenWindows and Motif applications, you will likely need more swap.
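The Solaris 2 sizing rules above are simple subtraction, sketched here in Python. The function names are hypothetical; all sizes are in megabytes, and the 64-megabyte target is the starting point suggested in the text.

```python
# Sizing arithmetic for a Solaris 2 desktop, following the rules of thumb above.
# All sizes in megabytes.

TARGET_VIRTUAL_MEMORY_MB = 64  # the suggested starting point


def swap_partition_mb(ram_mb: int, target_mb: int = TARGET_VIRTUAL_MEMORY_MB) -> int:
    """Swap needed so that RAM + swap reaches the target; 0 if RAM alone suffices."""
    return max(0, target_mb - ram_mb)


def swap_for_vendor_spec(vendor_ram_mb: int, vendor_swap_mb: int, actual_ram_mb: int) -> int:
    """Treat a vendor's RAM + swap figures as a virtual memory total,
    then work out how much swap is needed with the RAM you actually have."""
    return swap_partition_mb(actual_ram_mb, vendor_ram_mb + vendor_swap_mb)


if __name__ == "__main__":
    for ram in (16, 32, 64, 128):
        print(f"{ram:4d} MB RAM -> {swap_partition_mb(ram):3d} MB swap partition")
    # Vendor asks for 64 MB RAM + 128 MB swap (192 MB total); we have 96 MB RAM.
    print(swap_for_vendor_spec(64, 128, 96), "MB swap")
```

This only checks the virtual memory total. If you configure less RAM than the vendor suggests, the application will still start, but it may page heavily.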

Swap Space Requirements for NIS+ servers
NIS+ is both more flexible and more complex than NIS. It needs a lot of CPU power to process the encryption and decryption required for secure, network-based administration, and it requires more swap space than you might expect. When the server process is large, the child processes it forks can add up to a large requirement for swap space. NIS+ servers for large and complex NIS+ domains often need several hundred megabytes of swap space to service some kinds of client requests. For example, the niscat command causes the server to fork so that it can send back a long stream of information, whereas the nismatch command does a simple lookup and does much less work. Avoid piping niscat into grep when nismatch could be used instead.
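To see how forking multiplies the requirement, here is a rough worst-case estimate. The key assumption is ours, not the article's: each concurrently forked child is taken to reserve swap equal to the full size of the server process. Real reservations are likely smaller, so treat the result as an upper bound. The figures in the example are hypothetical.

```python
# Rough worst-case swap estimate for a forking NIS+ server.
# Assumption (not from the article): each forked child reserves swap
# equal to the full size of the server process, as a worst case.


def nisplus_swap_estimate_mb(server_size_mb: float, concurrent_children: int) -> float:
    """Server process plus one full-size reservation per concurrently forked child."""
    if concurrent_children < 0:
        raise ValueError("concurrent_children must be non-negative")
    return server_size_mb * (1 + concurrent_children)


if __name__ == "__main__":
    # Hypothetical figures: a 40 MB server with 5 niscat requests in progress.
    print(nisplus_swap_estimate_mb(40, 5), "MB")
```

A 40-megabyte server handling five niscat requests at once could, under this assumption, tie up about 240 megabytes. That is in line with the "several hundred megabytes" mentioned above.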

Swap space requirements for database and timeshared servers
The consequences of running out of swap space affect a larger number of users on a big server, so it is wise to allocate a lot more than you normally need to cope with any usage peaks. To start with, configure twice as much swap disk space as you have RAM.

How to add swap space for good performance
Swap performance only makes a difference when you are short of RAM. If the system is not paging, it makes no difference. When you are paging, it is easy to overwhelm a single swap disk, so try to add more swap disks when the existing one(s) get busy. Swap space is allocated in a round-robin fashion over all swap disks, so the workload is naturally spread over them all in a crude manner. It is not worth making a striped metadevice to swap on -- that would just add overhead and slow it down. There is also a limit of 2 gigabytes on the size of each swap partition or file, so striping disks together tends to make them too big. You can add as many swap partitions as you like. There is no limit to the total size of swap in Solaris 2.
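Round-robin allocation can be illustrated with a toy model. Device names and request sizes below are hypothetical, and the real kernel allocator's granularity and bookkeeping are not described in this article. The model only shows how taking turns spreads the load, and how a full device is skipped.

```python
# Toy model of round-robin swap allocation across several swap devices.
# Device names and chunk sizes are hypothetical; the real kernel allocator's
# granularity and bookkeeping are not described in the article.


class OutOfSwap(Exception):
    """Raised when no device has room for a request."""


def round_robin_allocate(device_sizes_mb: dict[str, int],
                         requests_mb: list[int]) -> dict[str, int]:
    """Give each request to the next device in turn, skipping full devices."""
    names = list(device_sizes_mb)
    used = {name: 0 for name in names}
    next_index = 0
    for request in requests_mb:
        for offset in range(len(names)):
            index = (next_index + offset) % len(names)
            name = names[index]
            if used[name] + request <= device_sizes_mb[name]:
                used[name] += request
                next_index = (index + 1) % len(names)
                break
        else:
            raise OutOfSwap(f"no device can hold {request} MB")
    return used


if __name__ == "__main__":
    devices = {"swapdisk0": 100, "swapdisk1": 100, "swapdisk2": 100}
    print(round_robin_allocate(devices, [10] * 9))
```

With three equal devices, nine requests land three per device. When one device is smaller, it fills first and the remaining requests go to the others, which is why the spreading is only "crude".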

Commands for monitoring swap space use
This part of Solaris 2 has some code derived from System V and some from BSD4.3 via SunOS 4, but the algorithm has been redesigned and is unique to Solaris 2. Swapping is now a scheduler function. Solaris 2.3 swaps only in some extreme circumstances; Solaris 2.4 and 2.5 implement a new swapping algorithm designed to help performance on small-memory desktop machines. The paging process is designed to free memory as fast as possible. Page-outs are queued and clustered so that random page-outs are organized into large sequential writes to the swap space. This makes page-out very efficient, but page-in is still random. You will see larger-than-normal disk service times because of these large sequential writes, which often reach 200 kilobytes each. They can get in the way of page-in reads, which occur randomly and one page at a time. For this reason, I like to keep swap disks less than 30 percent busy for good performance.
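The 30 percent guideline turns into a one-line filter. The sketch assumes you already have a percent-busy figure per disk, for example from the %busy column of sar -d. The disk names are hypothetical, and parsing the sar output is left out.

```python
# Flag swap disks that exceed the 30 percent busy guideline above.
# Input is a mapping of disk name to percent busy, for example as reported
# in the %busy column of sar -d; disk names here are hypothetical.

SWAP_BUSY_LIMIT_PCT = 30.0


def busy_swap_disks(busy_pct: dict[str, float], swap_disks: set[str],
                    limit_pct: float = SWAP_BUSY_LIMIT_PCT) -> list[str]:
    """Return the swap disks busier than limit_pct, sorted by name.
    Disks with no measurement are treated as idle."""
    return sorted(d for d in swap_disks if busy_pct.get(d, 0.0) > limit_pct)


if __name__ == "__main__":
    sample = {"sd0": 45.0, "sd1": 12.0, "sd3": 80.0}  # sd3 is not a swap disk
    print(busy_swap_disks(sample, {"sd0", "sd1"}))
```

A flagged disk is a hint to follow the earlier advice and add another swap disk rather than stripe the existing ones.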

Most commercial performance monitoring tools keep track of swap space, or can be configured to generate a warning when it gets low. The default thresholds in my SE Toolkit rules start warning when there is less than 10 megabytes left and complain more strongly when there is less than 4 megabytes left. These values were based on my own desktop system: I found that with less than 4 megabytes left, mailtool cannot fork, which causes it to fail when I try to send a message. On a larger server, these thresholds should be increased to give you an earlier warning.
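A two-level check with those thresholds might look like the sketch below. This is not the SE Toolkit's code. The scale parameter is a hypothetical knob standing in for "increase the thresholds on a larger server". Input is available swap in kilobytes, as shown by vmstat swap or swap -s.

```python
# Two-level low-swap check modeled on the thresholds described above:
# warn below 10 MB, complain below 4 MB. The scale factor for larger
# servers is a hypothetical knob, not part of the SE Toolkit.

WARN_MB = 10
CRITICAL_MB = 4


def swap_status(available_kb: int, scale: float = 1.0) -> str:
    """Classify available swap (in kilobytes, as shown by vmstat swap
    or swap -s available) as 'ok', 'warning', or 'critical'."""
    available_mb = available_kb / 1024
    if available_mb < CRITICAL_MB * scale:
        return "critical"
    if available_mb < WARN_MB * scale:
        return "warning"
    return "ok"


if __name__ == "__main__":
    print(swap_status(34172))            # desktop thresholds
    print(swap_status(34172, scale=4))   # server: warn below 40 MB
```

With about 33 megabytes available, a desktop reports "ok". With thresholds scaled up fourfold for a server, the same figure reports "warning".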

Process size
/usr/ucb/ps alx, fields SZ or SIZE, /usr/proc/bin/pmap
% /usr/ucb/ps alx
 8  2595  1133  1130  0  48 20  988  360 modlinka S pts/4     0:00 -bin/csh

Be careful! The System V version, /bin/ps, prints a field labelled SZ, but this is the resident set size in RAM, printed as RSS by /usr/ucb/ps. To determine the amount of swap space used by a process, use the SZ or SIZE field reported by /usr/ucb/ps alx, in units of kilobytes. Really huge processes can be hard to figure out, as the SIZE and RSS numbers run together. If you want to get at the data cleanly, you could easily modify the script provided with the SE Performance Toolkit Version 2.5 to print whatever you want. You should also beware of processes that map hardware device space, because these device mappings do not use swap space. For example, the SIZE of the X server process includes a lot of device space, and it is often the largest process on a system; the mapped Creator3D address space alone is around 100 megabytes! If you are running Solaris 2.5, you can examine the address space mappings with the new /usr/proc/bin/pmap command.
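Here is a minimal parser that extracts SZ from /usr/ucb/ps alx output and totals it. It assumes SZ is the 8th and RSS the 9th whitespace-separated field, which matches the sample line above (988 and 360 kilobytes). It inherits both caveats from the text. When SZ and RSS run together, the field positions shift and int() will usually raise ValueError. Device mappings also inflate SZ without using swap, so the total overstates swap use.

```python
# Parse /usr/ucb/ps alx output and total the SZ column (kilobytes).
# Assumption: SZ is the 8th and RSS the 9th whitespace-separated field,
# matching the sample line in the article. Whitespace splitting breaks
# when SZ and RSS run together, and device mappings (such as a frame
# buffer in the X server) inflate SZ without using swap.

SZ_FIELD = 7
RSS_FIELD = 8


def parse_ps_alx_line(line: str) -> dict[str, int]:
    fields = line.split()
    return {
        "pid": int(fields[2]),
        "sz_kb": int(fields[SZ_FIELD]),
        "rss_kb": int(fields[RSS_FIELD]),
    }


def total_sz_kb(output: str) -> int:
    """Sum SZ over all process lines, skipping blank lines and the header."""
    total = 0
    for line in output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "F":
            continue
        total += parse_ps_alx_line(line)["sz_kb"]
    return total


if __name__ == "__main__":
    sample = " 8  2595  1133  1130  0  48 20  988  360 modlinka S pts/4     0:00 -bin/csh"
    print(parse_ps_alx_line(sample))
```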

Swapped Queue: vmstat procs w and sar -q swpq-sz, swpocc
% vmstat
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s2 s3 s5   in   sy   cs us sy id
 0 0 0  82368 14776   0   3  3  1  1  0  0  0  1  0  0  167  679  133  3  1 96
% sar -q 1

SunOS bloodnok 5.5 Generic sun4m    03/20/96

22:55:06 runq-sz %runocc swpq-sz %swpocc

vmstat w and sar -q swpq-sz report the total number of dormant processes currently swapped out to free up all their RAM. In Solaris 2.4 and later releases, you may find that some system daemons that were started but never used get swapped out during a busy period. sar -q %swpocc is the percentage of time that there is something in the swap queue. This is never a sign of a performance problem, as the only processes that will be swapped out are ones that are completely dormant. An active process may have pages stolen from it during a RAM shortage, but it will never be completely swapped out.

Swap Space: vmstat swap, sar -r freeswap, and swap -s
vmstat swap shows the available swap in kilobytes, sar -r freeswap shows the free swap in 512-byte blocks, and swap -s shows several measures, including available swap. They do not measure the same thing! In the example output below, available swap is about 34 megabytes whereas free swap is about 42 megabytes; the difference is the 8 megabytes of reserved swap shown by swap -s. In other words, swap -s available + swap -s reserved = sar -r freeswap, once freeswap is converted from 512-byte blocks to kilobytes.
hostname% sar -r 1
SunOS hostname 5.3 Generic sun4c    06/26/94
16:46:36 freemem freeswap
16:46:37     307    85104
hostname% swap -s
total: 35856k bytes allocated + 8532k reserved = 44388k used, 34172k available
hostname% vmstat 1
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s3 s5 --   in   sy   cs us sy id
 0 0 0   8808  4200   0   3  1  0  0  0  0  0  0  0  0   78  284  192  6  3 92
 0 0 0  34144  1320   0  14  0  0  0  0  0  0  0  0  0   30  169  144  1  2 97
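The identity can be checked against the figures in this output. The only conversion needed is from 512-byte blocks to kilobytes. The small leftover difference is presumably because sar and swap -s sampled at slightly different moments. That is an assumption; the article does not say.

```python
# Check the identity above using the figures from the example output.
# sar -r freeswap is in 512-byte blocks; swap -s figures are in kilobytes.

BLOCK_BYTES = 512


def blocks_to_kb(blocks: int) -> float:
    return blocks * BLOCK_BYTES / 1024


available_kb = 34172      # swap -s: available
reserved_kb = 8532        # swap -s: reserved
freeswap_blocks = 85104   # sar -r: freeswap

freeswap_kb = blocks_to_kb(freeswap_blocks)
available_plus_reserved_kb = available_kb + reserved_kb
print(f"sar -r freeswap:        {freeswap_kb:.0f} KB")
print(f"swap -s avail+reserved: {available_plus_reserved_kb} KB")
print(f"difference:             {available_plus_reserved_kb - freeswap_kb:.0f} KB")
```

The two sides come out at 42,552 KB and 42,704 KB, agreeing to within half a percent.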

You can also use swap -l to list the individual swap partitions and their sizes. Remember that each swap partition or file can be at most 2 gigabytes in size; if you try to use a larger disk, only the first 2 gigabytes will be used.
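The 2-gigabyte limit affects both how much of a large disk is usable and how many partitions you need to reach a total. Here is a short sketch of both calculations, with sizes in kilobytes to match the output of the swap command. The function names are illustrative.

```python
# Usable swap per device under the 2 gigabyte per-partition/file limit,
# and the number of partitions needed to reach a target. Sizes in kilobytes.

import math

SWAP_DEVICE_LIMIT_KB = 2 * 1024 * 1024  # 2 gigabytes


def usable_swap_kb(device_sizes_kb: list[int]) -> int:
    """Only the first 2 GB of each swap partition or file is used."""
    return sum(min(size, SWAP_DEVICE_LIMIT_KB) for size in device_sizes_kb)


def partitions_needed(target_kb: int) -> int:
    """Minimum number of full-size swap partitions to reach target_kb."""
    return math.ceil(target_kb / SWAP_DEVICE_LIMIT_KB)


if __name__ == "__main__":
    gb = 1024 * 1024
    print(usable_swap_kb([4 * gb, 1 * gb]) // gb, "GB usable")   # 3 GB, not 5
    print(partitions_needed(5 * gb), "partitions for 5 GB")      # 3
```

A 4-gigabyte disk plus a 1-gigabyte disk gives only 3 gigabytes of swap. Reaching 5 gigabytes takes three partitions, which is fine because the total size of swap is unlimited.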

Pages Swapped In: vmstat -S si and sar -w swpin, bswin, Pages Swapped Out: vmstat so and sar -w swpot, bswot

% vmstat -S 5
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  si  so pi po fr de sr f0 s2 s3 s5   in   sy   cs us sy id
 0 0 0  82272 14796   0   0  3  1  1  0  0  0  1  0  0  168  692  135  3  1 96
 0 0 0  73624 16640   0   0  0  0  0  0  0  0  0  0  0   19  306   54  0  0 100
% sar -w 1

SunOS bloodnok 5.5 Generic sun4m 03/20/96

23:03:32 swpin/s bswin/s swpot/s bswot/s pswch/s
23:03:33    0.00     0.0    0.00     0.0     114

vmstat -S si reports the number of kilobytes per second swapped in, sar -w swpin reports the number of swap-in operations, and sar -w bswin reports the number of 512-byte blocks swapped in. They will usually show zero.

vmstat -S so reports the number of kilobytes per second swapped out, sar -w swpot reports the number of swap-out operations and sar -w bswot reports the number of 512-byte blocks swapped out.
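Comparing sar -w with vmstat -S means converting 512-byte blocks to kilobytes. Dividing by the operation count then gives the average size of each swap transfer. The busy-interval figures below are hypothetical, since the sample output above shows an idle system.

```python
# Convert sar -w block counts (512-byte blocks per second) to kilobytes
# per second, for comparison with vmstat -S si and so.

BLOCK_BYTES = 512


def blocks_to_kb_per_sec(blocks_per_sec: float) -> float:
    return blocks_per_sec * BLOCK_BYTES / 1024


def kb_per_swap_operation(blocks_per_sec: float, ops_per_sec: float) -> float:
    """Average size of each swap-in or swap-out; 0.0 when idle."""
    if ops_per_sec == 0:
        return 0.0
    return blocks_to_kb_per_sec(blocks_per_sec) / ops_per_sec


if __name__ == "__main__":
    # Hypothetical busy interval: bswot/s = 640, swpot/s = 2.0
    print(blocks_to_kb_per_sec(640.0), "KB/s swapped out")   # compare with vmstat -S so
    print(kb_per_swap_operation(640.0, 2.0), "KB per swap-out")
```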

Pages Paged In: vmstat pi and sar -p pgin, ppgin, Pages Paged Out: vmstat po and sar -g pgout, ppgout
% vmstat 5
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s2 s3 s5   in   sy   cs us sy id
 0 0 0  82272 14796   0   3  3  1  1  0  0  0  1  0  0  168  692  135  3  1 96
 0 0 0  73624 16640   0   2  0  0  0  0  0  0  0  0  0   19  320   52  1  0 99
% sar -p 5

SunOS bloodnok 5.5 Generic sun4m 03/20/96

23:06:06  atch/s  pgin/s ppgin/s  pflt/s  vflt/s slock/s
23:06:11    0.00    0.00    0.00    8.91    6.93    0.00
% sar -g 5

SunOS bloodnok 5.5 Generic sun4m 03/20/96

23:10:22  pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
23:10:27     0.00     0.00     0.00     0.00     0.00

vmstat pi reports the number of kilobytes per second paged in, and sar -p reports the number of page-in requests (pgin/s) and the number of pages paged in (ppgin/s) by swap space or file system reads. Since the file system block size is 8 kilobytes, there are often two pages, or 8 kilobytes, paged in per page fault on systems with 4-kilobyte pages. Note that UltraSPARC systems (and the earliest sun4 systems) have 8-kilobyte pages; everything from the SPARCstation 1 up to, but not including, UltraSPARC uses 4-kilobyte pages.

vmstat po reports the number of kilobytes per second paged out, and sar -g reports the number of page-out requests (pgout/s) and the number of pages paged out (ppgout/s) to the swap space or file system. Because of the clustering that occurs on swap space writes, there may be very many pages written per page-out.
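The sar page counts convert to vmstat's kilobyte rates once you know the page size. Dividing pages by requests shows how much each fault or page-out moves. In this sketch the page size is passed in as a parameter (4 or 8 kilobytes, as described above), and the sample rates are hypothetical.

```python
# Relate sar page counts to vmstat kilobyte rates and estimate how many
# pages move per request. page_size_kb is 4 on SPARCstation-class systems
# and 8 on UltraSPARC, as described above.


def pages_to_kb_per_sec(pages_per_sec: float, page_size_kb: int) -> float:
    """Convert sar -p ppgin/s or sar -g ppgout/s to KB/s (vmstat pi or po)."""
    return pages_per_sec * page_size_kb


def pages_per_request(pages_per_sec: float, requests_per_sec: float) -> float:
    """ppgin/s divided by pgin/s (or ppgout/s by pgout/s); 0.0 when idle."""
    if requests_per_sec == 0:
        return 0.0
    return pages_per_sec / requests_per_sec


if __name__ == "__main__":
    # An 8 KB file system block read on a 4 KB-page system: 2 pages per fault.
    print(pages_per_request(20.0, 10.0), pages_to_kb_per_sec(20.0, 4))
    # A 200 KB clustered page-out on a 4 KB-page system moves 50 pages.
    print(200 // 4, "pages per clustered page-out")
```

A ratio near 2 on page-in matches the 8-kilobyte file system block on 4-kilobyte pages. Much larger ratios on page-out are the signature of swap write clustering.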
