How does swap space work?
Too much swap space, and you are wasting your disk. Too little swap space and your system will grind to a halt spewing errors on the console. Too few swap disks and your system may run slowly. The author suggests ways to size and monitor swap space, explains what the measurements mean, and shows how to tell if swap is a performance bottleneck. (2,700 words)
Q: How much swap space should I allocate, and what is the best configuration for good performance? Why don't vmstat and sar agree on the quantity of swap space? -- swapless in Shawano
First, I had better explain what "swap space" is used for. Encompassing RAM and the disk space dedicated to it, swap space holds the virtual memory of the system. Every program you run occupies a certain amount of virtual memory. Once all your virtual memory has been allocated to specific applications you cannot start new programs and currently running programs may fail if they try to grow. Physical memory is what you know as the RAM of the system. If you use up all the RAM, your system may run more slowly, but you can still start more programs because the swap space absorbs excess data. The physical memory contains the current "working set" of virtual memory -- this means the parts of your applications that are actually running on the CPU.
You usually configure enough RAM to run your main application, or the parts of it that are active at the same time. You may have a window system with lots of other applications iconized or dormant in the background. These dormant applications gradually get their RAM stolen from them, and they migrate to the swap space on disk. You will notice that when you open up a long-forgotten window it takes a while to respond, and you may hear the disk heads rattling as it is read back into RAM.
The important thing to realize about swap space is that what matters is the combined total size of every program on the system, running or dormant. When a system runs out of swap space it can be very difficult to recover. Sometimes you find that there is insufficient swap space left to log in as root or run the commands needed to kill the errant process that is consuming all the swap space.
There are two possible situations to consider. If you are prepared to keep track of your swap space, and administer it regularly, then you can run with "just enough" swap space. If you don't want the hassle and can spare some disk space in return for an easier life, then you should run with "lots" of swap space.
One extra thing to note: Swap works differently in Solaris 2 as compared with other Unix systems, including SunOS 4. Those other systems must always have some swap space, and it must be bigger than RAM. Every program in RAM has its total size reserved on the swap disk in case it needs to be swapped out to disk. Since there are systems with 5 gigabytes (SPARCcenter 2000) or more of RAM, it seems ridiculous that systems that already have huge RAM capacity would need huge swap disks that would probably not be used. Solaris 2 changes the rules by adding the RAM and the swap disk space together to form the virtual memory. If you can buy enough RAM for your workload, you can run with no swap disk at all! In practice, common database applications that are sized to run in a few gigabytes of RAM will actually need many gigabytes of disk allocated as swap space.
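The difference between the two rules can be sketched in a few lines of Python. This is a rough illustration only: the function name is made up for this example, and it ignores the memory the kernel keeps for itself, so real systems will report somewhat less.

```python
def virtual_memory_mb(ram_mb, swap_mb, solaris2=True):
    """Approximate total virtual memory in megabytes (illustrative only)."""
    if solaris2:
        # Solaris 2 adds RAM and swap disk space together.
        return ram_mb + swap_mb
    # SunOS 4 style: everything in RAM is also reserved on the swap disk,
    # so the swap disk alone bounds the virtual memory.
    return swap_mb


if __name__ == "__main__":
    for label, flag in (("Solaris 2", True), ("SunOS 4 style", False)):
        print(label, virtual_memory_mb(64, 128, solaris2=flag), "MB")
```

With 64 megabytes of RAM and a 128-megabyte swap disk, the Solaris 2 rule gives 192 megabytes of virtual memory, while the older rule gives only the 128 megabytes on disk.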
Keep track of your own swap space for small desktop systems
Most application vendors can tell you how much swap space their application needs. If you have no idea how much swap space you will need, configure at least 64 megabytes of virtual memory to start with. It's easy to add more later so don't go overboard. With SunOS 4, your swap space should be bigger than your RAM, but at least a 64-megabyte swap partition will be needed. With Solaris 2 the swap partition size should be the difference between 64 megabytes and the RAM size; that is 48-megabyte swap with 16 megabytes RAM, 32-megabyte swap with 32-megabyte RAM, no swap partition at all with 64 megabytes or more RAM. If your application vendor says a Solaris 2 application needs 64 megabytes of RAM and 128 megabytes of swap, this adds up to 192 megabytes of virtual memory. You could configure 96 megabytes of RAM and 96 megabytes of swap instead. If you run out of swap space, make a swap file (I put them in /swap) or add more RAM. If you are running the CDE window system, or a mixture of OpenWindows and Motif applications you will likely need more swap.
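The Solaris 2 sizing arithmetic above is simple enough to write down as a small Python sketch. The function names are invented for this illustration, and the 64-megabyte starting point is the rule of thumb from the text, not a system limit.

```python
DEFAULT_VIRTUAL_MEMORY_MB = 64  # starting point suggested in the text


def solaris2_swap_partition_mb(ram_mb, target_vm_mb=DEFAULT_VIRTUAL_MEMORY_MB):
    """Swap partition size that tops RAM up to the target virtual memory."""
    return max(0, target_vm_mb - ram_mb)


def swap_for_vendor_requirement(vendor_ram_mb, vendor_swap_mb, actual_ram_mb):
    """Swap needed to reach the vendor's total when you fit a different RAM size."""
    total_vm_mb = vendor_ram_mb + vendor_swap_mb
    return max(0, total_vm_mb - actual_ram_mb)


if __name__ == "__main__":
    # 16 -> 48, 32 -> 32, 64 -> 0, matching the figures in the text.
    for ram in (16, 32, 64):
        print(ram, "MB RAM ->", solaris2_swap_partition_mb(ram), "MB swap")
    # Vendor asks for 64 MB RAM + 128 MB swap; with 96 MB RAM you need 96 MB swap.
    print(swap_for_vendor_requirement(64, 128, 96), "MB swap")
```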
Swap Space Requirements for NIS+ servers
NIS+ is both more flexible and more complex than NIS. It actually needs a lot of CPU power to process the encryption and decryption needed for secure, network-based administration, and it requires more swap space than you might expect. When the server process is large, the forked child processes can lead to a large requirement for swap space. NIS+ servers for large and complex NIS+ domains often need several hundred megabytes of swap space to service some kinds of client requests. For example the niscat command causes the server to fork so that it can send back a long stream of information. The nismatch command does a simple lookup and, thus, does much less work. You should try to avoid piping niscat into grep, when nismatch could be used.
Swap space requirements for database and timeshared servers
The consequences of running out of swap space affect a larger number of users on a big server, so it is wise to allocate a lot more than you normally need to cope with any usage peaks. To start with, add twice as much disk as you have RAM.
How to add swap space for good performance
Swap performance only makes a difference when you are short of RAM. If the system is not paging, it makes no difference. When you are paging, it is easy to overwhelm a single swap disk, so try to add more swap disks when the existing one(s) get busy. Swap space is allocated in a round-robin fashion over all swap disks, so the workload is naturally spread over them all in a crude manner. It is not worth making a striped metadevice to swap on -- that would just add overhead and slow it down. There is also a limit of 2 gigabytes on the size of each swap partition or file, so striping disks together tends to make them too big. You can add as many swap partitions as you like. There is no limit to the total size of swap in Solaris 2.
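Because of the 2-gigabyte limit, a large swap requirement has to be split over several partitions or files. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the even split is just one reasonable layout given the round-robin allocation described above.

```python
import math

MAX_SWAP_DEVICE_MB = 2048  # 2-gigabyte limit per swap partition or file


def swap_layout(total_swap_mb, max_device_mb=MAX_SWAP_DEVICE_MB):
    """Return (device_count, mb_per_device) for an even split under the limit."""
    if total_swap_mb <= 0:
        return 0, 0
    count = math.ceil(total_swap_mb / max_device_mb)
    # Round up so the devices together cover the whole requirement.
    per_device = math.ceil(total_swap_mb / count)
    return count, per_device


if __name__ == "__main__":
    count, size = swap_layout(5000)
    print(count, "swap devices of about", size, "MB each")
```

Spreading the same total over more, smaller devices than this minimum is also valid and helps when the existing swap disks get busy.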
Commands for monitoring swap space use
This part of Solaris 2 has some code derived from System V and some from BSD4.3 via SunOS 4, but the algorithm has been redesigned and is unique to Solaris 2. Swapping is now a scheduler function. Solaris 2.3 swaps in some extreme circumstances; Solaris 2.4 and 2.5 implement a new swapping algorithm, designed to help performance on small-memory desktop machines. The paging process is designed to free memory as fast as possible. Page-outs are queued and clustered so that the random page-outs are organized into large sequential writes to the swap space. This makes page-out very efficient, but page-in is still random. You will see larger than normal disk service times due to large sequential writes that often reach 200 kilobytes each. These can get in the way of page-in reads that occur randomly and one page at a time. For this reason I like to keep swap disks less than 30 percent busy for good performance.
Most commercial performance monitoring tools keep track of swap space, or can be configured to generate a warning when it gets low. The threshold I use by default in my SE Toolkit rules is set to start warning when there is less than 10 megabytes left, and complain more when there is less than 4 megabytes left. This was based on my own desktop system. I found that with less than 4 megabytes mailtool cannot fork, which causes it to fail when I try to send a message. On a larger server these thresholds should be increased to give you an earlier warning.
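The two-level threshold can be expressed as a small Python sketch. This is not the SE Toolkit rule itself, only an illustration of the same idea; the function name and the status labels are made up here, and the defaults are the desktop values quoted above.

```python
def swap_status(available_mb, warn_mb=10, alarm_mb=4):
    """Classify available swap against two thresholds (illustrative labels)."""
    if available_mb < alarm_mb:
        return "alarm"      # programs may be unable to fork
    if available_mb < warn_mb:
        return "warning"    # getting low, investigate soon
    return "ok"


if __name__ == "__main__":
    # 34172k is the "available" figure from a swap -s example.
    print(swap_status(34172 / 1024))
    # On a larger server, raise both thresholds for earlier warning.
    print(swap_status(80, warn_mb=100, alarm_mb=40))
```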
% /usr/ucb/ps alx
 F   UID   PID  PPID CP PRI NI   SZ  RSS WCHAN    S TT       TIME COMMAND
 8  2595  1133  1130  0  48 20  988  360 modlinka S pts/4    0:00 -bin/csh
Be careful! The System V version /bin/ps prints a field labelled SZ, but this is the resident set size in RAM -- printed as RSS by the /usr/ucb/ps. You need to use the SZ or SIZE field reported by /usr/ucb/ps alx in units of kilobytes to determine the amount of swap space used by the process. Really huge processes can be hard to figure out as the SIZE and RSS numbers run together. If you want to get at the data cleanly, you could easily modify the ps-ax.se script provided with the SE Performance Toolkit Version 2.5 to print whatever you want. You should also beware of processes that map hardware device space. These device mappings do not use swap space. For example the SIZE of the X server process includes a lot of device space and it is often the largest process on a system. The mapped Creator3D address space is around 100 megabytes! If you are running Solaris 2.5 you could try to figure out the address space mappings using the new /usr/proc/bin/pmap command.
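If you capture the /usr/ucb/ps alx output in a file, a short Python sketch like the following can pull out the SZ column per process. The function name is invented for this example. It assumes the columns are separated by white space, so it cannot detect the case mentioned above where the SIZE and RSS numbers of a huge process run together, and it does not subtract device mappings.

```python
def size_kb_by_pid(ps_text):
    """Map PID -> SZ (kilobytes) from captured '/usr/ucb/ps alx' output."""
    lines = ps_text.strip().splitlines()
    header = lines[0].split()
    pid_col = header.index("PID")
    sz_col = header.index("SZ")
    sizes = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= max(pid_col, sz_col):
            continue  # short or damaged line, skip it
        try:
            sizes[int(fields[pid_col])] = int(fields[sz_col])
        except ValueError:
            continue  # non-numeric field, skip it
    return sizes


if __name__ == "__main__":
    sample = """\
 F UID PID PPID CP PRI NI SZ RSS WCHAN S TT TIME COMMAND
 8 2595 1133 1130 0 48 20 988 360 modlinka S pts/4 0:00 -bin/csh
"""
    print(size_kb_by_pid(sample))
```

Summing the values gives only a rough upper estimate of swap use, since processes that map hardware device space will be overstated.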
vmstat procs w and sar -q swpq-sz, swpocc

% vmstat
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s2 s3 s5   in   sy   cs us sy id
 0 0 0  82368 14776   0   3  3  1  1  0  0  0  1  0  0  167  679  133  3  1 96
% sar -q 1

SunOS bloodnok 5.5 Generic sun4m    03/20/96

22:55:06 runq-sz %runocc swpq-sz %swpocc
22:55:07

The swap queue length (vmstat w, sar swpq-sz) is the total number of dormant processes currently swapped out to free up all their RAM. In Solaris 2.4 and later releases, you may find that some of the system daemons that are started but never used get swapped out during a busy period.

swpocc is the proportion of the time that there is something in the swap queue. This is never a sign of a performance problem, as the only processes that will be swapped out are ones that are completely dormant. An active process may have pages stolen from it during a RAM shortage, but it will never be completely swapped out.
vmstat swap, sar -r freeswap, and swap -s

vmstat swap shows the available swap in kilobytes, sar -r freeswap shows the free swap in 512-byte blocks, and swap -s shows several measures including available swap. They do not measure the same thing! In the example shown in the figure below, available swap is about 34 megabytes whereas free swap is about 42 megabytes. The difference is the 8 megabytes of reserved swap shown by swap -s. That is: swap -s available + swap -s reserved = sar -r freeswap.
hostname% sar -r 1

SunOS hostname 5.3 Generic sun4c    06/26/94

16:46:36 freemem freeswap
16:46:37     307    85104

hostname% swap -s
total: 35856k bytes allocated + 8532k reserved = 44388k used, 34172k available

hostname% vmstat 1
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s3 s5 --   in   sy   cs us sy id
 0 0 0   8808  4200   0   3  1  0  0  0  0  0  0  0  0   78  284  192  6  3 92
 0 0 0  34144  1320   0  14  0  0  0  0  0  0  0  0  0   30  169  144  1  2 97
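The relationship between these numbers can be checked with a few lines of Python, using the figures from the example output. The helper name is made up for this sketch. The two sides do not match exactly, presumably because the commands were run a moment apart.

```python
BLOCK_BYTES = 512  # sar -r reports freeswap in 512-byte blocks


def blocks_to_kb(blocks):
    """Convert 512-byte blocks to kilobytes."""
    return blocks * BLOCK_BYTES // 1024


if __name__ == "__main__":
    sar_freeswap_blocks = 85104   # from sar -r
    swap_s_reserved_kb = 8532     # from swap -s
    swap_s_available_kb = 34172   # from swap -s

    free_kb = blocks_to_kb(sar_freeswap_blocks)
    total_kb = swap_s_available_kb + swap_s_reserved_kb
    print("sar -r freeswap:             ", free_kb, "KB")
    print("swap -s available + reserved:", total_kb, "KB")
    print("difference:                  ", total_kb - free_kb, "KB")
```

Note that the second vmstat line (34144) tracks the swap -s available figure, not the sar freeswap figure.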
You can also use swap -l to list the individual swap partitions and sizes. Remember that each swap partition or file can only be up to 2 gigabytes in size. If you try to use a larger disk it will only use the first 2 gigabytes.
Pages Swapped In: vmstat -S si and sar -w swpin, bswin
Pages Swapped Out: vmstat -S so and sar -w swpot, bswot
% vmstat -S 5
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  si  so pi po fr de sr f0 s2 s3 s5   in   sy   cs us sy id
 0 0 0  82272 14796   0   0  3  1  1  0  0  0  1  0  0  168  692  135  3  1 96
 0 0 0  73624 16640   0   0  0  0  0  0  0  0  0  0  0   19  306   54  0  0 100
% sar -w 1

SunOS bloodnok 5.5 Generic sun4m    03/20/96

23:03:32 swpin/s bswin/s swpot/s bswot/s pswch/s
23:03:33    0.00     0.0    0.00     0.0     114
vmstat -S si reports the number of kilobytes per second swapped in, sar -w swpin reports the number of swap-in operations, and sar -w bswin reports the number of 512-byte blocks swapped in. They will usually show zero.

vmstat -S so reports the number of kilobytes per second swapped out, sar -w swpot reports the number of swap-out operations, and sar -w bswot reports the number of 512-byte blocks swapped out.
Pages Paged In: vmstat pi and sar -p pgin, ppgin
Pages Paged Out: vmstat po and sar -g pgout, ppgout
% vmstat 5
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s2 s3 s5   in   sy   cs us sy id
 0 0 0  82272 14796   0   3  3  1  1  0  0  0  1  0  0  168  692  135  3  1 96
 0 0 0  73624 16640   0   2  0  0  0  0  0  0  0  0  0   19  320   52  1  0 99
% sar -p 5

SunOS bloodnok 5.5 Generic sun4m    03/20/96

23:06:06  atch/s  pgin/s ppgin/s  pflt/s  vflt/s slock/s
23:06:11    0.00    0.00    0.00    8.91    6.93    0.00
% sar -g 5

SunOS bloodnok 5.5 Generic sun4m    03/20/96

23:10:22  pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
23:10:27     0.00     0.00     0.00     0.00     0.00
vmstat pi reports the number of kilobytes per second and sar reports the number of page faults and the number of pages paged in by swap space or file system reads. Since the filesystem block size is 8 kilobytes, there are often two pages or 8 kilobytes paged in per page fault on systems with 4-kilobyte pages. Note that UltraSPARC systems (and the earliest sun4 systems) have 8-kilobyte pages. Everything from the SPARCstation 1 up to UltraSPARC uses 4 kilobytes.

vmstat po reports the number of kilobytes per second and sar reports the number of page-outs and the number of pages paged out to the swap space or file system. Because of the clustering that occurs on swap space writes, there may be very many pages written per page-out.
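To relate the vmstat and sar paging figures to each other, the following Python sketch converts kilobytes to pages and works out the average number of pages moved per paging operation. The function names are made up for this illustration, and the 4-kilobyte default page size applies only to the machines listed above.

```python
def kb_to_pages(kilobytes, page_kb=4):
    """Convert a vmstat kilobyte figure to pages (page_kb=8 on UltraSPARC)."""
    return kilobytes / page_kb


def pages_per_operation(pages_per_sec, operations_per_sec):
    """Average pages per page-in or page-out, e.g. ppgout/s divided by pgout/s."""
    if operations_per_sec == 0:
        return 0.0  # idle interval, nothing was paged
    return pages_per_sec / operations_per_sec


if __name__ == "__main__":
    # One 8-kilobyte filesystem block read on a 4-kilobyte-page system.
    print(kb_to_pages(8), "pages per page fault")
    # A clustered 200-kilobyte swap write on a 4-kilobyte-page system.
    print(kb_to_pages(200), "pages in one page-out")
    # Ratio from sar -g style rates (made-up sample values).
    print(pages_per_operation(50.0, 2.0), "pages per page-out")
```

A high pages-per-page-out ratio is a sign of the write clustering at work, not of a problem in itself.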
About the author
Adrian Cockcroft joined Sun in 1988, and currently works as a performance specialist for the Server Division of SMCC. He is the author of Sun Performance and Tuning: SPARC and Solaris, published by SunSoft Press PTR Prentice Hall. Reach Adrian at firstname.lastname@example.org.