How does swap space work?
Too much swap space and you waste disk space.
Too much swap space, and you are wasting your disk. Too little swap space and your system will grind to a halt spewing errors on the console. Too few swap disks and your system may run slowly. The author suggests ways to size and monitor swap space; what the measurements mean; and how to tell if it is a performance bottleneck. (2,700 words)
Q: How much swap space should I allocate, and what is the best configuration for good performance? Why don't vmstat and sar agree on the quantity of swap space? -- swapless in Shawano
First, I had better explain what "swap space" is used for. Encompassing RAM and the disk space dedicated to it, swap space holds the virtual memory of the system. Every program you run occupies a certain amount of virtual memory. Once all your virtual memory has been allocated to specific applications you cannot start new programs and currently running programs may fail if they try to grow. Physical memory is what you know as the RAM of the system. If you use up all the RAM, your system may run more slowly, but you can still start more programs because the swap space absorbs excess data. The physical memory contains the current "working set" of virtual memory -- this means the parts of your applications that are actually running on the CPU.
You usually configure enough RAM to run your main application, or the parts of it that are active at the same time. You may have a window system with lots of other applications iconized or dormant in the background. These dormant applications gradually get their RAM stolen from them, and they migrate to the swap space on disk. You will notice that when you open up a long-forgotten window it takes a while to respond, and you may hear the disk heads rattling as it is read back into RAM.
The important thing to realize about swap space is that what matters is the combined total size of every program on the system, running or dormant. When a system runs out of swap space it can be very difficult to recover. Sometimes you find that there is insufficient swap space left to log in as root or run the commands needed to kill the errant process that is consuming all the swap space.
There are two possible situations to consider. If you are prepared to keep track of your swap space, and administer it regularly, then you can run with "just enough" swap space. If you don't want the hassle and can spare some disk space in return for an easier life, then you should run with "lots" of swap space.
One extra thing to note: swap works differently in Solaris 2 than in other Unix systems, including SunOS 4. Those other systems must always have some swap space, and it must be bigger than RAM. Every program in RAM has its total size reserved on the swap disk in case it needs to be swapped out. Since there are systems with 5 gigabytes (SPARCcenter 2000) or more of RAM, it seems ridiculous that systems with huge RAM capacity would need huge swap disks that would probably never be used. Solaris 2 changes the rules by adding the RAM and the swap disk space together to form the total virtual memory. If you can buy enough RAM for your workload, you can run with no swap disk at all! In practice, though, common database applications that are sized to run in a few gigabytes of RAM will actually need many gigabytes of disk allocated as swap space.
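The difference between the two accounting schemes can be put into a few lines of Python. This is only a simplified sketch: the function name is hypothetical, and the SunOS 4 branch assumes that, because every page in RAM has swap reserved behind it, the swap disk alone bounds the total virtual memory.

```python
def total_virtual_memory_mb(ram_mb, swap_disk_mb, solaris2=True):
    """Rough total virtual memory, in megabytes, under the two schemes.

    Hypothetical helper for illustration only; real systems also set
    aside some memory for the kernel, which this sketch ignores.
    """
    if ram_mb < 0 or swap_disk_mb < 0:
        raise ValueError("sizes must not be negative")
    if solaris2:
        # Solaris 2 adds RAM and swap disk space together.
        return ram_mb + swap_disk_mb
    # Assumed SunOS 4 style model: everything in RAM is also reserved
    # on the swap disk, so only the swap disk counts.
    return swap_disk_mb


# A 64-megabyte Solaris 2 machine with no swap disk still has 64 MB
# of virtual memory to hand out.
print(total_virtual_memory_mb(64, 0))
```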
Keep track of your own swap space for small desktop systems
Most application vendors can tell you how much swap space their
application needs. If you have no idea how much swap space you will
need, configure at least 64 megabytes of virtual memory to start with.
It's easy to add more later, so don't go overboard. With SunOS 4, your
swap space should be bigger than your RAM, and you will need a swap
partition of at least 64 megabytes. With Solaris 2 the swap partition size
should be the difference between 64 megabytes and the RAM size: 48 megabytes
of swap with 16 megabytes of RAM, 32 megabytes of swap with 32 megabytes of
RAM, and no swap partition at all with 64 megabytes or more of RAM. If your application vendor
says a Solaris 2 application needs 64 megabytes of RAM and 128 megabytes of
swap, this adds up to 192 megabytes of virtual memory. You could configure
96 megabytes of RAM and 96 megabytes of swap instead. If you run out of swap
space, make a swap file (I put them in /swap) or add more RAM. If you
are running the CDE window system, or a mixture of OpenWindows and
Motif applications you will likely need more swap.
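The arithmetic behind these desktop sizing rules is easy to sketch in Python. The function names below are hypothetical, and the 64-megabyte starting figure is simply the default suggested above, not a hard limit.

```python
DEFAULT_VIRTUAL_MEMORY_MB = 64  # starting point suggested in the text


def solaris2_swap_partition_mb(ram_mb, target_vm_mb=DEFAULT_VIRTUAL_MEMORY_MB):
    """Swap partition size: the gap between the target and the RAM size."""
    return max(0, target_vm_mb - ram_mb)


def swap_for_vendor_requirement_mb(vendor_ram_mb, vendor_swap_mb, installed_ram_mb):
    """Swap needed to reach the vendor's RAM + swap total with other RAM."""
    total_vm_mb = vendor_ram_mb + vendor_swap_mb
    return max(0, total_vm_mb - installed_ram_mb)


for ram in (16, 32, 64):
    print(ram, "MB RAM ->", solaris2_swap_partition_mb(ram), "MB swap")

# Vendor asks for 64 MB RAM + 128 MB swap; the machine has 96 MB RAM.
print(swap_for_vendor_requirement_mb(64, 128, 96), "MB swap")
```

The second function only restates the point that on Solaris 2 it is the sum of RAM and swap that has to match the vendor's figure.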
Swap Space Requirements for NIS+ servers
NIS+ is both more flexible and more complex than NIS. It actually needs
a lot of CPU power to process the encryption and decryption needed for
secure, network-based administration, and it requires more swap space than
you might expect. When the server process is large, the forked child
processes can lead to a large requirement for swap space. NIS+ servers
for large and complex NIS+ domains often need several hundred megabytes
of swap space to service some kinds of client requests. For example
the niscat command causes the server to fork so that it can
send back a long stream of information. The nismatch command
does a simple lookup and, thus, does much less work. You should try to avoid
piping niscat into grep, when nismatch could
be used.
Swap space requirements for database and timeshared servers
The consequences of running out of swap space affect a larger number of
users on a big server, so it is wise to allocate a lot more than you
normally need to cope with any usage peaks. To start with, add twice as
much disk as you have RAM.
How to add swap space for good performance
Swap performance only makes a difference when you are short of RAM. If
the system is not paging, it makes no difference. When you are paging, it
is easy to overwhelm a single swap disk, so try to add more swap disks
when the existing one(s) get busy. Swap space is allocated in a
round-robin fashion over all swap disks, so the workload is naturally
spread over them all in a crude manner. It is not worth making a
striped metadevice to swap on -- that would just add overhead and slow it
down. There is also a limit of 2 gigabytes on the size of each swap partition
or file, so striping disks together tends to make them too big. You can
add as many swap partitions as you like. There is no limit to the total
size of swap in Solaris 2.
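As a planning aid, the 2-gigabyte limit and the advice to spread swap over several disks can be combined into a small calculation. This is a sketch with hypothetical names; it only works out sizes and does not create any swap areas.

```python
import math

MAX_SWAP_AREA_MB = 2048  # 2-gigabyte limit per swap partition or file


def plan_swap_areas(total_swap_mb, min_areas=1):
    """Split a total into equal-sized areas, each within the 2 GB limit.

    min_areas lets you ask for more areas than the limit alone requires,
    for example to spread the paging load over more disks.
    """
    if total_swap_mb < 0 or min_areas < 1:
        raise ValueError("total must be >= 0 and min_areas >= 1")
    areas = max(min_areas, math.ceil(total_swap_mb / MAX_SWAP_AREA_MB))
    base, extra = divmod(total_swap_mb, areas)
    # The first 'extra' areas take one more megabyte so the sum is exact.
    return [base + 1 if i < extra else base for i in range(areas)]


print(plan_swap_areas(5000))               # three areas, none over 2048 MB
print(plan_swap_areas(1024, min_areas=2))  # two areas to share the load
```

Equal sizes are chosen here because round-robin allocation spreads the load evenly only while every area still has free space.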
Commands for monitoring swap space use
This part of Solaris 2 has some code derived from System V and some
from BSD4.3 via SunOS 4, but the algorithm has been redesigned and is
unique to Solaris 2. Swapping is now a scheduler function. Solaris 2.3
swaps in some extreme circumstances; Solaris 2.4 and 2.5 implement a
new swapping algorithm, designed to help performance on small-memory
desktop machines. The paging process is designed to free memory as fast
as possible. Page-outs are queued and clustered so that the random
page-outs are organized into large sequential writes to the swap space.
This makes page-out very efficient, but page-in is still random. You
will see larger than normal disk service times due to large sequential
writes that often reach 200 kilobytes each. These can get in the way of page-in
reads that occur randomly and one page at a time. For this reason I like to
keep swap disks less than 30 percent busy for good performance.
Most commercial performance monitoring tools keep track of swap space, or can be configured to generate a warning when it gets low. The threshold I use by default in my SE Toolkit rules is set to start warning when there is less than 10 megabytes left, and to complain more when there is less than 4 megabytes left. This was based on my own desktop system. I found that with less than 4 megabytes left mailtool cannot fork, which causes it to fail when I try to send a message. On a larger server these thresholds should be increased to give you an earlier warning.
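A threshold rule of this kind takes only a few lines. The sketch below is in Python rather than the SE Toolkit's own language; the function name and the level names are hypothetical, and the defaults are the desktop thresholds quoted above.

```python
def swap_alert_level(available_kb, warn_mb=10, critical_mb=4):
    """Classify available swap (in kilobytes) against two thresholds."""
    if available_kb < critical_mb * 1024:
        return "critical"
    if available_kb < warn_mb * 1024:
        return "warning"
    return "ok"


# Raise both thresholds on a larger server to get an earlier warning.
print(swap_alert_level(34172))
print(swap_alert_level(34172, warn_mb=64, critical_mb=32))
```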
% /usr/ucb/ps alx
 F  UID  PID PPID CP PRI NI  SZ RSS WCHAN    S TT      TIME COMMAND
 8 2595 1133 1130  0  48 20 988 360 modlinka S pts/4   0:00 -bin/csh
Be careful! The System V version /bin/ps prints a field labelled SZ, but this is the resident set size in RAM -- printed as RSS by the /usr/ucb/ps. You need to use the SZ or SIZE field reported by /usr/ucb/ps alx in units of kilobytes to determine the amount of swap space used by the process. Really huge processes can be hard to figure out as the SIZE and RSS numbers run together. If you want to get at the data cleanly, you could easily modify the ps-ax.se script provided with the SE Performance Toolkit Version 2.5 to print whatever you want. You should also beware of processes that map hardware device space. These device mappings do not use swap space. For example the SIZE of the X server process includes a lot of device space and it is often the largest process on a system. The mapped Creator3D address space is around 100 megabytes! If you are running Solaris 2.5 you could try to figure out the address space mappings using the new /usr/proc/bin/pmap command.
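If you collect the /usr/ucb/ps alx output in a script, it is worth detecting the lines where the numbers have run together rather than trusting them. The following Python sketch assumes the fourteen-column layout of the example line above; the function name is hypothetical, and any line that does not split cleanly (including one with an empty WCHAN field) is rejected rather than guessed at.

```python
PS_FIELDS = ["F", "UID", "PID", "PPID", "CP", "PRI", "NI",
             "SZ", "RSS", "WCHAN", "S", "TT", "TIME", "COMMAND"]


def parse_ps_alx_line(line):
    """Return a dict for one ps alx data line, or None if it looks fused."""
    # Split on whitespace but keep the command, which may contain spaces.
    parts = line.split(None, len(PS_FIELDS) - 1)
    if len(parts) != len(PS_FIELDS):
        return None
    row = dict(zip(PS_FIELDS, parts))
    try:
        row["SZ"] = int(row["SZ"])    # process size in kilobytes
        row["RSS"] = int(row["RSS"])  # resident set size in kilobytes
    except ValueError:
        return None
    return row


sample = " 8 2595 1133 1130 0 48 20 988 360 modlinka S pts/4 0:00 -bin/csh"
print(parse_ps_alx_line(sample)["SZ"])
```

Remember that a large SZ does not always mean heavy swap use, since device mappings are included in the figure.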
vmstat procs w and sar -q swpq-sz, swpocc
% vmstat
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s2 s3 s5   in   sy   cs us sy id
 0 0 0  82368 14776   0   3  3  1  1  0  0  0  1  0  0  167  679  133  3  1 96
% sar -q 1
SunOS bloodnok 5.5 Generic sun4m 03/20/96
22:55:06 runq-sz %runocc swpq-sz %swpocc
22:55:07
This is the total number of dormant processes currently swapped out to free up all their RAM. In Solaris 2.4 and later releases, you may find that some of the system daemons that are started but never used get swapped out during a busy period.
swpocc is the proportion of the time that there is something in the swap queue. This is never a sign of a performance problem, as the only processes that will be swapped out are ones that are completely dormant. An active process may have pages stolen from it during a RAM shortage, but it will never be completely swapped out.
vmstat swap, sar -r freeswap, and swap -s
vmstat swap shows the available swap in kilobytes, sar -r freeswap shows the free swap in 512-byte blocks, and swap -s shows several measures including available swap. They do not measure the same thing! In the example shown below, available swap is about 34 megabytes whereas free swap is about 42 megabytes. The difference is the 8 megabytes of reserved swap shown by swap -s.
swap -s available + swap -s reserved = sar -r freeswap.
hostname% sar -r 1
SunOS hostname 5.3 Generic sun4c 06/26/94
16:46:36 freemem freeswap
16:46:37     307    85104
hostname% swap -s
total: 35856k bytes allocated + 8532k reserved = 44388k used, 34172k available
hostname% vmstat 1
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s3 s5 --   in   sy   cs us sy id
 0 0 0   8808  4200   0   3  1  0  0  0  0  0  0  0  0   78  284  192  6  3 92
 0 0 0  34144  1320   0  14  0  0  0  0  0  0  0  0  0   30  169  144  1  2 97
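The relationship between these measures can be checked with the figures from the example output above. The Python sketch below uses hypothetical function names and assumes the swap -s line has exactly the layout shown. The three commands were run at slightly different moments, so the totals agree only approximately.

```python
import re

SWAP_S_PATTERN = re.compile(
    r"(\d+)k bytes allocated \+ (\d+)k reserved = (\d+)k used, (\d+)k available"
)


def parse_swap_s(line):
    """Extract the four kilobyte figures from a 'swap -s' summary line."""
    match = SWAP_S_PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognised swap -s output")
    allocated, reserved, used, available = (int(g) for g in match.groups())
    return {"allocated": allocated, "reserved": reserved,
            "used": used, "available": available}


def blocks_to_kb(blocks):
    """Convert 512-byte blocks, as reported by sar, to kilobytes."""
    return blocks * 512 // 1024


swap = parse_swap_s("total: 35856k bytes allocated + 8532k reserved = "
                    "44388k used, 34172k available")
freeswap_kb = blocks_to_kb(85104)  # sar -r freeswap from the example

print("sar -r freeswap:      ", freeswap_kb, "KB")
print("available + reserved: ", swap["available"] + swap["reserved"], "KB")
```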
You can also use swap -l to list the individual swap partitions and sizes. Remember that each swap partition or file can only be up to 2 gigabytes in size. If you try to use a larger disk it will only use the first 2 gigabytes.
Pages Swapped In: vmstat -S si and sar -w swpin, bswin, Pages Swapped Out: vmstat so and sar -w swpot, bswot
% vmstat -S 5
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  si  so pi po fr de sr f0 s2 s3 s5   in   sy   cs us sy id
 0 0 0  82272 14796   0   0  3  1  1  0  0  0  1  0  0  168  692  135  3  1 96
 0 0 0  73624 16640   0   0  0  0  0  0  0  0  0  0  0   19  306   54  0  0 100
% sar -w 1
SunOS bloodnok 5.5 Generic sun4m 03/20/96
23:03:32 swpin/s bswin/s swpot/s bswot/s pswch/s
23:03:33    0.00     0.0    0.00     0.0     114
vmstat -S si reports the number of kilobytes per second swapped in, sar -w swpin reports the number of swap-in operations, and sar -w bswin reports the number of 512-byte blocks swapped in. They will usually show zero. vmstat -S so reports the number of kilobytes per second swapped out, sar -w swpot reports the number of swap-out operations, and sar -w bswot reports the number of 512-byte blocks swapped out.
Pages Paged In: vmstat pi and sar -p pgin, ppgin, Pages Paged Out: vmstat po and sar -g pgout, ppgout
% vmstat 5
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s2 s3 s5   in   sy   cs us sy id
 0 0 0  82272 14796   0   3  3  1  1  0  0  0  1  0  0  168  692  135  3  1 96
 0 0 0  73624 16640   0   2  0  0  0  0  0  0  0  0  0   19  320   52  1  0 99
% sar -p 5
SunOS bloodnok 5.5 Generic sun4m 03/20/96
23:06:06  atch/s  pgin/s ppgin/s  pflt/s  vflt/s slock/s
23:06:11    0.00    0.00    0.00    8.91    6.93    0.00
% sar -g 5
SunOS bloodnok 5.5 Generic sun4m 03/20/96
23:10:22  pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
23:10:27     0.00     0.00     0.00     0.00     0.00
vmstat pi reports the number of kilobytes paged in per second, and sar -p reports the number of page faults (pgin) and the number of pages (ppgin) paged in by swap space or file system reads. Since the filesystem block size is 8 kilobytes, there are often two pages, or 8 kilobytes, paged in per page fault on systems with 4-kilobyte pages. Note that UltraSPARC systems (and the earliest sun4 systems) have 8-kilobyte pages. Everything from the SPARCstation 1 up to, but not including, UltraSPARC uses 4-kilobyte pages.
vmstat po reports the number of kilobytes paged out per second, and sar -g reports the number of page-outs (pgout) and the number of pages (ppgout) paged out to the swap space or file system. Because of the clustering that occurs on swap space writes, there may be very many pages written per page-out.
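To see how much clustering is taking place, divide the pages-per-second figure by the operations-per-second figure. The Python sketch below uses hypothetical function names and made-up sample rates, since the sar output shown earlier is all zeros on an idle system.

```python
def pages_per_operation(pages_per_sec, ops_per_sec):
    """Average pages moved per page-in or page-out operation."""
    if ops_per_sec == 0:
        return 0.0  # nothing happened in this interval
    return pages_per_sec / ops_per_sec


def kilobytes_per_sec(pages_per_sec, page_size_kb=4):
    """Convert a sar page rate to kilobytes per second, as vmstat shows."""
    return pages_per_sec * page_size_kb


# Hypothetical sample: sar -g shows pgout/s = 2.0 and ppgout/s = 100.0
# on a machine with 4-kilobyte pages.
print(pages_per_operation(100.0, 2.0), "pages per page-out")
print(kilobytes_per_sec(100.0), "KB/s paged out")
```

With these sample rates each page-out carries 50 pages, which at 4 kilobytes per page matches a clustered write of about 200 kilobytes.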
About the author
Adrian Cockcroft joined Sun in 1988, and currently works as a performance specialist for the Server Division of SMCC. He is the author of Sun Performance and Tuning: SPARC and Solaris, published by SunSoft Press PTR Prentice Hall.
Reach Adrian at adrian.cockcroft@sunworld.com.
URL: http://www.sunworld.com/swol-04-1996/swol-04-perf.html