Swap space implementation, part one
We take an inside look at how Solaris implements swap space, including discussions on swap configuration and allocation
Swap space is an integral part of implementing virtual memory, and the sizing of swap space on Solaris systems is the source of a great many questions. Understanding how swap space is implemented in Solaris is crucial to making a reasonably accurate assessment as to how much swap space your system really needs. (1,900 words)
Traditionally, sysadmins needed to scale up swap space as physical memory grew, with the accepted rule-of-thumb being to configure disk space for swap equal to 2 to 2.5 times physical memory. This works pretty well, for the most part, and is still a common practice today. Even with systems installed with large amounts of physical memory (RAM), disk prices have been such that we've been able to keep pace and sustain configuring two to three times the system's memory size for swap.
With Solaris 2 (SunOS 5), a new implementation of swap was introduced using a pseudo file system called swapfs (swap file system). The algorithm the system uses for reserving swap space for a process's anonymous memory pages has changed somewhat, and the requirement for configuring swap space is now closely aligned with the actual virtual memory requirements of the applications running on the system.
We'll illustrate the way things used to work with a simple example. Let's say you have a system with 32 megabytes of RAM. It's up and running, and there are 20 megabytes free (available physical memory). This system has 10 megabytes of available swap space. A process starts up and attempts to grab 15 megabytes of memory (malloc(15 * 1024 * 1024)). The malloc(3C) call will fail with an ENOMEM error code. Why? There are 20 megabytes of free RAM, and the process only asked for 15, so what's the problem? As you've probably figured out, there is insufficient swap space (10 megabytes available); the system couldn't allocate swap to back all of the anonymous memory requested, so it wouldn't grant the request.
The above scenario is no longer the case in Solaris 2.X. In the following sections, we'll provide a little background information and then get into the specifics of the swap implementation. Following that, we'll look at the information made available to users and administrators on swap space.
A little background -- virtual memory
Let's start with some background information. The notion of swap space exists to support virtual memory in operating systems, so we'll start with a quick look at the hows and whys of virtual memory, then move on. Note that our discussion of virtual memory below is very high level, providing only enough information to facilitate the topic at hand (swap). Virtual memory is a relatively complex topic, which could easily require two columns to cover in sufficient detail (it's on my editorial calendar for future installments of Inside Solaris).
The motivation behind virtual memory is really quite simple: it exists to provide the illusion of unlimited memory for processes to execute in. Back in the "golden days" of computing (1960s and 1970s), programmers on certain platforms had to fit their programs into the memory that was physically available. This is no longer the case with operating systems that support virtual memory (all current versions of Unix do and have done so for quite some time). A process's virtual memory requirement may exceed the amount of physical memory actually installed on the system, or it may exceed the amount of memory available for that process's pages because other processes running on the same system are contending for memory. Virtual memory also provides protection: one process can't simply write over another process's memory pages, and the operating system (kernel) pages are protected from all non-kernel processes.
A page of memory is a predefined number of bytes, typically four or eight kilobytes, and is the smallest unit of memory that can be allocated, mapped, and protected. The page size is sometimes determined by the hardware architecture and sometimes by the operating system. Processors based on SPARC Version 8, which include the family of processors known as SuperSPARC, use a four-kilobyte page size, as stipulated by the SPARC V8 architecture specification (more precisely, the SPARC Reference MMU, or SRMMU, which is a subset of SPARC V8). The SPARC V9 architecture, which is what the UltraSPARC processors are based on, does not specify a page size, so the operating system has some flexibility in selecting page sizes. For the most part, UltraSPARC systems use an eight-kilobyte page size. With Solaris 2.6, support for large pages was added for shared memory segments using ISM (see the September 1997 Inside Solaris column in Resources). For the purposes of this discussion, we need only remember that a memory page is a fixed number of bytes in size, and pages are the units of memory allocated, managed, and freed by the operating system.
An address space is composed of pages, representing all the memory required to maintain the segments that make up a process. These are the process's text (executable code), data (program variables), and stack (a place to store state information for function call returns). Processes may also have "heap" space, chunks of memory allocated as a result of applications calling memory allocation routines like malloc(3), as well as shared memory attached with shmat(2). The pages that make up a process's heap space are referred to as anonymous memory in Solaris because the pages have no corresponding name or space in the file system.
Here's an example of a process's address space. We wrote a small C program that has a few instructions in it and some signal handlers in place (more on the signal handlers later):
fawlty> /usr/proc/bin/pmap 2141
2141:   p1
00010000      8K read/exec         dev:172,4 ino:1268005
00020000      8K read/write/exec   dev:172,4 ino:1268005
EF6E0000     16K read/exec         /usr/platform/sun4u/lib/libc_psr.so.1
EF700000    592K read/exec         /usr/lib/libc.so.1
EF7A2000     24K read/write/exec   /usr/lib/libc.so.1
EF7A8000      8K read/write/exec   [ anon ]
EF7B0000      8K read/exec/shared  /usr/lib/libdl.so.1
EF7C0000    112K read/exec         /usr/lib/ld.so.1
EF7EA000     16K read/write/exec   /usr/lib/ld.so.1
EFFFC000     16K read/write/exec   [ stack ]
total       808K
fawlty>
Using the pmap(1) command, which dumps a process's address space map, we can see what mappings exist for running processes. The column on the far left is the virtual address of the mapping, followed by the size in kilobytes, the permissions, and finally the object name. In our example, a relatively small program, we see mappings for the process text and data (virtual addresses 10000 and 20000, respectively); the required shared object libraries (addresses EF6E0000, EF700000, EF7A2000, EF7B0000, EF7C0000, EF7EA000); the stack mapping (EFFFC000); and an anonymous memory mapping (EF7A8000). The total virtual address space for the process is 808 kilobytes. (We're going to revisit our process address space example a little later in the column.)
When a process starts executing a program (via exec(2)), the process pages get "demand paged" in as needed. This means that the first time a reference is made to a page (the process generates a virtual address to retrieve instructions or data), a page fault is generated, and the operating system locates the page, brings it into physical memory, and maps it to the process's virtual address space. As the process runs, the memory pages the process needs to execute get mapped in and made resident in physical memory. This is the case for all processes running on the system.
As we said earlier, virtual memory allows processes to have address spaces that are larger than physical memory, and the combined memory requirements of all the processes running on a system can be greater than available physical memory. So, the operating system may have to page out pages of memory to make room for other processes to execute and for their pages to load. This is where swap space comes in. When it's time for memory pages to get paged out, the kernel needs to ensure that the page can be restored in memory with the contents intact -- in case a process needs it again (remember, the processes getting their pages paged out have not necessarily finished executing). For pages that originated in the file system (such as the executable program file and the shared library files that it uses), a place on disk already exists, so there's a place to put them if paging out is required. In fact, for such pages, we typically do not need to perform a physical disk I/O for the "pageout," because the page has probably not been modified (e.g., program text and shared object libraries). In such cases, we simply free the memory page, knowing that we can restore the page as it was from its original place in the file system. The page structure that describes the page includes a "modified" bit, which tells the operating system whether the contents of the page have been modified; if they have, a write to disk is required before freeing the page.
This, however, is not the case for anonymous memory pages. As we said, they did not originate from the file system and thus have no corresponding name or place on disk. In the event of page out, we'll need space on disk to place these pages. This is what disk swap is used for. Sysadmins configure areas of disk, typically raw partitions, for use as swap space.
This concludes our background discussion and sets us up nicely for the specifics of swap in Solaris. We'll conclude next month, in part two.
I'd like to thank everyone for your interest in this Inside Solaris column and wish you all a safe and happy holiday season.
See you next year!
About the author
Jim Mauro is currently an Area Technology Manager for Sun Microsystems in the Northeast area, focusing on server systems, clusters, and high availability. He has a total of 18 years industry experience, working in service, educational services (he developed and delivered courses on Unix internals and administration), and software consulting. Reach Jim at firstname.lastname@example.org.