Swap space implementation, part one

We take an inside look at how Solaris implements swap space, including discussions on swap configuration and allocation

December 1997

Abstract

Swap space is an integral part of implementing virtual memory, and the sizing of swap space on Solaris systems is the source of a great many questions. Understanding how swap space is implemented in Solaris is crucial to making a reasonably accurate assessment as to how much swap space your system really needs. (1,900 words)

Solaris users, developers, and systems administrators view swap space as simply those areas of disk, either raw partitions or swap files, that the system uses when, due to contention for memory space, it needs to move pages out of physical memory. This is technically accurate because that is precisely what swap space is used for -- with a secondary function of providing a place for the system to put a memory dump image in the event of a system panic.

Traditionally, sysadmins needed to scale up swap space as physical memory grew, with the accepted rule-of-thumb being to configure disk space for swap equal to 2 to 2.5 times physical memory. This works pretty well, for the most part, and is still a common practice today. Even with systems installed with large amounts of physical memory (RAM), disk prices have been such that we've been able to keep pace and sustain configuring two to three times the system's memory size for swap.

With Solaris 2 (SunOS 5), a new implementation of swap was introduced using a pseudo file system called swapfs (swap file system). The algorithm the system uses for reserving swap space for the process's anonymous memory pages has changed somewhat, and the requirement for configuring swap space is now closely aligned to the actual virtual memory requirements of the applications running on the system.

We'll illustrate the way things used to work with a simple example. Let's say you have a system with 32 megabytes of RAM. It's up and running, and there are 20 megabytes free (available physical memory). This system has 10 megabytes of available swap space. A process starts up and attempts to grab 15 megabytes of memory (malloc(15 * 1024 * 1024)). The malloc(3C) call will fail, returning NULL with errno set to ENOMEM. Why? There's 20 megabytes of free RAM, and the process only asked for 15, so what's the problem? As you've probably figured out, there is insufficient swap space (10 megabytes available); the system couldn't reserve swap to back all of the anonymous memory requested, so it wouldn't grant the request.

The above scenario is no longer the case in Solaris 2.X. In the following sections, we'll provide a little background information and then get into the specifics of the swap implementation. Following that, we'll look at the information made available to users and administrators on swap space.

A little background -- virtual memory
Let's start with some background information. The notion of swap space exists to support virtual memory in operating systems, so we'll start with a quick look at the hows and whys of virtual memory, then move on. Note that our discussion of virtual memory below is very high level, providing only enough information to facilitate the topic at hand (swap). Virtual memory is a relatively complex topic, which could easily require two columns to cover in sufficient detail (it's on my editorial calendar for future installments of Inside Solaris).

The motivation behind virtual memory is really quite simple; it exists to provide the illusion of unlimited memory for processes to execute in. Back in the "golden days" of computing (1960s and 1970s), programmers had to manage the memory requirements of their programs themselves on certain platforms. This is no longer the case with operating systems that support virtual memory (all current versions of Unix do and have done so for quite some time). A process's virtual memory requirement may exceed the amount of physical memory actually installed on the system, or it may exceed the amount of memory currently available for the process's pages because of contention for memory by other processes running on the same system. Virtual memory also protects address spaces: one process can't simply write over another process's memory pages, and the operating system (kernel) pages are protected from all non-kernel processes.

The implementation of virtual memory on Solaris today represents ten years of refinements of a unique architecture first implemented in SunOS 4.X. Solaris takes an object-oriented approach to implementing virtual memory, and it is tightly integrated with the Virtual File System (VFS) architecture. All physical memory on Solaris systems is treated as a page cache for memory objects. Every memory object cached in the page cache has a corresponding vnode that describes it. Vnodes are a file system abstraction. They provide a method of describing and managing files in the kernel independent of the lower level specifics of the particular type of file system the file originated in. A simple example of this is the caching of a file from the Unix file system. The memory pages that hold the contents of the file each have a corresponding vnode. The vnode maps to an inode in the UFS that describes the file.

We'll get back to more virtual memory/Virtual File System details in a bit. For now, let's go over some basic concepts and definitions.

A page of memory is a predefined number of bytes, typically four or eight kilobytes, and is the smallest unit of memory that can be allocated, mapped, and protected. The page size is sometimes determined by the hardware architecture and sometimes by the operating system. Processors based on SPARC Version 8, which include the family of processors known as SuperSPARC, use a four-kilobyte page size, as stipulated by the SPARC V8 architecture specification (more precisely, the SPARC Reference MMU, or SRMMU, which is a subset of SPARC V8). The SPARC V9 architecture, which is what the UltraSPARC processors are based on, does not specify page size. The operating system has some flexibility in selecting page sizes. For the most part, UltraSPARC systems use an eight-kilobyte page size. With Solaris 2.6, support for large pages was added for shared memory segments using ISM (See September 1997 Inside Solaris column in Resources). Anyway, for the purposes of this discussion, we need only remember that a memory page is a fixed number of bytes in size, and pages are the units of memory allocated, managed, and freed by the operating system.

An address space is made up of pages, representing all the memory required to maintain the segments that make up a process. These are the process's text (executable code), data (program variables), and stack (a place to store state information for function call returns). Processes may also have "heap" space, chunks of memory allocated as a result of applications calling memory allocation routines like sbrk(2) and malloc(3C), as well as shared memory segments created and attached with shmget(2) and shmat(2). The pages that make up a process's heap space are referred to as anonymous memory in Solaris because the pages have no corresponding name or space in the file system.

Here's an example of a process's address space. We wrote a small C program that has a few instructions in it and some signal handlers in place (more on the signal handlers later):

fawlty> /usr/proc/bin/pmap 2141
2141:   p1
00010000      8K read/exec         dev:172,4 ino:1268005
00020000      8K read/write/exec   dev:172,4 ino:1268005
EF6E0000     16K read/exec         /usr/platform/sun4u/lib/libc_psr.so.1
EF700000    592K read/exec         /usr/lib/libc.so.1
EF7A2000     24K read/write/exec   /usr/lib/libc.so.1
EF7A8000      8K read/write/exec     [ anon ]
EF7B0000      8K read/exec/shared  /usr/lib/libdl.so.1
EF7C0000    112K read/exec         /usr/lib/ld.so.1
EF7EA000     16K read/write/exec   /usr/lib/ld.so.1
EFFFC000     16K read/write/exec     [ stack ]
 total      808K
fawlty>

Using the pmap(1) command, which dumps a process's address space map, we can see what mappings exist for running processes. The column on the far left is the virtual address of the mapping, followed by the size in kilobytes, the permissions, and finally the object name. In our example, a relatively small program, we see mappings for the process text and data (virtual addresses 10000 and 20000); the required shared object libraries (addresses EF6E0000, EF700000, EF7A2000, EF7B0000, EF7C0000, EF7EA000); the stack mapping (EFFFC000); and an anonymous memory mapping (EF7A8000). The total virtual address space for the process is 808 kilobytes. (We're going to revisit our process address space example a little later in the column.)

When a process is created (via fork(2) and exec(2)) the process pages get "demand paged" in as needed. This means that the first time a reference is made to a page (the process generates a virtual address to retrieve instructions or data), a page fault is generated, and the operating system locates the page, brings it into physical memory, and maps it to the process's virtual address space. As the process runs, the memory pages the process needs to execute get mapped in and made resident in physical memory. This is the case for all processes running on the system.

As we said earlier, virtual memory allows processes to have address spaces that are larger than physical memory, and/or the combined memory requirements of all the processes running on a system can be greater than available physical memory. So, the operating system may have to page out pages of memory to make room for other processes to execute and for their pages to load. This is where swap space comes in. When it's time for a memory page to get paged out, the kernel needs to ensure that the page can be restored in memory with the contents intact -- in case a process needs it again (remember, the processes getting their pages paged out have not necessarily finished executing). For pages that originated in the file system (such as the executable program file and the shared library files that it uses), a place on disk already exists, so there's a place to put them if paging out is required. In fact, for such pages, we typically do not need to perform a physical disk I/O for the "pageout," because the page has probably not been modified (e.g. program text and shared object libraries). In such cases, we simply free the memory page, knowing that we can restore the page as it was from its original place in the file system. The page structure that describes the page includes a "modified" bit, which tells the operating system whether the contents of the page have been modified; if they have, a write to disk is required before freeing the page.

This, however, is not the case for anonymous memory pages. As we said, they did not originate from the file system and thus have no corresponding name or place on disk. In the event of page out, we'll need space on disk to place these pages. This is what disk swap is used for. Sysadmins configure areas of disk, typically raw partitions, for use as swap space.

Summary
This concludes our background discussion and sets us up nicely for the specifics of swap in Solaris. We'll conclude next month, in part two.

I'd like to thank everyone for your interest in this Inside Solaris column and wish you all a safe and happy holiday season.

See you next year!

Resources

"Shared memory uncovered," September 1997 Inside Solaris column in SunWorld http://www.sun.com/sunworldonline/swol-09-1997/swol-09-insidesolaris.html
"Virtual Swap Space in SunOS" http://opcom.sun.ca/white-papers/swapfs.html
"Virtual Memory Architecture in SunOS" http://opcom.sun.ca/white-papers/vm-arch.html
"SunOS Virtual Memory Implementation" http://opcom.sun.ca/white-papers/vm-impl.html
"How does swap space work?" April 1996 Performance Q&A column in SunWorld http://www.sun.com/sunworldonline/swol-04-1996/swol-04-perf.html
Goodheart, B. & Cox, J. "The Magic Garden Explained: The Internals of Unix System V Release 4," Prentice Hall. http://www.amazon.com/exec/obidos/ISBN=0130981389/sunworldonlineA/
Vahalia, Uresh. "Unix Internals: The New Frontiers," Prentice-Hall. http://www.amazon.com/exec/obidos/ISBN=0131019082/sunworldonlineA/
Full listing of past Inside Solaris columns http://www.sun.com/sunworldonline/common/swol-backissues-columns.html#insidesolaris

About the author
Jim Mauro is currently an Area Technology Manager for Sun Microsystems in the Northeast area, focusing on server systems, clusters, and high availability. He has a total of 18 years of industry experience, working in service, educational services (he developed and delivered courses on Unix internals and administration), and software consulting. Reach Jim at jim.mauro@sunworld.com.

URL: http://www.sunworld.com/swol-12-1997/swol-12-insidesolaris.html