Help! I've lost my memory!
Before you scream "Memory leak!" take a look at how
Why doesn't Sun's OS free unused memory? Adrian Cockcroft tackles this question in the first of his monthly performance columns for SunWorld Online. Cockcroft, Sun's performance guru, has heard and answered this and countless other questions during his years as a systems engineer. Once he explains how Solaris 1 and 2 handle your computer's memory, you'll probably be relieved. (2,600 words)
After a reboot I saw that most of my computer's memory was free, but when I launched my application it used up almost all the memory. When I stopped the application the memory didn't come back! Take a look at my
% vmstat 5
 procs    memory           page              disk       faults       cpu
 r b w   swap  free re mf pi po fr de sr s0 s1 s2 s3  in  sy  cs  us  sy  id

This is before the program starts:

 0 0 0 330252 80708  0  2  0  0  0  0  0  0  0  0  1  18 107 113   0   1  99
 0 0 0 330252 80708  0  0  0  0  0  0  0  0  0  0  0  14  87  78   0   0  99

I start the program and it runs like this for a while:

 0 0 0 314204  8824  0  0  0  0  0  0  0  0  0  0  0 414 132  79  24   1  74
 0 0 0 314204  8824  0  0  0  0  0  0  0  0  0  0  0 411  99  66  25   1  74

I stop it, then almost all the swap space comes back, but the free memory does not:

 0 0 0 326776 21260  0  3  0  0  0  0  0  0  1  0  0 420 116  82   4   2  95
 0 0 0 329924 24396  0  0  0  0  0  0  0  0  0  0  0 414  82  77   0   0 100
 0 0 0 329924 24396  0  0  0  0  0  0  0  0  2  0  1 430  90  84   0   1  99
I checked that there were no application processes running. It
looks like a huge memory leak in the operating system. How can I get my
--RAMless in Ripon
The short answer
Launch your application again. Notice that it starts up more quickly than it did the first time, and with less disk activity. The application code and its data files are still in memory, even though they are not active. The memory they occupy is not "free." If you restart the same application it finds the pages that are already in memory. The pages are attached to the inode cache entries for the files. If you start a different application, and there is insufficient free memory, the kernel will scan for pages that have not been touched for a long time, and "free" them. Once you quit the first application, the memory it occupies is not being touched, so it will be freed quickly for use by other applications.
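The behavior described above can be sketched with a toy model in Python. This is only an illustration, not the kernel's algorithm: the class name ToyPageCache, the page counts, and the file names are all invented, and strict least-recently-touched ordering stands in for the kernel's page scanner.

```python
from collections import OrderedDict


class ToyPageCache:
    """Toy model: file pages stay cached until the memory is needed."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        # Keys are (filename, page_number), ordered from least to most
        # recently touched. Values are unused.
        self.cached = OrderedDict()
        self.disk_reads = 0

    @property
    def free_pages(self):
        return self.total_pages - len(self.cached)

    def touch(self, filename, page_number):
        key = (filename, page_number)
        if key in self.cached:
            # Page is already in memory: attach to it, no disk read.
            self.cached.move_to_end(key)
            return
        if self.free_pages == 0:
            # No free memory: reclaim the page untouched for the longest time.
            self.cached.popitem(last=False)
        self.cached[key] = None
        self.disk_reads += 1

    def run(self, filename, num_pages):
        """Pretend an application touches num_pages pages of its file."""
        for page in range(num_pages):
            self.touch(filename, page)


cache = ToyPageCache(total_pages=100)

cache.run("app_a", 80)                      # first launch reads from disk
first_run_reads = cache.disk_reads
free_after_exit = cache.free_pages          # pages stay cached after exit

cache.run("app_a", 80)                      # relaunch finds pages in memory
second_run_reads = cache.disk_reads - first_run_reads

cache.run("app_b", 50)                      # a different app forces reclaim
app_a_pages_left = sum(1 for name, _ in cache.cached if name == "app_a")

print(first_run_reads, free_after_exit, second_run_reads, app_a_pages_left)
```

In this model the second launch of app_a needs no disk reads, and the "missing" memory is handed to app_b as soon as it asks for it.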
In 1988, Sun introduced this feature in SunOS 4.0. It still applies to all versions of Solaris 1 and 2. The kernel is trying to avoid disk reads by caching as many files as possible in memory. Attaching to a page in memory is around 1,000 times faster than reading it in from disk. The kernel figures that you paid good money for all of that RAM, so it will try to make good use of it by retaining files you might need.
By contrast, memory leaks appear as a shortage of swap space after the misbehaving program runs for a while. You will probably find a process that has a larger than expected size. You should restart the program to free up the swap space, and check it with a debugger that offers a leak-finding feature (SunSoft's DevPro debugger, for example).
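As a rough way to apply this distinction to vmstat figures, the following Python sketch compares the swap and free columns of two samples. The function name describe_change and the 5 percent threshold are arbitrary choices made for illustration; the sample values are the ones from the reader's letter.

```python
def describe_change(baseline, later, threshold=0.05):
    """Compare two vmstat samples given as dicts with 'swap' and 'free' in KB.

    The threshold is an arbitrary fraction of the baseline value, chosen
    only for this illustration.
    """
    swap_drop = baseline["swap"] - later["swap"]
    free_drop = baseline["free"] - later["free"]
    if swap_drop > threshold * baseline["swap"]:
        return "swap shrinking: check process sizes for a leak"
    if free_drop > threshold * baseline["free"]:
        return "swap steady, free memory lower: consistent with file caching"
    return "no significant change"


# Values taken from the vmstat output in the letter.
before_launch = {"swap": 330252, "free": 80708}
after_stop = {"swap": 329924, "free": 24396}

verdict = describe_change(before_launch, after_stop)
print(verdict)
```

For the reader's numbers, swap is back to within a few hundred kilobytes of its starting value while free memory is not, which matches the file caching case rather than a leak.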
The long (and technical) answer
To understand how Sun's operating systems handle memory, I will explain how the inode cache works, how the buffer cache fits into the picture, and how the life cycle of a typical page evolves as the system uses it for several different purposes.
The inode cache and file data caching
Whenever you access a file, the kernel needs to know the size, the access permissions, the date stamps and the locations of the data blocks on disk. Traditionally, this information is known as the inode for the file. There are many filesystem types. For simplicity I will assume we are only interested in the Unix filesystem (UFS) on a local disk. Each filesystem type has its own inode cache.
The filesystem stores inodes on the disk; the inode must be read into
memory whenever an operation is performed on an entity in the
filesystem. The number of inodes read per second is reported as
iget/s by the
sar -a command. The inode read
from disk is cached in case it is needed again, and the number of
inodes that the system will cache is influenced by a kernel parameter
called ufs_ninode. The kernel keeps inodes on a linked list, rather
than in a fixed-size table.
As I mention each command I will show you what the output looks
like. In my case I'm collecting sar data automatically, so I just run
sar, which defaults to reading the stored data for today. If you have
no stored data, specify a time interval and sar will show you current
activity.
% sar -a

SunOS hostname 5.4 Generic_101945-32 sun4c    09/18/95

00:00:01  iget/s namei/s dirbk/s
01:00:01       4       6       0

All reads or writes to UFS files occur by paging from the filesystem. All pages that are part of the file and are in memory will be attached to the inode cache entry for that file. When a file is not in use, its data is cached in memory, using an inactive inode cache entry. When the kernel reuses an inactive inode cache entry that has pages attached, it puts the pages on the free list; this case is shown by sar -g as %ufs_ipf. This number is the percentage of UFS inodes that were overwritten in the inode cache by iget and that had reusable pages associated with them. The kernel flushes the pages, and updates on disk any modified pages. Thus, this %ufs_ipf number is the percentage of igets with page flushes. Any non-zero values of %ufs_ipf reported by sar -g indicate that the inode cache is too small for the current workload.

% sar -g

SunOS hostname 5.4 Generic_101945-32 sun4c    09/18/95

00:00:01  pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
01:00:01     0.02     0.02     0.08     0.12     0.00
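The %ufs_ipf figure is a simple percentage, and the rule of thumb above can be written as a one-line check. The Python sketch below follows the description in this column; the function and parameter names are invented, and the counts passed in are made-up examples rather than measurements.

```python
def ufs_ipf(inodes_reused, reused_with_pages):
    """Percentage of reused inode cache entries that still had pages attached."""
    if inodes_reused == 0:
        return 0.0
    return 100.0 * reused_with_pages / inodes_reused


def inode_cache_too_small(ufs_ipf_samples):
    """Apply the rule of thumb: any non-zero %ufs_ipf sample is a warning."""
    return any(value > 0 for value in ufs_ipf_samples)


# Made-up counts: 200 inodes reused by iget, 5 of them had pages to flush.
example = ufs_ipf(200, 5)
print(example)

# The single sar -g sample shown above reports 0.00.
print(inode_cache_too_small([0.00]))
```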
For SunOS 4 and releases up to Solaris 2.3, the number of inodes that the kernel will keep in the inode cache is set by the kernel variable ufs_ninode. To simplify: When a file is opened, an inactive inode will be reused from the cache if the cache is full; when an inode becomes inactive, it is discarded if the cache is over-full. If the cache limit has not been reached then an inactive inode is placed at the back of the reuse list and invalid inodes (inodes for files that no longer exist) are placed at the front for immediate reuse. It is entirely possible for the number of open files in the system to cause the number of active inodes to exceed ufs_ninode; raising ufs_ninode allows more inactive inodes to be cached in case they are needed again.
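The simplified policy above can be modeled in a few lines of Python. This is a toy sketch of the description only, not the kernel code: the class OldInodeCache, its method names, and the tiny limit of three inodes are all invented, and each file is assumed to be opened at most once at a time.

```python
from collections import deque


class OldInodeCache:
    """Toy model of the simplified pre-Solaris 2.4 policy described above."""

    def __init__(self, ufs_ninode):
        self.limit = ufs_ninode
        self.active = set()
        self.reuse_list = deque()   # front of the deque is reused first
        self.inode_reads = 0        # inodes that had to be read in

    def size(self):
        return len(self.active) + len(self.reuse_list)

    def open(self, name):
        if name in self.active:
            return
        if name in self.reuse_list:
            self.reuse_list.remove(name)      # inactive inode still cached
        else:
            if self.size() >= self.limit and self.reuse_list:
                self.reuse_list.popleft()     # cache full: recycle an entry
            self.inode_reads += 1
        self.active.add(name)

    def close(self, name, still_exists=True):
        self.active.remove(name)
        if self.size() + 1 > self.limit:
            return                            # over-full: discard the inode
        if still_exists:
            self.reuse_list.append(name)      # back of the reuse list
        else:
            # Invalid inode: a nameless placeholder at the front of the list.
            self.reuse_list.appendleft(None)


cache = OldInodeCache(ufs_ninode=3)
for name in ("a", "b", "c"):
    cache.open(name)
cache.close("a")      # cached as inactive
cache.open("d")       # cache is full, so the entry for "a" is recycled
cache.open("e")       # nothing inactive left: active count exceeds the limit
print(len(cache.active), list(cache.reuse_list), cache.inode_reads)
```

With a larger ufs_ninode in this model, the entry for "a" would survive the later opens and a second open of "a" would not need another inode read.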
Solaris 2.4 uses a more clever inode cache algorithm. The kernel
maintains a reuse list of blank inodes for instant use. The number of
active inodes is no longer constrained, and the number of idle inodes
(inactive but cached in case they are needed again) is kept between
ufs_ninode and 75 percent of ufs_ninode by a new kernel thread that
scavenges the inodes to free them and maintains entries on the reuse
list. If you use
sar -v to look at the inode cache, you
may see a larger number of existing inodes than the reported "size."
% sar -v

SunOS hostname 5.4 Generic_101945-32 sun4c    09/18/95

00:00:01  proc-sz  ov    inod-sz  ov  file-sz  ov  lock-sz
01:00:01   66/506   0  2108/2108   0  353/353   0      0/0
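One way to picture the Solaris 2.4 behavior is as a pair of water marks on the count of idle inodes. The Python sketch below assumes that the scavenger starts work once the idle count passes ufs_ninode and stops at 75 percent of it; that trigger, the function name scavenge, and the inode names are assumptions made for illustration, and the real kernel thread may behave differently.

```python
from collections import deque


def scavenge(idle_inodes, ufs_ninode):
    """Free the oldest idle inodes once their count exceeds ufs_ninode.

    Assumed behavior for this sketch: stop when the idle count is down
    to 75 percent of ufs_ninode. Returns the list of freed inodes.
    """
    low_water = (ufs_ninode * 3) // 4
    freed = []
    if len(idle_inodes) > ufs_ninode:
        while len(idle_inodes) > low_water:
            freed.append(idle_inodes.popleft())   # oldest idle inode first
    return freed


# ufs_ninode of 2108 matches the inod-sz size in the sar -v output above;
# the count of 2200 idle inodes is made up.
idle = deque(f"inode{i}" for i in range(2200))
freed = scavenge(idle, ufs_ninode=2108)
print(len(freed), len(idle))
```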
The buffer cache is used to cache filesystem data in SVR3 and BSD Unix. In SunOS 4, generic SVR4, and Solaris 2, it is used to cache inode, indirect block, and cylinder group blocks only. Although this change was introduced in 1988, many people still incorrectly think the buffer cache is used to hold file data. Inodes are read from disk to the buffer cache in 8-kilobyte blocks, then the individual inodes are read from the buffer cache into the inode cache.
Life cycle of a typical physical memory page
This section provides additional insight into the way memory is used. The sequence described is an example of some common uses of pages; many other possibilities exist.
This example sequence can continue from Step 4 or Step 9 with minor variations. The fsflush process occurs every 30 seconds by default for all pages, and whenever the free list size drops below a certain value, the pagedaemon scanner wakes up and reclaims some pages.
Now you know
I have seen this missing-memory question asked about once a month since 1988! Perhaps the manual page for vmstat should include a better explanation of what the values are measuring. This answer is based on some passages from my book Sun Performance and Tuning. The book explains in detail how the memory algorithms work and how to tune them.
About the author
Adrian Cockcroft joined Sun in 1988, and currently works as a performance specialist for the newly formed Server Division of SMCC. Reach Adrian at firstname.lastname@example.org.