Sysadmin by Hal Stern

Caches, thrashes and smashes

How to use CacheFS to cut your NFS server and network load

August  1996

A look at CacheFS in action, minimizing network contortion, thrashing, and trashing. We'll start with a review of NFS caching using the virtual memory system, and then see how CacheFS adds another local layer of backing store to the cache hierarchy. Then we'll dig into CacheFS management, configuration options, and cache tuning. The usual assortment of tips, caveats and customer-tested hacks rounds out this month's installment--and the recent months' series--on NFS and friends. (3,100 words)


Even as George Gilder promises the world that bandwidth will be infinite, demand for network capacity seems to increase faster than corporate networks can deliver it. Wire speed may be cheap, but running a fat and wide pipe to every user's desk is typically beyond the financial reach of most IT organizations. Network congestion and server pile-ups haven't slaked our thirst for networked data access (possibly why we've devoted the last quarter's worth of columns to NFS-related topics), so demand reduction is the order of the day. The best way to control network congestion is to simply not use the network, and access data from a local disk. Pushing the data out to the edges of the network reduces latency, and allows for better scaling by reducing the network requirements of each node.

Disk caching of recently used files was introduced in the Unix space with the Andrew filesystem, and later commercialized with the OSF's Distributed Filesystem (DFS) as part of the Distributed Computing Environment (DCE). Sun added similar functionality to Solaris 2.3 with the CacheFS filesystem type, an on-disk cache for recently used NFS data. Caching NFS accesses can improve server scalability (since fewer requests are serviced over the network), relieve network congestion, and improve client response time. Conversely, poor cache management and configuration can lead to reduced client performance with no appreciable benefit on the server or network.

This month, we'll see CacheFS in action, minimizing network contortion, thrashing, and trashing. We'll start with a review of NFS caching using the virtual memory system, and then see how CacheFS adds another local layer of backing store to the cache hierarchy. Then we'll dig into CacheFS management, configuration options, and cache tuning. The usual assortment of tips, caveats and customer-tested hacks rounds out this month's installment--and the recent months' series--on NFS and friends.


The NFS mosh pit: in-memory caching unmasked
NFS relies heavily on client-side caching to achieve a minimum level of performance. NFS clients keep recently used file pages in memory, updating them when the server's copy of the file has changed or purging them when the cache space is needed for other files. When a file is read from an NFS server, new pages are added to the cache. Upon subsequent access, the client's NFS code goes through a series of if-then clauses that would make a naive BASIC programmer blush: if the page is in the VM cache and the cached file attributes are less than 60 seconds old, the page is used as-is; if the attributes have expired, the client issues a getattr call to the server and compares modification times, purging and re-reading the cached pages if the file has changed on the server.

The 60-second window within which file attributes are assumed to remain static gives NFS its weak consistency model--another client could modify the file during the attribute cache entry lifetime, and the NFS client won't see the changes until it discards the dated attributes and reloads them from the NFS server. In the ideal world, an NFS client's working set would be small enough to be held in memory without memory contention from processes or the kernel. In practice, however, the NFS working set is typically larger than the memory available for file page caching, so NFS clients send out a steady stream of NFS read requests.
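The 60-second figure is the default upper bound on the attribute cache lifetime, and it can be adjusted with standard NFS mount options; the values shown below are the usual Solaris defaults, spelled out explicitly for illustration:

luey# mount -F nfs -o acregmin=3,acregmax=60,acdirmin=30,acdirmax=60 \
bigboy:/export/home/stern /home/stern

Shrinking acregmax tightens consistency at the cost of extra getattr traffic; stretching it trades staleness for fewer server round trips.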

CacheFS inserts a new if-then check immediately after the VM system search. If the file page isn't already cached in memory, the NFS client looks for it in a CacheFS filesystem on a local disk. Data found via CacheFS is subject to the same consistency checks as file pages located in the VM cache, after which the file pages are copied from disk back into memory. Caching NFS blocks on the local disk doesn't improve NFS's consistency, but it can lead to reduced network traffic and server load since NFS read requests are replaced with an occasional getattr call and local disk traffic.

Given that CacheFS still has to read the bits off a local disk, where's the win? When the time required to read an NFS buffer from the local disk is less than the time for an 8-kilobyte network transfer, plus the server's disk access and RPC service times, CacheFS will outperform a straight NFS client. If the NFS server already has the data you need cached in memory, however, the NFS read is likely to be fulfilled faster than a local read. Similarly, workstations with a single local disk may send the disk arm into jitterbug mode if CacheFS and paging or swapping activity send the disk seeking from one end of the platter to the other.
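Before committing to CacheFS, it's worth measuring both sides of the trade-off. nfsstat shows how many read calls a client is issuing, and iostat shows whether the local disk has headroom; both invocations below are standard Solaris usage:

luey# nfsstat -c
luey# iostat -x 5

A heavy read mix in the nfsstat output combined with a mostly idle local spindle in the iostat output marks a good CacheFS candidate; a busy local disk argues for leaving well enough alone.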

Here are some simple qualifying questions for CacheFS applications: Is your network congested, or your NFS server overloaded? Do your clients re-read the same files, with a working set too large to stay in memory? Is the data read-mostly, rarely modified by other clients? Do your clients have idle local disk to spare?

Any "yes" answer means you're in a position to reduce demands on your network and server, making headroom for future growth while improving client performance at the same time. Sound like a win-win? You're ready to start crafting a cache.

Down in front, up in back: filesystem overlays with CacheFS
CacheFS operates as a transparent filesystem on top of an NFS mounted volume. A good analogy is that of a clear overhead placed on top of a printed sheet of paper -- you can see what's on the paper, but anything you write on the overhead transparency "comes first" when looking at the overlaid sheets. CacheFS uses its own terminology for the underlying filesystem and the cache. The front filesystem contains the cache directory. It may be entirely dedicated to NFS caching, or it could contain one or more cache directories along with regular user and system files. An NFS-mounted volume is the back filesystem, mounted on the backpath. Maintaining our analogy, the back filesystem is the printed sheet of paper, while the front filesystem is the clear overhead. Whenever you access a file on the back filesystem, it gets cached in the front filesystem local to the client.

Turning on NFS caching is as simple as creating the front filesystem cache, and then mounting it on top of the back filesystem:

luey# cfsadmin -c /vol/cache/cache1
luey# mount -F cachefs -o backfstype=nfs,backpath=/home/stern,\
cachedir=/vol/cache/cache1 bigboy:/export/home/stern /home/stern

The first command creates a new cache directory underneath /vol/cache. The cache1 subdirectory cannot exist before you build the cache; cfsadmin creates the subdirectory and initializes the cache management parameters described below.

The CacheFS mount is a bit more complex. If you strip off the CacheFS options (introduced by the -o flag), the command line looks like a regular NFS mount of bigboy:/export/home/stern onto /home/stern. The CacheFS options line up the front and back filesystems: backfstype=nfs gives the type of the back filesystem, backpath names the mount point where the back filesystem already lives, and cachedir points at the cache directory created with cfsadmin.

Now for the subtle part: the CacheFS mount takes the front filesystem, that is, the subdirectory of /vol/cache, and mounts it on top of the existing NFS mount of /home/stern. As a user, you continue to use /home/stern as the path to the desired files; when you hit that mount point, however, you'll first touch a CacheFS directory backed by an underlying NFS mount. Look in the /etc/mnttab mount table and you'll see two entries for the overlaid mount points:

bigboy:/export/home/stern    /home/stern    nfs
/vol/cache/cache1            /home/stern    cachefs     backfstype=nfs

You don't have to perform the CacheFS mount on top of the back filesystem, but users will be noticeably confused if they have to contend with two pathnames for the same data with different behaviors. Think about the DOS DoubleSpace driver, and the compressed C: and raw H: drives it creates. Ever try explaining how C: and H: are the same disk, but not quite? By dropping the front filesystem on top of the back filesystem, you maintain the user's view of the world, possibly giving them a performance boost in the process.

A single cache directory can be used for multiple back filesystems; all NFS mounts will share the available cache space with global least recently used (LRU) policies dictating which files are purged when the cache fills. Similarly, you can create multiple cache directories in the same front filesystem, if you want to differentiate caches based on access patterns or file sizes.
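For example, a second NFS volume can share cache1 simply by naming the same cachedir; the /usr/local paths here are hypothetical:

luey# mount -F cachefs -o backfstype=nfs,backpath=/usr/local,\
cachedir=/vol/cache/cache1 bigboy:/export/local /usr/local

Both mounts now draw from the same pool of cache blocks, with the global LRU policy deciding whose files get evicted first.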

The weak stay weak: CacheFS internals
What happens when you reference a file through the front filesystem? If the page is not in memory, and not in the cache, it gets dragged from the NFS server via an NFS read operation. Once in-core on the client, the page is put into the local disk cache for future reference. Every piece of data retrieved by NFS, including directory entries, symbolic links, and file buffers, is put into the cache. As a result, CacheFS can improve performance of simple commands like ls -l executed against large directories that would have previously required multiple NFS readdir operations.

Directory entries are cached in 8-kilobyte blocks; files are usually managed in 64-kilobyte chunks. CacheFS doesn't bring the entire file into the cache upon first access; it caches chunks only as they are accessed, so if you walk through a file randomly, the CacheFS entry will be sparse. Determining which files are in the cache is quite difficult, because pathnames are not used in the cache directory. Instead, cache entries are given strings of hex digits as identifiers, based on the inode number of the underlying file. Protecting the cache directory with root-only access and hiding the underlying file names reduces the chance that casual browsing of the front filesystem would lead to accidental access to the cached, and possibly hole-ridden, files. If you want to see how large a cache has grown, use df on the front filesystem if it's being used solely for cache, or du -s on the cache directories to monitor their growth.
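A quick size check, assuming the front filesystem /vol/cache is dedicated to caching:

luey# df -k /vol/cache
luey# du -sk /vol/cache/cache1

df reports capacity and usage for the whole front filesystem, while du totals the kilobytes consumed by a single cache directory.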

CacheFS doesn't change the NFS consistency picture. The timestamp comparison algorithm used to determine when pages become stale is the same as used for in-core NFS filesystem pages. Writes to a cached NFS volume go directly to the back filesystem server, causing the cached blocks to be purged. CacheFS operates as a "write through" or "write around" cache, and never as a write-back cache. You'll never have data in the cache that isn't permanently recorded on the NFS server.

Threshold of pain: tuning and configuration options
cfsadmin lists the basic CacheFS configuration parameters for a cache directory along with the names, or CacheFS ids, of back filesystems using that cache:

luey# cfsadmin -l /vol/cache/cache1
cfsadmin: list cache FS information
maxblocks     90%
minblocks      0%
threshblocks  85%
maxfiles      90%
minfiles       0%
threshfiles   85%
maxfilesize    3MB

These parameters are set by default, but of course, you can tune them as needed. Use the maxblocks threshold to prevent contention between multiple caches in the same front filesystem. If you have three independent caches, with roughly equal access to them, consider using maxblocks=30 to give each one 30 percent of the available filesystem blocks. You don't want the total maximum block thresholds to exceed 90 percent of the front filesystem, nor do you want to starve out a cache with too small a maximum allocation. There's also a lower bound parameter called minblocks, which allows CacheFS to grow at least that large before internal buffer management starts tossing data out of the cache.

When you need to reserve disk space for other, non-CacheFS uses, make sure to set the threshblocks parameter to slow down CacheFS growth when disk space is at a premium. While maxblocks puts a static upper bound on the size of a cache directory, the block threshold implements a dynamic ceiling on growth. Once the front filesystem has reached the capacity specified by threshblocks, CacheFS will stop allocating space from the front filesystem and will resort to internal LRU chunk management to handle future space requests.

Establishing a block threshold prevents a potential denial of service problem, where CacheFS would grow large enough to prevent user-level processes from correctly writing local temporary or data files. The block threshold always takes precedence over the minimum space allocation parameter. If you set threshblocks to 60 percent, and set minblocks to 30 percent of the front filesystem, the CacheFS directory will drop its minimum space parameter down to 0 percent when the front filesystem reaches 60 percent of its capacity. Setting maxfiles limits the number of file entries that may be put into the cache. Large numbers of small files will consume a disproportionate number of inodes, possibly leading to another denial of service problem where the front filesystem no longer has a free inode for file creation or extension.
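Parameters are changed with cfsadmin's update option; the values below implement the three-cache split described above. You may need to unmount any filesystems using the cache before updating it:

luey# umount /home/stern
luey# cfsadmin -u -o maxblocks=30,threshblocks=60 /vol/cache/cache1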

cfsadmin also offers two variations on the consistency model. Read-only back filesystems, like CD-ROMs, can be cached without any consistency checks. Using the noconst option cuts down on the client's getattr traffic. The other option, non-shared, indicates that no other client will be accessing the data in your CacheFS front filesystem. CacheFS performs a literal "write through" in this mode, updating both the cache and the back filesystem. If you know you'll be re-using the data on subsequent access, the non-shared mode is a good way to improve cache warmth.
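For a read-only back filesystem, the consistency checks are dropped at mount time; the /docs paths here are hypothetical:

luey# mount -F cachefs -o backfstype=nfs,backpath=/docs,noconst,ro,\
cachedir=/vol/cache/cache1 bigboy:/export/docs /docs

Substitute non-shared for noconst when the data is writable but touched only by this client.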

What makes a good filesystem for use with CacheFS?

Unfortunately, determining the best set of options for a particular back filesystem is largely a black art. Performance tuning CacheFS involves a fair bit of data collection, as well as understanding the expectations of your customers--the users.

Bring to front or send to back: rules of thumb for CacheFS
Put enough knobs on something, and it gets easier to twist them into unpleasant states. Put NFS, the automounter, and CacheFS together and you have a system administrator's dream (or nightmare). Here are some guidelines for using CacheFS under different kinds of duress: cache read-mostly filesystems that clients re-use, such as application binaries and shared tools; don't bother caching write-heavy or rapidly changing filesystems, since every write purges cached blocks; and keep an eye on single-disk clients, where cache traffic competes with paging and swapping for the disk arm.

In exchange for its complexity, CacheFS provides noticeable performance benefits on most clients on which it has been enabled. By reducing demand on the network, you gain scalability to an extent not possible by adding more or faster Ethernet segments, or by buying a bigger and better NFS server. Less is most definitely more when you're looking at possible network smashes and server thrashes.





[(c) Copyright Web Publishing Inc., an IDG Communications company]
