Inside Solaris by Jim Mauro

Setting our sights on semaphores

A strong understanding of how semaphores work will help you develop and manage Solaris application code and systems

SunWorld
October  1997

Abstract
Last month we looked at the shared memory interprocess communication (IPC) facility in Solaris. We'll follow this month with a logical successor to the topic of shared memory -- semaphores. Semaphores are another IPC facility that evolved from the original AT&T System V Unix release (as did shared memory). Semaphores provide a means of synchronizing access to shared resources and are used quite often to synchronize access to shared memory segments by multiple processes in applications. This month we will examine the Solaris implementation of semaphores, including the tunable kernel parameters that apply to semaphore resource allocation and the semaphore information made available with the ipcs(1M) command.

As with last month, our goal here is not to detail how to program with semaphores. Refer to the Solaris Developers Kit (SDK) documentation, or any number of books on programming with Unix, for specifics on using semaphores in code. (3,500 words)



Solaris ships with many application programming interfaces (APIs). Before we dig into the System V semaphore facility, we need to discuss other semaphore APIs that ship with the bundled Solaris libraries. Currently, there are three sets of callable semaphore routines. First, the traditional System V implementation, which is what this month's column is all about. Next, we have the semaphore routines bundled in the threads library, libthread. Finally, the POSIX standard defines a set of semaphore APIs. The three different sets of routines are listed in Table 1, which also indicates which section of the man pages you should reference for a given routine.

Table 1

System V semaphore    Thread library      POSIX library
system calls          calls               calls
------------------    ----------------    ----------------
semget(2)             sema_destroy(3T)    sem_destroy(3R)
semctl(2)             sema_init(3T)       sem_init(3R)
semop(2)              sema_post(3T)       sem_post(3R)
                      sema_trywait(3T)    sem_trywait(3R)
                      sema_wait(3T)       sem_wait(3R)
                                          sem_open(3R)
                                          sem_close(3R)
                                          sem_unlink(3R)
                                          sem_getvalue(3R)


Which API is which?
Note that Solaris 2.5.1 offers only minimal support for the POSIX semaphore calls. In those cases where a call to a POSIX semaphore routine has a corresponding libthread call, it's actually the libthread code that gets executed. For example, calling sem_post(), which requires linking in the POSIX library, libposix4, will ultimately call the libthread sema_post() code. The POSIX routines can be thought of simply as a wrapper around the underlying libthread semaphore code.

This changes somewhat in Solaris 2.6, where support is added for POSIX named semaphores. Calls to sem_open(), sem_close(), and sem_unlink(), which are required for named semaphore support (and have no libthread equivalents), are supported in 2.6.

There are similarities between the three different semaphore implementations, as the fundamental notion of what a semaphore is does not change across the three libraries. Several of the libthread and libposix4 routines are functionally identical, which is fairly obvious from the routine names; the disparity again begins with the additional routines in the libposix4 code for support of named semaphores. The System V semaphore interfaces do not have a direct correlation to the thread and POSIX code in terms of the names of the APIs. The programmer using System V semaphores must deal with a couple of generic interfaces for semaphore control and operations using the semctl(2) and semop(2) system calls and set the appropriate flags in the argument list in order to do things like initialize, increment, or decrement a semaphore. Finally, the thread and POSIX implementations do not support the notion of semaphore sets, as the System V code does.

The Unix System V semaphore routines are the ones that have been around the longest and are thus implemented in a greater number of existing applications. The routines linked via the libthread library, as well as the POSIX routines, are typically found in multithreaded Solaris applications -- those applications that implement thread calls in the libthread and libposix4 libraries.

Note that the kernel tunables and resources discussed in the following sections do not apply to the libthread and libposix4 semaphore routines. Semaphores allocated via calls to the thread and POSIX code do not utilize any of the kernel resources allocated for System V semaphores. In fact, the underlying library support code uses low-level mutex and lwp calls for thread and POSIX semaphore support. All the semaphore resources exist in the process address space.
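To make the contrast with the System V interfaces concrete, here is a minimal sketch of the POSIX calls from Table 1 operating on an unnamed semaphore that lives entirely in the process address space. The helper name posix_sem_demo() is ours, purely for illustration, and how you link it (libposix4 on the Solaris releases discussed here) depends on your platform.

```c
#include <semaphore.h>

/* Hypothetical helper: takes and releases one unit of an unnamed POSIX
 * semaphore. Returns the value observed while the unit is held, or -1 if
 * the semaphore could not be initialized or taken. */
int posix_sem_demo(unsigned int initial)
{
    sem_t sem;
    int held = -1;

    /* pshared = 0: the semaphore is private to this process */
    if (sem_init(&sem, 0, initial) == -1)
        return -1;

    if (sem_trywait(&sem) == 0) {     /* decrement without blocking */
        sem_getvalue(&sem, &held);    /* value while we hold one unit */
        sem_post(&sem);               /* increment again */
    }

    sem_destroy(&sem);
    return held;
}
```

Note that there is no key, no identifier, and no set here: one sem_t object is one semaphore, and none of the System V kernel tunables come into play.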

First an overview
While we stated earlier that a non-goal of this month's column is how to program using semaphores, we'll briefly discuss how applications use semaphores, in order to make the following sections on the kernel implementation more digestible.

Let's start with an extremely brief history. A semaphore, as defined in the dictionary, is a mechanical signaling device or a means of doing visual signaling. The analogy typically used is the railroad mechanism of signaling trains, where mechanical arms swing down to block a train from a section of track that another train is currently using. When the track was free, the arm would swing up, and the waiting train could now proceed.

The notion of using semaphores as a means of synchronization in computer software originated with a Dutch mathematician, E.W. Dijkstra, in 1965. Dijkstra's original work defined two semaphore operations, wait and signal (which correlate nicely to the railroad example). The operations were referred to as P and V operations. The P operation was the wait, which decremented the value of the semaphore if it was greater than zero, and the V operation was the signal, which incremented the semaphore value. I assume the names Dijkstra chose, P and V, originate from the Dutch terms for wait and signal, but I'm not sure.

Semaphores provide a method of synchronizing access to a shareable resource by multiple processes. A semaphore can be used as a binary lock for exclusive access or as a counter that allows concurrent access to a finite number of shared resources, where the semaphore value gets initialized to the number of shared resources. Each time a process needs a resource, the semaphore value gets decremented. When the process is done with the resource, the semaphore value gets incremented. A semaphore value of zero indicates to the calling process that there are currently no resources available, and the calling process blocks until another process finishes using the resource and frees it. We can represent semaphore usage in the following pseudocode.

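init_semaphore(key)
	int number_shared_resources;
	/* one semaphore in the set; its value counts the free resources */
	semaphore_id = semget(key, 1, IPC_CREAT | permissions)
	semctl(semaphore_id, 0, SETVAL, number_shared_resources)

get_shared_resource()
	get_semaphore()
		if (semaphore > 0)
			decrement semaphore;
			use resource
		else if (semaphore == 0)
			wait();		/* block until another process increments it */

done_with_shared_resource()
	increment semaphore()
	return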
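For readers who want to see roughly how that pseudocode maps onto the real system calls, here is a small sketch in C. The pool_*() function names and the union name sem_arg are our own inventions (semctl(2) leaves it to the caller to define the union), and IPC_PRIVATE is used in place of a real key to keep the example self-contained.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* semctl(2) expects the caller to define this union; the name is ours. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a private set holding one semaphore, initialized to nresources. */
int pool_create(int nresources)
{
    union sem_arg arg;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (semid == -1)
        return -1;
    arg.val = nresources;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    return semid;
}

/* Take one resource; blocks while the semaphore value is zero. */
int pool_acquire(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    return semop(semid, &op, 1);
}

/* Give one resource back. */
int pool_release(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = 0 };
    return semop(semid, &op, 1);
}

/* Current semaphore value, i.e. the number of free resources. */
int pool_available(int semid)
{
    return semctl(semid, 0, GETVAL);
}

/* Remove the semaphore set from the system. */
int pool_destroy(int semid)
{
    return semctl(semid, 0, IPC_RMID);
}
```

A caller would create the pool once with pool_create(3), bracket each use of a resource with pool_acquire() and pool_release(), and call pool_destroy() when finished. System V semaphores persist until they are explicitly removed, so that last step matters.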

The semaphore implementation in Solaris (Unix System V semaphores) allows for semaphore sets, meaning that a unique semaphore identifier may contain multiple semaphores. Whether a semaphore identifier contains one semaphore or a set of semaphores is determined when the semaphore is created with the semget(2) system call. The second argument to semget(2) determines the number of semaphores that will be associated with the semaphore identifier returned by semget(2). The semaphore system calls allow for some operations on the semaphore set, such that the programmer can make one semctl(2) or semop(2) system call and touch all the semaphores in the semaphore set. This makes dealing with semaphore sets programmatically a little easier.

One final note. The semget(2) call requires that a key be passed as the first argument. All three System V IPC facilities (shared memory, semaphores, and message queues) support the notion of a key in each respective "ipcget()" call (shmget, semget, and msgget). This provides programmers a way of grabbing the same shared resource (shared segment, semaphore set, or message queue) from multiple processes. Using the same key value in the "get" call will ensure that the same IPC identifier is returned. See the stdipc(3C) man page for more information on keys and a description of the ftok(3C) file-to-key function.
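The key mechanism is easy to try out. The sketch below derives a key with ftok(3C) and then calls semget(2) twice with it, the second time the way an unrelated process would; both calls should hand back the same identifier. The function name and the choice of path and project character are arbitrary, and the example removes the set when it is done, so don't point it at a key that a real application is using.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Returns 1 if two lookups with the same key yield the same identifier,
 * 0 if they differ, and -1 on error. */
int same_key_same_id(const char *path, int proj)
{
    key_t key = ftok(path, proj);
    int first, second, same;

    if (key == (key_t)-1)
        return -1;

    first = semget(key, 1, IPC_CREAT | 0600);   /* create or find the set */
    if (first == -1)
        return -1;

    second = semget(key, 1, 0);                 /* as a second process would */
    same = (second == -1) ? -1 : (first == second);

    semctl(first, 0, IPC_RMID);                 /* clean up after ourselves */
    return same;
}
```

Calling same_key_same_id("/tmp", 'S') is enough to see the effect; any existing path that both processes agree on would do.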

Semaphore kernel resources
There are some similarities in the kernel implementation of the Unix System V IPC facilities as we will soon demonstrate. Some of the information covered in last month's column on shared memory will look familiar as we move through this month's discussion.

The kernel support code for semaphores comes in the form of a dynamically loaded kernel module, /kernel/sys/semsys, and the IPC support routines, ipcaccess and ipcget, are also required. These IPC routines are found in the /kernel/misc/ipc loadable kernel module. The semsys and IPC modules will be loaded automatically when the system first executes a semaphore system call. See last month's column for more information on loadable kernel modules.

The tunable kernel parameters that apply to semaphores are summarized in Table 2. We will now take a closer look at each one and discuss how kernel resources get allocated.

Table 2
Name     Default   Maximum *   Data type        Description
------   -------   ---------   --------------   -----------
semmap   10        2GB         signed integer   Number of entries in semaphore map
semmni   10        65,536      signed integer   Number of semaphore identifiers. See text for why it cannot be > 64k.
semmns   60        2GB         signed integer   Total number of semaphores system wide
semmnu   30        2GB         signed integer   Total number of undo structures in system
semmsl   25        65,535      unsigned short   Maximum number of semaphores per semaphore ID
semopm   10        2GB         signed integer   Maximum operations for semop call
semume   10        2GB         signed integer   Maximum undo entries per process
semusz   96        2GB         signed integer   Total bytes required for undo structures, system wide
semvmx   32,767    65,535      unsigned short   Maximum semaphore value
semaem   16,384    32,767      signed short     Maximum adjust on exit value
* Note that the maximum value listed is, in most cases, the largest value attainable based on the data type (e.g. 2,147,483,647 for a signed integer). It is a theoretical limit, and should not be construed as a value of practical use on a running, production system.

The semmap parameter determines the maximum number of entries in a semaphore map. The memory space given to the creation of semaphores is taken from the semmap, which is initialized with a fixed number of map entries based on the value of semmap. The implementation of allocation maps is generic within Unix SVR4 and also Solaris, supported with a standard set of kernel routines (rmalloc(), rmfree(), etc.). The use of allocation maps by the semaphore subsystem is just one example of their implementation. They basically prevent the kernel from having to deal with allocating and de-allocating additional kernel memory as semaphores get initialized and freed. By initializing and using allocation maps, kernel memory is allocated up front, and map entries are allocated and freed dynamically from the semmap allocation maps. The semmni tunable should never be larger than semmap. In fact, semmap should be set to the product of semmni and semmsl: (semmap = semmni * semmsl). If you make semmap too small for the application, you'll get "WARNING: rmfree map overflow" messages on the console. Tweak it higher and reboot.

The semmni tunable establishes the maximum number of semaphore sets system wide. Every semaphore set in the system has a unique identifier and control structure, the semid_ds data structure. During initialization, the system allocates kernel memory for semmni control structures. Each control structure is 84 bytes, so as with the shared memory shmmni tunable we discussed last month, you should avoid making this value arbitrarily large.

semmns defines the maximum number of semaphores in the system. A semaphore set may have more than one semaphore associated with it, and each semaphore has a corresponding sem data structure. Each sem structure is only 16 bytes, but you still shouldn't go over the edge and make this value arbitrarily large. Actually, this number should really be calculated as: semmns = semmni x semmsl. Because semmsl defines the maximum number of semaphores per semaphore set, and semmni defines the maximum number of semaphore sets, the total number of semaphores systemwide can never be greater than the product of semmni and semmsl. This calculation should probably be done in the kernel, and semmns should be removed as a tunable parameter.
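As a quick worked example of the sizing rules suggested so far (semmap and semmns both equal to semmni x semmsl), the following fragment does the arithmetic. The structure and function names are made up for illustration; nothing like this exists in the kernel.

```c
/* Values derived from semmni and semmsl, following the rules of thumb
 * given in the text. */
struct sem_sizing {
    long semmap;
    long semmns;
};

struct sem_sizing derive_sem_sizing(long semmni, long semmsl)
{
    struct sem_sizing s;

    s.semmns = semmni * semmsl;   /* most semaphores that can ever exist */
    s.semmap = semmni * semmsl;   /* text's recommendation for the map size */
    return s;
}
```

Plugging in the defaults from Table 2 (semmni = 10, semmsl = 25) gives 250 for both, which is well above the shipped defaults of 60 for semmns and 10 for semmap.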

semmnu defines the systemwide maximum number of semaphore undo structures. Semaphore undo structures are maintained in the event of a process terminating that has made some semaphore value adjustments. If the SEM_UNDO bit is true in the semaphore flag value (sem_flg) when the semaphore operation is done (the semop(2) system call), then the kernel will "undo" changes made to the semaphore(s) when the process terminates. It seems intuitive to make this equal to semmni which would provide for an undo structure for every semaphore set. Each semaphore undo structure is 16 bytes.

semmsl is the maximum number of semaphores per semaphore set. As mentioned previously, each semaphore set may have one or more semaphores associated with it. This tunable defines the maximum number per set.

semopm is the maximum number of semaphore operations that can be performed per semop(2) call. This gets back to the notion of semaphore sets and the ability to do operations on multiple semaphores via the semop(2) system call. The semaphores in the set will all be associated with the same semaphore ID (semid_ds structure). You should probably set this value equal to semmsl above, so you'll always be able to do an operation on every semaphore in a semaphore set. When a semop(2) call is executed, the kernel checks to ensure that the third argument to semop(2), which is the number of operations in the sembuf array passed as the second argument, is not larger than the semopm tunable. If it is, an error (E2BIG) gets returned to the calling code.

semume determines the maximum allowable per process undo structures, or, put another way, the maximum number of semaphore undo operations that can be performed per process. The kernel maintains information on the changes processes make to semaphores (semaphore value adjustments). In the event of a process exiting prematurely, the kernel can set the semaphore values back to what they were prior to the process changing them. This is what undo structures are used for -- to undo semaphore alterations.
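The undo behavior is easy to observe from user code. In the sketch below, a child process decrements a semaphore and then exits without putting the value back; the parent then reads the value. With SEM_UNDO set on the child's operation, the kernel should restore the original value when the child exits. The function and union names are ours.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* semctl(2) expects the caller to define this union; the name is ours. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* A child decrements the semaphore and exits without incrementing it.
 * Returns the value the parent sees afterwards, or -1 on error. */
int value_after_child_exit(int initial, int use_undo)
{
    union sem_arg arg;
    struct sembuf op;
    int semid, status, value = -1;
    pid_t pid;

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    arg.val = initial;
    if (semctl(semid, 0, SETVAL, arg) == 0) {
        pid = fork();
        if (pid == 0) {
            /* child: take one unit, then exit without releasing it */
            op.sem_num = 0;
            op.sem_op = -1;
            op.sem_flg = IPC_NOWAIT | (use_undo ? SEM_UNDO : 0);
            _exit(semop(semid, &op, 1) == 0 ? 0 : 1);
        }
        if (pid != -1 && waitpid(pid, &status, 0) != -1)
            value = semctl(semid, 0, GETVAL);
    }

    semctl(semid, 0, IPC_RMID);
    return value;
}
```

Starting from a value of 2, the parent should see 2 when the child used SEM_UNDO and 1 when it did not.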

semusz is described as the size in bytes for undo structures. I don't know why this is even documented as a tunable. During initialization of the semaphore code, the kernel sets semusz to sizeof(undo) + (semume x sizeof(undo)), so setting it in /etc/system is pointless. Leave it alone; it should be removed as a tunable.

semvmx is the maximum value of a semaphore. Due to the interaction with undo structures and semaem, below, this tunable should not exceed its default value of 32,767 unless you can guarantee that SEM_UNDO is never being used. The actual value of a semaphore is stored as an unsigned short (2 bytes) in its semaphore structure, which implies that the maximum semaphore value can be 65,535 (64 K), the maximum value of an unsigned short data type. The next paragraph explains why the limit should really be 32,767.

Finally (last but not least!), the semaem tunable is the maximum adjust-on-exit value. This value is stored as an integer in the seminfo structure that the kernel uses to maintain the tunable values; but it is implemented as a signed short in the undo structure. It needs to be signed because semaphore operations can increment or decrement the value of a semaphore, and thus the system may need to apply a negative or positive adjustment value to the semaphore in order to do a successful adjust-on-exit operation. The actual value of a semaphore can never be negative. The maximum value for a signed short, which is 2 bytes (16 bits) is 32,767 (32 K). If semvmx (above) were set to 65,535, and the application actually set semaphore values that high, the system would not be able to undo the entire range of semaphore value changes, because semaem can never be greater than 32,767. This is why semvmx should never be greater than 32,767.
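The range argument can be checked with a few lines of arithmetic. This fragment, with a function name of our own choosing, asks whether the adjustment needed to undo a semaphore that was raised from zero to a given value fits in a signed short, which is how the undo structure stores it.

```c
#include <limits.h>

/* Returns 1 if a semaphore raised from 0 to `value` could be fully undone
 * by an adjustment stored in a signed short, 0 otherwise. */
int undo_fits_in_short(long value)
{
    long adjust = -value;   /* adjustment the kernel must apply on exit */

    return adjust >= SHRT_MIN && adjust <= SHRT_MAX;
}
```

A value of 32,767 passes the check; 65,535 does not, which is the case the semvmx warning is about.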

That covers all the semaphore tunables. Now, we'll take a close look at the kernel implementation of semaphores.

During initialization of the semaphore code, when /kernel/sys/semsys first gets loaded, the value of semmni is checked to ensure it is not greater than the maximum allowable value of 65,536 (64 K). If it is, it gets set to 65,536 and a console message is printed indicating the value of semmni was too large. Following that, the tunable parameters from the /etc/system file get plugged into the internal seminfo data structure, with the exception of semusz. As we mentioned earlier it gets set to:

seminfo.semusz =  (sem_undo structure size) + (semume * sem_undo structure size)

Just like with shared memory, the system checks for the maximum amount of available kernel memory, and divides that number by 4, in order to prevent semaphore requirements from taking more than 25 percent of available kernel memory. Actual memory requirements for semaphores are calculated as follows:

total_kernel_memory_required = (semmns * (sem structure size)) +
			       (semmni * (semid_ds structure size)) +
			       (semmni * (kernel mutex size)) +
			       (max processes * (undo structure pointer size)) +
			       (semusz * semmnu * (integer size))

The structure sizes are given in the previous section. Pointers and integers are 4 bytes, and a kernel mutex is 8 bytes. Max processes is determined during startup based on the amount of RAM in the system. You can determine the actual value on your system with the sysdef(1M) command:

#sysdef | grep v_proc
   1962        maximum number of processes (v.v_proc)
#

Doing the actual arithmetic is left as an exercise for the reader. If the required memory is more than 25 percent of available kernel memory, a message indicating as much will appear on the console.
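For those who would rather let the compiler do the exercise, here is the formula as a small C function, using the sizes quoted in this column (16-byte sem, 84-byte semid_ds, 8-byte kernel mutex, 4-byte pointers and integers). The macro and function names are ours, and the sizes will differ on other releases.

```c
/* Sizes in bytes, as quoted in the text. */
#define SEM_SIZE        16
#define SEMID_DS_SIZE   84
#define KMUTEX_SIZE      8
#define POINTER_SIZE     4
#define INTEGER_SIZE     4

/* Kernel memory required for System V semaphores, per the formula above. */
long sem_kernel_memory(long semmns, long semmni, long semmnu,
                       long semusz, long max_procs)
{
    return semmns * SEM_SIZE
         + semmni * SEMID_DS_SIZE
         + semmni * KMUTEX_SIZE
         + max_procs * POINTER_SIZE
         + semusz * semmnu * INTEGER_SIZE;
}
```

With the default tunables from Table 2 (semmns = 60, semmni = 10, semmnu = 30, semusz = 96) and the v_proc value of 1962 shown above, this works out to 960 + 840 + 80 + 7,848 + 11,520 = 21,248 bytes.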

Assuming everything fits, kernel memory is allocated as follows. Resource map allocation is done based on semmap, and a kernel semmap pointer is set. Space is allocated for all the sem structures (one for every semaphore) based on semmns, all the semaphore identifiers (semid_ds) based on the size of that structure, all the undo structure pointers based on the max processes and pointer size, all the undo structures themselves, based on semmnu and the size of an undo structure, and all the kernel mutex locks required, one for every unique semaphore identifier, based on semmni and the size of a kernel mutex:

        semmap     = rmallocmap(seminfo.semmap);
        sem        = allocate_kernel_memory(semmns *  sem structure size)
        sema       = allocate_kernel_memory(semmni * semid_ds structure size)
        sem_undo   = allocate_kernel_memory(max processes * pointer size)
        semu       = allocate_kernel_memory((semusz * semmnu) * integer size)
        sema_locks = allocate_kernel_memory(semmni * kernel mutex size)

Note that a kernel mutex lock is created for each semaphore set. This provides for fairly fine-grained parallelism on multiprocessor hardware, as it means that multiple processes can do operations on different semaphore sets concurrently. For operations on semaphores in the same set, the kernel needs to ensure atomicity for the application, which is a guarantee that a semaphore operation initiated by a process will complete without interference from another process, whether the operation is on a single semaphore, or multiple semaphores in the same set.
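The atomicity guarantee can be seen from the application side with a two-member set. The sketch below asks semop(2) to decrement both members in a single call, with IPC_NOWAIT so it never blocks. If either decrement cannot be done, the call fails and neither semaphore is changed. The function and union names are ours.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* semctl(2) expects the caller to define this union; the name is ours. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Tries to decrement both semaphores of a two-member set in one semop(2)
 * call. Returns semop's result (0 or -1) and stores the values found
 * afterwards in after[0..1]. Returns -2 if setup or readback fails. */
int take_both(unsigned short first, unsigned short second,
              unsigned short after[2])
{
    unsigned short init[2];
    struct sembuf ops[2];
    union sem_arg arg;
    int semid, i, rc;

    semid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
    if (semid == -1)
        return -2;

    init[0] = first;
    init[1] = second;
    arg.array = init;
    if (semctl(semid, 0, SETALL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -2;
    }

    for (i = 0; i < 2; i++) {
        ops[i].sem_num = i;
        ops[i].sem_op = -1;
        ops[i].sem_flg = IPC_NOWAIT;   /* fail instead of blocking */
    }
    rc = semop(semid, ops, 2);         /* both decrements or neither */

    arg.array = after;
    if (semctl(semid, 0, GETALL, arg) == -1)
        rc = -2;
    semctl(semid, 0, IPC_RMID);
    return rc;
}
```

Starting from values {1, 1} the call succeeds and leaves {0, 0}. Starting from {1, 0} it fails, and the first semaphore is still 1 even though its decrement could have succeeded on its own.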

That pretty much covers kernel resources used by semaphores. Now we'll take a quick look at some implementation details.

Semaphore operations inside Solaris
The creation of a semaphore set by an application requires a call to semget(2). Every semaphore set in the system is described by a semid_ds data structure, which contains the following elements:

struct semid_ds {
        struct ipc_perm sem_perm;       /* operation permission struct */
        struct sem      *sem_base;      /* ptr to first semaphore in set */
        ushort_t        sem_nsems;      /* # of semaphores in set */
        time_t          sem_otime;      /* last semop time */
        long            sem_pad1;       /* reserved for time_t expansion */
        time_t          sem_ctime;      /* last change time */
        long            sem_pad2;       /* time_t expansion */
        long            sem_binary;     /* flag indicating semaphore type */
        long            sem_pad3[3];    /* reserve area */
};

The system checks to see if a semaphore already exists based on the key value passed to semget(2), and does a permission check using the IPC support routine, ipcaccess(). Semaphore permissions differ slightly from permission modes we're used to seeing on things like Solaris files. They're defined as READ and ALTER, such that processes can either read the current semaphore value, or alter it (increment/decrement). Permissions are established with arguments passed to the semget(2) call, following the owner, group, and other conventions used for Solaris file permissions.
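Applications can look at the permission and size fields of semid_ds themselves with the IPC_STAT command to semctl(2). The following sketch creates a set with a given mode, reads the control structure back, and removes the set again; the function and union names are ours.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* semctl(2) expects the caller to define this union; the name is ours. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Creates a set of nsems semaphores with the given mode bits and copies
 * its control structure into *out. Returns 0 on success, -1 on error. */
int stat_new_set(int nsems, int mode, struct semid_ds *out)
{
    union sem_arg arg;
    int semid, rc;

    semid = semget(IPC_PRIVATE, nsems, IPC_CREAT | mode);
    if (semid == -1)
        return -1;

    arg.buf = out;
    rc = semctl(semid, 0, IPC_STAT, arg);   /* copy out the semid_ds */

    semctl(semid, 0, IPC_RMID);
    return rc;
}
```

With a mode of 0640, for example, the owner gets read and alter access, the group gets read access only, and sem_perm.mode and sem_nsems in the returned structure reflect what was asked for.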

Assuming a new semaphore, space is allocated from the resource map pool based on the number of semaphores in the set requested, and the above elements in the semid_ds data structure are initialized, with the sem_base pointer being set to point to the first semaphore in the set.

Once the semaphore set is created, the next step is typically to initialize the semaphore value(s). This is done with the semctl(2) call, using either SETVAL, which sets the value of one semaphore at a time (all you need when there is only one semaphore in the set), or SETALL, which sets the values of all the semaphores in the set in one fell swoop. The actual kernel flow is relatively straightforward with the expected permission and value checks against the maximum allowable values, so we won't take up space detailing it here.
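Here is a sketch of both initialization styles side by side. It creates a set, fills it either with one SETVAL call per semaphore or with a single SETALL call, and reads the values back with GETALL. The function and union names are ours.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* semctl(2) expects the caller to define this union; the name is ours. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Creates a set of n semaphores, initializes it from values[] using either
 * SETALL or a loop of SETVAL calls, and reads the result into out[].
 * Returns 0 on success, -1 on error. */
int init_set(int n, unsigned short *values, int use_setall,
             unsigned short *out)
{
    union sem_arg arg;
    int semid, i, rc = 0;

    semid = semget(IPC_PRIVATE, n, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    if (use_setall) {
        arg.array = values;
        rc = semctl(semid, 0, SETALL, arg);       /* whole set at once */
    } else {
        for (i = 0; i < n && rc == 0; i++) {
            arg.val = values[i];
            rc = semctl(semid, i, SETVAL, arg);   /* one member at a time */
        }
    }

    if (rc == 0) {
        arg.array = out;
        rc = semctl(semid, 0, GETALL, arg);
    }

    semctl(semid, 0, IPC_RMID);
    return rc;
}
```

Either path should leave the set holding the same values. SETALL simply saves the extra system calls.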

Actual semaphore use by application code involves the semop(2) system call. semop(2) takes the semaphore ID (returned by semget), a pointer to an array of sembuf structures, and the number of semaphore operations in that array as call arguments. The sembuf structure contains the following elements:

struct sembuf {
        ushort_t        sem_num;        /* semaphore # */
        short           sem_op;         /* semaphore operation */
        short           sem_flg;        /* operation flags */
};

The programmer must create and initialize the sembuf structure, setting the semaphore number (which semaphore in the set), the operation (more on that in a minute), and the flag. The value of sem_op determines whether the semaphore operation will alter the value of a semaphore or only read it. A non-zero sem_op value, either negative or positive, will alter the semaphore value and requires ALTER permission. A zero sem_op value requires only READ permission and does not change anything: the call returns immediately if the semaphore value is currently zero, and otherwise blocks until the value becomes zero (or fails right away if IPC_NOWAIT is set in sem_flg).
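The zero case of sem_op is the least obvious of the three, so here is a small sketch of it. Per the semop(2) man page, a sem_op of zero succeeds only when the semaphore value is zero; with IPC_NOWAIT set it fails with EAGAIN rather than blocking when the value is non-zero. The function and union names are ours.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* semctl(2) expects the caller to define this union; the name is ours. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Sets a one-member set to `value`, then issues a non-blocking semop(2)
 * with sem_op == 0. Returns 0 if the call succeeded, the errno value if
 * it failed, and -1 if the set could not be prepared. */
int wait_for_zero_nowait(int value)
{
    union sem_arg arg;
    struct sembuf op;
    int semid, rc;

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    arg.val = value;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    op.sem_num = 0;
    op.sem_op = 0;               /* test for zero; value is left untouched */
    op.sem_flg = IPC_NOWAIT;     /* report EAGAIN instead of blocking */
    rc = (semop(semid, &op, 1) == 0) ? 0 : errno;

    semctl(semid, 0, IPC_RMID);
    return rc;
}
```

Called with a value of 0 the function returns 0; called with a value of 1 it returns EAGAIN.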

The semop(2) man page actually has a fairly detailed flow in the DESCRIPTION section on what the operation will be for a given sem_op value and a given flag value. It obviates the need to do a kernel code path flow for the semop() call in the column (If this doesn't make readers happy, it at least makes the writer happy!).

Semaphores are used extensively in applications written for Solaris. For example, relational database systems use them to manage access to the database cache they implement via shared memory. A solid understanding of how the kernel implements semaphores can aid in the management of Solaris systems, as well as application code development using semaphores.

Next month, we'll finish up the System V IPC facilities with a discussion of message queues.



About the author
Jim Mauro is currently an Area Technology Manager for Sun Microsystems in the Northeast area, focusing on server systems, clusters, and high availability. He has a total of 18 years industry experience, working in service, educational services (he developed and delivered courses on Unix internals and administration), and software consulting. Reach Jim at jim.mauro@sunworld.com.


[(c) Copyright  Web Publishing Inc., and IDG Communication company]


URL: http://www.sunworld.com/swol-10-1997/swol-10-insidesolaris.html