Setting our sights on semaphores
A strong understanding of how semaphores work will help you develop and manage Solaris application code and systems
Last month we looked at the shared memory interprocess communication (IPC) facility in Solaris. This month we follow with a logical successor to that topic: semaphores. Semaphores are another IPC facility that evolved from the original AT&T System V Unix release (as did shared memory). They provide a means of synchronizing access to shared resources, and applications often use them to synchronize access to shared memory segments by multiple processes. This month we will examine the Solaris implementation of semaphores, including the tunable kernel parameters that apply to semaphore resource allocation and the semaphore information made available with the ipcs(1M) command.
As with last month, our goal is not to detail how to program with semaphores. Refer to the Solaris Developers Kit (SDK) documentation, or any number of books on programming with Unix, for specifics on using semaphores in code. (3,500 words)
Table 1
System V Semaphore System Calls | Thread Library Calls | POSIX Library Calls |
---|---|---|
semget(2) | sema_destroy(3T) | sem_destroy(3R) |
semctl(2) | sema_init(3T) | sem_init(3R) |
semop(2) | sema_post(3T) | sem_post(3R) |
| | sema_trywait(3T) | sem_trywait(3R) |
| | sema_wait(3T) | sem_wait(3R) |
| | | sem_open(3R) |
| | | sem_close(3R) |
| | | sem_unlink(3R) |
| | | sem_getvalue(3R) |
Which API is which?
Important to note is that in Solaris 2.5.1, there is minimal support for the POSIX semaphore calls. In those cases where a call to a POSIX semaphore routine has a corresponding libthread call, it's actually the libthread code that gets executed. For example, calling sem_post(), which requires linking in the POSIX library, libposix4, will ultimately call the libthread sema_post() code. The POSIX routines can be thought of simply as a wrapper around the underlying libthread semaphore code. This changes somewhat in Solaris 2.6, where support is added for POSIX named semaphores. Calls to sem_open(), sem_close(), and sem_unlink(), which are required for named semaphore support (and have no libthread equivalents), are supported in 2.6.
There are similarities between the three different semaphore implementations, as the fundamental notion of what a semaphore is does not change across the three libraries. Several of the libthread and libposix4 routines are functionally identical, which is fairly obvious from the routine names; the disparity again begins with the additional routines in the libposix4 code for support of named semaphores. The System V semaphore interfaces do not have a direct correlation to the thread and POSIX code in terms of the names of the APIs. The programmer using System V semaphores must deal with a couple of generic interfaces for semaphore control and operations using the semctl(2) and semop(2) system calls and set the appropriate flags in the argument list in order to do things like initialize, increment, or decrement a semaphore. Finally, the thread and POSIX implementations do not support the notion of semaphore sets, as the System V code does.
The Unix System V semaphore routines are the ones that have been around the longest and are thus implemented in a greater number of existing applications. The routines linked via the libthread library, as well as the POSIX routines, are typically found in multithreaded Solaris applications -- those applications that implement thread calls in the libthread and libposix4 libraries.
Note that the kernel tunables and resources discussed in the following sections do not apply to the libthread and libposix4 semaphore routines. Semaphores allocated via calls to the thread and POSIX code do not utilize any of the kernel resources allocated for System V semaphores. In fact, the underlying library support code uses low-level mutex and lwp calls for thread and POSIX semaphore support. All the semaphore resources exist in the process address space.
First an overview
While we stated earlier that a non-goal of this month's column is how
to program using semaphores, we'll briefly discuss how applications
use semaphores, in order to make the following sections on the
kernel implementation more digestible.
Let's start with an extremely brief history. A semaphore, as defined in the dictionary, is a mechanical signaling device or a means of doing visual signaling. The analogy typically used is the railroad mechanism of signaling trains, where mechanical arms swing down to block a train from a section of track that another train is currently using. When the track is free, the arm swings up, and the waiting train can proceed.
The notion of using semaphores as a means of synchronization in computer software was originated by a Dutch mathematician, E.W. Dijkstra, in 1965. Dijkstra's original work defined two semaphore operations, wait and signal (which correlate nicely to the railroad example). The operations were referred to as P and V operations. The P operation was the wait, which decremented the value of the semaphore if it was greater than zero, and the V operation was the signal, which incremented the semaphore value. I assume the names Dijkstra chose, P and V, originate from the Dutch terms for wait and signal, but I'm not sure.
Semaphores provide a method of synchronizing access to a shareable resource by multiple processes. A semaphore can be used as a binary lock for exclusive access or as a counter that allows concurrent access to a finite number of shared resources, where the semaphore value gets initialized to the number of shared resources. Each time a process needs a resource, the semaphore value gets decremented. When the process is done with the resource, the semaphore value gets incremented. A semaphore value of zero indicates to the calling process that there are currently no resources available, and the calling process blocks until another process finishes using the resource and frees it. We can represent semaphore usage in the following pseudocode.
    init_semaphore(key)
        int number_shared_resources;
        semaphore_id = semget(key, number_shared_resources, IPC_CREAT)

    get_shared_resource()
        get_semaphore()
        if (semaphore > 0)
            decrement semaphore;
            use resource
        else if (semaphore == 0)
            wait();

    done_with_shared_resource()
        increment semaphore()
        return
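For readers who want to see how that pseudocode might map onto the System V calls, here is a minimal sketch in C. It is not code from the Solaris sources: the pool_* helper names and the union sem_arg type are our own, and IPC_PRIVATE stands in for a real key so the example needs no outside setup.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Fourth argument to semctl(). The caller declares it (traditionally as
 * "union semun"); the name sem_arg is our own. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a one-semaphore set whose value is the number of resources. */
int pool_create(int nresources)
{
    union sem_arg arg;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (semid == -1)
        return -1;
    arg.val = nresources;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    return semid;
}

/* Wait (P): blocks while the value is zero, then decrements it. */
int pool_acquire(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    return semop(semid, &op, 1);
}

/* Signal (V): increments the value, waking a waiter if there is one. */
int pool_release(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = 0 };
    return semop(semid, &op, 1);
}

/* Current semaphore value, or -1 on error. */
int pool_value(int semid)
{
    return semctl(semid, 0, GETVAL);
}

/* Remove the set from the system. */
int pool_destroy(int semid)
{
    return semctl(semid, 0, IPC_RMID);
}
```

A caller would create the pool once, bracket each use of a resource with pool_acquire() and pool_release(), and call pool_destroy() when the resource goes away.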
The semaphore implementation in Solaris (Unix System V semaphores) allows for semaphore sets, meaning that a unique semaphore identifier may contain multiple semaphores. Whether a semaphore identifier contains one semaphore or a set of semaphores is determined when the semaphore is created with the semget(2) system call. The second argument to semget(2) determines the number of semaphores that will be associated with the semaphore identifier returned by semget(2). The semaphore system calls allow for some operations on the semaphore set, such that the programmer can make one semctl(2) or semop(2) system call and touch all the semaphores in the semaphore set. This makes dealing with semaphore sets programmatically a little easier.
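As a small illustration of the second semget(2) argument, the sketch below asks for a set of a given size and then reads the size back from the set's control structure with IPC_STAT. The set_* helper names and union sem_arg are hypothetical names of our own, and IPC_PRIVATE is used only to keep the example self-contained.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Caller-declared fourth argument to semctl(); the name is our own. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Ask for nsems semaphores under one identifier; returns the ID or -1. */
int set_create(int nsems)
{
    return semget(IPC_PRIVATE, nsems, IPC_CREAT | 0600);
}

/* Report how many semaphores the kernel associates with the identifier. */
int set_size(int semid)
{
    struct semid_ds ds;
    union sem_arg arg;

    arg.buf = &ds;
    if (semctl(semid, 0, IPC_STAT, arg) == -1)
        return -1;
    return (int)ds.sem_nsems;
}

/* Remove the whole set with one call. */
int set_remove(int semid)
{
    return semctl(semid, 0, IPC_RMID);
}
```

A request for more semaphores than the per-set limit allows is expected to make semget(2) fail, so real code should always check the return value of set_create().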
One final note. The semget(2) call requires that a key be passed as the first argument. All three System V IPC facilities (shared memory, semaphores, and message queues) support the notion of a key in each respective "ipcget()" call (shmget, semget, and msgget). This provides programmers a way of grabbing the same shared resource (shared segment, semaphore set, or message queue) from multiple processes. Using the same key value in the "get" call will ensure that the same IPC identifier is returned. See the stdipc(3C) man page for more information on keys and a description of the ftok(3C) file-to-key function.
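The key mechanism can be sketched as follows. The helper name open_by_path() is our own, and the path and project ID a caller passes are arbitrary choices; any existing file that all cooperating processes can name would do.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Derive a key from an existing path plus a project ID, then create the
 * semaphore set, or attach to it if it already exists.
 * Returns the semaphore ID, or -1 on error. */
int open_by_path(const char *path, int proj_id, int nsems)
{
    key_t key = ftok(path, proj_id);

    if (key == (key_t)-1)
        return -1;          /* typically: the path does not exist */
    return semget(key, nsems, IPC_CREAT | 0600);
}
```

Two processes (or two calls in one process) that pass the same path and project ID should get back the same identifier, which is the behavior described above.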
Semaphore kernel resources
There are some similarities in the kernel implementation of the Unix
System V IPC facilities as we will soon demonstrate. Some of the
information covered in last month's column on shared memory will
look familiar as we move through this month's discussion.
The kernel support code for semaphores comes in the form of a dynamically loaded kernel module, /kernel/sys/semsys, and the IPC support routines, ipcaccess and ipcget, are also required. These IPC routines are found in the /kernel/misc/ipc loadable kernel module. The semsys and IPC modules will be loaded automatically when the system first executes a semaphore system call. See last month's column for more information on loadable kernel modules.
The tunable kernel parameters that apply to semaphores are summarized in Table 2. We will now take a closer look at each one and discuss how kernel resources get allocated.
Table 2
Name | Default Value | Maximum Value * | Data Type | Description |
---|---|---|---|---|
semmap | 10 | 2GB | signed integer | Number of entries in semaphore map |
semmni | 10 | 65,536 | signed integer | Number of semaphore identifiers. See text for why it cannot be greater than 64 K. |
semmns | 60 | 2GB | signed integer | Total number of semaphores system wide |
semmnu | 30 | 2GB | signed integer | Total number of undo structures in system |
semmsl | 25 | 65,535 | unsigned short | Maximum number of semaphores per semaphore ID |
semopm | 10 | 2GB | signed integer | Maximum operations for semop call |
semume | 10 | 2GB | signed integer | Maximum undo entries per process |
semusz | 96 | 2GB | signed integer | Total bytes required for undo structures, system wide. |
semvmx | 32,767 | 65,535 | unsigned short | Maximum semaphore value |
semaem | 16,384 | 32,767 | signed short | Maximum adjust on exit value |
The semmap parameter determines the maximum number of entries in a semaphore map. The memory space given to the creation of semaphores is taken from the semmap, which is initialized with a fixed number of map entries based on the value of semmap. The implementation of allocation maps is generic within Unix SVR4 and also Solaris, supported with a standard set of kernel routines (rmalloc(), rmfree(), etc.). The use of allocation maps by the semaphore subsystem is just one example of their implementation. They basically prevent the kernel from having to deal with allocating and de-allocating additional kernel memory as semaphores get initialized and freed. By initializing and using allocation maps, kernel memory is allocated upfront, and map entries are allocated and freed dynamically from the semmap allocation maps. The semmni tunable should never be larger than semmap. In fact, semmap should be set to the product of semmni and semmsl: (semmap = semmni * semmsl). If you make semmap too small for the application, you'll get "WARNING: rmfree map overflow" messages on the console. Tweak it higher and reboot.
The semmni tunable establishes the maximum number of semaphore sets system wide. Every semaphore set in the system has a unique identifier and control structure, the semid_ds data structure. During initialization, the system allocates kernel memory for semmni control structures. Each control structure is 84 bytes, so as with the shared memory shmmni tunable we discussed last month, you should avoid making this value arbitrarily large.
semmns defines the maximum number of semaphores in the system. A semaphore set may have more than one semaphore associated with it, and each semaphore has a corresponding sem data structure. Each sem structure is only 16 bytes, but you still shouldn't go over the edge with this (that is, make it an arbitrarily large value). Actually, this number should really be calculated as: semmns = semmni x semmsl. Because semmsl defines the maximum number of semaphores per semaphore set, and semmni defines the maximum number of semaphore sets, the total number of semaphores systemwide can never be greater than the product of semmni and semmsl. This calculation should probably be done in the kernel, and semmns should be removed as a tunable parameter.
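The two product rules given so far (for semmap and for semmns) are simple enough to check by hand, but a short sketch makes the arithmetic concrete. The struct and function names here are hypothetical and exist only for illustration; nothing like them appears in the kernel.

```c
/* Values derived from semmni and semmsl using the product rules above. */
struct sem_derived {
    long semmap;   /* suggested number of allocation map entries */
    long semmns;   /* upper bound on semaphores system wide */
};

/* Compute semmap = semmni * semmsl and semmns = semmni * semmsl. */
struct sem_derived derive_limits(long semmni, long semmsl)
{
    struct sem_derived d;

    d.semmap = semmni * semmsl;
    d.semmns = semmni * semmsl;
    return d;
}
```

With the Table 2 defaults of semmni = 10 and semmsl = 25, both products come out to 250, which is noticeably larger than the default semmap of 10 and the default semmns of 60.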
semmnu defines the systemwide maximum number of semaphore undo structures. Semaphore undo structures are maintained in case a process that has made some semaphore value adjustments terminates. If the SEM_UNDO bit is set in the semaphore flag value (sem_flg) when the semaphore operation is done (the semop(2) system call), then the kernel will "undo" changes made to the semaphore(s) when the process terminates. It seems intuitive to make this equal to semmni, which would provide for an undo structure for every semaphore set. Each semaphore undo structure is 16 bytes.
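The effect of SEM_UNDO is easiest to see with a process that takes a semaphore and then exits without giving it back. The following is a minimal sketch under our own naming (value_after_child_exit() and union sem_arg are hypothetical); it creates a private semaphore with a value of 1, lets a child decrement it with SEM_UNDO, and reports the value the parent sees after the child has exited.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* Caller-declared fourth argument to semctl(); the name is our own. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Returns the semaphore value seen by the parent after a child that
 * decremented the semaphore with SEM_UNDO has exited; -1 on error. */
int value_after_child_exit(void)
{
    union sem_arg arg;
    int semid, value, status;
    pid_t pid;

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    pid = fork();
    if (pid == 0) {
        struct sembuf take = { .sem_num = 0, .sem_op = -1,
                               .sem_flg = SEM_UNDO };
        /* Decrement 1 -> 0, then exit without the matching increment. */
        _exit(semop(semid, &take, 1) == 0 ? 0 : 1);
    }
    if (pid == -1 || waitpid(pid, &status, 0) == -1 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    value = semctl(semid, 0, GETVAL);  /* read after the kernel's undo */
    semctl(semid, 0, IPC_RMID);
    return value;
}
```

If the undo is applied as described, the function should return 1; without SEM_UNDO in sem_flg, the semaphore would be left at 0 and any later waiter would block.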
semmsl is the maximum number of semaphores per semaphore set. As mentioned previously, each semaphore set may have one or more semaphores associated with it. This tunable defines the maximum number per set.
semopm is the maximum number of semaphore operations that can be performed per semop(2) call. This gets back to the notion of semaphore sets and the ability to do operations on multiple semaphores via the semop(2) system call. The semaphores in the set will all be associated with the same semaphore ID (semid_ds structure). You should probably set this value equal to semmsl above, so you'll always be able to do an operation on every semaphore in a semaphore set. When a semop(2) call is executed, the kernel checks to ensure that the third argument to semop(2), which is the number of entries in the array of semaphore operations, is not larger than the semopm tunable. If it is, an error gets returned to the calling code.
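To make the "operations per call" idea concrete, here is a sketch that operates on two semaphores of one set with a single semop(2) call. The pair_* helpers and union sem_arg are hypothetical names of our own. The third semop(2) argument is the count that gets compared against semopm; per the semop(2) man page, exceeding the limit fails with E2BIG.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Caller-declared fourth argument to semctl(); the name is our own. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a private two-semaphore set with the given starting values. */
int pair_create(unsigned short v0, unsigned short v1)
{
    unsigned short values[2];
    union sem_arg arg;
    int semid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);

    if (semid == -1)
        return -1;
    values[0] = v0;
    values[1] = v1;
    arg.array = values;
    if (semctl(semid, 0, SETALL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    return semid;
}

/* Decrement semaphores 0 and 1 in one semop() call (operation count 2).
 * IPC_NOWAIT makes the call fail rather than block, and the kernel
 * applies either both decrements or neither. */
int pair_take_both(int semid)
{
    struct sembuf ops[2] = {
        { .sem_num = 0, .sem_op = -1, .sem_flg = IPC_NOWAIT },
        { .sem_num = 1, .sem_op = -1, .sem_flg = IPC_NOWAIT },
    };
    return semop(semid, ops, 2);
}
```

If one of the two semaphores is already zero, pair_take_both() fails and the other semaphore is left untouched, which is the all-or-nothing behavior applications usually want from a multi-operation call.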
semume determines the maximum allowable per process undo structures, or, put another way, the maximum number of semaphore undo operations that can be performed per process. The kernel maintains information on the changes processes make to semaphores (semaphore value adjustments). In the event of a process exiting prematurely, the kernel can set the semaphore values back to what they were prior to the process changing them. This is what undo structures are used for -- to undo semaphore alterations.
semusz is described as the size in bytes for undo structures. I don't know why this is even documented as a tunable. During initialization of the semaphore code, the kernel sets semusz to (sizeof(undo) + (semume x sizeof(undo))), so setting it in /etc/system is pointless. Leave it alone; it should be removed as a tunable.
semvmx is the maximum value of a semaphore. Due to the interaction with undo structures and semaem, below, this tunable should not exceed its default value of 32,767 unless you can guarantee that SEM_UNDO is never being used. The actual value of a semaphore is stored as an unsigned short (2 bytes) in its semaphore structure, which implies that the maximum semaphore value can be 65,535 (64 K), the maximum value of an unsigned short data type. The next paragraph explains why the limit should really be 32,767.
Finally (last but not least!), the semaem tunable is the maximum adjust-on-exit value. This value is stored as an integer in the seminfo structure that the kernel uses to maintain the tunable values; but it is implemented as a signed short in the undo structure. It needs to be signed because semaphore operations can increment or decrement the value of a semaphore, and thus the system may need to apply a negative or positive adjustment value to the semaphore in order to do a successful adjust-on-exit operation. The actual value of a semaphore can never be negative. The maximum value for a signed short, which is 2 bytes (16 bits) is 32,767 (32 K). If semvmx (above) were set to 65,535, and the application actually set semaphore values that high, the system would not be able to undo the entire range of semaphore value changes, because semaem can never be greater than 32,767. This is why semvmx should never be greater than 32,767.
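The range argument can be checked with the limits from limits.h. This is only an illustration of the data type reasoning above, using hypothetical helper names, and it assumes a 16-bit short, as on the systems discussed here.

```c
#include <limits.h>

/* Can a semaphore value change of `delta` be recorded in a signed short
 * adjust-on-exit field? Returns 1 if it fits, 0 if it does not. */
int adjust_fits(long delta)
{
    return delta >= SHRT_MIN && delta <= SHRT_MAX;
}

/* Is a proposed semvmx small enough that any value up to it could be
 * fully undone by a single adjust-on-exit value? */
int semvmx_is_undo_safe(long semvmx)
{
    return semvmx >= 0 && adjust_fits(semvmx);
}
```

With a 16-bit short, adjust_fits(32767) is true while adjust_fits(65535) is false, which is the gap between what an unsigned short semaphore value can hold and what a signed short adjustment can undo.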
That covers all the semaphore tunables. Now, we'll take a close look at the kernel implementation of semaphores.
During initialization of the semaphore code, when /kernel/sys/semsys first gets loaded, the value of semmni is checked to ensure it is not greater than the maximum allowable value of 65,536 (64 K). If it is, it gets set to 65,536 and a console message is printed indicating the value of semmni was too large. Following that, the tunable parameters from the /etc/system file get plugged into the internal seminfo data structure, with the exception of semusz. As we mentioned earlier it gets set to:
seminfo.semusz = (sem_undo structure size) + (semume * sem_undo structure size)
Just like with shared memory, the system checks for the maximum amount of available kernel memory, and divides that number by 4, in order to prevent semaphore requirements from taking more than 25 percent of available kernel memory. Actual memory requirements for semaphores are calculated as follows:
    total_kernel_memory_required =
        (semmns * (sem structure size)) +
        (semmni * (semid_ds structure size)) +
        (semmni * (kernel mutex size)) +
        (max processes * (undo structure pointer size)) +
        (semusz * semmnu * (integer size))
The structure sizes are given in the previous section. Pointers and integers are 4 bytes, and a kernel mutex is 8 bytes. Max processes is determined during startup based on the amount of RAM in the system. You can determine the actual value on your system with the sysdef(1M) command:
    # sysdef | grep v_proc
        1962    maximum number of processes (v.v_proc)
    #
Doing the actual arithmetic is left as an exercise for the reader. If the required memory is more than 25 percent of available kernel memory, a message indicating as much will appear on the console.
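For readers who would rather not do the exercise by hand, the formula can be transcribed directly. The sketch below uses only the sizes quoted in this column (16-byte sem, 84-byte semid_ds, 8-byte kernel mutex, 4-byte pointers and integers); the struct, constant, and function names are our own, and the sizes will differ on other releases.

```c
/* Tunables that feed the memory formula. */
struct sem_tunables {
    long semmns;
    long semmni;
    long semmnu;
    long semusz;
};

/* Sizes in bytes, as quoted in the column. */
enum {
    SEM_SIZE      = 16,
    SEMID_DS_SIZE = 84,
    KMUTEX_SIZE   = 8,
    PTR_SIZE      = 4,
    INT_SIZE      = 4
};

/* Transcription of total_kernel_memory_required. */
long sem_kernel_bytes(const struct sem_tunables *t, long max_procs)
{
    return t->semmns * SEM_SIZE
         + t->semmni * SEMID_DS_SIZE
         + t->semmni * KMUTEX_SIZE
         + max_procs * PTR_SIZE
         + t->semusz * t->semmnu * INT_SIZE;
}

/* The 25 percent rule: 1 if the requirement fits, 0 if it does not. */
int sem_fits(long required, long avail_kmem)
{
    return required <= avail_kmem / 4;
}
```

Plugging in the Table 2 defaults (semmns = 60, semmni = 10, semmnu = 30, semusz = 96) and the 1962 processes from the sysdef output gives 960 + 840 + 80 + 7,848 + 11,520 = 21,248 bytes.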
Assuming everything fits, kernel memory is allocated as follows. Resource map allocation is done based on semmap, and a kernel semmap pointer is set. Space is allocated for all the sem structures (one for every semaphore) based on semmns, all the semaphore identifiers (semid_ds) based on the size of that structure, all the undo structure pointers based on the max processes and pointer size, all the undo structures themselves, based on semmnu and the size of an undo structure, and all the kernel mutex locks required, one for every unique semaphore identifier, based on semmni and the size of a kernel mutex:
    semmap = rmallocmap(seminfo.semmap);
    sem = allocate_kernel_memory(semmns * sem structure size)
    sema = allocate_kernel_memory(semmni * semid_ds structure size)
    sem_undo = allocate_kernel_memory(max processes * pointer size)
    semu = allocate_kernel_memory((semusz * semmnu) * integer size)
    sema_locks = allocate_kernel_memory(semmni * kernel mutex size)
Note that a kernel mutex lock is created for each semaphore set. This provides for fairly fine-grained parallelism on multiprocessor hardware, as it means that multiple processes can do operations on different semaphore sets concurrently. For operations on semaphores in the same set, the kernel needs to ensure atomicity for the application, which is a guarantee that a semaphore operation initiated by a process will complete without interference from another process, whether the operation is on a single semaphore, or multiple semaphores in the same set.
That pretty much covers kernel resources used by semaphores. Now we'll take a quick look at some implementation details.
Semaphore operations inside Solaris
The creation of a semaphore set by an application requires a call to semget(2). Every semaphore set in the system is described by a semid_ds data structure, which contains the following elements:
    struct semid_ds {
        struct ipc_perm sem_perm;     /* operation permission struct */
        struct sem      *sem_base;    /* ptr to first semaphore in set */
        ushort_t        sem_nsems;    /* # of semaphores in set */
        time_t          sem_otime;    /* last semop time */
        long            sem_pad1;     /* reserved for time_t expansion */
        time_t          sem_ctime;    /* last change time */
        long            sem_pad2;     /* time_t expansion */
        long            sem_binary;   /* flag indicating semaphore type */
        long            sem_pad3[3];  /* reserve area */
    };
The system checks to see if a semaphore already exists based on the key value passed to semget(2), and does a permission check using the IPC support routine, ipcaccess(). Semaphore permissions differ slightly from permission modes we're used to seeing on things like Solaris files. They're defined as READ and ALTER, such that processes can either read the current semaphore value, or alter it (increment/decrement). Permissions are established with arguments passed to the semget(2) call, following the owner, group, and other conventions used for Solaris file permissions.
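As a sketch of how those permissions are supplied, the helper below (a hypothetical name, as is union sem_arg) creates a private set that the owner may read and alter and the group may only read, then reads the recorded mode back with IPC_STAT. The octal layout mirrors file modes, with the "write" bit position meaning ALTER.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Caller-declared fourth argument to semctl(); the name is our own. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a set with owner READ+ALTER (0600) and group READ (0040), then
 * return the permission bits recorded for it, or -1 on error. */
int create_and_report_mode(void)
{
    struct semid_ds ds;
    union sem_arg arg;
    int mode;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0640);

    if (semid == -1)
        return -1;
    arg.buf = &ds;
    if (semctl(semid, 0, IPC_STAT, arg) == -1)
        mode = -1;
    else
        mode = (int)(ds.sem_perm.mode & 0777);
    semctl(semid, 0, IPC_RMID);
    return mode;
}
```

The low nine bits of the semget(2) flag argument carry the permissions, so IPC_CREAT | 0640 both requests creation and sets the mode in one value.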
Assuming a new semaphore, space is allocated from the resource map pool based on the number of semaphores in the set requested, and the above elements in the semid_ds data structure are initialized, with the sem_base pointer being set to point to the first semaphore in the set.
Once the semaphore is created, typically the next step is initializing the semaphore value(s). This is done with the semctl(2) call, using either SETVAL, which sets the value of one semaphore at a time (sufficient when there is only one semaphore in the set), or SETALL, which sets the value of all the semaphores in the set in one fell swoop. The actual kernel flow is relatively straightforward with the expected permission and value checks against the maximum allowable values, so we won't take up space detailing it here.
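The two initialization styles look like this in practice. This is a minimal sketch with hypothetical helper names (init_one, init_all, read_all) and a caller-declared union sem_arg; the semaphore set is assumed to exist already.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Caller-declared fourth argument to semctl(); the name is our own. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* SETVAL: set a single semaphore (number semnum) in the set. */
int init_one(int semid, int semnum, int value)
{
    union sem_arg arg;

    arg.val = value;
    return semctl(semid, semnum, SETVAL, arg);
}

/* SETALL: set every semaphore in the set from an array that holds one
 * entry per semaphore. The semnum argument is ignored. */
int init_all(int semid, unsigned short *values)
{
    union sem_arg arg;

    arg.array = values;
    return semctl(semid, 0, SETALL, arg);
}

/* GETALL: copy every semaphore value in the set into out[]. */
int read_all(int semid, unsigned short *out)
{
    union sem_arg arg;

    arg.array = out;
    return semctl(semid, 0, GETALL, arg);
}
```

Either call is expected to fail if a requested value is above the system's maximum semaphore value, so the return value is worth checking even in initialization code.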
Actual semaphore use by application code involves the semop(2) system call. semop(2) takes the semaphore ID (returned by semget), a pointer to a sembuf structure, and the number of semaphore operations as call arguments. The sembuf structure contains the following elements:
    struct sembuf {
        ushort_t sem_num;  /* semaphore # */
        short    sem_op;   /* semaphore operation */
        short    sem_flg;  /* operation flags */
    };
The programmer must create and initialize the sembuf structure, setting the semaphore number (which semaphore in the set), the operation (more on that in a minute), and the flag. The value of sem_op determines whether the semaphore operation will alter or read the value of a semaphore. A non-zero sem_op value, either negative or positive, will alter the semaphore value. A zero sem_op value does not change the semaphore and requires only read permission: the call returns once the current semaphore value is zero.
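The three kinds of sem_op value can be tried out without risk of blocking by setting IPC_NOWAIT in sem_flg. The helper below is a sketch with a name of our own choosing; it follows the behavior described in the semop(2) man page, where a negative value that cannot be applied, or a zero value on a non-zero semaphore, fails with EAGAIN instead of waiting.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <errno.h>

/* Caller-declared fourth argument to semctl(); the name is our own. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Apply one operation to semaphore `num` without blocking.
 * op > 0  : add op to the value
 * op < 0  : subtract |op| if the value is large enough
 * op == 0 : succeed only if the value is currently zero
 * Returns 0 on success, 1 if the call would have blocked, -1 otherwise. */
int try_op(int semid, unsigned short num, short op)
{
    struct sembuf sb = { .sem_num = num, .sem_op = op,
                         .sem_flg = IPC_NOWAIT };

    if (semop(semid, &sb, 1) == 0)
        return 0;
    return errno == EAGAIN ? 1 : -1;
}
```

Dropping IPC_NOWAIT turns each "would have blocked" case into a real wait, which is the normal mode of operation for a lock.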
The semop(2) man page actually has a fairly detailed flow in the DESCRIPTION section on what the operation will be for a given sem_op value and a given flag value. It obviates the need to do a kernel code path flow for the semop() call in the column. (If this doesn't make readers happy, it at least makes the writer happy!)
Semaphores are used extensively in applications written for Solaris. For example, relational database systems use them to manage access to the database cache they implement via shared memory. A solid understanding of how the kernel implements semaphores can aid in the management of Solaris systems, as well as application code development using semaphores.
Next month, we'll finish up the System V IPC facilities with a discussion of message queues.
About the author
Jim Mauro is currently an Area Technology Manager for Sun
Microsystems in the Northeast area, focusing on server systems,
clusters, and high availability. He has a total of 18 years industry
experience, working in service, educational services (he developed
and delivered courses on Unix internals and administration), and
software consulting.
Reach Jim at jim.mauro@sunworld.com.
URL: http://www.sunworld.com/swol-10-1997/swol-10-insidesolaris.html