Demangling message queues

We've already covered two of the three traditional Unix System V IPC (Interprocess Communication) facilities: shared memory and semaphores. We'll finish up with a look at the message queue facility -- the allocation of kernel resources, kernel implementation, and the actual sending and receiving of messages.
Message queues appeared in an early release of Unix System V Release III (pre SVR4) as a means of doing asynchronous message passing between processes. Message queues allowed application developers to pass data around running processes in an ordered fashion.

We will conclude our coverage of the traditional Unix System V IPC facilities with a discussion of message queues that follows a pattern similar to that of the past two columns. We'll drill down on the kernel tunable parameters and take a close look at the kernel implementation of message queues.
Once again, our intent here is not to explain how to write application code by using the message queue facility. That information is readily available in any number of books on programming in Unix, as well as the Solaris Developers Kit (SDK) documentation. (3,800 words)
The APIs used for the System V message queue facility are in the form of system calls, documented in section 2 of the man pages. They are not to be confused with the POSIX message queue facility, documented in section 3R of the man pages. The POSIX interfaces do not use the kernel resources discussed in this month's column. The POSIX message queue code is available only in Solaris 2.6; prior releases do not support the POSIX messaging facility.
As with the other IPC facilities, the initial call when using
message queues is an ipcget
call, in this case
msgget(2)
. The msgget(2)
system call
takes a key value and some flags as arguments. The key is a generic
IPC convention that allows for the retrieval of commonly shared
resources from different processes. Consider an application being
developed where multiple processes will be sending and receiving
messages to and from the same message queue. In order for these
different processes to use the same message queue, we need to grab
the correct message queue identifier, which is what the
msgget(2)
system call returns to the calling
application. The way we ensure that the processes are using a common
message queue is via a key value. As long as each process passes the
same key value in the msgget(2)
call, the correct
message-queue identifier will be returned; the same identifier goes to
all processes using the same key. This is assuming, of course, that
the correct flags are used and the permissions are correctly
established.
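The key convention described above can be sketched in a few lines of C. This is only a minimal illustration, not a recommended design: the helper name `open_shared_queue` and the 0600 permission mode are assumptions made for the example, and a real application would agree on the key value through some convention of its own.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/*
 * Hypothetical helper: look up (or create) the message queue that
 * cooperating processes share through a common key value.
 * Returns the message queue identifier, or -1 with errno set.
 */
int open_shared_queue(key_t key)
{
    assert(key != IPC_PRIVATE);            /* a private key is never shared */
    return msgget(key, IPC_CREAT | 0600);  /* 0600: owner read/write (assumed) */
}
```

Every process that calls `open_shared_queue()` with the same key should get the same identifier back, provided the permissions allow access.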
Once the message queue has been established, it's simply a matter of
sending and receiving messages. Applications use the
msgsnd(2)
and msgrcv(2)
system calls to accomplish
this. The sender simply constructs the message, assigns a message
type, and calls msgsnd(2)
. The system will place the
message on the appropriate message queue until a
msgrcv(2)
is successfully executed. Sent messages are
placed in the back of the queue; messages are received from the
front of the queue.
The message queue facility implements a message type field, which is user (programmer) defined. This gives programmers some flexibility because the kernel has no embedded or predefined knowledge of different message types. Programmers typically use the type field for priority messaging or directing a message to a particular recipient.
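As a rough illustration of sending, receiving, and the type field, the sketch below wraps `msgsnd(2)` and `msgrcv(2)`. The structure name `text_msg`, the `TEXT_MAX` limit, and the two helper functions are assumptions made for this example. The receive helper passes `IPC_NOWAIT` so that it returns an error instead of blocking when no suitable message is queued.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define TEXT_MAX 64              /* hypothetical payload limit for this sketch */

struct text_msg {
    long mtype;                  /* programmer-defined type, must be > 0 */
    char mtext[TEXT_MAX];        /* message body */
};

/* Place a NUL-terminated string on the back of the queue. */
int send_text(int qid, long type, const char *text)
{
    struct text_msg m;
    size_t len = strlen(text) + 1;

    assert(type > 0 && len <= TEXT_MAX);
    m.mtype = type;
    memcpy(m.mtext, text, len);
    return msgsnd(qid, &m, len, 0);
}

/*
 * Receive without blocking. A type of 0 asks for the first message on
 * the queue; a positive type asks for the first message of that type.
 * 'out' must have room for TEXT_MAX bytes. Returns the body size or -1.
 */
ssize_t recv_text(int qid, long type, char *out)
{
    struct text_msg m;
    ssize_t n = msgrcv(qid, &m, TEXT_MAX, type, IPC_NOWAIT);

    if (n >= 0)
        memcpy(out, m.mtext, (size_t)n);
    return n;
}
```

A receiver that asks for a specific type can pull a later message ahead of earlier ones, which is how the type field ends up being used for priority or per-recipient messaging.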
Lastly, applications use the msgctl(2)
system call in
order to get or set permissions on the message queue, and to remove
the message queue from the system when the application is finished
with it (e.g., as a clean way to implement an application shutdown
procedure; the system will not remove an empty and unused message
queue unless it is explicitly removed or the system is rebooted).
As a general note, the shared memory and semaphore IPC facilities
also provide a "control" system call (shmctl(2)
and
semctl(2)
), which is used for the same purpose: getting
or setting permissions for the resource, getting general information
on use of the resource, and removing the resource programmatically.
The system ipcs(1M)
and ipcrm(1M)
commands
can be used to retrieve resource usage information and remove the
resource via the command line.
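The control call can be sketched as follows. The helper names `queue_depth` and `remove_queue` are hypothetical; the `IPC_STAT` and `IPC_RMID` commands and the `msg_qnum` field are part of the standard interface.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Number of messages currently on the queue, or -1 on error. */
long queue_depth(int qid)
{
    struct msqid_ds ds;

    assert(qid >= 0);
    if (msgctl(qid, IPC_STAT, &ds) == -1)
        return -1;
    return (long)ds.msg_qnum;
}

/* Explicitly remove the queue; the system will not do this on its own. */
int remove_queue(int qid)
{
    return msgctl(qid, IPC_RMID, NULL);
}
```

Calling `remove_queue()` from an application's shutdown path is one way to avoid leaving idle queues behind.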
Kernel resources
As with the IPC facilities previously discussed, the message queue
facility comes in the form of a dynamically loadable kernel module,
/kernel/sys/msgsys
, and depends on the IPC support module,
/kernel/misc/ipc
, to be loaded in RAM. We talked a bit about
loadable kernel modules in the shared memory column of September 1997
(see Resources below).
The key points to remember are that the system will load the module
into system memory when the first message system call is executed,
and the IPC facility that is required to provide some low-level
support routines will get loaded at the same time.
A system administrator can use the forceload directive in the /etc/system file to force the system to load the
message queue module at boot time. The ipcs(1M)
command
can be used to determine if the module is loaded and examine
current utilization of message queue resources (more on this
shortly).
The number of resources that the kernel will allocate for message queues is tunable. Values for various message queue tunable parameters can be increased from their default values so more resources are made available for systems running applications that make heavy use of message queues. A summary of the tunable parameters, along with the default and maximum values, can be found in Table 1. We'll take a closer look at each one now.
Name | Default | Data Type | Max | Description |
---|---|---|---|---|
msgmap | 100 | signed int | 2GB | Number of message map entries |
msgmax | 2048 | signed int | 2GB | Maximum message size |
msgmnb | 4096 | signed int | 2GB | Max bytes on a msg queue |
msgmni | 50 | signed int | 2GB | Max msg queue identifiers |
msgssz | 8 | signed int | 2GB | Message segment size |
msgtql | 40 | signed int | 2GB | Max message headers |
msgseg | 1024 | unsigned short | 32k | Max message segments |
Note: The maximum value listed in the "Max" column in Table 1 is a value based on the data type. It is a theoretical maximum only, and should not be construed as something attainable on production systems.
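For orientation, an /etc/system fragment that loads the module at boot and raises two of the tunables might look like the sketch below. The values are purely illustrative and are not recommendations; the `msgsys:msginfo_` prefix is the usual form for these variables, but check the documentation for your release before relying on it.

```
* Illustrative /etc/system entries (lines beginning with * are comments)
forceload: sys/msgsys
set msgsys:msginfo_msgmni=100
set msgsys:msginfo_msgtql=200
```

Changes to /etc/system take effect at the next reboot.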
The msgmap
tunable is described as defining the number
of entries in the message map and is essentially the same as the
semmap
parameter used for semaphores described last
month. Both IPC facilities use resource allocation maps in the
kernel. A kernel resource map is simply an array of map structures
used for the allocation and deallocation of segments of an address
space. They provide a convenient means of managing small segments of
kernel memory where there is frequent allocation and deallocation,
as is the case with message queues (and semaphores). The
system grabs message map entries when it needs space to store new
messages destined for a message queue.
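To make the resource map idea more concrete, here is a small user-level sketch of a first-fit map allocator. It is not the kernel's code: the names `seg_map`, `map_init`, `map_alloc`, and `map_free`, and the `MAP_ENTRIES` constant (standing in for `msgmap`), are all assumptions made for illustration.

```c
#include <assert.h>

#define MAP_ENTRIES 16            /* hypothetical; plays the role of msgmap */

struct seg_map {
    unsigned long m_size;         /* number of free units in this run */
    unsigned long m_addr;         /* index of the first free unit */
};

static struct seg_map map[MAP_ENTRIES];
static int map_used;              /* entries currently describing free runs */

void map_init(unsigned long nunits)
{
    map[0].m_size = nunits;       /* one run covering the whole space */
    map[0].m_addr = 0;
    map_used = 1;
}

/* First-fit allocation: returns the starting unit, or -1 if nothing fits. */
long map_alloc(unsigned long want)
{
    assert(want > 0);
    for (int i = 0; i < map_used; i++) {
        if (map[i].m_size >= want) {
            long addr = (long)map[i].m_addr;
            map[i].m_addr += want;
            map[i].m_size -= want;
            if (map[i].m_size == 0) {          /* run used up: close the gap */
                for (int j = i; j < map_used - 1; j++)
                    map[j] = map[j + 1];
                map_used--;
            }
            return addr;
        }
    }
    return -1;
}

/* Give units back, merging with adjacent free runs. Returns -1 if the map is full. */
int map_free(unsigned long addr, unsigned long size)
{
    int i = 0;

    while (i < map_used && map[i].m_addr < addr)
        i++;
    if (i > 0 && map[i - 1].m_addr + map[i - 1].m_size == addr) {
        map[i - 1].m_size += size;             /* merge with the run before */
        if (i < map_used && addr + size == map[i].m_addr) {
            map[i - 1].m_size += map[i].m_size; /* and with the run after */
            for (int j = i; j < map_used - 1; j++)
                map[j] = map[j + 1];
            map_used--;
        }
        return 0;
    }
    if (i < map_used && addr + size == map[i].m_addr) {
        map[i].m_addr = addr;                  /* merge with the run after */
        map[i].m_size += size;
        return 0;
    }
    if (map_used == MAP_ENTRIES)
        return -1;                             /* no map entry left */
    for (int j = map_used; j > i; j--)
        map[j] = map[j - 1];
    map[i].m_addr = addr;
    map[i].m_size = size;
    map_used++;
    return 0;
}
```

The point of the sketch is that fragmentation consumes map entries: frequent allocation and release of differently sized messages can exhaust the map even when free space remains, which is one reason the number of entries is tunable.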
The msgmni
tunable is (hopefully!) a now familiar IPC
identifier "mni" parameter. As with shared memory segments and
semaphores, message queues have an identifier associated with them,
with a corresponding id
data structure, the
msqid_ds
structure. The value of msgmni
determines the maximum number of message queues the kernel can
maintain. As we'll see below, the system allocates kernel memory
based on the value of msgmni
, so one should not set
this arbitrarily high. Hopefully, the system software engineers will
have a sense for how many message queues are needed by the
application, and can set msgmni
appropriately, adding
10 percent or so for headroom.
The msgmax
parameter defines the maximum size a message
can be, in bytes. The kernel does not allocate resources up front
based on msgmax
, but it is something that the
application developer needs to be aware of, as the system will not
allow messages that have a size larger than msgmax
on
the message queue. An error will be returned to the calling code
indicating the message is too large. Even with a theoretical size
limit of 2 GB for the maximum message size (see Table 1), message
queues are probably not the most efficient way to move large blocks
of data between processes. If the data requirements are relatively
large, the software engineers should consider using shared memory
instead of message queues for data sharing among processes -- or one
of the more recent additions to Unix, such as a FIFO (named pipe).
When I say "more recent," I mean recent relative to message queues.
FIFOs have actually been around for quite a while, but not nearly
as long as message queues.
msgmnb determines the maximum number of bytes on a message queue. More succinctly put, the sum total of all the bytes of all the messages on the queue cannot exceed msgmnb. When the message queue is initialized (the first msgget(2) call executed with the IPC_CREAT flag set), the kernel sets a member of the msqid_ds structure, the msg_qbytes field, to the value of msgmnb. This makes the information available to programmers, as the msgctl(2) system call can be used to retrieve the msqid_ds data. More importantly, code executed with an effective UID of root (typically 0) can programmatically increase this variable in case the queue needs to hold more message bytes than originally allocated at boot time. If an application attempts to put a new message on a message queue that would result in the total bytes being greater than msgmnb, the msgsnd(2) call will either return an error, or the process will block waiting for one or more messages to be removed (read) from the queue, such that the total number of bytes on the queue, plus the size of the new message, is less than or equal to msgmnb. Whether or not the process blocks depends on the state of the IPC_NOWAIT flag.
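The byte limit and the effect of `IPC_NOWAIT` can be observed from user code. The sketch below is only an illustration: the `chunk` structure, the 128-byte body size, and the helper names `queue_byte_limit` and `fill_queue` are assumptions for the example.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct chunk {
    long mtype;
    char mtext[128];             /* hypothetical fixed body size */
};

/* Per-queue byte limit (the msg_qbytes field), or -1 on error. */
long queue_byte_limit(int qid)
{
    struct msqid_ds ds;

    if (msgctl(qid, IPC_STAT, &ds) == -1)
        return -1;
    return (long)ds.msg_qbytes;
}

/*
 * Send fixed-size messages without blocking until the queue refuses one.
 * Returns the number of messages accepted; errno is left as set by the
 * msgsnd() call that failed.
 */
long fill_queue(int qid)
{
    struct chunk c;
    long sent = 0;

    assert(qid >= 0);
    c.mtype = 1;
    memset(c.mtext, 'x', sizeof c.mtext);
    while (msgsnd(qid, &c, sizeof c.mtext, IPC_NOWAIT) == 0)
        sent++;
    return sent;
}
```

With `IPC_NOWAIT` set, the send that would push the queue past its limit fails with `EAGAIN`; without the flag, the same call would block until a reader made room.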
msgtql
defines the maximum number of message headers.
Each message on a message queue requires a message header, which is
defined in the kernel by the msg
structure (more on
that in the next section). Basically, this tunable should reflect
the maximum number of messages (message queues, times messages per
queue) the application will need, plus a little headroom.
msgssz
establishes the maximum message segment size,
and msgseg
determines the maximum number of message
segments. msgseg
is stored as a short in the kernel,
and thus cannot be greater than 32,768 (32 K). The
kernel creates a pool of memory to hold the message data, and the
size of that pool is the product of msgssz
and
msgseg
parameters. Described more clearly, the number
of units of allocation from the data space is msgseg
,
and the size of each allocation unit from the space is
msgssz
.
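The relationship between the two tunables is simple arithmetic, sketched below. The function names are hypothetical, and the round-up in `segments_needed` assumes that message data is stored in whole allocation units, which is an inference from the description above and is not spelled out in the column.

```c
#include <assert.h>
#include <stddef.h>

/* Total bytes in the message data pool: msgssz * msgseg. */
size_t pool_bytes(size_t msgssz, size_t msgseg)
{
    return msgssz * msgseg;
}

/* Allocation units needed for a message body of 'len' bytes (rounded up). */
size_t segments_needed(size_t len, size_t msgssz)
{
    assert(msgssz > 0);
    return (len + msgssz - 1) / msgssz;
}
```

With the defaults from Table 1 (msgssz of 8 and msgseg of 1024), the pool works out to 8192 bytes, and under the round-up assumption a 100-byte message would occupy 13 segments.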
Kernel resource allocation
We'll take a look at how the kernel allocates resources for message
queues based on the tunables, then we'll put it all together in the
last section.
When the /kernel/sys/msgsys
module is first loaded, an
initialization routine executes, which does pretty much the same sort
of work that is done for shared memory and semaphore initialization.
That is, a check is made on the amount of kernel memory that will be
required for resources based on the tunable parameters discussed
previously, and providing the required amount is no greater than 25 percent
of available kernel memory, the system allocates the resources.
The amount of kernel memory required is calculated as follows:
    kernel_memory_required = ((msgseg * msgssz) * sizeof char datatype)
                           + (msgmap * sizeof map structure)
                           + (msgmni * sizeof msqid_ds structure)
                           + (msgmni * sizeof msglock structure)
                           + (msgtql * sizeof msg structure)

The sizes of the structures in bytes will be provided in the next few paragraphs, and once again the arithmetic using either the default or custom values is left as an exercise for the reader. The char datatype on SPARC/Solaris is one byte in size.
Assuming everything will fit, the system grabs various chunks of kernel memory as follows, assigning kernel pointers described below:
    msg     = allocate_kernel_memory((msgseg * msgssz) * sizeof (char));
    msgmap  = allocate_kernel_memory(msgmap * sizeof (struct map));
    msgh    = allocate_kernel_memory(msgtql * sizeof (struct msg));
    msgque  = allocate_kernel_memory(msgmni * sizeof (struct msqid_ds));
    msglock = allocate_kernel_memory(msgmni * sizeof (struct msglock));

The
msg
pointer is set to point to the beginning of the
pool of memory used to store message data, described earlier.
msgmap
points to the beginning of the map structures
used for maintaining resource allocation maps, also described
earlier. A map structure is eight bytes in size, and looks like:
    struct map {
        ulong_t m_size;   /* the size of the map segment */
        ulong_t m_addr;   /* the address of the start of the segment */
    };
The kernel data structure that describes each message queue is the msqid_ds structure:
    struct msqid_ds {
        struct ipc_perm msg_perm;     /* operation permission structure */
        struct msg      *msg_first;   /* ptr to first message on q */
        struct msg      *msg_last;    /* ptr to last message on q */
        ulong           msg_cbytes;   /* current # bytes on q */
        ulong           msg_qnum;     /* # of messages on q */
        ulong           msg_qbytes;   /* max # of bytes on q */
        pid_t           msg_lspid;    /* pid of last msgsnd */
        pid_t           msg_lrpid;    /* pid of last msgrcv */
        time_t          msg_stime;    /* last msgsnd time */
        long            msg_pad1;     /* reserved for time_t expansion */
        time_t          msg_rtime;    /* last msgrcv time */
        long            msg_pad2;     /* time_t expansion */
        time_t          msg_ctime;    /* last change time */
        long            msg_pad3;     /* time expansion */
        kcondvar_t      msg_cv;
        kcondvar_t      msg_qnum_cv;
        long            msg_pad4[3];  /* reserve area */
    };
The structure field descriptions above are basically
self-explanatory. The msg_perm
field is an IPC
permissions structure used in all the IPC facilities to maintain the
access permissions to the resource based on the same "owner,"
"group," and "other" convention used for files in the file system.
It is described in the September 1997 column on shared memory segments
(see Resources below).
The permissions get established by the process that creates the
message queue, and they can be changed via the
msgctl(2)
system call. The kernel pointer
msgque
points to the beginning of the kernel space
allocated to hold all the system msqid_ds
structures.
It is simply an array of msqid_ds
structures, with
msgque
pointing to the first structure in the array.
The total number of structures in the array is equal to the
msgmni
tunable. Each structure is 112 bytes in size.
The messages in a message queue are maintained in a linked list,
with the root of the list in the msqid_ds
data structure (the
msg_first
pointer), which points to the message header
for the message. The kernel also maintains a linked list of message
headers, rooted in the kernel msgh
pointer.
    struct msg {
        struct msg *msg_next;   /* ptr to next message on q */
        long       msg_type;    /* message type */
        ushort_t   msg_ts;      /* message text size */
        short      msg_spot;    /* message text map address */
    };

The kernel message structure (actually, message *header* structure is a more accurate name) is 12 bytes in size and, as we said, one exists for every message on every message queue (the msgtql tunable).
The last chunk of kernel memory allocated is for the message queue
synchronization locks. The method of synchronization used is a
condition variable protected by a mutex (mutual exclusion) lock,
defined in the msglock
structure:
    struct msglock {
        char       msglock_lock;
        kcondvar_t msglock_cv;
    };

There is a msglock created for every message queue (one per message queue identifier).
Condition variables are a means of allowing a process (thread) to test whether or not a particular condition is true under the protection of a mutex lock. The mutex ensures that the condition can be checked atomically; no other thread can change the condition while the first thread is testing it. If the condition is not true, the thread blocks on the condition variable, releasing the mutex while it sleeps, until the condition changes state (becomes true), at which point the thread reacquires the mutex and can continue execution. A good example is the existence of a message on a queue. If none exists, the thread blocks (sleeps) on the condition variable. When a message appears on the queue, the system sends a broadcast, and the thread is woken up, ready to pull the message off the queue. We'll cover this a bit more in the next section.
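The same pattern exists at user level in POSIX threads, and a small sketch may help. This is an analogy only: it uses `pthread_mutex_t` and `pthread_cond_t` rather than the kernel's own lock and `kcondvar_t` types, and the `demo_` names are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct demo_queue {
    pthread_mutex_t lock;        /* protects count */
    pthread_cond_t  nonempty;    /* signaled when a message arrives */
    int             count;       /* messages currently queued */
};

void demo_queue_init(struct demo_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->count = 0;
}

/* Sender side: add a message and wake any sleeping receivers. */
void demo_put(struct demo_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->count++;
    pthread_cond_broadcast(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Receiver side: sleep until a message exists, then take it. */
void demo_get(struct demo_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                        /* recheck after every wake-up */
        pthread_cond_wait(&q->nonempty, &q->lock); /* releases lock while asleep */
    q->count--;
    pthread_mutex_unlock(&q->lock);
}

/* Thread entry point wrapping demo_get(). */
void *demo_consumer(void *arg)
{
    assert(arg != NULL);
    demo_get(arg);
    return NULL;
}
```

The `while` loop matters: a woken thread rechecks the condition because another receiver may have taken the message first.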
A final note on kernel locking. All versions of Solaris, up to and including 2.5.1, do very coarse-grained locking in the kernel message queue module. Specifically, there is one kernel mutex initialized that protects the message queue kernel code and data. The net-net of this is that applications running on multiprocessor platforms using message queues will not scale very well. This is changed in Solaris 2.6, which implements a finer-grained locking mechanism, allowing for greater concurrency. The improved message queue kernel module has been backported and is available as a patch for Solaris 2.5 and 2.5.1.
Figure 1 illustrates the general layout of things after initialization of the message queue module is complete, along with the kernel pointers described above.
Figure 1 - Kernel Message Queue Resources
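For readers who want to check their own arithmetic, the memory formula can be written as a small C function. This is a sketch under stated assumptions: it uses the structure sizes quoted in this column (8 bytes for a map entry, 112 for msqid_ds, 12 for a message header) and takes the size of the msglock structure as a parameter, because that size is not given here. The `msg_tunables` structure and the function name are hypothetical.

```c
#include <assert.h>

struct msg_tunables {            /* hypothetical container for the tunables */
    long msgmap, msgmni, msgssz, msgtql, msgseg;
};

/* Evaluate the kernel memory formula, in bytes. */
long kernel_memory_required(const struct msg_tunables *t, long sizeof_msglock)
{
    assert(sizeof_msglock >= 0);
    return t->msgseg * t->msgssz * 1        /* message pool, sizeof (char) == 1 */
         + t->msgmap * 8                    /* map structures */
         + t->msgmni * 112                  /* msqid_ds structures */
         + t->msgmni * sizeof_msglock       /* msglock structures (size assumed) */
         + t->msgtql * 12;                  /* message headers */
}
```

With the default tunables from Table 1 and an assumed 4-byte msglock, the total comes to 15,272 bytes, most of it the 8192-byte message pool and the msqid_ds array.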
Kernel implementation
We'll walk through the kernel flow involved in the creation of a
message queue and the sending and receiving of messages, as these
represent the vast majority of message queue activity.
The creation of a message queue, on behalf of an application calling the
msgget
system call, starts with a call to the kernel
ipcget
routine. ipcget
is a generic
interface implemented in the kernel /kernel/misc/ipc
module. It is used by all the IPC "get" routines.
ipcget
examines the key value, and if it is IPC_PRIVATE
(which is defined as a value of zero in the kernel), the code
locates the next available ipc_perm
structure, in
preparation for creating a new message queue. There will be an
ipc_perm
structure available for every message queue
identifier (msgmni
). Once a structure has been allocated, the system initializes the structure members based on the UID and GID of the calling process. The permission mode bits get set based on values passed by the calling code, and finally the IPC_ALLOC bit gets set to indicate that the ipc_perm
structure has been allocated.
If the key is not equal to IPC_PRIVATE, the following pseudo-code illustrates the loop implemented:
    set pointer to the first ipc_perm structure
    while (we're not at the end of ipc_perm structures) do
        if (match on key value)
            if (IPC_EXCL and IPC_CREAT are true)
                return (QUEUE EXISTS ERROR)
            else if (permissions do not allow)
                return (ACCESS ERROR)
            else
                return (success)
        advance pointer to the next ipc_perm structure
    done
    /*
     * When we get here, we've walked through all of the ipc_perm structures
     * and didn't find a match on the key. So we are probably creating a new
     * one that's not IPC_PRIVATE.
     */
    if (IPC_CREAT flag is not TRUE)
        return (not found error)
    else
        allocate and initialize the next available entry
        return (success)

Once the ipcget work is done, the remaining msgget code initializes the rest of the msqid_ds structure members, such as the message header pointer (to NULL, because there are no messages yet), the creator PID, and byte fields, etc. At this point, the application code has a valid message queue identifier and can send and receive messages, as well as do message control (msgctl(2)) operations.
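The lookup logic above can also be expressed as a small, self-contained C simulation. It is a sketch, not the kernel routine: the `slot` table, the `NSLOTS` constant (standing in for msgmni), and the function name `sketch_ipcget` are invented, the permission check is omitted, and the mapping of the pseudo-code's errors onto `EEXIST`, `ENOENT`, and `ENOSPC` is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>

#define NSLOTS 4                  /* hypothetical; plays the role of msgmni */

struct slot {
    int   in_use;                 /* stands in for the IPC_ALLOC mode bit */
    key_t key;
};

static struct slot slots[NSLOTS];

/* Returns a slot index (>= 0) on success or a negative errno value. */
int sketch_ipcget(key_t key, int flags)
{
    int i;

    assert((flags & ~(IPC_CREAT | IPC_EXCL)) == 0);
    if (key != IPC_PRIVATE) {
        for (i = 0; i < NSLOTS; i++) {
            if (slots[i].in_use && slots[i].key == key) {
                if ((flags & IPC_CREAT) && (flags & IPC_EXCL))
                    return -EEXIST;       /* queue exists error */
                return i;                 /* permission check omitted here */
            }
        }
        if (!(flags & IPC_CREAT))
            return -ENOENT;               /* not found error */
    }
    for (i = 0; i < NSLOTS; i++) {        /* allocate the next free entry */
        if (!slots[i].in_use) {
            slots[i].in_use = 1;
            slots[i].key = key;
            return i;
        }
    }
    return -ENOSPC;                       /* every identifier is in use */
}
```

Note that an `IPC_PRIVATE` request skips the search entirely and always takes a fresh entry, which is why two private queues never collide.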
A message send (msgsnd(2)
) call requires the
application to construct a message, setting a message type field
(discussed earlier) and creating the body of the message (e.g., a
text message).
The message send kernel support code does some general housekeeping
when the code path is first entered (such as incrementing the
processor statistics to indicate a message queue system call is
being executed, verifying access permissions to the message queue by
the calling process, and ensuring the message size does not exceed
the msgmax
tunable). The message type field is then
copied from the user address space to a designated area in the
kernel.
The rest of the message send flow is best represented in pseudo-code:
    if (the message queue no longer exists)
        return (queue ID removed error)
    if (current bytes on queue + bytes in new message > msgmnb)
        if (IPC_NOWAIT flag is true)
            return (error -- try again)
        else
            set MSGWAIT flag in msqid_ds.msg_perm.mode field
            set up a condition variable to wait for available space
                on the message queue.
            when wake-up received on condition variable
                check for space on the queue again
    /*
     * code will loop rechecking for space unless an
     * error condition occurs. moving beyond this point
     * means we have the space we need on the queue now
     */
    grab space for the message from the resource map (msgmap)
    if (space currently not available)
        if (IPC_NOWAIT is true)
            return (try again error)
        else
            set a condition variable and wait for an available map location
    /*
     * OK. we got this far, so we have the kernel resources we need
     */
    copy the message data from user space to the allocated kernel
        space from the map.
    update the following members of the msqid_ds structure;
        increment the msg_qnum field
        add the appropriate byte count to the msg_cbytes field
        set the correct PID in the msg_lspid field
        set the correct time in msg_stime
    update the following fields in the message header;
        set the message type value in msg_type
        set the text size in msg_ts
        set the msg_spot pointer to point to the appropriate resource
            map location where the body of the message was stored
    adjust the queue pointers (msg_first, msg_last) appropriately

At this point, the message has been placed on the end of the designated message queue, and the return code is sent to the calling program.
The msgrcv
support code is a little less painful,
because now we're looking for a message on the queue (as opposed to
putting one on the queue). Kernel resources do not need to be
allocated for a msgrcv
.
The general flow of the kernel code path for receiving messages goes something like:
    check permissions for operation
    loop through all the messages on the queue
        if (requested type = message type)
            copy the message type to the user supplied location
            copy the message data to the user supplied location
            update the msqid_ds structure fields;
                subtract the message size from msg_cbytes;
                set PID in msg_lrpid
                set time in msg_rtime
            free the message resources
                free the message header (msg structure)
                free the resource map entry
    if (looped through all messages, no matching type)
        return (no message error)

That's basically what happens with a message receive. When completed, the application code will have the message type and data in a buffer area supplied in the msgrcv(2) system call.
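The queue bookkeeping in the send and receive flows can be modeled with an ordinary linked list. The sketch below is a user-level simulation only: `hdr` stands in for the kernel's message header, `queue` for a few msqid_ds fields, and the function names are invented. Treating a requested type of 0 as "take the first message" follows the documented `msgrcv(2)` behavior and is not part of the pseudo-code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct hdr {                     /* stand-in for the kernel's struct msg */
    struct hdr *next;
    long        type;
    size_t      size;
};

struct queue {                   /* stand-in for a few msqid_ds fields */
    struct hdr *first, *last;    /* like msg_first, msg_last */
    size_t      cbytes;          /* like msg_cbytes */
    size_t      qnum;            /* like msg_qnum */
};

/* Append a header to the back of the queue. Returns 0, or -1 if out of memory. */
int enqueue(struct queue *q, long type, size_t size)
{
    struct hdr *h = malloc(sizeof *h);

    assert(type > 0);
    if (h == NULL)
        return -1;
    h->next = NULL;
    h->type = type;
    h->size = size;
    if (q->last != NULL)
        q->last->next = h;
    else
        q->first = h;
    q->last = h;
    q->cbytes += size;
    q->qnum++;
    return 0;
}

/* Remove the first message matching 'type' (0 = any). Returns its size or -1. */
long dequeue(struct queue *q, long type)
{
    struct hdr *prev = NULL, *h;

    for (h = q->first; h != NULL; prev = h, h = h->next) {
        if (type == 0 || h->type == type) {
            long size = (long)h->size;

            if (prev != NULL)
                prev->next = h->next;
            else
                q->first = h->next;
            if (q->last == h)
                q->last = prev;
            q->cbytes -= h->size;
            q->qnum--;
            free(h);             /* "free the message header" */
            return size;
        }
    }
    return -1;                   /* no message of the requested type */
}
```

The model leaves out everything that makes the kernel path harder (locking, sleeping, and the resource map), but it shows why a receive needs no new allocations: it only unlinks and frees.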
The only remaining callable routine for applications to use is the
msgctl(2)
system call, which we discussed briefly
earlier in the column. The control functions are pretty
straightforward, as they typically involve either retrieving or
setting values in a message queue's ipc_perm
structure.
When msgctl(2)
is invoked with the IPC_RMID flag,
meaning the caller wishes to remove the message queue from the
system, the kernel will walk the linked list of messages on the
queue, freeing up the kernel resources associated with each message.
Processes (threads) sleeping on the message queue will be sent a
wake-up signal and ultimately end up with an EIDRM error (ID
removed). The system will simply mark the msqid_ds
structure as being available and return.
Summary
That's it for IPC facilities. Next month we'll get into an area
on Solaris that generates a fair number of questions: swap space,
the swap file system, and swap allocation.
Resources
About the author
Jim Mauro is currently an Area Technology Manager for Sun
Microsystems in the Northeast area, focusing on server systems,
clusters, and high availability. He has a total of 18 years industry
experience, working in service, educational services (he developed
and delivered courses on Unix internals and administration), and
software consulting.
Reach Jim at jim.mauro@sunworld.com.
URL: http://www.sunworld.com/swol-11-1997/swol-11-insidesolaris.html