Ask the Experts
This month: Technical Q&A galore! Jim Mauro exceeds the limits of shared memory; Cameron Laird reveals the hottest Tcl/DOS innovations; and Adrian Cockcroft diagnoses the symptoms of mutex failure. Plus, have you checked out virtual_adrian lately?
Marius Strom
Network Administrator,
Twister Communications
You had question marks beside the shm_cnattch field in your Shared memory column. In other versions of shm, this has been the number of processes which have this shared memory attached and are still in memory, as opposed to shm_nattch, which is the total number of attaches. This code has been used so that the shared memory can be swapped after shm_cnattch goes to zero.
Diane Fallier
Jim Mauro replies:
Hi Diane,
It doesn't look like the shm_nattch and shm_cnattch fields are used for any serious tracking in the kernel. Instead, the reference count (amp->refcnt) field of the shared segment's corresponding anonymous memory structure (shm_amp) is used by the kernel to figure out what to do with allocated resources.
Obviously, this does not allow for the finer-grained tracking you described, as it is effectively the total number of attaches, with no additional state information for determining if an attached process is currently in RAM or swapped out.
Jim
Exceeding the limits of (virtual) shared memory
Greetings Jim,
The article "Shared memory uncovered" in the September issue of SunWorld covered most of the very important concepts of shared memory handling on Solaris. But I still have some questions:
shmmni using the formula ((shmmni * 112) + (shmmni * 8)).
shmget started failing. If kernel memory were RAM it should have failed after 96 MB. Why did it reach 540 MB?
Name not given
Jim Mauro replies:
1) Not quite. Kernel memory is a subset of RAM. RAM refers to all of physical memory. Kernel memory refers to those pages of physical memory occupied by operating system (kernel) text, data, heap space, and stack.
2) Oops! Looks like I made a mathematical error in the column. The system allocates shmid_ds structures based on the value of shmmni. Each structure is 112 bytes, plus 8 bytes for a kernel mutex lock. (100 * 112) + (100 * 8) = 11,200 + 800 = 12,000 bytes, or about 12k.
3) During startup, the system checks the values of the tunables to ensure that they do not exceed allowable maximums. If they do, you'll see console messages like: "shminfo.shmseg limited to 32 K" or "shmsys: can't load module, too much memory requested."
4) Kernel memory for shared memory involves the supporting operating system routines for the system calls (e.g. shmget(2)) and maintaining the data structures and mappings for the processes.
When you use shared memory in code, it's treated by the operating system as anonymous memory -- just like calling malloc(3) or sbrk(2). The amount of shared memory you attach to gets charged against your process's total available virtual address space, which is about 4 GB total. Since it's virtual memory, it can exceed actual available physical memory (that's one of the benefits of virtual memory systems). As with any anonymous memory, you need available swap space for backing store before the allocation can complete.
One of two things happened in your example: either you hit your process's maximum available virtual address space (unlikely), or the system ran out of swap space (most likely). Try re-running your test, keeping an eye on "swap -s" output.
Thanks for writing. I hope this clears things up for you.
Jim
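The corrected arithmetic in point 2 of Jim's reply can be checked with a few lines of Python. This is only a sketch: the 112-byte structure size and the 8-byte mutex size are the figures quoted in the reply for this Solaris release, and the function and constant names are made up for illustration.

```python
SHMID_DS_BYTES = 112  # size of one shmid_ds structure, as quoted in the reply
MUTEX_BYTES = 8       # one kernel mutex lock per structure, as quoted in the reply


def shm_kernel_bytes(shmmni: int) -> int:
    """Bytes of kernel memory allocated for `shmmni` shared memory identifiers.

    Hypothetical helper: it only restates the formula
    (shmmni * 112) + (shmmni * 8) from the letter and the reply.
    """
    if shmmni < 0:
        raise ValueError("shmmni must be non-negative")
    return shmmni * SHMID_DS_BYTES + shmmni * MUTEX_BYTES


# The example from the reply: shmmni = 100 gives 11,200 + 800 bytes.
print(shm_kernel_bytes(100))
```

Note that this covers only the identifier structures. The segments themselves are anonymous memory backed by swap, which is why the test in the letter could grow far beyond this figure.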
mmap()
vs. shared memory interface
Dear Jim,
I personally find the mmap() interface simpler to use than System V shared memory. When using shared memory, the processes should agree on a key. This can be done by hard coding it into the programs. But a better way is to communicate by writing into a predecided file. If the processes agree on a file name for key definition, they can agree on a file name to mmap as well.
One facility that is not provided for the mmap interface (or at least not documented in Solaris 2.5) that is available for shared memory is intimate shared memory. When this is also provided, mmap can be used for cases like those mentioned in your article.
Coordinating concurrent access to a shared memory segment can be achieved by using semaphores. Sun's implementation of the pthread library provides a way to synchronize threads belonging to different processes (in addition to the threads of a single process). We can place a mutex lock (or a semaphore or a condition variable) in a shared memory segment and use the pthread locking functions. This is much easier to use than semaphores and can also be more efficient.
Sincerely,
Nitin Muppalaneni
Jim replies:
Hi Nitin,
Thank you for writing. Many folks, myself included, agree that mmap() has some advantages over shared memory for the reasons you described.
Shared memory has been available on Unix systems for many years, nearly 20 if memory serves. The mmap() interfaces came much later. Thus, there's a lot of code, and there are many programmers who use the shared memory interfaces. They've simply been around longer, and people tend to use what they're most comfortable with.
Check out my column (http://www.sunworld.com/swol-10-1997/swol-10-insidesolaris.html) in this issue for a detailed look at semaphores.
Jim
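The file-name rendezvous Nitin describes can be sketched in Python with the standard mmap module. This is a minimal illustration, not code from the column: the helper names and the file name are hypothetical, and two mappings in one process stand in for two cooperating processes.

```python
import mmap
import os
import tempfile


def open_shared(path: str, size: int) -> mmap.mmap:
    """Map `size` bytes of the file at `path`; every mapper of the file shares the pages."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Grow the backing file if needed so the whole mapping is valid.
        if os.fstat(fd).st_size < size:
            os.ftruncate(fd, size)
        # On Unix the default mapping is shared, so writes are visible to other mappers.
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def demo() -> bytes:
    """Write through one mapping and read the same bytes back through another."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "agreed_name.shm")  # hypothetical agreed-upon name
        writer = open_shared(path, 4096)
        reader = open_shared(path, 4096)
        writer[:5] = b"hello"
        data = bytes(reader[:5])
        writer.close()
        reader.close()
    return data


print(demo())
```

The sketch leaves out any locking. Real cooperating processes would still need semaphores or a process-shared lock, as the letter points out.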
I just read your column about what a Webmaster needs to know to run a site effectively. Speaking as a person who makes a living designing and implementing Web sites and applications, I think you left out an important piece -- database design, implementation, and administration. Almost any nontrivial Web application we have developed uses database access for lookups to drive dynamic content or for storage of data entered by the user.
Otherwise, the column was quite thorough.
Kent Vidrine
I've worked with Tcl on the SunOS platform and have now changed companies. Here we use DOS- and OS/2-based systems. Will Tcl run in DOS? Will Tcl run in DOS under OS/2?
Steve Sharpe
Cameron replies:
Steve,
Yes and yes.
But there's probably a lot more you want to know. The central
team at Sun which supports Tcl has given MS-DOS no attention for
quite a while; they focus on Unix, 16- and 32-bit Windows, and
MacOS. However, in recent years other enthusiasts have tinkered
with MS-DOS versions. Go to ftp://ftp.neosoft.com/languages/tcl/sorted/distrib/tcl67dos.zip to
download some of the more recent innovations.
"Will Tcl run in DOS under OS/2?" Well, sure, the ones I
refer to above work equally well under OS/2, from what I know. If you'd rather
have a native OS/2 implementation, Ilya Zakharevich and
Ilya Vaes offer this and more at ftp://ftp.math.ohio-state.edu/pub/users/ilya/os2/.
For more detailed information, read the FAQs at http://www.teraform.com/~lvirden/tcl-faq
-- almost everything I've written is available there.
Cameron
I am writing to you because we are having a problem with NFS.
We have two TCP/IP LANs connected by two IBM 2210 routers via a 2-Mbit/s CDN. In one of the LANs there is a Unix server, and the PCs that are on that LAN can perform any operations over its files. The PCs on the other LAN can perform only a few operations (they can copy files, but they can't show a directory's content).
Could the reason be that NFS uses different ports for different file commands and the routers are not configured for all those ports? Have you any suggestions to cope with our problem?
Regards,
Enrico Ferretti
Rawn replies:
NFS and PCNFS use different RPC ports, but the individual file
operations do not. The "dir" or "ls" command uses the same port
as the "copy" or "cp" operation.
Several possible explanations here:
A. The routers are filtering some of the address/services; this isn't likely
because the operations for copy and ls are on the same service port.
B. A simpler explanation: The directories being shared to one
LAN have read, write, and execute permissions while the other has write
permissions only. You have to check the permissions of the
directory that you are looking at, the NFS "exports" file (or
equivalent depending upon your version of Unix), and how they are
mounted on the PCs in question. Typically on a Unix system if you
have write and execute permissions for a directory you can open and
write files into the area, but you can't do a directory listing
with "ls." Another problem arises if you have read and write but not
execute permissions; you can list the directory, but you can't
change into that directory with "cd."
C. Check the "nfsstat" command on each PC where it isn't working and
one where it is. If the "ls" command shows entries it's possible (but
not likely) that the "ls" command isn't working right and a simple re-install
could do the trick.
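The directory permission rules in explanation B can be summarized in a small Python sketch. It is a simplification under stated assumptions: it looks only at the owner's permission bits of a directory mode, ignores the NFS exports file, mount options, and root access, and the function name and result keys are invented for illustration.

```python
import stat


def directory_access(mode: int) -> dict:
    """What the owner can do with a directory, judged from its mode bits alone."""
    can_read = bool(mode & stat.S_IRUSR)
    can_write = bool(mode & stat.S_IWUSR)
    can_search = bool(mode & stat.S_IXUSR)
    return {
        "list_names": can_read,                        # "ls" needs read permission
        "cd_and_open_files": can_search,               # "cd" and path lookup need execute
        "create_files": can_write and can_search,      # new entries need write plus execute
    }


# Write and execute only: files can be written, but the directory cannot be listed.
print(directory_access(0o300))
# Read and write only: names can be listed, but "cd" into the directory fails.
print(directory_access(0o600))
```

Comparing the mode of the exported directory against this table on the server is a quick first check before looking at the routers.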
Rawn
MPOA inclusion
Hi Rawn,
In the MPOA subsection of September's column ("Understanding ATM networking and network layer switching, part two") you mentioned Cisco for Tag Switching, 3Com for Fast IP, and others. Are you aware of the fact that Newbridge Networks is the first and only one to support MPOA in its ATM switches?
Tushar Desai
Rawn replies:
Tushar,
Yes. Newbridge Networks is one of the first to incorporate MPOA directly into
their system. We'll have to see how far an early start will carry them.
Rawn
Editor's note: We got tons of mail about Adrian's SE Performance Toolkit this month. It has obviously proved useful for more than a handful of SunWorld readers. You can download it at http://www.sun.com/960601/columns/adrian/se2.5.html.
Hi Adrian,
I always enjoy your articles. I am trying to run your SE Toolkit V2.4 on an UltraSPARC running ODS. It has Fast-Wide SCSI cards. As you know, the network interface on these cards is referenced as hme.n and not le.n. The Ruletool (http://www.sun.com/950901/columns/adrian/column1.html) shows no network connection or disk partitions.
What do I need to configure to make the tool run accurately on systems with ODS and fast-wide SCSI cards?
Bassam Hamdan
Adrian replies:
Bassam,
The SE2.4 Toolkit is ancient. Use the SE2.5.0.2 version:
http://www.sun.com/960601/columns/adrian/se2.5.html
Adrian
Toolkit installation without a C compiler
Hi Adrian,
I have installed the SE Performance Toolkit on dozens of machines over the past year, and I've written several little scripts which I think are pretty handy. Unfortunately, management at the places where I wrote them will not let me distribute them.
Anyway, today was the first time I've tried to install the toolkit on a machine that doesn't have a C compiler or C++ compiler. I've tried the "-n" option, and that doesn't seem to help. Percollator, pure_test, and all the others I've tried don't run.
Do I need to compile a different version? The machine I'm working with doesn't have enough space to install a C and C++ compiler package.
Thank you very much for writing this toolkit and making it available.
Rick Otten
Adrian replies:
Hi Rick,
You need /usr/ccs/lib/cpp, that's all; it's the bundled compiler tools package, about 3 MB of stuff that comes with the OS.
% grep cpp /var/sadm/install/contents
...
/usr/ccs/lib/cpp f none 0755 bin bin 84268 8035 827507878 SUNWsprot
...
I have a Java 1.1 browser for percollator data and some extended measurements in the latest version (when we eventually get SE3 for 2.6 to work it will ship). If you have any ideas or useful code fragments I'd like to have them.
Adrian
Performance tools: HTML access?
Have you heard of any HTML versions of any of your performance tools? If so where might I find them?
Michael Roark
Adrian replies:
Michael,
I have a Java browser for percollator data files, that's all. It's in
final test and is jws2.0/java1.1 based.
It would be pretty easy to modify one of the example scripts to
write its output with HTML formatting, and rewind/rewrite it for
each interval. If you want something like zoom.se then start with
live_test.se and either write its output to a file or run it from
cgi-bin so it pushes a new version of the page out each time.
I don't have time to do this, but if you have a go I'll try and help,
and I'll include your code in the release.
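The idea of regenerating an HTML page every interval can be illustrated outside the toolkit. The sketch below is Python, not SE, and is not part of the SE Toolkit; the function names and the sample metric are hypothetical. It also swaps the rewind/rewrite approach for writing a temporary file and renaming it, so a browser never reads a half-written page.

```python
import html
import os
import tempfile


def render_page(metrics: dict, refresh_seconds: int = 30) -> str:
    """Format one interval's measurements as a small self-refreshing HTML page."""
    rows = "\n".join(
        f"<tr><td>{html.escape(str(name))}</td><td>{html.escape(str(value))}</td></tr>"
        for name, value in metrics.items()
    )
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{refresh_seconds}">'
        "<title>Performance snapshot</title></head><body>\n"
        f"<table>\n{rows}\n</table>\n</body></html>\n"
    )


def rewrite_page(path: str, metrics: dict) -> None:
    """Replace the page at `path` with a freshly rendered one in a single rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(render_page(metrics))
    os.chmod(tmp_path, 0o644)   # mkstemp creates the file private; let the web server read it
    os.replace(tmp_path, path)  # atomic rename on the same filesystem
```

A monitoring loop would call rewrite_page once per interval with whatever values it has collected, for example rewrite_page("perf.html", {"smtx": 780}).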
Adrian
Mutex failure
Hello,
I have recently been involved in a performance investigation on a SPARC 20 running Solaris 2.5.1 (no jumbo patch) Sun4m (generic). Your book and the SE toolkit have been very helpful. However, a couple of things are puzzling.
One is that we seem to have a larger than expected number of mutex failures. We have dual 150-MHz processors and the balance of mutex failures between them is very even (it isn't skewed toward one because of the clock). However, the combined mutex failure has been logged in the above-700 level (per the virtual_adrian.se script running during the major workload and beyond) with a reasonable amount of available CPU. At one point we had smtx of 780 with an idle rate of 12 percent -- 6 processes in the run queue. We have also had numbers like an smtx of 989 with six percent idle and only 3 processes in the run queue.
The bottom line is that we do have crunch periods that lead to considering more processors. However, I am concerned that something is off with our mutex situation that may be contributing to the problem and could get worse with more processors. We also do not have the 2.5.1 "jumbo" patch installed although we are looking at it (vs. going straight to 2.6) for other reasons. Is there anything in the unpatched 2.5.1 kernel that is prone to this kind of problem? Or am I just interpreting the numbers wrong?
On a separate subject, is there any way to match up your recommendations regarding iostat values and the output of vxstat for SPARCarrays? Your recommendations are couched in iostat terms, yet iostat on a SPARCarray keys into the underlying physical disks and vxstat does not give the same statistics.
Often (apparently due to the internal activity of the array) the physical disks will exceed the recommended iostat thresholds when there is little system activity directed toward the volumes themselves. Are there data structures available to SE that could provide the data that vxstat and iostat leave out? What would really be handy would be an iostat that would produce iostat statistics, but based on SPARCarray volumes. A structure like that would even facilitate comparisons between RAID and non-RAID configurations for the same load.
Is this something you might consider tying together in a future column?
Thank you for your time.
Name not given
Adrian replies:
Mutex is only a problem if the proportion of system time shoots up
as the mutex rate increases. You seem to be borderline, and I doubt if it's a
big problem at present. Upgrading to 2.6 will help; 2.5.1 patches probably
won't.
Merging vxstat and iostat would be good. We did this with DiskSuite, and it's an area I hope to do some work on soon.
Adrian
Disk slicing: fast and cheap
Adrian,
I enjoyed your Performance Book (http://www.sun.com/books/catalog/Cockcroft/Cockcroft.html) a lot.
A quick question: I am setting up some file systems (2-GB to 12-GB sizes) using SDS 4.1, for disks in SSAs. I am setting them up as mirrors with UFS logging. The SA-356 course book suggests two ways:
If I used the same disk and two slices, one for data and other for logging, wouldn't that cause a lot of head travel as the log slice would always be written to, and it is at one end of the disk?
Instead, what if I split the disk into three slices, with a small slice in the center of the disk and two equal slices on the inside and outside? If I use the center slice for logging and create a stripe using the other two, won't that reduce the head travel? Is this significant or am I adding complexity for no gain?
Thanks,
Suheb Farooqi
Adrian replies:
Suheb,
I've never tried this trick; it should be better than putting the
log at one end of the disk, but not that much better. Pick a
dedicated log disk, dedicate it to logs only -- don't make a mounted
filesystem on the rest of the disk as even an idle filesystem gets
updated. I made a 100-MB log on a 4.3-GB disk and combined it with a
four-way stripe (on a different UltraSCSI bus), and that combination
really screamed.
If you want to be safer (mirroring etc.) you will get a performance
hit in any case. If you also want performance you need extra disks
so you can keep the log separate while still having enough capacity.
Remember, you can be fast, cheap or safe, not all three at once.
Adrian
Editor's note: See Adrian Cockcroft's ever-growing FAQ sheet for in-depth answers to more than three dozen performance-related questions: http://www.sun.com/sunworldonline/common/cockcroft.letters.html
If you have technical problems with this magazine, contact webmaster@sunworld.com
URL: http://www.sunworld.com/swol-10-1997/swol-10-letters.html