Letters to the Editor

Ask the Experts

This month: Technical Q&A galore! Jim Mauro exceeds the limits of shared memory; Cameron Laird reveals the hottest Tcl/DOS innovations; and Adrian Cockcroft diagnoses the symptoms of mutex failure. Plus, have you checked out virtual_adrian lately?

SunWorld
October  1997
Send letters to sweditors@sunworld.com

To the Editors

Just wanted to say thanks for the great resource. Even as the network administrator for the company I work for, I learn a great deal every month from SunWorld and have quite a thick binder full of printouts of articles. Thanks and keep up the great publication!

Marius Strom
Network Administrator,
Twister Communications

"Shared memory uncovered," by Inside Solaris columnist Jim Mauro

Hi Jim,

You had question marks beside the shm_cnattch field in your shared memory column. In other versions of shm, this has been the number of processes that have the shared memory attached and are still in memory, as opposed to shm_nattch, which is the total number of attaches. This count has been used so that the shared memory can be swapped out once shm_cnattch goes to zero.

Diane Fallier

Jim Mauro replies:

Hi Diane,

It doesn't look like the shm_nattch and shm_cnattch fields are used for any serious tracking in the kernel.

Instead, the reference count (amp->refcnt) field of the shared segment's corresponding anonymous memory structure (shm_amp) is used by the kernel to figure out what to do with allocated resources.

Obviously, this does not allow for the finer-grained tracking you described, as it is effectively the total number of attaches, with no additional state information for determining whether an attached process is currently in RAM or swapped out.
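As a minimal sketch, the total attach count is visible from user level through shmctl(2) with IPC_STAT; the key and segment size below are arbitrary examples:

  #include <stdio.h>
  #include <sys/types.h>
  #include <sys/ipc.h>
  #include <sys/shm.h>

  /* Sketch: create (or look up) a segment and report its total
     attach count, shm_nattch.  Key and size are arbitrary examples. */
  int main(void)
  {
      struct shmid_ds ds;
      int id = shmget((key_t)0x1234, 8192, IPC_CREAT | 0600);

      if (id == -1) {
          perror("shmget");
          return 1;
      }
      if (shmctl(id, IPC_STAT, &ds) == -1) {
          perror("shmctl");
          return 1;
      }
      printf("shm_nattch = %lu\n", (unsigned long)ds.shm_nattch);
      return 0;
  }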

Jim

Exceeding the limits of (virtual) shared memory

Greetings Jim,

The article "Shared memory uncovered" in the September issue of SunWorld covered most of the very important concepts of shared memory handling on Solaris. But I still have some questions:

  1. Am I right in assuming that "kernel memory" is the same as RAM? If not, could you please explain what "kernel memory" means?

  2. Re: paragraph following Table 3, the amount of kernel memory required by the system to support shared memory: I didn't get how you arrived at 112.8 KB for 100 shmmni using the formula ((shmmni*112) + (shmmni*8)).

  3. Further, could you please tell me what will happen if I exceed the limits on the shared memory tunable parameters? What errors can I expect from the OS?

  4. Finally, I tried exhausting all the shared memory available on my sun4m, Solaris 2.5.1, 96-MB RAM system, and found I could go up to 540 MB before shmget started failing. If kernel memory were RAM, it should have failed after 96 MB. Why did it reach 540 MB?

Name not given

Jim Mauro replies:

1) Not quite. Kernel memory is a subset of RAM. RAM refers to all of physical memory. Kernel memory refers to those pages of physical memory occupied by operating system (kernel) text, data, heap space, and stack.

2) Oops! Looks like I made a mathematical error in the column. The system allocates shmid_ds structures based on the value of shmmni. Each structure is 112 bytes, plus 8 bytes for a kernel mutex lock, so for a shmmni of 100: (100 * 112) + (100 * 8) = 11,200 + 800 = 12,000 bytes, or roughly 12 KB.

3) During startup, the system checks the values of the tunables to ensure that they do not exceed allowable maximums. If they do, you'll see console messages like: "shminfo.shmseg limited to 32 K" or "shmsys: can't load module, too much memory requested."

4) Kernel memory use for shared memory comes from the operating system routines that support the system calls (e.g., shmget(2)) and from maintaining the data structures and mappings for the processes.

When you use shared memory in code, it's treated by the operating system as anonymous memory -- just like calling malloc(3) or sbrk(2). The amount of shared memory you attach to gets charged against your process's total available virtual address space, which is about 4 GB total. Since it's virtual memory, it can exceed actual available physical memory (that's one of the benefits of virtual memory systems). As with any anonymous memory, you need available swap space for backing store before the allocation can complete.

One of two things happened in your example: either you hit your process's maximum available virtual address space (unlikely), or the system ran out of swap space (most likely). Try re-running your test, keeping an eye on "swap -s" output.
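A rough sketch of such a test keeps creating and attaching segments until a call fails; the segment size here is an arbitrary choice, and where it stops depends on the shmmax and shmmni tunables as well as available swap:

  #include <stdio.h>
  #include <errno.h>
  #include <string.h>
  #include <sys/types.h>
  #include <sys/ipc.h>
  #include <sys/shm.h>

  #define SEG_SIZE (16UL * 1024 * 1024)   /* 16 MB per segment; must be <= shmmax */

  int main(void)
  {
      unsigned long total = 0;

      for (;;) {
          void *addr;
          int id = shmget(IPC_PRIVATE, SEG_SIZE, IPC_CREAT | 0600);

          if (id == -1) {
              printf("shmget failed after %lu MB: %s\n",
                     total / (1024 * 1024), strerror(errno));
              break;
          }
          addr = shmat(id, NULL, 0);
          if (addr == (void *)-1) {
              printf("shmat failed after %lu MB: %s\n",
                     total / (1024 * 1024), strerror(errno));
              break;
          }
          total += SEG_SIZE;
      }
      /* Note: IPC_PRIVATE segments persist until removed with
         shmctl(IPC_RMID) or ipcrm, so clean up after experimenting. */
      return 0;
  }

Watching "swap -s" in another window while a test like this runs shows the swap reservation grow.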

Thanks for writing. I hope this clears things up for you.

Jim

mmap() vs. shared memory interface

Dear Jim,

I personally find the mmap() interface simpler to use than System V shared memory. When using shared memory, the processes must agree on a key. This can be done by hard-coding it into the programs, but a better way is to communicate the key by writing it into a pre-agreed file. And if the processes can agree on a file name for key exchange, they can just as easily agree on a file name to mmap.

One facility that is available for shared memory but not provided for the mmap() interface (or at least not documented in Solaris 2.5) is intimate shared memory. Once that is also provided, mmap() can be used for cases like those mentioned in your article.

Coordinating concurrent access to a shared memory segment can be achieved with semaphores. Sun's implementation of the pthread library also provides a way to synchronize threads belonging to different processes (in addition to threads within one process). We can place a mutex lock (or a semaphore, or a condition variable) in a shared memory segment and use the pthread locking functions. This is much easier to use than System V semaphores and can also be more efficient.
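A minimal sketch of this approach, assuming a file name both processes know (error checking omitted; the path is an arbitrary example):

  #include <fcntl.h>
  #include <pthread.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
      pthread_mutexattr_t attr;
      pthread_mutex_t *mp;
      int fd = open("/tmp/lockfile", O_RDWR | O_CREAT, 0600);

      ftruncate(fd, sizeof (pthread_mutex_t));
      mp = (pthread_mutex_t *)mmap(NULL, sizeof (pthread_mutex_t),
                                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

      pthread_mutexattr_init(&attr);
      pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
      pthread_mutex_init(mp, &attr);      /* done once, by one process only */

      pthread_mutex_lock(mp);
      /* ... critical section touching other shared data ... */
      pthread_mutex_unlock(mp);
      return 0;
  }

(Compile with -lpthread; PTHREAD_PROCESS_SHARED is what allows the mutex to be used across process boundaries.)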

Sincerely,
Nitin Muppalaneni

Jim replies:

Hi Nitin,

Thank you for writing. Many folks, myself included, agree that mmap() has some advantages over shared memory for the reasons you described.

Shared memory has been available on Unix systems for many years, nearly 20 if memory serves. The mmap() interfaces came much later. Thus, there's a lot of code, and there are many programmers who use the shared memory interfaces. They've simply been around longer, and people tend to use what they're most comfortable with.

Check out my column (http://www.sunworld.com/swol-10-1997/swol-10-insidesolaris.html) in this issue for a detailed look at semaphores.

Jim

"Webmaster PhD: coming soon to a school near you," by Webmaster columnist Chuck Musciano

Chuck,

I just read your column about what a Webmaster needs to know to run a site effectively. Speaking as a person who makes a living designing and implementing Web sites and applications, I think you left out an important piece -- database design, implementation, and administration. Almost any nontrivial Web application we have developed uses database access for lookups to drive dynamic content or for storage of data entered by the user.

Otherwise, the column was quite thorough.

Kent Vidrine

"Sun reverses course on Tcl products," by Cameron Laird

Cameron,

I've worked with Tcl on the SunOS platform and have now changed companies. Here we use DOS- and OS/2-based systems. Will Tcl run in DOS? Will Tcl run in DOS under OS/2?

Steve Sharpe

Cameron replies:

Steve,

Yes and yes.

But there's probably a lot more you want to know. The central team at Sun which supports Tcl has given MS-DOS no attention for quite a while; they focus on Unix, 16- and 32-bit Windows, and MacOS. However, in recent years other enthusiasts have tinkered with MS-DOS versions. Go to ftp://ftp.neosoft.com/languages/tcl/sorted/distrib/tcl67dos.zip to download some of the more recent innovations.

"Will Tcl run in DOS under OS/2?" Well, sure, the ones I refer to above work equally well under OS/2, from what I know. If you'd rather have native OS/2 implementation, Ilya Zakharevich and Ilya Vaes offer this and more at ftp://ftp.math.ohio-state.edu/pub/users/ilya/os2/.

For more detailed information, read the FAQs at http://www.teraform.com/~lvirden/tcl-faq -- almost everything I've written is available there.

Cameron

"Understanding ATM networking and network layer switching, part one" by Connectivity columnist Rawn Shah

Rawn,

I am writing to you because we are having a problem with NFS.

We have two TCP/IP LANs connected by two IBM 2210 routers via a 2-Mbit/s CDN. On one of the LANs there is a Unix server, and the PCs on that LAN can perform any operation on its files. The PCs on the other LAN can perform only a few operations (they can copy files, but they can't show a directory's contents).

Could the reason be that NFS uses different ports for different file commands and the routers are not configured for all those ports? Have you any suggestions to cope with our problem?

Regards,
Enrico Ferretti

Rawn replies:

NFS and PC-NFS use different RPC ports, but the individual file operations do not; the "dir" or "ls" command uses the same port as the "copy" or "cp" operation.

Several possible solutions here:

A. The routers are filtering some of the address/services; this isn't likely because the operations for copy and ls are on the same service port.

B. A more simple explanation: The directories being shared to one LAN have read, write, and execute permissions while the other has write permissions only. You have to check the permissions of the directory that you are looking at, the NFS "exports" file (or equivalent depending upon your version of Unix), and how they are mounted on the PCs in question. Typically on a Unix system if you have write and execute permissions for a directory you can open and write files into the area, but you can't do a directory listing with "ls." Another problem arises if you have read and write but not execute permissions; you can list the directory, but you can't change into that directory with "cd."

C. Check the "nfsstat" command on each PC where it isn't working and one where it is. If the "ls" command shows entries it's possible (but not likely) that the "ls" command isn't working right and a simple re-install could do the trick.

Rawn

MPOA inclusion

Hi Rawn,

In the MPOA subsection of September's column ("Understanding ATM networking and network layer switching, part two") you mentioned Cisco for Tag Switching, 3Com for Fast IP, and others. Are you aware of the fact that Newbridge Networks is the first and only one to support MPOA in its ATM switches?

Tushar Desai

Rawn replies:

Tushar,

Yes. Newbridge Networks is one of the first to incorporate MPOA directly into their system. We'll have to see how far an early start will carry them.

Rawn

Performance Q&A with Adrian Cockcroft

Virtual Adrian's SE Performance Toolkit update

Editor's note: We got tons of mail about Adrian's SE Performance Toolkit this month. It has obviously proved useful for more than a handful of SunWorld readers. You can download it at http://www.sun.com/960601/columns/adrian/se2.5.html.

Hi Adrian,

I always enjoy your articles. I am trying to run your SE Toolkit V2.4 on an UltraSPARC running ODS. It has Fast-Wide SCSI cards. As you know, the network interface on these cards is referenced as hme.n and not le.n. The Ruletool (http://www.sun.com/950901/columns/adrian/column1.html) shows no network connection or disk partitions.

What do I need to configure to make the tool run accurately on systems with ODS and fast-wide SCSI cards?

Bassam Hamdan

Adrian replies:

Bassam,

The SE2.4 Toolkit is ancient. Use the SE2.5.0.2 version: http://www.sun.com/960601/columns/adrian/se2.5.html

Adrian

Toolkit installation without a C compiler

Hi Adrian,

I have installed the SE Performance Toolkit on dozens of machines over the past year, and I've written several little scripts which I think are pretty handy. Unfortunately, management at the places where I wrote them will not let me distribute them.

Anyway, today was the first time I've tried to install the toolkit on a machine that doesn't have a C compiler or C++ compiler. I've tried the "-n" option, and that doesn't seem to help. Percollator, pure_test, and all the others I've tried don't run.

Do I need to compile a different version? The machine I'm working with doesn't have enough space to install a C and C++ compiler package.

Thank you very much for writing this toolkit and making it available.

Rick Otten

Adrian replies:

Hi Rick,

You need /usr/ccs/lib/cpp, that's all; it's part of the bundled compiler tools package, about 3 MB of stuff that comes with the OS:

 % grep cpp /var/sadm/install/contents
 ...
 /usr/ccs/lib/cpp f none 0755 bin bin 84268 8035 827507878 SUNWsprot
 ...

I have a Java 1.1 browser for percollator data and some extended measurements in the latest version (it will ship when we eventually get SE3 for 2.6 to work). If you have any ideas or useful code fragments, I'd like to have them.

Adrian

Performance tools: HTML access?

Have you heard of any HTML versions of any of your performance tools? If so, where might I find them?

Michael Roark

Adrian replies:

Michael,

I have a Java browser for percollator data files, that's all. It's in final test and is jws2.0/java1.1 based.

It would be pretty easy to modify one of the example scripts to write its output with HTML formatting, and rewind/rewrite it for each interval. If you want something like zoom.se then start with live_test.se and either write its output to a file or run it from cgi-bin so it pushes a new version of the page out each time.
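The SE toolkit scripts themselves are written in SE, but the rewrite-per-interval pattern looks roughly like this sketch in C (the file name and interval are hypothetical; a real version would pull numbers from the toolkit's measurement classes):

  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>

  int main(void)
  {
      for (;;) {
          time_t now;
          FILE *fp = fopen("/tmp/perf.html", "w");   /* hypothetical path */

          if (fp == NULL)
              return 1;
          now = time(NULL);
          fprintf(fp, "<html><head><title>Performance</title></head><body>\n");
          fprintf(fp, "<p>Updated: %s</p>\n", ctime(&now));
          /* ... emit the measurements here, e.g. a table of per-disk stats ... */
          fprintf(fp, "</body></html>\n");
          fclose(fp);        /* fopen("w") truncates, so the old page is replaced */
          sleep(30);         /* measurement interval */
      }
  }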

I don't have time to do this, but if you have a go I'll try and help, and I'll include your code in the release.

Adrian

Mutex failure

Hello,

I have recently been involved in a performance investigation on a SPARC 20 running Solaris 2.5.1 (no jumbo patch) Sun4m (generic). Your book and the SE toolkit have been very helpful. However, a couple of things are puzzling.

One is that we seem to have a larger than expected number of mutex failures. We have dual 150-MHz processors, and the balance of mutex failures between them is very even (it isn't skewed toward one because of the clock). However, the combined mutex failure rate has been logged at above the 700 level (per the virtual_adrian.se script running during the major workload and beyond) with a reasonable amount of CPU still available. At one point we had an smtx of 780 with an idle rate of 12 percent and 6 processes in the run queue. We have also had numbers like an smtx of 989 with 6 percent idle and only 3 processes in the run queue.

The bottom line is that we do have crunch periods that lead to considering more processors. However, I am concerned that something is off with our mutex situation that may be contributing to the problem and could get worse with more processors. We also do not have the 2.5.1 "jumbo" patch installed although we are looking at it (vs. going straight to 2.6) for other reasons. Is there anything in the unpatched 2.5.1 kernel that is prone to this kind of problem? Or am I just interpreting the numbers wrong?

On a separate subject, is there any way to match up your recommendations regarding iostat values and the output of vxstat for SPARCarrays? Your recommendations are couched in iostat terms, yet iostat on a SPARCarray keys into the underlying physical disks and vxstat does not give the same statistics.

Often (apparently due to the internal activity of the array) the physical disks will exceed the recommended iostat thresholds when there is little system activity directed toward the volumes themselves. Are there data structures available to SE that could provide the data that vxstat and iostat leave out? What would really be handy would be an iostat that would produce iostat statistics, but based on SPARCarray volumes. A structure like that would even facilitate comparisons between RAID and non-RAID configurations for the same load.

Is this something you might consider tying together in a future column?

Thank you for your time.

Name not given

Adrian replies:

Mutex contention is only a problem if the proportion of system time shoots up as the mutex rate increases. You seem to be borderline, and I doubt it's a big problem at present. Upgrading to 2.6 will help; 2.5.1 patches probably won't.

Merging vxstat and iostat would be good. We did this with DiskSuite, and it's an area I hope to do some work on soon.

Adrian

Disk slicing: fast and cheap

Adrian,

I enjoyed your Performance Book (http://www.sun.com/books/catalog/Cockcroft/Cockcroft.html) a lot.

A quick question: I am setting up some file systems (2-GB to 12-GB sizes) using SDS 4.1 for disks in SSAs. I am setting them up as mirrors with UFS logging. The SA-356 course book suggests two ways:

  1. Set up data slice and log slice on two different disks.
  2. Set up data slice and log slice from same disk if using striping. (page 5-38)

If I used the same disk and two slices, one for data and other for logging, wouldn't that cause a lot of head travel as the log slice would always be written to, and it is at one end of the disk?

Instead, what if I split the disk into three slices, with a small slice in center of disk and two equal slices on the inside and outside? If I use the center slice for logging and create a stripe using the other two, won't that reduce the head travel? Is this significant or am I adding complexity for no gain?

Thanks,
Suheb Farooqi

Adrian replies:

Suheb,

I've never tried this trick; it should be better than putting the log at one end of the disk, but not that much better. Pick a dedicated log disk and dedicate it to logs only -- don't make a mounted filesystem on the rest of the disk, as even an idle filesystem gets updated. I made a 100-MB log on a 4.3-GB disk and combined it with a four-way stripe (on a different UltraSCSI bus), and that combination really screamed.

If you want to be safer (mirroring etc.) you will get a performance hit in any case. If you also want performance you need extra disks so you can keep the log separate while still having enough capacity.

Remember, you can be fast, cheap, or safe -- but not all three at once.

Adrian

Editor's note: See Adrian Cockcroft's ever-growing FAQ sheet for in-depth answers to more than three dozen performance-related questions: http://www.sun.com/sunworldonline/common/cockcroft.letters.html


(c) Copyright Web Publishing Inc., an IDG Communications company


URL: http://www.sunworld.com/swol-10-1997/swol-10-letters.html