Performance Q&A Compendium

System integration drives performance improvements

December 1995

Abstract

This month we cover a diverse and plentiful collection of performance-related questions. Subjects covered include performance monitoring commands, tuning variables, logins and processes, how to interpret the output of performance measurements, and how to optimize web servers and news servers. (3,800 words)

During the few months I've been writing this column, readers have submitted numerous worthwhile questions. So this month, rather than zeroing in on one or two particular questions, I'm going to tackle a handful of queries that affect a wide variety of administrators and users who are grappling with performance-related issues.

Here's a list of the questions covered:

How do I interpret the w column in vmstat?
How do I tune the Solaris kernel?
How can I time-out orphaned processes in Solaris?
What causes slow rlogin?
Any performance tuning hints for Solaris 2.5?
Why do some login IDs in SunOS 4.1 accounting files change?
Why doesn't my virtual-memory monitoring program add up?
Are kernel memory allocation errors worth worrying about?
How can I improve my Web server's http performance?
Is there a viable alternative to SE?
Does Solaris offer a vmtune-like tool?
Why are my news spool disks overloaded?

How do I interpret the w column in vmstat?

Q:
We have a SPARCserver 1000E/Solaris 2.4 with four CPUs and 610 megabytes of RAM as a dedicated Sybase server. vmstat 5 is used to monitor the system at all times. Recently, the third column 'w' of 'procs' in vmstat's output started to report a value of around 20 and rarely changed. This value shows up again even after rebooting the system many times during the past month. This seems to indicate we have a memory shortage because swapping occurred. But my questions are:

Why does the free swap space still show a big value (i.e., 201356) indicating we have plenty of it? (Physical swap space is 100 megabytes)
Why does the value in the 'w' column stay the same regardless of the load on the system? (Ten users and 300 users produce the same value.)
Sun tech support tells me there is no problem on our system as long as it runs OK, but Mike Loukides's book System Performance Tuning tells me that when swapping occurs, my sysadmin needs to find the problem because it may be the tip of the iceberg. To whom should I listen?
Sun tech support tells me swapping and paging are the same thing. I disagree. Who is right?
Sun tech support tells me that we should have a swap space at least as big as our memory size. Again, I disagree -- based on your book and my own experience. Who's right?

--(name and firm indeterminate)

A:
The w column in vmstat reports the number of processes that are currently swapped out. All 20 of those processes are idle, so this is not a performance problem. The Loukides book is rather out of date in places, and is not particularly relevant to Solaris 2.

If you run vmstat -S and see lots of si and so, you might have a problem. Here's a reminder of what vmstat -S looks like:

vmstat -S 5
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  si  so pi po fr de sr f0 s0 s1 s2   in   sy   cs us sy id
 0 0 0 137392 15608   0   0  2  2  5  0 55  0  0  0  0  132  319   69  2  1 98
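As a rough illustration of that check, the following Python sketch pulls the w, si, and so columns out of one line of vmstat -S output. The function name and the idea of post-processing vmstat with a script are assumptions made for illustration, not part of any Sun tool; the sample lines are the ones shown above.

```python
def swap_columns(header, sample):
    """Return the w, si and so values from one vmstat -S sample line."""
    names = header.split()
    values = sample.split()
    # The names w, si and so each appear once in the header, so index() is safe
    # even though other names (such as sy) are repeated.
    return {name: int(values[names.index(name)]) for name in ("w", "si", "so")}


HEADER = " r b w   swap  free  si  so pi po fr de sr f0 s0 s1 s2   in   sy   cs us sy id"
SAMPLE = " 0 0 0 137392 15608   0   0  2  2  5  0 55  0  0  0  0  132  319   69  2  1 98"

cols = swap_columns(HEADER, SAMPLE)
# A steady w with si and so at zero means idle processes were swapped out
# earlier and nothing is being swapped in or out right now.
swapping_now = cols["si"] > 0 or cols["so"] > 0
print(cols, "swapping now:", swapping_now)
```

A non-zero w on its own is not treated as a problem here; only sustained si or so activity is flagged.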

Swapping moves whole processes to the swap space, and paging is done a page at a time. Page-outs occur in large clusters, so the net effect is not all that different.

Swap space size is not a performance issue. If you have enough to run your applications reliably without running out at peak loads, you should be happy. If you want to collect crash dumps, you might need more, which is one reason why SunService recommends setting swap equal to RAM.

How do I tune the Solaris kernel?

Q:
We are veteran Interactive users and used to tuning the kernel using kconfig (mtune and stune files). We are now porting to Solaris x86 (Base Server) and need to be able to make equivalent tuning changes. In particular, we need to increase the various values associated with IPC queue (MSGMAX, MSGTQL, etc.). We have found one cryptic way to do this by hacking at system(4). Is there a better way and is there any comprehensive documentation source on tuning kernel parameters under Solaris x86? Thanks.

--Ken Robbins, firm indeterminate

A:
The "better way" involves editing /etc/system.

The performance manual section offers little help, but does list some parameters. My book (Sun Performance and Tuning) contains more details, including the algorithms that are being tuned.

Your question is a common one. I will probably address the question "What is the list of tunables in Solaris?" in a future column. There is no easy answer, unfortunately.
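For readers who have not seen the file, entries in /etc/system take the form set module:variable=value, and comment lines start with an asterisk. The Python sketch below only formats such lines. The helper name is hypothetical and the numbers are placeholders, not recommendations. The msgsys:msginfo_* names are assumed here to be the System V message queue tunables, so check them against the documentation for your release before using them.

```python
def system_entry(variable, value, module=None):
    """Format one /etc/system 'set' line, optionally scoped to a kernel module."""
    target = f"{module}:{variable}" if module else variable
    return f"set {target}={value}"


# Placeholder values only; size these for your own workload.
entries = [
    "* System V message queue limits (illustrative values)",
    system_entry("msginfo_msgmax", 8192, module="msgsys"),
    system_entry("msginfo_msgtql", 256, module="msgsys"),
]
print("\n".join(entries))
```

Settings in /etc/system are read at boot, so a change takes effect only after the next reboot.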

How can I time-out orphaned processes in Solaris?

Q:
At Brown & Root, we run both Solaris and AIX servers. On all servers, we have Oracle as our database. On occasion, some clients' Oracle processes remain active even after they have logged off. In AIX, we have found two parameters, tcp_keepidle and tcp_keepalive, that help us timeout these orphaned processes. Is there anything comparable in Solaris?

--Jacques Dejean, Brown & Root

A:
You're looking for the Solaris ndd command. You can find a description of it, and of the values it can set, in Appendix E of TCP/IP Illustrated, Volume 1 by W. Richard Stevens. This book is also a complete reference to TCP/IP and how it works.

Make sure you understand the implications of any TCP tweaks. It is easy to upset the standard TCP algorithms with a wrong setting.

What causes slow rlogin?

Q:
What are likely causes of extremely slow rlogin both to and from a machine? The machine in question is seldom busy. It takes about 60 seconds to do rlogin from or to the machine. Once rlogin is completed, response is fine.

--Mike Kelly, firm indeterminate

A:
Check for:

Incorrect routing setup -- use ping -sRv to check route to NFS servers, etc.
NIS, NIS+ or DNS server problems.
Automount or NFS server problems.
Bad directories in set path= in .cshrc or similar files.
Symbolic links in home directory to /net/system/somewhere.

To diagnose this problem, use etherfind or snoop (Solaris 2) on a third system, capture all packets in and out of the slow machine, and look at the sequence and timestamps to see which part of the sequence is taking a long time.

I get this problem myself, normally due to routing foul-ups. It may also help to put "files" at the start of the name service lookup order for hosts and passwd. I use "files nisplus dns" for hosts in /etc/nsswitch.conf, as I find that the system boots much more quickly if it looks up system identities for its main routers and servers in the /etc/hosts file.
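Name service delays are one of the causes listed above, and they are easy to measure from a script before reaching for a packet trace. This Python sketch times a single host lookup. The function name, the default resolver, and the one-second threshold are illustrative assumptions.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.gethostbyname):
    """Return (result, seconds) for one name lookup; result is None on failure."""
    start = time.perf_counter()
    try:
        result = resolver(hostname)
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        result = None
    elapsed = time.perf_counter() - start
    return result, elapsed


address, seconds = time_lookup("localhost")
if seconds > 1.0:
    print(f"lookup took {seconds:.1f}s; check the name service order and servers")
else:
    print(f"lookup took {seconds:.3f}s")
```

Running it against the names of the machine's own routers, NFS servers, and login hosts should show whether lookups account for a meaningful part of the 60 seconds.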

Any performance tuning hints for Solaris 2.5?

Q:
I'm hoping that when Solaris 2.5 comes out you can dedicate an article or a series of articles to the improvements and kernel /etc/system parameters that should or should not be set for 2.5. In reading your book, you gave different hints for different types of systems (i.e., servers vs. hosts), and the hints varied depending upon the version of Solaris being used. I'm guessing that when 2.5 comes out, it'll be different from 2.4, so it'd be nice to know what changes have been made and what performance tuning hints are applicable for 2.5.

--Blair Zajac, firm indeterminate

A:
Since I'm writing this before Solaris 2.5 is officially released, I can't offer much guidance yet. I'll cover tunables soon. It takes a while to figure out how to tweak a new OS release. There are a few new NFS V3 variables. The rest is basically identical to 2.4.

Editor's Note: Solaris 2.5 was announced at the end of October. Shipment for the SPARC and Intel versions of the new OS has just begun; Solaris for PowerPC recently entered beta testing and is expected to ship early next year.

Why do some login IDs in SunOS 4.1 accounting files change?

Q:
I hope you can help me out. Lately, I had to look at the /usr/adm/acct/fiscal/fiscrptxx files. I found that some login IDs had two entries per file, while other login IDs had one.

Can you tell me what the problem is, or at least give me a hint? I need to use the files for performance evaluation purposes. Do I have to add up the entries corresponding to a given login ID per file?

--Halim M. Khelalfa, AI Division, CERIST

A:
I'm not sure, and I haven't used accounting on 4.1.3 for many years. One guess: Perhaps some users changed their group ID, keeping the same user ID, during the month.

Why doesn't my virtual memory monitoring program add up?

Q:
I now have a hard copy of your System Performance Monitoring article and will read it soon. First, I am going to take advantage of your offer to answer questions about this subject.

One of my personal monitoring programs presents physical memory utilization, which I calculate based on the following method. I believe it works and the assumptions are correct, but I'd like your opinion on its accuracy.

First I get some static facts:

V = total virtual memory size (everything is in kilobytes)

R = total real (physical) memory size

Next, I get 1-second snapshots of transient facts:

A = allocated (in-use) virtual memory

F = free (available) virtual memory, in the form of free resident memory pages (now in the physical memory)

IF ( A + F ) >= R

THEN U = 100%

ELSE U = (100% * A) / (R - F)

There are cases when A < R but I report 100 percent because of the free pages that inhabit physical memory, forcing some allocated pages to be swapped out. I am not concerned about this because there will be (at least a potential for) thrashing, and that's what 100-percent physical memory utilization is supposed to indicate. What I am concerned about is when ( A + F ) <= R yet there is a potential for thrashing -- and I don't know why -- because there is something missing from my equations.

Notes:

Sun does not present non-zero "avm" (active virtual memory) values from a vmstat report, so I must get A from pstat -s. The V and R values are from dmesg.
When I asked a Sun performance person why that was so, I got led off on a tangent about how unnecessary my calculations were. ("Why do you want to know the physical memory utilization?") I am hoping that your answer is more to the point.

--Alex Vrenios, EMTEK Health Care Systems

A:
Look at my SunWorld Online column entitled "Help! I've lost my memory!" Then it may become clear why your calculation does not work. The VM system is far more complex than your simple equation presumes. I don't think the available data is sufficient to model memory use. In particular, the only data available on a per-process basis is the size of the address space for the process, and the amount that has valid memory mappings. These values can be seen (measured in kilobytes) via the old-style ps command, in the SZ (process size) and RSS (process resident set size) fields:

% /usr/ucb/ps uax
USER       PID %CPU %MEM   SZ  RSS TT       S    START  TIME COMMAND
root      2026  3.0  2.1 1424 1284 pts/6    O 23:14:29  0:01 /usr/ucb/ps uax
adrianc   2021  0.7  4.1 3444 2500 ??       S 23:14:26  0:00 /usr/openwin/bin/c
adrianc   1785  0.6 11.110048 6840 console  S 20:50:55  1:12 /usr/openwin/bin/X
adrianc   2024  0.3  1.4  980  856 pts/6    S 23:14:27  0:00 /bin/csh
...

Unfortunately for your calculation, the RSS excludes pages that are in memory but do not have valid mappings, and it includes pages that are shared by other processes. Your calculation also doesn't consider the memory used by files that are cached. To obtain this data, kernel code would have to be written that traverses many data structures and tallies the pages. This is not available in the base release, or in any commercial performance tools that I am aware of.

I think it would be useful to have more information about memory usage, and it is on my list of things I'd like to see added to Solaris.

Are kernel memory allocation errors worth worrying about?

Q:
While recently monitoring a SPARCcenter 2000E with RuleTool, I noted that the system was regularly experiencing kernel memory allocation errors. I tried to find some info on the seriousness of this, but wasn't able to find much other than it possibly being caused by a memory leak. A call to SunService seemed to indicate that as long as the frequency was very low (it was, approx. 5 per day) it wasn't a cause for concern.

I would like more info on this in a future article (or in your next book Performance Tuning: The Sequel). I'd like to congratulate you on your book and I assume it's doing well considering it fills a void that's been around for years. (I purchased two copies myself, and have influenced several others in purchasing the book.)

--Greg Wells, firm indeterminate

A:
If the system can't grab memory when it needs it and can't wait, then you can get problems like a stream or login attempt failing. There are other reasons why allocation failures occur; in most cases, the system finds a way to retry the operation and succeed.

This problem happens mostly on Solaris 2.4 and 2.5 multiprocessor systems, not so often on Solaris 2.3 or uniprocessor systems. If you see kmem allocation errors (sar -k 1), then increase the free list so that it is less likely to hit the endstop.

Set lotsfree to 128 times the number of CPUs you have, or set up virtual_adrian.se to run every time you reboot, and it will set this for you.

Adding more RAM doesn't help, as the free list size is not scaled. As you can see below, I've had a few errors on my Ultra 1, but I take this as a warning, not a serious worry. It is useful to track it in case something else fails at the same time as a new kmem error.

% sar -k 1

SunOS eccles 5.5 Generic sun4u    11/05/95

23:30:18 sml_mem   alloc  fail  lg_mem   alloc  fail  ovsz_alloc  fail
23:30:19 4046848 3611540     0 7536640 6492776     8     5373952     0
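To track these failures over time, the fail columns can be pulled out of each sar -k sample. The Python sketch below does that for the line shown above, and also works out the lotsfree value given by the rule of 128 times the number of CPUs. The function names are hypothetical, and the four-CPU figure is only an example.

```python
def kmem_failures(sample):
    """Return the three fail counters from one sar -k data line."""
    fields = sample.split()
    # Layout: time, sml_mem, alloc, fail, lg_mem, alloc, fail, ovsz_alloc, fail
    return {
        "sml_fail": int(fields[3]),
        "lg_fail": int(fields[6]),
        "ovsz_fail": int(fields[8]),
    }


def suggested_lotsfree(ncpus):
    """Apply the rule of thumb from the text: 128 per CPU."""
    return 128 * ncpus


SAMPLE = "23:30:19 4046848 3611540     0 7536640 6492776     8     5373952     0"
fails = kmem_failures(SAMPLE)
print(fails, "total:", sum(fails.values()))
print("lotsfree for 4 CPUs:", suggested_lotsfree(4))
```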

How can I improve my Web server's http performance?

Q:
My issue deals with a problem I have been seeing on more and more Solaris 2.4 systems running as WWW servers.

First, I do realize that the http protocol was not designed to work with TCP/IP. In fact, it butchers it, but since it's a growing phenomenon, we need to tune the system for it!

Now, the problem I have been seeing. When dialup users connect to these WWW servers via SLIP/PPP, Solaris apparently drops a lot of packets, and a lot of retransmissions are occurring as shown from the results of the netstat -s command.

What I discovered is that the default setting of 200 for tcp_rexmit_interval_min is too low. Setting this up to 10000 finally gives good performance results. However, as you are well aware, this will increase the amount of time the system waits before a retransmission takes place after a packet is dropped. Catch-22! ;)

I also noted that the listen backlog parameter set by ndd: tcp_conn_req_max is set to 5 and allows a maximum value of 32.

How can I optimize Solaris 2.4 to perform well as a WWW server? More and more clients are asking me to improve WWW performance on Sun.

--Boni Bruno, Data Systems West

A:
There is an excessive retransmit bug that is fixed in Solaris 2.4 patch 101945-34 (Sun's recently released kernel jumbo patch) and Solaris 2.5. You will still see retransmit levels of 10-30% on machines with direct Internet connections. You can reduce them by setting the initial retransmit interval to a second or so (1000, as the units are ms). Most packets seem to take at least a second to get to their destination and get an acknowledgement back over the Internet! You should not set it to much more than a second.
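The retransmit level quoted here is simply retransmitted traffic as a percentage of data sent. A minimal Python sketch follows. It assumes you have already read two counters from netstat -s (on Solaris, something like tcpOutDataSegs and tcpRetransSegs, but treat those names as an assumption and check your own output), and the sample numbers are invented.

```python
def retransmit_percent(data_segments_sent, segments_retransmitted):
    """Return retransmitted segments as a percentage of data segments sent."""
    if data_segments_sent == 0:
        return 0.0
    return 100.0 * segments_retransmitted / data_segments_sent


# Invented counter values, for illustration only.
rate = retransmit_percent(50000, 7500)
print(f"{rate:.1f}% retransmitted")
```

Because netstat -s counts from boot, it is better to sample the counters twice and work from the differences than to use the raw totals.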

The limit for tcp_conn_req_max should be set to 32 in 2.4, and can be set up to 1024 with ndd in 2.5 if you have enough memory to hold all those pending connections. A setting of 128 seems to work well on Solaris 2.5, and is being used on some big Internet sites. Add these lines to /etc/init.d/inetinit:

ndd -set /dev/tcp tcp_rexmit_interval_initial 1000
ndd -set /dev/tcp tcp_conn_req_max 32

We also have fast name service caching in 2.5, so DNS (Domain Name System) lookups get cached (see the nscd man page). In general 2.5 is a much faster Internet server than 2.4, even though there are several areas where tuning work is still underway.

Is there a viable alternative to SE?

Q:
Could you please hint as to where I might find a discussion of how to interrogate the device drivers without using SE?

--Gal Bar-or, firm indeterminate

A:
The SE toolkit provides direct access to many of the data sources in the kernel. Without it, the primary interfaces you can use are:

kstat - the kernel statistics interface - man -s3k kstat
/proc - the process statistics interface - man -s4 proc
kmem - direct kernel access - man -s3k kvm_read
network - protocol information - netstat -s
probes - new in Solaris 2.5 - man prex

Does Solaris offer a vmtune-like tool?

Q:
I've recently started using Suns. With Sequent Dynix/ptx (based on AT&T V3.2), a vmtune utility controls virtual memory (VM) management.

Does Solaris have a vmtune-like virtual memory tool?
Is there Maximum Resident Set size for each process on Solaris? If so, how can one modify this value?
Is there "swapout" on Solaris? In my environment, there is no swapping and free memory is always more than 25 megabytes. I think Solaris was designed to avoid swapping. If Solaris doesn't avoid swapping, when does swap occur?

--MyungSuk Yoo, Bombardier Regional Aircraft

A:
There are no controls on resident set size per process in any of the mainstream versions of Unix. Why not? Well, it's hard to get a default behavior that works any better than the current system over a wide range of system sizes and workloads. Also, implementing a working set pager requires a lot more overhead, in terms of both CPU use and kernel data storage.

In Solaris 2.4, swapouts of large idle processes occur if free memory stays well below its normal level for several seconds.

Why are my news spool disks overloaded?

Q:
I am running a news server and I am getting very poor performance from it. It is running on a SPARCserver 1000 with 640 megabytes of RAM. The news software (INN) resides on /opt (sd1) and Solaris 2.4 resides on sd0. iostat -x 30 indicates that at least one of my bottlenecks can be attributed to my disks, primarily the spool. I am striping 3 disks (sd15 sd37 sd7) using Online DiskSuite. The stripe has an interlace value of 16 blocks.

Below is some of the output from iostat -x 30. As you can see, most of the load is caused by writes to the spool.

disk      r/s  w/s   Kr/s   Kw/s wait actv  svc_t  %w  %b 
sd0       0.0  3.3    0.0   19.9  0.0  0.1   34.6   0   5 
sd1       0.0 15.7    0.0   99.3  0.0  0.4   24.7   0  39 
sd15      1.4 18.0    3.6   97.9 45.1 46.9 4737.6  44  79 
sd37      0.7 16.5    3.2   99.9 10.5  7.1 1025.1   9  22 
sd7       0.5 16.5    2.7   98.7  9.7  6.8  972.8   9  20 
                                 extended disk statistics 
disk      r/s  w/s   Kr/s   Kw/s wait actv  svc_t  %w  %b 
sd0       0.0  3.8    0.0   24.6  0.0  0.1   38.2   0   5 
sd1       0.0 15.9    0.0  101.1  0.0  0.4   23.9   0  38 
sd15      1.1 17.5    2.3  100.3 14.0 54.2 3656.0  36  73 
sd37      0.8 15.5    3.3   96.9  9.0  6.6  961.8   8  21 
sd7       0.6 15.5    3.2   97.0  8.9  6.1  929.2   8  18

--(name and firm indeterminate)

A:
Those disks are dead meat! A slow service time is 50 milliseconds; 4737 ms is glacier-like speed. As you can see, there are 47 active commands inside the disk drive, and 45 commands waiting to be sent to the drive. Each new command you send to the drive has to wait for 92 other commands to finish first. Thus it takes almost 5 seconds to service each I/O. Dividing down, 4737 ms/92 commands = 51 ms for each I/O at the disk drive. This indicates a lot of long seeks -- probably random seeks between inodes and data in many parts of the disk drive.
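The arithmetic above can be repeated for every disk in the stripe. This Python sketch takes the wait, actv, and svc_t figures from the first iostat sample and divides the service time by the number of commands queued and active. The function name is hypothetical, the guard for short queues is an added assumption, and the division is the same rough estimate used in the text, not an exact model of the drive.

```python
def per_command_ms(wait, actv, svc_t):
    """Estimate the time per I/O at the drive from one iostat -x row."""
    outstanding = wait + actv
    if outstanding < 1.0:
        # Assumption: with less than one command outstanding there is no
        # queue to divide by, so the reported time is used as it stands.
        return svc_t
    return svc_t / outstanding


# (wait, actv, svc_t) for the striped spool disks in the first sample.
first_sample = {
    "sd15": (45.1, 46.9, 4737.6),
    "sd37": (10.5, 7.1, 1025.1),
    "sd7": (9.7, 6.8, 972.8),
}
for disk, (wait, actv, svc_t) in first_sample.items():
    print(f"{disk}: about {per_command_ms(wait, actv, svc_t):.0f} ms per I/O")
```

All three disks come out at 50 to 60 ms per command, which is consistent with the long-seek explanation: the drives are slow per operation, and the queue multiplies that delay.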

The problem is lots of files being created, touched, and destroyed; lots of inode updates; and Directory Name Lookup Cache (DNLC) activity (i.e., a busy NNTP [Network News Transfer Protocol] server).

The best fix: Add non-volatile (NV) SIMMs and Legato's Prestoserve software. This will help a lot more than anything else. If the disks are still too busy, you need more of them, and you need an NVRAM disk cache. A SPARCstorage Array (SSA) with 12 or so disks would give you a wider stripe. The SSA NVRAM is a reasonable substitute for the Prestoserve NVSIMMs, but the two together are even better. Note that you do not need the storage capacity of 12 disk drives, but you do look as if you need their random I/O performance. Twelve disks may seem extreme, but so does a 4700-ms service time!

Increasing ncsize and ufs_ninode to 34000 in /etc/system may help a little. With 640 megabytes of RAM, maxusers should be at 640 already, and the caches will already be quite large. If you have set maxusers directly to some low value, you should remove that setting from /etc/system and let it size automatically.

That's all for this month
I realize that some of my answers were written specifically for the people asking the question, and could be expanded to fill a whole column, with much more explanation for a wider audience. But these shorter answers let me cover more topics. Let me know if you like this format, or would prefer a single in-depth answer to a common question.

Next month I will offer an answer to the question, "What are the tunable parameters for Solaris?"

About the author
Adrian Cockcroft joined Sun in 1988, and currently works as a performance specialist for the Server Division of SMCC. He is the author of Sun Performance and Tuning: SPARC and Solaris, published by SunSoft Press PTR Prentice Hall. Reach Adrian at adrian.cockcroft@sunworld.com.

URL: http://www.sunworld.com/swol-12-1995/swol-12-perf.html