Performance Q&A Compendium
System integration drives performance improvements
This month we cover a diverse and plentiful collection of performance-related questions. Subjects covered include performance monitoring commands, tuning variables, logins and processes, how to interpret the output of performance measurements, and how to optimize web servers and news servers. (3,800 words)
Here's a list of the questions covered:
How do I interpret the w column in vmstat?
How do I tune the Solaris kernel?
How can I time-out orphaned processes in Solaris?
What causes slow rlogin?
Any performance tuning hints for Solaris 2.5?
Why do some login IDs in SunOS 4.1 accounting files change?
Why doesn't my virtual memory monitoring program add up?
Are kernel memory allocation errors worth worrying about?
How can I improve my Web server's http performance?
Is there a viable alternative to SE?
Does Solaris offer a vmtune-like tool?
Why are my news spool disks overloaded?
How do I interpret the w column in vmstat?
Q:
We have a SPARCserver 1000E/Solaris 2.4 with four CPUs and 610 megabytes of RAM
as a dedicated Sybase server. vmstat 5 is used to monitor the
system at all times. Recently, the third column 'w' of
'procs' in vmstat's output started to report a value of around
20 and rarely changed. This value shows up again even after
rebooting the system many times during the past month. This
seems to indicate we have a memory shortage because
swapping occurred. But my questions are:
--(name and firm indeterminate)
A:
The w column in vmstat reports the number of processes that are currently swapped out.
All 20 of those processes are idle, so this is not a performance
problem. The Loukides book is rather out of date in places, and is not
particularly relevant to Solaris 2.
If you run vmstat -S and see lots of si and so, you might have a problem. Here's a reminder of what vmstat -S looks like:
vmstat -S 5
 procs     memory            page             disk         faults       cpu
 r b w   swap  free  si  so pi po fr de sr f0 s0 s1 s2   in   sy   cs us sy id
 0 0 0 137392 15608   0   0  2  2  5  0 55  0  0  0  0  132  319   69  2  1 98
Swapping moves whole processes to the swap space, and paging is done a page at a time. Page-outs occur in large clusters, so the net effect is not all that different.
Swap space size is not a performance issue. If you have enough to run your apps reliably without running out at peak loads then you should be happy. If you want to collect crash dumps you might need more. That is one reason why SunService recommends setting swap equal to RAM.
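If you log vmstat -S output and want to check it automatically, the si and so test described above can be sketched in a few lines of Python. This is only an illustration: it assumes you feed it the column-name line plus the data lines (not the "procs memory page" group line), and the function names are made up for this example.

```python
# Sketch: flag swap-in/swap-out activity in captured `vmstat -S` output.
# Assumes the text starts with the column-name line, followed by data lines.

SAMPLE = """\
 r b w   swap  free  si  so pi po fr de sr f0 s0 s1 s2   in   sy   cs us sy id
 0 0 0 137392 15608   0   0  2  2  5  0 55  0  0  0  0  132  319   69  2  1 98
"""

def swap_activity(text):
    """Return one (si, so) pair per data line."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    # Locate the columns by name so extra disk columns do not matter.
    si_col = header.index("si")
    so_col = header.index("so")
    return [(int(row[si_col]), int(row[so_col])) for row in rows]

def is_swapping(text):
    """True if any sample shows non-zero swap-ins or swap-outs."""
    return any(si > 0 or so > 0 for si, so in swap_activity(text))

print(swap_activity(SAMPLE))
print(is_swapping(SAMPLE))
```

A non-zero w on its own would not trip this check, which matches the point above: idle swapped-out processes are not a problem, ongoing swap traffic is.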
How do I tune the Solaris kernel?
Q:
We are veteran Interactive users and used to tuning the kernel
using kconfig (mtune and stune files). We
are now porting to Solaris x86 (Base Server) and need to be able to
make equivalent tuning changes. In particular, we need to increase the
various values associated with IPC queue (MSGMAX, MSGTQL, etc.). We
have found one cryptic way to do this by hacking at system(4). Is there
a better way and is there any comprehensive documentation source on
tuning kernel parameters under Solaris x86? Thanks.
--Ken Robbins, firm indeterminate
A:
The "better way" involves editing /etc/system.
The performance manual section offers little help, but does list some parameters. My book (Sun Performance and Tuning) contains more details, including the algorithms that are being tuned.
Your question is a common one. I will probably address the question "What is the list of tunables in Solaris?" in a future column. There is no easy answer, unfortunately.
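To give a feel for what the /etc/system entries look like, here is a small sketch that prints "set" lines for two of the message queue limits. The msgsys:msginfo_msgmax and msgsys:msginfo_msgtql names follow the usual Solaris 2 module:variable form, but treat them as an assumption and check them against your release; the values are placeholders, not recommendations, and the helper name is invented for this example.

```python
# Sketch: render /etc/system "set" lines for System V message tunables.
# Names and values are illustrative assumptions; verify them for your release.

HYPOTHETICAL_VALUES = {
    "msginfo_msgmax": 8192,  # MSGMAX: largest single message, in bytes
    "msginfo_msgtql": 256,   # MSGTQL: message headers available system-wide
}

def system_lines(module, settings):
    """Format each tunable as 'set module:variable=value'."""
    return [f"set {module}:{name}={value}"
            for name, value in sorted(settings.items())]

for line in system_lines("msgsys", HYPOTHETICAL_VALUES):
    print(line)
```

Lines like these are read at boot time, so a change only takes effect after a reboot.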
How can I time-out orphaned processes in Solaris?
Q:
At Brown & Root, we run both Solaris and AIX servers.
On all servers, we have Oracle as our database. On
occasion, some clients' Oracle processes
remain active even after they have logged off. In AIX,
we have found two parameters, tcp_keepidle and tcp_keepalive,
that help us timeout these orphaned processes. Is
there anything comparable in Solaris?
--Jacques Dejean, Brown & Root
A:
You're looking for the Solaris ndd
command. You can find a description of it,
and of the values it can set, in Appendix E of
TCP/IP Illustrated, Volume 1 by W. Richard
Stevens. This book is also a complete reference to TCP/IP and how it works.
Make sure you understand the implications of any TCP tweaks. You can easily mess up the standard algorithm if you set it up wrong.
What causes slow rlogin?
Q:
What are likely causes of extremely slow rlogin both to and from a
machine? The machine in question is seldom busy. It takes about 60
seconds to do rlogin from or to the machine. Once rlogin is completed,
response is fine.
--Mike Kelly, firm indeterminate
A:
Check for:
I get this problem myself, normally due to routing foul-ups. It may also help to put "files" at the start of the name service lookup path for hosts and passwd. I use "files nisplus dns" for hosts in /etc/nsswitch.conf, as I find that the system boots much more quickly if it looks up the identities of its main routers and servers in the /etc/hosts file.
Any performance tuning hints for Solaris 2.5?
Q:
I'm hoping that when Solaris 2.5 comes out you can dedicate
an article or a series of articles to the improvements and
kernel /etc/system parameters that should or should not be
set for 2.5. In reading your book, you gave different
hints for different types of systems (i.e., servers vs.
hosts), and the hints varied depending upon the version of
Solaris being used. I'm guessing that when 2.5 comes out,
it'll be different from 2.4, so it'd be nice to know what
changes have been made and what performance tuning hints
are applicable for 2.5.
--Blair Zajac, firm indeterminate
A:
Since I'm writing this before Solaris 2.5 is officially released, I can't offer
much guidance yet. I'll cover tunables soon. It takes a while to
figure out how to tweak a new OS release. There are a few new NFS V3
variables. The rest is basically identical to 2.4.
Editor's Note: Solaris 2.5 was announced at the end of October. Shipment for the SPARC and Intel versions of the new OS has just begun; Solaris for PowerPC recently entered beta testing and is expected to ship early next year.
Why do some login IDs in SunOS 4.1 accounting files change?
Q:
I hope you can help me out. Lately, I had to look at the
/usr/adm/acct/fiscal/fiscrptxx files. I found that some login IDs
had two entries per file, while other login IDs had one.
Can you tell me what the problem is, or at least give me a hint? I need to use the files for performance evaluation purposes. Do I have to add up the entries corresponding to a given login ID per file?
--Halim M. Khelalfa, AI Division, CERIST
A:
I'm not sure, and I haven't used accounting on 4.1.3 for many years.
One guess: Perhaps some users changed their group ID, keeping the same
user ID, during the month.
Why doesn't my virtual memory monitoring program add up?
Q:
I now have a hard copy of your System Performance Monitoring
article and will read it soon. First, I am going to take advantage
of your offer to answer questions about this subject.
One of my personal monitoring programs presents physical memory utilization, which I calculate based on the following method. I believe it works and the assumptions are correct, but I'd like your opinion on its accuracy.
First I get some static facts:
V = total virtual memory size (everything is in kilobytes)
R = total real (physical) memory size
Next, I get 1-second snapshots of transient facts:
A = allocated (in-use) virtual memory
F = free (available) virtual memory, in the form of free resident memory pages (now in the physical memory)
    IF ( A + F ) >= R
    THEN U = 100%
    ELSE U = (100% * A) / (R - F)
There are cases when A < R but I report 100 percent because of the free pages that inhabit physical memory, forcing some allocated pages to be swapped out. I am not concerned about this because there will be (at least a potential for) thrashing, and that's what 100-percent physical memory utilization is supposed to indicate. What I am concerned about is when ( A + F ) <= R yet there is a potential for thrashing -- and I don't know why -- because there is something missing from my equations.
Notes:
--Alex Vrenios, EMTEK Health Care Systems
A:
Look at my SunWorld Online column entitled "Help! I've lost
my memory!" Then it may become clear why your calculation does not
work. The VM system is far more complex than your simple equation presumes.
I don't think the available data is sufficient to
model memory use. In particular, the only data available on a per-process basis
is the size of the address space for the process, and the amount that
has valid memory mappings. These values can be seen (measured in kilobytes)
via the old-style ps command, in the SZ (process size) and RSS
(process resident set size) fields:
% /usr/ucb/ps uax
USER       PID %CPU %MEM   SZ  RSS TT       S    START  TIME COMMAND
root      2026  3.0  2.1 1424 1284 pts/6    O 23:14:29  0:01 /usr/ucb/ps uax
adrianc   2021  0.7  4.1 3444 2500 ??       S 23:14:26  0:00 /usr/openwin/bin/c
adrianc   1785  0.6 11.110048 6840 console  S 20:50:55  1:12 /usr/openwin/bin/X
adrianc   2024  0.3  1.4  980  856 pts/6    S 23:14:27  0:00 /bin/csh
...
Unfortunately for your calculation, the RSS excludes pages that are in memory but do not have valid mappings, and it includes pages that are shared by other processes. Your calculation also doesn't consider the memory used by files that are cached. To obtain this data, kernel code would have to be written that traverses many data structures and tallies the pages. This is not available in the base release, or in any commercial performance tools that I am aware of.
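To make the gap concrete, here is the reader's rule written out as a short Python sketch. The function and argument names are mine, and the sample figures are invented. The point is that the only inputs are A, F, and R, so nothing in it can account for shared pages, unmapped pages still in memory, or cached files.

```python
# Sketch of the reader's utilization rule; all sizes are in kilobytes.
# a_kb = allocated virtual memory, f_kb = free resident pages,
# r_kb = total physical memory. Sample numbers below are made up.

def physical_utilization(a_kb, f_kb, r_kb):
    """Return percent physical memory utilization under the reader's rule."""
    if a_kb + f_kb >= r_kb:
        return 100.0
    # Here a_kb < r_kb - f_kb, so the divisor is positive.
    return 100.0 * a_kb / (r_kb - f_kb)

print(physical_utilization(40000, 8000, 65536))  # A + F is below R
print(physical_utilization(60000, 8000, 65536))  # A + F exceeds R
```

Two systems with the same A, F, and R but very different file cache or shared library usage get the same answer from this rule, which is why it cannot predict thrashing reliably.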
I think it would be useful to have more information about memory usage, and it is on my list of things I'd like to see added to Solaris.
Are kernel memory allocation errors worth worrying about?
Q:
While recently monitoring a SPARCcenter 2000E with RuleTool, I noted that
the system was regularly experiencing kernel memory allocation errors.
I tried to find some info on the seriousness of this, but wasn't
able to find much other than it possibly being caused by a
memory leak. A call to SunService seemed to indicate that as
long as the frequency was very low (it was, approx. 5 per day)
it wasn't a cause for concern.
I would like more info on this in a future article (or in your next book Performance Tuning: The Sequel). I'd like to congratulate you on your book and I assume it's doing well considering it fills a void that's been around for years. (I purchased two copies myself, and have influenced several others in purchasing the book.)
--Greg Wells, firm indeterminate
A:
If the system can't grab memory when it needs it and can't wait, then
you can get problems like a stream or login attempt
failing. There are other reasons why allocation failures occur; in most
cases, the system finds a way to retry the operation and succeed.
This problem happens mostly on Solaris 2.4 and 2.5 multiprocessor systems, not so often on Solaris 2.3 or uniprocessor systems. If you see kmem allocation errors (sar -k 1), then increase the free list so that it is less likely to hit the endstop.
Set lotsfree to 128 * the number of CPUs you have or set up virtual_adrian.se to run every time you reboot and it will set this for you.
Adding more RAM doesn't help, as the free list size is not scaled. As you can see below, I've had a few errors on my Ultra 1, but I take this as a warning, not a serious worry. It is useful to track it in case something else fails at the same time as a new kmem error.
% sar -k 1
SunOS eccles 5.5 Generic sun4u    11/05/95

23:30:18 sml_mem   alloc  fail  lg_mem   alloc  fail  ovsz_alloc  fail
23:30:19 4046848 3611540     0 7536640 6492776     8     5373952     0
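If you want to track these failures over time, a sketch along the following lines pulls the three fail counts out of a sar -k sample and works out the lotsfree figure from the 128-per-CPU rule above. It assumes the input is just the heading line and one data line; the function names and the four-CPU example are illustrative only.

```python
# Sketch: extract the "fail" counters from one `sar -k` sample and apply
# the 128 * CPUs rule of thumb for lotsfree.

SAR_K = """\
23:30:18 sml_mem   alloc  fail  lg_mem   alloc  fail  ovsz_alloc  fail
23:30:19 4046848 3611540     0 7536640 6492776     8     5373952     0
"""

POOLS = ("sml_mem", "lg_mem", "ovsz_alloc")

def kmem_failures(text):
    """Map each kernel memory pool to its allocation failure count."""
    header, data = (ln.split() for ln in text.strip().splitlines())
    names, values = header[1:], data[1:]  # drop the timestamps
    failures = {}
    for pool in POOLS:
        # Each pool's count sits under the first "fail" heading after it.
        fail_col = names.index("fail", names.index(pool))
        failures[pool] = int(values[fail_col])
    return failures

def suggested_lotsfree(ncpus):
    """Rule of thumb from the text: 128 per CPU."""
    return 128 * ncpus

print(kmem_failures(SAR_K))
print(suggested_lotsfree(4))
```

On the sample above, the only non-zero counter is the lg_mem one, which is the kind of occasional failure I treat as a warning rather than a fault.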
How can I improve my Web server's http performance?
Q:
My issue deals with a problem I have been seeing on more
and more Solaris 2.4 systems running as WWW servers.
First, I do realize that the http protocol was not designed to work with TCP/IP. In fact, it butchers it, but since it's a growing phenomenon, we need to tune the system for it!
Now, the problem I have been seeing. When dialup users connect to these WWW servers via SLIP/PPP, Solaris apparently drops a lot of packets, and a lot of retransmissions are occurring as shown from the results of the netstat -s command.
What I discovered is that the default setting of 200 for tcp_rexmit_interval_min is too low. Setting this up to 10000 finally gives good performance results. However, as you are well aware, this will increase the amount of time the system waits before a retransmission takes place after a packet is dropped. Catch-22! ;)
I also noted that the listen backlog parameter set by ndd: tcp_conn_req_max is set to 5 and allows a maximum value of 32.
How can I optimize Solaris 2.4 to perform well as a WWW server? More and more clients are asking me to improve WWW performance on Sun.
--Boni Bruno, Data Systems West
A:
There is an excessive retransmit bug that is fixed in Solaris 2.4 patch
101945-34 (Sun's recently released kernel jumbo patch) and in Solaris 2.5.
You will still see retransmit levels of 10-30% on machines with
direct Internet connections. You can reduce them by setting the initial
retransmit interval to a second or so (1000, as the units are milliseconds).
Most packets seem to take at least a second to reach their destination
and get an acknowledgement back over the Internet! You should not set
it to much more than a second.
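To see where your own server falls relative to that 10-30% range, you can compare the retransmitted and outgoing data segment counters from netstat -s. The sketch below assumes you have read off the two counters by hand (on my systems they appear as tcpRetransSegs and tcpOutDataSegs, but treat those names as an assumption for your release); the numbers are invented.

```python
# Sketch: retransmission rate from two netstat -s counters.
# The counter values below are hypothetical, not measurements.

def retransmit_percent(out_data_segs, retrans_segs):
    """Retransmitted segments as a percentage of data segments sent."""
    if out_data_segs == 0:
        return 0.0  # nothing sent yet, so no meaningful rate
    return 100.0 * retrans_segs / out_data_segs

tcp_out_data_segs = 200000  # hypothetical tcpOutDataSegs reading
tcp_retrans_segs = 40000    # hypothetical tcpRetransSegs reading

print(retransmit_percent(tcp_out_data_segs, tcp_retrans_segs))
```

The counters are cumulative since boot, so take two readings some minutes apart and work from the differences if you want the rate after a tuning change.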
The limit for tcp_conn_req_max should be set to 32 in 2.4, and can be set up to 1024 with ndd in 2.5 if you have enough memory to hold all those pending connections. A setting of 128 seems to work well on Solaris 2.5, and is being used on some big Internet sites. Add these lines to /etc/init.d/inetinit:
    ndd -set /dev/tcp tcp_rexmit_interval_initial 1000
    ndd -set /dev/tcp tcp_conn_req_max 32
We also have fast name service caching in 2.5, so DNS (Domain Name System) lookups get cached (see the nscd man page). In general 2.5 is a much faster Internet server than 2.4, even though there are several areas where tuning work is still underway.
Is there a viable alternative to SE?
Q:
Could you please hint as to where I might find a discussion of
how to interrogate the device drivers without using SE?
--Gal Bar-or, firm indeterminate
A:
The SE toolkit
provides direct access to many of the data sources in
the kernel. The primary commands you can use are:
Does Solaris offer a vmtune-like tool?
Q:
I've recently started using Suns. With Sequent Dynix/ptx (based on AT&T
V3.2), a vmtune utility controls virtual memory (VM) management.
--MyungSuk Yoo, Bombardier Regional Aircraft
A:
There are no controls on resident set size per process in any of the mainstream
versions of Unix. Why not? Well, it's hard to get a default behavior that works any
better than the current system over a wide range of system sizes
and workloads. Also, implementing a working set pager requires a
lot more overhead, in terms of both CPU use and kernel data storage.
In Solaris 2.4, swapouts of large idle processes occur if free memory stays well below its normal level for several seconds.
Why are my news spool disks overloaded?
Q:
I am running a news server and I am getting very poor
performance from it. It is running on a SPARCserver 1000 with
640 megabytes of RAM. The news software (INN) resides on /opt (sd1)
and Solaris 2.4 resides on sd0. iostat -x 30 indicates that at least
one of my bottlenecks can be attributed to my disks, primarily the spool.
I am striping 3 disks (sd15 sd37 sd7)
using Online DiskSuite. The stripe has an interlace value of 16 blocks.
Below is some of the output from iostat -x 30. As you can see, most of the load is caused by writes to the spool.
disk      r/s  w/s   Kr/s   Kw/s wait actv  svc_t  %w  %b
sd0       0.0  3.3    0.0   19.9  0.0  0.1   34.6   0   5
sd1       0.0 15.7    0.0   99.3  0.0  0.4   24.7   0  39
sd15      1.4 18.0    3.6   97.9 45.1 46.9 4737.6  44  79
sd37      0.7 16.5    3.2   99.9 10.5  7.1 1025.1   9  22
sd7       0.5 16.5    2.7   98.7  9.7  6.8  972.8   9  20
                 extended disk statistics
disk      r/s  w/s   Kr/s   Kw/s wait actv  svc_t  %w  %b
sd0       0.0  3.8    0.0   24.6  0.0  0.1   38.2   0   5
sd1       0.0 15.9    0.0  101.1  0.0  0.4   23.9   0  38
sd15      1.1 17.5    2.3  100.3 14.0 54.2 3656.0  36  73
sd37      0.8 15.5    3.3   96.9  9.0  6.6  961.8   8  21
sd7       0.6 15.5    3.2   97.0  8.9  6.1  929.2   8  18
--(name and firm indeterminate)
A:
Those disks are dead meat! A slow service time is 50 milliseconds; 4737 ms is
glacier-like speed. As you can see, there are 47 active commands inside
the disk drive,
and 45 commands waiting to be sent to the drive. Each new command you send to
the drive has to wait for 92 other commands to finish first. Thus it takes almost
5 seconds to service each I/O. Dividing down, 4737 ms/92 commands = 51 ms for
each I/O at the disk drive. This indicates a lot of long seeks -- probably
random seeks between inodes and data in many parts of the disk drive.
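The dividing-down step is easy to repeat for any line of iostat -x output. Here is the arithmetic as a small Python sketch, using the two sd15 samples from the question; the function name is made up for this example.

```python
# Sketch: estimate the time per I/O at the drive from iostat -x figures
# by dividing the reported service time by the total queue length.

def per_io_ms(wait, actv, svc_t_ms):
    """svc_t divided by (commands waiting + commands active in the drive)."""
    queued = wait + actv
    return svc_t_ms / queued

# sd15, first and second samples: (wait, actv, svc_t)
samples = [(45.1, 46.9, 4737.6), (14.0, 54.2, 3656.0)]

for wait, actv, svc_t in samples:
    print(f"queue {wait + actv:.1f}  per-I/O {per_io_ms(wait, actv, svc_t):.1f} ms")
```

Both samples come out at a little over 50 ms per I/O (about 51.5 ms and 53.6 ms), so the drive itself is behaving consistently; it is the depth of the queue that turns that into multi-second service times.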
The problem is lots of files being created, touched, and destroyed; lots of inode updates; and Directory Name Lookup Cache (DNLC) activity (i.e., a busy NNTP [Network News Transfer Protocol] server).
The best fix: Add non-volatile (NV) SIMMs and Legato's Prestoserve software. This will help a lot more than anything else. If the disks are still too busy, you need more of them, and you need an NVRAM disk cache. A SPARCstorage Array (SSA) with 12 or so disks would give you a wider stripe. The SSA NVRAM is a reasonable substitute for the Prestoserve NVSIMMs, but using both together is even better. Note that you do not need the storage capacity of 12 disk drives, but you do appear to need their random I/O performance. Twelve disks may seem extreme, but so does a 4700-ms service time!
Increasing ncsize and ufs_ninode to 34000 in /etc/system may help a little. With 640 megabytes of RAM, maxusers should be at 640 already, and the caches will already be quite large. If you have set maxusers directly to some low value then you should remove it from /etc/system and let it size automatically.
That's all for this month
I realize that some of my answers were written specifically for the
people asking the question, and could be expanded to fill a whole
column, with much more explanation for a wider audience. But these
shorter answers let me cover more topics. Let me know if you like this
format, or would prefer a single in-depth answer to a common question.
Next month I will offer an answer to the question, "What are the tunable parameters for Solaris?"
About the author
Adrian Cockcroft joined Sun in 1988, and currently works as a performance specialist for the Server Division of SMCC. He is the author of Sun Performance and Tuning: SPARC and Solaris, published by SunSoft Press PTR Prentice Hall.
Reach Adrian at adrian.cockcroft@sunworld.com.
If you have technical problems with this magazine, contact webmaster@sunworld.com
URL: http://www.sunworld.com/swol-12-1995/swol-12-perf.html