Adrian Cockcroft's frequently asked questions
Editor's note: Adrian Cockcroft's Performance Q&A column in SunWorld generates more reader e-mail than any other item in our monthly magazine. We're starting to see some repetition in your letters, so to lighten Adrian's e-mail load and offer you quicker access to Adrian's wisdom, we've compiled the more frequently asked questions here.
As time passes, we'll add popular questions to the top. We'll leave this file at this URL, so you may wish to add this FAQ to your bookmarks list.
Where can I find SymbEL
(aka "rule tool" and "SE")?
Where can I find in-depth documentation for
rule tool?
What's the highest value
ncsize can have?
How do I figure out target
drives for vmstat and iostat?
How does Solaris 2 use memory?
Is there any way to change the time slice on
Solaris?
What's a high process-switch value?
Static vs. dynamic linking.
What is a "Level Red" mutex stall?
Does the current edition of your book cover SunOS 5.5?
What flags in /etc/system are related to security?
I need to reboot my Solaris 2.4 machines every two weeks. Why?
I think swap size doesn't affect performance.
How do I improve my Web server's performance?
What's the latest recommended version of Solaris for
older SPARC computers?
Should I use the SSA NVSIMM and Presto NVSIMMs together?
Any tips for Solaris X86 administrators?
Is there a Solaris 2.4 kernel tuning parameter
that stops unfriendly programs from
taking over a system?
We see an "Allocation errors, kmap full?"
message on a SPARCstation 20 with 512 megabytes. Why?
Help me program asynchronous I/O.
How should I partition my hard disk?
Will I/O be faster on a 64-bit file system,
especially on a database application like Oracle?
What's better for a Web server: UltraSPARC or hyperSPARC?
Why shouldn't I run CacheFS on a read-write filesystem?
How do I interpret the w column in
vmstat?
How do I tune the Solaris kernel?
How can I time-out orphaned processes in Solaris?
What causes slow rlogin?
Any performance tuning hints for Solaris 2.5?
Why do some login IDs in SunOS 4.1 accounting files
change?
Why doesn't my virtual-memory monitoring program
add up?
Are kernel memory allocation errors worth worrying
about?
How can I improve my Web server's http performance?
Does Solaris offer a vmtune-like tool?
Why are my news spool disks overloaded?
Do you have a version of SymbEL for HP-UX?
What is better, SPARC or Pentium?
Why doesn't 32 megabytes
seem like enough?
Is there a way to measure
the amount of CPU used by AIO "waiting" methods?
How many syscalls are too many?
What can I do when the kernel memory button in ruletool goes black?
How large can a process be in Solaris?
Where may I find ruletool?
Q:
Where may I find ruletool?
A: I discuss a new version of the SymbEL toolkit (release 2.5) in my column "Description and Installation for SymbEL release 2.5.0.2". If you already have the SE Performance Toolkit Version 2.5.0.2 or have read Appendix A at the end of my book, you should be familiar with the ten basic rules that indicate the state of parts of the system and the action required to improve bad states. The toolkit makes it easy to include the rules, and they provide a high-level indication of the system's health.
Q: Could you please hint as to where I might find a discussion of how to interrogate the device drivers without using SE?
--Gal Bar-or, firm indeterminate
A: The SE toolkit provides direct access to many of the data sources in the kernel. The primary commands you can use are:
Where can I find in-depth
documentation for rule tool?
Q:
Can you please tell me where I can find in-depth documentation for rule tool?
--Kenny Henderson, (firm indeterminate)
A: I don't understand your request. Ruletool is a script, so you can read the source code. It was also described in depth in articles on www.sun.com that are linked from my column and the SE2.4 download page. That's all there is, apart from the rules in appendix A of my book.
What's the highest value
ncsize can have?
Q:
I'm running a very large Web server and I seem to have the directory
cache "go to amber" (under ruletool) for poor DNLC hit rates. I
increased ncsize to 34000 on a 512-MB, four-CPU SPARC 1000 running 2.4.
Is that the highest ncsize can go? Would tuning somewhere else help?
--(name and firm indeterminate)
A: It can go bigger, but there isn't much point. DNLC performance effects are not great unless you are on an NFS server with too little RAM. Amber is just a warning. You are probably accessing lots of new files once, so the cache will never be able to hit. You do need to make ufs_ninode 34000 as well; otherwise there are not enough inodes for the DNLC to cache references to.
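For reference, the corresponding /etc/system entries would look like this (values taken from the question above; a reboot is needed for them to take effect):

```
* /etc/system: enlarge the DNLC and the UFS inode cache together
set ncsize=34000
set ufs_ninode=34000
```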
How do I figure out target
drives for vmstat and iostat?
Q:
There is some confusion about which target sd0 in the iostat output
refers to. Some people here say it is target 3 (Internal drive) while
others say it is target 0. Which would be correct? The same quandary
exists with the vmstat disk fields, s0 and s3. Which target does each
field refer to? Thanks and I am going to buy your book this afternoon!
--Jeanne Brennan, brennan@hou.moc.com
A: On my system it's set up like this (note that the SE toolkit figures this out for you):
% /opt/RICHPse/examples/disks.se
sd0 -> c0t0d0
sd2 -> c0t2d0
sd3 -> c0t3d0
On older Sun systems it's set up with t3 and t0 swapped; from eeprom(1M):
sd-targets      Map SCSI disk units (OpenBoot PROM version 1.x only). Defaults to 31204567, which means that unit 0 maps to target 3, unit 1 maps to target 1, and so on.
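If you don't have the SE toolkit handy, the instance-to-target mapping can also be read out of /etc/path_to_inst. A minimal sketch, using a hypothetical two-line sample in that file's format (on a live system you would point awk at the real /etc/path_to_inst):

```shell
# Create a hypothetical sample in /etc/path_to_inst format.
cat > /tmp/path_to_inst.sample <<'EOF'
"/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0" 0 "sd"
"/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@0,0" 2 "sd"
EOF
# For each sd entry, print "sdN -> target T": the instance number N is the
# field after the quoted path, and the SCSI target T is the part after the
# final "@" in the device path.
awk -F'"' '$4 == "sd" {
    n = split($2, p, "@"); split(p[n], t, ",")
    inst = $3; gsub(/ /, "", inst)
    printf "sd%s -> target %s\n", inst, t[1]
}' /tmp/path_to_inst.sample
```

On this sample the script prints sd0 -> target 3 and sd2 -> target 0, matching the swapped t3/t0 mapping the eeprom documentation describes.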
How does Solaris 2 use
memory?
Q:
We are busy porting a real-time application from Linux to Solaris x86,
and are experiencing problems with regard to memory.
We load about half of our physical memory with data (30 MB), and
even though there should be plenty of memory available, we experience lots
of paging to disk. I would appreciate it if you could enlighten us as
to what the problem is.
--Antony Jankelowitz, (firm indeterminate)
A: If you do any file I/O, it appears as paging; you may want to look at the plock and mlock man pages. Read my October SunWorld article, which talks about how memory is used in Solaris.
Is there any way to change
the time slice on Solaris?
Q:
We are using Solaris 2.3 on a SPARC 20 (four 125-MHz CPUs). We have 15 to 20
percent idle time left over when we have 18 application processes running with a
lot of messages being processed. There are two processes which take up
35% and 25% of the time, respectively.
My question is: Is there any way to change the time slice on Solaris?
I don't know what the default is. But for example, if it is 20 ms, we
could change it to 50 ms for the first process and 40 ms for the second
process. By doing this, there will be less swapping and better turnaround.
Do you think this will make any difference?
--Jay, (firm indeterminate)
A: I don't think the timeslice will make any difference. Upgrading to Solaris 2.5 will help; the kernel is more efficient, especially on MP systems. You have two busy processes and four CPUs; that is why there is idle time.
What's a high process-switch
value?
Q:
I'm trying to tune some SPARC machines. sar -w reports a lot of
process switches in pswch/s: 1000 on a two-CPU 60-MHz SS1000, and
more than 1500 on a four-CPU 85-MHz SS1000.
I ran se2.4 on both systems and it reported nothing wrong. They have
quite different profiles: one is an NFS server, one runs six Oracle
instances (DB server).
--Alexis Grandemange, (firm indeterminate)
A:
Q: Are these common values?
A: These are quite low.
Q: Is pswch an accurate indicator?
A: Yes, the metric itself is accurate, but it is not usually useful as a problem indicator.
Q: Is it possible to reduce the number of process switches without impacting throughput?
A: No; usually, as throughput increases, so does pswch.
Use mpstat to see how many are involuntary versus voluntary context
switches. If there are a large proportion of icsw, then increasing the
timeslice might help a little (see dispadmin). Don't expect any
dramatic improvements.
% mpstat 5
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   68  19    0  1046  826  568   57    0   27    0   285    7  13   0  80
Here there are 568 switches, but only 57 are involuntary.
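As a quick sketch, the involuntary share of the context switches can be pulled out of an mpstat data line with awk (the column numbers follow the mpstat header; the data line is the hypothetical sample shown):

```shell
# csw is column 7 and icsw is column 8 of an mpstat data line;
# report icsw as a percentage of all context switches.
mpstat_line="0 68 19 0 1046 826 568 57 0 27 0 285 7 13 0 80"
echo "$mpstat_line" | awk '{ printf "icsw is %d%% of csw\n", 100 * $8 / $7 }'
```

For this sample that works out to about 10 percent involuntary, low enough that tweaking the timeslice with dispadmin is unlikely to help much.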
Q: Both machines are paging heavily (about 100 p/s).
A: That's not very heavy. It's probably just filesystem I/O activity.
Q: I ran se2.4 on both systems and it reported nothing wrong.
A: In that case, they are probably both OK. Wait until a problem is reported before you start to worry.
Static vs. dynamic linking.
Q:
I would like to add a few points to your column
"Which is better, static or dynamic linking?":
In summary, dynamic linking may be better if you do not mind paying
good money for license fees, disks and networks. It is definitely
better if you are the person selling those licenses, disks and
networks. Otherwise, better take a second look at static linking.
--Hubert Meitz, (firm indeterminate)
A: You can certainly decide to statically link to third-party libraries or ones that would need to be installed on every system. There is no reason to statically link to the standard OS libraries. You can statically link to one set of libraries while dynamically linking to libc, etc., and the window system. Note that Sun's Fortran libraries (libF77.so etc.) are explicitly licensed such that you can install them anywhere they are needed to run a Fortran program that uses them.
The benefit of dynamic linking is that you can upgrade to an improved libF77.so without rebuilding your application. You may also get a platform-specific libF77.so that is optimized for the hardware that it is running on. This may make a big difference for high-end floating point applications.
What is a "Level Red" mutex stall?
Q: Following the guidelines indicated in Appendix A of your SP&T book
we have a four-processor SS1000 in a "Level Red" Mutex Stall (smtx
> 400). Is there a quick hardware fix for this problem (i.e., more
or faster CPUs)? Any advice would be appreciated.
...BTW, great book!
--Todd Resnick, Duke University Medical Center
A: I actually answer this question in "New Release of the SE Performance Toolkit". The main new feature is the performance collator for Web servers. See the SE Performance Toolkit Version 2.5.0.2 for more information.
Adrian:
OK, I'm trying to locate a copy of your book locally, I installed your
SE Tools (cool, big help), but here's one I can't figure out. Is this a
hardware problem or a configuration problem? This particular SPARC 5,
with 32 megabytes of RAM, the OS on a 1-gigabyte disk, and user directories on NFS mounts from an old
(cough, gag) Digital Ultrix 4.4 box, keeps crashing Framemaker sessions. I'm
trying to determine if it's Framemaker, or the box, or the
configuration. I have about 128 megabytes of swap. We have nine of these SPARC 5s
running Framemaker, and they are having problems with NFS. I am
trying to determine if there is any tuning I can do until I can
get them to get rid of the old Ultrix machine.
I'm running the calendar manager monitor, but can't get cm printing to quit truncating the output!! Any help on at least this kmem issue would be highly praised throughout my company.
KERNEL MEMORY INFORMATION
The current kmem state is: amber   Allocation errors have occurred
Total number of kernel memory allocation errors since boot: 1
Number of kernel memory allocation errors this interval: 1
Number of pages of memory on the free list: 274
KMEM RULE THRESHOLD  KMEM_FREEMEM_LOW  default= 64pg  getenv= 64pg
no pages for kernel to use
All I can suggest is to attach truss to the maker process, wait for it to die, then look at the system call sequence before it died to see if there are any clues:
truss -o /tmp/trusslog -p makerpid
then
tail /tmp/trusslog
Adrian:
I am looking for examples of se_2.4 scripts other than those installed in
/opt/RICHPse/examples, especially a class allowing monitoring of a remote
site using rstatd. I use Solaris 2.4 on many SPARCstation 5s.
Does an anonymous FTP site exist with such examples?
Thank you very much for your assistance.
--Robert Rivoir, (firm indeterminate)
Adrian:
I've just installed the latest version of the SE toolkit on
my Solaris 2.5 machine. It seems to work fine. But when
I DISPLAY it on my sunos 4.1.3 machine running Openwin 3,
it kills the openwin!
What can I do to make it work?
--(name and firm indeterminate)
Adrian Cockcroft replies: I don't have any 4.1.3 systems to test with. I have no idea why it might kill openwin. Do other Motif apps work? Do all the SE GUI applications kill it, or just one?
The reader replies: I do not have many Motif applications, but I noticed the same behaviour with Netscape for Solaris. The other SE apps seem to behave the same. Does it mean my 4.1.3 machine has no Motif libraries?
Adrian Cockcroft replies: A good possibility. Readers?
Adrian:
I have followed several articles of yours about tuning and found
the information to be most helpful. Is there a non-graphical/non-audio
based version of virtual_adrian.se? We need to run this at our customer
sites over a telnet connection.
--David A. Dempsey, Endeavor Information Systems
A: virtual_adrian.se 30 -t, to disable the audioplay command.
Does the current edition of your book cover SunOS 5.5?
Q:
Does the current edition of your book cover SunOS 5.5?
--(name and firm indeterminate)
A: The book was written before Solaris 2.5 was released but there were few changes and everything still applies. The performance changes in Solaris 2.5 are the subject of my "Sun Home Page" performance column this month.
What flags in /etc/system are related to security?
Q:
While the article covers kernel and /etc/system parameters, there are still
several things missing. In Solaris 2.4 there was a fix in /etc/system:
set nfs:nfs_portmon=1
which made nfsd only accept connections from low-numbered ports. This is the same as running rpc.mountd on SunOS 4.x with a -n flag. Unfortunately, this same parameter does not exist on Solaris 2.5. What I need is a list of parameters such as this one which have nothing to do with performance tuning but everything to do with securing my system. I'd like a reference source for these sorts of tuning parameters. They are not autoconfigured based on available resources.
A: I concentrate on performance-related parameters; I'm not a security expert. However, you will find that set nfssrv:nfs_portmon=1 works in Solaris 2.5. There were changes due to the integration of NFS3 into the system that seem to have rearranged the nfs modules slightly.
I found that it still existed by running /usr/ccs/bin/nm on /dev/ksyms, then locating the module that contains it in /kernel.
/usr/ccs/bin/nm /kernel/misc/nfssrv | grep portmon
[29]    | 1676|  4|OBJT |LOCL |0 |3 |nfs_portmon
The manual page for nfsd(1m) also documents this change.
If the NFS_PORTMON variable is set, then clients are required to use privileged ports (ports < IPPORT_RESERVED) in order to get NFS services. This variable is equal to zero by default. This variable has been moved from the "nfs" module to the "nfssrv" module. To set the variable, edit the /etc/system file and add this entry:
set nfssrv:nfs_portmon = 1
I need to reboot my Solaris 2.4 machines every two weeks. Why?
Q:
Hi, I have three or four SPARC 5 machines running Solaris 2.4 that have
to be rebooted every two weeks or even every week depending on what the
users are doing (e.g., every week if they continually run Matlab
simulations along with the usual programs such as text-editing, email;
longer for email and text-editing until they start running more
resource-demanding programs). What happens is that the machines slow
down for a while and then completely freeze because they are paging in
and out. The users cannot do anything, not even move the mouse
from one window to another.
I ran some performance statistics using sar, ps and vmstat. On one of the machines which is a SPARCprinter II server, I was convinced that there was a memory shortage. So I added another 32MB RAM expanding the total RAM to 64MB. It still slows down when it's printing, but the system is able to recover from the paging activity unlike before when I had to reboot the system. In this case, I did not think of a possible kernel memory leak, so I did not collect sufficient statistics for analysis (sar -k). Here are the statistics output and notes on the analysis:
(Editor's note: extensive tables and statistics deleted)
I believe that all machines displaying the above-mentioned behavior are experiencing some kind of memory leak somewhere. I tried to set maxusers=40 in /etc/system, but that did not work. I would try setting shared memory (shmsys), but I could not find any helpful hints on it. Please help.
I appreciate any comments or thoughts on this issue.
--Cindy Doehr, (firm indeterminate)
A: It sounds like a kernel memory leak bug. Have you tried loading the latest set of kernel patches for Solaris 2.4?
I think swap size doesn't affect performance.
Q:
Most of your article on performance tuning is just fine, but
I disagree when you state that so long as applications fit,
swap size doesn't affect performance. I used to think
that, too, but then I learned about fragmentation and
observed user complaints going away when I increased their
swap.
Also, you mention ncsize and ufs_ninode. These are
discussed in Sun's old performance tuning overview. I'd
be interested in an updated discussion of these as pertains
to SunOS 5.3-5.
--Anthony D'Atri, (firm indeterminate)
A: The only mechanism I can think of is that the pageout swapfs clustering could be limited if free swap space becomes fragmented into small blocks. This might delay the availability of free pages a bit, but if you are paging in and out at the same time to an overloaded disk, you are already in a slow performance mode. Adding swap space on a separate disk helps.
Regarding ncsize and ufs_ninode, this is discussed in the book, and I have touched on it a few times in my Web articles.
From further discussion it seems that your experience of performance improvement may have been on SunOS 4.x systems. Solaris 2 has a completely different swap space layout policy that avoids this problem as far as I can tell.
How do I improve my Web
server's performance?
Q:
Just wanted to drop you a quick email to add to your no doubt groaning
emailbox, to thank you for the articles on performance on the web.
I am currently a system admin at a small but fast-growing Internet provider in the UK, which means I do everything ;-)
Our Web server, a SPARC 20 running Solaris 2.5, has just died performance-wise over the last week (there seem to be loads of TIME_WAITs), and whilst doing everything else to keep the provider running, I've got to look at it urgently before all our customers leave :-(
Anyway, I asked for a list of /etc/system parms and was recommended to read your articles. I've only just started going through them, but already I can see I have to order your book ;-)
I can see this week will be spent reading everything you've written,
that I can get my hands on, to try and solve this problem, so thought I
should at least email you to say thanks for the info.
--Keith Pritchard, (firm indeterminate)
A: This is the subject of my March column. In this particular case it turned out that the workload had crept up until there was no more Internet bandwidth left to/from this system. Increasing the speed of the Internet link fixed it.
What's the latest
recommended version of Solaris for older SPARC
computers?
Q:
Currently we have an installed base of Sun SPARC 4/110, 4/330
and 4/75s. I was wondering whether running Solaris 2.4/5 is
compatible with these old hardware platforms, and what are the
performance implications by running just Solaris and/or any
simple X-based application.
--Ioannis M. Kyratzoglou, Mitre
A: Solaris 2.4 is the last release that works on a 4/110, 4/260, 4/330, or 4/490 -- we have upgraded our lab systems to 4/600 CPU boards with SuperSPARC modules that will run the latest OS.
All the others run 2.5; the latest releases are faster and smaller than older Solaris 2 versions. Solaris 2.5 uses more RAM than 4.x, and most things are faster than 4.x; a few operations are slower because extra functionality has been added.
If you have enough RAM, I'd upgrade them; if any are marginal, don't bother. 32 MB is probably the minimum: you can't afford to waste CPU cycles paging on a slow system, so you need more RAM to keep up with more recent hardware.
Should I use the SSA NVSIMM
and Presto NVSIMMs together?
Q:
I have heard and read conflicting views of using SSA NVSIMM and Presto
NVSIMM items together. While I understand the configuration problem
of putting one logically before the other in order to have an orderly
recovery after a system crash, I still am not clear that using a
Presto-NVSIMM with an SSA-NVSIMM gives you anything over just the
SSA-NVSIMM alone.
--Kerry P. Boomsliter, Knight-Ridder Information
A: The bandwidth to/from Presto NVRAM is perhaps 10 times that of SSA NVRAM. The CPU does more work with Presto, as the data is copied to and from the Presto NVRAM, but it doesn't bottleneck on the disk interface like the SSA NVRAM.
That is the main difference. Either one gets you most of the performance boost by changing disk accesses into memory accesses, but the higher bandwidth and capacity of Presto gets better response times, and the SSA NVRAM has higher throughput. Both together is the best option for maximum performance. The recent introduction of 16MB SSA NVRAMs (up from 4MB in older SPARCstorage Arrays) also helps performance for write-intensive applications.
Q:
In your
answer to the person whose NNTP server
had overworked disks, you said adding an NVRAM SIMM and
Legato's Prestoserve software was the best thing to do.
I had always associated Prestoserve with NFS service.
Does your advice imply these pieces of hardware and
software are for use in more than just NFS servers?
Should I consider putting them in any machine with a high
I/O load?
--(name and firm withheld)
A: Directory and inode updates are synchronous, and files are flushed when they are closed. The NVSIMM defers and coalesces all synchronous I/Os, and has no effect on regular writes to a local filesystem. NFS writes are also synchronous, which is why it helps NFS. News and mail do a lot of directory and inode updates, and create/move/delete small files, which is why the NVSIMM helps a lot.
Q:
You seem to be very free in recommending NVSIMMs, everywhere
filesystem performance is mentioned :-)
Having recently gotten my greedy little hands on ODS4.0,
I am playing with the "metatrans" device, known to the rest
of the world as journaling filesystems. It seems to speed
up filesystem access quite nicely, although I have never
had NVSIMMs to play with, to compare.
Could you provide some juicy details comparing the pros
and cons of both approaches?
--Philip Brown, (firm indeterminate)
A: Bandwidth to an NVSIMM is greater than 100MB/s, SBus Presto is perhaps 30MB/s, a log disk is 2-5MB/s, and random writes go at 400KB/s on a good day.
Converting random writes to any of the others is faster; the NVSIMM is the lowest-latency, lowest-overhead option.
Remember that the data must be written to the log (low latency as far as user is concerned) then must be read back and written to the filesystem (extra overhead and throughput needed, not latency sensitive).
Philip Brown replies: All the data has to be written there?? This seems strange to me. How is it, then, that it speeds up ufs throughput as much as it does? If it only does that, it would seem to me that it would only become equal to ufs speed, not exceed it.
Adrian answers: Only the synchronous writes go to the log: inode updates, and synchronous writes on NFS servers.
It speeds up allocation of blocks by forcing them to be sequential, regardless of how many inode and indirect block updates are needed on the way.
Any tips for Solaris X86 administrators?
Q:
(Your) performance tuning articles are excellent, but I wonder
if anyone has any performance tips or gotchas for the vagaries
of the Intel platforms running Solaris x86.
What special concerns come up on those hardware platforms?
--(name and firm indeterminate)
A: I don't have any Solaris x86 systems (I work in SMCC remember :-) but almost everything I say about Solaris on SPARC applies also to Solaris x86. The SE toolkit supports Solaris x86.
Is there a Solaris 2.4
kernel tuning parameter that stops unfriendly programs from taking over
a system?
Q:
Is there a Solaris 2.4 kernel tuning parameter (like maxuprc)
that would allow sysadmins to stop unfriendly programs from
taking over a system? The problem we have sometimes seen is
a poorly written program forking off infinite copies of itself
until the machine dies or hits its process limit. We want to
be able to limit a user's total to, say, 100 processes.
Is this possible under Solaris 2.4?
--Lance Nakata, Stanford University
A: The maxuprc variable does this for you in Solaris 2.x:
set maxuprc=100
in /etc/system, then reboot.
We see an "Allocation errors, kmap full?"
message on a SPARCstation 20 with 512 megabytes. Why?
Q:
Hello Adrian. I have read your book a dozen times and use your tools.
Excellent. I have a question about an "Allocation errors, kmap full?"
message we received last week on one of our production servers. It is
a SS20 with 512 megabytes of RAM. For some weird reason, it started
canceling telnet and rlogin connections, and I have a feeling this was
during the same time we received the kmap full error messages. Could
you explain? Every once in a while we would receive mutex contention
errors as well.
I know this is working in the dark without the specs on the system, processes running, system configuration, and things like that. But you are an expert, and I figured you could point me in the right direction.
Thanks!
--Neil Greene, Sr Oracle DBA / Unix Administrator,
SHL Systemhouse
A: If the kernel can't grab memory, it will cause a login or telnet to fail, and you will get allocation errors.
If the problem persists until the machine stops working and needs a reboot to fix, it means the kernel got too big. To fix this, reduce maxusers to 200 or so, set bufhwm to 4000, upgrade to 2.5 (which has more kmap on sun4m), or upgrade to SS1000 or UltraSPARC systems, which have much bigger kmap.
If it comes and goes, then the free list was empty, so there were no pages for the kernel to grab. Set lotsfree to 512 and desfree to 256, leaving minfree alone, and increase slowscan to 500.
This is a fairly common problem with 512MB SS20s.
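Gathered into /etc/system form, the suggestions above look like this (the values are the ones given in the answer; treat them as starting points, and reboot to apply):

```
* /etc/system: workarounds for kmap exhaustion on a 512MB sun4m
set maxusers=200
set bufhwm=4000
* keep more pages on the free list for the kernel to grab
set lotsfree=512
set desfree=256
set slowscan=500
```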
Help me program asynchronous I/O.
Q:
I was wondering if you could point me in the right direction on which
Solaris 2.4 patch will enable me to do asynchronous I/Os (aio_read and
aio_write). The man page says that async. support is a future release,
but in one of your articles you mentioned a patch that would allow
async. I/O. I just installed the latest jumbo patch (101945-34) but
the routines still return -1 (errno set to ENOSYS). Any help would be
greatly appreciated.
Thanks,
--Chuck Williams,
Senior Telecommunication Systems Engineer, Loral Test & Information Systems
A: You need to look on the second CD that comes with 2.4, or in the Patches directory on the main 2.4 CD. Kernel async I/O was shipped with the 2.4 release but was not installed by default. There is probably an updated release of that patch to look for once you know its number.
In the meantime, the aioread calls should work with no patches; the KAIO fast path in the patch is only really needed for Sybase on raw disks.
My guess is that you are not using the API correctly in some way. (See the feature story Programming asynchronous I/O.)
How should I partition my hard disk?
Q: I always face hot comments when I suggest bundling /, /usr,
and /var into one large partition... I do not think that
having separate partitions is needed anymore. Am I right?
Is there any good reason to split them these days? I know
that in the past it was needed because of small disks, but
now? It is an issue I would like to close once and for all.
What are your thoughts about this?
--Benoit Gendron, (firm indeterminate)
A: My book Sun Performance and Tuning: SPARC and Solaris contains my thoughts on this subject. I recommend one partition for desktops, and keeping /var separate on servers only -- so that /var/mail can have Prestoserve acceleration.
This also makes upgrades much easier.
Will I/O be faster on a
64-bit file system, especially on a database application like
Oracle?
Q:
You have not focused on 64-bit file systems and performance
in your article. Will I/O be faster on a 64-bit file system,
especially in a database application like Oracle?
--(name and firm indeterminate)
A: 64-bit file sizes and file systems can and will be implemented on any system. Solaris already supports 1 TB file systems, and 2 GB files.
Oracle runs best on a raw disk setup, and there are no 64-bit features that would speed up file system accesses.
What's better for a Web server: UltraSPARC or hyperSPARC?
Q:
We have a SS20/712 as an application-layer firewall that is
(at times) completely CPU bound with all the http traffic
going through it. We are in the process of enhancing the
http proxy with all the recommendations made on this Web page.
However, until that is done we want to increase the throughput
via hardware, i.e., faster processor. We are looking at
the 100-MHz HyperSPARC setup, but don't know what the optimal
cache size would be. We have a choice of 1M or 256K. Please
help. In a networked environment what would be the preferred
(fastest) for us?
--Mike McPherson, (firm indeterminate)
A: I would spend the money on an UltraServer 1 Model 170.
Solaris 2.5 is a bit more efficient than 2.4, and the faster CPU and system bandwidth will probably work better than a dual CPU SS20.
I don't think HyperSPARC systems run kernel code as well as SuperSPARC systems. I found the 125-MHz 256KB HyperSPARCs were about the same as 60-MHz SuperSPARCs for running commercial applications like database backends that do a lot of kernel work.
Is the http proxy forking for every request? If so, a preforked or threaded proxy would be much better -- i.e., Netscape or phttpd or Apache, but not CERN or NCSA.
Why shouldn't I run
CacheFS on a read-write filesystem?
Q: I attended a seminar you gave
at a Computer Literacy in San Jose a while back and remember you
mentioning a caveat about using CacheFS.
I remember you saying something like "it's not a good idea to use CacheFS on a r/w filesystem." What I can't remember is WHY. Is it because writes through CacheFS are slower, or is it because writes through CacheFS are unreliable? Or does having an r/w fs mounted through CacheFS cause performance of CacheFS to drop in general?
Also, do you have any suggestions for CFS option settings for read-mostly
filesystems?
--Jim Burwell, Systems/Network Admin., Broadvision
A: Read-mostly is fine. If you only read the data once, don't bother caching it; if you keep changing a lot of it, caching it is a waste of time.
If you have a few updates, but mostly read the data it should give a good speedup. /var/mail is a really bad choice, /home is usually OK, /export/local (or whatever you mount applications on) is a good idea.
How do I interpret the w column in vmstat?
Q:
We have a SPARCserver 1000E/Solaris 2.4 with four CPUs and 610 megabytes of RAM
as a dedicated Sybase server. vmstat 5 is used to monitor the
system at all times. Recently, the third column 'w' of
'procs' in vmstat's output started to report a value of around
20 and rarely changed. This value shows up again even after
rebooting the system many times during the past month. This
seems to indicate we have a memory shortage because
swapping occurred. But my questions are:
A: vmstat w reports the number of processes that are currently swapped out. Those 20 processes are all idle ones. This is not a performance problem. The Loukides book is rather out of date in places, and is not particularly relevant to Solaris 2.
If you run vmstat -S and see lots of si and so, you might have a problem. Here's a reminder of what vmstat -S looks like:
% vmstat -S 5
 procs     memory            page             disk         faults       cpu
 r b w   swap   free  si so pi po fr de sr f0 s0 s1 s2   in   sy  cs us sy id
 0 0 0 137392  15608   0  0  2  2  5  0 55  0  0  0  0  132  319  69  2  1 98
Swapping moves whole processes to the swap space, and paging is done a page at a time. Page-outs occur in large clusters, so the net effect is not all that different.
Swap space size is not a performance issue. If you have enough to run your apps reliably without running out at peak loads then you should be happy. If you want to collect crash dumps you might need more. That is one reason why SunService recommends setting swap equal to RAM.
How do I tune the Solaris kernel?
Q:
We are veteran Interactive users and used to tuning the kernel
using kconfig (mtune and stune files). We
are now porting to Solaris x86 (Base Server) and need to be able to
make equivalent tuning changes. In particular, we need to increase the
various values associated with IPC queue (MSGMAX, MSGTQL, etc.). We
have found one cryptic way to do this by hacking at system(4). Is there
a better way and is there any comprehensive documentation source on
tuning kernel parameters under Solaris x86? Thanks.
--Ken Robbins, firm indeterminate
A: The "better way" involves editing /etc/system.
The performance manual section offers little help, but does list some parameters. My book (Sun Performance and Tuning) contains more details, including the algorithms that are being tuned.
Your question is a common one. I will probably address the question "What is the list of tunables in Solaris?" in a future column. There is no easy answer, unfortunately.
How can I time-out orphaned processes in Solaris?
Q:
At Brown & Root, we run both Solaris and AIX servers.
On all servers, we have Oracle as our database. On
occasion, some clients' Oracle processes
remain active even after they have logged off. In AIX,
we have found two parameters, tcp_keepidle and tcp_keepalive,
that help us time out these orphaned processes. Is
there anything comparable in Solaris?
--Jacques Dejean, Brown & Root
A:
You're looking for the Solaris ndd command. You can find a description of it, and of the values it can be assigned, in Appendix E of TCP/IP Illustrated, Volume 1 by W. Richard Stevens; the book is also a complete reference to TCP/IP and how it works.
Make sure you understand the implications of any TCP tweaks; it is easy to break the standard algorithms if you set them up wrong.
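As a hedged sketch, the relevant Solaris ndd parameter is tcp_keepalive_interval (units are milliseconds; the default is 7200000, i.e. 2 hours). Lines you might add to a boot script such as /etc/init.d/inetinit -- check the defaults on your release first:

```shell
# Assumption: Solaris 2.x parameter names; units are milliseconds.
ndd /dev/tcp tcp_keepalive_interval              # read the current value
ndd -set /dev/tcp tcp_keepalive_interval 600000  # probe idle connections after 10 minutes
```

With a shorter keepalive interval, the TCP connections behind the orphaned Oracle processes are probed and torn down sooner, which lets the server-side processes exit.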
What causes slow rlogin?
Q:
What are likely causes of extremely slow rlogin both to and from a
machine? The machine in question is seldom busy. It takes about 60
seconds to do rlogin from or to the machine. Once rlogin is completed,
response is fine.
--Mike Kelly, firm indeterminate
A: Check for routing and name-service problems. I get this problem myself, normally due to routing foul-ups. It may also help to put "files" at the start of the name service lookup path for hosts and passwd. I use "files nisplus dns" for hosts in /etc/nsswitch.conf, as I find that the system boots much more quickly when it looks up its main routers and servers in /etc/hosts.
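A sketch of the corresponding /etc/nsswitch.conf entries (adjust the source list to your site; "nisplus" here assumes a NIS+ site like the one described):

```
hosts:      files nisplus dns
passwd:     files nisplus
```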
Q: This is in response to your December Performance Q&A column and the question about what causes slow rlogins.
The most common reason I see (and hear about) for a slow login is a remote site using daemons or protocol wrappers that use the ident protocol to look up who is trying to connect. I use a TCP wrapper that logs user names this way. If the client machine is not running an identd daemon, the lookup can delay the login by up to a configurable timeout (2 seconds in my case).
Another common cause on a busy machine is when the remote
site does not have enough physical memory and must swap to
get the login daemons or the shell loaded and running.
Hope this helps...
--Michael Johnson, CS Undergrad,
Oregon State University
Any performance tuning hints for Solaris 2.5?
Q:
I'm hoping that when Solaris 2.5 comes out you can dedicate
an article or a series of articles to the improvements and
kernel /etc/system parameters that should or should not be
set for 2.5. In reading your book, you gave different
hints for different types of systems (i.e., servers vs.
hosts), and the hints varied depending upon the version of
Solaris being used. I'm guessing that when 2.5 comes out,
it'll be different from 2.4, so it'd be nice to know what
changes have been made and what performance tuning hints
are applicable for 2.5.
--Blair Zajac, firm indeterminate
A: Since I'm writing this before Solaris 2.5 is officially released, I can't offer much guidance yet. I'll cover tunables soon. It takes a while to figure out how to tweak a new OS release. There are a few new NFS V3 variables. The rest is basically identical to 2.4.
Editor's Note: Solaris 2.5 was announced at the end of October. Shipment for the SPARC and Intel versions of the new OS has just begun; Solaris for PowerPC recently entered beta testing and is expected to ship early next year.
Why do some login IDs in SunOS 4.1 accounting files change?
Q:
I hope you can help me out. Lately, I had to look at the
/usr/adm/acct/fiscal/fiscrptxx files. I found that some login IDs
had two entries per file, while other login IDs had one.
Can you tell me what the problem is, or at least give me a hint? I need to use the files for performance evaluation purposes. Do I have to add up the entries corresponding to a given login ID per file?
--Halim M. Khelalfa, AI Division, CERIST
A: I'm not sure, and I haven't used accounting on 4.1.3 for many years. One guess: Perhaps some users changed their group ID, keeping the same user ID, during the month.
Why doesn't my virtual memory monitoring program
add up?
Q:
I now have a hard copy of your System Performance Monitoring
article and will read it soon. First, I am going to take advantage
of your offer to answer questions about this subject.
One of my personal monitoring programs presents physical memory utilization, which I calculate based on the following method. I believe it works and the assumptions are correct, but I'd like your opinion on its accuracy.
First I get some static facts:
V = total virtual memory size (everything is in kilobytes)
R = total real (physical) memory size
Next, I get 1-second snapshots of transient facts:
A = allocated (in-use) virtual memory
F = free (available) virtual memory, in the form of free resident memory pages (now in the physical memory)
IF ( A + F ) >= R THEN U = 100% ELSE U = (100% * A) / (R - F)
There are cases when A < R but I report 100 percent because of the free pages that inhabit physical memory, forcing some allocated pages to be swapped out. I am not concerned about this because there will be (at least a potential for) thrashing, and that's what 100-percent physical memory utilization is supposed to indicate. What I am concerned about is when ( A + F ) <= R yet there is a potential for thrashing -- and I don't know why -- because there is something missing from my equations.
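The formula above can be sketched as follows (the input values here are made up for illustration; units are kilobytes, as in the definitions):

```shell
# Sketch of the reader's utilization formula with illustrative values.
awk 'BEGIN {
    R = 65536            # total real (physical) memory, KB
    A = 50000            # allocated (in-use) virtual memory, KB
    F = 10000            # free resident memory pages, KB
    if (A + F >= R) U = 100
    else            U = 100 * A / (R - F)
    printf "U = %.1f%%\n", U
}'
# prints: U = 90.0%
```

As the answer explains, the quantities that vmstat and ps actually report do not map cleanly onto A, F, and R, so treat this as the model under discussion rather than a working monitor.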
--Alex Vrenios, EMTEK Health Care Systems
A: Look at my SunWorld column entitled "Help! I've lost my memory!" Then it may become clear why your calculation does not work. The VM system is far more complex than your simple equation presumes. I don't think the available data is sufficient to model memory use. In particular, the only data available on a per-process basis is the size of the address space for the process, and the amount that has valid memory mappings. These values can be seen (measured in kilobytes) via the old-style ps command, in the SZ (process size) and RSS (process resident set size) fields:
% /usr/ucb/ps uax
USER       PID %CPU %MEM    SZ  RSS TT      S    START  TIME COMMAND
root      2026  3.0  2.1  1424 1284 pts/6   O 23:14:29  0:01 /usr/ucb/ps uax
adrianc   2021  0.7  4.1  3444 2500 ??      S 23:14:26  0:00 /usr/openwin/bin/c
adrianc   1785  0.6 11.1 10048 6840 console S 20:50:55  1:12 /usr/openwin/bin/X
adrianc   2024  0.3  1.4   980  856 pts/6   S 23:14:27  0:00 /bin/csh
...
Unfortunately for your calculation, the RSS excludes pages that are in memory but do not have valid mappings, and it includes pages that are shared by other processes. Your calculation also doesn't consider the memory used by files that are cached. To obtain this data, kernel code would have to be written that traverses many data structures and tallies the pages. This is not available in the base release, or in any commercial performance tools that I am aware of.
I think it would be useful to have more information about memory usage, and it is on my list of things I'd like to see added to Solaris.
Are kernel memory allocation errors worth worrying
about?
Q:
While recently monitoring a SPARCcenter 2000E with RuleTool, I noted that
the system was regularly experiencing kernel memory allocation errors.
I tried to find some info on the seriousness of this, but wasn't
able to find much other than it possibly being caused by a
memory leak. A call to SunService seemed to indicate that as
long as the frequency was very low (it was, approx. 5 per day)
it wasn't a cause for concern.
I would like more info on this in a future article (or in your next book Performance Tuning: The Sequel). I'd like to congratulate you on your book and I assume it's doing well considering it fills a void that's been around for years. (I purchased two copies myself, and have influenced several others in purchasing the book.)
--Greg Wells, firm indeterminate
A: If the system can't grab memory when it needs it and can't wait, then you can get problems like a stream or login attempt failing. There are other reasons why allocation failures occur; in most cases, the system finds a way to retry the operation and succeed.
This problem happens mostly on Solaris 2.4 and 2.5 multiprocessor systems, not so often on Solaris 2.3 or uniprocessor systems. If you see kmem allocation errors (sar -k 1), then increase the free list so that it is less likely to hit the endstop.
Set lotsfree to 128 times the number of CPUs you have, or set up virtual_adrian.se to run every time you reboot and it will set this for you.
Adding more RAM doesn't help, as the free list size is not scaled. As you can see below, I've had a few errors on my Ultra 1, but I take this as a warning, not a serious worry. It is useful to track it in case something else fails at the same time as a new kmem error.
% sar -k 1

SunOS eccles 5.5 Generic sun4u    11/05/95

23:30:18 sml_mem   alloc fail  lg_mem   alloc fail ovsz_alloc fail
23:30:19 4046848 3611540    0 7536640 6492776    8    5373952    0
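The rule of thumb above (lotsfree = 128 per CPU) can be turned into an /etc/system line with a one-liner; NCPU here is a placeholder you would derive from psrinfo on the machine itself:

```shell
# Assumption: a 4-CPU machine; on Solaris you would get NCPU from psrinfo.
NCPU=4
echo "set lotsfree=$((128 * NCPU))"   # line to append to /etc/system
# prints: set lotsfree=512
```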
How can I improve my Web server's http performance?
Q:
My issue deals with a problem I have been seeing on more
and more Solaris 2.4 systems running as WWW servers.
First, I do realize that the http protocol was not designed to work with TCP/IP. In fact, it butchers it, but since it's a growing phenomenon, we need to tune the system for it!
Now, the problem I have been seeing. When dialup users connect to these WWW servers via SLIP/PPP, Solaris apparently drops a lot of packets, and a lot of retransmissions are occurring as shown from the results of the netstat -s command.
What I discovered is that the default setting of 200 for tcp_rexmit_interval_min is too low. Setting this up to 10000 finally gives good performance results. However, as you are well aware, this will increase the amount of time the system waits before a retransmission takes place after a packet is dropped. Catch-22! ;)
I also noted that the listen backlog parameter set by ndd: tcp_conn_req_max is set to 5 and allows a maximum value of 32.
How can I optimize Solaris 2.4 to perform well as a WWW server? More and more clients are asking me to improve WWW performance on Sun.
--Boni Bruno, Data Systems West
A: There is an excessive-retransmit bug that is fixed in Solaris 2.4 patch 101945-34 (Sun's recently released kernel jumbo patch) and in Solaris 2.5. You will still see retransmit levels of 10 to 30 percent on machines with direct Internet connections. You can reduce them by setting the initial retransmit interval to a second or so (1000, as the units are milliseconds). Most packets seem to take at least a second to get to their destination and get an acknowledgement back over the Internet! You should not set it to much more than a second.
The limit for tcp_conn_req_max should be set to 32 in 2.4, and can be set up to 1024 with ndd in 2.5 if you have enough memory to hold all those pending connections. A setting of 128 seems to work well on Solaris 2.5, and is being used on some big Internet sites.
Add these lines to /etc/init.d/inetinit:

ndd -set /dev/tcp tcp_rexmit_interval_initial 1000
ndd -set /dev/tcp tcp_conn_req_max 32
We also have fast name service caching in 2.5, so DNS (Domain Name System) lookups get cached (see the nscd man page). In general 2.5 is a much faster Internet server than 2.4, even though there are several areas where tuning work is still underway.
Does Solaris offer a vmtune-like tool?
Q:
I've recently started using Suns. With Sequent Dynix/ptx (based on AT&T
V3.2), a vmtune utility controls virtual memory (VM) management.
--MyungSuk Yoo, Bombardier Regional Aircraft
A: There are no controls on resident set size per process in any of the mainstream versions of Unix. Why not? Well, it's hard to get a default behavior that works any better than the current system over a wide range of system sizes and workloads. Also, implementing a working set pager requires a lot more overhead, in terms of both CPU use and kernel data storage.
In Solaris 2.4, swapouts of large idle processes occur if free memory stays well below its normal level for several seconds.
Why are my news spool disks overloaded?
Q:
I am running a news server and I am getting very poor
performance from it. It is running on a SPARCserver 1000 with
640 megabytes of RAM. The news software (INN) resides on /opt (sd1)
and Solaris 2.4 resides on sd0. iostat -x 30 indicates that at least
one of my bottlenecks can be attributed to my disks, primarily the spool.
I am striping 3 disks (sd15 sd37 sd7)
using Online DiskSuite. The stripe has an interlace value of 16 blocks.
Below is some of the output from iostat -x 30. As you can see, most of the load is caused by writes to the spool.
                         extended disk statistics
disk  r/s  w/s  Kr/s  Kw/s wait actv   svc_t  %w  %b
sd0   0.0  3.3   0.0  19.9  0.0  0.1    34.6   0   5
sd1   0.0 15.7   0.0  99.3  0.0  0.4    24.7   0  39
sd15  1.4 18.0   3.6  97.9 45.1 46.9  4737.6  44  79
sd37  0.7 16.5   3.2  99.9 10.5  7.1  1025.1   9  22
sd7   0.5 16.5   2.7  98.7  9.7  6.8   972.8   9  20
                         extended disk statistics
disk  r/s  w/s  Kr/s  Kw/s wait actv   svc_t  %w  %b
sd0   0.0  3.8   0.0  24.6  0.0  0.1    38.2   0   5
sd1   0.0 15.9   0.0 101.1  0.0  0.4    23.9   0  38
sd15  1.1 17.5   2.3 100.3 14.0 54.2  3656.0  36  73
sd37  0.8 15.5   3.3  96.9  9.0  6.6   961.8   8  21
sd7   0.6 15.5   3.2  97.0  8.9  6.1   929.2   8  18
--(name and firm indeterminate)
A: Those disks are dead meat! A slow service time is 50 milliseconds; 4737 ms is glacier-like speed. As you can see, there are 47 active commands inside the disk drive, and 45 commands waiting to be sent to the drive. Each new command you send to the drive has to wait for 92 other commands to finish first. Thus it takes almost 5 seconds to service each I/O. Dividing down, 4737 ms/92 commands = 51 ms for each I/O at the disk drive. This indicates a lot of long seeks -- probably random seeks between inodes and data in many parts of the disk drive.
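The arithmetic above can be checked directly: the total queue length is actv plus wait, and the per-command time is svc_t divided by that queue. Using the sd15 numbers from the first sample:

```shell
# sd15 figures from the first iostat -x interval above.
awk 'BEGIN {
    svc_t = 4737.6   # reported service time, ms
    actv  = 46.9     # commands active inside the drive
    wait  = 45.1     # commands waiting to be sent to the drive
    queue = actv + wait
    printf "%.0f ms per I/O\n", svc_t / queue
}'
# prints: 51 ms per I/O
```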
The problem is lots of files being created, touched, and destroyed; lots of inode updates; and Directory Name Lookup Cache (DNLC) activity (i.e., a busy NNTP [Network News Transfer Protocol] server).
The best fix: Add non-volatile (NV) SIMMs and Legato's Prestoserve software. This will help a lot more than anything else. If the disks are still too busy, you need more of them, and you need an NVRAM disk cache. A SPARCstorage Array (SSA) with 12 or so disks would give you a wider stripe. The SSA NVRAM is a reasonable substitute for the Prestoserve NVSIMMs, but both together is even better. Note that you do not need the storage capacity of 12 disk drives, but it looks as if you need their random I/O performance. Twelve disks may seem extreme, but so does a 4700-ms service time!
Increasing ncsize and ufs_ninode to 34000 in /etc/system may help a little. With 640 megabytes of RAM, maxusers should be at 640 already, and the caches will already be quite large. If you have set maxusers directly to some low value then you should remove it from /etc/system and let it size automatically.
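Collected as an /etc/system fragment (the 34000 value is the one suggested above; '*' is the comment character in that file):

```
* name lookup and inode cache sizes suggested in the answer above
set ncsize=34000
set ufs_ninode=34000
```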
Where can I find the HP-UX version of SymbEL?
Q:
I've printed out your article to read while working on my HP-UX system.
Of course, my question is do you have a version for 9.X on a 9000/735?
--dave, (firm indeterminate)
A: It is difficult to build a useful SE language on HP-UX, AIX, or other OSes. The trick on Solaris 2 is that the /dev/kstat interface is a read-only, nonprivileged interface; note that vmstat and friends are no longer setuid commands in Solaris 2. This means that you can write simple scripts that get at almost all the performance data without running as root and without making the binary setuid.
That said, Rich Pettit has recently been looking at the performance interfaces on other platforms. As far as we can tell they are effectively undocumented on HP-UX, and without source code it is hard to work out what to do.
Overall, one of my aims was to make Solaris 2 a better OS to work with than other OSes, so I'm not very motivated to spend time working on ports. I also have way too many unimplemented ideas for Solaris 2 to work on.
Which is better, SPARC or Pentium?
Q:
My organisation is arguing about SPARC servers versus Unix Pentium-based servers. What I really have to prove to them is that the SPARC motherboard is faster, more reliable and robust, and is a market leader.
Could you please mail me relevant information QUICKLY before
the management decides to scrap Sun?
Thanks
--Jean-Pierre, (firm indeterminate)
A: This is really a job for your local Sun sales team to take on. What I can say is that we have been benchmarking Solaris on 166-MHz Intel Pentiums and on Ultra 1s; for network server workloads the Ultras are two to three times faster. The PC motherboard and Ethernet hardware do not have the bandwidth to compete. We actually get SBus throughput of over 100 MB/s on an Ultra 1; the PCI bus on a Pentium PC is rated at about 100 MB/s in theory, but in practice you are lucky to get over 10 MB/s.
The PC hardware is designed down to a price, and most of the I/O adaptors are also not designed for demanding applications.
Some Web server benchmarks backing this up should be put up on www.sun.com soon, but they are not available yet.
The other issue is support: since the hardware and software are from the same company, we can give better integrated support, with less finger-pointing between hardware, OS, and I/O card suppliers when something isn't working.
Why doesn't 32 megabytes
seem like enough?
Q:
The short question is, "Why does it seem that a SPARC 4 running Solaris 2.4 with 32 megabytes of RAM has too little memory?"
I have such a system and it seems almost hopeless to have emacs, gdb, netscape, and g++ running -- unless you like to hear the disk spinning (paging).
I think your columns are great, so thanks!
--Dave, (firm indeterminate)
A: See Adrian's May column for an indirect answer to this question.
Is there a way to measure
the amount of CPU used by AIO "waiting" methods?
Q:
Based on the way kaio works in Solaris, the question has
come up regarding the possibility of being CPU bound because
you are i/o bound. The thought process on this is based on
the asumption that Solaris is asynchronously "waiting" for
the i/o to complete by doing a SIGIO with a polling timeout
versus the synchronous method of using the aiowait function.
Can you help to clear this up ? Is there a way to measure
the amount of CPU used by these "waiting" methods ?
--Marty Carangelo, Amdahl
A: Solaris doesn't poll. The user application might if it were using aio, but Solaris waits for the interrupt to wake up the thread that issued the I/O.
How many syscalls are too many?
Q:
I don't find a threshold mentioned for the three fault categories. Here is a sample of output from an SC2000:

 r b w   swap    free re mf pi po fr de sr s1 s1 s1 s3  in     sy  cs us sy id
 0 0 0 2502796 1136148  0  0  0  0  0  0  0  0  0  0  0 172 103473 498 61 29 10
 0 0 0 2502796 1136148  0  0  0  0  0  0  0  0  0  0  0 137 104002 487 62 29  9
 0 0 0 2502796 1136148  0  0  0  0  0  0  0  0  0  0  0 154 104144 463 62 29  9
 0 0 0 2502796 1136148  0  0  0  0  0  0  0  1  0  0  0 109 104189 467 63 28  9

It seems to me the fault/sy numbers are high. What do you consider high?
A: Syscalls are a byproduct of an application doing work. The more the better, as it means you are getting more work done more quickly.
In absolute terms, 100,000 per second is quite reasonable for an SC2000, especially since your usr:sys times are in a 2:1 ratio, which is quite healthy.
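The 2:1 ratio comes straight from the us and sy columns of the vmstat output; for example, one sample line shows us=62 and sy=29:

```shell
# usr:sys ratio from one vmstat sample line above.
awk 'BEGIN { us = 62; sy = 29; printf "usr:sys = %.1f:1\n", us / sy }'
# prints: usr:sys = 2.1:1
```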
What can I do when the kernel memory button in ruletool goes black?
Q:
We have a SPARCcenter 2000 with 8 cpus, 3 gb of RAM, 150 gb of disk.
We are running a lot of processes, a dbms, and have quite a few users.
I tuned the system file as near as I can tell according to the
guidelines in your book and using the output of ruletool. The box keeps
coming to its knees with kmem allocation errors - no kmem available
(from ruletool). Of course - the kernel mem button in ruletool goes
black and the box can't recover. I end up having to stop-a and reboot.
Any help you can provide would be greatly appreciated, as I am about to be run out of Dodge by the local townsfolk.
--Ted Regan, EDS
A: This is fixed in Solaris 2.5.1 with an algorithm change and a bigger free list. In the meantime, set slowscan=500, and keep doubling lotsfree and desfree until the allocation failures go away.
Try this first in /etc/system (assuming lots of RAM -- the 3 GB you have):

set slowscan=500
set lotsfree=4000
set desfree=2000
How large can a process be in Solaris?
Q:
You say a SPARCcenter 2000 can support 5 gigabytes of RAM; is there still a 4-gigabyte per-process limit on this system?
Does Solaris 2.5 support more than 4 gigabytes per process?
Does Solaris 2.5.1 support more than 4 gigabytes per process?
--Chris Krebs, (firm indeterminate)
A: Note that the Enterprise 6000 machine can support 30 GB of RAM, and both it and the SC2000 are limited by DRAM density, not address space.
The SC2000 has a 32-bit virtual address space that maps to a 36-bit physical address space; that is how many 4-GB processes can share a much larger amount of RAM.
About the author
Adrian
Cockcroft joined Sun Microsystems in 1988, and currently works as a
performance specialist for the Computer Systems Division of Sun. He
wrote Sun
Performance and Tuning: SPARC and Solaris and Sun
Performance and Tuning: Java and the Internet, both published
by Sun Microsystems Press
Books.
The answers to questions posed here are those of the author, and do not represent the views of Sun Microsystems Inc.
Resources
virtual_adrian.se
URL: http://www.sunworld.com/common/cockcroft.letters.html