Readers speak out:
I've been trying to tune our proxy servers which run the Harvest cached 3.0 code. Peter Danzig is working on the code to make improvements. But I think we are getting hung up on disk I/O. We have a Sun Ext Disk pack 6x2Gig FW SCSI for disk caching and each disk is Translog'd using 64 MByte log for the 1.8 GB master.
Here is a snapshot of the iostat during peak periods. 5 minute sample time:
                     extended disk statistics               tty         cpu
disk    r/s   w/s   Kr/s   Kw/s  wait  actv  svc_t  %w  %b  tin tout  us sy wt id
fd0     0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0    0    0  26 29 44  2
sd0     0.8   0.9    6.2   13.7   0.0   0.0   25.0   0   2
sd1     0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd17    0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd18    0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd19    0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd24   13.4  12.7   95.9   50.8   0.0   0.7   26.5   0  32
sd25   20.5  22.0  153.3   83.1   0.0   1.3   31.4   0  54
sd26   19.7  19.3  142.2   73.0   0.0   1.2   31.5   0  52
sd27   16.8  16.6  122.3   63.2   0.0   1.1   31.5   0  43
sd28   19.4  18.7  140.3   71.2   0.0   1.2   32.2   0  50
sd29   19.3  19.6  139.6   75.7   0.0   1.3   32.6   0  50
Any ideas on what to do?
Ron Tazuma
Adrian responds:
It's not clear whether you have the disks striped together or set up
as six separate sections. I assume they are separate.
Since you also have a bunch of idle disks, it would make a lot of
sense to move the 64 MByte trans logs from the main disks to the
idle disks. This would free up the SCSI bus and the main disk
spindles in the 6x2G disk pack. Right now, you are getting
about 30-50 I/O per second, at 30-50 percent busy, on disks that
spin at 120 rps, so you are getting less than one disk I/O per disk
revolution and that implies a random access pattern. The logs should
be sequential, so move them onto another idle disk where they can
run independently and hopefully get some sequential activity.
30 ms service times are not good. With better layout or NVRAM
(SPARCstorage Array) it should be possible to get closer to 10 ms.
(I have seen 3-6 ms for some very well-optimized SSA workloads.)
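The "less than one disk I/O per disk revolution" estimate can be checked with a few lines of arithmetic. The sketch below uses the r/s, w/s, and svc_t figures from the iostat snapshot in the letter and the 120 revolutions per second quoted in the reply; the function and variable names are only illustrative.

```python
# Rough check of I/Os per disk revolution for the six cache disks.
REVS_PER_SEC = 120.0  # spindle speed quoted in the reply

# (reads/s, writes/s, service time in ms), copied from the iostat snapshot
SAMPLE = {
    "sd24": (13.4, 12.7, 26.5),
    "sd25": (20.5, 22.0, 31.4),
    "sd26": (19.7, 19.3, 31.5),
    "sd27": (16.8, 16.6, 31.5),
    "sd28": (19.4, 18.7, 32.2),
    "sd29": (19.3, 19.6, 32.6),
}


def ios_per_revolution(reads, writes, revs_per_sec=REVS_PER_SEC):
    """Total I/Os per second divided by revolutions per second."""
    return (reads + writes) / revs_per_sec


for disk, (reads, writes, svc_ms) in SAMPLE.items():
    per_rev = ios_per_revolution(reads, writes)
    print(f"{disk}: {reads + writes:5.1f} I/O per second, "
          f"{per_rev:.2f} per revolution, svc_t {svc_ms} ms")
```

Every disk comes out well under one I/O per revolution, which is consistent with the heads seeking between requests rather than streaming sequentially.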
In your column in issue 3/97 of SunWorld (http://www.sunworld.com/swol-03-1997/swol-03-perf.html/) you write: "Using the rnode cache ...nrnode is set to twice the value of ncsize. It doesn't normally need tuning, but if the DNLC is increased, nrnode increases as well."
I cannot reproduce that here in 2.5.1.
1) SunOS unicorn 5.5.1 Generic_103640-05 sun4m sparc SUNW,SPARCclassic
untweaked /etc/system, 64 MB:
root@unicorn> adb -k /dev/ksyms
physmem 3e4c
ncsize/D
ncsize:
ncsize: 1144
nrnode/D
nrnode:
nrnode: 1144
ufs_ninode/D
ufs_ninode:
ufs_ninode: 1144
(Note that nrnode=ncsize, not twice that value.)
2) SunOS celan 5.5.1 Generic_103640-05 sun4u sparc SUNW,Ultra-1
ncsize=4362 in /etc/system, 128 MB:
root@celan> adb -k /dev/ksyms
physmem 3de9
ncsize/D
ncsize:
ncsize: 4362
nrnode/D
nrnode:
nrnode: 2181
ufs_ninode/D
ufs_ninode:
ufs_ninode: 4362
That is, on the latter machine I have doubled ncsize, but nrnode doesn't increase automatically. Of course, you probably get a lot of mail from people asking about performance issues, so I don't even expect you to reply. Just wanted to keep you informed about that "contradiction."
Christian
Adrian responds:
Thanks for pointing this out. I described the way it used to work, but it seems they have changed the code more recently. I'll have to re-check the source.
I just started using your product called Virtual Adrian and find it to be a very useful tool. I have been on the Internet to get a better understanding of the meaning of the statistics represented. I have not been able to get an explanation of the field "delay" within the "AGGREGATE DISK INFORMATION" screen. This is of concern to me because I am seeing very high numbers in that field (like 1135), when my svc time is only 27 ms and no queuing is reported for that particular disk. I'm seeing stats like this for several disks on and off throughout the day. Could you please explain how that field is calculated and what it is representing? Thank you in advance for your help.
Deb Platt
The Hartford Insurance Group
Adrian responds:
The easy way to find out is to read the se code:
Delay is the service time multiplied by the number of I/Os, so a
disk that has little activity has a low delay and a disk that is
used a lot has delay that depends on its service time. The idea is
that each disk I/O delays the progress of a process in the system,
and delay is the total amount of delay caused to the system by that
disk. When tuning, start with the disk that has caused the highest
delay, as it will have the most impact.
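As a rough illustration of that definition, the sketch below ranks disks by delay. It is not the SE toolkit's code: the function names and the sample figures are made up for the example, and whether SE counts I/Os per second or per sampling interval is left as an open assumption.

```python
def disk_delay(svc_time_ms, io_count):
    """Delay as described above: service time multiplied by the number of I/Os."""
    return svc_time_ms * io_count


def rank_by_delay(disks):
    """Return (name, delay) pairs, highest delay first.

    `disks` maps a disk name to (service time in ms, number of I/Os).
    """
    delays = {name: disk_delay(svc, ios) for name, (svc, ios) in disks.items()}
    return sorted(delays.items(), key=lambda item: item[1], reverse=True)


# Hypothetical figures, not taken from any real system
example = {"sd0": (25.0, 2), "sd1": (27.0, 42), "sd2": (12.0, 60)}

for name, delay in rank_by_delay(example):
    print(f"{name}: delay {delay:.0f}")
```

With these made-up numbers, a disk with a 27 ms service time and 42 I/Os reaches a delay of 1134 without any queue building up, so a large delay alongside a modest service time simply marks a disk that is used a lot.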
Thanks a lot for your very good columns in SunWorld. I always read them with great interest. Now I have an additional question about your 12/96 column Tips for TCP/IP monitoring and tuning (http://www.sunworld.com/swol-12-1996/swol-12-perf.html/):
1. Is there a possibility of resetting the statistics shown by
netstat -i or netstat -s?
2. How is an error defined in the netstat -i output? When is the
error counter increased for Inerr and Outerr? I get errors on the
Ethernet, but only when the load is high on the segment, so I want to
investigate where they come from. For that, it would be nice to have
answers to both questions.
Peter Haag
Adrian responds:
1. No, but it's easy to read them and then look for differences using
an SE script. SE includes netstatx.se, which would be a good place to
start.
2. It implies that the Ethernet frame was corrupted, or that the
link went up or down (you would have seen a no carrier message). To
debug it you need a proper network analyzer, as you may have bad
cabling that is causing the problem.
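The read-then-compare approach can be sketched outside SE as well. The example below is not netstatx.se; it assumes the cumulative counters have already been collected into dictionaries (for instance by parsing netstat -i output, which is not shown), and the field names and figures are illustrative only.

```python
def counter_deltas(before, after):
    """Per-interface difference between two snapshots of cumulative counters."""
    deltas = {}
    for iface, counters in after.items():
        earlier = before.get(iface, {})
        deltas[iface] = {
            name: value - earlier.get(name, 0)
            for name, value in counters.items()
        }
    return deltas


# Hypothetical snapshots taken some interval apart
first = {"le0": {"ipkts": 1000000, "ierrs": 12, "opkts": 900000, "oerrs": 1}}
second = {"le0": {"ipkts": 1004500, "ierrs": 15, "opkts": 903200, "oerrs": 1}}

for iface, delta in counter_deltas(first, second).items():
    print(iface, delta)
```

Comparing snapshots this way gives the errors for a chosen interval, which is what matters when the errors only appear under load.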
I've been reading your column almost since it started and also have your book, both of which I find extremely useful (to put it mildly). I work on the Teknekron (TIBco) trading system product and have been for the past three years or so. Currently I'm at JP Morgan, working as a project manager for its Teknekron system. One of the projects I'm working on is to integrate Ultras into the existing Teknekron infrastructure, which consists of mostly SPARC20s with some SPARC10s, running a mix of SunOS4.1.3_U1, Solaris 2.4 and Solaris 2.5.1.
I don't know how much you know about Teknekron, but the background to my problem is that the current udp buffer size set in the Teknekron network daemon needs to be increased from 50 kb to 128 kb to ensure that the Ultras don't overflow the buffer on the slower machines on the network, which would cause a re-request storm. This buffer increase works under Solaris 2.5.1, but does not work under Solaris 2.4/1.1.1.
I've been looking into this and it appears to be a problem with lack
of support for RFC 1323. The udp buffer size on Solaris 2.4/1.1.1
seems to be fixed at 50 k, and I cannot find a way to increase it.
Also, if you do an ndd on /dev/udp, a new parameter (udp_max_buf) is
available under 2.5.1, but not 2.4. This is set at 256 kb on 2.5.1,
which is great. I can't find a patch that might provide this support,
so do you have any information that might help me solve this problem?
Jon Kent
Adrian responds:
In general, only bug fixes are backported. In some cases this
includes performance bugs. Have you checked the very latest
patches for Solaris 2.4? SunOS 4.X is not going to change.
The simplest fix is to upgrade everything to Solaris 2.5.1, so you
can set a bigger buffer. If you want a patch to 2.4, then you need
to make a service call and ask for the 2.5.1 code to be backported,
and see what they say.
It seems like a design error that your protocol requires UDP buffers
to prevent overrun. It might be better to use TCP for this kind of
application, as it manages the buffering for you. Effectively you
seem to be using a TCP-like layer on top of UDP. With all the tuning
of TCP for Web traffic, TCP can be faster and lower overhead than
UDP in some cases.
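For reference, an application normally asks for a larger UDP receive buffer with the standard SO_RCVBUF socket option, and the kernel may cap, adjust, or refuse the request (on Solaris 2.5.1 the ceiling is presumably the udp_max_buf parameter mentioned in the letter). The sketch below makes the request and reads back what was granted. It is not the Teknekron daemon's code, and the 128 KB figure is simply the target from the letter.

```python
import socket

REQUESTED = 128 * 1024  # target buffer size from the letter, in bytes


def request_udp_rcvbuf(size=REQUESTED):
    """Ask for a UDP receive buffer of `size` bytes.

    Returns the size the kernel reports afterwards, or None if the
    request itself was rejected.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
        except OSError:
            return None  # some systems refuse requests above their limit
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()


granted = request_udp_rcvbuf()
if granted is None:
    print("the request was rejected outright")
else:
    print(f"requested {REQUESTED} bytes, kernel reports {granted} bytes")
```

Reading the value back matters because a successful call does not guarantee the full size; some systems silently reduce the request to their configured maximum.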
It is very helpful to read your column as a Solaris system administrator. I want to ask you two technical questions.
When I use SE2.4 to monitor our UltraSPARC:
1. It reports a very heavy load on the system disk, which contains
four file systems (/, swap, /var, and /opt), and the color is often
red. When I use fuser -c to check the processes using these file
systems, it reports more than 50 related processes. We have a
30-gigabyte SPARCstorage subsystem. I want to balance the disk load.
Can you teach me how to do it?
2. When I use SE 2.5 to do quick tune, it always reports as follows when it checks the network condition: "Error on interface le0 (ierror=0, oerror=1)" Can you tell me what it is and how can I repair it?
I appreciate your help and time. Your last tip to install a patch for Solaris helped me a lot.
Zhang Shiming
System Engineer
Beijing PDN Xinren Information Technology Co. Ltd.
Adrian responds:
1. Use /usr/ucb/ps uax | head to see which processes are active, then
use truss -p on those processes to see if they are doing disk
accesses and /usr/proc/bin/pfiles to see what files they have open.
If your system is paging/scanning hard then the swap partition will
be overloaded and you need to add RAM or add another swap file on a
separate disk.
2. One output error is OK, it is probably caused by disconnecting
the Ethernet cable at some time in the past. The quick tune looks at
totals since boot, which may not be as useful as looking at a timed
interval.
Someone logged into my system -- I run Solaris 2.5 on a SPARC computer. That person logged into the ftp port on someone else's account and erased the account's contents. I am not sure if I can ask you this, but could you help me or direct me to someplace where I could find out how to disable the in.ftpd and in.telnetd services for all hosts except a few? That is, those services would be available only to a few hosts instead of all of them. I know how to disable a service for everyone or allow it for everyone, but have no idea how to disable it only for certain hosts. Thank you in advance for any help you might give me.
Chrys
Peter responds:
You should look into the rpcbind package (section 3.5 of the FAQ). That, or the subcomponent tcp-wrappers, will let you do exactly what you ask for.
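As a hedged sketch of what that looks like with tcp-wrappers: access is controlled by two files, /etc/hosts.allow and /etc/hosts.deny, and the services must be started through the tcpd wrapper from inetd.conf. The host names, the network prefix, and the location of tcpd below are placeholders that depend on your site and on how the package was installed.

```
# /etc/hosts.allow -- consulted first; a matching entry grants access
in.ftpd, in.telnetd: trusted1.example.com, 192.168.10.

# /etc/hosts.deny -- consulted next; a matching entry refuses access
in.ftpd, in.telnetd: ALL

# /etc/inetd.conf -- the service is started through tcpd (path is an assumption)
ftp     stream  tcp     nowait  root    /usr/local/bin/tcpd     in.ftpd
telnet  stream  tcp     nowait  root    /usr/local/bin/tcpd     in.telnetd
```

With entries like these, only the listed hosts can reach ftp and telnet, and connections from anywhere else are refused before the daemon starts.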
I just read your intro to cryptography in SunWorld. Clear and very helpful. Thanks!
Leif Smith
I feel the In-box Direct message is too long. If I wanted the whole table of contents, I would go to your Web site! Otherwise it's o.k.
Name and Affiliation withheld
Stephen Lawton responds:
Dear subscribers:
In-box Direct letters, which are sent in HTML format, are sent only
to those readers who signed up with Netscape's In-box Direct. Because
you signed up through In-box Direct, we do not know as
much about you as a reader as we would otherwise. And because you
signed up through the Netscape site, we make certain assumptions
about you, such as your probable use of the Netscape mail reader.
For this reason, we can send you the full table of contents
at the beginning of each month with live links.
This is a service we will be expanding to all of our subscribers
-- the ability to get their reader letters in HTML format. I'm
sorry that you find a complete table of contents overkill, but other
readers have told us that they appreciate getting the complete
directory of all stories via e-mail -- that it helps them return to
the site more often.
Our reader letters are designed to help you, our subscribers,
keep track of the new stories on our site. If you're reading this
and are not a subscriber, I encourage you to fill out the
Subscription Form today. You'll get a letter from SunWorld each
week updating you on what's new on the site, important breaking
stories, and early looks at stories that will be appearing on the
site later in the month.
Thanks for your continued support of SunWorld. We look
forward to serving you for a long time to come.
Stephen Lawton
Editor-in-Chief
If you have technical problems with this magazine, contact webmaster@sunworld.com
URL:
http://www.sunworld.com/swol-04-1997/swol-04-letters.html