Readers speak out:
April Letters to the Editor

Fine tuning proxy servers

To Adrian Cockcroft:

I've been trying to tune our proxy servers which run the Harvest cached 3.0 code. Peter Danzig is working on the code to make improvements. But I think we are getting hung up on disk I/O. We have a Sun Ext Disk pack 6x2Gig FW SCSI for disk caching and each disk is Translog'd using 64 MByte log for the 1.8 GB master.

Here is a snapshot of the iostat during peak periods. 5 minute sample time:

 
                                 extended disk statistics       tty         cpu
 disk      r/s  w/s   Kr/s   Kw/s wait actv  svc_t  %w  %b  tin tout us sy wt id
 fd0       0.0  0.0    0.0    0.0  0.0  0.0    0.0   0   0    0    0 26 29 44  2
 sd0       0.8  0.9    6.2   13.7  0.0  0.0   25.0   0   2
 sd1       0.0  0.0    0.0    0.0  0.0  0.0    0.0   0   0
 sd17      0.0  0.0    0.0    0.0  0.0  0.0    0.0   0   0
 sd18      0.0  0.0    0.0    0.0  0.0  0.0    0.0   0   0
 sd19      0.0  0.0    0.0    0.0  0.0  0.0    0.0   0   0
 sd24     13.4 12.7   95.9   50.8  0.0  0.7   26.5   0  32
 sd25     20.5 22.0  153.3   83.1  0.0  1.3   31.4   0  54
 sd26     19.7 19.3  142.2   73.0  0.0  1.2   31.5   0  52
 sd27     16.8 16.6  122.3   63.2  0.0  1.1   31.5   0  43
 sd28     19.4 18.7  140.3   71.2  0.0  1.2   32.2   0  50
 sd29     19.3 19.6  139.6   75.7  0.0  1.3   32.6   0  50

Any ideas on what to do?

Ron Tazuma

Adrian responds:

It's not clear whether you have the disks striped together or have six separate sections. I assume they are separate.

Since you also have a bunch of idle disks, it would make a lot of sense to move the 64 MByte trans logs from the main disks to the idle disks. This would free up the SCSI bus and the main disk spindles in the 6x2G disk pack.

Right now, you are getting about 30-50 I/O per second, at 30-50 percent busy, on disks that spin at 120 rps, so you are getting less than one disk I/O per disk revolution and that implies a random access pattern. The logs should be sequential, so move them onto another idle disk where they can run independently and hopefully get some sequential activity.
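The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the r/s and w/s figures are copied from the iostat sample in the letter, the 120 revolutions per second is the spin rate quoted above, and the function names are made up for this example.

```python
# Per-disk (r/s, w/s) copied from the iostat sample in the letter.
SAMPLE = {
    "sd24": (13.4, 12.7),
    "sd25": (20.5, 22.0),
    "sd26": (19.7, 19.3),
    "sd27": (16.8, 16.6),
    "sd28": (19.4, 18.7),
    "sd29": (19.3, 19.6),
}

REVS_PER_SECOND = 120.0  # spin rate quoted in the reply (7200 rpm)


def io_per_revolution(reads_per_s, writes_per_s, revs_per_s=REVS_PER_SECOND):
    """Average number of I/Os completed per disk revolution."""
    return (reads_per_s + writes_per_s) / revs_per_s


def summarize(sample):
    """Map each disk to (total I/O per second, I/O per revolution)."""
    return {
        disk: (r + w, io_per_revolution(r, w))
        for disk, (r, w) in sample.items()
    }


if __name__ == "__main__":
    for disk, (iops, per_rev) in summarize(SAMPLE).items():
        print(f"{disk}: {iops:5.1f} I/O per second, {per_rev:.2f} I/O per revolution")
```

Every disk in the sample comes out well under one I/O per revolution, which is what suggests a random access pattern rather than sequential transfers.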

30 ms service times are not good. With better layout or NVRAM (SPARCstorage Array) it should be possible to get closer to 10 ms. (I have seen 3-6 ms for some very well-optimized SSA workloads.)

Re-checking the rnode source

To Adrian Cockcroft:

In your column in issue 3/97 of SunWorld (http://www.sunworld.com/swol-03-1997/swol-03-perf.html/) you write: "Using the rnode cache ...nrnode is set to twice the value of ncsize. It doesn't normally need tuning, but if the DNLC is increased, nrnode increases as well."

I cannot reproduce that here in 2.5.1.

 
1) SunOS unicorn 5.5.1 Generic_103640-05 sun4m sparc SUNW,SPARCclassic
    untweaked /etc/system, 64 MB:
 
 root@unicorn> adb -k /dev/ksyms
 physmem 3e4c
 ncsize/D
 ncsize:
 ncsize:         1144
 nrnode/D
 nrnode:
 nrnode:         1144
 ufs_ninode/D 
 ufs_ninode:
 ufs_ninode:     1144
 
 (Note that nrnode=ncsize, not twice that value.)
 
 2) SunOS celan 5.5.1 Generic_103640-05 sun4u sparc SUNW,Ultra-1
    ncsize=4362 in /etc/system, 128 MB:
 
 root@celan> adb -k /dev/ksyms
 physmem 3de9
 ncsize/D
 ncsize:
 ncsize:         4362
 nrnode/D
 nrnode:
 nrnode:         2181
 ufs_ninode/D
 ufs_ninode:
 ufs_ninode:     4362

That is, on the latter machine I have doubled ncsize, but nrnode doesn't increase automatically. Of course, you probably get a lot of mail from people asking about performance issues, so I don't even expect you to reply. Just wanted to keep you informed about that "contradiction."

Christian

Adrian responds:

Thanks for pointing this out. I described the way it used to work, but it seems they have changed the code more recently. I'll have to re-check the source.
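For readers who want to compare their own systems, the adb readings in the letter reduce to a couple of ratios. The sketch below is a rough check only: it assumes adb prints physmem in hexadecimal pages, and that the page size is 4 KB on sun4m and 8 KB on sun4u. The dictionary layout and function names are invented for the example.

```python
# Assumed page sizes in bytes for the two architectures in the letter.
PAGE_SIZE = {"sun4m": 4096, "sun4u": 8192}

# Values copied from the adb sessions in the letter (physmem is hex pages).
HOSTS = {
    "unicorn": {"arch": "sun4m", "physmem": 0x3E4C, "ncsize": 1144, "nrnode": 1144},
    "celan": {"arch": "sun4u", "physmem": 0x3DE9, "ncsize": 4362, "nrnode": 2181},
}


def memory_mb(host):
    """Usable memory in megabytes, from the page count and assumed page size."""
    return host["physmem"] * PAGE_SIZE[host["arch"]] / (1024 * 1024)


def rnode_ratio(host):
    """nrnode as a multiple of ncsize."""
    return host["nrnode"] / host["ncsize"]


if __name__ == "__main__":
    for name, host in HOSTS.items():
        print(f"{name}: {memory_mb(host):.1f} MB, nrnode/ncsize = {rnode_ratio(host):.2f}")
```

On these figures the ratio is 1.0 on the untuned machine and 0.5 on the machine where ncsize was raised in /etc/system, so in neither case is nrnode twice ncsize.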

Need help understanding some Virtual Adrian statistics

To Adrian Cockcroft:

I just started using your product called Virtual Adrian and find it to be a very useful tool. I have been on the Internet to get a better understanding of the meaning of the statistics represented. I have not been able to get an explanation of the field "delay" within the "AGGREGATE DISK INFORMATION" screen. This is of concern to me because I am seeing very high numbers in that field (like 1135), when my svc time is only 27 ms and no queuing is reported for that particular disk. I'm seeing stats like this for several disks on and off throughout the day. Could you please explain how that field is calculated and what it is representing? Thank you in advance for your help.

Deb Platt
The Hartford Insurance Group

Adrian responds:

The easy way to find out is to read the se code:

Delay is the service time multiplied by the number of I/Os, so a disk that has little activity has a low delay and a disk that is used a lot has delay that depends on its service time. The idea is that each disk I/O delays the progress of a process in the system, and delay is the total amount of delay caused to the system by that disk. When tuning, start with the disk that has caused the highest delay, as it will have the most impact.
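As a rough illustration of that definition (this is not the SE toolkit source), the calculation and the "worst disk first" ordering might look like the following in Python. The disk names and figures are hypothetical.

```python
def delay(service_time_ms, io_count):
    """Total delay attributed to a disk: service time multiplied by I/O count."""
    return service_time_ms * io_count


def rank_by_delay(disks):
    """Return disk names ordered from highest to lowest delay.

    disks maps a disk name to a (service_time_ms, io_count) pair.
    """
    return sorted(disks, key=lambda name: delay(*disks[name]), reverse=True)


# Hypothetical figures for one measurement interval.
EXAMPLE = {
    "busy": (27.0, 42),  # moderate service time, many I/Os
    "idle": (25.0, 2),   # similar service time, almost no I/Os
    "slow": (60.0, 5),   # poor service time, few I/Os
}

if __name__ == "__main__":
    for name in rank_by_delay(EXAMPLE):
        print(name, delay(*EXAMPLE[name]))
```

A disk with an ordinary 27 ms service time still tops the list once it handles a few dozen I/Os in the interval, which is how a delay figure above 1,000 can appear without any queuing.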

Resetting the statistics

To Adrian Cockcroft:

Thanks a lot for your very good columns in SunWorld. I always read them with great interest. Now I have an additional question about your 12/96 column Tips for TCP/IP monitoring and tuning (http://www.sunworld.com/swol-12-1996/swol-12-perf.html/):

1. Is there a way to reset the statistics shown by netstat -i or netstat -s?

2. How is an error defined in the netstat -i output? When is the error counter increased for Inerr and Outerr? I get errors on the Ethernet, but only when the load is high on the segment, so I want to investigate where they come from. But for that it would be nice to have the answers to both questions.

Peter Haag

Adrian responds:

1. No, but it's easy to read them then look for differences using an SE script. SE includes netstatx.se, which would be a good place to start.

2. It implies that the Ethernet frame was corrupted, or that the link went up or down (you would have seen a no carrier message). To debug it you need a proper network analyzer, as you may have bad cabling that is causing the problem.
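The read-then-difference approach from the first answer is not specific to SE. Here is a minimal Python sketch of the idea, assuming the counters have already been collected into dictionaries at two points in time. The counter names and values are hypothetical, and collecting them (for example by parsing netstat -i output) is left out.

```python
def counter_deltas(before, after):
    """Per-interface change in each counter between two snapshots.

    Each snapshot maps an interface name to a dict of cumulative counters.
    Counters missing from the earlier snapshot are treated as zero.
    """
    deltas = {}
    for iface, counters in after.items():
        previous = before.get(iface, {})
        deltas[iface] = {
            name: value - previous.get(name, 0)
            for name, value in counters.items()
        }
    return deltas


# Hypothetical cumulative counters taken some interval apart.
SNAPSHOT_1 = {"le0": {"ipkts": 1000, "ierrs": 0, "opkts": 800, "oerrs": 1}}
SNAPSHOT_2 = {"le0": {"ipkts": 1600, "ierrs": 3, "opkts": 1250, "oerrs": 1}}

if __name__ == "__main__":
    print(counter_deltas(SNAPSHOT_1, SNAPSHOT_2))
```

Working from differences has the same effect as a reset: errors that accumulated long ago drop out, and only what happened during the interval remains.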

Overflowing the buffer

To Adrian Cockcroft:

I've been reading your column almost since it started and also have your book, both of which I find extremely useful (to put it mildly). I work on the Teknekron (TIBco) trading system product and have been for the past three years or so. Currently I'm at JP Morgan, working as a project manager for its Teknekron system. One of the projects I'm working on is to integrate Ultras into the existing Teknekron infrastructure, which consists of mostly SPARC20s with some SPARC10s, running a mix of SunOS4.1.3_U1, Solaris 2.4 and Solaris 2.5.1.

I don't know how much you know about Teknekron, but the background to my problem is that the current udp buffer size set in the Teknekron network daemon needs to be increased from 50 kb to 128 kb to ensure that the Ultras don't overflow the buffer on the slower machines on the network, which would cause a re-request storm. This buffer increase works under Solaris 2.5.1, but does not work under Solaris 2.4/1.1.1.

I've been looking into this and it appears to be a problem with lack of support for RFC 1323. The udp buffer size on Solaris 2.4/1.1.1 seems to be fixed at 50 k, and I cannot find a way to increase it. Also, if you run ndd on /dev/udp, a new parameter (udp_max_buf) is available under 2.5.1, but not 2.4. This is set at 256 kb on 2.5.1, which is great. I can't find a patch that might provide this support, so do you have any information that might help me solve this problem?

Jon Kent

Adrian responds:

In general, only bug fixes are backported. In some cases this includes performance bugs. Have you checked with the very latest patches for Solaris 2.4? SunOS 4.X is not going to change.

The simplest fix is to upgrade everything to Solaris 2.5.1, so you can set a bigger buffer. If you want a patch to 2.4, then you need to make a service call and ask for the 2.5.1 code to be backported, and see what they say.
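Whether a particular release honors a bigger buffer can be tested from the application side by asking for it and reading back what the kernel actually granted. The following is a generic sockets sketch in Python, not Teknekron's code; the function name is made up, and the 128 KB figure is simply the size mentioned in the letter.

```python
import socket


def request_udp_receive_buffer(size_bytes):
    """Open a UDP socket, ask for a receive buffer, and report what was granted.

    Returns (socket, granted_bytes). Some systems refuse an oversized request
    with an error, others silently clamp or adjust it, so the granted value
    can differ from the requested one.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    except OSError:
        pass  # request refused; the default buffer size stays in effect
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted


if __name__ == "__main__":
    requested = 128 * 1024
    sock, granted = request_udp_receive_buffer(requested)
    print(f"requested {requested} bytes, kernel reports {granted} bytes")
    sock.close()
```

Comparing the requested and reported sizes on each OS release shows which machines are still limited to the smaller buffer.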

It seems like a design error that your protocol requires UDP buffers to prevent overrun. It might be better to use TCP for this kind of application, as it manages the buffering for you. Effectively you seem to be using a TCP-like layer on top of UDP. With all the tuning of TCP for Web traffic, TCP can be faster and lower overhead than UDP in some cases.

Balancing the disk load

To Adrian Cockcroft:

It is very helpful to read your column as a Solaris system administrator. I want to ask you two technical questions.

When I use SE2.4 to monitor our UltraSPARC:

1. It reports a very heavy load on its system disk which contains "/,swap,/var,/opt 4 file system," (and) the color is often red. When I use fuser -c to check the processes in our file system, it reports more than 50 related processes. We have a 30-gigabyte SPARCstorage subsystem. I want to balance the disk load. Can you teach me how to do it?

2. When I use SE 2.5 to do quick tune, it always reports as follows when it checks the network condition: "Error on interface le0 (ierror=0, oerror=1)" Can you tell me what it is and how can I repair it?

I appreciate your help and time. Your last tip to install a patch for Solaris helped me a lot.

Zhang Shiming
System Engineer
Beijing PDN Xinren Information Technology Co. Ltd.

Adrian responds:

1. Use /usr/ucb/ps uax | head to see which processes are active, then use truss -p on those processes to see if they are doing disk accesses, and /usr/proc/bin/pfiles to see what files they have open.

If your system is paging/scanning hard then the swap partition will be overloaded and you need to add RAM or add another swap file on a separate disk.

2. One output error is OK; it is probably caused by disconnecting the Ethernet cable at some time in the past. The quick tune looks at totals since boot, which may not be as useful as looking at a timed interval.

Protecting yourself from disabling telnet and ftp sessions

For Peter Galvin:

Someone logged into my system -- I run Solaris 2.5 on a SPARC computer. That person logged into an ftp port on someone else's account. He erased the account contents. I am not sure if I can ask you this, but could you help me or direct me to someplace where I could find out how to disable the in.ftpd & in.telnetd services for most hosts except for a few of them? So that, for example, those services would be available only for a few hosts instead of all of them. I know how to disable the service for all or allow all the users, but have no idea how to disable it only for certain hosts. Thank you in advance for any help you might give me.

Chrys

Peter responds:

You should look into the rpcbind package (section 3.5 of the FAQ). That, or the subcomponent tcp-wrappers, will let you do exactly what you ask for.

Kudos on the encryption primer

I just read your intro to cryptography in SunWorld. Clear and very helpful. Thanks!

Leif Smith

Why do your reader letters contain a table of contents?

I feel the In-box Direct message is too long. If I wanted the whole table of contents, I would go to your Web site! Otherwise it's o.k.

Name and Affiliation withheld

Stephen Lawton responds:

Dear subscribers:

In-box Direct letters, which are sent in HTML format, are sent only to those readers who signed up with SunWorld from Netscape Communications' In-box Direct page. Readers who sign up with SunWorld using the subscription form on our site get a different letter.

Because you signed up through In-box Direct, we do not know as much about you as a reader as we would otherwise. And because you signed up through the Netscape site, we make certain assumptions about you, such as your probable use of the Netscape mail reader. For this reason, we can send you the full table of contents at the beginning of each month with live links.

This is a service we will be expanding to all of our subscribers -- the ability to get their reader letters in HTML format. I'm sorry that you find a complete table of contents overkill, but other readers have told us that they appreciate getting the complete directory of all stories via e-mail -- that it helps them return to the site more often.

Our reader letters are designed to help you, our subscribers, keep track of the new stories on our site. If you're reading this and are not a subscriber, I encourage you to fill out the Subscription Form today. You'll get a letter from SunWorld each week updating you on what's new on the site, important breaking stories, and early looks at stories that will be appearing on the site later in the month.

Thanks for your continued support of SunWorld. We look forward to serving you for a long time to come.

Stephen Lawton
Editor-in-Chief

If you have technical problems with this magazine, contact webmaster@sunworld.com

URL: http://www.sunworld.com/swol-04-1997/swol-04-letters.html