SE Toolkit FAQ

Answers to the most frequently asked questions about SE

January 1998

Abstract

We released a new version of SE last month, and there are some common questions about the release which are answered in this FAQ. (1,000 words)

What do I need to watch out for when I'm using SE?

Rich and I get a lot of questions about SE on our se-feedback alias. To reduce the number of times we have to answer the same question, this month's column is a FAQ.

Where can I get a version of SE that runs on Solaris 2.6?
This is still the most common question! A new version, SE3.0, was introduced last month. It's a major rewrite with many new features. It supports Solaris 2.5, 2.5.1, and 2.6 on SPARC and x86 (older releases are no longer supported) and many more network interface types. Keep reading SunWorld to stay up to date. The SE distribution directory is now http://www.sun.com/sun-on-net/performance/se3.

Where can I get a version of SE that runs on Solaris 2.3 or 2.4?
We don't have enough time or test systems to support many releases, but you can use the previous SE2.5.0.2 release on Solaris 2.3 through 2.5.1. The SE2.5.0.2 distribution directory is http://www.sun.com/960601/columns/adrian/se2.5.0.2.

Why did Solaris 2.6 support take so long?
For Solaris 2.6 we had to selectively filter out the partition, tape, and NFS data from the iostat class and disk rule. The rule now makes sure it looks only at disks that contain partitions. We also took care of the TCP changes that are in 2.6 and were backported to 2.5.1; SE now looks for some key patches to see if they are installed and sets preprocessor #defines to cope with the changes.

Why do I get the error "Fatal: member: txunderruns vanished!: Near line 201"?
This problem occurs with SE2.5.0.2 and FDDI 5.0 interfaces. There are three possible fixes:

Upgrade to SE3.0.
Visit the SE2.5.0.2 download page and get the FDDI patch.
Add the latest patch to FDDI which should reinstate the metric that went missing.

Why do I get the error "Fatal: member: txunderruns0 vanished!: Near line 255"?
This problem occurs with SE3.0 and older FDDI interface code. Update your FDDI patch level or try running scripts with % se -DOLD_FDDI script.se

Why do I get the error "Fatal: member: defer vanished!: Near line 285"?
This problem occurs with Solaris 2.5 and hme interfaces. There are three possible fixes:

Upgrade to a later Solaris release.
Get the hme patch for Solaris 2.5.
As a temporary work-around, change the member "defer" to "missing1" in the ks_hme_network structure in /opt/RICHPse/include/kstat.se like this:
```
#if MINOR_VERSION >= 51
        ulong defer;
#else
        ulong missing1;
#endif
```

The next build of SE3 will figure this out automatically.

Why do I get the error "Fatal: member: framming vanished!: Near line 160"?
This problem occurs with Solaris 2.5.1 and le interfaces. The le patch for Solaris 2.5.1 corrects the spelling from framming to framing. SE3.0 tries to detect this patch, but if you don't have the patch directory in /var/sadm/patch it can't tell that the patch is installed. There are four possible fixes:

Upgrade to Solaris 2.6.
Reinstall the le patch for Solaris 2.5.1.
Create the directory /var/sadm/patch/103903-03 by hand.
As a temporary work-around, run scripts using % se -DLE_PATCH script.se to force the update.
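
The patch check described above comes down to looking for a directory named after the patch ID. The Python sketch below illustrates that idea only; it is not SE's own code. The function name patch_installed and the rule that any revision at or above the requested one counts as installed are assumptions made for this example.

```python
import os

def patch_installed(patch_dir, patch_id, min_rev):
    """Return True if patch_dir contains a directory <patch_id>-<rev> with rev >= min_rev.

    Hypothetical helper: the "revision at or above min_rev" rule is an
    assumption for illustration, not SE's documented behavior.
    """
    try:
        entries = os.listdir(patch_dir)
    except OSError:
        # No patch directory at all, so nothing can be detected.
        return False
    for name in entries:
        base, sep, rev = name.partition("-")
        if base != patch_id or not sep or not rev.isdigit():
            continue
        if int(rev) >= min_rev and os.path.isdir(os.path.join(patch_dir, name)):
            return True
    return False

if __name__ == "__main__":
    # The le patch named in the text, checked in the standard location.
    print(patch_installed("/var/sadm/patch", "103903", 3))
```

This also shows why creating the directory by hand works as a fix: a check of this kind only sees the directory name, not the patched driver itself.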

Why do my networks keep indicating BLACK?
You may see messages like this from virtual_adrian, or black states reported by zoom saying "Errors seen, fix hardware or cables."

Adrian detected slow net(s): Wed Dec  3 20:29:42 1997
Problem: network failure
State Name      Ipkt/s  Ierr/s  Opkt/s  Oerr/s Coll% NoCP/s Defr/s
black le0          2.4     0.1     0.8     0.0  0.00   0.00   0.00

You can see that it is reporting 0.1 input errors/s. Over the default 30-second period this is about three errors, which is more than one. Single errors generate a warning. Multiple errors generate this message. You may have a bad cable or a misconfigured Ethernet switch.

It is also possible that another system on the network is generating bad packets that appear as broadcast packets, so your Solaris system picks them up. This turns out to be very common with PCs, as there is a lot of flaky PC networking hardware in use. The way to track it down is to use snoop to capture packets until you find a bad one, then look at the Ethernet address that it came from. I found packets claiming to be old-version DECnet or Novell IPX packets that were being generated at random by a PC on a TCP/IP-only network. If the source address looks like 8:0:20:xx:xx:xx, then it is very likely to be a Sun system. If not, look for another type of hardware on that network segment.

You may also want to look at the raw network interface counters to see what kinds of errors are being picked up.
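
When sorting through a long snoop capture, the 8:0:20 prefix test can be automated. The following Python sketch assumes addresses are written as six colon-separated hexadecimal fields, with or without leading zeros; the function name is_sun_address is made up for this example.

```python
def is_sun_address(mac):
    """Return True if an Ethernet address starts with the 8:0:20 prefix."""
    parts = mac.strip().split(":")
    if len(parts) != 6:
        return False
    try:
        octets = [int(p, 16) for p in parts]
    except ValueError:
        # A field was empty or not hexadecimal.
        return False
    if any(o < 0 or o > 255 for o in octets):
        return False
    return octets[:3] == [0x08, 0x00, 0x20]

if __name__ == "__main__":
    for addr in ("8:0:20:1a:2b:3c", "0:a0:c9:1:2:3"):
        print(addr, is_sun_address(addr))
```

Comparing parsed octets rather than strings means 8:0:20 and 08:00:20 are treated as the same prefix.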

% netstat -k hme0
hme0:
ipackets 1282154 ierrors 298 opackets 2439971 oerrors 10 collisions 209327 
defer 8 framing 1 crc 0 sqe 0 code_violations 0 len_errors 1 
drop 0 buff 0 oflo 0 uflo 0 missed 0 tx_late_collisions 2 
retry_error 8 first_collisions 0 nocarrier 0 inits 7 nocanput 0 
allocbfail 0 runt 0 jabber 0 babble 0 tmd_error 0 tx_late_error 0 
rx_late_error 0 slv_parity_error 0 tx_parity_error 0 rx_parity_error 0 
slv_error_ack 0 tx_error_ack 0 rx_error_ack 0 tx_tag_error 0 
rx_tag_error 0 eop_error 0 no_tmds 0 no_tbufs 0 no_rbufs 0 
rx_late_collisions 0 rbytes 320156380 obytes 645775764 multircv 7184 multixmt 3 
brdcstrcv 449782 brdcstxmt 1295 norcvbuf 0 noxmtbuf 1

The command shown is specific to Solaris 2.6; on older releases, use % netstat -k | more to find the network data. The interface shown has had a few specific errors (framing, len_errors, retry_error), but not enough to worry about. The input errors are most probably coming from bad broadcast packets.
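
Picking the nonzero error counters out of a listing like this by eye is tedious. The Python sketch below assumes the output is a series of "name value" pairs as shown above, and uses an abbreviated copy of that listing as sample data. The function names are invented for the example, and other releases may format the output differently.

```python
# Three lines copied from the listing above, without the interface header.
SAMPLE = """\
ipackets 1282154 ierrors 298 opackets 2439971 oerrors 10 collisions 209327
defer 8 framing 1 crc 0 sqe 0 code_violations 0 len_errors 1
retry_error 8 first_collisions 0 nocarrier 0 inits 7 nocanput 0
"""

def parse_counters(text):
    """Parse 'name value name value ...' lines into a dict of integers."""
    counters = {}
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines and single-token headers such as an interface name.
        if len(tokens) < 2:
            continue
        for name, value in zip(tokens[0::2], tokens[1::2]):
            if value.isdigit():
                counters[name] = int(value)
    return counters

def nonzero(counters, names):
    """Return only the named counters whose value is above zero."""
    return {n: counters[n] for n in names if counters.get(n, 0) > 0}

if __name__ == "__main__":
    stats = parse_counters(SAMPLE)
    print(nonzero(stats, ["framing", "crc", "len_errors", "retry_error"]))
```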

It's up to you to figure out how to fix network errors; all my tool did was tell you that they were there in case you weren't looking for them. I can't figure out all possible failure modes for you. You can raise the threshold that SE uses to one error per second by running setenv ENET_ERROR_PROBLEM 1.0 before you run a script; you then won't see black states unless things get very bad.
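
To make the threshold arithmetic concrete, here is a rough Python sketch of how a rate threshold taken from the environment might be applied. It is an illustration, not the SE rule itself. DEFAULT_PROBLEM_RATE is a hypothetical placeholder rather than SE's real default, and the use of "greater than or equal" for the comparison is also an assumption.

```python
import os

# Hypothetical placeholder; the article does not state SE's default value.
DEFAULT_PROBLEM_RATE = 0.05

def error_threshold(env=None):
    """Read the errors-per-second threshold from ENET_ERROR_PROBLEM, if set."""
    env = os.environ if env is None else env
    raw = env.get("ENET_ERROR_PROBLEM")
    if raw is None:
        return DEFAULT_PROBLEM_RATE
    try:
        return float(raw)
    except ValueError:
        # Unparseable value: fall back to the placeholder default.
        return DEFAULT_PROBLEM_RATE

def errors_in_interval(rate, interval=30.0):
    """Convert an errors-per-second rate into a count over one interval."""
    return rate * interval

def is_problem(rate, env=None):
    """Assumed comparison: flag a problem when the rate reaches the threshold."""
    return rate >= error_threshold(env)

if __name__ == "__main__":
    print(errors_in_interval(0.1))                          # errors per 30-second period
    print(is_problem(0.1, {"ENET_ERROR_PROBLEM": "1.0"}))   # raised threshold
```

With the threshold raised to 1.0, the 0.1 errors/s reading in the example report would no longer be flagged.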

How can I see disk stripes, RAID units, etc. with SE?
You need to be using the latest DiskSuite. It automatically registers I/O kstats so that iostat shows mdXX performance. If you are using an older release, neither iostat nor SE will see the md devices. DiskSuite 4.1 is the version that started reporting data. Veritas Volume Manager does not generate I/O kstats, so SE cannot get at the data.

SE picks up mdXX, and the SE rules code filters out disks that don't have partitions on them, so only the top level md entries get picked up by tools like zoom and virtual_adrian. (This filtering is needed in 2.6 to filter out disk partition data.)

The RSM2000 is a hardware RAID controller-based system that uses dual redundant access to the RAID unit over two SCSI buses. Since there are two ways to get at each disk, with different controller and target numbers, a pseudo device is used. The RAID unit presents each RAID5 or stripe as if it were a single disk partition, but SE cannot figure out what the disk really is and omits it entirely. We know this is a problem, but it is hard to solve, and we haven't had time to figure it out yet.

Do my old modified scripts still work?
SE3.0 is an incompatible upgrade to the language specification. If you have your own custom scripts they will need some changes to the code and the APIs. Most of the changes are to make it more C-like. The call to time() now takes an argument. You have to use time(0). The addr() function is replaced by an ampersand (&) operator, although there is still no real pointer support. Some of the C interface APIs are a bit cleaner.

Many of the classes have also been upgraded. In particular the p_iostat_class now includes full disk name and partition information, so the path_to_inst class is no longer needed. The code takes longer to start up, but is much more efficient at runtime when you have a large number of disks. The set of #includes at the head of each script is slightly different to previous releases.

Does SE3 work reliably? How many people are using it?
SE3 seems to be off to a good start. So far several hundred users have used the notification e-mail to tell us that they have installed it, and the only problems we have seen so far are related to patch levels as listed above.

Wrap up
Remember that you have the source to the scripts. If you find a problem you may be able to fix it yourself. Any problems, suggested fixes, or queries about SE should go only to the se-feedback@chessie.eng.sun.com alias, which reaches Rich Pettit and me and is logged for posterity.

Disclaimer -- SE is an unsupported experimental toolkit that is designed to make it easy to rapidly generate prototypes and try out new ideas. It is not a production quality performance management product. Vendors of such products are welcome to use the ideas expressed in the SE toolkit to improve their products' ability to manage Solaris-based systems.

Acknowledgments -- Rich Pettit no longer works at Sun but has still found time to update SE. Please pay him back for his time and effort by taking a look at his real product, the Resolute Software RAPS -- Realtime Application Performance System. Mike Bennett wrote the tcp_monitor GUI and several of the TCP rules. Thanks to the TCP group at SunSoft for feedback on the rules, which should be considered experimental at this stage.

Resources

The SE3.0 Toolkit page http://www.sun.com/sun-on-net/performance/se3/
"At last! The updated SE release has arrived," December 1997 SunWorld Performance column http://www.sun.com/sunworldonline/swol-12-1997/swol-12-perf.html
Resolute Software http://www.resolute.com
See Adrian Cockcroft's frequently asked questions for answers to three dozen performance-related questions. Subjects covered include performance monitoring commands, tuning variables, logins and processes, how to interpret the output of performance measurements, and how to optimize Web servers and news servers. http://www.sun.com/sunworldonline/common/cockcroft.letters.html
virtual_adrian.se rule http://www.sun.com/951001/columns/adrian/column2.html
Interested in Web server performance? Go to SunWorld's Site Index http://www.sun.com/sunworldonline/common/swol-siteindex.html#webperf
If you want to build performance tools and utilities, get a copy of the SE Performance Toolkit Version 2.5.0.2 http://www.sun.com/960601/columns/adrian/se2.5.html
Adrian Cockcroft's profile (complete with low- and high-bandwidth bios) http://www.sun.com/950901/columns/adrian/adrian.html
A full listing of Adrian Cockcroft's other Performance Q&A columns in SunWorld http://www.sun.com/sunworldonline/common/swol-backissues-columns.html#perf

Other Cockcroft columns at www.sun.com

"New Release of the SE Performance Toolkit" http://www.sun.com/960301/columns/adrian/column7.html
"Solaris 2.5 Performance Update" http://www.sun.com/960201/columns/adrian/
"Confessions of an Ultra 1 User" http://www.sun.com/951107/columns/adrian/column3.html
"Advanced Monitoring and Tuning" http://www.sun.com/951001/columns/adrian/column2.html
"System Performance Monitoring" http://www.sun.com/950901/columns/adrian/column1.html

About the author
Adrian Cockcroft joined Sun Microsystems in 1988, and currently works as a performance specialist for Sun's Computer Systems Division. He wrote Sun Performance and Tuning: SPARC and Solaris and Sun Performance and Tuning: Java and the Internet, both published by Sun Microsystems Press Books.

If you have technical problems with this magazine, contact webmaster@sunworld.com

URL: http://www.sunworld.com/swol-01-1998/swol-01-perf.html