SE Toolkit FAQ
Answers to the most frequently asked questions about SE
We released a new version of SE last month, and there are some common questions about the release which are answered in this FAQ.
What do I need to watch out for when I'm using SE?
Rich and I get a lot of questions about SE on our se-feedback alias. To cut down on how often we have to answer the same question, this month's column is a FAQ.
Where can I get a version of SE that runs on Solaris 2.6?
This is still the most common question! A new version, SE3.0, was introduced last month. It is a major rewrite with many new features. It supports Solaris 2.5, 2.5.1, and 2.6 on SPARC and x86 (older releases are no longer supported) and many more network interface types. Keep reading SunWorld to stay up to date. The SE distribution directory is now http://www.sun.com/sun-on-net/performance/se3.
Where can I get a version of SE that runs on Solaris 2.3 or 2.4?
We don't have enough time or test systems to support many releases, so on Solaris 2.3 through 2.5.1 you can use the previous SE220.127.116.11 release. The SE18.104.22.168 distribution directory is http://www.sun.com/960601/columns/adrian/se22.214.171.124.
Why did Solaris 2.6 support take so long?
For Solaris 2.6, we had to selectively filter the partition, tape, and NFS data out of the iostat class and the disk rule. The rule now makes sure it only looks at disks that contain partitions. SE also copes with the TCP changes that appeared in 2.6 and were backported to 2.5.1. It looks for some key patches to see whether they are installed and sets preprocessor #defines accordingly.
Why do I get the error "Fatal: member: txunderruns vanished!: Near line 201"?
This problem occurs with SE126.96.36.199 and FDDI 5.0 interfaces. There are three possible fixes:
Why do I get the error "Fatal: member: txunderruns0 vanished!: Near line 255"?
This problem occurs with SE3.0 and older FDDI interface code. Update your FDDI patch level, or try running scripts with

    % se -DOLD_FDDI script.se
Why do I get the error "Fatal: member: defer vanished!: Near line 285"?
This problem occurs with Solaris 2.5 and hme interfaces. There are three possible fixes:
    #if MINOR_VERSION >= 51
        ulong defer;
    #else
        ulong missing1;
    #endif
The next build of SE3 will figure this out automatically.
Why do I get the error "Fatal: member: framming vanished!: Near line 160"?
This problem occurs with Solaris 2.5.1 and le interfaces. The le patch for Solaris 2.5.1 corrects the spelling from framming to framing. SE3.0 tries to detect this patch. If the patch directory is not in /var/sadm/patch, however, SE cannot tell that the patch is installed. There are four possible fixes:

1) Upgrade to Solaris 2.6.
2) Reinstall the le patch for Solaris 2.5.1.
3) Create the directory /var/sadm/patch/103903-03 by hand.
4) As a temporary work-around, run scripts using % se -DLE_PATCH script.se to force the update.
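SE makes this decision in its own scripting language, using preprocessor defines such as LE_PATCH. The C sketch below illustrates the idea only. It checks whether a patch directory exists and picks the kstat member name accordingly.

Several parts of it are assumptions:
- The function names are hypothetical.
- The patch root is passed in as a parameter so the check can be pointed at a test directory.
- Testing for the single revision 103903-03 is a simplification. SE's real check may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if <patch_root>/<patch_id> exists and is a
 * directory, which is how an installed patch shows up under /var/sadm/patch. */
int patch_installed(const char *patch_root, const char *patch_id)
{
    char path[256];
    struct stat st;
    int len;

    assert(patch_root != NULL && patch_id != NULL && strlen(patch_id) > 0);
    len = snprintf(path, sizeof path, "%s/%s", patch_root, patch_id);
    if (len < 0 || (size_t)len >= sizeof path)
        return 0; /* path too long to check; treat the patch as missing */
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Hypothetical helper: choose the name of the le framing-error counter.
 * The Solaris 2.5.1 le patch renames "framming" to "framing". */
const char *le_framing_member(const char *patch_root)
{
    return patch_installed(patch_root, "103903-03") ? "framing" : "framming";
}
```

On a live system you would call le_framing_member("/var/sadm/patch"). Creating the 103903-03 directory by hand, as in fix 3, changes the answer to "framing".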
Why do my networks keep indicating BLACK?
You may see messages like this from virtual_adrian, or black states reported by zoom saying "Errors seen, fix hardware or cables."
    Adrian detected slow net(s): Wed Dec 3 20:29:42 1997
    Problem: network failure
    State  Name  Ipkt/s  Ierr/s  Opkt/s  Oerr/s  Coll%  NoCP/s  Defr/s
    black  le0      2.4     0.1     0.8     0.0   0.00    0.00    0.00

You can see that it is reporting 0.1 input errors/s. Over the default 30-second period, that is about three errors. A single error generates a warning; multiple errors generate this message.

You may have a bad cable or a misconfigured Ethernet switch. Another system on the network may also be generating bad packets that appear as broadcast packets, so your Solaris system picks them up. This turns out to be very common with PCs, because a lot of flaky PC networking hardware is in use.

To track it down, use snoop to capture packets until you find a bad one, then look at the Ethernet address it came from. I once found packets claiming to be old-version DECnet or Novell IPX packets. They were being generated at random by a PC on a TCP/IP-only network. If the source address looks like 8:0:20:xx:xx:xx, it is very likely to be a Sun system. If not, look for another type of hardware on that network segment.

You may also want to look at the raw network interface counters to see what kinds of errors are being picked up.
    % netstat -k hme0
    hme1:
    ipackets 1282154 ierrors 298 opackets 2439971 oerrors 10 collisions 209327
    defer 8 framing 1 crc 0 sqe 0 code_violations 0 len_errors 1 drop 0 buff 0
    oflo 0 uflo 0 missed 0 tx_late_collisions 2 retry_error 8 first_collisions 0
    nocarrier 0 inits 7 nocanput 0 allocbfail 0 runt 0 jabber 0 babble 0
    tmd_error 0 tx_late_error 0 rx_late_error 0 slv_parity_error 0
    tx_parity_error 0 rx_parity_error 0 slv_error_ack 0 tx_error_ack 0
    rx_error_ack 0 tx_tag_error 0 rx_tag_error 0 eop_error 0 no_tmds 0
    no_tbufs 0 no_rbufs 0 rx_late_collisions 0 rbytes 320156380
    obytes 645775764 multircv 7184 multixmt 3 brdcstrcv 449782 brdcstxmt 1295
    norcvbuf 0 noxmtbuf 1
The command shown is Solaris 2.6 specific. On older releases, use

    % netstat -k | more

to find the network data. The interface shown has had a few specific errors (framing, len_errors, retry_error), but not enough to worry about. The input errors are most probably coming from bad broadcast packets.
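If you save netstat -k output and want to pull out individual counters such as ierrors or framing, the name/value layout shown above is easy to scan. The C sketch below is a hypothetical helper, not part of SE, which reads the kstats directly. It assumes the text is a whitespace-separated sequence of counter names, each followed by a number. Tokens with no number after them, such as the hme1: header, are skipped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find `name` in text of the form
 * "ifname: name1 value1 name2 value2 ..." as printed by netstat -k.
 * Returns 1 and stores the counter in *value if found, 0 otherwise. */
int netstat_k_lookup(const char *text, const char *name, unsigned long long *value)
{
    const char *p = text;
    char key[64];
    unsigned long long v;
    int used;

    assert(text != NULL && name != NULL && value != NULL);
    for (;;) {
        /* Read the next token as a candidate counter name. */
        if (sscanf(p, "%63s%n", key, &used) != 1)
            return 0; /* reached the end of the text without a match */
        p += used;
        /* A counter name is followed by a number; a header like "hme1:" is not. */
        if (sscanf(p, "%llu%n", &v, &used) == 1) {
            p += used;
            if (strcmp(key, name) == 0) {
                *value = v;
                return 1;
            }
        }
    }
}
```

Run against the hme1 output above, looking up "ierrors" would give 298 and "retry_error" would give 8.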
It's up to you to figure out how to fix network errors. All my tool did was tell you that they were there, in case you weren't looking for them, and I can't work out every possible failure mode for you. To raise the threshold that SE uses to one error per second, run

    setenv ENET_ERROR_PROBLEM 1.0

before you run a script. You then won't see black states unless things get very bad.
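The behaviour described in this answer can be modelled roughly in C as follows. This is a hypothetical simplification, not SE's rule code. It works in three steps:
- It converts the input error rate back into a count for the sample interval. For example, 0.1 errors/s over 30 seconds is about three errors.
- By default, it treats one error as a warning and more than one as a problem.
- When an ENET_ERROR_PROBLEM value is supplied, it flags a problem only once the rate reaches that many errors per second.

The enum names, the rounding, and the exact comparisons are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>

enum net_state { NET_OK, NET_WARNING, NET_PROBLEM };

/* Hypothetical: parse a per-second threshold such as "1.0".
 * Returns 1 and stores it in *out if the whole string is a non-negative number. */
static int parse_rate(const char *s, double *out)
{
    char *end;
    double v;

    if (s == NULL || *s == '\0')
        return 0;
    v = strtod(s, &end);
    if (end == s || *end != '\0' || v < 0.0)
        return 0;
    *out = v;
    return 1;
}

/* Hypothetical model of the input-error rule.
 * problem_env is the value of ENET_ERROR_PROBLEM, or NULL if it is unset. */
enum net_state classify_input_errors(double ierr_per_sec, double interval_secs,
                                     const char *problem_env)
{
    double threshold;
    long errors;

    assert(ierr_per_sec >= 0.0 && interval_secs > 0.0);
    /* Round the rate back to an error count for the interval. */
    errors = (long)(ierr_per_sec * interval_secs + 0.5);

    if (parse_rate(problem_env, &threshold)) {
        /* Raised threshold: only a sustained error rate counts as a problem. */
        if (ierr_per_sec >= threshold)
            return NET_PROBLEM;
        return errors >= 1 ? NET_WARNING : NET_OK;
    }
    /* Default model: one error is a warning, several are a problem. */
    if (errors > 1)
        return NET_PROBLEM;
    return errors == 1 ? NET_WARNING : NET_OK;
}
```

A caller would pass getenv("ENET_ERROR_PROBLEM") as the last argument. With the le0 figures above (0.1 errors/s over 30 seconds), the model reports a problem by default, but only a warning once the threshold is raised to 1.0.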
How can I see disk stripes, RAID units, etc. with SE?
You need to be using the latest DiskSuite, which automatically registers I/O kstats so that iostat shows mdXX performance. DiskSuite 4.1 is the version that started reporting this data. With an older release, neither iostat nor SE will see the md devices. Veritas Volume Manager does not generate I/O kstats, so SE cannot get at its data.
SE picks up mdXX, and the SE rules code filters out disks that don't have partitions on them, so only the top level md entries get picked up by tools like zoom and virtual_adrian. (This filtering is needed in 2.6 to filter out disk partition data.)
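The filtering rule in the paragraph above keeps a disk only if it has partitions, and drops the partition entries themselves. It can be sketched in C as a pass over kstat-style device names. This is a hypothetical illustration, not SE's rules code. It assumes partition entries are named disk,letter (as in sd0,a), and the device names in the example are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A partition entry looks like "sd0,a"; a whole-disk entry has no comma. */
int is_partition(const char *name)
{
    return strchr(name, ',') != NULL;
}

/* Returns 1 if any entry in names[] is a partition of `disk`, e.g. "sd0,a" for "sd0". */
int has_partitions(const char *const names[], size_t n, const char *disk)
{
    size_t len = strlen(disk);
    size_t i;

    for (i = 0; i < n; i++)
        if (strncmp(names[i], disk, len) == 0 && names[i][len] == ',')
            return 1;
    return 0;
}

/* Copies into out[] (room for n entries) every whole-disk name that has at
 * least one partition, dropping partitionless devices and the partition
 * entries themselves. Returns the number of names kept. */
size_t filter_disks(const char *const names[], size_t n, const char *out[])
{
    size_t kept = 0;
    size_t i;

    assert(n == 0 || (names != NULL && out != NULL));
    for (i = 0; i < n; i++)
        if (!is_partition(names[i]) && has_partitions(names, n, names[i]))
            out[kept++] = names[i];
    return kept;
}
```

Given the made-up list sd0, sd0,a, sd0,b, sd1, md10, md10,a, md11, filter_disks keeps only sd0 and md10.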
The RSM2000 is a system based on a hardware RAID controller. It uses dual redundant access to the RAID unit over two SCSI buses. Because each disk can be reached by two paths with different controller and target numbers, a pseudo device is used. The RAID unit presents each RAID5 volume or stripe as if it were a single disk partition. SE cannot figure out what the disk really is, so it omits it entirely. We know this is a problem, but it is hard to solve, and we haven't had time to work it out yet.
Do my old modified scripts still work?
SE3.0 is an incompatible upgrade to the language specification. If you have your own custom scripts, they will need some changes to the code and to the APIs they call. Most of the changes make the language more C-like. The call to time() now takes an argument, as in C. The addr() function is replaced by an ampersand (&) operator, although there is still no real pointer support. Some of the C interface APIs are a bit cleaner.
Many of the classes have also been upgraded. In particular, the p_iostat_class now includes full disk name and partition information, so the path_to_inst class is no longer needed. The code takes longer to start up, but it is much more efficient at runtime when you have a large number of disks. The set of #includes at the head of each script is slightly different from previous releases.
Does SE3 work reliably? How many people are using it?
SE3 seems to be off to a good start. So far several hundred users have used the notification e-mail to tell us that they have installed it, and the only problems we have seen so far are related to patch levels as listed above.
Remember that you have the source to the scripts. If you find a problem, you may be able to fix it yourself. Send any problems, suggested fixes, or queries about SE only to the se-feedback alias. It reaches both Rich Pettit and me, and it is logged for posterity.
Disclaimer -- SE is an unsupported experimental toolkit that is
designed to make it easy to rapidly generate prototypes and try out
new ideas. It is not a production quality performance management
product. Vendors of such products are welcome to use the ideas
expressed in the SE toolkit to improve their products' ability to
manage Solaris-based systems.
Acknowledgments -- Rich Pettit no longer works at Sun but has still
found time to update SE. Please pay him back for his time and effort
by taking a look at his real product, the Resolute Software RAPS -- Realtime
Application Performance System. Mike Bennett wrote the
tcp_monitor GUI and several of the TCP rules. Thanks to the TCP
group at SunSoft for feedback on the rules, which should be
considered experimental at this stage.
About the author
Adrian Cockcroft joined Sun Microsystems in 1988, and currently works as a performance specialist for Sun's Computer Systems Division. He wrote Sun Performance and Tuning: SPARC and Solaris and Sun Performance and Tuning: Java and the Internet, both published by Sun Microsystems Press Books.