Performance Q & A by Adrian Cockcroft

Do collision levels accurately tell you the real story about your Ethernet?

What do these collision numbers mean, and are there better alternatives for monitoring your network?

SunWorld
September  1998

Abstract
Ethernets suffer from collisions, but are the reported numbers accurate? And what do they actually mean? In this column we explain some odd effects, and introduce alternative measures. (1,600 words)


Q: Why do I have so many collisions on my Ethernet?

A: Collision levels are no longer a good way to decide whether an Ethernet is overloaded.

I'll start by explaining what a collision is, then look at how they are measured and reported, and how that causes problems. I'll end by pointing out alternative measurements that give a better indication of when a network is busy.

Ethernet was developed as a shared bus-like network. When the network is idle, any connected system can start to transmit. If two systems try to transmit at the same time they collide, back off for a randomly generated short delay, then try again. When a shared-bus Ethernet with many systems on it gets busy, contention increases to the point where activity is dominated by collisions, and overall throughput drops off significantly. In contrast, a small number of active systems can work quite efficiently, achieving high throughput over the network without causing too many collisions.
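As an aside, the retry delay comes from Ethernet's truncated binary exponential backoff: after the nth collision on a frame the sender waits a random number of slot times between 0 and 2^min(n,10)-1, and gives up after 16 attempts. The short sketch below, in plain C, illustrates the idea; the names and structure are illustrative, not taken from any driver.

#include <stdio.h>
#include <stdlib.h>

#define MAX_ATTEMPTS  16   /* after 16 tries the frame is dropped */
#define BACKOFF_LIMIT 10   /* the random range stops growing here */

/* number of slot times to wait after the nth collision on one frame */
static long backoff_slots(int collisions)
{
    int e = collisions < BACKOFF_LIMIT ? collisions : BACKOFF_LIMIT;

    return random() % (1L << e);   /* uniform in 0 .. 2^min(n,10)-1 */
}

int main(void)
{
    int n;

    srandom(1);
    for (n = 1; n < MAX_ATTEMPTS; n++)
        printf("collision %2d: wait %ld slot times\n", n, backoff_slots(n));
    return 0;
}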

More recently, Ethernet switches have been used to form star-like networks in which each system connects to a port on a central switch. The switch takes data from one port and passes it on to a second port without causing collisions on all the other ports. Each port can have multiple systems on it, so some collisions still occur on each port -- but this is less of a problem for the network overall.

The latest optimization is to connect only one system per port. In this scenario, data can be sent in both directions at once without colliding. Both the system and the switch need to correctly recognize this "full-duplex" mode, however; there can be problems when one end thinks it can transmit any time it likes and the other is only expecting to receive when it isn't transmitting. This tends to cause Ethernet errors, which will be reported in addition to collisions.

How are collisions measured?
Collisions are only detected by the systems which are trying to transmit. The total number of collisions on a network could be measured using special monitoring hardware or data from a central switch, but the usual number reported is the view from a single system. Each system on the network will see different throughput and collision rates for the same network.

Collisions are counted by the Ethernet interface hardware. The hardware is quite intelligent -- it is given a pointer to a list of packets to transmit and counts how many collisions occur while the packets are being sent. Different interfaces use different counting methods.

The older le interface based on the AMD LANCE design uses a two-bit counter: There are none, one, two, or "lots" of collisions. At high collision rates, this counter can sometimes under-report the real number of collisions on the interface.
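One way to picture the under-reporting is with a toy illustration in plain C (this is not the real LANCE register layout): any transmission that suffers three or more collisions is recorded as the same saturated value, so the real count is lost.

#include <stdio.h>

/* illustration only: a saturating two-bit count loses anything over two */
static int two_bit_count(int actual_collisions)
{
    return actual_collisions >= 3 ? 3 : actual_collisions;
}

int main(void)
{
    int n;

    for (n = 0; n <= 6; n++)
        printf("actual %d -> counted as %d\n", n, two_bit_count(n));
    return 0;
}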

The newer qe, be, hme, and qfe interfaces all use an eight-bit counter that doesn't miss any collisions. Whenever the counter wraps around, 255 collisions are added to the total for the interface. Intermediate values are not used. This causes some interesting side effects. For example, look at the following data, which shows collision rates of between 28 percent and 40 percent. Can you see what is happening?

% se netmonitor.se 10
netmonitor.se thresholds set at 100.0 packets/s and 10.0% collisions

netmonitor.se detected slow net(s): Mon Aug 10 15:44:33 1998
Name      Ipkt/s  Ierr/s  Opkt/s  Oerr/s  Coll/s Coll%
qe0         71.4     0.0    76.0     0.0    25.5 33.60
qe1         87.0     0.0    88.3     0.0    25.5 28.89

netmonitor.se detected slow net(s): Mon Aug 10 15:45:13 1998
Name      Ipkt/s  Ierr/s  Opkt/s  Oerr/s  Coll/s Coll%
qe0         71.2     0.0    72.6     0.0    25.5 35.16
qe1         64.1     0.0    65.2     0.0    25.5 39.14
qe2         76.3     0.0    72.8     0.0    25.5 35.07

netmonitor.se detected slow net(s): Mon Aug 10 15:45:43 1998
Name      Ipkt/s  Ierr/s  Opkt/s  Oerr/s  Coll/s Coll%
qe1         61.5     0.0    63.6     0.0    25.5 40.13

The command is running at a 10-second interval and only reports interfaces that have a collision rate higher than 10 percent. There are two strange things happening. The collision rate is always reported as 25.5 per second, and it does not show up in every 10-second interval on each interface. It's likely that the real collision rate is much lower, and that in most 10-second intervals the eight-bit counter does not wrap around. In those intervals, there is a zero collision rate, and nothing is reported. In the intervals where it does wrap around, 255 is divided by the interval of 10 to get 25.5, which is high enough to get reported.
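The wrap-only behavior is easy to simulate. The plain-C sketch below uses a made-up true collision rate; the point is that with 255 added only at each wrap, a 10-second sample reports either zero or 25.5 collisions per second, whatever the real rate is.

#include <stdio.h>

int main(void)
{
    int hw_counter = 0;     /* the 8-bit counter on the interface */
    long reported = 0;      /* the running total the driver maintains */
    long last = 0;          /* reported total at the previous sample */
    int interval = 10;      /* seconds per sample */
    int true_rate = 7;      /* made-up real collisions per second */
    int t, i;

    for (t = 1; t <= 60; t++) {
        for (i = 0; i < true_rate; i++) {
            if (++hw_counter == 256) {   /* the 8-bit counter wraps */
                hw_counter = 0;
                reported += 255;         /* driver adds 255 at a time */
            }
        }
        if (t % interval == 0) {
            printf("t=%2ds: %5.1f coll/s reported (true rate %d/s)\n",
                   t, (reported - last) / (double)interval, true_rate);
            last = reported;
        }
    }
    return 0;
}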

To test this theory, increase the measurement interval.

% se netmonitor.se 60
netmonitor.se thresholds set at 100.0 packets/s and 10.0% collisions

netmonitor.se detected slow net(s): Tue Aug 11 11:36:41 1998
Name      Ipkt/s  Ierr/s  Opkt/s  Oerr/s  Coll/s Coll%
qe2         88.1     0.0    82.9     0.0     8.5 10.29

netmonitor.se detected slow net(s): Tue Aug 11 11:37:41 1998
Name      Ipkt/s  Ierr/s  Opkt/s  Oerr/s  Coll/s Coll%
qe1         91.2     0.0    95.8     0.0    12.8 13.35

netmonitor.se detected slow net(s): Tue Aug 11 11:38:41 1998
Name      Ipkt/s  Ierr/s  Opkt/s  Oerr/s  Coll/s Coll%
qe1         76.5     0.0    82.6     0.0     8.5 10.32

netmonitor.se detected slow net(s): Tue Aug 11 11:39:41 1998
Name      Ipkt/s  Ierr/s  Opkt/s  Oerr/s  Coll/s Coll%
qe0         63.5     0.0    66.5     0.0     8.5 12.82
qe1         80.0     0.0    88.6     0.0    12.8 14.43
qe2         85.4     0.0    79.6     0.0     8.5 10.70

In any particular time interval you will get zero, 255, 510, or 765 reported collisions. Divide by the interval (60 seconds) to get the rates of zero, 4.25, 8.5, and 12.75 collisions per second. Netmonitor ignores anything under 10 percent, so at these packet rates, it will not report the times you got zero or 4.25 collisions per second.

If you want better averages, run measurements over a much longer time interval. Note that if you only report measurements at times when your network is very busy, the average collision rate will appear to be higher than it is.


So how can I tell if the network is busy?
The netmonitor.se command was written many years ago. I now have a better network monitoring rule, although it is still not as good as I'd like. What is really needed is a different rule for each kind of network interface, as they all report different data. The diversity of this data is a big problem for performance tool writers. It is also the main cause of problems with the SE toolkit, because the data definitions can change in each product release or patch.

The rule used by virtual_adrian.se and zoom.se tries to use a variable called defers instead of collisions. It is reported on most Ethernet interfaces and tends to remain at zero unless the network is congested. It counts the number of times a transmission was deferred to a future time -- so it's related to a slowdown in outgoing data. The new rule also looks at the error rates and at a variable called nocanput. This counts the number of times a received packet was discarded due to slow processing in the TCP/IP stack and lack of buffering on input. If you see a lot of these on Solaris 2.5.1, the TCP/IP fixes provided in patches and Solaris 2.6 should help reduce them. A faster CPU could also help. When a TCP/IP packet is discarded, the other system has to time out and retransmit, just as if the packet had been discarded by a congested router.
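The shape of that per-interface check can be sketched as follows. This is plain C, not the actual code from virtual_adrian.se; the field names mirror the nx.se columns and the threshold values are illustrative assumptions.

#include <stdio.h>

struct if_rates {
    double opkts;     /* output packets per second */
    double ierrs;     /* input errors per second */
    double oerrs;     /* output errors per second */
    double nocanput;  /* input packets dropped for lack of buffering, per second */
    double defers;    /* deferred transmissions per second */
};

/* illustrative thresholds -- not the values used by virtual_adrian.se */
static const char *check_interface(const struct if_rates *r)
{
    if (r->ierrs > 1.0 || r->oerrs > 1.0)
        return "sick: errors (check duplex settings and cabling)";
    if (r->nocanput > 1.0)
        return "sick: inbound packets being dropped (slow protocol stack)";
    if (r->opkts > 0.0 && r->defers > 0.0)
        return "busy: transmissions being deferred";
    return "ok";
}

int main(void)
{
    struct if_rates hme0 = { 85.0, 0.0, 0.0, 0.0, 3.2 };  /* example numbers */

    printf("hme0: %s\n", check_interface(&hme0));
    return 0;
}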

The SE toolkit also tries to obtain the data throughput in bytes for each interface, but not all interfaces support it -- and the names used vary. The byte data rate is available for the le and hme interfaces running on Solaris 2.6 or recent patches of 2.5.1. The SE toolkit doesn't report the data rate in megabits per second for each interface, but hme now contains a variable called ifspeed that is set to 10 or 100 megabits per second. Using this information, the data rate could be compared to the network bandwidth to see whether the utilization as seen by this system is high. For a single system on a switched full-duplex interface, this would be accurate.
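Such a utilization estimate boils down to arithmetic along these lines (plain C, with made-up variable names and example numbers; if ifspeed is reported in megabits it would first need converting to bits per second):

#include <stdio.h>

int main(void)
{
    double in_bytes_per_sec  = 650000.0;    /* measured over the interval */
    double out_bytes_per_sec = 120000.0;
    double ifspeed_bits      = 10000000.0;  /* 10 Mbit/s link, in bits/s */

    double in_util  = 100.0 * in_bytes_per_sec  * 8.0 / ifspeed_bits;
    double out_util = 100.0 * out_bytes_per_sec * 8.0 / ifspeed_bits;

    /* on full duplex, each direction has the whole bandwidth to itself */
    printf("input %.1f%%, output %.1f%% of %.0f Mbit/s\n",
           in_util, out_util, ifspeed_bits / 1e6);
    return 0;
}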

Updated monitoring tool
This data is summarized by a new version of nx.se. I also filed a Request for Enhancement (RFE) asking for this format to be added to Solaris 2 as netstat -x.

The nx.se format lists TCP as if it were an interface, with input and output segment and data rates, resets per second, outgoing connection attempt failures per second, the percentage of bytes retransmitted, and incoming and outgoing connections per second.

It then lists all the interfaces. For interfaces that provide this information (at present, only le and hme), nx.se reports kilobytes in and out. NoCP is nocanput; Defr is defer.

% /opt/RICHPse/bin/se nx.se
Current tcp RtoMin is 200, interval 5, start Thu Aug 27 00:23:09 1998

00:23:14 Iseg/s Oseg/s InKB/s OuKB/s Rst/s  Atf/s  Ret%  Icn/s  Ocn/s
tcp        0.0    0.2   0.00   0.01   0.00   0.00   0.0   0.00   0.00
Name    Ipkt/s Opkt/s InKB/s OuKB/s IErr/s OErr/s Coll% NoCP/s Defr/s
hme0       1.2    1.2   0.09   0.20  0.000  0.000   0.0   0.00   0.00
hme1       0.0    0.0   0.00   0.00  0.000  0.000   0.0   0.00   0.00
hme2       0.0    0.0   0.00   0.00  0.000  0.000   0.0   0.00   0.00

This modified script is available below, and it will become the default version when we release a new version of SE with proper support for the upcoming Solaris 2.7 release. At present, the Solaris 2.6 version of SE3.0 mostly works on the Solaris 2.7 beta, with the addition of a link from se.sparc.5.6 to se.sparc.5.7 in /opt/RICHPse/bin.
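For example, the link could be created with something like the following (check the exact file names in your installation first):

% ln -s /opt/RICHPse/bin/se.sparc.5.6 /opt/RICHPse/bin/se.sparc.5.7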

Wrap up
Thanks to Pat Militzer for asking the question and sending me the netmonitor.se data I used in this column.


Resources

Other Cockcroft columns at www.sun.com

About the author
Adrian Cockcroft joined Sun Microsystems in 1988, and currently works as a performance specialist for the Server Division of SMCC. He wrote Sun Performance and Tuning: SPARC and Solaris and Sun Performance and Tuning -- Java and the Internet, both published by SunSoft Press PTR Prentice Hall. Reach Adrian at adrian.cockcroft@sunworld.com.

Sidebar

nx.se


/*
 * Copyright (c) 1994 by Sun Microsystems, Inc.
 */
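
/*
 * nx.se: print TCP activity and per-interface network rates at a regular
 * interval, in the netstat -x style described in the column above.
 */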

#include <stdio.se>
#include <stdlib.se>
#include <string.se>
#include <mib.se>
#include <p_netstat_class.se>
#include <tcp_class.se>

#define SAMPLE_INTERVAL   5	/* 5 seconds to see what's happening */

main(int argc, string argv[3])
{
  p_netstat p_netstat$net;
  p_netstat tmp_netstat;
  tcp tcp$tcp;
  tcp tmp_tcp;
  int i;
  int interval = SAMPLE_INTERVAL;
  ulong now;
  tm_t tm_now;
  char tm_buf[16];
  long n;

  switch(argc) {
  case 1:
    break;
  case 2:
    interval = atoi(argv[1]);
    break;
  default:
    printf("use: %s [interval]\n", argv[0]);
    exit(1);
  }
  tmp_netstat = p_netstat$net;
  tmp_tcp = tcp$tcp;
  n = time(0);
  printf("Current tcp RtoMin is %d, interval %d, start %s", tmp_tcp.last.tcpRtoMin,
	interval, ctime(&n));
  fflush(stdout);
  for(;;) {
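    /* wake up every interval seconds and report TCP, then each interface */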
    sleep(interval);
    now = time(0);
    tm_now = localtime(&now);
    strftime(tm_buf, sizeof(tm_buf), "%T", tm_now);
    tmp_tcp = tcp$tcp;
    printf("\n%8s%7s%7s%7s%7s%6s%7s%6s%7s%7s\n", tm_buf, "Iseg/s", "Oseg/s",
      "InKB/s", "OuKB/s", "Rst/s", "Atf/s", "Ret%", "Icn/s", "Ocn/s");
    printf("tcp     %6.1f %6.1f %6.2f %6.2f %6.2f %6.2f %5.1f %6.2f %6.2f\n",
      tmp_tcp.InDataSegs, tmp_tcp.OutDataSegs,
      tmp_tcp.InDataBytes/1024.0, tmp_tcp.OutDataBytes/1024.0, 
      tmp_tcp.OutRsts, tmp_tcp.AttemptFails,
      tmp_tcp.RetransPercent,
      tmp_tcp.PassiveOpens,
      tmp_tcp.ActiveOpens); 
    printf("%-7s%7s%7s%7s%7s%7s%7s%6s%7s%7s\n",
      "Name", "Ipkt/s", "Opkt/s", "InKB/s", "OuKB/s",
      "IErr/s", "OErr/s", "Coll%", 
      "NoCP/s", "Defr/s");
    for(i=0; i < tmp_netstat.net_count; i++) {
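      /* select interface number i and read its current data */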
      p_netstat$net.number$ = i;
      tmp_netstat = p_netstat$net;
      printf("%-7s %6.1f %6.1f %6.2f %6.2f %6.3f %6.3f %5.1f %6.2f %6.2f\n",
          tmp_netstat.name$,
          tmp_netstat.ipackets,
          tmp_netstat.opackets,
          tmp_netstat.ioctets/1024.0,
          tmp_netstat.ooctets/1024.0,
          tmp_netstat.ierrors, tmp_netstat.oerrors,
          tmp_netstat.collpercent,
          tmp_netstat.nocanput,
          tmp_netstat.defer);
    }
    fflush(stdout);
  }
}
