ARP networking tricks
Slow network connections, machines that can't boot, and white lies told in response to ARP requests are just some things that make the ARP protocol persnickety. We give you tips for combating these problems.
This month we take a closer look at ARP, starting with the interactions between the ARP family of protocols and the boot process. We'll also examine ways of sending packets to IP networks and show you how to run multiple IP networks on the same physical network -- a crucial strategy for most networking transition and migration plans. (2,500 words)
In March we introduced the Address Resolution Protocol (ARP), a half-brother to the TCP/IP stack that maps the logical IP address space into the real world of Ethernet hardware addresses. We didn't venture very far away from the safe confines of a local area network and some simple troubleshooting, which is fine for introducing the relationships of ARP to IP and various network broadcast activities, but it doesn't describe the real real world very well. If you're reading this, you've encountered a byzantine complex of hosts, routers, desktops, and networks that will lay claim to some measure of high availability, redundancy and well-ordered connectedness. To keep the packets flowing from one end to the other, no matter what secret packet vacuum networks you encounter on the way, you'll need to play games with IP addresses or MAC addresses and fool clients' ARP requests into believing your temporary version of network reality -- however cobbled together it might be.
This month we're going to complicate our simple explanation of ARP with a closer look at stupid networking tricks, starting with the interactions between the ARP family of protocols and the boot process. We'll toss some routers into the fray, and look at ways of sending packets to IP networks that aren't quite where the routers think they are, and finally we'll show you how to run multiple IP networks on the same physical network, a necessary strategy for most networking transition and migration plans. Along the way we'll sprinkle in hints and pointers for dealing with things that go bump, ARP, or otherwise collide in the night.
Double reverse slam-dunks: Reversing the ARP protocol
As we saw in my March column, ARP is used to locate the hardware, or MAC, address for a given IP address. It takes a logical, 32-bit IP address like 126.96.36.199 and associates it with a 48-bit physical address like 8:0:20:45:7:b, uniquely identifying the machine with the noted IP address. ARP is the workhorse for establishing IP-level connections to new, previously uncontacted hosts.
What do you do if you know your own MAC address, but need to find an IP address to use? Such is the case with diskless clients, network computers, and PCs that are configured without a name service or file of hostnames and IP addresses -- they can look up their own MAC addresses in hardware, but have to ask some network authority to supply the correct IP address. The most common suppliers of the IP address are Sun's diskless workstation boot parameter (bootparam) remote procedure call, the bootp client booting protocol, and the Dynamic Host Configuration Protocol (DHCP). DHCP is described in RFC 2131, and is at the heart of many PC-LAN based TCP/IP configurations. Sun's diskless boot sequence uses Reverse ARP (RARP) to match the MAC address to a corresponding IP address; bootp and DHCP perform the same MAC-to-IP matching inside their own UDP-based exchanges.
RARP differs from ARP in more than just the direction of the lookup;
while ARP is a full-fledged part of a TCP/IP implementation, RARP is
almost always handled in a user-level process on the server side.
You'll see the
in.rarpd server daemon open
/dev/dlpi (in Solaris) or /dev/nit (in SunOS) and
extract raw RARP broadcast frames off the network. Why bother
implementing a network layer protocol in user space? The short answer
is "configuration control." RARP uses the /etc/ethers file or
ethers NIS map to correlate MAC addresses to IP
addresses; this is maintained by you, our gentle reader, in the same
manner that IP address-to-host name mappings are noted in a hosts
file. The diskless client booting protocol uses other configuration
files to indicate the location of the root and /usr
filesystems, and DHCP has its own set of user-level files as well.
More information can be found in the manual page for
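To make the /etc/ethers lookup concrete, here is a minimal shell sketch of the matching step a RARP server has to perform. It is not the in.rarpd source. The function name ethers_lookup is hypothetical, and the sketch assumes the usual two-column layout of a MAC address followed by a host name, comparing MAC addresses after folding case and stripping leading zeros.

```sh
# Look up the host name that an ethers-format file assigns to a MAC address.
# Usage: ethers_lookup MAC [FILE]   (FILE defaults to /etc/ethers)
# ethers_lookup is a hypothetical helper, not part of any Sun release.
ethers_lookup() {
    awk -v want="$1" '
        # Normalize a MAC so 08:00:20:0A:0B:0C and 8:0:20:a:b:c compare equal.
        function norm(mac,    n, i, parts, out) {
            n = split(tolower(mac), parts, ":")
            out = ""
            for (i = 1; i <= n; i++) {
                sub(/^0+/, "", parts[i])            # drop leading zeros
                if (parts[i] == "") parts[i] = "0"  # "00" becomes "0"
                out = out (i > 1 ? ":" : "") parts[i]
            }
            return out
        }
        BEGIN { want = norm(want) }
        /^[ \t]*#/ || NF < 2 { next }               # skip comments and blank lines
        norm($1) == want { print $2; found = 1; exit }
        END { exit (found ? 0 : 1) }                # non-zero status when no match
    ' "${2:-/etc/ethers}"
}
```

Running ethers_lookup 8:0:20:a:b:c prints the matching host name, or returns a non-zero status when no line matches, which is one way a RARP request can end up unanswered.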
We're taking this detour through RARP land primarily for completeness. ARP failures tend to be dramatic and affect entire network segments -- and that's when you're just using the dynamically inserted ARP entries. We'll look at further complications shortly. ARP troubles appear as slow or intermittent network connections, hideous network file access performance, and the occasional broadcast storm. RARP difficulties, on the other hand, tend to blow up one machine at a time. You'll see machines that hang during boot or fail to boot cleanly, or can't configure network interfaces correctly.
If you see messages of the form
revarp: no response for MAC address 8:0:20:a:b:c
while booting your desktop, it means that the system couldn't determine a valid IP address for itself during the boot sequence. Check your hostname configuration in /etc/nodename, /etc/hostname.le0 (or /etc/hostname.hme0 if you're Ultra-inclined) and in the /etc/hosts file; if there are inconsistencies your desktop may decide to ask the network to supply the missing IP bits. More likely, you have an empty /etc/hostname.le0 file, so the host configures its le0 interface but can't figure out the IP address from the (missing) host name. What you'll see if you look in /etc/init.d/rootusr is a line of the form
ifconfig -a auto-revarp
that indicates the RARP protocol should be used to configure the built-in interfaces.
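Since an empty hostname file is the likely culprit, a quick check can save a reboot or two. The following is a minimal sketch, assuming the /etc/hostname.<interface> naming convention described above; the function name empty_hostname_files is hypothetical, and it treats a file containing only whitespace as empty.

```sh
# List hostname.<interface> files that contain no host name at all.
# Usage: empty_hostname_files [DIR]   (DIR defaults to /etc)
# empty_hostname_files is a hypothetical helper name.
empty_hostname_files() {
    dir=${1:-/etc}
    for f in "$dir"/hostname.*; do
        [ -f "$f" ] || continue                    # the glob matched nothing
        # A file with no non-blank character leaves the interface nameless.
        grep -q '[^[:space:]]' "$f" || echo "$f"
    done
}
```

Any file the function prints belongs to an interface that would fall back on RARP to find its IP address at boot.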
Proxy votes: Hiding hideous hairballs with proxy ARP
A proxy server is simply one that "stands in" for the real McCoy, much as a stockholder casts a proxy vote at the annual meeting of a publicly traded company instead of attending the meeting to vote in person. Proxy ARP is a publicly available package that allows a machine to answer ARP requests for others that can't be there to answer the requests on their own behalf.
Here's an ideal example: you're supporting several remote workstations connected over PPP links to a router attached to one of your LAN campus segments. To conserve IP networks, you'd really like to keep the remote machines on the same IP network as the local ones, merely "giving away" a few IP addresses to the remote machines and their PPP end points on the router. While this is ideal from a configuration management point of view, it's a nightmare for your routers and other machines that need to reach these remote workstations. Creating host routes for the remote machines only partially solves the problem; those machines on the same IP network aren't going to be looking for a router to get to machines that are ostensibly on the same wire.
Proxy ARP fits the bill nicely -- it allows you to designate one machine to answer ARP requests for the remote machines, providing the MAC address of the router with the PPP end points on it. The router can then handle forwarding to the appropriate destination. Why wouldn't you use published ARP entries on the proxy ARP server? Publishing an ARP entry has the same effect as using proxy ARP, namely, the server answers ARP requests for the IP address in the entry, supplying the matching MAC address (not its own). However, published ARP entries fill a slot in the ARP cache, making this solution minimally scalable. Publishing ARP entries is adequate for a handful of remote machines; to support several dozen you'll do better with a proxy ARP server so that the server's ARP cache doesn't become half-full of infrequently used entries.
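For the handful-of-machines case, published entries are created with arp -s and the pub keyword. Here is a minimal sketch that only prints the commands so you can review them before running them as root; the function name publish_cmds and the host names in the usage line are hypothetical.

```sh
# Print (do not run) one "arp -s HOST MAC pub" command per remote host.
# Usage: publish_cmds ROUTER_MAC HOST...
# publish_cmds is a hypothetical helper name.
publish_cmds() {
    router_mac=$1
    shift
    for host in "$@"; do
        # "pub" publishes the entry, so this machine answers ARP requests
        # for HOST with the router's MAC address instead of its own.
        printf 'arp -s %s %s pub\n' "$host" "$router_mac"
    done
}
```

For example, publish_cmds 8:0:20:45:7:b remote1 remote2 prints two arp commands, one per remote workstation. Each published entry still occupies a slot in the ARP cache, which is why this approach stops scaling after a few machines.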
The big zero: Multiple IP networks on a single wire
Proxy ARP services let you spread a single IP network over several physical segments, saving the virtual real estate of your IP address space while minimizing complexity. How do you do the opposite, that is, put several logical IP networks on the same physical network segment? Better yet, why would you want to do such a thing in the first place? Think about migrating to a new IP addressing scheme, a new network topology, or merging two networks together. You're going to go through a phase where you want old and new network numbers to live together on the same wire, or where you've consolidated two or more LANs onto the same physical segment while migrating to Fast Ethernet, ATM, or an FDDI ring design. As Ethernet switches gain in popularity, the creation of virtual LANs (VLANs) often requires running more than one IP network on a series of ports switched like a single segment; rather than renumber your machines you can work around the IP routing issues with a clever combination of route definition and ARP fiddling.
There are two ways to solve this IP migration problem: supply each machine with routing information so it knows that both networks are on the same wire, or give each machine an IP address on each network so that it has explicit connectivity to each logical network. The multiple IP address per network interface trick is useful when you need to make a machine appear with several names, for example, when you're hosting Web content for more than one named host.
To merge the IP networks with the routing solution, you use a zero-cost route that indicates the target network is directly connected to the local machine. Let's assume the first IP network is 192.9.200.0, and the "new" network is 201.2.14.0. To add a route to the 201-network, you'd insert the following line in /etc/init.d/inetsvc:
huey# route add net 201.2.14.0 `cat /etc/hostname.le0` 0
Zero-cost routes imply that the local machine will send an ARP request for each machine on the target network, since that network is accessible over the local le0 interface. Normally, a host will send an ARP request using the IP address of the designated router for the network; but in this case the router is the machine itself so it does the ARP for the destination IP address. Strange-looking ARP requests for IP addresses that shouldn't be on a particular network segment may be caused by a misplaced zero-cost route.
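The choice of which address gets the ARP request can be modeled in a few lines of shell. This is a simplified sketch of the decision, not the kernel's routing code: it assumes a single directly connected network and one router, and the function names ip_to_int and arp_target are hypothetical. The addresses in the usage note borrow the 192.9.200.x and 201.2.14.x networks from this column, with made-up host numbers.

```sh
# Convert a dotted-quad IPv4 address (no leading zeros) to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1                      # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the address a host would ARP for when sending to DEST.
# Usage: arp_target DEST DIRECT_NET NETMASK ROUTER
arp_target() {
    dest=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    if [ $(( dest & mask )) -eq $(( net & mask )) ]; then
        echo "$1"                  # directly connected: ARP for the destination
    else
        echo "$4"                  # otherwise: ARP for the designated router
    fi
}
```

With the 201-network treated as directly connected, arp_target 201.2.14.7 201.2.14.0 255.255.255.0 192.9.200.1 prints the destination itself, while a destination on any other network yields the router's address.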
Here's a corner case where proxy ARP once again comes to the rescue: While merging two IP networks together, you have some "old" machines that have no need to reach other systems on the same network, but have to use the single router that has an IP address on the "new" IP network. That is, a router with a 201.2.14.x address has to handle packets from hosts 192.9.200.x, and you don't want to create zero-cost routes on every one of those machines. Using proxy ARP or a published ARP entry on one of the 201.2.14.x machines, create a "fake" IP address for the router on the 192.9.200.x network. Then insert an ARP entry using the MAC address of the router, so that ARP requests for the fake IP address are answered with the Ethernet address of your router, and packets will be delivered to the router from hosts on the 192.9.200.x network. Make sure those "old" machines put the fake address in their /etc/defaultrouter files, or otherwise manually set up a route through the "fake" IP address. Note that this address won't appear in RIP packets or other routing information, so you'll have to use a route command, an entry in /etc/gateways, or a default route to point your old machines at the router of multiple personalities.
More Multiple Personalities: Virtual interfaces
When you're renumbering a network or trying to make a machine impersonate multiple personalities, you need to create virtual interfaces on top of a single physical network interface. For SunOS users, you'll need a package called
written by John Ioannidis while at Columbia University. Solaris has
this feature built into the
ifconfig command and IP
implementation. Virtual interfaces use notations such as
le0:3 to number the virtual interfaces created on top of the
le0 device. Solaris 2.5 and earlier releases have
a limit of 256 virtual interfaces per physical network connection; this
limit is relaxed in Solaris 2.5.1/Internet supplement and later releases.
Turn on the virtual interface by adding lines like the following to a new boot script called /etc/rc2.d/S99vif:
huey# ifconfig le0:1 hostname-net1 broadcast + netmask +
huey# ifconfig le0:2 hostname-net2 broadcast + netmask +
There's no need to "plumb" the virtual interface because the necessary
STREAMS code was pushed onto the physical interface when it was
configured earlier in the boot process. Virtual interfaces simply add
more IP addresses to the same underlying device. The explicit
ifconfig command sets the broadcast address and netmask.
Virtual interfaces aren't handled nicely by the
/etc/hostname.le0 convention, as the boot code that reads
those files expects to plumb, configure, and RARP as needed. Life is
simpler when you're telling these little white IP lies, so stick the
configuration in a separate boot file.
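If you have more than a couple of virtual interfaces, generating the lines for that separate boot file beats typing them. The following is a minimal sketch that prints ifconfig commands in the same form used above, numbering the virtual interfaces from 1; the function name vif_cmds is hypothetical, and the output is meant to be reviewed and pasted into the boot script rather than run blindly.

```sh
# Print one ifconfig line per host name, numbering virtual interfaces from 1.
# Usage: vif_cmds DEVICE HOSTNAME...
# vif_cmds is a hypothetical helper name.
vif_cmds() {
    dev=$1
    shift
    n=1
    for name in "$@"; do
        # DEVICE:N names the Nth virtual interface on the physical device.
        printf 'ifconfig %s:%d %s broadcast + netmask +\n' "$dev" "$n" "$name"
        n=$((n + 1))
    done
}
```

Calling vif_cmds le0 hostname-net1 hostname-net2 reproduces the two commands shown earlier for le0:1 and le0:2.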
When you run multiple IP networks on the same physical segment, you may confuse some machines' routing tables. A typical problem is messages like "packet from unknown router" that appear when a machine receives a packet that comes from an IP address that does not appear to be on the local network. In our previous example, any machine on the 192.9.200.x network may receive a packet from a router using a 201.2.14.x IP address, but become confused when it finds that it has no direct connection to the 201.2.14.x network. Adding a second IP address on the local interface, or a zero-cost route, will dissipate the messages.
When you add second and subsequent IP addresses to an interface, you cause it to answer ARP requests for all of those addresses. The Solaris kernel maintains a list of IP addresses per interface and matches incoming ARP requests against this list. You'll see the same MAC address appearing in ARP caches tied to several IP addresses if you use the virtual interface mechanism, and that multiplicity could impair network management discovery tools or other scripts that rely on ARP cache browsing to build their view of the local network.
ARP is part of the kernel like the virtual memory system or the filesystem cache -- ideally you never have to deal with it or know that it's there, but when it breaks it wreaks havoc on your day to day operations. Knowing where to poke it, how to watch for trouble and trace it back will help you keep too many things from bumping into each other in the dark.
About the author
Hal Stern is a Distinguished Systems Engineer and the Chief Technology Officer for Sun Microsystems in the Northeast US. He's switched to writing every other month so he can finish various other projects, including four books, three feature stories, two small children, and a random leak in his garage. You can encourage or discourage this behavior by reaching Hal at email@example.com.