A TCP/IP primer

Here's what you need to know about the programmatic plumbing running the Internet (and your LAN).

November 1995

Abstract

Sure, you ride along on the information super-highway (this is the only time we'll use that abominable phrase), but do you know what the road is made of? We think not. People needing the basics should read on. (3,000 words)

Networking has become key to many business processes, and keeping the network running smoothly is now a mission-critical function. Now that the Internet has exploded onto the front pages of USA Today and is mentioned throughout the Wall Street Journal, the workload of system administrators who run sites, fix configuration problems, and resolve user complaints has grown accordingly.

This month, we're going to look at the business of making connections between TCP/IP end-points. It's a small subset of the broader network-administration problems of performance, security, and controlled-growth management, but it's also the most fundamental. If you can't get connected or you can't keep unwanted network entities from connecting to you, then the big picture gets obscured by the pile of complaints reading "the network is down."

Making TCP/IP connections appears to be fairly simple: You identify the other host, specify a service you want to use, and blast data down the wire, hoping that you'll enjoy some semblance of good performance and reliable security all the while. Meeting those expectations is what makes TCP/IP connection management an interesting problem. We'll review the mechanics of TCP/IP connections and service location and go through the details of linking two end-points over a socket.

Junction introduction: Getting linked

The first requirement for establishing a TCP/IP connection is the name of the other end. That name consists of two parts: the remote IP address and a port number. IP addresses are obtained through the Domain Name Service (DNS), Network Information Service (NIS), or the local /etc/hosts file, all of which map hostnames to IP addresses. The port number is a bit trickier because it depends on the service being used. Each host numbers ports starting at 1, with a separate set of ports for TCP and the User Datagram Protocol (UDP). Since each IP packet contains information about the higher-level protocol it is carrying, the two port sets cannot be confused. Port numbers below 1024 are known as reserved ports and can be opened only by processes running as root; port numbers 1024 and above are unrestricted.
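The two halves of an endpoint name map directly onto library calls. The Python sketch below resolves a hostname and applies the reserved-port rule described above. Python's socket module wraps the same C resolver routines. The helper names `resolve_endpoint` and `is_reserved_port` are our own, not part of any library.

```python
import socket

RESERVED_PORT_LIMIT = 1024  # ports below this can be opened only by root


def is_reserved_port(port: int) -> bool:
    """True for the reserved ports (1-1023); False for 1024 and above."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port < RESERVED_PORT_LIMIT


def resolve_endpoint(hostname: str, port: int) -> tuple[str, int]:
    """Pair a hostname's IPv4 address with a port number.

    gethostbyname() goes through the system resolver, so depending on the
    local configuration the answer may come from /etc/hosts, NIS, or DNS.
    """
    return socket.gethostbyname(hostname), port


if __name__ == "__main__":
    for port in (21, 23, 6000, 33313):
        kind = "reserved" if is_reserved_port(port) else "unrestricted"
        print(f"port {port}: {kind}")
    print(resolve_endpoint("localhost", 23))
```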

One way a budding TCP/IP connection determines the remote port number is to look in the /etc/services file (or the services.byname NIS database). This file correlates the well-known name of a service, the port number, and the protocol used for that service. Not every service is listed in /etc/services, since some connections are made to well-known (or hard-coded) port numbers. The X Window System, for example, uses port 6000 for its default window server. If you see cryptic port numbers floating around your network, use RFC 1340 to decipher them. This list of well-known port numbers will tell you, for example, that port 139, the NetBIOS session service, carries the Server Message Block (SMB) protocol of LAN Manager, even though it doesn't appear in most /etc/services files.
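The /etc/services format is simple enough to read by hand. Each line holds a service name, a port/protocol pair, and optional aliases, and `#` starts a comment. The sketch below parses that format from an inline sample modeled on Solaris entries. `parse_services` is a hypothetical helper, not a system API. The standard library's `socket.getservbyname()` performs the same lookup against the system's own services database.

```python
import socket

# Inline sample in /etc/services format (modeled on Solaris entries).
SAMPLE_SERVICES = """\
# service   port/proto   aliases
ftp         21/tcp
telnet      23/tcp
sunrpc      111/udp      rpcbind
sunrpc      111/tcp      rpcbind
nfsd        2049/udp     nfs
"""


def parse_services(text: str) -> dict[tuple[str, str], int]:
    """Build a {(name, protocol): port} map; aliases share their entry's port."""
    table: dict[tuple[str, str], int] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue                          # skip blank lines
        name, port_proto, *aliases = line.split()
        port_text, proto = port_proto.split("/")
        for service in (name, *aliases):
            table[(service, proto)] = int(port_text)
    return table


if __name__ == "__main__":
    services = parse_services(SAMPLE_SERVICES)
    print(services[("telnet", "tcp")])    # -> 23
    print(services[("rpcbind", "udp")])   # -> 111
    # The C library does the same lookup against the real services database:
    try:
        print(socket.getservbyname("telnet", "tcp"))
    except OSError:
        print("telnet is not listed in this system's services database")
```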

Every TCP/IP connection is uniquely identified by the local and remote IP addresses and the local and remote port numbers. Let's say you have five ftp sessions going at once, all connected from your desktop to port 21 on your public ftp server. How are the five distinguished when they all connect to the same remote IP address and port number? The differentiator is the local port number, which the local TCP/IP stack assigns when each connection is made.
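The four-part connection identity is easy to observe. This Python sketch stands in for the ftp example. Instead of port 21, which would need root, it opens five connections to a throwaway listener on the loopback interface and reports each connection's local and remote endpoints. The remote endpoint is the same every time; only the local port chosen by the stack differs. The function name `open_sessions` is our own.

```python
import socket


def open_sessions(count: int) -> list[tuple[tuple[str, int], tuple[str, int]]]:
    """Open `count` connections to one listening port; return (local, remote) pairs."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0: let the stack pick a free port
    server.listen(count)
    server_addr = server.getsockname()

    clients, accepted, pairs = [], [], []
    try:
        for _ in range(count):
            client = socket.create_connection(server_addr)
            clients.append(client)
            accepted.append(server.accept()[0])
            # getsockname(): our end, chosen by the stack; getpeername(): the server's end
            pairs.append((client.getsockname(), client.getpeername()))
    finally:
        for sock in clients + accepted + [server]:
            sock.close()
    return pairs


if __name__ == "__main__":
    for local, remote in open_sessions(5):
        print(f"{local[0]}.{local[1]} -> {remote[0]}.{remote[1]}")
```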

To get a list of active TCP/IP connections, use netstat -a:

```
huey% netstat -a
UDP
   Local Address      State
-------------------- -------
      *.sunrpc        Idle
      *.32782         Idle
      *.nfsd          Idle
      *.811           Idle

TCP
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q  State
-------------------- -------------------- ----- ------ ----- ------ -------
kfir.6000            kfir.33298           16384      0 16384      0 ESTABLISHED
kfir.33322           suntea01.45677        8760      0  8760      0 ESTABLISHED
      *.33313              *.*                0      0  8576      0 LISTEN
```

The first part of the listing shows UDP services, which are connectionless. Addresses are of the form host.port (netstat substitutes host and service names where it can), with a * indicating a "don't care" value. In the TCP section, you see both local and remote addresses, the send and receive window sizes (Swind and Rwind), and the send and receive queues (Send-Q and Recv-Q). The window size caps how much data may be in flight before the sender must wait for an acknowledgment. The queues show how many bytes are backed up, either not yet acknowledged by the remote end or not yet read by the local process. Finally, the state column tells you how the connection has progressed or fallen apart. The first TCP connection in the example is a process talking to the X Window server on port 6000. The second is a process talking to another machine (probably a database server, given the unknown port number). The final TCP connection is waiting to happen: it's a process waiting for connections on port 33313.
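Reading the address columns takes one trick. Hostnames and dotted-quad IP addresses contain dots themselves, so the port (or service name) is whatever follows the last dot. The Python sketch below parses rows of the TCP section shown above into their columns. The function names are hypothetical.

```python
SAMPLE_ROWS = [
    "kfir.6000            kfir.33298           16384      0 16384      0 ESTABLISHED",
    "kfir.33322           suntea01.45677        8760      0  8760      0 ESTABLISHED",
    "      *.33313              *.*                0      0  8576      0 LISTEN",
]


def split_address(addr: str) -> tuple[str, str]:
    """Split 'host.port' at its last dot; '*' on either side means "don't care"."""
    host, _, port = addr.rpartition(".")
    return host, port


def parse_tcp_row(row: str) -> dict:
    """Break one TCP row of Solaris-style `netstat -a` output into named columns."""
    local, remote, swind, send_q, rwind, recv_q, state = row.split()
    return {
        "local": split_address(local),
        "remote": split_address(remote),
        "send_window": int(swind),
        "send_queue": int(send_q),
        "recv_window": int(rwind),
        "recv_queue": int(recv_q),
        "state": state,
    }


if __name__ == "__main__":
    for row in SAMPLE_ROWS:
        entry = parse_tcp_row(row)
        print(entry["local"], "->", entry["remote"], entry["state"])
```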

We'll talk primarily about TCP (as opposed to UDP) services from here on, since we're looking at how connections are made and broken. TCP is a connection-oriented, or stream, protocol, so it handles the additional overhead of building, maintaining, and tearing down connections. Some of the debugging and security tips, like dealing with configuration files and instituting outside access controls, apply to both protocols, but we'll point out TCP-specific issues and problems.

Connection direction: The wonders of inetd and rpcbind

Before you can establish a connection, you need a process listening on the other end. Long-lived services, such as the in.routed routing daemon, are started at boot time and listen on their designated ports. Given the wide universe of services, however, it's not efficient to have several hundred servers sitting idle waiting for connections that may never be made. Furthermore, some services, like ftp or telnet, expect many connections to be made and need to spawn additional processes to handle sessions as they are created.

The inetd daemon plays intermediary for this procedure, listening on sockets for well-known services. The /etc/inetd.conf file determines which services inetd handles and which daemon it starts when a connection is made. When you issue telnet fuzzy, for example, your machine connects to port 23 on machine fuzzy, held open by inetd on the remote side. The inetd daemon forks and executes a copy of in.telnetd, the daemon associated with the telnet service, handing the open file descriptor for the incoming socket connection to the new telnet daemon. When adding a new Internet service to a machine, consider whether it should be managed by inetd or run as a stand-alone daemon. If you expect the service to be used sporadically, with multiple connections, inetd is a good managing agent. It starts daemons only when connections arrive and spawns a process for each session without overrunning the machine with idle processes.
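Each non-comment line of /etc/inetd.conf lists seven things: a service name, socket type, protocol, wait status, user, server program, and the program's arguments. The Python sketch below splits an entry into those fields. It is a hypothetical parser, not part of any Solaris tool, and it ignores the different format used for RPC entries. The sample line is modeled on the usual Solaris telnet entry. For a nowait service, inetd looks up the service name to find the port. When a connection arrives, it forks a child that runs the program as the named user, with the connected socket as its standard input and output.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InetdEntry:
    service: str       # name looked up in /etc/services to find the port
    socket_type: str   # "stream" (TCP) or "dgram" (UDP)
    protocol: str      # "tcp" or "udp"
    wait: bool         # False ("nowait"): fork a fresh server per connection
    user: str          # account the server runs as
    program: str       # path inetd executes
    args: list[str] = field(default_factory=list)  # argv, starting with argv[0]


def parse_inetd_line(line: str) -> Optional[InetdEntry]:
    """Parse one non-RPC inetd.conf line; return None for blanks and comments."""
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"malformed inetd.conf entry: {line!r}")
    service, socket_type, protocol, wait, user, program, *args = fields
    return InetdEntry(service, socket_type, protocol, wait == "wait", user, program, args)


if __name__ == "__main__":
    entry = parse_inetd_line("telnet stream tcp nowait root /usr/sbin/in.telnetd in.telnetd")
    print(entry)
```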

Of course, life isn't simple enough to leave you with only two sources of port numbers. RPC-based services, such as NIS, NFS, and the network lock manager (rpc.lockd), use a different arbiter of port numbers. An RPC service is identified by a 32-bit program number, a version number, and a protocol. The portmapper (Solaris 1.x) and rpcbind (Solaris 2.x) processes map these RPC identifiers to TCP or UDP port numbers. Each RPC service registers its program numbers, versions, and protocols with rpcbind at start-up, and rpcbind then happily hands the port numbers out to processes that ask for them. View the list of currently registered services using rpcinfo -p:

```
duey% rpcinfo -p
program vers proto   port  service
 100007    3   udp  32773  ypbind
 100007    3   tcp  32771  ypbind
 100011    1   udp  32782  rquotad
 100021    1   udp  32789  nlockmgr
 100021    1   tcp  32779  nlockmgr
 100001    2   udp  32799  rstatd
 100001    3   udp  32799  rstatd
 100001    4   udp  32799  rstatd
 100005    2   udp  32819  mountd
 100005    2   tcp  32789  mountd
```

You'll see entries for both TCP and UDP protocols. Note that different versions of the same RPC program can use the same port registration, since usually they are handled by the same process.
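To make the (program, version, protocol) → port mapping concrete, the Python sketch below parses the rpcinfo -p output above into a lookup table. It then groups the ports by program and protocol, which makes the shared rstatd registration easy to spot. The function names are hypothetical. A real script would capture the output by running rpcinfo rather than from an inline string.

```python
from collections import defaultdict

RPCINFO_P = """\
program vers proto   port  service
 100007    3   udp  32773  ypbind
 100007    3   tcp  32771  ypbind
 100011    1   udp  32782  rquotad
 100021    1   udp  32789  nlockmgr
 100021    1   tcp  32779  nlockmgr
 100001    2   udp  32799  rstatd
 100001    3   udp  32799  rstatd
 100001    4   udp  32799  rstatd
 100005    2   udp  32819  mountd
 100005    2   tcp  32789  mountd
"""


def parse_rpcinfo(text: str) -> dict[tuple[int, int, str], int]:
    """Map (program, version, protocol) to port from `rpcinfo -p` output."""
    table = {}
    for line in text.splitlines()[1:]:   # the first line is the column header
        fields = line.split()
        if len(fields) >= 4:
            program, version, proto, port = fields[:4]
            table[(int(program), int(version), proto)] = int(port)
    return table


def ports_per_program(table: dict) -> dict[tuple[int, str], set[int]]:
    """Group ports by (program, protocol); one port usually means one process serves all versions."""
    grouped = defaultdict(set)
    for (program, _version, proto), port in table.items():
        grouped[(program, proto)].add(port)
    return dict(grouped)


if __name__ == "__main__":
    grouped = ports_per_program(parse_rpcinfo(RPCINFO_P))
    print(grouped[(100001, "udp")])   # -> {32799}
```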

In the UDP world, RPC servers also respond to broadcasts, which is a neat trick considering the broadcasting process doesn't know what the remote port number is in advance, and the port number might be different on each RPC server host. The rpcbind process comes to the rescue, completing an indirect call to the local RPC server. In addition to handling broadcasts, the indirect calling mechanism eliminates an extra round-trip between client and server. Instead of asking for the port number and then calling the RPC server directly, the RPC client asks rpcbind to make the call on its behalf. The best-known example of indirect calling is NIS when searching for a server. The NIS client sends out a broadcast to the portmap/rpcbind processes, asking for an indirect call to the ypserv service. The rpcbind processes listen on port 111, which bears the somewhat solar-centric service name sunrpc. If an NIS server has registered with rpcbind, the request is passed to that server, with the result delivered directly back to the calling client.
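The indirect call itself is just an RPC request addressed to rpcbind. As a sketch, the Python below hand-encodes a portmapper version 2 CALLIT request (program 100000, procedure 5, as specified in RFC 1057; Solaris 2.x rpcbind still answers version 2) using XDR's big-endian 32-bit integers. To keep it short, it targets the null procedure of ypserv (program 100004, version 2) rather than the procedure NIS actually uses to find a server. It also does not parse replies. The helper names are our own.

```python
import socket
import struct

PMAP_PROG, PMAP_VERS, PMAPPROC_CALLIT = 100000, 2, 5  # portmapper (RFC 1057)
YPSERV_PROG, YPSERV_VERS, NULLPROC = 100004, 2, 0     # NIS server, null procedure
RPC_CALL, RPC_VERSION, AUTH_NULL = 0, 2, 0
PMAP_PORT = 111                                       # "sunrpc" in /etc/services


def xdr_opaque(data: bytes) -> bytes:
    """XDR variable-length opaque: 4-byte length, the data, zero-pad to 4 bytes."""
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\0" * padding


def build_callit(xid: int, prog: int, vers: int, proc: int, args: bytes = b"") -> bytes:
    """Encode a CALLIT request asking rpcbind to call prog/vers/proc for us."""
    header = struct.pack(">6I", xid, RPC_CALL, RPC_VERSION,
                         PMAP_PROG, PMAP_VERS, PMAPPROC_CALLIT)
    auth = struct.pack(">4I", AUTH_NULL, 0, AUTH_NULL, 0)  # credential, verifier
    target = struct.pack(">3I", prog, vers, proc) + xdr_opaque(args)
    return header + auth + target


def broadcast_callit(message: bytes, timeout: float = 2.0) -> list:
    """Broadcast to every rpcbind on the local net; collect raw (reply, address) pairs."""
    replies = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(timeout)
        sock.sendto(message, ("<broadcast>", PMAP_PORT))
        try:
            while True:
                replies.append(sock.recvfrom(8192))
        except socket.timeout:
            pass  # no more answers within the timeout
    return replies


if __name__ == "__main__":
    msg = build_callit(0x1234, YPSERV_PROG, YPSERV_VERS, NULLPROC)
    print(len(msg), msg.hex())
```

On a LAN with NIS servers, `broadcast_callit(msg)` would send the request and gather whatever replies arrive before the timeout.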

Friction, fraction, and inaction: It's broken

Given the number of daemons and configuration files involved, it's no wonder that TCP/IP connection attempts break down frequently. Here's a laundry list of common problems and solutions or workarounds.

You can't locate the remote host, and applications are complaining about hosts being unreachable or unknown. The first thing to check is that you can resolve the host's name into an IP address. If NIS or DNS is broken, then you'll get "host unknown" type errors. Make sure your DNS resolver configuration, in /etc/resolv.conf, is correct and that you are getting answers from DNS. If you choose a very heavily loaded DNS server, your problems may start before you even attempt to contact the remote machine. Errors along the lines of "host unreachable" or "net unreachable" indicate a routing problem. You have turned the hostname into an IP address, but now you can't get to the IP network from your current location.
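Those checks map onto distinct failure points in a connection attempt. The rough Python sketch below uses a hypothetical `diagnose` helper to separate four cases: a name-service failure, a routing failure, a refused connection, and silence. The errno values reported for routing failures vary by system, so treat the classification as approximate.

```python
import errno
import socket


def diagnose(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt by where it breaks down."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return "host unknown: the name service (hosts file, NIS, or DNS) has no answer"
    address = infos[0][4]
    try:
        with socket.create_connection(address, timeout=timeout):
            return f"connected to {address[0]}.{address[1]}"
    except ConnectionRefusedError:
        return "host reached, but nothing is listening on that port"
    except socket.timeout:
        return "no answer: packets may be dropped, filtered, or lost on the way back"
    except OSError as exc:
        if exc.errno in (errno.ENETUNREACH, errno.EHOSTUNREACH):
            return "unreachable: the name resolved, so check routing"
        raise
```

For example, `diagnose("fuzzy", 23)` checks the telnet port on the host fuzzy from the inetd discussion.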
Your application can't find the service in question, or you find that inetd isn't listening on the services you had configured. The first thing to check is the state of the /etc/services file and the NIS services database. A blank line in /etc/services may confuse NIS, making it impossible for inetd to locate services by name. Make sure that the service you want is really listed in /etc/services. If you are mixing Solaris 1 and Solaris 2 environments, use the more up-to-date /etc/services file. In Solaris 2.x, NFS looks up its traditionally hard-coded port number (2049) in /etc/services. Finally, make sure your local /etc/services file doesn't conflict with the NIS database that you consider the source of truth. The default name-service configuration in Solaris 2.x consults the local /etc/services first. This is done for performance reasons and is governed by the line in /etc/nsswitch.conf that reads:
```
services:	files nis
```
If you are trying to debug an intermittent network failure, or one in which NIS results are not consistent, take NIS out of the loop. Remove nis from the hosts and services lines in /etc/nsswitch.conf, install "clean" copies of /etc/services and /etc/hosts, and watch the system's behavior.
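To make the lookup order concrete, the sketch below mimics how a `services: files nis` line directs lookups, with plain dictionaries standing in for /etc/services and the NIS services map. It ignores the bracketed `[STATUS=action]` criteria that real nsswitch.conf lines may carry. The `myapp` entry is a made-up service used only to show a conflict, and all function names are hypothetical.

```python
from typing import Optional

Key = tuple[str, str]  # (service name, protocol)


def parse_nsswitch(line: str) -> tuple[str, list[str]]:
    """Split an nsswitch.conf line such as 'services: files nis'."""
    database, _, sources = line.partition(":")
    return database.strip(), sources.split()


def lookup(key: Key, order: list[str],
           sources: dict[str, dict[Key, int]]) -> Optional[tuple[str, int]]:
    """Return (source, port) from the first source, in switch order, that has the key."""
    for name in order:
        table = sources.get(name, {})
        if key in table:
            return name, table[key]
    return None


def conflicts(local: dict[Key, int], nis: dict[Key, int]) -> dict[Key, tuple[int, int]]:
    """Entries present in both tables whose ports disagree."""
    return {k: (local[k], nis[k]) for k in local.keys() & nis.keys() if local[k] != nis[k]}


if __name__ == "__main__":
    _, order = parse_nsswitch("services:\tfiles nis")
    files = {("telnet", "tcp"): 23, ("myapp", "tcp"): 5001}
    nis = {("telnet", "tcp"): 23, ("myapp", "tcp"): 5002}
    print(lookup(("myapp", "tcp"), order, {"files": files, "nis": nis}))  # -> ('files', 5001)
    print(conflicts(files, nis))
```

The stale local entry silently wins. Dropping `files` from the order is the "take it out of the loop" experiment in miniature.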
You're getting "RPC server not registered" errors when attempting to connect. The remote server may have exited, or it's possible that rpcbind died. Dump out the RPC registrations with rpcinfo -p to look for server-process problems; if you don't get an answer from rpcbind, then you know the port-mapping process is broken. If you appear to be connecting to RPC servers but don't get any data back, check the health of the server process by pinging the null RPC procedure of the server:
```
duey% rpcinfo -u luey nfs 2
program 100003 version 2 ready and waiting
```
This example uses the UDP protocol to call the RPC nfs service, using version 2. RPC service names are listed in /etc/rpc.
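What rpcinfo -u does can be approximated in a few lines. This Python sketch hand-encodes an RPC call to the null procedure (procedure 0) and checks that the reply is accepted and successful. Unlike rpcinfo, it skips the rpcbind lookup and assumes you already know the server's UDP port, for example from rpcinfo -p output. The function names are hypothetical.

```python
import os
import socket
import struct

RPC_CALL, RPC_REPLY, RPC_VERSION = 0, 1, 2
MSG_ACCEPTED, SUCCESS, AUTH_NULL = 0, 0, 0
NULLPROC = 0
NFS_PROG, NFS_VERS = 100003, 2


def build_null_call(xid: int, prog: int, vers: int) -> bytes:
    """Encode a call to procedure 0 of prog/vers with AUTH_NULL credentials."""
    return struct.pack(">10I", xid, RPC_CALL, RPC_VERSION, prog, vers, NULLPROC,
                       AUTH_NULL, 0, AUTH_NULL, 0)


def reply_ok(reply: bytes, xid: int) -> bool:
    """True if `reply` is an accepted, successful answer to call `xid`."""
    if len(reply) < 20:
        return False
    rxid, mtype, reply_stat, _flavor, verf_len = struct.unpack_from(">5I", reply)
    if rxid != xid or mtype != RPC_REPLY or reply_stat != MSG_ACCEPTED:
        return False
    offset = 20 + (verf_len + 3) // 4 * 4   # skip the padded verifier body
    if len(reply) < offset + 4:
        return False
    (accept_stat,) = struct.unpack_from(">I", reply, offset)
    return accept_stat == SUCCESS


def ping_udp(host: str, port: int, prog: int, vers: int, timeout: float = 2.0) -> bool:
    """Send one null call straight to a known UDP port and check the answer."""
    xid = int.from_bytes(os.urandom(4), "big")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_null_call(xid, prog, vers), (host, port))
        try:
            reply, _ = sock.recvfrom(8192)
        except socket.timeout:
            return False
    return reply_ok(reply, xid)
```

For the earlier example, `ping_udp("luey", 2049, NFS_PROG, NFS_VERS)` would probe NFS version 2 on its usual port.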
You start the connection, but never get an acknowledgement back from the other end. Network analysis of this crisis shows packets going from your machine to the other side, but nothing coming back. The two most common reasons are that the remote server has a performance problem or that you are deliberately blocked from accessing the service. Make sure that inetd or other server processes aren't running out of file descriptors; after about 60 connections you'll pass the default limit of 64 open file descriptors. Also make sure that the server's connection queue, the backlog of not-yet-accepted connections it requests with listen(), is deep enough.
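For a stand-alone daemon you write yourself, both limits are under program control. This Python sketch raises the process's soft open-file limit to its hard limit and listens with an explicit backlog. It does not change the limits of an already running inetd. The helper names are hypothetical, and the backlog of 32 is an arbitrary example value.

```python
import resource
import socket


def raise_fd_limit() -> tuple[int, int]:
    """Raise the soft open-file limit to the hard limit; return the new (soft, hard)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if resource.RLIM_INFINITY not in (soft, hard) and soft < hard:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def make_server(port: int, backlog: int = 32) -> socket.socket:
    """Listen on `port` with room for `backlog` not-yet-accepted connections.

    The kernel may silently cap the backlog at its own configured maximum.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("", port))
    sock.listen(backlog)
    return sock


if __name__ == "__main__":
    print("open-file limit (soft, hard):", raise_fd_limit())
    server = make_server(0)
    print("listening on port", server.getsockname()[1])
    server.close()
```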
Packets get lost on the way home. A less-common but uglier problem is that your routes aren't symmetric: you can find the remote host, but it can't find its way back to you. RIP and other routing protocols don't promise bidirectional traffic; they just inform you of network topology. If you are connecting to a site that's located several networks away, make sure your local point of connection (your service provider, or the local on-ramp to the company's backbone) knows how to get back to you.
Once connected, you're unable to move much data. Small packets sneak through, but larger transfers time out. This may be due to a mismatch in the maximum transmission units (MTU) of the various networks along the way. Solaris 2.x systems attempt "MTU discovery" by sending packets with the "don't fragment" bit set. If the packets are small enough to fit onto every link along the path, there's no problem. When a packet is too large for the next link and would have to be fragmented, however, the router or gateway is expected to discard it and send back an ICMP error message. The sending Solaris host then tunes down the packet size, reducing it in response to each ICMP error until the packets slide through the tightest internet link without fragmentation. Older routers that don't send the right ICMP messages will cause problems, silently dropping packets that can't be fragmented or forwarded over a smaller-MTU link. Disable MTU discovery using ndd:
```
duey# ndd -set /dev/ip ip_path_mtu_discovery 0
```
Some Ethernet-FDDI bridges have a similar problem: they can't pass full-sized FDDI frames onto the smaller-MTU Ethernet medium. Your best bet is to watch the traffic patterns and notice whether the time-outs and connection problems appear in concert with full-sized frames.
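The discovery loop can be modeled in a few lines. The Python below is a toy simulation, not a network tool. Each hop is described by its MTU and by whether the router feeding it returns the ICMP error. The model assumes the error reports the next hop's MTU, as RFC 1191 routers do, so the sender can shrink to exactly that size. Older routers that omit the MTU force the sender to guess smaller sizes, which this model skips. The FDDI (4352) and Ethernet (1500) MTUs are standard; the 576-byte hop is an arbitrary example, and the function name is hypothetical.

```python
from typing import Optional


def discover_path_mtu(link_mtus: list[int], sends_icmp: list[bool]) -> Optional[int]:
    """Simulate path MTU discovery over a chain of links.

    link_mtus[i] is the MTU of hop i; sends_icmp[i] says whether the router
    feeding that hop returns an ICMP "fragmentation needed" error. Returns the
    packet size that finally gets through, or None if a router drops the
    oversized packet silently (a "black hole").
    """
    size = link_mtus[0]                 # start with the first hop's MTU
    while True:
        for mtu, reports in zip(link_mtus, sends_icmp):
            if size > mtu:              # "don't fragment" set: the router can't split it
                if not reports:
                    return None         # dropped silently; the sender never learns why
                size = mtu              # shrink to the MTU carried in the ICMP error
                break                   # and retransmit from the start of the path
        else:
            return size                 # crossed every link without fragmentation


if __name__ == "__main__":
    print(discover_path_mtu([4352, 1500, 576], [True, True, True]))  # -> 576
    print(discover_path_mtu([4352, 1500], [True, False]))             # -> None
```

The second case is the stalled transfer described above, and it's why turning off discovery with ndd can help.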
You start a server process and get an "Address already in use" error, indicating that the desired port number is already used by another process. Look for connections in the CLOSE_WAIT or TIME_WAIT state as shown by netstat -a; port assignments live on after the processes owning them exit. The default recycling period is about 4 minutes, but this, too, can be decreased using ndd:
```
duey# ndd -set /dev/tcp tcp_close_wait_interval 60000
```
The interval is given in milliseconds, so the example sets the timeout to one minute.
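The ndd setting is system-wide. A server can also sidestep the problem itself by setting the SO_REUSEADDR socket option before it binds, which lets it rebind its port while old connections linger in TIME_WAIT. The sketch below shows this in Python, whose socket module uses the same option names as the C API. The helper names are hypothetical. On most systems SO_REUSEADDR does not let two processes listen on the same active port.

```python
import socket


def make_listener(port: int, host: str = "", backlog: int = 5) -> socket.socket:
    """Create a TCP listener that can rebind while old connections sit in TIME_WAIT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); without it, bind() on a port with lingering
    # connections fails with EADDRINUSE ("Address already in use").
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


def restart_demo() -> tuple[int, int]:
    """Close a server after one connection, then immediately rebind the same port."""
    first = make_listener(0, "127.0.0.1")
    port = first.getsockname()[1]
    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = first.accept()
    conn.close()      # the server closes first, so its end heads into TIME_WAIT
    client.close()
    first.close()
    second = make_listener(port, "127.0.0.1")  # typically fails without SO_REUSEADDR
    rebound = second.getsockname()[1]
    second.close()
    return port, rebound


if __name__ == "__main__":
    print("original and rebound ports:", restart_demo())
```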

This cursory overview of TCP/IP headaches wasn't intended to make you an expert. Practical experience and some good references are essential. Doug Comer's Internetworking with TCP/IP (Prentice-Hall), Craig Hunt's TCP/IP Network Administration (O'Reilly), and the relevant RFCs are a good start. You'll also need some diagnostic tools. While these are commonly called "sniffers," that name is a trademark of Network General Corp. Some basic items for your toolbox include traceroute for identifying routing problems and snoop (Solaris 2) or etherfind (Solaris 1) for capturing local packet traffic. The network sniffer FAQ at http://www.iss.net contains pointers to several other network analysis tools, such as Interman and Etherman.

Sites that have a heavy dependency on the X Window System, particularly those groups developing applications to run under X, should pick up the tools described in the sidebar "X-rated sessions."

And if you don't know what listen() has to do with flower people, or if you do and you need a break, acquaint yourself with Spinal Tap (if you have access to news groups).

Resources

RFC 1340
http://www.cis.ohio-state.edu/htbin/rfc/rfc1340.html
TCP/IP Network Administration
http://www.ora.com/gnn/bus/ora/item/tcp.html
traceroute
ftp://ftp.uu.net/systems/unix/bsd-sources/usr.sbin/traceroute
Network sniffer FAQ
http://www.iss.net
http://www.iss.net/vd/sniff.html
A list of Hal Stern's other Sysadmin columns
/sunworldonline/common/swol-backissues-columns.html#sysadmin

URL: http://www.sunworld.com/swol-11-1995/swol-11-sysadmin.html

X-rated sessions

X Window System clients talk to their display servers over a TCP connection. If you're trying to debug an X problem, such as an invalid request, you'll probably need to intercept the stream of requests and server responses that flow over the connection. An invaluable tool is xscope, a simple application that looks like an X display server to your application but is really an X-knowledgeable snoop equivalent. xscope dumps out the conversation between client and server while relaying the packets to and from the real server.

By default, the X server listens on port 6000, corresponding to display hostname:0. On a machine with multiple displays, the server for hostname:1 is on port 6001, hostname:2 on port 6002, and so on. xscope poses as a second (or third) display on a host and gateways to the real, first display. The simplest invocation of xscope is to have it pretend to be display 1 on the local host:
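The display-to-port rule is simple arithmetic. As a sketch (the function name is our own), the Python below converts a display name into the TCP endpoint a client connects to. The optional `.screen` suffix picks a screen on the same server, so it does not change the port. A display with no host part, such as `:0`, is mapped to localhost here for simplicity, although real clients usually reach the local server through a non-TCP transport.

```python
X_BASE_PORT = 6000  # display :0 listens on TCP port 6000


def display_to_endpoint(display: str) -> tuple[str, int]:
    """Turn an X display name like 'duey:1' or 'duey:1.0' into (host, TCP port)."""
    host, sep, rest = display.rpartition(":")
    if not sep:
        raise ValueError(f"not an X display name: {display!r}")
    display_number = int(rest.split(".", 1)[0])   # drop any ".screen" suffix
    return (host or "localhost"), X_BASE_PORT + display_number


if __name__ == "__main__":
    print(display_to_endpoint("duey:1"))   # -> ('duey', 6001)
```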

```
duey% xhost +duey
duey% xscope -i1
duey% xapp -display duey:1
```

The xhost command is necessary to allow connections to the real server from the local xscope application. xscope is started, listening as display 1 by virtue of the -i1 flag, and then the application is aimed at the new pseudo-display. You can debug several applications at once with several xscope sessions, each pretending to be a different display.
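The relaying half of xscope is easy to sketch. The Python below accepts one client, opens a connection to the real server, and copies bytes in both directions while logging each chunk. Unlike xscope, it does not decode X protocol requests. The demo swaps in a local echo server for the X server and uses ephemeral ports. Aimed at a real display, you would listen on port 6001 and connect to port 6000. All names are hypothetical.

```python
import socket
import threading


def pump(src: socket.socket, dst: socket.socket, label: str, log: list) -> None:
    """Copy bytes from src to dst, logging each chunk, until src reaches EOF."""
    while chunk := src.recv(4096):
        log.append((label, chunk))
        dst.sendall(chunk)
    try:
        dst.shutdown(socket.SHUT_WR)   # pass the end-of-stream along
    except OSError:
        pass                           # the other side may already be gone


def relay_once(listener: socket.socket, server_addr: tuple, log: list) -> None:
    """Accept one client, connect to the real server, and relay both directions."""
    client, _ = listener.accept()
    with client, socket.create_connection(server_addr) as server:
        upstream = threading.Thread(target=pump, args=(client, server, "client->server", log))
        upstream.start()
        pump(server, client, "server->client", log)
        upstream.join()


def demo(message: bytes = b"hello, X server") -> tuple[bytes, list]:
    """Relay one exchange through the proxy to a stand-in echo server."""
    log: list = []
    echo = socket.create_server(("127.0.0.1", 0))
    proxy = socket.create_server(("127.0.0.1", 0))

    def echo_once() -> None:
        conn, _ = echo.accept()
        with conn:
            while data := conn.recv(4096):
                conn.sendall(data)

    threads = [threading.Thread(target=echo_once),
               threading.Thread(target=relay_once, args=(proxy, echo.getsockname(), log))]
    for t in threads:
        t.start()
    with socket.create_connection(proxy.getsockname()) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)
        received = b""
        while chunk := client.recv(4096):
            received += chunk
    for t in threads:
        t.join()
    echo.close()
    proxy.close()
    return received, log


if __name__ == "__main__":
    received, log = demo()
    for label, chunk in log:
        print(label, chunk)
```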

Since xscope is playing with TCP/IP connections, it can run on any host, whether or not the client or server processes are there. You can connect to display servers on other hosts, or connect to an xscope session from clients on remote hosts. The next logical step is to extend the xscope notion to retransmit X requests to multiple servers. To enter the world of broadcasting, you'll need xmx. This X multicasting application was created at Brown University to drive more than 100 machines in a teaching environment from a single source. It's not exactly video conferencing, but it's a simple and fast way to make the right connections.
