A TCP/IP primer
Here's what you need to know about the programmatic plumbing running the Internet (and your LAN).
Sure, you ride along on the information super-highway (this is the only time we'll use that abominable phrase) but do you know what the road is made of? We think not. People needing the basics should read on. (3,000 words)
Networking has become key to many business processes. Keeping that network running smoothly falls into the domain of mission-critical functions. Now that the Internet has exploded onto the front pages of USA Today and has references sprinkled through the Wall Street Journal, the amount of work facing system administrators who run sites, fix configuration problems, and resolve user complaints has grown just as dramatically.
This month, we're going to look at the business of making connections between TCP/IP end-points. It's a small subset of the broader network-administration problems of performance, security, and controlled-growth management, but it's also the most fundamental. If you can't get connected or you can't keep unwanted network entities from connecting to you, then the big picture gets obscured by the pile of complaints reading "the network is down."
Making TCP/IP connections appears to be fairly simple: You identify the other host, specify a service you want to use, and blast data down the wire, hoping that you'll enjoy some semblance of good performance and reliable security all the while. Meeting those expectations is what makes TCP/IP connection management an interesting problem. We'll review the mechanics of TCP/IP connections and service location and go through the details of linking two end-points over a socket.
Junction introduction: Getting linked
The first requirement for establishing a TCP/IP connection is the name of the other end. That name consists of two parts: the remote IP address and a port number. IP addresses are obtained through the Domain Name System (DNS), Network Information Service (NIS), or the local /etc/hosts file, all of which map hostnames to IP addresses. The port number is a bit trickier because it depends on the service being used. Each host numbers ports starting at 1, with a separate set of ports for TCP and the User Datagram Protocol (UDP). Since each IP packet contains information about the higher-level protocol it is carrying, the two port sets cannot be confused. Port numbers below 1024 are known as reserved ports and can be opened only by processes running as root; port numbers 1024 and above are unrestricted.
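To make the two-part name concrete, here is a minimal Python sketch of turning a host and port into the address pair a connection needs. The helper name resolve_endpoint is our own invention; socket.getaddrinfo is the standard-library call that consults whatever name services the host is configured to use.

```python
import socket

def resolve_endpoint(host, port):
    """Return the (ip_address, port) pair that names the remote end."""
    # getaddrinfo asks the host's configured name services (hosts file,
    # DNS, and so on) for IPv4 stream addresses matching host and port.
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    family, socktype, proto, canonname, sockaddr = infos[0]
    return sockaddr  # for AF_INET this is (ip_address, port)

if __name__ == "__main__":
    # A dotted-quad address needs no lookup; a hostname would be resolved.
    print(resolve_endpoint("127.0.0.1", 23))
```

Passing a hostname instead of a dotted-quad address exercises the same lookup path that telnet or ftp would use.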
One way a budding TCP/IP connection determines the remote port number is to look in the /etc/services file (or the services.byname NIS database). This file correlates the well-known name of a service, the port number, and the protocol used for that service. Not every service is listed in /etc/services, since some connections are made to well-known (or hard-coded) port numbers. The X Window System, for example, uses port 6000 for its default window server. If you see cryptic port numbers floating around your network, use RFC 1340 to decipher them. This list of well-known port numbers will tell you, for example, that port 139 is the NetBIOS session service, which carries the Server Message Block (SMB) protocol of LAN Manager, even though this doesn't appear in most /etc/services files.
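The same lookup is available to programs. The sketch below, in Python, wraps the standard socket.getservbyname and socket.getservbyport calls; the wrapper names service_port and port_service are hypothetical, and the answers depend entirely on what the local services database contains.

```python
import socket

def service_port(name, proto="tcp"):
    """Look up a service's port by name; None if the host has no entry."""
    try:
        return socket.getservbyname(name, proto)
    except OSError:
        return None  # not listed in the local services database

def port_service(port, proto="tcp"):
    """Look up a service's name by port; None if the host has no entry."""
    try:
        return socket.getservbyport(port, proto)
    except OSError:
        return None

if __name__ == "__main__":
    print("telnet ->", service_port("telnet"))
    print("6000   ->", port_service(6000))
```

A result of None is exactly the "cryptic port number" case: the service may well exist, but this host's database doesn't name it.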
Every TCP/IP connection is uniquely identified by the local and remote IP addresses and the local and remote port numbers. Let's say you have five ftp sessions going at once, all connected to port 21 on your public ftp server from your desktop. How are the five distinguished, since they all connect to the same remote IP address and port number? The differentiator is the local port number, which is assigned by the local TCP/IP stack when the connection is made.
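You can watch the local stack hand out distinct port numbers with a few lines of Python. This is only a sketch: it uses a throwaway listener on the loopback address in place of a real ftp server, and open_sessions is a name we made up for the illustration.

```python
import socket

def open_sessions(count):
    """Open `count` connections to one listener; return their 4-tuples."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the stack pick a free port
    listener.listen(count)
    server_addr = listener.getsockname()

    clients, tuples = [], []
    for _ in range(count):
        c = socket.create_connection(server_addr)
        clients.append(c)
        local_ip, local_port = c.getsockname()    # assigned by the local stack
        remote_ip, remote_port = c.getpeername()  # same for every session
        tuples.append((local_ip, local_port, remote_ip, remote_port))

    for c in clients:
        c.close()
    listener.close()
    return tuples

if __name__ == "__main__":
    for t in open_sessions(5):
        print(t)
```

All five tuples share the remote address and port; only the local port differs, which is what keeps the sessions apart.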
To get a list of active TCP/IP connections, use
huey% netstat -a

UDP
   Local Address      State
-------------------- -------
      *.sunrpc        Idle
      *.32782         Idle
      *.nfsd          Idle
      *.811           Idle

TCP
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q  State
-------------------- -------------------- ----- ------ ----- ------ -------
kfir.6000            kfir.33298           16384      0 16384      0 ESTABLISHED
kfir.33322           suntea01.45677        8760      0  8760      0 ESTABLISHED
      *.33313              *.*                0      0  8576      0 LISTEN
The first part of the listing shows UDP services, which are connectionless. The addresses are of the form IP.port, with a * indicating a "don't care" value. In the TCP section, you see both local and remote addresses, plus the window size and queue length for each direction. The window sizes limit how much data may be sent before an acknowledgment is required, while the queues show the backlog of data waiting to be sent or read. Finally, the state column tells you how the connection has progressed or fallen apart. The first TCP connection in the example is a process talking to the X Window server on port 6000, the second is a process talking to another machine (probably a database server, given the unknown port number), and the final entry is a connection waiting to happen -- it's a process listening for connections on port 33313.
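If you post-process netstat output in a script, the IP.port notation is easy to pick apart. Here is a small Python sketch; split_netstat_addr is a hypothetical helper, and it assumes the port is whatever follows the last dot, so fully qualified hostnames survive intact.

```python
def split_netstat_addr(addr):
    """Split a netstat 'host.port' address; '*' fields come back as None."""
    # The port (or service name) follows the LAST dot, so a hostname
    # that itself contains dots stays in one piece.
    host, _, port = addr.rpartition(".")
    return (None if host == "*" else host,
            None if port == "*" else port)

if __name__ == "__main__":
    for addr in ("kfir.6000", "*.sunrpc", "*.*"):
        print(addr, "->", split_netstat_addr(addr))
```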
We'll talk primarily about TCP (as opposed to UDP) services from here on, since we're looking at how connections are made and broken. TCP is a connection-oriented, or stream protocol, so it handles the additional overhead of building, maintaining, and tearing down connections. Some of the debugging and security tips, like dealing with configuration files and instituting outside access controls, apply to both protocols, but we'll point out TCP-specific issues and problems.
Connection direction: The wonders of inetd and rpcbind
Before you can establish a connection, you need a process listening on the other end. Long-lived services, such as the in.routed routing daemon, start processes at boot time that listen on their designated ports. Given the wide universe of services, however, it's not efficient to have several hundred servers sitting idle waiting for connections that may never be made. Furthermore, some services, like ftp or telnet, expect many connections to be made and need to spawn additional processes to handle sessions as they are created.
The inetd daemon plays intermediary for this procedure, listening on sockets for well-known services. The /etc/inetd.conf file determines the services handled by inetd, and the daemon to be started when a connection is made. When you issue telnet fuzzy, for example, your machine connects to port 23 on machine fuzzy, held open by inetd on the remote side. The inetd daemon forks and executes a copy of in.telnetd, the daemon associated with the telnet service, handing the open file descriptor for the incoming socket connection to the new telnet daemon. When adding a new Internet service to a machine, consider whether it should be managed by inetd or with a stand-alone daemon. If you expect the service to be used sporadically, with multiple connections, inetd is a good managing agent, since it will restart daemons and spawn subprocesses but not overrun the machine with idle processes.
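The hand-off that inetd performs can be imitated in a few lines of Python. Treat this as a rough sketch for Unix-like systems, not a picture of inetd's actual source: serve_one and the one-line SHOUT "daemon" are hypothetical stand-ins, and unlike inetd the sketch waits for the child to finish so the demo stays simple.

```python
import socket
import subprocess
import sys

def serve_one(listener, argv):
    """Accept one connection and run argv with the socket as stdin/stdout."""
    conn, peer = listener.accept()
    # Like inetd, hand the connected socket to the child as its standard
    # input and output, so the service program just reads and writes.
    child = subprocess.Popen(argv, stdin=conn.fileno(), stdout=conn.fileno())
    conn.close()          # the child holds its own copies of the descriptor
    return child.wait()   # a real inetd would go back to listening instead

# Hypothetical stand-in for a daemon such as in.telnetd: upper-case one line.
SHOUT = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.readline().upper())"]

def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))       # any free port on loopback
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    client.sendall(b"hello\n")
    serve_one(listener, SHOUT)
    reply = b""
    while True:                           # read until the child closes its end
        chunk = client.recv(1024)
        if not chunk:
            break
        reply += chunk
    client.close()
    listener.close()
    return reply

if __name__ == "__main__":
    print(demo())
```

The service program never touches a socket call; it simply reads standard input and writes standard output, which is why so many small daemons can be managed this way.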
Of course, life isn't simple enough to leave you with only two sources of port numbers. RPC-based services, such as NIS, NFS, and the network lock manager (rpc.lockd), use a different arbiter of port numbers. RPC services are identified by 32-bit RPC program numbers, a version number, and a protocol. The portmapper (Solaris 1.x) and rpcbind (Solaris 2.x) processes map RPC identifiers to TCP or UDP port numbers. Each RPC service registers its program numbers, versions, and protocols with rpcbind at start-up, and then rpcbind happily hands these out to connection-inquiring processes. View the list of currently registered processes using
duey% rpcinfo -p
   program vers proto   port  service
    100007    3   udp  32773  ypbind
    100007    3   tcp  32771  ypbind
    100011    1   udp  32782  rquotad
    100021    1   udp  32789  nlockmgr
    100021    1   tcp  32779  nlockmgr
    100001    2   udp  32799  rstatd
    100001    3   udp  32799  rstatd
    100001    4   udp  32799  rstatd
    100005    2   udp  32819  mountd
    100005    2   tcp  32789  mountd
You'll see entries for both TCP and UDP protocols. Note that different versions of the same RPC program can use the same port registration, since usually they are handled by the same process.
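The bookkeeping behind that listing amounts to a table keyed by program number, version, and protocol. The toy Python model below illustrates the idea using a few rows from the rpcinfo output; ToyBinder is purely our invention and says nothing about how rpcbind is really implemented or what its wire protocol looks like.

```python
class ToyBinder:
    """In-memory model of the program/version/protocol -> port table."""

    def __init__(self):
        self._ports = {}

    def register(self, program, version, proto, port):
        # Each RPC service announces itself at start-up.
        self._ports[(program, version, proto)] = port

    def lookup(self, program, version, proto):
        # Clients ask where a service lives; None means "not registered".
        return self._ports.get((program, version, proto))

binder = ToyBinder()
for version in (2, 3, 4):
    binder.register(100001, version, "udp", 32799)   # rstatd: one port, three versions
binder.register(100005, 2, "udp", 32819)             # mountd over UDP
binder.register(100005, 2, "tcp", 32789)             # mountd over TCP

if __name__ == "__main__":
    print(binder.lookup(100001, 3, "udp"))
    print(binder.lookup(100005, 2, "tcp"))
```

Because the key includes both the version and the protocol, one process can register several versions on a single port while keeping separate ports for TCP and UDP.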
In the UDP world, RPC servers also respond to broadcasts, which is a neat trick considering the broadcasting process doesn't know what the remote port number is in advance -- and the port number might be different on each RPC server host. The rpcbind process comes to the rescue, completing an indirect call to the local RPC server. In addition to handling broadcasts, the indirect calling mechanism eliminates an extra round-trip between client and server. Instead of asking for the port number, and then calling the RPC server directly, the RPC client asks rpcbind to make the call on its behalf. The best-known example of indirect calling is NIS when searching for a server. The NIS client sends out a broadcast to the portmap/rpcbind processes, asking for an indirect call to the ypserv service. The rpcbind processes listen on port 111, given the somewhat solar-centric name sunrpc. If an NIS server has registered with rpcbind, the request is passed to that server, with the result delivered directly back to the calling client.
Friction, fraction, and inaction: It's broken
Given the number of daemons and configuration files involved, it's no wonder that TCP/IP connection attempts break down with frequency. Here's a laundry list of common problems and solutions or workarounds.
services: files nis
If you are trying to debug a problem with an intermittent network failure, or one in which NIS results are not consistent, try removing nis from the hosts and services paths, installing "clean" copies of /etc/services and /etc/hosts, and watching the system's behavior with NIS taken out of the loop.
duey% rpcinfo -u luey nfs 2
program 100003 version 2 ready and waiting
This example uses the UDP protocol to call the RPC nfs service, using version 2. RPC service names are listed in /etc/rpc.
duey# ndd -set /dev/ip ip_path_mtu_discovery 0
Various Ethernet-FDDI bridges also have this problem, unable to pass full-sized FDDI frames onto the smaller Ethernet medium. Your best bet is to watch the traffic patterns and notice if the time-outs and connection problems appear in concert with full-sized frames.
netstat -a; port assignments live on after the processes owning them exit. The default recycling period is about 4 minutes but this, too, can be decreased using ndd:
duey# ndd -set /dev/tcp tcp_close_wait_interval 60000
The example sets the timeout to one minute.
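Shortening the interval changes behavior for the whole machine. For a server you control, a common application-level complement is to set the SO_REUSEADDR socket option before binding, so a restarted server can reclaim its port while old connections are still timing out. A minimal Python sketch, with make_listener as a hypothetical helper name:

```python
import socket

def make_listener(port=0, host="127.0.0.1"):
    """Create a listening TCP socket that tolerates lingering old connections."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set BEFORE bind(): allows binding even though earlier
    # connections on this port are still waiting out their timeout.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))   # port 0 asks the stack for any free port
    s.listen(5)
    return s

if __name__ == "__main__":
    listener = make_listener()
    print("listening on", listener.getsockname())
    listener.close()
```

This affects only the socket that sets the option, so it is a safer first step than tuning the kernel-wide timer.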
This cursory overview of TCP/IP headaches wasn't intended to make you an expert. Practical experience and some good references are essential. Doug Comer's Internetworking with TCP/IP (Prentice-Hall), Craig Hunt's TCP/IP Network Administration (O'Reilly), and the relevant RFCs are a good start. You'll also need some diagnostic tools. While such tools are commonly called "sniffers," that name is a trademark of Network General Corp. Some basic items for your toolbox include traceroute for identifying routing problems and snoop (Solaris 2) or etherfind (Solaris 1) for capturing local packet traffic. The network sniffer FAQ at http://www.iss.net contains pointers to several other network analysis tools like Interman and Etherman.
Sites that have a heavy dependency on the X Window System, particularly those groups developing applications to run under X, should pick up the tools described in the sidebar X-rated sessions.
And if you don't know what listen() has to do with flower people, or if you do and you need a break, acquaint yourself with Spinal Tap (if you have access to news groups).
X Window System clients talk to their display servers over a TCP connection. If you're trying to debug an X problem, such as an invalid request, you'll probably need to intercept the stream of requests and server responses that flow over the connection. An invaluable tool is xscope, a simple application that looks like an X display server to your application but is really an X-knowledgeable snoop equivalent. xscope dumps out the conversation between client and server while relaying the packets to and from the real server.
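The relay-and-record trick at the heart of such a tool is easy to sketch in Python. What follows is not xscope: it knows nothing about the X protocol and merely logs raw bytes in each direction. All function names are our own, and a one-connection echo server stands in for the real display server.

```python
import socket
import threading

def pump(src, dst, label, log):
    """Copy bytes from src to dst until EOF, recording each chunk."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        log.append((label, data))
        dst.sendall(data)
    try:
        dst.shutdown(socket.SHUT_WR)   # pass the end-of-stream along
    except OSError:
        pass

def relay_once(listener, upstream_addr, log):
    """Accept one client and relay it to upstream_addr, logging traffic."""
    client, _ = listener.accept()
    server = socket.create_connection(upstream_addr)
    back = threading.Thread(target=pump,
                            args=(server, client, "server->client", log))
    back.start()
    pump(client, server, "client->server", log)
    back.join()
    client.close()
    server.close()

def echo_once(listener):
    """Hypothetical stand-in for the real server: echo one connection."""
    conn, _ = listener.accept()
    while True:
        data = conn.recv(4096)
        if not data:
            break
        conn.sendall(data)
    conn.close()

def listen_local():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen(1)
    return s

def demo():
    real, fake = listen_local(), listen_local()
    log = []
    workers = [
        threading.Thread(target=echo_once, args=(real,)),
        threading.Thread(target=relay_once,
                         args=(fake, real.getsockname(), log)),
    ]
    for w in workers:
        w.start()
    client = socket.create_connection(fake.getsockname())
    client.sendall(b"ping")
    client.shutdown(socket.SHUT_WR)    # tell the relay we are done sending
    reply = b""
    while True:
        chunk = client.recv(4096)
        if not chunk:
            break
        reply += chunk
    client.close()
    for w in workers:
        w.join()
    real.close()
    fake.close()
    return reply, log

if __name__ == "__main__":
    print(demo())
```

The client believes it is talking to the real server, the server believes it is talking to a client, and the log in the middle captures both halves of the conversation.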
By default, the X server listens on port 6000, corresponding to display hostname:0. On a machine with multiple displays, the server for hostname:1 is on port 6001, hostname:2 on port 6002, and so on. xscope poses as a second (or third) display on a host and gateways to the real, first display. The simplest invocation of xscope is to have it pretend to be display 1 on the local host:
duey% xhost +duey
duey% xscope -i1
duey% xapp -display duey:1
The xhost command is necessary to allow connections to the real server from the local xscope application. xscope is started, listening as display 1 by virtue of the -i1 flag, and then the application is aimed at the new pseudo-display. You can debug several applications at once with several xscope sessions, each pretending to be a different display.
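The display-to-port rule is simple enough to capture in a few lines of Python, which can be handy when matching netstat output against displays. This is a sketch under the default numbering described above; x_display_port is a hypothetical helper, and it ignores any screen suffix such as the ".0" in duey:0.0.

```python
X_BASE_PORT = 6000   # default port for display 0

def x_display_port(display):
    """Map a display name like 'duey:1' to (host, tcp_port)."""
    host, _, rest = display.rpartition(":")
    number = int(rest.split(".")[0])   # drop an optional ".screen" suffix
    return host, X_BASE_PORT + number

if __name__ == "__main__":
    for name in ("duey:0", "duey:1", "duey:0.0"):
        print(name, "->", x_display_port(name))
```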
Since xscope is playing with TCP/IP connections, it can run on any host, whether or not the client or server processes are there. You can connect to display servers on other hosts, or connect to an xscope session from clients on remote hosts. The next logical step is to extend the xscope notion to retransmit X requests to multiple servers. To enter the world of broadcasting, you'll need xmx. This X multicasting application was created at Brown University to drive more than 100 machines in a teaching environment from a single source. It's not exactly video conferencing, but it's a simple and fast way to make the right connections.