Sysadmin by Hal Stern

You too can understand
device numbers and mapping in Solaris

Unlocking the secret of Unix device drivers puts you in charge. We show you how it's done.

SunWorld
December  1996

Abstract
Ever looked at the /devices directory and wondered what the heck all those files were for? Read on to discover how Solaris configures its devices at boot time and how you can master the intricacies, too. (2,900 words)



Running into someone who understands the intricacies of Unix device drivers is no longer the awe-inspiring experience retold from the days of yore. If you were impressed by Unix gurus who professed to write drivers using cat as a text editor, it's time to join the real world and enjoy improvements in kernel configuration, device mapping, and installation that have made low-level kernel knowledge less of a necessity for the average Unix system manager. But why dedicate a column to device numbering and mapping in Solaris?

While installation has become much more automated, troubleshooting remains a labor-intensive process. What do you do when you add a new disk drive, and it begins using a device number for which your database isn't prepared? How do you prevent device numbers from changing across reboots, and how do you get them to change when you need to remove hardware or replace failed components? Do you have high-availability configurations that require identical disk device names on both machines, even though the SCSI host adaptors are not quite identically installed and cabled? How do you fix older or third-party applications with hard-wired device names that fail in the brave new world of tongue-twisting geographical device names?

This month, we're going to put you back in charge of the hardware configuration with a tour of the device identification and numbering process. We'll start with a look back at how device numbers have been assigned and managed by Unix, and how the Solaris kernel makes the process much more dynamic -- and less deterministic at times. We'll then dive more deeply into device autoconfiguration and numbering under Solaris, followed by a look at persistence in device numbering and how to override the defaults and fix some common problems.

Land of 1,000 devices
The late jazz bassist Charles Mingus said that taking something complex and making it simple showed true creativity. One of the elegant simplicities of the Unix operating system is the way in which it presents physical device interfaces to the system programmer. Devices, such as disk drives, framebuffers, pseudo-terminals, and real serial ports appear as filesystem entries, allowing the usual set of file manipulation system calls to be used as the application programming interface. There's no need to learn a separate device liturgy for each new type of hardware. Reducing the API suite to a single set of interfaces makes it easier to port a database, for example, that may use raw disk devices or a filesystem.



However, the output of ls shows you that device entries in the filesystem aren't quite identical to those of regular files or directories:

luey% ls -l sd@3,0:a*
brw-r-----   1 root     sys       32, 24 Oct 14 12:17 sd@3,0:a
crw-r-----   1 root     sys       32, 24 Oct 14 12:17 sd@3,0:a,raw

The first character in the mode tells you whether this is a character (c) or block (b) device; character devices are read a byte at a time, like normal files, while block devices can only be accessed in multiples of the block size. Disks are the most common block devices, while network interfaces, terminal devices of all flavors, and tape drives are character devices. Device files, also called special files, sport a pair of numbers in place of a size; the numbers are the major and minor identifiers, respectively. Major numbers are indexes into the kernel's table of device drivers, associating routines to manipulate the device with the user-visible name for the hardware. Minor numbers identify the particular unit of the device family you're addressing. For simple devices the minor number is just the instance number; the sd disk driver also folds the partition into it, which is why partition "a" of sd3 above shows minor number 24 rather than 3. The difficult problem is telling the kernel about a new device, and making sure it creates the appropriate associations between filesystem entries and its own configuration tables.
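To make the major/minor pair concrete, here is a minimal Python sketch (my addition, not a Solaris tool) that classifies a file mode and splits a device number using the standard os and stat modules. The sd_unit_and_partition helper rests on an assumption: that the sd driver packs eight partitions per disk into the minor number. That agrees with the listing above (minor 24 for partition "a" of sd3), but check it against your own release before relying on it.

```python
import os
import stat

def device_kind(mode):
    """Return 'block', 'character', or 'other' for a st_mode value."""
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "character"
    return "other"

def device_numbers(rdev):
    """Split a st_rdev value into its (major, minor) pair."""
    return os.major(rdev), os.minor(rdev)

def sd_unit_and_partition(minor):
    """ASSUMPTION: minor = unit * 8 + partition index (a=0, b=1, c=2, ...)."""
    unit, index = divmod(minor, 8)
    return unit, "abcdefgh"[index]

def describe(path):
    """Summarize a device file much as the ls -l listing does."""
    st = os.stat(path)
    major, minor = device_numbers(st.st_rdev)
    return "%s: %s device, major %d, minor %d" % (
        path, device_kind(st.st_mode), major, minor)
```

Point describe() at any entry under /devices to get the same type, major, and minor that ls -l reports; sd_unit_and_partition() only makes sense for minor numbers that belong to the sd driver.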

SunOS 4.x and its Berkeley heritage embedded the problem of device numbering in the kernel configuration file. If you wanted to add a new device or increase the largest device minor number in use, you had to reconfigure and rebuild the kernel. Even simple tasks, such as telling the kernel that the SCSI disk on target 4 was to be known as sd4 required hand-crafting configuration files and a kernel rebuild. SunOS devices live in the /dev directory of the root filesystem, a flat namespace for all device types and instances.

Solaris 2.x introduced dynamic kernel configuration, removing kernel configuration, builds, and links from the repertoire of regular system care and feeding. The Solaris kernel identifies the drivers it needs, links them in while building a table of major numbers, and then assigns minor numbers to devices it finds after booting. Add a new disk device, and Solaris assigns it the next available minor number. When you add a new type of device such as a quad Ethernet controller, the major number table gets updated and the board's devices are identified starting with minor number 0. The /dev directory is now just a directory of links to the actual mapping of filesystem entries to geographic device descriptions in /devices.
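Because the /dev entries are symbolic links, you can ask any of them where it really points. A minimal sketch, assuming Python is available on the machine; the function name is mine, and the device name in the note after the code is an illustrative example rather than something taken from the listings in this column.

```python
import os

def physical_name(dev_path):
    """Return (link target as stored, fully resolved path) for a /dev entry.

    The stored target is None when dev_path is not a symbolic link.
    """
    target = os.readlink(dev_path) if os.path.islink(dev_path) else None
    return target, os.path.realpath(dev_path)
```

On a machine laid out like the one described here, physical_name("/dev/dsk/c0t3d0s0") should return a relative target that climbs out of /dev/dsk into /devices, followed by the resolved geographic name.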

Robbins 8th & Walnut: Our Name Is Our Address*
File names in the /devices hierarchy reflect the machine's physical connections and logical bus layout: the type of I/O interface, any address and slot or unit number, and a device name and minor number or other identifier:

brw-rw-rw-  1 root     sys       36,   0 Oct 14 12:17 ./obio/SUNW,fdtwo@0,700000:a
brw-r-----  1 root     sys       32,  26 Oct 14 12:17 ./iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0:c
crw-------  1 stern    11010     39,   0 Oct 14 12:17 ./iommu@f,e0000000/sbus@f,e0001000/cgsix@2,0:cgsix0

The first example is the floppy drive on my SPARCstation 10. It's attached to the on-board I/O controller (obio), and the device name is SUNW,fdtwo. It's at location 0, address 700000, and this device refers to the "a" partition of the disk. The second and third examples are for SBus-based devices. The second is a SCSI disk attached to the on-board SCSI controller. It's connected to the main system bus through the IOMMU (I/O memory management unit), which has a control address associated with it. Most on-board SBus-connected devices live in slot "f" -- including the control units. The next element in the pathname shows you it's an SBus device, also controlled through slot "f". The "esp" elements that follow are the ESP SCSI host adaptor's DMA channel, and the ESP SCSI host interface unit, also with control information. The final pathname component is the SCSI disk definition: it's at target 3, logical unit (LUN) 0, and this device refers to the "c" partition. The final example is for the frame buffer, a cgsix device, sitting in SBus slot 2 as indicated by the cgsix@2,0 element. While these pathnames are quite complex, they give you a detailed view of how hardware is plugged into the machine, and what has been discovered by the boot PROM. On server machines with multiple SBus interfaces, you'll see more variation in the IOMMU and SBus addressing.
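The anatomy just described (a node name, an "@" and an address, then a ":" and a minor name on the last element) is regular enough to pick apart mechanically. The sketch below is a hedged illustration in Python; parse_device_path is a name I made up, and it knows nothing about what the addresses mean for any particular bus.

```python
def parse_device_path(path):
    """Split a /devices path into (name, address, minor_name) tuples.

    Parts that a component does not carry are returned as None.
    """
    nodes = []
    for component in path.split("/"):
        if component in ("", "."):
            continue  # skip the leading "./" or "/"
        component, _, minor_name = component.partition(":")
        name, _, address = component.partition("@")
        nodes.append((name, address or None, minor_name or None))
    return nodes

for node in parse_device_path(
        "./iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0:c"):
    print(node)
```

The last tuple printed is ('sd', '3,0', 'c'): the driver name, the target and LUN, and the partition, in that order.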

Building the device tree, and creating the symbolic links to it, is a complex process that is part of every system boot. The subtle hand-offs and dependencies involved in adding a new device would tax the skills of the American Ballet Theater or the Dallas Cowboys. Before we get into diagnostics and fine-tuning device configurations, let's walk through the boot process to see how the configuration files, minor numbers, and links are assembled.

Building it from memory: Constructing the device landscape
After a power-on self test, every current Sun/SPARC system uses its OpenBoot PROM (OBP, see "Open boot secrets revealed", SunWorld October 1995) to probe attached hardware, building a machine topology that is kept in memory and handed off to the nascent kernel. If the reconfigure flag (-r) was passed to the boot command, the system rebuilds the /dev and /devices directories, adding new devices and renumbering or reassigning those that have moved within the system.

A system device reconfiguration occurs in three major steps:

  1. Before the system is rebooted, any new types of devices must have their major numbers noted in /etc/name_to_major, and the appropriate device drivers installed in /kernel/drv. Most of the dirty work is done by the add_drv utility from within the vendor's installation script. The /etc/name_to_major file contains the associations of device types to major numbers; those major numbers are the first of the comma-separated numbers you'll see in an ls -l listing of the filesystem entries. Device aliases are also created by add_drv and noted in /etc/driver_aliases; a device alias is a short-hand notation for an even more hideously complex device type. For example, "fd" suffices to name the on-board floppy drive even though the formal device type is "SUNW,fdtwo".

  2. As the system boots, the drvconfig utility takes the in-memory device table assembled by the boot prom and builds the /devices directory. It adds in pseudo-devices, such as the kernel memory interface and the pseudo tty devices used for network logins, and assigns minor numbers to the devices recorded in /devices. As drvconfig does its work, it sets the permissions on each filesystem entry based on configuration information in /etc/minor_perm, assigning an owner and a filesystem mode to each new entry in /devices. While SunOS 4.x did most of the minor number assignment and device configuration in the kernel or at kernel build time, Solaris 2.x does it all from user level once the system has started the boot process. Using a user-level tool provides flexibility in reading disk-based configuration files, overrides, or other system-specific preferences without the muss and fuss of rebuilding kernels.

  3. To round out the sequence, the devlinks utility is executed to build the /dev directory, providing somewhat less horrific device names that are merely symbolic links back to the geographically undesirable /devices entries. Another departure from SunOS 4.x is that entries in /dev are organized in subdirectories by device type, so you'll find disks in /dev/dsk and serial terminal devices in /dev/term. Applications that need to open and close devices appreciate the narrower device directory.
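The major-number table from step 1 is easy to read back, which helps when an ls -l listing shows a number you don't recognize. The sketch below is my own illustration in Python and makes two assumptions: that each line of the file is a driver name followed by a number, and that the three sample lines match the majors in the listings earlier in this column (32, 36, and 39). They were inferred from those listings, not copied from a real /etc/name_to_major.

```python
# ASSUMPTION: sample lines inferred from the ls -l listings, not a real file.
SAMPLE = """\
sd 32
fd 36
cgsix 39
"""

def parse_name_to_major(text):
    """Map driver name -> major number from name_to_major-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # defensive: drop comments, blanks
        if not line:
            continue
        name, number = line.split()
        table[name] = int(number)
    return table

def driver_for_major(table, major):
    """Return the driver name that owns a major number, or None."""
    for name, number in table.items():
        if number == major:
            return name
    return None

print(driver_for_major(parse_name_to_major(SAMPLE), 32))  # prints sd
```

To use it on a live system, read the text of /etc/name_to_major into parse_name_to_major() instead of SAMPLE.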

If you feel that a small sleight-of-hand is going on somewhere between locating devices and building a consistent view of the world, you're either remarkably perceptive, or you've experienced that sinking feeling that comes from realizing that you are now swapping to the disk that had your database on it and that a major customer's order file is now represented by the swap pages underlying a rude JPEG image.

Consistency is everything: Retaining device state across reboots
If everything is done dynamically, how do you ensure that life remains the same across reboots? The answer comes from step 2 above, where drvconfig builds the /devices tree and assigns minor numbers. As drvconfig does its work, it is charged with maintaining a sense of history between boots -- it notes the mapping of physical, geographic addresses to minor numbers in the /etc/path_to_inst file, and updates this file if needed with new device information. Essentially, drvconfig's use of /etc/path_to_inst ensures that once you put root on sd3, it stays there, and that the data and log segments of your database on sd2 don't get mixed up with sd3 after a reboot to add a third disk drive.

If drvconfig can find a match between a device in the in-memory tree and an entry in /etc/path_to_inst, it continues using the minor number previously assigned. If a new device appears, it is given the next available minor number. The full geographic path to the device is noted in /etc/path_to_inst as shown by this excerpt for the sd3 and fd0 devices from the example above:

"/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0" 3 "sd"
"/obio/SUNW,fdtwo@0,700000" 0 "fd"

Note that a device minor number isn't re-used if the device once existed and then doesn't respond at boot time -- you don't want to renumber your disks if one dies, for example, and you're counting on your disk mirroring to get you through the failure. Smoldering disks shouldn't lead to a melting database as the hardware failure is communicated to you through a software disaster.
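Each line of /etc/path_to_inst, like the two in the excerpt above, carries three fields: the quoted physical path, the number assigned to it, and the quoted driver name. Here is a small Python sketch for reading them. It leans on the standard shlex module to strip the quotes, and it assumes that lines beginning with "#" are comments to be skipped.

```python
import shlex

EXCERPT = '''\
"/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0" 3 "sd"
"/obio/SUNW,fdtwo@0,700000" 0 "fd"
'''

def parse_path_to_inst(text):
    """Return a list of (physical_path, instance, driver) tuples."""
    entries = []
    for line in text.splitlines():
        fields = shlex.split(line, comments=True)  # strips quotes and comments
        if len(fields) != 3:
            continue  # blank, comment-only, or a line this sketch can't read
        path, instance, driver = fields
        entries.append((path, int(instance), driver))
    return entries

for path, instance, driver in parse_path_to_inst(EXCERPT):
    print("%s%d -> %s" % (driver, instance, path))
```

The loop prints one line per device, starting with sd3 and its full geographic path, which is a quick way to answer "where is sd3 actually plugged in?"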

The implications of the "no re-use" policy can lead to unintentional renumbering, however. Let's say you have a quad Ethernet controller in board 1, SBus slot 1 of a server, and you want to move it to a different I/O board. Physically moving the card doesn't change the hardware available, but it does change the geographic description of the machine. As drvconfig scans the in-memory device tree, it will believe that the "old" quad Ethernet card is dead, and that a new one has appeared in a previously unused slot. As a result, your network interfaces are assigned the next available minor numbers and show up as qe4, qe5, qe6, and qe7. Unless you've updated your /etc/hostname.* files to match, you'll have trouble using the network.
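The renumbering is easier to see in a toy model. The Python sketch below imitates the policy as described in this column; it is not drvconfig's actual code. It takes "next available" to mean one more than the highest number on record for that driver, which is an assumption, and the /old-slot and /new-slot paths are hypothetical placeholders for the card's two locations.

```python
def assign_instances(known, probed):
    """Sketch of the reuse policy described above.

    known  -- dict of physical path -> (driver, instance) from earlier boots
    probed -- list of (physical path, driver) pairs found at this boot
    Returns an updated copy of known. Entries for devices that did not
    respond are kept, so their numbers are never handed out again.
    """
    updated = dict(known)
    for path, driver in probed:
        if path in updated:
            continue  # seen before: keep the old number
        used = [inst for drv, inst in updated.values() if drv == driver]
        updated[path] = (driver, max(used) + 1 if used else 0)
    return updated

# Hypothetical paths standing in for the card's old and new locations.
old = {"/old-slot/qe@%d" % n: ("qe", n) for n in range(4)}
moved = [("/new-slot/qe@%d" % n, "qe") for n in range(4)]
after = assign_instances(old, moved)
print(sorted(inst for path, (drv, inst) in after.items()
             if path.startswith("/new-slot")))  # prints [4, 5, 6, 7]
```

The four old entries are still in the table after the move, which is exactly why the relocated card cannot have qe0 through qe3 back without manual intervention.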

To work around this dynamic derailment of your desired configuration, edit /etc/path_to_inst by hand. You might also want to do this if you add a second network interface and want to swap the two minor numbers, changing which physical interface is qe0 or le0 and therefore carries the default route. To implement a change in minor device numbering, either correct the minor numbers you find in /etc/path_to_inst, or remove the entries for the devices you want renumbered and let drvconfig start from ground zero on a reboot. You must do a boot -r to get the changes to take effect. The manual page has more information but fails to put the following warning in huge flashing lights: do not remove /etc/path_to_inst, or you won't be able to find somewhat important devices like the root disk and the swap device. As with all key configuration files, make a backup, and preferably copy the file to another machine so you can inspect your handiwork later if required.
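If hand-editing makes you nervous, you can rehearse the change on a copy of the file first. The following Python sketch swaps the numbers assigned to two physical paths in path_to_inst-style text. It never touches the real file, the function name is mine, and the /hypothetical paths are placeholders rather than real device names.

```python
import shlex

def swap_instances(text, path_a, path_b):
    """Return path_to_inst-style text with two paths' numbers exchanged."""
    lines = text.splitlines()
    found = {}
    for index, line in enumerate(lines):
        fields = shlex.split(line, comments=True)
        if len(fields) == 3 and fields[0] in (path_a, path_b):
            found[fields[0]] = (index, fields[1], fields[2])
    if len(found) != 2:
        raise ValueError("both paths must be present, and must differ")
    (ia, inst_a, drv_a), (ib, inst_b, drv_b) = found[path_a], found[path_b]
    if drv_a != drv_b:
        raise ValueError("refusing to swap numbers across different drivers")
    lines[ia] = '"%s" %s "%s"' % (path_a, inst_b, drv_a)
    lines[ib] = '"%s" %s "%s"' % (path_b, inst_a, drv_b)
    return "\n".join(lines) + "\n"

# Hypothetical entries for two interfaces that share one driver.
before = '"/hypothetical/le@0" 0 "le"\n"/hypothetical/le@1" 1 "le"\n'
print(swap_instances(before, "/hypothetical/le@0", "/hypothetical/le@1"))
```

Write the result to a scratch file and compare it with your backup before installing it and doing the boot -r; the warning above about never removing /etc/path_to_inst applies with full force to mangling it.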

Small cordless devices: How to play with your hardware and not get toasted
Device configuration is yet another area where things go subtly wrong when you are under the most pressure. Here are some of the more useful tips and tricks to help you play with your devices:

Knowing how the system assembles the software representation of its hardware configuration may represent the closest thing to a computer's mind-body problem. It's up to you and your managerial devices to coax it through times of crisis.








(c) Copyright Web Publishing Inc., an IDG Communications company


URL: http://www.sunworld.com/swol-12-1996/swol-12-sysadmin.html