You too can understand
Ever looked at the /devices directory and wondered what the heck all those files were for? Read on to discover how Solaris configures its devices at boot time and how you, too, can master the intricacies. (2,900 words)
Running into someone who understands the
intricacies of Unix device drivers is no longer the awe-inspiring
experience retold from the days of yore. If you were impressed by Unix
gurus who professed to write drivers using
cat as a text
editor, it's time to join the real world and enjoy improvements in
kernel configuration, device mapping, and installation that have made
low-level kernel knowledge less of a necessity for the average Unix
system manager. But why dedicate a column to device numbering and
mapping in Solaris?
While installation has become much more automated, troubleshooting remains a labor-intensive process. What do you do when you add a new disk drive, and it begins using a device number for which your database isn't prepared? How do you prevent device numbers from changing across reboots, and how do you get them to change when you need to remove hardware or replace failed components? Do you have high-availability configurations that require identical disk device names on both machines, even though the SCSI host adaptors are not quite identically installed and cabled? How do you fix older or third-party applications with hard-wired device names that fail in the brave new world of tongue-twisting geographical device names?
This month, we're going to put you back in charge of the hardware configuration with a tour of the device identification and numbering process. We'll start with a look back at how Unix has assigned and managed device numbers, and at how the Solaris kernel makes the process much more dynamic, and at times less deterministic. Then we'll dive into the details of device autoconfiguration and numbering under Solaris, followed by a look at persistence in device numbering and how to override the defaults and fix some common problems.
Land of 1,000 devices
The late jazz bassist Charles Mingus said that taking something complex and making it simple showed true creativity. One of the elegant simplicities of the Unix operating system is the way in which it presents physical device interfaces to the system programmer. Devices such as disk drives, framebuffers, pseudo-terminals, and real serial ports appear as filesystem entries, allowing the usual set of file manipulation system calls to serve as the application programming interface. There's no need to learn a separate device liturgy for each new type of hardware. Reducing the API suite to a single set of interfaces makes it easier, for example, to port a database that may use either raw disk devices or a filesystem.
However, the output of ls shows you that device entries in the filesystem aren't quite like those of regular files:
luey% ls -l sd@3,0:a*
brw-r----- 1 root sys 32, 24 Oct 14 12:17 sd@3,0:a
crw-r----- 1 root sys 32, 24 Oct 14 12:17 sd@3,0:a,raw
The first character in the mode tells you whether this is a character (c) or block (b) device. Character devices transfer data as a stream of bytes without going through the kernel's buffer cache, while block devices are accessed in fixed-size blocks through that cache. Disks are the most common block devices, while network interfaces, terminal devices of all flavors, and tape drives are character devices; a disk also has a character ("raw") interface, which is why the listing above shows both a b and a c entry for the same partition. Device files, or special files, also sport a pair of numbers in place of a size: the major and minor numbers, respectively. Major numbers are indexes into the kernel's table of device drivers, associating routines to manipulate the device with the user-visible name for the hardware. Minor numbers identify which particular unit of the device family you're addressing; a driver is free to encode more than a unit number in them, and the sd disk driver, for example, also encodes the partition. The difficult problem is telling the kernel about a new device, and making sure it creates the appropriate associations between filesystem entries and its own configuration tables.
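If you'd rather see those fields from a program than from ls, the stat() system call returns them. Here's a minimal Python sketch that formats a special file's type letter and major and minor numbers the way ls -l does. Keep in mind that os.major(), os.minor() and os.makedev() are Unix-only and decode device numbers using the conventions of whatever machine runs the script, so the demo builds the 32, 24 pair from the listing rather than reading a real Solaris device.

```python
import os
import stat


def describe_special(mode: int, rdev: int) -> str:
    """Format a special file's type and device numbers the way ls -l does."""
    if stat.S_ISBLK(mode):
        kind = "b"
    elif stat.S_ISCHR(mode):
        kind = "c"
    else:
        raise ValueError("not a block or character special file")
    # Major number: which driver. Minor number: which unit (the driver decides).
    return f"{kind} {os.major(rdev)}, {os.minor(rdev)}"


def describe_path(path: str) -> str:
    """Describe a real device file; os.stat follows /dev symlinks into /devices."""
    st = os.stat(path)
    return describe_special(st.st_mode, st.st_rdev)


if __name__ == "__main__":
    # Rebuild the values ls -l showed for sd@3,0:a,raw without needing the device.
    print(describe_special(stat.S_IFCHR | 0o640, os.makedev(32, 24)))  # c 32, 24
```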
SunOS 4.x and its Berkeley heritage embedded the problem of device numbering in the kernel configuration file. If you wanted to add a new device or increase the largest device minor number in use, you had to reconfigure and rebuild the kernel. Even simple tasks, such as telling the kernel that the SCSI disk on target 4 was to be known as sd4, required hand-crafting configuration files and a kernel rebuild. SunOS devices live in the /dev directory of the root filesystem, a flat namespace for all device types and instances.
Solaris 2.x introduced dynamic kernel configuration, removing kernel configuration, rebuilds, and relinking from the repertoire of regular system care and feeding. The Solaris kernel identifies the drivers it needs, links them in while building a table of major numbers, and then assigns minor numbers to the devices it finds after booting. Add a new disk device, and Solaris assigns it the next available minor number. When you add a new type of device, such as a quad Ethernet controller, the major number table gets updated and the board's devices are numbered starting from 0. The /dev directory now holds only symbolic links to the real device entries in /devices, whose names describe where each device sits in the machine.
Robbins 8th & Walnut: Our Name Is Our Address*
File names in the /devices hierarchy reflect the machine's physical connections and logical bus layout: the type of I/O interface, any address and slot or unit number, and a device name and minor number or other identifier:
brw-rw-rw- 1 root sys 36, 0 Oct 14 12:17 ./obio/SUNW,fdtwo@0,700000:a
brw-r----- 1 root sys 32, 26 Oct 14 12:17 ./iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0:c
crw------- 1 stern 11010 39, 0 Oct 14 12:17 ./iommu@f,e0000000/sbus@f,e0001000/cgsix@2,0:cgsix0
The first example is the floppy drive on my SPARCstation 10. It's
attached to the on-board I/O controller (obio), and the device name is
SUNW,fdtwo. It's at location 0, address 700000, and this
device refers to the "a" partition of the disk. The second and third
examples are for SBus-based devices. The second is a SCSI disk
attached to the on-board SCSI controller. It's connected to the main
system bus through the IOMMU (I/O memory management unit), which has a
control address associated with it. Most on-board SBus devices,
including the control units, live in slot "f". The next element in
the pathname shows you it's an SBus device,
also controlled through slot "f". The "esp" elements that follow are
the ESP SCSI host adaptor's DMA channel, and the ESP SCSI host
interface unit, also with control information. The final pathname
component is the SCSI disk definition: it's at target 3, logical unit
(LUN) 0, and this device refers to the "c" partition. The final
example is for the frame buffer, a cgsix device, sitting in SBus slot
2 as indicated by the
cgsix@2,0 element. While these
pathnames are quite complex, they give you a detailed view of how
hardware is plugged into the machine and of what the boot prom has
discovered. On server machines with multiple SBus interfaces,
you'll see more variation in the IOMMU and SBus addressing.
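To take one of these names apart mechanically, the Python sketch below splits a /devices pathname into its name@address components and the minor name that follows the colon. The sd_minor() helper rests on an assumption worth stating loudly: it encodes the eight-slices-per-instance layout the sd driver used on SPARC, with slice a as 0. That layout is consistent with the 32, 24 and 32, 26 shown for this disk's a and c partitions, but the instance number (3 here) isn't part of the pathname, so the sketch takes it as an argument.

```python
from typing import List, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    name: str               # device or bus name, e.g. "sbus" or "SUNW,fdtwo"
    address: Optional[str]  # text after "@", e.g. "f,e0001000"; None if absent


def parse_devices_path(path: str) -> Tuple[List[Node], Optional[str]]:
    """Split a /devices pathname into its nodes and the minor name after ':'."""
    if path.startswith("/devices/"):
        path = path[len("/devices"):]
    parts = [p for p in path.split("/") if p not in ("", ".")]
    minor_name = None
    if parts and ":" in parts[-1]:
        parts[-1], minor_name = parts[-1].split(":", 1)
    nodes = []
    for part in parts:
        name, sep, address = part.partition("@")
        nodes.append(Node(name, address if sep else None))
    return nodes, minor_name


def sd_minor(instance: int, minor_name: str) -> int:
    """Assumed sd layout: eight slices per instance, slice 'a' = 0 ... 'h' = 7."""
    letter = minor_name.split(",", 1)[0]  # "a,raw" names the same slice as "a"
    if len(letter) != 1 or letter not in "abcdefgh":
        raise ValueError(f"unexpected slice name {minor_name!r}")
    return instance * 8 + (ord(letter) - ord("a"))


if __name__ == "__main__":
    disk = ("./iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000"
            "/esp@f,800000/sd@3,0:c")
    nodes, minor = parse_devices_path(disk)
    target, lun = nodes[-1].address.split(",")
    print([n.name for n in nodes], "target", target, "lun", lun, "slice", minor)
    # The instance number (3 here) is not in the path; it is passed in.
    print(sd_minor(3, minor))  # prints 26
```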
Building the device tree, and creating the symbolic links to it, is a complex process that is part of every system boot. The subtle hand-offs and dependencies involved in adding a new device would tax the skills of the American Ballet Theater or the Dallas Cowboys. Before we get into diagnostics and fine-tuning device configurations, let's walk through the boot process to see how the configuration files, minor numbers, and links are assembled.
Building it from memory: Constructing the device landscape
After a power-on self test, every current Sun/SPARC system uses its open boot prom (OBP, see "Open boot secrets revealed", SunWorld October 1995) to probe out attached hardware, building a machine topology that is kept in memory and handed off to the nascent kernel. If the reconfigure
-r flag was passed
to the boot program, the system will rebuild the /dev and
/devices directories, adding new devices or renumbering and
re-assigning those that have moved within the system.
A system device reconfiguration occurs in three major steps:
1. New device drivers are registered with the system, typically by running the add_drv utility from within the vendor's installation script. The /etc/name_to_major file contains the associations of device types to major numbers; those major numbers are the first of the comma-separated numbers you'll see in an ls -l listing of the filesystem entries. Device aliases are also created by add_drv and noted in /etc/driver_aliases; a device alias is a shorthand notation for an even more hideously complex device type. For example, "fd" suffices to name the on-board floppy drive even though the formal device type is "SUNW,fdtwo".
2. The drvconfig utility takes the in-memory device table assembled by the boot prom and builds the /devices directory. It adds in pseudo-devices, such as the kernel memory interface and the pseudo tty devices used for network logins, and assigns minor numbers to the devices recorded in /devices. As drvconfig does its work, it sets the permissions on each filesystem entry based on configuration information in /etc/minor_perm, assigning an owner and a filesystem mode to each new entry in /devices. While SunOS 4.x did most of the minor number assignment and device configuration in the kernel or at kernel build time, Solaris 2.x does it all from user level once the system has started the boot process. Using a user-level tool provides the flexibility to read disk-based configuration files, overrides, or other system-specific preferences without the muss and fuss of rebuilding kernels.
3. Finally, the devlinks utility builds the /dev directory, providing somewhat less horrific device names that are merely symbolic links back to the geographically undesirable /devices entries. Another departure from SunOS 4.x is that entries in /dev are organized in subdirectories by device type, so you'll find disks in /dev/dsk (and their raw interfaces in /dev/rdsk), serial ports in /dev/term, and pseudo-terminals in /dev/pts. Applications that need to open and close devices appreciate the narrower device directory.
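Because /etc/name_to_major and /etc/minor_perm are plain text, you can predict what a new entry in /devices will look like before you reboot. This Python sketch parses lines in the forms those files use: a driver name and major number per line in name_to_major, and driver:minor-name, octal mode, owner and group in minor_perm. The sample contents are illustrative, assembled from the listings in this column rather than copied from a real system, and taking the first matching rule is a simplification, not a description of drvconfig's internals.

```python
import fnmatch
from typing import Dict, List, Tuple

# Illustrative contents only, assembled from the listings in this article.
NAME_TO_MAJOR = """\
sd 32
fd 36
cgsix 39
"""

MINOR_PERM = """\
sd:* 0640 root sys
fd:* 0666 root sys
"""

Rule = Tuple[str, str, int, str, str]  # driver, minor pattern, mode, owner, group


def parse_name_to_major(text: str) -> Dict[str, int]:
    """Map each driver name to its major number."""
    majors = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and not fields[0].startswith("#"):
            majors[fields[0]] = int(fields[1])
    return majors


def parse_minor_perm(text: str) -> List[Rule]:
    """Parse 'driver:minor mode owner group' lines; the mode is octal."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0].startswith("#"):
            continue
        driver, _, pattern = fields[0].partition(":")
        rules.append((driver, pattern, int(fields[1], 8), fields[2], fields[3]))
    return rules


def attributes_for(driver: str, minor_name: str,
                   majors: Dict[str, int], rules: List[Rule]):
    """Return (major, mode, owner, group) using the first matching rule."""
    for rule_driver, pattern, mode, owner, group in rules:
        if rule_driver == driver and fnmatch.fnmatchcase(minor_name, pattern):
            return majors[driver], mode, owner, group
    raise LookupError(f"no minor_perm rule for {driver}:{minor_name}")


if __name__ == "__main__":
    majors = parse_name_to_major(NAME_TO_MAJOR)
    rules = parse_minor_perm(MINOR_PERM)
    major, mode, owner, group = attributes_for("sd", "a,raw", majors, rules)
    print(major, oct(mode), owner, group)  # prints 32 0o640 root sys
```

The 0640 mode matches the brw-r----- and crw-r----- permissions in the sd listing, and 0666 matches the floppy's brw-rw-rw-.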
If you feel that a small sleight-of-hand is going on somewhere between locating devices and building a consistent view of the world, you're either remarkably perceptive, or you've experienced that sinking feeling that comes from realizing that you are now swapping to the disk that had your database on it and that a major customer's order file is now represented by the swap pages underlying a rude JPEG image.
Consistency is everything: Retaining device state across reboots
If everything is done dynamically, how do you ensure that life remains the same across reboots? The answer comes from step 2 above, where
drvconfig builds the /devices tree and assigns
minor numbers. As
drvconfig does its work, it is charged
with maintaining a sense of history between boots -- it notes the
mapping of physical, geographic addresses to minor numbers in the
/etc/path_to_inst file, and updates this file if needed with
new device information. Essentially,
drvconfig's use of
/etc/path_to_inst ensures that once you put root on sd3, it
stays there, and that the data and log segments of your database on
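sd2 don't get mixed up with sd3 after a reboot to add a third disk.
When drvconfig can find a match between a device in the in-memory tree and an entry in /etc/path_to_inst, it continues using the minor number previously assigned. If a new device appears, it is given the next available minor number. The full geographic path to the device is noted in /etc/path_to_inst, as shown by this excerpt for the sd3 and fd0 devices from the examples above:
"/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0" 3 "sd"
"/obio/SUNW,fdtwo@0,700000" 0 "fd"
Strictly speaking, the number in each entry is the device's instance number, the 3 in sd3 or the 0 in fd0, and drivers derive their minor numbers from it: the sd driver gives each instance eight minor numbers, one per partition, which is why sd3's "a" and "c" partitions appear with minor numbers 24 and 26 in the listings above.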
Note that a device minor number isn't re-used if the device once existed and then doesn't respond at boot time -- you don't want to renumber your disks if one dies, for example, and you're counting on your disk mirroring to get you through the failure. Smoldering disks shouldn't lead to a melting database as the hardware failure is communicated to you through a software disaster.
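Since each /etc/path_to_inst entry is a quoted physical path, an instance number and a quoted driver name, a few lines of Python will tell you which physical device is hiding behind a name like sd3. This sketch assumes exactly the three-field layout in the excerpt above and leans on the standard shlex module to strip the quotes; it makes no attempt to validate the file.

```python
import shlex
from typing import Dict, List, NamedTuple


class InstEntry(NamedTuple):
    path: str      # physical path, as under /devices
    instance: int  # instance number recorded for that path
    driver: str    # driver name, e.g. "sd"


def parse_path_to_inst(text: str) -> List[InstEntry]:
    """Parse '"path" instance "driver"' lines, skipping blanks and comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        path, instance, driver = shlex.split(line)  # shlex removes the quotes
        entries.append(InstEntry(path, int(instance), driver))
    return entries


def by_name(entries: List[InstEntry]) -> Dict[str, str]:
    """Map short names such as 'sd3' to the physical path bound to them."""
    return {f"{e.driver}{e.instance}": e.path for e in entries}


SAMPLE = '''\
"/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0" 3 "sd"
"/obio/SUNW,fdtwo@0,700000" 0 "fd"
'''

if __name__ == "__main__":
    for name, path in by_name(parse_path_to_inst(SAMPLE)).items():
        print(f"{name:5} {path}")
```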
The implications of the "no re-use" policy can lead to unintentional
renumbering, however. Let's say you have a quad Ethernet controller in
board 1, SBus slot 1 of a server, and you want to move it to a
different I/O board. Physically moving the card doesn't change a thing
as far as available hardware is concerned, but you've modified the geographic
description of the machine. As
drvconfig scans the
in-memory device tree, it will believe that the "old" quad Ethernet
card is dead, and that a new one has appeared in a previously unused
slot. As a result, your network interfaces are assigned the next
available minor numbers and show up as qe4, qe5, qe6 and qe7. If you
haven't taken the time to modify your /etc/hostname.*
files to match, you'll have trouble using the network.
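The renumbering falls straight out of the rules described above: recorded paths keep their numbers, recorded numbers are never handed out again, and unrecognized paths get fresh ones. The Python sketch below models just those rules for a single driver; it's not drvconfig's code, the paths are hypothetical, and choosing the lowest unrecorded number is an assumption about what "next available" means.

```python
from typing import Dict, List


def assign_instances(recorded: Dict[str, int], found: List[str]) -> Dict[str, int]:
    """Model of the numbering rules for one driver; not drvconfig's actual code.

    recorded: path -> instance entries already on file
    found:    physical paths probed on this boot
    """
    result = dict(recorded)      # recorded entries are kept, even for dead devices
    used = set(result.values())  # so their numbers are never handed out again
    for path in found:
        if path in result:
            continue             # known device: keep its old number
        number = 0
        while number in used:    # assumption: lowest number not yet recorded
            number += 1
        result[path] = number
        used.add(number)
    return result


if __name__ == "__main__":
    # Hypothetical paths for a quad Ethernet card before and after the move.
    old = {f"/hypothetical/board1/slot1/qe@{i}": i for i in range(4)}
    moved = [f"/hypothetical/board2/slot0/qe@{i}" for i in range(4)]
    after = assign_instances(old, moved)
    print([f"qe{after[p]}" for p in moved])  # prints ['qe4', 'qe5', 'qe6', 'qe7']
```

Running it a second time with the same probe results changes nothing, which is the persistence you want in the normal case and the stubbornness that bites you after a move.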
To work around this dynamic derailment of your desired configuration,
edit /etc/path_to_inst by hand. You might want to do this if
you add a second network interface and want to swap the two interfaces'
minor numbers, changing which physical interface is qe0 or le0 and
therefore which one carries the default route. To implement a change in minor
device numbering, either correct the minor numbers you find in
/etc/path_to_inst, or remove the entries for the devices you
want renumbered and let
drvconfig start from ground zero
on a reboot. You must do a
boot -r to get the changes to
take effect. The path_to_inst(4) manual page has more information but fails to put the
following warning in huge flashing lights: do not remove
/etc/path_to_inst, or you won't be able to find somewhat
important devices like the root disk and the swap device. As with all
key configuration files, make a backup, and preferably copy the file
to another machine so you can inspect your handiwork later.
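If you renumber by hand more than once, a small script makes the edit repeatable and leaves a backup behind. This Python sketch swaps the instance numbers of two entries for the same driver, assuming the three-field layout of the /etc/path_to_inst excerpt earlier in this column; the le paths in the demo are hypothetical. Run it against a copy first, and remember that nothing changes until the next boot -r.

```python
import shlex
import shutil
from pathlib import Path


def swap_instances(text: str, path_a: str, path_b: str) -> str:
    """Return path_to_inst text with the instance numbers of two paths swapped."""
    lines = text.splitlines()
    entries = {}
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            path, instance, driver = shlex.split(stripped)
            entries[path] = (i, int(instance), driver)
    (i_a, inst_a, drv_a), (i_b, inst_b, drv_b) = entries[path_a], entries[path_b]
    if drv_a != drv_b:
        raise ValueError("both paths must belong to the same driver")
    # Rewrite only the two affected lines; comments and other entries stay put.
    lines[i_a] = f'"{path_a}" {inst_b} "{drv_a}"'
    lines[i_b] = f'"{path_b}" {inst_a} "{drv_b}"'
    return "\n".join(lines) + "\n"


def swap_in_file(filename: str, path_a: str, path_b: str) -> None:
    """Copy the file to <filename>.bak, then write the swapped version in place."""
    shutil.copy2(filename, filename + ".bak")
    original = Path(filename).read_text()
    Path(filename).write_text(swap_instances(original, path_a, path_b))


if __name__ == "__main__":
    demo = '"/hypothetical/le@0" 0 "le"\n"/hypothetical/le@1" 1 "le"\n'
    print(swap_instances(demo, "/hypothetical/le@0", "/hypothetical/le@1"), end="")
```

Copying the .bak file to another machine, as suggested above, is still up to you.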
Small cordless devices: How to play with your hardware and not get toasted
Device configuration is yet another area where things go subtly wrong when you are under the most pressure. Here are some of the more useful tips and tricks to help you play with your devices:
Use prtconf -p to display what the boot prom found in the system. If you're having trouble getting a new device to work, make sure the OBP sees it by checking the prtconf output; if the device doesn't show up there, it's not responding to the boot prom during self-test and autoconfiguration. Add the -P option and you'll see pseudo devices in the mix as well. Anything that's labeled "driver not attached" has been configured but not found.
To watch drvconfig at work, find its invocation in /etc/rcS.d/S50drvconfig. Add the -d option there and watch the debugging information go by.
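To address multiple logical units (LUNs) on a single SCSI target, list each one in the sd driver's configuration file, /kernel/drv/sd.conf:
name="sd" class="scsi" target=0 lun=0; # target 0 LUN 0, default
name="sd" class="scsi" target=0 lun=1; # target 0 LUN 1
name="sd" class="scsi" target=0 lun=2; # target 0 LUN 2
name="sd" class="scsi" target=0 lun=3; # target 0 LUN 3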
You'll see the multiple units show up as sd@0,0 through sd@0,3 in /devices.
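Typing one entry per LUN gets tedious on a device that presents many of them, so you may prefer to generate the lines. This Python sketch prints entries in the same form as the sd configuration lines above for any target and range of LUNs, along with the /devices node names you should then expect to see. The function names are hypothetical, and how many LUNs your device actually presents is up to the device.

```python
def sd_conf_entries(target: int, luns: range) -> str:
    """Build one sd.conf-style line per logical unit on a single SCSI target."""
    return "\n".join(
        f'name="sd" class="scsi" target={target} lun={lun}; # target {target} LUN {lun}'
        for lun in luns
    )


def expected_nodes(target: int, luns: range) -> list:
    """Node names the entries should produce under /devices, e.g. sd@0,3."""
    return [f"sd@{target},{lun}" for lun in luns]


if __name__ == "__main__":
    print(sd_conf_entries(0, range(4)))
    print(expected_nodes(0, range(4)))
```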
Knowing how the system assembles the software representation of its hardware configuration may represent the closest thing to a computer's mind-body problem. It's up to you and your managerial devices to coax it through times of crisis.