You too can understand
Ever looked at the /devices directory and wondered what the heck all those files were for? Read on to discover how Solaris configures its devices at boot time and how you can master the intricacies too. (2,900 words)
Running into someone who understands the
intricacies of Unix device drivers is no longer the awe-inspiring
experience retold from the days of yore. If you were impressed by Unix
gurus who professed to write drivers using cat as a text
editor, it's time to join the real world and enjoy improvements in
kernel configuration, device mapping, and installation that have made
low-level kernel knowledge less of a necessity for the average Unix
system manager. But why dedicate a column to device numbering and
mapping in Solaris?
While installation has become much more automated, troubleshooting remains a labor-intensive process. What do you do when you add a new disk drive, and it begins using a device number for which your database isn't prepared? How do you prevent device numbers from changing across reboots, and how do you get them to change when you need to remove hardware or replace failed components? Do you have high-availability configurations that require identical disk device names on both machines, even though the SCSI host adaptors are not quite identically installed and cabled? How do you fix older or third-party applications with hard-wired device names that fail in the brave new world of tongue-twisting geographical device names?
This month, we're going to put you back in charge of the hardware configuration with a tour of the device identification and numbering process. We'll start with a look back at how device numbers have been assigned and managed by Unix, and how the Solaris kernel makes the process much more dynamic -- and less deterministic at times. We'll then dig into device autoconfiguration and numbering under Solaris, followed by a look at persistence in device numbering and how to override the defaults and fix some common problems.
Land of 1,000 devices
The late jazz bassist Charles Mingus said that taking something
complex and making it simple showed true creativity. One of the
elegant simplicities of the Unix operating system is the way in which
it presents physical device interfaces to the system programmer.
Devices, such as disk drives, framebuffers, pseudo-terminals, and real
serial ports appear as filesystem entries, allowing the usual set of
file manipulation system calls to be used as the application
programming interface. There's no need to learn a separate device
liturgy for each new type of hardware. Reducing the API suite to a
single set of interfaces makes it easier to port a database, for
example, that may use raw disk devices or a filesystem.
However, the output of ls shows you that device entries
in the filesystem aren't quite identical to those of regular files or
directories:
luey% ls -l sd@3,0:a*
brw-r-----  1 root  sys  32, 24 Oct 14 12:17 sd@3,0:a
crw-r-----  1 root  sys  32, 24 Oct 14 12:17 sd@3,0:a,raw
The first character in the mode tells you if this is a character (c) or block (b) device; character devices are read a byte at a time, like normal files, while block devices can only be accessed in multiples of the block size. Disks are the most common block devices, while network interfaces, terminal devices of all flavors, and tape drives are character devices. Device files, also called special files, sport a pair of numbers in place of a size; the numbers are the major and minor identifiers, respectively. Major numbers are indexes into the kernel's table of device drivers, associating routines to manipulate the device with the user-visible name for the hardware. Minor numbers are essentially instance numbers for the device -- they tell you how many you have, and which particular unit of the device family (and, for a disk, which partition) you're addressing. The difficult problem is telling the kernel about a new device, and making sure it creates the appropriate associations between filesystem entries and its own configuration tables.
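To make the anatomy of such a listing concrete, here is a minimal Python sketch that pulls the device type and the major and minor numbers out of one line of ls -l output. The function names are hypothetical. The split_sd_minor helper assumes that the sd driver packs the instance and partition together as instance * 8 + partition; that assumption is drawn from the listings in this column (minor 24 for partition a and 26 for partition c of sd3) and is not a rule stated here.

```python
import re

# An `ls -l` line for a device file shows "major, minor" where a regular
# file would show its size.
DEVICE_LINE = re.compile(
    r"^(?P<kind>[bc])\S*\s+\d+\s+(?P<owner>\S+)\s+(?P<group>\S+)\s+"
    r"(?P<major>\d+),\s*(?P<minor>\d+)\s+.*\s(?P<name>\S+)$"
)


def parse_device_line(line):
    """Return the device type, major, minor and name from one ls -l line."""
    match = DEVICE_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a device entry: %r" % line)
    return {
        "kind": "block" if match.group("kind") == "b" else "character",
        "major": int(match.group("major")),
        "minor": int(match.group("minor")),
        "name": match.group("name"),
    }


def split_sd_minor(minor, partitions_per_disk=8):
    """ASSUMPTION: minor = instance * 8 + partition, inferred from the listings."""
    return divmod(minor, partitions_per_disk)


entry = parse_device_line("brw-r-----  1 root  sys  32, 24 Oct 14 12:17 sd@3,0:a")
print(entry["kind"], entry["major"], entry["minor"])
print(split_sd_minor(entry["minor"]))  # (instance, partition)
```

Under that assumption, minor 24 decodes to instance 3, partition 0, which matches the sd3 "a" partition in the listing.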
SunOS 4.x and its Berkeley heritage embedded the problem of device numbering in the kernel configuration file. If you wanted to add a new device or increase the largest device minor number in use, you had to reconfigure and rebuild the kernel. Even simple tasks, such as telling the kernel that the SCSI disk on target 4 was to be known as sd4 required hand-crafting configuration files and a kernel rebuild. SunOS devices live in the /dev directory of the root filesystem, a flat namespace for all device types and instances.
Solaris 2.x introduced dynamic kernel configuration, removing kernel configuration, builds, and links from the repertoire of regular system care and feeding. The Solaris kernel identifies the drivers it needs, links them in while building a table of major numbers, and then assigns minor numbers to devices it finds after booting. Add a new disk device, and Solaris assigns it the next available minor number. When you add a new type of device such as a quad ethernet controller, the major number table gets updated and the board's devices are identified starting with minor number 0. The /dev directory is now just a directory of links to the actual mapping of filesystem entries to geographic device descriptions in /devices.
Robbins 8th & Walnut: Our Name Is Our Address*
File names in the /devices hierarchy reflect the
machine's physical connections and logical bus layout: the type of I/O
interface, any address and slot or unit number, and a device name and
minor number or other identifier:
brw-rw-rw-  1 root   sys    36,  0 Oct 14 12:17 ./obio/SUNW,fdtwo@0,700000:a
brw-r-----  1 root   sys    32, 26 Oct 14 12:17 ./iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0:c
crw-------  1 stern  11010  39,  0 Oct 14 12:17 ./iommu@f,e0000000/sbus@f,e0001000/cgsix@2,0:cgsix0
The first example is the floppy drive on my SPARCstation 10. It's
attached to the on-board I/O controller (obio), and the device name is
SUNW,fdtwo. It's at location 0, address 700000, and this
device refers to the "a" partition of the disk. The second and third
examples are for SBus-based devices. The second is a SCSI disk
attached to the on-board SCSI controller. It's connected to the main
system bus through the IOMMU (I/O memory management unit), which has a
control address associated with it. Most on-board SBus-connected
devices live in slot "f" -- including the control
units. The next element in the pathname shows you it's an SBus device,
also controlled through slot "f". The "esp" elements that follow are
the ESP SCSI host adaptor's DMA channel, and the ESP SCSI host
interface unit, also with control information. The final pathname
component is the SCSI disk definition: it's at target 3, logical unit
(LUN) 0, and this device refers to the "c" partition. The final
example is for the frame buffer, a cgsix device, sitting in SBus slot
2 as indicated by the cgsix@2,0 element. While these
pathnames are quite complex, they provide you a detailed view of how
hardware is plugged into the machine, and what has been discovered by
the boot prom. On server machines with multiple SBus interfaces,
you'll see more variation in the IOMMU and SBus addressing.
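The pathname anatomy described above can be sketched in a few lines of Python. This is only an illustration of the naming convention as this column describes it (name@address for each component, with a trailing :identifier on the last one); the function name and the dictionary layout are hypothetical.

```python
def parse_device_path(path):
    """Split a /devices pathname into one dict per path component."""
    parts = [p for p in path.split("/") if p not in ("", ".")]
    nodes = []
    for index, part in enumerate(parts):
        minor_name = None
        # Only the final component carries the ":a", ":c" or ":cgsix0" suffix.
        if index == len(parts) - 1 and ":" in part:
            part, minor_name = part.split(":", 1)
        # Everything before "@" is the device name; the rest is its address.
        name, _, address = part.partition("@")
        nodes.append({
            "name": name,
            "address": address.split(",") if address else [],
            "minor_name": minor_name,
        })
    return nodes


disk = "./iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0:c"
for node in parse_device_path(disk):
    print(node["name"], node["address"], node["minor_name"])
```

For the SCSI disk, the last node comes out as name sd, address target 3 and LUN 0, and partition c, which is the reading given in the text.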
Building the device tree, and creating the symbolic links to it, is a complex process that is part of every system boot. The subtle hand-offs and dependencies involved in adding a new device would tax the skills of the American Ballet Theater or the Dallas Cowboys. Before we get into diagnostics and fine-tuning device configurations, let's walk through the boot process to see how the configuration files, minor numbers, and links are assembled.
Building it from memory: Constructing the device landscape
After a power-on self test, every current Sun/SPARC system uses its
open boot prom (OBP, see
"Open boot secrets revealed",
SunWorld October 1995) to probe out attached hardware,
building a machine topology that is kept in memory and handed off to
the nascent kernel. If the reconfigure -r flag was passed to the boot
program, the system will rebuild the /dev and
/devices directories, adding new devices or renumbering and
re-assigning those that have moved within the system.
A system device reconfiguration occurs in three major steps:
1. Drivers for new devices are added with the add_drv utility from
within the vendor's installation script. The /etc/name_to_major file
contains the associations of device types to major numbers; those
major numbers are the first of the comma-separated numbers you'll see
in an ls -l listing of the filesystem entries. Device aliases are also
created by add_drv and noted in /etc/driver_aliases; a device alias is
a short-hand notation for an even more hideously complex device type.
For example, "fd" suffices to name the on-board floppy drive even
though the formal device type is "SUNW,fdtwo".
2. The drvconfig utility takes the in-memory device table assembled by
the boot prom and builds the /devices directory. It adds in
pseudo-devices, such as the kernel memory interface and the pseudo tty
devices used for network logins, and assigns minor numbers to the
devices recorded in /devices. As drvconfig does its work, it sets
the permissions on each filesystem entry based on configuration
information in /etc/minor_perm, assigning an owner and a
filesystem mode to each new entry in /devices. While SunOS
4.x did most of the minor number assignment and device configuration
in the kernel or at kernel build time, Solaris 2.x does it all from
user level once the system has started the boot process. Using a
user-level tool provides flexibility in reading disk-based
configuration files, overrides, or other system-specific preferences
without the muss and fuss of rebuilding kernels.
3. The devlinks utility is executed to build the /dev directory,
providing somewhat less horrific device names that are merely symbolic
links back to the geographically undesirable /devices entries. Another
departure from SunOS 4.x is that entries in /dev are
organized in subdirectories by device type, so you'll find disks in
/dev/dsk and tty devices in /dev/tty. Applications
that need to open and close devices appreciate the narrower device
directory.
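The name-to-major association from the first step can be illustrated with a small Python sketch. The sample text below is not a real /etc/name_to_major; it is built only from the three major numbers visible in the listings earlier in this column (sd 32, fd 36, cgsix 39), and it assumes the file holds one whitespace-separated "name number" pair per line. The alias table is passed in as a plain dictionary rather than parsed from /etc/driver_aliases, and the function names are hypothetical.

```python
# Sample data taken from the ls -l listings in this column, not from a real system.
SAMPLE_NAME_TO_MAJOR = """\
sd 32
fd 36
cgsix 39
"""

# The one alias this column mentions: the formal type SUNW,fdtwo is driven by fd.
SAMPLE_ALIASES = {"SUNW,fdtwo": "fd"}


def parse_name_to_major(text):
    """Build a driver-name -> major-number table from 'name number' lines."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # defensive: skip blanks and comments
            continue
        name, major = line.split()
        table[name] = int(major)
    return table


def major_for(device_type, majors, aliases):
    """Resolve a device type through the alias table, then look up its major."""
    driver = aliases.get(device_type, device_type)
    return majors[driver]


majors = parse_name_to_major(SAMPLE_NAME_TO_MAJOR)
print(major_for("SUNW,fdtwo", majors, SAMPLE_ALIASES))
print(major_for("sd", majors, SAMPLE_ALIASES))
```

The floppy's formal name resolves through its alias to the same major number, 36, that shows up in the ls -l listing for the floppy device.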
If you feel that a small sleight-of-hand is going on somewhere between locating devices and building a consistent view of the world, you're either remarkably perceptive, or you've experienced that sinking feeling that comes from realizing that you are now swapping to the disk that had your database on it and that a major customer's order file is now represented by the swap pages underlying a rude JPEG image.
Consistency is everything: Retaining device state across reboots
If everything is done dynamically, how do you ensure that life remains
the same across reboots? The answer comes from step 2 above, where
drvconfig builds the /devices tree and assigns minor numbers. As
drvconfig does its work, it is charged with maintaining a sense of
history between boots -- it notes the mapping of physical, geographic
addresses to minor numbers in the /etc/path_to_inst file, and updates
this file if needed with new device information. Essentially,
drvconfig's use of /etc/path_to_inst ensures that once you put root on sd3, it
stays there, and that the data and log segments of your database on
sd2 don't get mixed up with sd3 after a reboot to add a third disk
drive.
If drvconfig can find a match between a device in the
in-memory tree and an entry in /etc/path_to_inst, it
continues using the minor number previously assigned. If a new device
appears, it is given the next available minor number. The full
geographic path to the device is noted in /etc/path_to_inst
as shown by this excerpt for the sd3 and fd0 devices from the example
above:
"/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0" 3 "sd"
"/obio/SUNW,fdtwo@0,700000" 0 "fd"
Note that a device minor number isn't re-used if the device once existed and then doesn't respond at boot time -- you don't want to renumber your disks if one dies, for example, and you're counting on your disk mirroring to get you through the failure. Smoldering disks shouldn't lead to a melting database as the hardware failure is communicated to you through a software disaster.
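The /etc/path_to_inst excerpt above has a simple three-field layout: a quoted physical path, a number, and a quoted driver name. A minimal Python sketch for reading it follows; the function name is hypothetical, and using shlex to handle the quoting (and to skip any "#" comment lines) is a convenience chosen for this sketch, not a description of how drvconfig reads the file.

```python
import shlex

# The two entries quoted in the excerpt above.
EXCERPT = '''\
"/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0" 3 "sd"
"/obio/SUNW,fdtwo@0,700000" 0 "fd"
'''


def parse_path_to_inst(text):
    """Map each physical path to its (instance number, driver name) pair."""
    table = {}
    for line in text.splitlines():
        fields = shlex.split(line, comments=True)
        if len(fields) != 3:  # blank lines, comments, anything unexpected
            continue
        path, instance, driver = fields
        table[path] = (int(instance), driver)
    return table


for path, (instance, driver) in parse_path_to_inst(EXCERPT).items():
    print("%s%d -> %s" % (driver, instance, path))
```

Read this way, the first entry is sd3 and the second is fd0, the familiar short names for the disk and the floppy.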
The implications of the "no re-use" policy can lead to unintentional
renumbering, however. Let's say you have a quad Ethernet controller in
board 1, SBus slot 1 of a server, and you want to move it to a
different I/O board. Physically moving the card doesn't change a thing
as far as available hardware, but you've modified the geographic
description of the machine. As drvconfig scans the
in-memory device tree, it will believe that the "old" quad Ethernet
card is dead, and that a new one has appeared in a previously unused
slot. As a result, your network interfaces are assigned the next
available minor numbers and show up as qe4, qe5, qe6 and qe7. If you
haven't taken the time to modify your /etc/hostname.* configurations,
you'll have trouble using the network.
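The renumbering surprise in the quad Ethernet example can be reproduced with a toy model of the "no re-use" policy. This Python sketch is a simplification under stated assumptions: the device paths are made-up placeholders (they are not real Solaris paths), the function name is hypothetical, and "next available" is taken to mean the lowest number not already recorded for that driver.

```python
def assign_instances(known, discovered):
    """Toy model: keep every recorded mapping, number only the new paths.

    known      -- dict of path -> (instance, driver), as remembered from earlier boots
    discovered -- list of (path, driver) pairs found during this boot
    """
    table = dict(known)  # entries for missing devices are never dropped
    for path, driver in discovered:
        if path in table:
            continue  # seen before: the old number is retained
        used = {inst for inst, drv in table.values() if drv == driver}
        instance = 0
        while instance in used:  # ASSUMPTION: lowest unused number wins
            instance += 1
        table[path] = (instance, driver)
    return table


# Placeholder paths for a quad Ethernet card in its original slot...
old_slot = {"/placeholder/board1/slot1/qe@%d" % n: (n, "qe") for n in range(4)}
# ...and the same card after being moved to another I/O board.
new_slot = [("/placeholder/board2/slot0/qe@%d" % n, "qe") for n in range(4)]

after_move = assign_instances(old_slot, new_slot)
for path in sorted(after_move):
    print(path, "qe%d" % after_move[path][0])
```

Because the four old paths still hold instances 0 through 3, the relocated ports come up as qe4 through qe7, just as described.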
To work around this dynamic derailment of your desired configuration,
edit /etc/path_to_inst by hand. You might want to do this if
you add a second network interface and want to switch their minor
numbers, changing the physical interface that is qe0 or le0 and
therefore becomes the default route. To implement a change in minor
device numbering, either correct the minor numbers you find in
/etc/path_to_inst, or remove the entries for the devices you
want renumbered and let drvconfig start from ground zero on a reboot.
You must do a boot -r to get the changes to take effect. The manual
page has more information but fails to put the
following warning in huge flashing lights: do not remove
/etc/path_to_inst, or you won't be able to find somewhat
important devices like the root disk and the swap device. As with all
key configuration files, make a backup, and preferably copy the file
to another machine so you can inspect your handiwork later if
required.
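If you would rather rehearse such an edit than type it live, the swap can be tried on a copy of the file's text first. The Python sketch below works purely on a string, never on the real /etc/path_to_inst; the function name and the two le paths are hypothetical placeholders, and the output format simply mirrors the quoted-path, number, quoted-driver layout used in that file.

```python
import shlex


def swap_instances(text, path_a, path_b):
    """Return path_to_inst-style text with the two paths' numbers exchanged."""
    rows = [(line, shlex.split(line, comments=True)) for line in text.splitlines()]
    entries = {f[0]: f for _, f in rows if len(f) == 3}
    if path_a not in entries or path_b not in entries:
        raise KeyError("both paths must already be listed")
    if entries[path_a][2] != entries[path_b][2]:
        raise ValueError("refusing to swap across different drivers")
    swapped = {path_a: entries[path_b][1], path_b: entries[path_a][1]}
    out = []
    for line, fields in rows:
        if len(fields) == 3 and fields[0] in swapped:
            # Rewrite only the two affected lines; leave everything else alone.
            out.append('"%s" %s "%s"' % (fields[0], swapped[fields[0]], fields[2]))
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Placeholder entries for two network interfaces (not real device paths).
before = '"/placeholder/le@a" 0 "le"\n"/placeholder/le@b" 1 "le"\n'
print(swap_instances(before, "/placeholder/le@a", "/placeholder/le@b"))
```

Comparing the returned text against the backup copy before installing it gives you one more chance to catch a slip.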
Small cordless devices: How to play with your hardware and not get toasted
Device configuration is yet another area where things go subtly
wrong when you are under the most pressure. Here are some of the
more useful tips and tricks to help you play with your devices:
Run prtconf -p to display what the boot prom found in the system. If
you're having trouble getting a new device to work, make sure it's seen
by the OBP using prtconf; if it doesn't show up there, it's not
responding to the boot prom during self-test and auto configuration.
Add the -P option and you'll see pseudo devices in the mix as well.
Anything that's labeled "driver not attached" has been configured but
not found.
drvconfig is invoked out of /etc/rcS.d/S50drvconfig. Add the -d option
and watch the debugging information go by.
name="sd" class="scsi" target=0 lun=0; # target 0 LUN 0, default
name="sd" class="scsi" target=0 lun=1; # target 0 LUN 1
name="sd" class="scsi" target=0 lun=2; # target 0 LUN 2
name="sd" class="scsi" target=0 lun=3; # target 0 LUN 3
You'll see the multiple units show up as sd@0,0 through sd@0,3 in /devices.
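Entries like these are regular enough to generate rather than type. The Python sketch below just reproduces the entry format shown above for a given target and set of LUNs, along with the sd@target,lun names you would then look for in /devices; both function names are hypothetical, and nothing here writes to a configuration file.

```python
def sd_conf_entries(target, luns):
    """Produce one entry per LUN, in the format of the lines shown above."""
    return [
        'name="sd" class="scsi" target=%d lun=%d;' % (target, lun)
        for lun in luns
    ]


def expected_unit_names(target, luns):
    """The final pathname components to look for under /devices."""
    return ["sd@%d,%d" % (target, lun) for lun in luns]


for entry in sd_conf_entries(0, range(4)):
    print(entry)
print(expected_unit_names(0, range(4)))
```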
Knowing how the system assembles the software representation of its hardware configuration may represent the closest thing to a computer's mind-body problem. It's up to you and your managerial devices to coax it through times of crisis.
URL: http://www.sunworld.com/swol-12-1996/swol-12-sysadmin.html