Inside Solaris by Jim Mauro

Inside Solaris 7

Learn about the full 64-bit kernel and all the associated new commands and utilities

SunWorld
November 1998

Abstract
On October 27, Sun announced the next major release of its Unix-based operating environment, Solaris 7. The single most salient feature of Solaris 7 is the implementation of a full 64-bit kernel, supporting both 32-bit and 64-bit applications. In addition, Solaris 7 adds integrated logging for UFS filesystems, dynamic reconfiguration (DR) support for Ultra Enterprise 3500 to 6500 I/O boards, new commands and utilities, and several other features in the areas of networking, administration, and the user environment. This month, Jim takes a look at the key features and functionality included in Solaris 7. (3,500 words)



The inclusion of 64-bit support in Solaris has been phased in over the past several releases, as the notion of what is really required in terms of 64-bit support can vary depending on application requirements. The use of 64 bits in an application can be categorized as follows:

Solaris 7 is available as both a true 64-bit operating system, using native 64-bit data types for data and pointer values, and a 32-bit operating system, for use on non-UltraSPARC (non 64-bit) processors. The Solaris 7 distribution maintains separate directory hierarchies for the storage and loading of either 32-bit or 64-bit binaries (more on this later).

By selecting the appropriate pathname, an UltraSPARC-based system can be booted with either the 32-bit or 64-bit kernel. A non-UltraSPARC-based system, i.e., a system using a 32-bit processor, must boot the 32-bit kernel. The 64-bit kernel provides source and binary compatibility for 32-bit applications, meaning that existing 32-bit applications should run on the 64-bit kernel. 32-bit versions of the dynamically linked libraries (DLLs) are available when the 64-bit kernel is booted (along with the 64-bit versions, of course).

The only exceptions to 32-bit binary compatibility on a 64-bit kernel are applications that read kernel memory via the libkvm routines (e.g., kvm_open(3K), kvm_nlist(3K), etc.), or use /dev/mem or /dev/kmem. Also, the process filesystem, /proc, exposes the various components of a process, including its address space. As such, a 32-bit application may not work properly when reading a /proc process path of a 64-bit application.

The organization of the system is such that separate directory paths are installed for the storage of 32-bit binaries and 64-bit binaries. This includes kernel modules and drivers, loadable system calls, shared object libraries, and commands. For each binary file that requires a 64-bit equivalent, a given directory path in the system will have a subdirectory named sparcv9. So, for kernel drivers and modules, the bits live in /kernel/drv, /usr/kernel/drv, and /platform//kernel/drv for 32-bit binaries, and in /kernel/drv/sparcv9/ for 64-bit binaries. In other words, for each of the directories listed for 32-bit kernel binaries, there is a subdirectory called sparcv9 that holds the corresponding 64-bit binaries. The same pattern holds for loadable system calls and shared object libraries. (All the 32-bit shared object libraries live in /usr/lib, with the 64-bit versions in /usr/lib/sparcv9.)
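The naming rule is simple enough to capture in a few lines of C. The following is only an illustrative sketch: binary_dir is a hypothetical helper of our own, not a Solaris interface, and it assumes nothing beyond the sparcv9 subdirectory convention described above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the directory that holds binaries of the
 * requested width. Following the layout described in the text, 64-bit
 * binaries sit in a "sparcv9" subdirectory of the 32-bit directory.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int binary_dir(const char *base, int bits, char *out, size_t outlen)
{
    int n;

    assert(base != NULL && out != NULL && outlen > 0);
    if (bits == 64)
        n = snprintf(out, outlen, "%s/sparcv9", base);
    else
        n = snprintf(out, outlen, "%s", base);
    /* snprintf reports the length it wanted; reject truncated results. */
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

Calling binary_dir("/usr/lib", 64, buf, sizeof (buf)) fills buf with /usr/lib/sparcv9, while a bits value of 32 leaves the base path unchanged.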

From a command standpoint, most of the commands in Solaris 7 remain 32-bit binaries. There was no need to make them 64-bit binaries. There are some exceptions, and those commands live in /usr/bin/sparcv9, /bin/sparcv9, etc. Simply put, a consistent and well understood directory name space has been implemented that provides a clean and well defined separation of 32-bit and 64-bit binaries, and allows for the existence of both sets of binaries on a running system. This plays a big role in the 32-bit binary compatibility that exists when running a 64-bit kernel.

The fundamental issue with 32-bit to 64-bit applications and kernel modules has to do with the way the data model has changed in going from 32 bits to 64 bits. The data model defines the size of the different data types that can be declared in a program. The 32-bit data model used is known as the ILP32 model, where integers, longs, and pointers are all 32-bit (4-byte) data types. The 64-bit model implemented is LP64, where longs and pointers are 64 bits (8 bytes), and integers remain at 32 bits.
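A program can report which data model it was built under simply by checking the sizes the compiler chose. This is a minimal sketch; the function name data_model is our own, and the only assumption is the 8-bit byte that the static assertion checks.

```c
#include <assert.h>
#include <limits.h>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

/* Name the C data model this file was compiled under. */
const char *data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";   /* int, long, and pointer are all 32 bits */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";    /* long and pointer grow to 64 bits */
    return "other";
}
```

The same source file yields ILP32 when compiled as a 32-bit object and LP64 when compiled as a 64-bit object, which is exactly why code that depends on these sizes needs attention when it is ported.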

It isn't uncommon to find code that uses longs and integers interchangeably. Such code worked in the 32-bit ILP32 world since both data types are the same size. In the 64-bit LP64 model, however, they don't behave in an expected manner. Certain assumptions made about maximum pointer values also lead to unexpected program behavior, as pointers are 64 bits in the LP64 model.
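To make the hazard concrete, here is a small sketch of defensive helpers. The names (fits_in_int, narrow_to_int, pointer_round_trips) are our own inventions for illustration; uintptr_t comes from the standard stdint.h header.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Returns 1 if v can be stored in an int without losing bits. */
int fits_in_int(long v)
{
    return v >= INT_MIN && v <= INT_MAX;
}

/* Checked narrowing: aborts instead of silently truncating under LP64. */
int narrow_to_int(long v)
{
    assert(fits_in_int(v));
    return (int)v;
}

/*
 * Pointers should travel through uintptr_t, never through int:
 * uintptr_t is wide enough to hold a pointer in both data models.
 */
int pointer_round_trips(const void *p)
{
    uintptr_t bits = (uintptr_t)p;

    return (const void *)bits == p;
}
```

Under ILP32, fits_in_int is true for every long, so the check costs nothing. Under LP64 it catches values such as LONG_MAX that an int can't hold.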

It's beyond the scope of this column to get into the details of application porting on 64-bit Solaris 7. However, the Solaris 7 documentation set includes the Solaris 7 64-bit Developers Guide and 64-bit Solaris Advanced Systems Programming documents, which provide a detailed and thorough treatment of these issues.

Turning the Solaris 7 kernel into a 64-bit operating system required an extensive cleanup of the data types embedded in kernel data structures and used for kernel variables, for the reasons stated above. Additional work was required on potential data alignment and/or size disparities in communication paths between the kernel and user-land code (e.g., libraries), and on some of the kernel subsystems, such as on-disk UFS structures. For example, the Solaris kernel uses a lot of type definitions for providing data types that are intuitive in use (things like pid_t for process IDs, dev_t for device names, etc.). Obviously, these data types are not native to the hardware or language used to build the kernel (mostly C). They're defined in /usr/include/sys/types.h:

typedef ulong_t dev_t;                  /* expanded device type */
typedef long    pid_t;                  /* process id type      */

The pid_t type is actually a long, which is a native data type in the C language. The dev_t is defined from another derived type, the ulong_t, which is also defined in the sys/types.h header file:

typedef unsigned long   ulong_t;

Any time you see a data type that has a _t suffix, it is most likely a derived type defined in sys/types.h.

That said, let's get back to the example.

In the UFS code for the on-disk structures, one of the types used is a daddr_t, which is a disk address. Pre-Solaris 7, daddr_t is derived from a long:

typedef long            daddr_t;        /* <disk address> type */

In going from the ILP32 data model definition to LP64, longs went from 32 bits to 64 bits. So, let's say we have a 64-bit kernel booted, and we need to read the superblock from a UFS filesystem. The address of the superblock was defined as a daddr_t type, which would be 4 bytes on a 32-bit system (because it is derived from a long) and 8 bytes on a 64-bit system. Thus, a 64-bit kernel would read 8 bytes, the 4-byte superblock disk address and the next 4 bytes of data on the disk, which in this case is a 4-byte value used to locate the first cylinder group. Obviously, this would result in severe compatibility problems. There is no need to make daddr_t a 64-bit data type and break backward compatibility with existing filesystems.

Such issues were addressed by creating fixed-width data types, where the actual bit size of the data type is explicit in the name. Extending our example above, in Solaris 7 the data type for the super block disk address changed from daddr_t to daddr32_t, which is a fixed-width, 32-bit data type. These are the types of changes that needed to be made to ensure compatibility with various variables and structure members of the kernel that didn't need to be expanded to a 64-bit data type, but had to work with a 64-bit operating system.
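The idea can be sketched with the standard C fixed-width types. Everything in this example is hypothetical: struct disk_header and its two fields merely mimic the adjacent 4-byte values from the superblock example, and int32_t from stdint.h stands in for the kernel's daddr32_t. The bytes are decoded as big-endian, matching SPARC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk header: two adjacent fields of exactly 4 bytes each. */
struct disk_header {
    int32_t sb_addr;    /* superblock disk address */
    int32_t cg_offset;  /* value used to locate the first cylinder group */
};

/* The layout is the same whether this is compiled as ILP32 or LP64. */
static_assert(sizeof(struct disk_header) == 8, "header must stay 8 bytes");

/* Decode one big-endian 32-bit value from raw disk bytes. */
static int32_t read_be32(const unsigned char *p)
{
    uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];

    return (int32_t)v;
}

/* Fill in h from the first 8 bytes of raw, 4 bytes per field. */
void decode_header(const unsigned char raw[8], struct disk_header *h)
{
    assert(raw != NULL && h != NULL);
    h->sb_addr   = read_be32(raw);
    h->cg_offset = read_be32(raw + 4);
}
```

Had sb_addr been declared as a long, an LP64 build would have consumed all 8 bytes for the first field, which is the failure described above.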

Application programs have similar issues. In Solaris 7, a 64-bit kernel will run both 32-bit and 64-bit applications (as previously mentioned). If you have a 32-bit application and a 64-bit application accessing data in a shared memory segment (which is supported), it's up to the developer to ensure there's no disparity between the amount of data the 32-bit program will read versus the 64-bit program for a given datum. The fixed-width data types are available for use by application programmers to address this exact issue (/usr/include/sys/inttypes.h).
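As a rough sketch of what that looks like in application code, consider a record that a 32-bit writer and a 64-bit reader both map from shared memory. The structure and function names here are hypothetical, and we use the standard stdint.h header for the fixed-width types rather than the Solaris header named above. The 64-bit member is placed first so that no compiler-inserted padding is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record shared between 32-bit and 64-bit processes. */
struct shared_record {
    uint64_t bytes_written;  /* widest member first */
    uint32_t writer_pid;
    uint32_t flags;
};

/* Both builds must agree on size and on every member offset. */
static_assert(sizeof(struct shared_record) == 16, "unexpected record size");
static_assert(offsetof(struct shared_record, writer_pid) == 8, "unexpected offset");
static_assert(offsetof(struct shared_record, flags) == 12, "unexpected offset");

/* Record that pid wrote nbytes more bytes. */
void record_write(struct shared_record *r, uint32_t pid, uint64_t nbytes)
{
    assert(r != NULL);
    r->writer_pid = pid;
    r->bytes_written += nbytes;
}
```

If any member were declared as long or as a pointer, the static assertions would fail in one of the two builds, which is a far better time to find the mismatch than at run time.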



Solaris 7 features
Having provided some background on going from 32 bits to 64 bits, let's walk through the feature set in Solaris 7. First, we'll look at the 64-bit address space. As we said, processes can have larger memory footprints, which means potentially more threads per process, larger data sets, or larger shared memory segments. The actual high limit is not 16 exabytes, which is the largest attainable value for a 64-bit unsigned data type, but rather 16 terabytes (TB, a trillion bytes). The "limit" is imposed by the current implementations of the UltraSPARC I and UltraSPARC II processors, which define a 44-bit virtual address. Limit is really the wrong word, as 16 TB of address space represents over 4,000 times more address space than the 32-bit world had to offer.
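The arithmetic behind these figures is easy to check. In this sketch, address_space_bytes is our own helper; it simply computes 2 raised to the number of virtual address bits, using binary units (1 TB taken as 2^40 bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with the given number of virtual address bits (1..63). */
uint64_t address_space_bytes(unsigned bits)
{
    assert(bits >= 1 && bits <= 63);
    return (uint64_t)1 << bits;
}
```

With 44 bits the result is 16 x 2^40 bytes, or 16 TB. Dividing that by the 32-bit result (4 GB) gives 4,096, which is where the "over 4,000 times" figure comes from.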

64-bit arithmetic will be more efficient as well, once the appropriate compilers are installed. Prior to Solaris 7, 64-bit integer arithmetic didn't make good use of the 64-bit hardware registers in the UltraSPARC processors. With the new compilers and Solaris 7, 64-bit data will use one 64-bit hardware register for 64-bit data manipulation and argument passing. You'll need new versions of the compilers in order to get this functionality. More precisely, new versions of the compiler will be required to create a 64-bit executable.

A new object file format, ELF64, defines the format of a 64-bit executable program that implements the LP64 data model. With the new compiler, the -xarch=v9 flag must be passed to the cc command in order to generate a 64-bit file. Such an executable will not be able to run on a 32-bit kernel -- even a 32-bit kernel running on a 64-bit processor (UltraSPARC). Note that the Solaris 7 environment will allow for the development of 64-bit programs under the 32-bit kernel.

Solaris 7 provides a number of user commands that can be used to determine which executables can be run on a given system. The isainfo(1) command has several flags that provide information on which binaries the system is capable of running.

tibet26> uname -a
SunOS tibet26 5.7 s998_21 sun4u sparc SUNW,Ultra-1
tibet26> 
tibet26> isainfo -v
64-bit sparcv9 applications
32-bit sparc applications
tibet26>
tibet26> isainfo -n
sparcv9
tibet26>

The isalist(1) command (isa stands for instruction set architecture) provides executable support information by listing supported instruction set architectures for the kernel that has been booted.

tibet26> isalist
sparcv9+vis sparcv9 sparcv8plus+vis sparcv8plus sparcv8 
sparcv8-fsmuld sparcv7 sparc
tibet26>

Both of the above examples are from a 64-bit kernel running on an UltraSPARC system. You can use the SI_ISALIST flag in the sysinfo(2) call to acquire the same information from within a program.
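A minimal sketch of the programmatic route might look like the following. The sysinfo(2) call and the SI_ISALIST command are the real Solaris interface from sys/systeminfo.h, but the two wrapper functions are our own, and the Solaris-only code is guarded so the file still compiles elsewhere (where get_isalist just reports failure).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__sun)
#include <sys/systeminfo.h>   /* sysinfo(2), SI_ISALIST */
#endif

/* Returns 1 if isa appears as a whole space-separated token in list. */
int isalist_contains(const char *list, const char *isa)
{
    size_t want;
    const char *p = list;

    assert(list != NULL && isa != NULL);
    want = strlen(isa);
    if (want == 0)
        return 0;
    while (*p != '\0') {
        size_t len;

        while (*p == ' ' || *p == '\n')   /* skip separators */
            p++;
        len = strcspn(p, " \n");          /* length of the next token */
        if (len == want && strncmp(p, isa, len) == 0)
            return 1;
        p += len;
    }
    return 0;
}

/* Fetch the ISA list into buf. Returns 0 on success, -1 otherwise. */
int get_isalist(char *buf, size_t len)
{
#if defined(__sun)
    /* sysinfo returns the buffer size needed, or -1 on error. */
    long need = sysinfo(SI_ISALIST, buf, (long)len);

    return (need > 0 && (size_t)need <= len) ? 0 : -1;
#else
    (void)buf;
    (void)len;
    return -1;   /* not a Solaris build */
#endif
}
```

A program could call get_isalist and then isalist_contains(buf, "sparcv9") to decide whether 64-bit executables can run. The whole-token match matters, since sparcv9+vis alone shouldn't count as a match for sparcv9.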

Integrated UFS logging
Integrated UFS logging is another new feature, where a UFS filesystem can be mounted with logging enabled. (It's a mount(1M) option, e.g., "mount -o logging /dev/dsk/c0t0d0s0 /mnt".) UFS logging captures writes to the filesystem and logs changes to a log area in the filesystem before applying the changes to the filesystem metadata structures (inodes, superblock, etc.).

UFS logging is designed to eliminate filesystem corruption due to a crash by synchronously writing to the log area, and subsequently applying the changes to the filesystem. This significantly reduces the possibility that the filesystem could be left in an inconsistent state, requiring fsck(1M) to repair or (even worse) restore from a backup tape.

When a write transaction is sent to a filesystem that has been mounted with logging enabled, the changed data is written sequentially to the log, followed by a commit record. If a crash occurs, changes that haven't been committed are discarded, and committed transactions are rolled into the filesystem (much like logging in an RDBMS). Scanning the log is much faster than running fsck, so reboot times following crashes are greatly reduced. UFS logging also has the advantage of making synchronous writes to the filesystem faster, since all the updates are grouped together and written sequentially to the log, allowing the synchronous write to return sooner. This holds true for directory updates as well.
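The recovery rule (discard uncommitted changes, roll in committed ones) can be illustrated with a toy log in C. This is emphatically not the UFS log format or the kernel's code. The record layout, the names, and the use of an int array to stand in for metadata blocks are all assumptions made for the sake of a small example.

```c
#include <assert.h>
#include <stddef.h>

enum rec_kind { REC_UPDATE, REC_COMMIT };

/* One entry in a hypothetical sequential log. */
struct log_rec {
    enum rec_kind kind;
    int txn;     /* transaction id */
    int block;   /* metadata block index (REC_UPDATE only) */
    int value;   /* new block contents (REC_UPDATE only) */
};

/* Returns 1 if the log holds a commit record for txn. */
static int is_committed(const struct log_rec *log, size_t n, int txn)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (log[i].kind == REC_COMMIT && log[i].txn == txn)
            return 1;
    return 0;
}

/*
 * Crash recovery: apply, in log order, every update that belongs to a
 * committed transaction; skip the rest. Returns the number applied.
 */
size_t replay_log(const struct log_rec *log, size_t n,
                  int *blocks, size_t nblocks)
{
    size_t i, applied = 0;

    assert(log != NULL && blocks != NULL);
    for (i = 0; i < n; i++) {
        if (log[i].kind != REC_UPDATE)
            continue;
        if (log[i].block < 0 || (size_t)log[i].block >= nblocks)
            continue;                       /* ignore out-of-range records */
        if (!is_committed(log, n, log[i].txn))
            continue;                       /* uncommitted: discard */
        blocks[log[i].block] = log[i].value;
        applied++;
    }
    return applied;
}
```

Given a log that holds two updates and a commit for transaction 1, followed by a lone update for transaction 2, replay_log applies only the first two. The third is dropped, as if the crash had happened before its commit record reached the log.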

The log itself is not on a separate partition, and is instead allocated from free blocks on the filesystem. If a separate partition is desired for logging, installing and configuring Sun Solstice DiskSuite is required. The log space required is about 1 megabyte (MB) of log for 1 GB of filesystem, with a maximum of 64 MB. The log is flushed regularly as it fills up. A filesystem unmount will also cause the log to be flushed.
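As a back-of-the-envelope aid, the sizing rule can be written as a one-line calculation. The function below is our own approximation of the "about 1 MB per GB, 64 MB maximum" guideline. Rounding a partial gigabyte up to a whole megabyte of log is an assumption on our part, not documented behavior.

```c
/*
 * Estimate the log size in MB for a filesystem of fs_mb megabytes:
 * roughly 1 MB of log per GB of filesystem, capped at 64 MB.
 */
unsigned long estimated_log_mb(unsigned long fs_mb)
{
    unsigned long log_mb = fs_mb / 1024 + (fs_mb % 1024 != 0);

    return log_mb > 64 ? 64 : log_mb;
}
```

A 10 GB filesystem (10240 MB) comes out at about 10 MB of log, and anything beyond 64 GB hits the 64 MB ceiling.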

Explaining the actual mechanics of logging would make for a lengthy column, as it would require a review of how writes are done on a UFS without logging. We'll keep this topic in mind for a future column.

Another new UFS feature is the no access time mount option (noatime). Filesystems mounted with this option will not update the inode access time field when a file is accessed, thus saving an inode update. The file's inode maintains three time fields: last accessed, last modified, and last inode change (there is no separate creation time). For some applications, it may not be necessary to track a file's last access time, especially applications that do a lot of small-file I/O (e.g., a news server). If it's determined that this feature won't compromise the application in any way, it does provide a performance improvement.
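If you want to see the time fields for yourself, the standard stat(2) call returns all three: st_atime (last access), st_mtime (last modification), and st_ctime (last status change of the inode). The sketch below uses only that POSIX interface; struct file_times and get_file_times are hypothetical names of our own.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* The three per-file timestamps kept in the inode. */
struct file_times {
    time_t accessed;   /* st_atime: the field a no-access-time mount stops updating */
    time_t modified;   /* st_mtime: last change to the file's data */
    time_t changed;    /* st_ctime: last change to the inode itself */
};

/* Fill in out for path. Returns 0 on success, -1 if stat fails. */
int get_file_times(const char *path, struct file_times *out)
{
    struct stat st;

    assert(path != NULL && out != NULL);
    if (stat(path, &st) != 0)
        return -1;
    out->accessed = st.st_atime;
    out->modified = st.st_mtime;
    out->changed  = st.st_ctime;
    return 0;
}
```

Comparing the accessed field before and after reading a file is a quick way to check whether a filesystem is updating access times.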

SACK in Solaris
Solaris 7 adds an important new feature in the TCP/IP code -- support for selective acknowledgments, or SACK, in accordance with RFC 2018 (see Resources below). This is essentially a performance (TCP network throughput) enhancement that is most noticeable with large data windows and with very long TCP connections. TCP with SACK provides the ability for a TCP receiver to selectively acknowledge receipt of out-of-order packets. Without SACK, TCP uses a cumulative acknowledgment mechanism, where the sender can only determine if a single packet has been lost per round trip time. For large window data transfers requiring multiple packets, this can result in poor performance because the loss of a couple of packets could require a retransmission of all or most of the packets for a large data transfer. SACK allows for the resending of only the packets that have been dropped, as opposed to resending all the packets following the first packet lost, even if others were received. I have found an example of this on the Web. It goes like this:

Prior to selective acknowledgment, if TCP lost packets 4 and 7 out of an 8-packet window, TCP would receive acknowledgment of only packets 1, 2, and 3. Packets 4 through 8 would have to be resent. With selective acknowledgment, TCP receives acknowledgment of packets 1, 2, 3, 5, 6, and 8. Only packets 4 and 7 have to be resent. (Author unknown.)
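The quoted example is easy to reproduce in code. The sketch below is a simplified model of the two retransmission policies exactly as the example describes them. It is not TCP's actual implementation, and the function names are our own. Packets are numbered from 1, and received[i] is nonzero if packet i+1 arrived.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Cumulative acknowledgment, as modeled in the example: everything from
 * the first missing packet onward is resent. Writes the packet numbers
 * to out (which must hold n entries) and returns how many there are.
 */
size_t resend_cumulative(const int *received, size_t n, int *out)
{
    size_t first_missing = 0, count = 0, i;

    assert(received != NULL && out != NULL);
    while (first_missing < n && received[first_missing])
        first_missing++;
    for (i = first_missing; i < n; i++)
        out[count++] = (int)(i + 1);
    return count;
}

/* Selective acknowledgment: only the packets that were lost are resent. */
size_t resend_sack(const int *received, size_t n, int *out)
{
    size_t count = 0, i;

    assert(received != NULL && out != NULL);
    for (i = 0; i < n; i++)
        if (!received[i])
            out[count++] = (int)(i + 1);
    return count;
}
```

For an 8-packet window with packets 4 and 7 lost, resend_cumulative returns five packets (4 through 8), while resend_sack returns just two (4 and 7).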

TCP SACK is negotiated between the two sides involved in the connection at connect time, and of course both sides must support it. TCP SACK can be turned on and off with the tcp_sack_permitted parameter, which has three possible values:

A few more
There are several other additions to Solaris 7, most of which, like UFS logging, could easily consume an entire column. We'll take a brief look at them here, and go into more detail in future Inside Solaris columns.

New crash dump features have been added that allow for the creation of a dedicated dump partition (as opposed to using swap partitions by default). Now, there's no risk of having a dump file in a swap partition, forgetting to run savecore(1M), and having system paging activity overwrite the dump file before it can be saved to a file in a filesystem. Also, a compression algorithm is applied to dumps, so more dump data can be saved in less space. (We're seeing about a 3 to 1 compression ratio.) The savecore(1M) command has been enhanced to allow for getting a dump of a live system, using the -L flag to savecore(1M) and a dedicated dump partition. There's a new dumpadm(1M) command that should be used to manage the dump partition and dump administration.

Support for dynamic reconfiguration (DR) for I/O boards on the Ultra Enterprise 3x00 through 6x00 has been added. Without disrupting system operations, DR notifies the OS that you wish to add or remove an I/O board from a running system. DR has existed on the Sun Enterprise 10000 system since that product began shipping, and is now being added to the other server systems. DR support for CPU/memory boards for the Ultra Enterprise 3x00 through 6x00 will be added in a near future release.

A new version of BIND has been added, version 8.1.2, which adds RFC 2136 compliance (dynamic updates) and RFC 1996 (zone change notification). BIND is the executable required for support of the DNS. Also in the network services area, a new version of Sendmail, version 8.9.1b, ships with Solaris 7. To read about the new features introduced with this version, see the Sendmail link in the Resources section below.

Some interesting new commands have also been added:

Reference the appropriate man pages for details.

That's a wrap for this month. A discussion of Solaris 7 seemed like it would be very timely, providing readers with a good overview of some of the key features of this new release. Next month we'll get back to the multithreaded process architecture series.


Resources


About the author
Jim Mauro is currently an area technology manager for Sun Microsystems in the northeast, focusing on server systems, clusters, and high availability. He has a total of 18 years of industry experience, working in educational services (he developed and delivered courses on Unix internals and administration) and software consulting. Reach Jim at jim.mauro@sunworld.com.



[(c) Copyright  Web Publishing Inc., and IDG Communication company]


URL: http://www.sunworld.com/swol-11-1998/swol-11-insidesolaris.html