Inside Solaris 7
Learn about the full 64-bit kernel and all the associated new commands and utilities
On October 27, Sun announced the next major release of its Unix-based operating environment, Solaris 7. The single most salient feature of Solaris 7 is the implementation of a full 64-bit kernel, supporting both 32-bit and 64-bit applications. In addition, Solaris 7 adds integrated logging for UFS filesystems, dynamic reconfiguration (DR) support for Ultra Enterprise 3500 to 6500 I/O boards, new commands and utilities, and several other features in the areas of networking, administration, and the user environment. This month, Jim takes a look at the key features and functionality included in Solaris 7. (3,500 words)
The inclusion of 64-bit support in Solaris has been phased in over the past several releases, as the notion of what is really required in terms of 64-bit support can vary depending on application requirements. The use of 64 bits in an application can be categorized as follows:
Take a look at this sample program written and compiled on a SPARCstation 20 running Solaris 2.6. The SPARCstation 20 implements the SuperSPARC processor, which is a 32-bit RISC processor.
sunsys> cat p1.c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

main()
{
        longlong_t l;
        u_longlong_t ul;

        l=0x7fffffffffffffff;
        ul=0xffffffffffffffff;
        printf("long (decimal): \t%lld\nlong(hex): \t\t%llx\n",l,l);
        printf("unsigned long(dec): \t%llu\nunsigned long(hex): \t%llx\n",ul,ul);
}
sunsys> gcc -o p1 p1.c
sunsys> p1
long (decimal):         9223372036854775807
long(hex):              7fffffffffffffff
unsigned long(dec):     18446744073709551615
unsigned long(hex):     ffffffffffffffff
sunsys>
In this example, I ran cat on a program source file called p1.c, compiled it (using the GNU C compiler, the "gcc" command line), and executed the program. The use of the longlong_t and u_longlong_t data types provides for values of up to about 9.2 x 10^18 (signed) and 1.8 x 10^19 (unsigned), respectively, which are the maximum values for the 64-bit data types. Note that support for large data types for floating point arithmetic has always existed, as it is required for compliance with IEEE floating point standards. Both the SPARC version 8 (32-bit) and SPARC version 9 (64-bit) architectures define a 64-bit doubleword (for double precision floating point) and a 128-bit quadword (for quad precision floating point).
Solaris 7 is available as both a true 64-bit operating system, using native 64-bit data types for data and pointer values, and a 32-bit operating system, for use on non-UltraSPARC (non 64-bit) processors. The Solaris 7 distribution maintains separate directory hierarchies for the storage and loading of either 32-bit or 64-bit binaries (more on this later).
By selecting the appropriate pathname, an UltraSPARC-based system can be booted with either the 32-bit or 64-bit kernel. A non-UltraSPARC based system, i.e., a system using a 32-bit processor, must boot the 32-bit kernel. The 64-bit kernel provides source and binary compatibility for 32-bit applications, meaning that existing 32-bit applications should run on the 64-bit kernel. 32-bit versions of the dynamically linked libraries (DLLs) are available when the 64-bit kernel is booted (along with the 64-bit versions, of course).
The only exceptions to 32-bit binary compatibility on a 64-bit kernel are applications that read kernel memory via the libkvm routines (e.g., kvm_open(3K), kvm_nlist(3K), etc.), or use /dev/mem or /dev/kmem. Also, the process filesystem, /proc, exposes the various components of a process, including its address space. As such, a 32-bit application may not work properly when reading a /proc process path of a 64-bit application.
The organization of the system is such that separate directory paths
are installed for the storage of 32-bit binaries and 64-bit
binaries. This includes kernel modules and drivers, loadable system
calls, shared object libraries, and commands. For each binary file
that requires a 64-bit equivalent, a given directory path in the
system will have a subdirectory named sparcv9. So, for
kernel drivers and modules, the bits live in /kernel/drv,
/usr/kernel/drv, and /platform/
From a command standpoint, most of the commands in Solaris 7
remain 32-bit binaries. There was no need to make them 64-bit
binaries. There are some exceptions, and those commands live in
/usr/bin/sparcv9, /bin/sparcv9, etc. Simply put, a consistent and
well understood directory name space has been implemented that
provides a clean and well defined separation of 32-bit and 64-bit
binaries, and allows for the existence of both sets of binaries on a
running system. This plays a big role in the 32-bit binary compatibility
that exists when running a 64-bit kernel.
The fundamental issue with 32-bit to 64-bit applications and kernel
modules has to do with the way the data model has changed in going from
32 bits to 64 bits. The data model defines the size of the different
data types that can be declared in a program. The 32-bit data model
used is known as the ILP32 model, where integers, longs, and pointers
are all 32-bit (4-byte) data types. The 64-bit model implemented is
LP64, where longs and pointers are 64 bits (8 bytes), and integers
remain at 32 bits.
It isn't uncommon to find code that uses longs and
integers interchangeably. Such code worked in the 32-bit ILP32
world since both data types are the same size. In the 64-bit LP64
model, however, they don't behave in an expected manner. Certain
assumptions made about maximum pointer values also lead to
unexpected program behavior, as pointers are 64 bits in the LP64
model.
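To make the long-versus-integer hazard concrete, here's a minimal sketch in C. It isn't taken from any Solaris source; the function names are hypothetical, and it uses the fixed-width int64_t and uint64_t types to stand in for an LP64 long and an LP64 pointer value, so it behaves the same whichever data model it's compiled under.

```c
#include <stdint.h>

/* Assumption for illustration: an LP64 long is modeled as int64_t
 * and a 32-bit int as int32_t, independent of the host data model. */

/* Returns 1 if a value held in an LP64 long can be stored in an int
 * without losing information, 0 otherwise. */
int fits_in_int(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}

/* Returns 1 if a 64-bit pointer value can be kept in a 32-bit
 * unsigned integer, 0 if the upper bits would be lost. */
int pointer_fits_in_32_bits(uint64_t address)
{
    return address <= UINT32_MAX;
}
```

Code that copies a long into an int, or a pointer into an integer, is only safe under LP64 when checks like these would return 1 for every value the program can produce.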
It's beyond the scope of this column to get into the details
of application porting on 64-bit Solaris 7. However, the Solaris 7
documentation set includes the Solaris 7 64-bit Developers Guide
and 64-bit Solaris Advanced Systems Programming documents, which
provide a detailed and thorough treatment of these issues.
Turning the Solaris 7 kernel into a 64-bit operating system required
an extensive cleanup of the data types embedded in kernel data structures
and used for kernel variables, for the reasons stated above.
Additional work was required on potential data alignment
and/or size disparities in communication paths between the kernel
and user-land code (e.g., libraries), and on some of the kernel
subsystems, such as on-disk UFS structures. For example, the
Solaris kernel uses a lot of type definitions for providing data
types that are intuitive in use (things like pid_t for process IDs,
dev_t for device numbers, etc.). Obviously, these data types are not
native to the hardware or language used to build the kernel (mostly
C). They're defined in /usr/include/sys/types.h:
The pid_t type is actually a long, which is a native data type in the C language.
The dev_t is defined from another derived type, the ulong_t, which is also
defined in the sys/types.h header file:
Any time you see a data type that has a _t suffix,
it is most likely a derived type defined in sys/types.h.
That said, let's get back to the example.
In the UFS code for the on-disk structures, one of the types used is
a daddr_t, which is a disk address. Pre-Solaris 7, daddr_t is
derived from a long:
In going from the ILP32 data model definition to LP64, longs
went from 32 bits to 64 bits. So, let's say we have a 64-bit kernel
booted, and we need to read the superblock from a UFS filesystem.
The address of the superblock was defined as a daddr_t type, which
would be 4 bytes on a 32-bit system (because it is derived from a
long) and 8 bytes on a 64-bit system. Thus, a 64-bit kernel would
read 8 bytes, the 4-byte superblock disk address
and the next 4 bytes of data on the disk, which in this case
is a 4-byte value used to locate the first cylinder group.
Obviously, this would result in severe compatibility problems.
There is no need to make daddr_t a 64-bit data type and break
backward compatibility with existing filesystems.
Such issues were addressed by creating fixed-width data types,
where the actual bit size of the data type is explicit in the name.
Extending our example above, in Solaris 7 the data type for the
super block disk address changed from daddr_t to daddr32_t, which is
a fixed-width, 32-bit data type. These are the types of changes that
needed to be made to ensure compatibility with various variables and
structure members of the kernel that didn't need to be expanded to
a 64-bit data type, but had to work with a 64-bit operating system.
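The superblock example can be sketched as follows. This is an illustration only: the 8-byte record layout and the function names are invented for the purpose, and int32_t stands in for a fixed-width type such as daddr32_t while int64_t stands in for an LP64 long.

```c
#include <stdint.h>
#include <string.h>

/* Builds a hypothetical 8-byte on-disk record: a 4-byte disk address
 * followed by a 4-byte neighboring field. */
void make_record(unsigned char rec[8], int32_t addr, int32_t next)
{
    memcpy(rec, &addr, sizeof addr);
    memcpy(rec + 4, &next, sizeof next);
}

/* Reads exactly 32 bits, as a fixed-width 32-bit field would. */
int32_t read_addr_fixed32(const unsigned char rec[8])
{
    int32_t addr;
    memcpy(&addr, rec, sizeof addr);
    return addr;
}

/* Reads 8 bytes, as a field declared long would under LP64.
 * The result mixes the address with the neighboring field. */
int64_t read_addr_as_lp64_long(const unsigned char rec[8])
{
    int64_t addr;
    memcpy(&addr, rec, sizeof addr);
    return addr;
}
```

With a record built from the address 16 and a neighboring value of 24, the fixed-width read returns 16, while the 8-byte read returns a value that combines both fields and matches neither.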
Application programs have similar issues. In
Solaris 7, a 64-bit kernel will run both 32-bit and
64-bit applications (as previously mentioned). If you have a 32-bit
application and a 64-bit application accessing data in a shared
memory segment (which is supported), it's up to the developer to
ensure there's no disparity between the amount of data the 32-bit program
will read versus the 64-bit program for a given datum.
The fixed-width data types are available for use by application
programmers to address this exact issue (/usr/include/sys/inttypes.h).
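As a rough sketch of what that looks like in practice, here's a hypothetical record that a 32-bit and a 64-bit program might both map from a shared memory segment. The structure and function names are assumptions for illustration, and I've used the standard <stdint.h> spellings of the fixed-width types.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared record. Every member has an explicit width, and
 * the 64-bit member sits on an 8-byte boundary, so the layout does not
 * depend on whether the program is built ILP32 or LP64. */
struct shared_rec {
    int32_t id;
    int32_t flags;
    int64_t length;
};

/* Size of the record as seen by this program. */
size_t shared_rec_size(void)
{
    return sizeof(struct shared_rec);
}

/* Byte offset of the length member within the record. */
size_t shared_rec_length_offset(void)
{
    return offsetof(struct shared_rec, length);
}
```

Had all three members been declared long instead, the record would occupy 12 bytes in an ILP32 program and 24 bytes in an LP64 program, and the two would disagree about where each member lives.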
Solaris 7 features
64-bit arithmetic will be more efficient as well, once the
appropriate compilers are installed. Prior to Solaris 7, 64-bit integer
arithmetic didn't make good use of the 64-bit hardware registers in
the UltraSPARC processors. With the new compilers and Solaris
7, 64-bit data will use one 64-bit hardware register for 64-bit data
manipulation and argument passing. You'll need new versions of the
compilers in order to get this functionality. More precisely, new
versions of the compiler will be required to create a 64-bit
executable.
A new object file format, ELF64, defines the format of a
64-bit executable program that implements
the LP64 data model. With the new compiler, the -xarch=v9 flag
must be passed to the cc command in order to generate a 64-bit
file. Such an executable will not be able to run on a 32-bit kernel
-- even a 32-bit kernel running on a 64-bit processor (UltraSPARC). Note that the Solaris 7 environment will allow for the development of 64-bit programs under the 32-bit kernel.
Solaris 7 provides a number of user commands that can be used to determine
which executables can be run on the system. The isainfo(1)
command has several flags that provide information on which binaries
the system is capable of running.
The isalist(1) command (isa stands for instruction set architecture)
provides executable support information by listing supported
instruction set architectures for the kernel that has been booted.
Both of the above examples are from a 64-bit kernel running on an
UltraSPARC system. You can use the SI_ISALIST flag in the sysinfo(2) call to
acquire the same information from within a program.
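Once a program has the instruction set list in hand, checking for a particular architecture is a matter of string matching. Here's a minimal sketch with a hypothetical helper, isa_supported(); it takes the list as a parameter rather than calling sysinfo(2) itself, so that it compiles anywhere. On Solaris, I'd expect the string to come from a call along the lines of sysinfo(SI_ISALIST, buf, sizeof (buf)), declared in <sys/systeminfo.h>.

```c
#include <string.h>

/* Returns 1 if isa appears as a whole space-separated word in list
 * (e.g., the output of isalist), 0 otherwise. Matching whole words
 * keeps "sparcv9" from matching "sparcv9+vis". */
int isa_supported(const char *list, const char *isa)
{
    size_t len = strlen(isa);
    const char *p = list;

    if (len == 0)
        return 0;

    while ((p = strstr(p, isa)) != NULL) {
        int starts_word = (p == list) || (p[-1] == ' ');
        int ends_word = (p[len] == '\0') || (p[len] == ' ');

        if (starts_word && ends_word)
            return 1;
        p++;    /* keep scanning past this partial match */
    }
    return 0;
}
```

A program could use a check like this to decide at run time whether to load a 64-bit helper binary or fall back to the 32-bit one.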
Integrated UFS logging
UFS logging is
designed to eliminate filesystem corruption due to a crash by
synchronously writing to the log area, and subsequently applying the
changes to the filesystem. This significantly reduces the
possibility that the filesystem could be left in an inconsistent
state, requiring repair with fsck(1M) or (even worse) a restore from
a backup tape.
When a write transaction is sent to a filesystem
that has been mounted with logging enabled, the changed data is
written sequentially to the log, followed by a commit record. If a
crash occurs, changes that haven't been committed are discarded,
and committed transactions are rolled into the filesystem (much like
logging in an RDBMS). The log scanning process is much faster than
fsck, so reboot times following crashes are greatly reduced. UFS
logging also has the advantage of making synchronous writes to the
filesystem faster, since all the updates are grouped together and
written sequentially to the log, allowing the synchronous write to
return faster. This holds true for directory updates as well.
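The commit-and-replay idea can be shown with a toy model. To be clear, this is not how the UFS log is laid out on disk; the record format, the names, and the use of a plain integer array as the "filesystem" are all assumptions made to keep the sketch short.

```c
#include <stddef.h>

enum rec_type { REC_DATA, REC_COMMIT };

/* Hypothetical log record: a data change or a commit marker. */
struct log_rec {
    enum rec_type type;
    int txn;        /* transaction id */
    int block;      /* target block index (REC_DATA only) */
    int value;      /* new block contents (REC_DATA only) */
};

/* Returns 1 if the log holds a commit record for transaction txn. */
static int is_committed(const struct log_rec *log, size_t n, int txn)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (log[i].type == REC_COMMIT && log[i].txn == txn)
            return 1;
    return 0;
}

/* Scans the log in order after a crash. Changes belonging to committed
 * transactions are applied to blocks[]; uncommitted changes are
 * discarded. Returns the number of changes applied. */
int replay_log(const struct log_rec *log, size_t n,
               int *blocks, size_t nblocks)
{
    size_t i;
    int applied = 0;

    for (i = 0; i < n; i++) {
        if (log[i].type != REC_DATA)
            continue;
        if (log[i].block < 0 || (size_t)log[i].block >= nblocks)
            continue;
        if (!is_committed(log, n, log[i].txn))
            continue;
        blocks[log[i].block] = log[i].value;
        applied++;
    }
    return applied;
}
```

The point of the model is that recovery only has to scan the log, not walk the entire filesystem the way fsck does.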
The log itself is not on a separate partition, and is instead allocated
from free blocks on the filesystem. If a separate partition is
desired for logging, installing and configuring Sun Solstice
DiskSuite is required. The log space required is about 1 megabyte (MB) of log
for 1 GB of filesystem, with a maximum of 64 MB. The log is flushed regularly as it fills up. A filesystem unmount will also cause the log to be flushed.
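The sizing rule works out as simple arithmetic. Here's a sketch with a hypothetical function name; the 1 MB floor for very small filesystems is my assumption, as only the per-gigabyte ratio and the 64 MB ceiling are stated above.

```c
/* Approximate log size in MB for a filesystem of fs_mb megabytes:
 * about 1 MB of log per 1 GB (1024 MB) of filesystem, capped at 64 MB. */
unsigned long ufs_log_size_mb(unsigned long fs_mb)
{
    unsigned long log_mb = fs_mb / 1024;

    if (log_mb < 1)
        log_mb = 1;     /* assumption: never less than 1 MB */
    if (log_mb > 64)
        log_mb = 64;    /* stated maximum */
    return log_mb;
}
```

By this rule a 4 GB filesystem gets about 4 MB of log, and anything of 64 GB or more tops out at 64 MB.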
Explaining the actual mechanics of logging would make for a lengthy column,
as it would require a review of how writes are done on a UFS without logging. We'll keep this topic in mind for a future column.
Another new UFS feature is the no access time (noatime) mount option.
Filesystems mounted with this option will not update the inode
access time field when a file is accessed, thus saving an inode update.
The file's inode maintains three time fields: last inode change, last
modified, and last accessed. For some applications, it may not be
necessary to track a file's last access time, especially applications
that do a lot of small-file I/O (e.g., a news server). If it's
determined that this feature won't compromise the application in
any way, it does provide a performance improvement.
SACK in Solaris
TCP SACK is negotiated between the two sides involved in the
connection at connect time, and of course both sides must support
it. TCP SACK can be turned on and off with the
tcp_sack_permitted parameter, which has three possible values:
A few more
New crash dump features have been added that allow for the creation
of a dedicated dump partition (as opposed to using swap partitions
by default). Now, there's no risk of having a dump file in a
swap partition, forgetting to run savecore(1M), and having system
paging activity overwrite the dump file before it can be saved to
a file in a filesystem. Also, a compression algorithm is applied
to dumps, so more dump data can be saved in less space. (We're seeing
about a 3 to 1 compression ratio.) The savecore(1M) command has been
enhanced to allow for getting a dump of a live system, using the
-L flag to savecore(1M) and a dedicated dump partition. There's a
new dumpadm(1M) command that should be used to manage the dump
partition and dump administration.
Support for dynamic reconfiguration (DR) for I/O boards on the Ultra
Enterprise 3x00 through 6x00 has been added. DR lets you tell the OS
that you wish to add or remove an I/O board from a running system,
without disrupting system operations. DR has existed on the Sun
Enterprise 10000 system since that product began shipping, and is now being
added to the other server systems. DR support for CPU/memory boards for the Ultra
Enterprise 3x00 through 6x00 will be added in an upcoming release.
A new version of BIND has been added, version 8.1.2, which adds
compliance with RFC 2136 (dynamic updates) and RFC 1996 (zone change
notification). BIND provides the name server required for DNS
support. Also in the network services area, a new
version of Sendmail, version 8.9.1b, ships with Solaris 7.
To read about the new features introduced with this version, see the Sendmail
link in the Resources section below.
Some interesting new commands have also been added:
Reference the appropriate man pages for details.
That's a wrap for this month. A discussion of Solaris 7
seemed very timely, providing readers with a good overview of some
of the key features of this new release. Next month we'll get back
to the multithreaded process architecture series.
Resources
About the author
URL: http://www.sunworld.com/swol-11-1998/swol-11-insidesolaris.html
typedef ulong_t dev_t; /* expanded device type */
typedef long pid_t; /* process id type */
typedef unsigned long ulong_t;
typedef long daddr_t; /* <disk address> type */
Having provided some background on going from 32 bits to 64 bits,
let's walk through the feature set in Solaris 7. First, we'll look at the 64-bit
address space. As we said, processes can have larger memory footprints,
which means potentially more threads per process, larger data sets, or
larger shared memory segments. The actual high limit is not 16
exabytes, which is the largest attainable value for a 64-bit unsigned
data type, but rather 16 terabytes (TB, a trillion bytes). The
"limit" is imposed by the current implementations of the UltraSPARC
I and UltraSPARC II processors, which define a 44-bit virtual
address. Limit is really the wrong word, as 16 TB of address space
represents over 4,000 times more address space than the 32-bit
world had to offer.
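The arithmetic behind those numbers is easy to check. Here's a small sketch; the function name is hypothetical.

```c
#include <stdint.h>

/* Bytes addressable with the given number of virtual address bits.
 * Returns 0 for 64 bits or more, since 2^64 does not fit in a
 * 64-bit unsigned integer. */
uint64_t address_space_bytes(unsigned bits)
{
    if (bits >= 64)
        return 0;
    return (uint64_t)1 << bits;
}
```

A 44-bit virtual address gives 2^44 bytes, which is 16 TB when a terabyte is counted as 2^40 bytes. Dividing by the 2^32 bytes (4 GB) of a 32-bit address space gives a factor of 4,096.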
tibet26> uname -a
SunOS tibet26 5.7 s998_21 sun4u sparc SUNW,Ultra-1
tibet26>
tibet26> isainfo -v
64-bit sparcv9 applications
32-bit sparc applications
tibet26>
tibet26> isainfo -n
sparcv9
tibet26>
tibet26> isalist
sparcv9+vis sparcv9 sparcv8plus+vis sparcv8plus sparcv8
sparcv8-fsmuld sparcv7 sparc
tibet26>
Integrated UFS logging is another new feature, where a UFS filesystem can be
mounted with logging enabled. (It's a mount(1M) option,
e.g., "mount -o logging /dev/dsk/c0t0d0s0 /mnt".) UFS logging
captures writes to the filesystem and logs changes to a log area in
the filesystem before applying the changes to the filesystem
metadata structures (inodes, superblock, etc.).
Solaris 7 adds an important new feature in the TCP/IP code --
support of selective acknowledgements, or SACK, in accordance with
RFC 2018 (see Resources below). This is
essentially a performance (TCP network throughput) enhancement that
is most noticeable with large data windows and with very long TCP
connections. TCP with SACK provides the ability for a TCP receiver
to selectively acknowledge receipt of out-of-order packets. Without
SACK, TCP uses a cumulative acknowledgment mechanism, where the sender
can determine the loss of only a single packet per round trip
time. For large window data transfers requiring multiple packets,
this can result in poor performance because the loss of a couple of
packets could require a retransmission of all or most of the packets
for a large data transfer. SACK allows for the resending of only
the packets that have been dropped, as opposed to resending all the
packets following the first packet lost, even if others
were received. I have found an example of this on the Web. It goes
like this:
Prior to selective acknowledgment, if
TCP lost packets 4 and 7 out of an 8-packet window, TCP would
receive acknowledgment of only packets 1, 2, and 3. Packets 4
through 8 would have to be resent. With selective acknowledgment,
TCP receives acknowledgment of packets 1, 2, 3, 5, 6, and 8. Only
packets 4 and 7 have to be resent. (Author unknown.)
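The example above can be worked through in code. This is a simplified model of the two acknowledgment schemes as the quote describes them, not of a real TCP implementation; the function names and the bitmask representation (bit i-1 set means packet i arrived) are assumptions for illustration.

```c
#include <stdint.h>

/* Cumulative acknowledgment: the receiver can only acknowledge up to
 * the first gap, so everything from the first lost packet onward is
 * resent. n is the window size in packets (1..32). Returns a bitmask
 * of the packets to resend. */
uint32_t resend_cumulative(uint32_t received, int n)
{
    uint32_t resend = 0;
    int i = 1;

    /* Skip the leading run of packets that arrived in order. */
    while (i <= n && (received & (1u << (i - 1))))
        i++;
    /* Everything after the first gap goes out again. */
    for (; i <= n; i++)
        resend |= 1u << (i - 1);
    return resend;
}

/* Selective acknowledgment: only the packets that never arrived
 * are resent. */
uint32_t resend_selective(uint32_t received, int n)
{
    uint32_t all = (n >= 32) ? 0xffffffffu : ((1u << n) - 1u);

    return all & ~received;
}
```

For the 8-packet window with packets 4 and 7 lost, the received mask is 0xB7. The cumulative scheme resends packets 4 through 8 (mask 0xF8), while the selective scheme resends only packets 4 and 7 (mask 0x48).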
There are several other additions to Solaris 7, most of which, like
UFS logging, could easily consume an entire column.
We'll take a brief look at them here, and go into more detail in future Inside Solaris columns.
http://www.cis.ohio-state.edu/htbin/rfc/rfc2018.html
http://www.isc.org/bind.html
http://www.sendmail.org
http://www.sendmail.org/sun-specific/migration+sun.html
http://www.sunworld.com/common/swol-backissues-columns.html#insidesolaris
Jim Mauro is currently an area technology manager for Sun
Microsystems in the northeast, focusing on server systems,
clusters, and high availability. He has a total of 18 years of industry
experience, working in educational services (he developed
and delivered courses on Unix internals and administration) and
software consulting.
Reach Jim at jim.mauro@sunworld.com.