Performance monitoring and capacity planning
What you need to know when configuring a server for client/server computing
It's impossible to know if a new server will support a requested load because of the complex combination of hardware, operating system, DBMS, and the application. However, performing a rough analysis of the key transactions can help to determine what configurations will not handle the load. The authors outline this and other aspects of monitoring and capacity planning in this column. (6,900 words)
Managers of distributed, client/server environments wrestle with this problem too. But they must endure the added complexity of dealing with multiple sites.
In supporting the functions of capacity planning and performance monitoring, IT organizations are really assessing the service levels they provide, which are decided by customer expectations:
As those associated with the mainframe data centers know, this is not easy, and even old-hands guess wrong. The difficulties arise from several reasons:
While some view capacity planning as an art form, you can approach it scientifically. Keep in mind it is easier to find that a configuration will not support a specific service level than to predict it will. For example, it is easy to determine that a system with a single disk drive cannot achieve random access throughput rates of 130 accesses per second if that one disk can handle only 65. However, a system with two disks (each of which can handle 65 accesses per second) may or may not handle the same load, because the bottleneck may not be in the disk subsystem.
Also, actual use follows Parkinson's Law and will eventually employ all available resources. (This is most distressful when customers devour an over-configured system by piling on unplanned tasks.)
To properly analyze a distributed environment, view it as a series of connected components (computers, peripherals, software, networks, etc.), and break these down into their individual, measurable components. For example, client machines are composed of CPUs, memory, software, busses, and peripheral devices. Each of these can be monitored as to its effects on the entire configuration's total performance. As with the old saying about a chain being as strong as its weakest link, a distributed environment's performance is often set by its weakest component.
A distributed environment has to be viewed both from a low-level perspective, examining each system and subsystem, and from a high-level perspective where the entire network of systems is considered. Where there used to be a data center made up of a single, large mainframe, there is now an entire network (or networks) making up the data center. Hence our phrase, "The network is the data center." This view is important in preparing to install new systems and tuning a system after it is installed. Most performance and tuning efforts are largely after-the-fact (after systems are installed) analyses of encountered bottlenecks. As far as tuning a system (or a network), the process is really that of finding the weakest component and making it stronger, thus removing that bottleneck.
Configuring and capacity planning
This section provides information about how to configure a system for a database management environment. The concentration is on load characteristics and usage and how the usage interacts with the machine architecture to affect end-user performance.
The information comes from several years of experience and research at Sun Microsystems and is hardly the final word. The authors acknowledge the research and testing by all of the IT staff at Sun Microsystems. We especially want to thank Rich Stehn for his research in this area. His writings form the basis of this column.
Here's an outline of what's ahead:
Configuring DBMS servers
Probably the most common single class of applications for client/server systems is database management systems (DBMSs). There are several popular DBMSs, each with quite different characteristics. Because of these differences, the following discussion is general. While it is almost impossible to determine exactly how many users a system will support, using a few basic strategies can help you make an informed prediction.
All DBMSs are different
Both database-oriented applications and DBMSs themselves vary widely in nature, and cannot be pegged as purely transaction- or data-intensive.
While there are several fundamental database architectures available today, most Unix users settle on the relational model from either Oracle, Informix, Sybase, or Ingres.
Even with most systems operating under the same broad conceptual framework, there are architectural differences between the products. The most significant is the implementation of the DBMS itself. The are two major classes being "2N" and "multithreaded."
The older "2N" tools use a process on the server for each client, even if the client is running on a physically different system. Therefore, each client application uses two processes, one on the server and one on the client.
Multithreaded applications are designed to avoid the extra expense of managing so many processes, and typically have a cluster of processes (from one to five) running on the server. These processes are multithreaded internally so that they service requests from multiple clients. Most major DBMS vendors use a multithreaded implementation, or are moving in that direction.
Given the diversity of applications, DBMS implementations, workloads, users, and requirements, any vendor willing to provide a definite answer to "How many users will this system support?" is either winging it or has made a detailed, in-depth analysis of an actual installation just like yours. It's easy to tell which type of answer you're getting.
Application load characterization
Characterizing the load generated by a database application is impossible without detailed information about what the application accomplishes. Even with such information, many assumptions must be made about how the DBMS itself will access data, how effective the DBMS's disk cache might be on specific transactions, or even what the mix of transactions might be. For now, the most helpful characterization of the load generated is "light," "medium," "heavy," and "very heavy." The heaviest common workloads are those associated with very large applications such as Oracle Financials.
The primary application class that falls into the "very heavy" workload is decision support. Because of the diverse nature of decision support queries, it is very difficult for database administrators or the DBMS itself to provide extensive, useful optimization. Decision support queries stress the underlying system due to frequent multi-way joins.
Configuration guidelines for DBMS servers
Although configuration guidelines can (and will) be provided, their usefulness is drastically affected by application considerations. The efficiencies of the application and DBMS are much more important than the host machine configuration. There are literally hundreds of examples of small changes in applications or database schema making 100- or 1,000-fold (or more) improvements in performance.
For instance, a database select statement that requests one specific record may cause the DBMS to read one record from the tables or every record in the table, depending on whether or not the table is indexed by the lookup key. Often a table must be indexed by more than one key (or set of keys) to accommodate the different access patterns generated by the applications. Careful indexing can have dramatic effects on total systems performance. After systems are installed, it is worthwhile monitoring to decide if changes should be made to the database (even for internally-developed or "off-the-shelf" third-party applications). It is often possible to improve the performance of an application by reorganizing the database (even without changing anything in the application's source code).
Another consideration that receives little notice but often affects performance is internal lock contention. The DBMS must lock data against conflicting simultaneous access. Any other process that requires access to this data will be unable to go on until the lock is released. A system may perform poorly due to an inefficient locking strategy.
DBMSs offers many tunable parameters, some having dramatic effects on performance. Below are our recommendations for applications and DBMSs that have already been tuned.
The following questions summarize the process of arriving at an accurate DBMS configuration:
Most DBMS applications consist of three logical parts:
When it is feasible to do so, use of the client/server model to separate the front-end processing and presentation services from the DBMS provider usually provides a substantial improvement in total system performance. This lets the critical resource, the DBMS provider, operate unimpeded on its host system. This is particularly true for systems dominated by activities, such as driving hundreds or thousands of terminals.
The opposite of client/server mode is "timesharing." Timesharing usually delivers higher performance only when the presentation requirements are very light or when the concurrent user load is light. Applications that have forms-based presentation are usually never light.
Transaction processing monitors
The use of a transaction processing monitor is one method of achieving higher performance from a given configuration, especially in client/server mode. They are also extremely useful in constructing heterogeneous databases with some data in one format (such as Oracle on an HP server) and other data in a different format (such as IMS on an IBM mainframe). Also, some monitors offer lightweight presentation services.
The monitors achieve their results by positioning themselves between the DBMSs and the application. The application must be set-up to issue transactions (written in the monitor's language) rather than directly accessing the database (such as various forms of embedded SQL). The application programmer is also responsible for writing a description file that maps transactions into specific database access in the native language of the DBMS (this is SQL for almost all Unix DBMSs).
Data access flexibility
There is little restriction on the richness or complexity of DBMS access. It is reasonable for a transaction to issue a request for one set of data from an IDMS database on an IBM-type mainframe running MVS, another set of data from a local Oracle database, and then merge the two sets of data together for presentation to the application. The result can give the illusion that data is stored in a unified data warehouse.
The complexity of migrating data from one platform to another should not be underestimated because this often requires changes in data representation (such as translating from COBOL "PIC 9 (12)V99S" to C++ float) and data organization (such as from IMS's network architecture to the relational architecture used on most Unix DBMSs). The ability to preserve the processing and presentation sections of existing applications greatly reduces the complexity and risks associated with rightsizing those applications.
Besides the access flexibility provided by a transaction processing monitor, there are also several performance benefits from this arrangement. The monitor is always multithreaded. Because the monitor opens its own connection to the DBMS (instead of each application process making requests directly to the DBMS) the number of concurrent DBMS users is reduced. Under this configuration the DBMS typically services a single "user," the monitor. This is particularly important when the DBMS is of the "2N" type because only one shadow process is used (for connection to the monitor) instead of one connection for each end-user process. This can reduce the context-switching overhead on the back-end system dramatically.
Transaction processing monitors can also improve performance by reducing the amount of information transferred between the DBMS and the application. This is most important when the client and the server are connected by a busy (or low-bandwidth) network.
Memory usually has the largest impact on DBMS server performance. Because access to memory is approximately 30,000 times faster than access to a fast disk (yes, it's that much faster), reducing disk I/O is important. Tinkering with other parameters is useless without enough memory. If the hardware has to be proposed without sufficient information, it is wise to overestimate the memory needed. Fortunately, it is usually easy to recover from memory configuration mistakes (you can often add a little more).
Sizing the DBMS I/O cache
Each DBMS calls its data cache by a different name, but they all perform the same function. Usually, the cache is set up as a large area of shared memory and is set by a parameter in the DBMS's control files or tables. The size needed for the DBMS disk cache varies widely across applications, but the following rules-of-thumb apply.
Practical experience with Oracle and Sybase has shown that cache areas can be productively sized anywhere from 4 megabytes to more than a gigabyte. Even the large size can be exceeded now that larger databases are coming online with datacenter-sized machines. As with any cache, increasing the cache size will eventually reach a point of diminishing returns. An estimate of data cache size might be between 50 and 300 kilobytes per user. Each DBMS offers a mechanism for reporting the efficiency of the shared data cache. Most also provide estimates about what effects increasing or decreasing the cache size will have.
Given recent prices for memory, disks, and SCSI subsystems, the break-even point is at approximately five minutes, meaning data accessed more frequently than once every five minutes should be cached in memory. Therefore, an estimate of the data cache size is the total data that the application expects to use more frequently than once every five minutes system-wide. Allow at least this much space for the data cache. Also, reserve another 5 to 10 percent for storing the top-levels of B-tree indexes, stored procedures, and other DBMS control information.
Increasing cache size sometimes results in diminishing returns. These guidelines should be used to formulate an estimate. Each of the DBMS systems provides a mechanism for finding the costs or benefits of changing the size of the cache. After the system is installed and in use, use these mechanisms to examine the effects of resizing the DBMS I/O cache. The results can be surprising.
Although the macroscopic purpose of a DBMS is to manage volumes of data that are very large (and can inevitably be much larger than main memory), decades of research has shown that data access follows a 90/10 rule: 90 percent of all accesses are to 10 percent of the data. More recent research shows that this 90/10 follows a further 90/10 rule: Within the "hot" data referenced by 90 percent of the accesses, 10 percent of that data is accessed 90 percent of the time. Thus, about 80 percent of all accesses refer to about 1 percent of the total data. Although cache hit rates of approximately 95 percent are desirable, it can be economically impossible to blindly provide an in-memory cache for 10 percent of the data. However, it is usually feasible to provide cache for 1 percent of the data, even for large databases.
Other memory requirements
The system must also provide space for traditional memory uses. Allow at least 16 megabytes for the base operating system. Then, provide 2 to 4 megabytes for the DBMS's executables and sufficient space to keep the application binaries in memory. These binaries are usually 1 to 2 megabytes, but they can reach 16 to 20 megabytes. The operating system shares binaries when they are used by multiple processes, so space for only a single copy must be reserved. Allocate space for the DBMS server code itself, depending on the general architecture of the server. For "2N" architectures, allow from 100 to 500 kilobytes per user. The multithreaded architectures require only from 60 to 150 kilobytes because they have far fewer processes and much less overhead.
As a rule of thumb, try to configure at least 64 megabytes of memory per processor. Anything less can cause excessive paging.
Processor consumption will vary enormously between applications, DBMSs, individual users, and even times of day. Larger applications result in fewer users per processor.
Oracle Financials represent one of the heaviest workloads for a variety of reasons. Financials uses many different transactions, rather than just one. A non-trivial amount of application processing must be done on the DBMS host itself after the data is retrieved from the DBMS. Also, Financials can not be operated with a transaction processing monitor.
Neither the operating system nor the DBMSs scale in a linear fashion. The scaling factor at this time is about 70 percent, meaning that doubling the number of processors results in an improvement of about 70 percent.
These numbers are based entirely on an interactive user-load. However, these workloads are usually accompanied by batch processing. Configure one additional processor to handle batch loads.
Disk I/O subsystems
As noted previously, disk accesses are nearly 30,000 times slower than memory. Therefore, the best way to optimize disk I/O is to not do it at all. Unfortunately, this is not practical. Providing sufficient I/O capacity is critical to DBMS performance.
Without understanding the customer application, the database administrator's indexing strategy, and the DBMS's storage and search mechanisms, it is difficult to say what type of disk access a given transaction will generate. It is especially difficult to quantify complex transactions that are implicit in a third-party application. In the absence of firm data, it is reasonable to assume that a transaction that retrieves a few specifically named records from an indexed table by the index key will be a random access operation.
Queries that involve a range of key values may generate a combination of random and serial access (depending on the indexing used).
Updates to the database will cause updates in the affected indexes, and the log will also be written. To explain, all this requires a thorough understanding of the specific DBMS in use, the complex nature of queries and query optimization. Estimating the disk access requirements is imprecise without knowing how the DBMS stores data (even for simple transactions).
Where queries are expected to govern the total performance of an application, consult the experts (the margin of error with this type of query can be 1000:1).
Storage capacity vs. access capacity
A common problem in DBMSs is providing enough storage capacity and not enough access capacity. The disk I/O on most servers is random. Many types of disks are available today, but their random access performance is nearly the same.
The greatest I/O access capacity is usually achieved by using many small disks on many channels instead of fewer, bigger disks. This is not always practical for storage capacity reasons, nor may there be enough SBus slots available to add additional SCSI channels for disks. It is advisable to consider using the higher access-capacity subsystems in hosts where the actual I/O load is either bursty, or is completely unknown, or where the data is small compared to the number of users accessing it.
It is usually a serious mistake to choose disks solely because of their storage capacity. Although this is not a new problem (performance and tuning analysts with mainframe experience know all about this), it is getting more serious as disk storage capacities are increasing more rapidly than access capacity.
File systems vs. raw disks
Most DBMSs allow the database administrator to choose to place the DBMS files either in raw disks or in standard Unix file systems. Storage in the file system is somewhat less efficient (by at least 10 percent) because an additional layer of system software must be used for every DBMS disk access. Since the processor power is often the limiting resource in large DBMSs, the use of raw partitions improves performance at peak load. Consequently, most database administrators usually choose the raw disk partitions. If the system is expected to be pushed to its limits (particularly in CPU usage), this might be the best choice. However, note that processor efficiency really becomes an issue only under peak load, and most systems experience peak loads only on an occasional basis.
File system storage also costs in terms of capacity. The Unix file system consumes approximately 10 percent of the formatted disk's capacity with meta information about the files and file system. Also, the file system reserves 10 percent of the remaining space to allow free space to be found quickly when extending files. The disk's capacity is reduced by a total of approximately 19 percent when the DBMS uses the file system rather than raw partitions.
Given that storage in the file system costs more in terms of both processor power and storage capacity, it seems silly to take this approach, but there are several good reasons to consider using file system storage. Most of them are for flexibility and familiarity.
The first (and probably most important) use of the file system
permits standard Unix utilities to operate on the storage. For example,
the standard Unix
can be used to reliably backup and recover the DBMS files. Manipulation
of sections of the database is much easier to accomplish (moving a
database table from one disk drive to another is straightforward even
if the disks are different in size and type).
Although each of the DBMS vendors provides their own backup and recovery utilities, they are all different, and some of them are so slow that customers often resort to using physical volume copying with all its accompanying difficulties. Storage in the file system allows uniform, reliable procedures to operate throughout the system or network (if needed, the DBMS vendor's tools can also be used).
Sometimes, using the file system permits access to optimizations the DBMS may not use. The Unix file system attempts to cluster together data into much larger physical groupings than most DBMSs. Because disk space for tables is usually pre-allocated, the file system can be successful in grouping data together in 56 kilobytes blocks, while the DBMS storage managers usually operate in 2 kilobytes pages. Serial scans of tables or indexes stored this way will often be more efficient than equivalent tables in raw partitions. If the system's operations are mostly serial scans, file system storage will provide higher performance.
Also, because raw disks are effectively not managed by the operating system, disk partitions committed to the database are nearly irretrievable if disk space is needed for a non-database purpose.
Users usually think of the DBMS as storing and retrieving their data without considering what is actually stored on the disk. In practice, a considerable amount of additional information is maintained by the DBMS software. This means it is usually a mistake to assume a given amount of user data will fit into that same amount of disk space. The database schema, table indexes, B-tree directory nodes, temporary tables, pre-allocated space for hash tables and indexes, sort space, log files, archives, and a myriad of other functions all consume disk space.
Unless more accurate information is available, it is wise to configure twice as much disk space as raw data. This provides some flexibility for creating indexes to improve performance.
Another factor that influences I/O subsystem configuration is the intended distribution of data across the disks. Even a small system should employ four or more disks, including one or more for the:
I/O resources use
In designing an I/O subsystem, you should pay attention to not only the maximum capacity of the components, but also to how much they are used (called "resource utilization" in the trade). Most numbers used to describe the capacity of resources refer to throughput. Providing the maximum throughput rating is like saying that the speed limit on the freeway is 65 mph, but if the access ramps are so congested that getting on and off takes a long time, the average speed will be less than the posted speed.
The same principles apply to the various peripherals and peripheral busses, especially with the SCSI busses. It is usually not possible to estimate what average disk or SCSI bus use will be until the system becomes operational. As a result, the most appropriate configuration is one that spreads the frequently used data and indexes across as many disks and SCSI busses as are feasible within budget and technical constraints. Rarely accessed data should be packed as tightly as possible.
Once the system begins operation, you can measure the actual disk use and move data accordingly. Data movement must be done with balance in mind.
There are usually two or more mechanisms for spreading data across drives. Each DBMS can concatenate multiple drives or Unix files to spread data access. Third-party products offer similar capabilities, beyond hot sparing and disk striping. If a table is I/O bound, investigate the queries that are causing the I/O activity. If they are random access, the disk concatenation abilities native to the DBMS are adequate to spread the load. If the accesses are serial, a third-party product is more appropriate.
The primary benefit of disk striping is that the task of spreading data access is simpler for the database administrator. With real striping, this is an easy task and always yields optimal results. This is often not the case with logically dividing data across table-spaces using a DBMS's internal mechanisms. The goal is to separate heavily used tables on separate disks, but it is impossible to resolve the conflicting requirements for combinations of access to those tables, resulting in uneven loading. In our experience, third-party products level disk use more evenly.
DBMSs divide a table into a few large segments and place the data uniformly onto those segments. The key distinction between the DBMS's concatenation and disk striping is the placement of adjacent data. When the disks are concatenated, a serial scan uses each of the component disks heavily but in a serial manner (only one disk can help service the query). True striping divides the data along smaller boundaries and allows all the disks to help service even a small request. As a result, serial access is greatly improved by striping. Archive and log files are always accessed serially and are good candidates for striping.
As databases continue to grow in size and importance, backup procedures that disable the DBMS access become a negative. Online backups can pose significant configuration challenges since backup of large volumes of data associated with databases involves very I/O-intensive activity. Online backups frequently drive disk and SCSI bus use to very high levels and result in poor application performance.
If the DBMS is operated in client/server mode, the network(s) connecting clients with the server must be sized properly. For obvious reasons, it is wise to monitor network use closely.
Even when throughput is not important, latency issues often make it both convenient and useful to supply a private, dedicated network between a front-end system and the DBMS provider.
Client/server and wide-area networks
For an increasing number of applications, front-end and back-end systems can or should be placed in geographically separate locations. Such systems must be connected by wide-area networks. Leased lines carrying synchronous serial networks are usually the media used for such networks. Although the raw media speeds are considerably lower than typically found in local networks, the nature of the serial lines is such that very high use can be sustained.
For traditional business applications, client/server traffic normally has low enough data volume between front- and back-end systems that the lower network throughput is sufficient. Often, network latency is not an issue, but if the wide-area network is particularly long or if it is carried over high-latency media (such as satellite links), the application should be tested to decide its sensitivity to packet delays.
Transaction processing monitors may be used to reduce client/server traffic to an absolute minimum.
Network I/O subsystems suggestions:
Provisions for backup
Because databases are typically both huge and critically important, backup is a critical issue. The volume of data involved is usually immense, especially compared with the size and speed of backup drives. It is simply not practical to dump a 20-gigabyte database to 4mm tape drives at 500 kilobytes/second, which will take nearly 12 hours to complete. Even this figure omits other considerations such as database consistency and availability.
When to schedule backup
Scheduling a backup for a system used primarily during normal business hours is straightforward. Scripts are often used to carry out backup after close of business. Some sites do this unattended while others use operators on overtime. Unattended backups will require sufficient backup capability online.
Planning and configuration are more difficult if the system must be online for 24 hours or if the time necessary to perform the backup is longer than the available window.
Sometimes it is necessary to do an "online backup," that is, carry out a backup while the database is still active with users still connected and operational.
Backing up a database requires that it be consistent, which all committed updates to the database have not only been logged but have been written to the database tables as well.
Online backups pose a challenge: After a consistency point has been reached and the backup starts, all database updates must be prevented from updating the database tables until after the full backup has completed (or the backups are inconsistent). Most DBMSs offer an online backup feature.
This is probably the most crucial issue for sites with databases larger than 10 gigabytes. Small databases can be backed up using a single Exabyte or DAT tape drive. Multiple devices can be used in parallel to improve throughput, but resource contention makes this less effective for more than three or four tape drives.
If a unit supporting hardware compression is not available, it may still be worth considering the use of software compression. Backup speed is typically governed by the speed of the physical tape, so any method which reduces the amount of data to be written onto the tape should be examined, especially in light of ever faster CPUs. Databases are good candidates for compression because most of their tables and rows include a good proportion of white-space (due to the use of fill factors to maintain some free space for performance reasons). Some tables may also contain text or sparse binary data fields, which compress readily.
Most users carry out a full backup daily. Given their importance, you should spend time considering recovery, too. This includes time to restore (usually from tape) and time taken to roll forward in the database to incorporate changes made to the database since the last backup. Given the importance of the roll-forward activity, it is important to mirror the journals and archivelogs, which make full recovery possible.
In an environment where many transactions are written to the database, the time required to perform a roll-forward from the most recent checkpoint may set the frequency of backups.
Monitoring and checking backups
Backups must be monitored to ensure they have completed successfully. Also, it is important to document and test the recovery procedure. After a disaster is the wrong time to discover that some key item was missing from the backup strategy.
Capacity planning summary
It's impossible to know if a new server will support a requested load because of the complex combination of hardware, operating system, DBMS, and the application. However, it is possible to make some assumptions and then perform a rough analysis of the key transactions to determine what configurations will not handle the load. Although this is useful for simple applications, it is an ineffective approach for larger, more complex applications.
Estimating considers the known limitations of parts of the proposed system configuration and then compares these with the minimum estimated demand associated with the task at hand. While it is certain (usually) that a system with one processor and one disk will not accomplish the needed application at the required speed, it is far from certain that a system with four processors and 20 disks will attain the desired performance. This is because so much of the application is usually simplified out of most system estimates.
Minimal sizing accomplishes the initial requirements of bringing up the applications. Proper (or wanted) performance is accomplished with continual monitoring (and tuning) to find how the application reacts in a particular configuration. This is an ongoing process as long as the application is to be considered "alive and well."
If you have technical problems with this magazine, contact firstname.lastname@example.org