Performance analysis of client/server applications
A step-by-step guide to troubleshooting
When you encounter slow application response times, it is often difficult to pinpoint the problem. Troubleshooting in today's complex client/server environments requires comprehensive analysis. We take you through step-by-step solutions to performance problems in Unix-based computers, databases, networks, and applications.
Client/server computing introduces both complexity and dependencies between core computing elements. No longer is the operating system or application stored locally. The client system accesses information over the network and from a multitude of systems on the LAN, on the WAN, and, yes, on the Internet. Poor performance on any server the client depends on, or on any intermediate network, will lead to poor response time. How do you find the problems and improve performance in such a complicated configuration?
Consider a client system that depends on the network to download its operating system at boot time. As more nodes are added to the network and as nodes exchange more information, congestion on the network increases, leading to more collisions and retransmissions. Or, if the Maximum Transmission Unit (MTU) of some communications device has been reduced, it takes more packets to boot the client system than before. In both situations, users notice that the system takes longer to boot. Adding more memory or a faster processor on the client side will not lead to any significant change in performance. Correctly identifying the parameter that has changed, or the threshold that has been exceeded, is critical to identifying bottlenecks.
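The MTU effect is simple arithmetic. Each packet carries at most the MTU minus the protocol header overhead, so the packet count rises as the MTU shrinks. The sketch below makes several assumptions: a hypothetical 4 MB boot image sent over UDP/IPv4 (28 bytes of IP and UDP headers per packet), no retransmissions or fragmentation, and no boot-protocol headers. The image size and MTU values are illustrative, not measurements.

```python
import math

# Assumption: IPv4 header (20 bytes) + UDP header (8 bytes). Real boot
# protocols add their own headers, which this sketch ignores.
HEADER_BYTES = 20 + 8

def packets_needed(payload_bytes: int, mtu: int, header_bytes: int = HEADER_BYTES) -> int:
    """Minimum number of packets needed to carry payload_bytes at a given MTU."""
    per_packet = mtu - header_bytes
    if per_packet <= 0:
        raise ValueError("MTU must exceed the header overhead")
    return math.ceil(payload_bytes / per_packet)

image = 4 * 1024 * 1024  # hypothetical 4 MB boot image
for mtu in (1500, 576):
    print(f"MTU {mtu}: {packets_needed(image, mtu)} packets")
```

Under these assumptions, lowering the MTU from 1500 to 576 bytes raises the count from 2850 to 7654 packets. Each extra packet adds per-packet processing and another chance of a collision.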
The key challenge that many of us face as system administrators and application developers is figuring out which parameter or factor, in which system or network, is preventing applications from performing optimally and consistently. You must follow a step-by-step problem-solving approach, which we outline here. There are four key areas you should examine to resolve any performance problem successfully:
Timely identification of bottlenecks is key to maintaining consistent application performance. Most users do not tolerate an application that performs inconsistently -- fast one day and slow the next. Your first objective is to make the application perform consistently. An application that used to perform consistently but has recently become inconsistent typically points to one or more resources being used to their limits.
For example, the application may be waiting on such system resources as:
The system environment includes two areas:
The client/server application executes, at a minimum, on two systems. Typically, associated with any client/server application are three critical systems:
The configuration of all systems must be clearly defined. The important configuration elements are:
System-related parameters are summarized in the following table.
|System Elements||Things to consider|
|1. Client System Configuration||What is the processor on the client system? Intel 80486, Pentium, Digital Alpha, MIPS? What is the clock speed? 66 MHz, 120 MHz?|
|2. Operating System Server Configuration||What is the processor on the operating system server? Intel 80486, Pentium, Digital Alpha, MIPS? What is the clock speed? 66 MHz, 120 MHz?|
|3. Database Server Configuration||What is the processor on the database server system? Intel 80486, Pentium, Digital Alpha, MIPS? What is the clock speed? 66 MHz, 120 MHz?|
|4. Installed Memory/Maximum Memory on Client System||To determine if the system is configured optimally for the client/server and other applications.|
|5. Installed Memory/Maximum Memory on Server System||To determine if the system is configured optimally for the client/server and other applications.|
|6. Installed Memory/Maximum Memory on Database System Server||To determine if the database system is configured optimally for the client/server and other applications.|
|7. Virtual Memory/Maximum Virtual Memory on Client System (if applicable)||To determine if the system is configured optimally for the client/server and other applications.|
|8. Virtual Memory/Maximum Virtual Memory on Server System (if applicable)||To determine if the system is configured optimally for the client/server and other applications.|
|9. Network Interface on Client System||Ethernet, Token Ring, and/or FDDI.|
|10. Network Interface on Operating System Server||Ethernet, Token Ring, and/or FDDI.|
|11. Network Interface on Database Server||Ethernet, Token Ring, and/or FDDI.|
You need to determine accurately, and state, which components of the client/server application run on which systems. For example, in the case of a Powerbuilder application, the Powerbuilder client application runs on the end-user PC while the Powerbuilder (server) executable may run on a file server. Both the Powerbuilder client application and the Powerbuilder (server) executable run at the branch (remote) office, which supports its own LAN.
On Unix systems, you can use the following commands to provide information on how the system is performing:
The iostat command provides I/O statistics such as transfers per second, bytes per second, and milliseconds per seek. By default, information is averaged since the system was booted.
The vmstat (Virtual Memory Statistics) command provides information on virtual memory, disk access, and CPU utilization. By default, the results are averaged since the system was booted (initialized). To get information on peak system activity, which may indicate potential system bottlenecks, specify an interval as an argument to the vmstat command. For CPU statistics specify an interval of about two seconds, while for disk statistics specify an interval of 60 seconds. The vmstat command sleeps for the defined interval between reports. Typically, if the CPU idle time is greater than 20 percent while the application is slow, the system is either I/O-bound or memory-bound.
CPU idle time includes the time:
Examine the output under columns for:
r: Provides information on jobs that are currently runnable. If this number is high, the CPU is forced to switch between runnable jobs -- this may indicate that the system is CPU-bound.
b: Provides information on jobs sleeping at negative priority typically because a process is waiting for disk, tape, or other resources. If this number is high and CPU idle time is high then it could indicate that the system is I/O bound.
w: This field specifies the number of jobs that executed in the last 20 seconds and have now been swapped out. If this field is nonzero then it implies that the system may not have sufficient memory.
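The rules of thumb above can be applied mechanically to interval output from vmstat. The following is a minimal sketch under stated assumptions. The column names (r, b, w, id) and the `runq_limit` and `blocked_limit` thresholds are assumptions: the article says only "high", and column layouts differ between Unix flavors. The function name is hypothetical.

```python
def classify_vmstat(header: str, row: str, runq_limit: int = 4, blocked_limit: int = 2) -> list:
    """Apply the r/b/w/idle rules of thumb to one line of vmstat output.

    header is the line of column names and row is one data line. The default
    thresholds are hypothetical; tune them against your own baseline.
    """
    # Duplicate names (e.g. two "sy" columns) keep the last value; unused here.
    fields = dict(zip(header.split(), (int(v) for v in row.split())))
    idle = fields.get("id", 0)
    findings = []
    if fields.get("r", 0) > runq_limit:
        findings.append("many runnable jobs: possibly CPU-bound")
    if fields.get("b", 0) > blocked_limit and idle > 20:
        findings.append("blocked jobs with idle CPU: possibly I/O-bound")
    if fields.get("w", 0) > 0:
        findings.append("swapped-out runnable jobs: possibly short of memory")
    return findings
```

Pass it the column-name line, not the group-title line above it, together with one interval line. Skip the first data line, because it holds the averages since boot.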
The ps command, with options such as -aux on a BSD Unix system and -ef on SVR4, provides useful information on the processes running on the system. Check the information reported for each process.
Identify the processes that are consistently the highest users of CPU and/or memory. Note that the memory information provided relates to physical memory and does not include the memory used by the kernel or the instruction segment for each process. Examine the STAT column: if a process is ever in the RW state, the system is experiencing memory problems -- specifically, a shortage of memory. RW implies that the system swapped out a process that was either running or had run recently.
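Ranking processes by CPU use and spotting the RW state can be scripted. The sketch below assumes BSD-style `ps aux` output: a header containing PID, %CPU, %MEM and STAT, COMMAND as the last column, and no spaces inside the earlier fields. SVR4 `ps -ef` output has different columns and would need different parsing. The function name is hypothetical.

```python
def top_consumers(ps_lines: list, count: int = 5) -> list:
    """Rank processes from BSD-style `ps aux` output by %CPU.

    Returns (pid, %cpu, %mem, command, runnable_but_swapped) tuples.
    """
    header = ps_lines[0].split()
    idx = {name: header.index(name) for name in ("PID", "%CPU", "%MEM", "STAT")}
    procs = []
    for line in ps_lines[1:]:
        cols = line.split(None, len(header) - 1)  # keep COMMAND arguments together
        stat = cols[idx["STAT"]]
        procs.append((
            int(cols[idx["PID"]]),
            float(cols[idx["%CPU"]]),
            float(cols[idx["%MEM"]]),
            cols[-1],
            stat.startswith("RW"),  # runnable but swapped out, per the text above
        ))
    procs.sort(key=lambda p: p[1], reverse=True)
    return procs[:count]
```

Running it on successive snapshots shows whether the same processes stay at the top, which is what "consistently the highest users" asks for.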
The uptime command provides useful information on:
The kernel maintains averages of the number of active jobs in the system over the last one, five, and 15 minutes. The first load number, the one-minute average, reflects the most recent CPU load. If that number is greater than four, the system may be CPU-bound.
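Here is a small sketch that extracts the three load averages from uptime output and applies the article's one-minute threshold of four. The regular expression is an assumption. It accepts both the comma-separated "load average:" form and the space-separated "load averages:" form, but other variants may need adjusting. The function names are hypothetical.

```python
import re

# Matches "load average: 4.52, 3.10, 2.01" and "load averages: 1.23 1.50 1.60".
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")

def load_averages(uptime_output: str) -> tuple:
    """Return the (1-minute, 5-minute, 15-minute) load averages."""
    m = LOAD_RE.search(uptime_output)
    if m is None:
        raise ValueError("no load average found in uptime output")
    return tuple(float(x) for x in m.groups())

def cpu_bound_warning(uptime_output: str, limit: float = 4.0) -> bool:
    """True if the one-minute load exceeds the rule-of-thumb limit."""
    one_minute, _, _ = load_averages(uptime_output)
    return one_minute > limit
```

On a live Unix system, Python's `os.getloadavg()` returns the same three numbers without parsing any text.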
Execute the command
pstat -T (use the
option). This provides information on the amount of free swap space.
On Sun multiprocessor systems, note these two commands:
The psrinfo -v command provides information on the type of processor that you are using and its status. The mpstat command provides information that is similar to that of the vmstat command. Note the smtx field, which counts the number of times a CPU failed to acquire a mutex (mutual exclusion lock) on the first try. A request is denied if another CPU is holding the same lock, which indicates contention for the same resource. A high number implies that CPUs are frequently forced to wait for other CPUs to release locks.
Check the man pages on your Unix system to determine how to interpret the data correctly in the various fields provided by the commands discussed in this article. Field names and output format vary from one flavor of Unix to another.
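Spotting mutex contention in mpstat output amounts to picking out the CPUs whose smtx value stands out. This is a minimal sketch that assumes a Solaris-style header containing CPU and smtx columns. The `limit` is deliberately required rather than defaulted, because a sensible value depends on your own baseline. The function name and sample rows are illustrative.

```python
def smtx_hotspots(header: str, rows: list, limit: int) -> list:
    """Return (cpu, smtx) pairs from mpstat interval output where smtx > limit."""
    names = header.split()
    cpu_i, smtx_i = names.index("CPU"), names.index("smtx")
    hot = []
    for row in rows:
        cols = row.split()
        cpu, smtx = int(cols[cpu_i]), int(cols[smtx_i])
        if smtx > limit:
            hot.append((cpu, smtx))
    return hot
```

As with vmstat, use interval reports (for example `mpstat 5`) rather than the first report, which is averaged since boot.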
Also, consider using the public domain utility top, which displays and updates information about the top 15 processes on the system.
There are significant differences, from a network perspective, between running a client/server application over a Local Area Network (LAN) and running the same application over a Wide Area Network (WAN). The impact of network latency is more pronounced on WANs. This is primarily because typical WAN segments operate at 56 or 64 kilobits per second while most LAN segments run at 10 megabits per second, so each packet takes far longer to transmit. Propagation delay, and the delay introduced as routers process packets, will also affect the performance of the client/server application; these delays are typically more pronounced on WANs than on LANs.
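The data-rate difference translates directly into per-packet delay, because a packet must be fully clocked onto a link before the next hop can forward it. The sketch below computes this serialization delay. It also gives a rough store-and-forward estimate over several links, where `router_ms` and `propagation_ms` are hypothetical inputs you would measure or estimate. Queuing, retransmission, and protocol overhead are ignored.

```python
def serialization_ms(packet_bytes: int, link_bps: float) -> float:
    """Milliseconds needed to clock one packet onto a link of link_bps."""
    return packet_bytes * 8 / link_bps * 1000

def one_way_delay_ms(packet_bytes: int, links_bps: list, router_ms: float, propagation_ms: float) -> float:
    """Store-and-forward estimate: every link re-serializes the packet, each
    router between two links adds router_ms, plus total propagation delay."""
    serialization = sum(serialization_ms(packet_bytes, bps) for bps in links_bps)
    routers = max(len(links_bps) - 1, 0)
    return serialization + routers * router_ms + propagation_ms

for name, bps in (("56 kbps WAN", 56_000), ("64 kbps WAN", 64_000), ("10 Mbps LAN", 10_000_000)):
    print(f"{name}: {serialization_ms(1500, bps):.1f} ms per 1500-byte packet")
```

A 1500-byte packet needs about 214.3 ms on a 56 kbps link and 187.5 ms at 64 kbps, but only 1.2 ms on 10 Mbps Ethernet. Every additional WAN hop repeats that cost, which is why the hop count in the table below matters.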
In the network area, the performance of an application may be impacted by the following:
Network-related parameters are summarized in the following table.
|Network Elements||Things to consider|
|1. Protocol Stack -- Single or Multiple||Is all communication between the client and server applications over a single protocol stack such as TCP/IP or are multiple protocol stacks involved?|
|2. Protocol Stack on Client System Segment||Examples are: TCP/IP, Novell's NetWare (IPX/SPX), AppleTalk, SNA, and/or DECnet.|
|3. Protocol Stack on Server System Segment||Examples are: TCP/IP, Novell's NetWare (IPX/SPX), AppleTalk, SNA, and/or DECnet.|
|4. Data Rate -- Client System Segment||Examples are: Ethernet 10 Mbps, Token Ring 4 Mbps or 16 Mbps, or FDDI 100 Mbps.|
|5. Data Rate -- Server System Segment||Examples are: Ethernet 10 Mbps, Token Ring 4 Mbps or 16 Mbps, or FDDI 100 Mbps.|
|6. Data Rate on WAN (if applicable)||Examples are: Frame Relay 56 kbps, 256 kbps, T1 1.5 Mbps, T3 45 Mbps, or ATM. Are there other factors that impact performance? For example, Frame Relay CIR.|
|7. Average Utilization on Client Segment (business hours only)||To determine how the LAN segment, to which the client system is connected, is performing.|
|8. Average Utilization on Server Segment (business hours only)||To determine how the LAN segment, to which the server system is connected, is performing. If the load on the server segment is consistently high, that could be a factor that impacts the performance of the client/server application, even if the application itself does not place a significant load on the network.|
|9. Average Utilization on WAN Segment (business hours only)||To determine the load on the WAN.|
|10. Dominant Protocol on Client LAN Segment||Is it primarily IPX/SPX or TCP/IP? Within the protocol stack, which protocol is seen the most on the client LAN segment? For example, if TCP/IP is the dominant protocol stack then is it NFS, XWS, NIS, RIP, SNMP or some other protocol that generates the most packets on the network?|
|11. Dominant Protocol on Server LAN Segment||Is it primarily IPX/SPX or TCP/IP? Within the protocol stack, which protocol is seen the most on the server LAN segment? For example, if TCP/IP is the dominant protocol stack then is it NFS, XWS, NIS, RIP, SNMP or some other protocol that generates the most packets on the network?|
|12. Dominant Protocol on WAN Segment||Is it primarily IPX/SPX or TCP/IP? Within the protocol stack, which protocol is seen the most on the WAN? For example, if TCP/IP is the dominant protocol stack then is it NFS, XWS, NIS, RIP, SNMP or some other protocol that generates the most packets on the network?|
|13. Routing Protocol||Examples are: RIPv1, RIPv2, OSPF, IGRP.|
|14. Routing Tables Exchanged Dynamically?||Example: Is the routed (or some other routing process) daemon running on the system? Is the route command used to define static routes?|
|15. Number of links (hops) between client and server systems.||To determine the impact of network latency on the performance of the application. Is there a way to reduce the hop count between the client application and the server application? If yes, is that consistent with the network architecture? If it is not consistent with the network architecture, what network issues are involved? Typically, what is the latency in the router to process a packet?|
|16. Client System Router CPU Utilization||To determine if the performance of the application is impacted by a busy router. Is the router so busy that it is unable to keep up with the number of packets it needs to process?|
|17. Server System Router CPU Utilization||To determine if the performance of the application is impacted by a busy router. Is the router so busy that it is unable to keep up with the number of packets it needs to process?|
|18. Client System Router -- LAN Segment MTU||Verify that the Maximum Transmission Unit (MTU) is set to the highest value defined by the LAN technology in use.|
|19. Client System Router -- WAN Segment MTU||Verify that the MTU is set to the highest value defined by the WAN technology in use.|
|20. Server System Router -- LAN Segment MTU||Verify that the MTU is set to the highest value defined by the LAN technology in use.|
|21. Server System Router -- WAN Segment MTU||Verify that the MTU is set to the highest value defined by the WAN technology in use.|
To summarize, the following information is key to determining if the network is the bottleneck:
On Unix systems, you can execute the following commands to provide information on how the system is configured on the network and how the system is using the network:
The ifconfig command provides information on the IP address(es), subnet masks, and broadcast addresses used by the network interface(s) on the system. Verify the MTU value for each network interface in the output of the ifconfig command.
The netstat -a command lists the state of all network sockets. The netstat -s command provides information on the number of IP, TCP, UDP, and, most importantly, ICMP packets processed by the system. Verify the types of ICMP messages, especially Source Quench, Redirect, and Time Exceeded -- these typically imply some type of network or communication device problem. The netstat -r command provides routing table information.
The nfsstat command provides information on NFS server and client performance. Examine the server portion of the output when you execute the command on the NFS server (or execute nfsstat -s); likewise, look at the client portion of the nfsstat output if the command is executed on the NFS client (or execute nfsstat -c). In general, if NFS is used significantly in the environment, consider using an NFS write accelerator product, such as Legato Prestoserve. One of the major bottlenecks in NFS performance is synchronous writes: the server must commit each write to stable storage before acknowledging it. With a product such as Prestoserve, the write requests are written into a battery-backed RAM buffer and an immediate ACK is sent to the client. The requests are then written to disk on the server at a later time.
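A deliberately simple model shows why the timing of the ACK matters. Suppose writes are issued one after another, and each waits one network round trip plus the time the server needs before it may acknowledge. All figures below are hypothetical. The model ignores client-side concurrency (such as multiple biod daemons) and any write gathering on the server.

```python
def write_time_ms(n_writes: int, rtt_ms: float, commit_ms: float) -> float:
    """Total time for n_writes issued sequentially.

    Each write costs one network round trip plus the server-side time
    (commit_ms) that must pass before the ACK can be sent.
    """
    return n_writes * (rtt_ms + commit_ms)

# Hypothetical figures: 1,000 writes, 2 ms round trip,
# 20 ms to commit to disk vs 0.1 ms to land in battery-backed RAM.
disk = write_time_ms(1000, 2.0, 20.0)
nvram = write_time_ms(1000, 2.0, 0.1)
print(f"synchronous disk commit: {disk / 1000:.1f} s, NVRAM ack: {nvram / 1000:.1f} s")
```

With these numbers the disk-committed writes take 22.0 seconds and the NVRAM-acknowledged writes 2.1 seconds. The real gain depends on your disks, your network, and how much concurrency the client uses.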
To determine how the database performs, you need to address the following areas:
Database-related parameters are summarized in the following table.
|Database Elements||Things to consider|
|1. Software||Describe the database software. For example, Sybase System 10.0.2 with three database engines running: database engine 0 is responsible for all network I/O while engines 1 and 2 process all queries.|
|2. Data Characteristics||Is most of the data accessed by users read-only in nature? Are some screens (scripts) read-only (such as reports) while others read-write (such as a new loan order)?|
|3. Database System Architecture||Is the database system architecture centralized or distributed? Why? Based on the requirements of the application does it make sense to configure database replication servers? If a significant number of transactions require read-only access to data then configuring database replication servers may help improve application performance.|
|4. Single Processor vs. Multiple Processors||Is the database able to distribute load evenly between multiple processors? Is the processing of database requests symmetric or asymmetric? For example, if there are multiple database engines, can any engine service network I/O and database queries, or are there limitations? Does the database administrator have control over the work executed by each database engine (process)? For example, database engine 0 may be reserved to process only network I/O, while two additional database engines process database queries. Does the database vendor recommend how to utilize engines effectively on a multi-processor system? What are the merits of using a multi-processor system with multiple database engines versus multiple database server systems (central server with replica servers)?|
|5. Processing -- front-end vs. back-end||Should some parts of the application be written as stored procedures and others as scripts generated by software such as Powerbuilder? How is that determination made?|
|6. Performance||Execute the sp_who system procedure regularly to determine what the processes on the database server are doing. Typically, how many users are logged in? As far as the database is concerned, what are most users doing? For example, are there a large number of SELECTs? Do the SELECTs last for a long time? What event in the application is forcing the database to spend considerable time processing information? Can it be justified? Are there alternatives available, either in the application design and coding or in the way the database is configured, that will reduce the load on the database and the system? Execute the sp_lock system procedure regularly to determine which pages are locked. Typically, how many pages are locked? Which users hold the locks? What is the lock type? How long do the locks exist?|
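"How long do the locks exist?" can be estimated by running sp_lock at a fixed interval and tracking how many consecutive snapshots contain each lock. This is a minimal sketch of that sampling idea. It assumes you have already turned each sp_lock snapshot into a set of hashable keys. The (spid, locktype, page) tuple layout and the lock-type names in the test are illustrative assumptions, not a documented sp_lock format. A lock released and re-acquired between two samples is indistinguishable from one held continuously, so treat the result as approximate.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set

def lock_durations(samples: Iterable[Set[Hashable]], interval_s: float) -> Dict[Hashable, float]:
    """Estimate how long each lock was held from periodic snapshots.

    A lock seen in k consecutive snapshots, taken every interval_s seconds,
    is reported as held for roughly k * interval_s (longest run observed).
    """
    run = defaultdict(int)      # current consecutive-snapshot run per lock
    longest = defaultdict(int)  # longest run seen per lock
    for snapshot in samples:
        for key in list(run):
            if key not in snapshot:
                run[key] = 0    # lock absent: its run is broken
        for key in snapshot:
            run[key] += 1
            longest[key] = max(longest[key], run[key])
    return {key: n * interval_s for key, n in longest.items()}
```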
To determine how the applications perform, you need to address the following areas:
Application-related parameters are summarized in the following table.
|Application Elements||Things to consider|
|1. Software||Describe the software that was used to develop the application. For example, Powerbuilder version 4.0.3 may be required on the Novell file server and the DLLs are downloaded to the client PC per Powerbuilder application login.|
|2. End User Application Interface||Describe how the end user invokes or gains access to the application. Identify the systems involved in this initialization process.|
|3. Number of different modules, routines or screens in the application.||Describe the application in terms of each of its elements. Each element (routine) may use the network or system differently, and it is imperative that each module be analyzed individually.|
|4. Processing -- front-end vs. back-end||If the developer has a choice to write a module that requires:
|5. Module Execution and Systems||For each module we need to determine what part of the module executes on which system.|
|6. Version Control System||Name the version control system in use. Define the methodology to introduce version changes.|
|7. CPU Utilization||It is important to characterize the CPU utilization on each system (client and server) for each module. If a given module requires significant CPU resources on a given system, determine whether the application can be changed to provide the same functionality while using CPU resources more efficiently. For example, when does it make sense to use an SQL GROUP BY clause versus an SQL WHERE clause? Where aggregate computation is not required, a WHERE clause may be appropriate. This may reduce the CPU load on the database server, improving the response seen by the end user. The objective is to question the load that one set of SQL calls, system calls, or functions places on the system compared with another.|
|8. Network Utilization||Characterize the network utilization of all segments (client LAN, WAN, server LAN) for each module. Further, determine:
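The front-end versus back-end choice, and its effect on CPU and network load, can be illustrated by counting the rows a query hands back to the client. The sketch below uses an in-memory SQLite database as a stand-in for a remote database server. This is an assumption: the point is the number of rows that would cross the network, not SQLite itself. The `loans` table and its data are hypothetical.

```python
import sqlite3

# In-memory stand-in for a remote database server (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (id INTEGER PRIMARY KEY, branch TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO loans (branch, amount) VALUES (?, ?)",
    [("north" if i % 4 == 0 else "south", 1000.0 + i) for i in range(1000)],
)

# Front-end approach: ship every row to the client, then filter and total there.
all_rows = conn.execute("SELECT branch, amount FROM loans").fetchall()
client_total = sum(amount for branch, amount in all_rows if branch == "north")

# Back-end approach: the database filters (WHERE) and aggregates (SUM).
(server_total,) = conn.execute(
    "SELECT SUM(amount) FROM loans WHERE branch = ?", ("north",)
).fetchone()

# Back-end approach with GROUP BY: one row per branch instead of one per loan.
per_branch = conn.execute(
    "SELECT branch, SUM(amount) FROM loans GROUP BY branch ORDER BY branch"
).fetchall()

print("rows returned:", len(all_rows), "vs 1 (WHERE) and", len(per_branch), "(GROUP BY)")
```

Both approaches produce the same answer. The front-end version moves 1,000 rows and does the work on the client, while the back-end versions return one or two rows and move the work to the database server. Which is better depends on where CPU and network capacity are scarce, and that is exactly what rows 4, 7, and 8 ask you to measure.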
The last word
Before an application, a new version of an application, or any enhancement to an application is moved to a production system, it is important to run tests that verify the impact of the change on the various system elements. Understanding how the application performs with the changes introduced is critical, because the application competes with users and other applications for CPU and network resources. How the new release functions will determine the performance not just of the client/server application but also of other applications on the network.
Performance testing must be a key element in the process of moving an application from development to a production environment. The tools to analyze system, network, database, and application-related entities must be in place to effectively determine bottlenecks and recommend solutions.
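One way to make such a test concrete is to compare response-time samples for the same module, under similar load, from the current release and the candidate release. This sketch reports the median and 95th percentile of each. It flags a regression if the candidate is more than `tolerance` slower; the 10 percent default is a hypothetical acceptance threshold, not a standard. It uses `statistics.quantiles`, which requires Python 3.8 or later.

```python
import statistics

def p95(samples):
    """95th percentile: the last of 19 cut points from statistics.quantiles."""
    return statistics.quantiles(samples, n=20, method="inclusive")[-1]

def regression_report(baseline_ms, candidate_ms, tolerance=0.10):
    """Compare median and 95th-percentile response times of two releases.

    Returns {statistic: (baseline, candidate, regressed)}, where regressed is
    True if the candidate exceeds the baseline by more than tolerance.
    """
    report = {}
    for name, stat in (("median", statistics.median), ("p95", p95)):
        base, cand = stat(baseline_ms), stat(candidate_ms)
        report[name] = (base, cand, cand > base * (1 + tolerance))
    return report
```

Looking at the 95th percentile as well as the median matters here, because the article's first objective is consistency. A release can keep the same median while making the slow cases much slower.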
About the author
Uday O. Pabrai is an industry expert providing solutions in the areas of Internet, intranet, and TCP/IP architecture, infrastructure, and deployment. His clients include Fortune 1000 companies and U.S. Government agencies; among them are Microsoft, AT&T, CBOE, Landis & Gyr, Norwest Mortgage, and Thomas J. Lipton. His articles have appeared in several publications. Reach Uday at email@example.com.