Sizing up your Web server
Capacity planning for Web servers boils down to three primary considerations. We tell you what they are so you can more accurately estimate Web server load and configuration.
Capacity planning for Web servers has become a hot and critical issue as the number of Web-based service transactions has multiplied dramatically. We help you cut through the chaos by narrowing your most important planning considerations down to three elements. (2,300 words)
There are really three primary considerations in sizing Web servers: the amount of user demand, the capability of the networking infrastructure, and the nature of the content to be provided by the server. Notably absent from this list is the capability of the server, which typically turns out to be much greater than demand, especially when considered in the context of realistic content and network infrastructure.
The world's most popular Web server is presently home.netscape.com, which is hit more than 120 million times a day. This load is spread fairly evenly across all 24 hours of the day, so it corresponds to a sustained load of about 1,500 hits/sec. Peak rates are observed to be about double the sustained load, or about 3,000 hits/sec. Because this page receives so many hits, image sizes are kept under strict limits, and the average hit size is about five kilobytes (KB). (Remember this when we get to network infrastructure shortly.) This hit rate corresponds to an aggregate data rate out of the site of about eight megabytes (MB) per second. Only a small proportion of the requests invoke any server function other than simple data transmission.
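The arithmetic behind these figures is simple enough to sketch. The following is a minimal example, assuming load is spread evenly over the day and taking 1 MB as 1,000 KB; the function names are hypothetical. Because "more than 120 million" is a lower bound, the computed rates come out slightly below the rounded 1,500 hits/sec and 8 MB/sec quoted above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def sustained_rate(hits_per_day: float) -> float:
    """Average hits/sec, assuming the load is spread evenly over 24 hours."""
    return hits_per_day / SECONDS_PER_DAY

def data_rate_mb(hits_per_sec: float, avg_hit_kb: float) -> float:
    """Aggregate outbound data rate in MB/sec (1 MB taken as 1,000 KB here)."""
    return hits_per_sec * avg_hit_kb / 1000

# Figures quoted for home.netscape.com
sustained = sustained_rate(120_000_000)  # roughly 1,389 hits/sec
peak = 2 * sustained                     # peaks observed at about double
print(f"sustained {sustained:.0f} hits/sec, peak {peak:.0f} hits/sec")
print(f"outbound data {data_rate_mb(sustained, 5):.1f} MB/sec at 5 KB/hit")
```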
[1] Vanity domains are Web site aliases that map to the same server or server complex. For example, a company with several well-known brand names might have a Web site for each brand, even though a single server or server complex may handle requests destined for all these sites.
More typical Internet presence sites for Fortune 500 companies sustain 500,000 to 4,000,000 hits per day, commonly spread across a number of vanity domain names.[1] A sustained load of a million hits per day corresponds to approximately 12 hits/sec, with peak rates of about three times that (lower load levels are somewhat more variable than very high loads such as Netscape's). Compared with Netscape, most Fortune 500 systems see a far higher proportion of script execution, primarily on behalf of search engines, forms processing, and e-mail requests. Smaller companies typically see correspondingly smaller hit rates, and home pages for individuals are visited even less often: busy pages in this category see 100 to 1,000 hits per day, which is negligible. However, sites like these are customarily hosted by an ISP that offers Web hosting for a flat fee, and such servers often host hundreds or thousands of "vanity" Web sites on a single system. Even so, the resulting loads are mundane: a server hosting 10,000 sites, each averaging 1,000 hits per day, sees an aggregate sustained load of only about 115 hits/sec.
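The same conversion extends to many small sites sharing one server. This sketch assumes every hosted site contributes its daily hits evenly over 24 hours, and it applies the rough three-times peak factor quoted for lower-volume sites; both the factor and the function name are assumptions to replace with measured data.

```python
SECONDS_PER_DAY = 86_400

def aggregate_load(sites: int, hits_per_site_per_day: float,
                   peak_factor: float = 3.0) -> tuple[float, float]:
    """Return (sustained, peak) hits/sec for many sites hosted on one server.

    peak_factor=3.0 follows the rule of thumb quoted for lower-volume sites;
    treat it as an assumption, not a measured value.
    """
    sustained = sites * hits_per_site_per_day / SECONDS_PER_DAY
    return sustained, sustained * peak_factor

for label, sites, hits in [("one Fortune 500 site", 1, 1_000_000),
                           ("ISP hosting 10,000 small sites", 10_000, 1_000)]:
    sustained, peak = aggregate_load(sites, hits)
    print(f"{label}: {sustained:.1f} sustained, {peak:.1f} peak hits/sec")
```

Both cases work out to well under a few hundred hits/sec at peak, which is the point of the paragraph above.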
The typical intranet site carries even less load than external Web sites. These systems typically sustain one to 10 hits per minute, even when serving large populations. Within Sun there are several sites servicing 3,000 to 5,000 users, all sustaining less than 10 hits per minute. This sounds incredibly low, but consider that 10 hits per minute from a population of 5,000 users means that about one out of every eight users uses the Web site every hour. A rate of ten hits per minute from a 1000-user population means that more than half of all users make a request every hour. Unless the user's job is to use the Web site, it's unlikely that requests are even this frequent.
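The "one user in eight" reasoning can be checked directly. This minimal sketch assumes, as the paragraph implicitly does, that each request in an hour comes from a different user; the function name is hypothetical.

```python
def fraction_of_users_per_hour(hits_per_minute: float, population: int) -> float:
    """Fraction of the user population making a request in a given hour,
    assuming (as a simplification) one request per user."""
    return hits_per_minute * 60 / population

print(f"{fraction_of_users_per_hour(10, 5000):.2f}")  # 0.12, about 1 user in 8
print(f"{fraction_of_users_per_hour(10, 1000):.2f}")  # 0.60, more than half
```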
Note that while we have discussed macroscopic, long-term average loads, actual usage looks quite different on a microscopic basis: it is much more bursty. Rather than one user in eight submitting a single request each hour, a much more likely situation is one user in twenty submitting 15 requests in a minute and a half, followed by complete silence for a significant period. Requests to size Web servers are often expressed as "for N users," where N is really the entire population of users who might reasonably submit a request to the server. As this shows, such a specification isn't very useful, because Web usage tends to be much less evenly distributed in time than most traditional services.
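One way to see why burstiness matters is to compare peak-to-mean ratios for the same hourly volume. This is a hypothetical model, not measured data: 600 requests per hour (10 per minute) arrive either evenly or as 40 bursts of 15 requests, and for simplicity each burst is assumed to land within a single, randomly chosen minute rather than the minute and a half described above.

```python
import random
from collections import Counter

def per_minute_counts(bursts: int, requests_per_burst: int,
                      minutes: int = 60, seed: int = 1) -> list[int]:
    """Hypothetical model: each burst (one user's session) lands entirely
    within a single randomly chosen minute of the hour."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(minutes) for _ in range(bursts))
    return [counts[m] * requests_per_burst for m in range(minutes)]

# 600 requests/hour either way: 10 per minute evenly, or 40 bursts of 15.
smooth = [10] * 60
bursty = per_minute_counts(bursts=40, requests_per_burst=15)
for name, counts in [("smooth", smooth), ("bursty", bursty)]:
    mean = sum(counts) / len(counts)
    print(f"{name}: mean {mean:.0f}/min, peak {max(counts)}/min, "
          f"peak-to-mean {max(counts) / mean:.1f}")
```

The mean is identical in both cases; only the bursty pattern produces minutes with several times the average load, which is what a server must actually absorb.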
The net effect of all this is that usage is lower than one might expect. While mid-range SPECweb96 scores are averaging more than 2,000 hits/sec and high-end scores are approaching 4,000 hits/sec, typical heavy-use Internet servers are sustaining peak loads of less than 20 percent of these ranges, and intranet servers are even lower. Of course, Web requests can make very different demands on their servers, and we'll return to this issue.
The next issue to be considered is the amount of network bandwidth consumed by each request. As far as is known, the average HTTP GET operation retrieves about 13 KB. This figure balances much smaller text retrievals against typical image retrievals, which can often reach 30 to 90 KB. Think back to the Web pages you've surfed recently: most images are fairly small, because even at the raw line rate a 90-KB image takes about 25 seconds to download over a 28.8-Kbit/sec modem link, and in practice it usually takes longer. Most servers also have quite limited network bandwidth. For example, most intranet servers have a single 10BaseT network connection, which means they can sustain at most about 1 MB/sec of throughput. Such servers are thus limited to about 1 MB / 13 KB = 77 operations per second per 10BaseT network. A pair of 100BaseT networks can handle up to about 1,500 ops/sec.
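The bandwidth ceiling and the modem download time both follow from dividing a rate by a size. This sketch assumes 1 MB = 1,000 KB, a 13-KB average operation, and, for the modem, the raw line rate with no protocol overhead, so its result is a best case; the function names are hypothetical.

```python
def max_ops_per_sec(link_mb_per_sec: float, avg_op_kb: float = 13.0) -> float:
    """Upper bound on GET operations/sec when the network link is the bottleneck."""
    return link_mb_per_sec * 1000 / avg_op_kb

def download_seconds(size_kb: float, modem_kbit_per_sec: float = 28.8) -> float:
    """Best-case transfer time at the raw modem line rate (no protocol overhead)."""
    return size_kb * 8 / modem_kbit_per_sec

print(f"one 10BaseT (~1 MB/sec):   {max_ops_per_sec(1):.0f} ops/sec")
print(f"two 100BaseT (~20 MB/sec): {max_ops_per_sec(20):.0f} ops/sec")
print(f"90-KB image on 28.8 modem: {download_seconds(90):.0f} sec minimum")
```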
Proxy servers make even greater demands on network infrastructure, since they are both servers and clients. A proxy server interposes itself between clients and the servers that own the actual content, usually also providing a caching function. For requests to pages in the proxy's cache, network demand is the same as for a standard server. But for requests that are not in the cache, the proxy must act as a client by requesting the page from the real server, then act as a server by sending the (now cached) page to the client that made the original request. For these requests, network demand is double that of a non-proxy server. The proportion of requests a proxy can satisfy from its cache depends on the diversity of page requests and on the size of the cache; generally speaking, caching proxy servers satisfy only about 20 percent of requests from their caches. (As with most other aspects of Internet usage, only limited usage data is available.)
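The net effect of the cache hit ratio on network load can be written as a weighted average: hits cost one transfer and misses cost two. A minimal sketch, with a hypothetical function name:

```python
def proxy_traffic_factor(cache_hit_ratio: float) -> float:
    """Network bytes moved per request, relative to an ordinary server.

    A cache hit is served once (factor 1); a miss is fetched from the
    origin server and then sent on to the client (factor 2).
    """
    return cache_hit_ratio * 1 + (1 - cache_hit_ratio) * 2

print(f"{proxy_traffic_factor(0.20):.1f}")  # 1.8 at a 20 percent hit ratio
```

At a 20 percent hit ratio, a proxy on a single 10BaseT connection could therefore expect to handle only about 77 / 1.8, or roughly 43, requests per second before the network saturates.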
Servers dedicated to Internet service are even more restricted. Most Internet servers connect to the outside world using T1 lines at 1.5 Mbits/sec, and larger sites use fractional T3 links (a full T3 runs at 45 Mbits/sec). These may sound pretty quick, but even a full T3 line yields only about 5 MB/sec. Running at 100 percent utilization (not recommended!), this permits about 400 ops/sec.
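For WAN links quoted in bits per second, the conversion needs a divide-by-eight step. This sketch uses the raw line rate with no framing overhead, so its full-T3 figure lands a little above the rounded 400 ops/sec given above; the 60 percent utilization ceiling is a hypothetical planning target, not a figure from this article.

```python
def wan_ops_per_sec(link_mbit_per_sec: float, utilization: float = 1.0,
                    avg_op_kb: float = 13.0) -> float:
    """GET operations/sec a WAN link can carry at the given utilization."""
    kb_per_sec = link_mbit_per_sec * 1000 / 8 * utilization  # Mbit/s -> KB/s
    return kb_per_sec / avg_op_kb

for name, mbits in [("T1", 1.5), ("full T3", 45.0)]:
    print(f"{name}: {wan_ops_per_sec(mbits):.0f} ops/sec at 100%, "
          f"{wan_ops_per_sec(mbits, utilization=0.6):.0f} at 60%")
```

A T1 line, in other words, saturates at roughly 14 average-sized requests per second.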
Even servers at Internet service providers (ISPs) are often bandwidth limited. Most ISPs have shared connections to the Internet backbone running at OC-3 (155 Mbits/sec) or occasionally OC-12 (622 Mbits/sec) speeds. Although ATM and many common 100BaseT media are capable of full-duplex operation, this doesn't help a Web server, since the data flow is almost entirely unidirectional, from server to client. When planning a Web server, there's no point in configuring it with capability much greater than the available network bandwidth, especially after considering the type of content it will provide.
The third major consideration is the type of operations that will be requested of the server. The most obvious requests are for the transmission of images and blocks of text. This data appears on nearly every site's first page and forms the basis of virtually every page. This type of page is usually called "static HTML."
When using HTTP 1.0, the most commonly used version of the protocol today, the client must create a connection (a socket) to the server for every request. The server sends the requested data over the socket, and the connection is then closed. Unfortunately, this isn't a very efficient way to do business, because creating and destroying a socket is much more expensive than actually retrieving and transmitting the data. In fact, on most operating systems, managing the socket costs more than twice as much as sending the data! For Solaris 2.6, socket management is about three times as expensive as retrieving a 13-KB page from a file and sending it over a TCP/IP connection. The newer HTTP 1.1 protocol addresses this issue with persistent connections (often called keep-alive): the client and server keep a single connection open for a number of requests, drastically reducing the CPU cost of servicing static HTML requests. Unfortunately, the most common Web servers don't implement this protocol yet, so the world runs on HTTP 1.0.
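The benefit of keep-alive comes from amortizing the fixed socket cost over several requests. This is a simple relative-cost model, assuming the 3:1 Solaris 2.6 ratio quoted above applies and that nothing else changes; the constant and function names are hypothetical.

```python
# Relative CPU costs; the 3:1 ratio is the Solaris 2.6 figure quoted above,
# normalized so that retrieving and sending one 13-KB page costs 1 unit.
SOCKET_SETUP_TEARDOWN = 3.0
SEND_PAGE = 1.0

def cpu_cost_per_request(requests_per_connection: int) -> float:
    """Average CPU cost per request when one connection carries n requests."""
    return SOCKET_SETUP_TEARDOWN / requests_per_connection + SEND_PAGE

for n in (1, 2, 5, 10):
    print(f"{n:2d} requests/connection: {cpu_cost_per_request(n):.2f} units/request")
```

Under this model, HTTP 1.0 (one request per connection) costs 4 units per request, while ten requests per connection cost 1.3 units each, roughly a threefold reduction.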
SPECweb96 and most other common Web benchmarks (WebBench, WebStone, etc.) test only static HTML. As a result, what they measure to a first order is how efficiently the server's operating system creates and destroys sockets; data retrieval and transmission are very much secondary. Before considering how to interpret benchmark results, let's continue our look at content types.
Another type of content that is often referred to as "static" is an HTML form. The term "static" is probably a misnomer in this context. The form itself is certainly static, but the data that is being manipulated is just as certainly not static. To handle the form input, the Web server must invoke a program, either through the server's application programming interface (API) or more commonly through the specification of a script that handles the data.
Thus the Web form is an instance of a much larger class of Web request that might best be called "executable services." Most often these requests are serviced by shell scripts or batch files, so the Web server must create a new heavyweight process for each request. For current operating systems such as Solaris or NT, process creation is far more costly than even creating or destroying a network connection, sometimes by an order of magnitude. The first generation of Web servers forked a process to handle every request, with the result that even well-configured servers could handle only 20 to 30 requests/sec.
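To make the cost concrete, here is a minimal sketch of the CGI pattern: for every request, the server starts a brand-new process (here, a new Python interpreter) and passes the form data in the QUERY_STRING environment variable, as CGI specifies. The form-handling script and the `run_cgi` helper are hypothetical stand-ins for a real site's scripts and server.

```python
import os
import subprocess
import sys

# Hypothetical form-handling script; a real site would supply its own.
CGI_SCRIPT = r'''
import os
from urllib.parse import parse_qs
form = parse_qs(os.environ.get("QUERY_STRING", ""))
name = form.get("name", ["anonymous"])[0]
print("Content-Type: text/plain")
print()
print("Hello, " + name)
'''

def run_cgi(query_string: str) -> str:
    """Service one request the CGI way: create a new process for it."""
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query_string)
    result = subprocess.run(
        [sys.executable, "-c", CGI_SCRIPT],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout

print(run_cgi("name=Brian"))
```

Every call pays for process creation and interpreter start-up before the script does any useful work, which is exactly the overhead that limited first-generation servers.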
To add insult to injury, for interpreted scripts such as shell scripts or Perl programs, executing the script can be even more expensive due to the CPU-intensive nature of interpreted languages.
Finally, the program that is to service the request must run, and this also normally consumes significant CPU time.
Examples include the familiar search engine request and service-oriented requests such as the route-generation and map-producing programs found at http://www.mapquest.com. Each of these requests invokes an external process to do the search or generate the map. For searches that cover a large amount of data, the service process can consume a very large amount of resources, usually by far the majority of the server's total. For example, one server at Sun provides search access to a large body of technical material and receives about 30 hits per minute during peak usage periods.
Judging from SPECweb96 results, this machine should be loafing along at less than one percent utilization, yet measurements show that it consumes two or three 85-MHz SuperSPARC processors during peak periods. The reason for the enormous disparity is that virtually every request is a search operation that consumes five to eight CPU seconds! Besides search engines, common examples of service processes include ad-rotation programs, database query programs, and FTP servers.
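The measured processor count is roughly what the utilization law (busy resources = throughput x service demand) predicts. This sketch assumes the full peak rate of 30 hits per minute and treats every hit as a search; the function name is hypothetical.

```python
def cpus_busy(hits_per_minute: float, cpu_seconds_per_hit: float) -> float:
    """Average number of CPUs kept busy (utilization law: U = X * S)."""
    return hits_per_minute / 60 * cpu_seconds_per_hit

for s in (5, 8):
    print(f"{s} CPU-sec/search at 30 hits/min: {cpus_busy(30, s):.1f} CPUs busy")
```

The resulting 2.5 to 4 CPUs is of the same order as the two or three processors observed, and far from the near-idle machine that a static-HTML benchmark would suggest.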
This situation is extremely common, if not ubiquitous. Most Web servers spend about five percent of their resources running the Web server itself and 95 percent executing CGI scripts or other service processes. Sizing such a system therefore becomes mostly an exercise in sizing for the service processes, plus a small amount of overhead for the Web server itself.
One other issue that is rarely considered is the impact of encryption. When using the Secure Sockets Layer (SSL), some users have observed CPU overhead for sending and receiving encrypted requests as high as 100 to 200 milliseconds per request, easily overwhelming any other processing. This overhead varies widely with implementation, key length, and other factors, but it is always expensive. Longer keys make brute-force attacks much harder, but they also take much more computing resource to process. Unfortunately, this means that legitimate users pay a substantial cost for security.
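The SSL figures translate directly into a per-processor ceiling. This sketch assumes the CPU does nothing but encryption work, so the results are upper bounds; the function name is hypothetical.

```python
def max_requests_per_cpu(cpu_ms_per_request: float) -> float:
    """Requests/sec one CPU could sustain if it did nothing but this work."""
    return 1000.0 / cpu_ms_per_request

for ssl_ms in (100, 200):
    print(f"SSL at {ssl_ms} ms/request: at most "
          f"{max_requests_per_cpu(ssl_ms):.0f} requests/sec per CPU")
```

At 5 to 10 encrypted requests per second per processor, even a modest secure site may need several CPUs before any CGI work is counted.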
When sizing a Web server, remember that the common Web benchmarks don't normally test service processes or the overhead needed to invoke them. SPECweb96 and WebBench don't address this issue at all; WebStone has some facilities for testing it, but they are rarely (if ever) used.
Planning Web server capacity involves several non-traditional considerations. The user population can't be judged in quite the same way as for other services, and network bandwidth is commonly the most significant hardware bottleneck. Finally, the common benchmarks exercise only a small fraction of the services performed by most real-world Web servers. By keeping these considerations firmly in mind, it's possible to estimate Web server load and configuration accurately.
About the author
Brian L. Wong is Chief Scientist for the Enterprise Engineering group at Sun Microsystems. He has been a data processing consultant and software architect and has held other positions at Sun. Reach Brian at firstname.lastname@example.org.