What browser are you designing for?
Tuning your server's content for a particular
Your visitors are better served if your content matches their browser's capabilities. Learn how to read your server's agent log and discover which browsers are the most popular. After reading this, you should go to "Log analysis redux", a clarification to this column. (2,000 words)
If you manage a server for any length of time and interact with your content providers, sooner or later you will encounter...
The Great Browser Debate
This debate centers on the design of the documents on your server. In one camp, we have those who want your documents to be usable by as many visitors as possible. Their arguments run something like this:
Why go to all the trouble to create a Web site only to make it difficult for people to read our documents? At best, they'll find a different browser and come back later; at worst, we'll forever lose them as a repeat visitor.
HTML is a standard of sorts. We should stick to the standard HTML constructs and make our pages work on lots of different browsers. Have you seen the number of browsers out there? Even using one nonstandard feature might alienate a huge percentage of our target market. Is something as esoteric as a scrolling marquee or frames vital to our pages? If you get right down to it, we could probably even get by without tables, too!
In the other camp, you have this argument:
Basic HTML is so restrictive as to be useless for all but the most trivial documents. Everyone else is using these extensions; we need to keep up or look so out of date that people will stop visiting. Our marketing folks want better control of the layout and appearance of our pages, and you can only do that with these new features.
Admittedly, we'll lose a few visitors, or our pages will break on some browsers. But overall, we'll look good and deliver our message. The Web is about flash and grabbing attention; you can't do that with plain old lists and limited font control.
At some point they'll all turn to you, the wise server administrator, and ask the $64,000 question: Which browsers are the most popular? Which ones are our visitors running? And that's when you smile, log in to your server, and examine your...
Server agent log
Faithful readers recall that last month, we picked apart our server's access log in minute detail. From that log, we learned who is visiting our site, when they visited, and what they looked at. What we didn't learn was the browser they used while visiting. That's because that information is not kept in the access log; it lives elsewhere.
For all NCSA-derived Web servers, including the Netscape servers, a common directory named logs is usually defined. This directory contains, what else, all your log files. There you'll find your access_log file and a file named agent_log. As the name implies, this file records the name of the agent (or browser) used for each access made to your site. Just as the server daemon writes a line to the access log for each access, it also writes a line to the agent log.
When a browser connects to your server, it passes along a single line of text that describes the browser. There is no standard format to this string, although convention dictates that you include the browser name, a slash, and the browser version. Beyond that, all bets are off as to what you'll find in this file.
For example, here is a fairly straightforward (and common) entry in any agent log:
Mozilla/1.1 (Windows; I; 32bit)

This indicates an access by a user running the 32-bit Windows version of Netscape 1.1. (The folks at Netscape like to call their browser "Mozilla".)
It gets worse. Each version of Moz--err--Netscape includes the window system, version, and operating system version in it. All of these lines are also accesses by a user of Netscape 1.1:
Mozilla/1.1N (X11; I; SunOS 4.1.3_U1 sun4m)
Mozilla/1.1N (X11; I; AIX 2)
Mozilla/1.1N (Macintosh; I; PPC)
Mozilla/1.1N (Macintosh; I; 68K)
Mozilla/1.1N (X11; I; IRIX 5.2 IP22)
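To make the "name, slash, version" convention concrete, here is a minimal Python sketch that splits an agent string into its name, version, and optional parenthesized platform comment. It assumes the line starts with a name/version token; since there is no standard format, anything that doesn't fit simply comes back as None. The names parse_agent and Agent are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed convention: "Name/Version", optionally followed by "(platform; details)".
AGENT_RE = re.compile(
    r"^\s*(?P<name>[^/\s]+)/(?P<version>\S+)\s*(?:\((?P<platform>[^)]*)\))?"
)

class Agent(NamedTuple):
    name: str
    version: str
    platform: Optional[str]

def parse_agent(line: str) -> Optional[Agent]:
    """Split one agent-log line into name, version, and platform comment.

    Returns None when the line does not follow the name/version convention.
    """
    m = AGENT_RE.match(line)
    if m is None:
        return None
    return Agent(m.group("name"), m.group("version"), m.group("platform"))

if __name__ == "__main__":
    for entry in [
        "Mozilla/1.1 (Windows; I; 32bit)",
        "Mozilla/1.1N (X11; I; SunOS 4.1.3_U1 sun4m)",
        "Mozilla/1.1N (Macintosh; I; PPC)",
    ]:
        print(parse_agent(entry))
```

Note that the version is kept as a string ("1.1N"), because the suffixes carry meaning and are not numbers.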
To make things more confusing, if a request happens to pass through a proxy server, that proxy server often tacks its identification onto the end of the version string. Thus, here are some more Netscape 1.1 accesses:
Mozilla/1.1 (Windows; I; 32bit) via proxy gateway CERN-HTTPD/3.0 libwww/2.17
Mozilla/1.1 (Windows; I; 32bit) via proxy gateway CERN-HTTPD/3.0 libwww/2.17 via proxy gateway CERN-HTTPD/3.0 libwww/2.17

That last one is really neat: it went through two proxies before it hit your machine.
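Both proxied entries above share the phrase "via proxy gateway", so one way to recover the browser's own string is to cut the line at the first occurrence of that phrase and count how many times it appears. This is a minimal Python sketch that assumes every proxy in your log uses exactly that wording; other proxies may annotate the string differently, so check your own log first. The name strip_proxies is hypothetical.

```python
from typing import Tuple

# Assumed wording, taken from the CERN proxy entries shown above; other proxies may differ.
PROXY_MARKER = " via proxy gateway "

def strip_proxies(line: str) -> Tuple[str, int]:
    """Return (browser agent string, number of proxy hops) for one log line."""
    line = line.strip()
    hops = line.count(PROXY_MARKER)
    # Everything after the first marker was appended by proxies along the way.
    browser = line.split(PROXY_MARKER, 1)[0]
    return browser, hops

if __name__ == "__main__":
    entry = ("Mozilla/1.1 (Windows; I; 32bit) via proxy gateway CERN-HTTPD/3.0 libwww/2.17"
             " via proxy gateway CERN-HTTPD/3.0 libwww/2.17")
    print(strip_proxies(entry))  # prints ('Mozilla/1.1 (Windows; I; 32bit)', 2)
```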
The agent names for Netscape are pretty short and sweet. Some of the others get really long; rummage around in your agent log if you want to see for yourself. The bottom line: making sense of this file and boiling down the information so that it is usable by your content providers will require a little bit of work. This is a job for a...
Nerdy unix hacker
If you aren't one, you'll need to find one. If you can't find one, you can use me; I don't mind.
In Unix, where the command line is still king, it takes a fairly simple sed script and a single pipeline of sed, sort, and uniq to canonicalize the entries in your agent log, sort them, and count how many times each one occurred. From this raw data, you can determine which browsers are being used to visit your site.
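The sed script itself isn't reproduced in this column, so the following is only a rough Python stand-in for the same canonicalize, sort, and count idea (what a sed | sort | uniq -c pipeline does), plus the conversion of each count to a percentage of total traffic. It assumes "canonicalize" means keeping just the browser name and major version, so platform and sub-version details disappear; the names canonicalize and top_browsers are hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed canonical form: name plus major version,
# e.g. "Mozilla/1.1N (X11; ...)" -> "Mozilla/1". Adjust to the categories you want.
CANON_RE = re.compile(r"^\s*([^/\s]+)/(\d+)")

def canonicalize(line: str) -> str:
    """Reduce an agent string to name/major-version, or 'other' if it doesn't fit."""
    m = CANON_RE.match(line)
    return f"{m.group(1)}/{m.group(2)}" if m else "other"

def top_browsers(lines: Iterable[str], n: int = 10) -> List[Tuple[str, int, float]]:
    """Count canonical agents and return the top n as (agent, count, percent of total)."""
    counts = Counter(canonicalize(line) for line in lines if line.strip())
    total = sum(counts.values())
    # most_common() is empty when there are no lines, so total is never 0 below.
    return [(agent, count, 100.0 * count / total) for agent, count in counts.most_common(n)]

if __name__ == "__main__":
    # Usage (hypothetical file name): python top_browsers.py < logs/agent_log
    for agent, count, pct in top_browsers(sys.stdin):
        print(f"{pct:6.2f}%  {count:8d}  {agent}")
```

Because proxy annotations are appended at the end of the line, matching only at the start of the string sidesteps them without any extra work.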
On my server, my agent log has grown to about 2.6 million entries, covering accesses back through last September or so. I chopped the file into approximately month-long sections, processed it using my script, and pulled out the top ten browsers used on my site from September, 1995, through the end of February, 1996. The results, with browser accesses converted to a percentage of the total traffic:
Well, you don't need a degree in either statistics or marketing to interpret this data. The only browser worth tuning your documents for is Netscape. Other players have, at best, a small penetration into this market.
Most fascinating is the rapid growth of Netscape 2.0. Even though it was only in the beta-test phase, it was responsible for a third of the traffic by January. The gradual shift from Netscape 1.x to Netscape 2.0 is obvious. By the time you read this, Netscape 2.0 will be responsible for the majority of access traffic to my site.
Since the remaining browsers are difficult to see in the whole graph, here is the top portion expanded for easier viewing:
The decline of Internet Explorer is interesting. It peaks in October, riding the tail-end of the Windows 95 marketing blitz. With the general availability of a stable Netscape 2.0 beta by November, Internet Explorer begins to fall off the chart. By February, the character-based Lynx browser is giving Internet Explorer a run for its money.
While all this data is certainly interesting, I'd be remiss if I didn't offer...
A few caveats
First and foremost, I stripped all platform and sub-version information from my log to consolidate this chart into very broad browser categories. You might want to split Unix from Windows accesses, for example, to see if your customers are primarily ahead of or behind the technology curve. Many browsers have several point releases with significantly different features; you could check for those point releases as well.
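Splitting Unix from Windows accesses comes down to classifying the platform comment in each agent string. Here is a minimal Python sketch under one loud assumption: the keyword table is hypothetical, built from the Netscape strings shown earlier plus a few guesses, and other browsers label their platforms differently, so extend it for your own log. The names platform_of and platform_counts are also hypothetical.

```python
import sys
from collections import Counter
from typing import Iterable

# Hypothetical keyword table; the first group whose keyword appears in the
# agent string wins. "X11" is treated as Unix, which fits the entries above.
PLATFORM_KEYWORDS = [
    ("Unix", ("X11", "SunOS", "IRIX", "AIX", "Linux")),
    ("Windows", ("Windows", "Win95", "WinNT")),
    ("Macintosh", ("Macintosh",)),
]

def platform_of(agent: str) -> str:
    """Classify one agent string by platform, or 'unknown' if no keyword matches."""
    for platform, keywords in PLATFORM_KEYWORDS:
        if any(keyword in agent for keyword in keywords):
            return platform
    return "unknown"

def platform_counts(lines: Iterable[str]) -> Counter:
    """Tally agent-log lines by platform so Unix and Windows traffic can be compared."""
    return Counter(platform_of(line) for line in lines if line.strip())

if __name__ == "__main__":
    # Usage (hypothetical file name): python platforms.py < logs/agent_log
    for platform, count in platform_counts(sys.stdin).most_common():
        print(f"{count:8d}  {platform}")
```

Plain substring matching is crude, but it is easy to audit: anything that lands in "unknown" tells you which keywords to add next.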
More importantly, these figures are for my server only. Melmac is used primarily by people seeking advice on creating HTML documents; these people probably tend to gravitate to newer versions of the latest browsers. In short, your content attracts a certain kind of visitor who may be predisposed to a certain kind of browser. While I should certainly tune my documents to Netscape 2.0, you may find a completely different answer. Don't use my data to manage your server; take a moment to examine your logs and get the right answer for your documents.
Finally, this data changes over time. I had initially done a lump-sum assessment of my log, but realized that the introduction of Netscape 2.0 probably made things change a lot in the past few months. Make sure you periodically close out the current agent log and have your server start a new one. That way, you can process blocks of data from any given point in time, or combine the archived logs to create an aggregate view of your server. I'd recommend resetting your logs at least once a month.
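Combining archived logs into an aggregate view is mostly a matter of adding up counts. Here is a minimal Python sketch, assuming each month's log was saved under a name like logs/agent_log.1996-01 (a hypothetical naming scheme; use whatever your rotation produces). It counts raw agent strings per archive and sums them into one overall tally. The names read_log and aggregate are hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, Iterator, Mapping, Tuple

def read_log(path: Path) -> Iterator[str]:
    """Yield the lines of one archived agent log (latin-1 decodes any byte)."""
    with path.open(encoding="latin-1") as f:
        yield from f

def aggregate(logs: Mapping[str, Iterable[str]]) -> Tuple[Dict[str, Counter], Counter]:
    """Count agent strings per archived log, then sum them into one overall view."""
    per_log: Dict[str, Counter] = {}
    overall: Counter = Counter()
    for name, lines in logs.items():
        counts = Counter(line.strip() for line in lines if line.strip())
        per_log[name] = counts
        overall.update(counts)
    return per_log, overall

if __name__ == "__main__":
    # Hypothetical archive names such as logs/agent_log.1996-01; adjust to your scheme.
    archives = sorted(Path("logs").glob("agent_log.*"))
    per_month, overall = aggregate({p.name: read_log(p) for p in archives})
    for agent, count in overall.most_common(10):
        print(f"{count:8d}  {agent}")
```

Keeping the per-archive counts alongside the total lets you chart month-by-month trends and the lump sum from a single pass.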
That, in a nutshell, is agent log interpretation. Before I close, though, let me offer a last few...
Bits and pieces
If you do take the time to process your agent log, share it with me. If I get enough logs, I'll offer up the aggregate results in a future column. For the best comparison, split your data into monthly chunks, as I have done.
Last month, I promised to cover both agent logs and referer logs. I may not be out of space, but I'm out of time, so next month we'll look at the last log your server keeps, the referer log.
Finally, a blatant plug: On April 1, my book, HTML: The Definitive Guide, will be available from O'Reilly and Associates. While I cover server stuff in this column, my book covers the complete world of HTML, including the 2.0 standard and all Netscape and Internet Explorer extensions. The book is far from a fluffy "getting started" overview: my co-author, Bill Kennedy, and I have provided in-depth details on every HTML tag, including a few undocumented Netscape features. I'd suggest buying at least three or four copies, for the home, office, and car. They also make great gifts for all your content providers!
About the author
Chuck Musciano has been running Melmac for two years, serving up HTML tips and tricks to hundreds of thousands of visitors each month. He's been a beta-tester and contributor to the NCSA httpd project and speaks regularly on the Internet, World Wide Web, and related topics. His book, HTML: The Definitive Guide, has just been published by O'Reilly and Associates.