Click on our Sponsors to help Support SunWorld

Inject compelling and up-to-the-minute data into your Web site with server-side includes -- Part 2

2 quick commands add dynamic content to your site and satisfy users' needs

December 1996

Abstract

Get to the real heart of server-side includes. These executable programs generate your documents and turn them into living resources on the Web. (1,950 words)


Last month, we started an investigation into the mysteries of server-side includes -- text inserted into a document by the server when the client makes a request. Server-side includes make static documents dynamic and compelling, injecting up-to-the-minute data into a page to satisfy a user's needs.

Before diving into the second half of our exploration, here's what we know so far:

Server-side includes are enabled by adding the includes parameter to the options directive within the <directory> tag in your server's access.conf file.
Includes are inserted into your documents using a special form of the HTML comment that begins with <!--# and ends with -->.
Immediately after the <!--# sequence, you can use one of six commands to place text into your document. So far, we've covered:
- fsize to include the size of a file
- flastmod to include the last modification date of a file
- echo to insert the value of any of several environment variables defined by your server
- include to copy the contents of a file into your document

If any of this seems vague, take a moment to reread the previous column.

This month, we'll cover the remaining two commands: config and exec. We'll conclude with some applications of executable server-side includes and leave the door open for more experimentation on your part.

Customized display formats
The fsize and flastmod commands display the size and last modification date of a file in a standard format. An include like this

     Size is <!--#fsize file="document.html" -->.
     Modified on <!--#flastmod virtual="/somedir/other.html" -->.

results in this text being delivered to the client:

Size is 1K. Modified on Sunday, 29-Sep-96 21:50:58 EDT.

You can control how the size and time are presented with the config command. This command accepts three parameters:

errmsg: The value of the errmsg parameter is a string to be passed to the client if some error occurs during include processing. This string is also written to your server's error log. Since includes don't normally fail, this parameter is usually most helpful when debugging your includes.
timefmt: This parameter defines the format used to display times and dates. It uses the same format encoding as the Unix strftime() library function. In this format, a percent sign followed by a special character inserts an element of the date and time into the text, and all other characters are passed through unchanged. For example, "%m/%d/%y" displays the date as something like "12/01/96." To see all the possibilities available on your system, check your system's man page for strftime().
sizefmt: This parameter can be given one of two values. Using bytes displays sizes as an exact byte count, while abbrev displays sizes as kilobytes (suffixed with "K") or megabytes (suffixed with "M").

It's a good idea to put the config command at the beginning of your documents, so that all subsequent includes will use the correct formats. A complete command might look like this:

     <!--#config sizefmt="bytes" timefmt="%H:%M, %m/%d/%y" errmsg="Oops!" -->

The time and size formats affect the results of the fsize and flastmod commands, along with any size- or time-related variables used with the echo command.
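
Because timefmt uses strftime()-style encoding, a format string can be tried out from a shell prompt before it goes into a document. The sketch below leans on the date command, which accepts the same percent-sign conversions after a leading plus sign. The function name preview_timefmt is my own invention for illustration, and the exact output depends on your system's clock and locale.

```shell
#!/bin/sh
# Preview a timefmt string with date(1), which accepts the same
# strftime()-style conversions after a leading "+".
# preview_timefmt is a hypothetical helper name, not part of any server.
preview_timefmt() {
    date +"$1"
}

# The format from the config example above.
preview_timefmt "%H:%M, %m/%d/%y"

# Something close to the server's default long format.
preview_timefmt "%A, %d-%b-%y %T %Z"
```

If the result looks right at the prompt, the same string should behave the same way inside a config include.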

Enabling executable includes
Finally, we get to the meat of the matter -- executing commands to generate your included text. If you enabled server-side includes by adding the includes parameter to the options directive in the <directory> tag in your access.conf file, you are ready to use executable server-side includes. If you used the includesNOEXEC option, you'll need to replace it with includes before continuing.

Why might you enable server-side includes but not executable includes by using includesNOEXEC? Security, of course. When programs referenced by the include are run, they run with the user-id of the http server. On a properly configured system, this is the user nobody, who has no special system privileges. On many systems, though, the server runs as root, which has unlimited access to the system. If a Web author installs a potentially damaging program and invokes it via a server-side include that runs as root, all sorts of damage could result. Before enabling executable includes, make sure your http server (or more correctly, the processes in your child server pool) is running as nobody.
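
One way to check who owns the server processes is to summarize the output of ps. This is only a sketch: the helper name owners_of is made up for this example, it assumes the server binary shows up in ps under the name httpd, and some systems report the command name differently.

```shell
#!/bin/sh
# Summarize which users own processes with a given command name.
# Reads "user command" pairs on standard input and prints one
# "user count" line per owner, sorted by user name.
# owners_of is a hypothetical helper name used only for illustration.
owners_of() {
    awk -v name="$1" '$2 == name { n[$1]++ }
        END { for (u in n) print u, n[u] }' | sort
}

# Typical use (assumes the server binary is named httpd):
#   ps -e -o user= -o comm= | owners_of httpd
```

What matters is the owner reported for the processes in the child server pool. If those show up under root rather than nobody, fix that before enabling executable includes.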

Once you've got things configured and enabled, you can add an executable include to your documents with the exec command. This command accepts one of two parameters:

cmd: The value of the parameter is a conventional Unix command line that is passed to /bin/sh for execution. Any output from the command is placed into the document, replacing the server-side include.
cgi: The value of the parameter is a virtual path rooted in the server's CGI directory. The referenced program is executed, and its output is placed in the document.

In general, you should use cmd instead of cgi. Most CGI programs output a number of http headers before getting to the actual results, and these headers will be placed in your document. You could place conventional server-side include programs in your CGI directory, of course, but then they are potentially invokable by users of your site. Since programs referenced by cmd do not suffer any of these problems, you'll find it easier to use cmd.
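
If you have an existing CGI program you'd like to reuse through cmd, one possible workaround is to filter off its header block before the output reaches the document. The sketch below assumes the program follows the usual CGI convention of ending its headers with a blank line. The name strip_cgi_headers is hypothetical.

```shell
#!/bin/sh
# Drop everything up to and including the first blank line, which is
# where a CGI program's header block ends. Carriage returns are removed
# first so that a blank line ending in CR LF is recognized too.
# strip_cgi_headers is a hypothetical name used only for illustration.
# Caveat: if the input contains no blank line at all, nothing is printed.
strip_cgi_headers() {
    tr -d '\r' | sed '1,/^$/d'
}

# Typical use inside a small wrapper script referenced by cmd:
#   /path/to/some-cgi-program | strip_cgi_headers
```

This only tidies the output. The program is still reachable by users if it lives in your CGI directory, so a purpose-built include program remains the simpler choice.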


Tools at your disposal
The very first command you need to execute is the env command. There are two reasons. First, if it works, you'll know you've got things working correctly. Second, when it does work, you'll see all the environment variables at your disposal in your real include programs.

On my server, I inserted

     <pre>
     <!--#exec cmd="env" -->
     </pre>

into a text page and loaded it into my browser. The result was

     DATE_GMT=Friday, 01-Nov-96 04:26:19 EST
     DATE_LOCAL=Thursday, 31-Oct-96 23:26:19 EST
     DOCUMENT_NAME=test.html
     DOCUMENT_PATH_INFO=
     DOCUMENT_ROOT=/usr/local/http/docs
     DOCUMENT_URI=/test.html
     HTTP_ACCEPT=image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
     HTTP_CONNECTION=Keep-Alive
     HTTP_HOST=melmac.corp.harris.com
     HTTP_USER_AGENT=Mozilla/3.0 (Win16; I)
     LAST_MODIFIED=Thursday, 31-Oct-96 23:26:07 EST
     PATH=/usr/sbin:/usr/bin
     REMOTE_ADDR=137.237.110.237
     REMOTE_HOST=isdn237.corp.harris.com
     SCRIPT_FILENAME=/usr/local/http/docs/test.html
     SERVER_ADMIN=chuck.musciano@harris.com
     SERVER_NAME=melmac.corp.harris.com
     SERVER_PORT=80
     SERVER_SOFTWARE=Apache/1.0.0
     TZ=US/Eastern

These are the environment variables available to your server-side include programs. For the most part, you'll be using these to generate the custom text you'll be inserting into your documents. For example, the DOCUMENT_ROOT and DOCUMENT_URI variables together provide the complete pathname of the document invoking the include. In my example above, that path would be /usr/local/http/docs/test.html. REMOTE_ADDR provides the IP address of the client machine, and REMOTE_HOST yields that machine's name, if known. HTTP_USER_AGENT tells you the browser being used so you can customize your document accordingly. (As we've covered in previous columns, don't be fooled by a browser identifying itself as "Mozilla." There may be additional text later in the string that tells you it is actually Internet Explorer masquerading as Netscape.)

Of course, you can generate results from an include that have nothing to do with these variables. But in reality, your pages are probably being customized to meet the needs of clients, and knowing who they are, their browser, and the document they requested will help you tailor the output to their needs.
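
To make this concrete, here is a rough sketch of two helpers an include program might use. The function names document_path and browser_family are my own, and checking for the substring MSIE is an assumption about how Internet Explorer labels itself inside its Mozilla-style string, so treat the browser test as a starting point rather than a complete classifier.

```shell
#!/bin/sh
# Full path of the document that invoked the include, built from the
# two variables the server puts in the environment.
# document_path and browser_family are hypothetical helper names.
document_path() {
    printf '%s%s\n' "$DOCUMENT_ROOT" "$DOCUMENT_URI"
}

# Rough browser guess. MSIE is tested first because a browser that
# masquerades as Netscape also begins its string with "Mozilla".
browser_family() {
    case "$HTTP_USER_AGENT" in
        *MSIE*)   echo "Internet Explorer" ;;
        Mozilla*) echo "Netscape" ;;
        *)        echo "other" ;;
    esac
}
```

An include program could call browser_family and use the answer to decide which fragment of HTML to print.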

A simple example: Access counters
A common problem that is easily solved with an include is that of creating an access counter. You see these counts everywhere, boasting the popularity of a given page. Inserting an access count into a document is easy:

     <!--#exec cmd="/usr/local/bin/access_counter" -->

Of course, you have to write the access_counter program, but that's pretty easy as well. Here's the general algorithm:

Derive the document pathname using DOCUMENT_ROOT and DOCUMENT_URI
Look up this path in a database
Increment the counter in the appropriate database record
Output something like "1,024 accesses"

Simple! In fact, I've already written this, and my access counter program is free for the asking if you'd care to try it.

If you write one yourself following this algorithm, you'll soon discover that it doesn't always work, depending on the load on your server. In fact, two crucial steps are missing:

Derive the document pathname using DOCUMENT_ROOT and DOCUMENT_URI
Obtain exclusive access to the count database
Look up this path in a database
Increment the counter in the appropriate database record
Release exclusive access to the count database
Output something like "1,024 accesses"

It's easy to forget when writing an include program that several copies of your program may be running at once. If your server is popular, it is not uncommon for several clients to request the same document at the same time. As a result, you must make sure that any file or database access within your programs is appropriately serialized to avoid race conditions and other ugly bugs.
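
The six steps above can be sketched in a few lines of shell. This is not my access counter program, just a minimal illustration under some loud assumptions: counts live in one small file per document under a directory named by COUNT_DIR (a made-up variable with a made-up default), and exclusive access is obtained by creating a lock directory, since mkdir either creates the directory or fails in a single step.

```shell
#!/bin/sh
# Hypothetical access counter sketch. COUNT_DIR and the one-file-per-page
# layout are assumptions made for this example, not a real database.
COUNT_DIR=${COUNT_DIR:-/tmp/access_counts}

bump_count() {
    # Step 1: derive the document pathname.
    doc="${DOCUMENT_ROOT}${DOCUMENT_URI}"
    # Flatten the path into one file name. This is a simplification:
    # paths that differ only in "/" versus "_" would share a counter.
    key=$(printf '%s' "$doc" | tr '/' '_')
    mkdir -p "$COUNT_DIR" || return 1
    lock="$COUNT_DIR/$key.lock"
    file="$COUNT_DIR/$key.count"

    # Step 2: obtain exclusive access. Only one process can create the
    # lock directory; the others retry, then give up after ten tries.
    tries=0
    until mkdir "$lock" 2>/dev/null; do
        tries=$((tries + 1))
        if [ "$tries" -ge 10 ]; then
            return 1
        fi
        sleep 1
    done

    # Steps 3 and 4: look up the current count and increment it.
    count=0
    if [ -f "$file" ]; then
        count=$(cat "$file")
    fi
    count=$((count + 1))
    printf '%s\n' "$count" > "$file"

    # Step 5: release exclusive access.
    rmdir "$lock"

    # Step 6: output the text that replaces the include.
    printf '%s accesses\n' "$count"
}
```

Calling bump_count as the last line of a script turns this into a program that cmd can run. A production version would also need to clean up a lock left behind by a crashed process, which this sketch does not attempt.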

Getting bolder: Client-driven content
Once you've mastered simple data insertion with your programs, you might want to move up to customized content based upon the client. A common application of this is to provide visitors from one domain one page, while visitors from other domains get another page. For example, visitors from your company's domain are shown your intranet pages while outside visitors are shown your Internet pages.

While a full-blown implementation of this probably requires perl or a C program, a quick hack using nothing more than a shell script will illustrate the concept. To create a document with content depending on the location of the client, put this line, and nothing else, in the document:

     <!--#exec cmd="/usr/local/bin/check-domain domain internal.html external.html" -->

Replace domain with the actual domain for which you are checking, less the trailing com, edu, or org, and replace internal.html and external.html with the names of the documents that correspond to the internal and external versions of this page.

The check-domain script looks something like this:

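   #!/bin/sh
   # Usage: check-domain domain internal-page external-page

   # Take the next-to-last dot-separated component of the client's host name.
   domain=`echo "$REMOTE_HOST" | awk -F. '{print $(NF-1)}'`

   # Quote the variables so an empty REMOTE_HOST does not break the test.
   if [ "$domain" != "$1" ]; then
      cat "$3"
   else
      cat "$2"
   fi

   exit 0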

awk uses the "." as a token separator and prints the next to the last token of its input, which is the REMOTE_HOST variable made available to the program by the http server. Since all domains end in some generic group like com or org, the next to the last component of the name is the one you want to check. If everything works correctly, the domain variable will be set to something like sun or apple.

The script then compares the actual domain with the required domain. If there is a mismatch, the external page is copied into the document. Otherwise, the internal page is copied to the document.
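
Before wiring the script into a page, it can help to try the domain extraction by hand with a few host names. The sketch below wraps the same awk expression in a function. The name second_level and the host www.example.co.uk are made up for illustration.

```shell
#!/bin/sh
# Print the next-to-last dot-separated component of a host name,
# using the same awk expression as the check-domain script.
# second_level is a hypothetical helper name.
second_level() {
    printf '%s\n' "$1" | awk -F. '{ print $(NF-1) }'
}

# A host name from the env output earlier in this column.
second_level isdn237.corp.harris.com

# A name with a country-style ending yields "co", not the organization.
second_level www.example.co.uk

# If the client's name is unknown and only an address is available,
# the result is just one of the numbers in the address.
second_level 137.237.110.237
```

The last two cases show where the quick hack falls short, which is one reason a fuller implementation would want more careful parsing.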

I'm positive you perl wizards could make this much fancier and more robust. I'd appreciate hearing about your efforts, keeping in mind that my offering here is nothing more than a simple example.

Considering the possibilities
Once you get the hang of server-side includes, you'll wonder how you lived without them. From simple boilerplate to complex database queries and security validations, server-side includes bring life to any document on the Web. Get cracking on livening up your site with these wonderful tools, and I'll see you again next month.


Resources

"Collecting and using server statistics"
/sunworldonline/swol-03-1996/swol-03-webmaster.html
"What browser are you designing for?"
/sunworldonline/swol-04-1996/swol-04-webmaster.html
Apache
http://www.apache.org/
Netscape
http://www.netscape.com/
Chuck Musciano's sed script
http://members.aol.com/htmlguru/agent_log.html
melmac, columnist Chuck Musciano's server
http://melmac.corp.harris.com/
HTML Guru Home Page, columnist Chuck Musciano's other server
http://members.aol.com/htmlguru/
Yahoo's log tool references
http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/HTTP/Servers/Log_Analysis_Tools/
analog
http://www.statslab.cam.ac.uk/~sret1/analog/
NCSA httpd documentation
http://www.tesre.bo.cnr.it/docs/Overview.html
"Watching your Web server"
/sunworldonline/swol-03-1996/swol-03-perf.html
HTML: The Definitive Guide
http://www.ora.com/www/item/html.html
More on Web servers
/sunworldonline/common/swol-siteindex.html#webperf
A list of other Webmaster columns
/sunworldonline/common/swol-backissues-columns.html#webmaster

About the author
Chuck Musciano has been running Melmac and the HTML Guru Home Page since early 1994, serving up HTML tips and tricks to hundreds of thousands of visitors each month. He's been a beta-tester and contributor to the NCSA httpd project and speaks regularly on the Internet, World Wide Web, and related topics. His book, HTML: The Definitive Guide, is currently available from O'Reilly and Associates.

If you have technical problems with this magazine, contact webmaster@sunworld.com

URL: http://www.sunworld.com/swol-12-1996/swol-12-webmaster.html