Weeding your Web site -- part 2

A checklist of what to do to keep your Web site healthy and uncluttered

November 1997

Abstract

In this second of two columns on how to keep your Web site "healthy," we delve deep into the "roots" of your Web site -- the crucial parts that may not be seen by your visitors. We give you some simple tips on root-level design and creation that will save you some serious headaches as your Web site grows. (2,000 words)


Last month I wrote that these two columns had been prompted by a co-worker of mine who noted that an internal site at our company had "grown like a weed when we wanted a topiary."

Web sites are a lot like gardens, often starting small and growing large. Somewhere along the way, we lose control, and the site becomes cluttered, disorganized, and, well, weedy. Like a good gardener, a good Webmaster will periodically prune out the weeds, dead limbs, and runaway growth, allowing new pages and features to thrive on the site.

Previously we looked at five tips to improve your site from an external perspective. Those tips were:

Use effective graphics.
Provide navigation tools.
Link to the Webmaster.
Copyright your work.
Use a standard style.

Beauty is more than skin deep, and a good Web site is just as clean internally as it is externally. This month, we'll turn our attention to the gory details beneath your pretty pages, offering five tips on some internal pruning every Webmaster needs to do.


Internal pruning: Get your site up to speed
Good growth stems from good roots. The prettiest of plants will soon wither if it is not well-rooted in good soil. Similarly, the most elegant site will decay if it is not well-engineered and built on solid principles. If you spend all your time tweaking your pages but never take the time to build them with consistent design rules, you'll create a site that is hard to administer, hard to update, and doomed to failure, even as you work to keep it fresh and exciting.

Avoid these problems by making sure you follow my five rules listed below. Check them off as you work on your site. If you can check off these five tips, along with the tips from last month, your site will be beautiful inside and out -- and ready to grow.

Create good directory structures

What is the most important tool at your disposal for creating a well-organized site? JavaScript? Style sheets? Frames? None of these. The most important tool on your server is the directory, a place where you can collect similar documents and organize your pages.

Too many Web sites are flat: all the pages, images, CGI scripts, and everything else are stuffed into the top-level document directory. You can easily spot a "flat" site. It's one where all the URLs for the pages have a server name and a document name, but no intervening path names.

Big deal, you may say. Who cares where the pages are, as long as people can get to them?

If your site has only three or four pages, or even 10 or 15, keeping them in one directory may make sense. In most cases, though, your pages can be logically grouped, and each of those groups should be in a common directory with a shared top-level index page.

Remember that your visitors can see your directory structure. Let this be a tool that they can exploit. If you are selling clothing and have pages dedicated to shirts, pants, and shoes, create directories named shirts, pants, and shoes. When people visit your site, the URL will help them see where they are going.

More importantly, those URLs will help you keep things organized. If you have hundreds of products but need to update only the pages dealing with shirts, having all those pages in a single directory makes life much easier.

Using directories makes it easier to understand your access logs. It is easy to grep out the hits on individual directories to see if more people are looking for pants than for shoes. If your pages are all in a single directory, you'll need more creative document names and more complex log analysis scripts to distinguish among the files.
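To make the log-analysis point concrete, here is a minimal sketch in Python of counting hits per top-level directory. It assumes your access log is in Common Log Format, with the request line in the first quoted field. The function names are made up for illustration, and your server's log layout may differ.

```python
from collections import Counter


def top_directory(path):
    """Return the first directory in a URL path, or "/" for top-level files."""
    path = path.split("?", 1)[0]          # drop any query string
    parts = path.lstrip("/").split("/")
    # /shirts/oxford.html -> ["shirts", "oxford.html"]; /index.html -> ["index.html"]
    return parts[0] if len(parts) > 1 else "/"


def hits_by_directory(log_lines):
    """Count requests per top-level directory (assumes Common Log Format)."""
    counts = Counter()
    for line in log_lines:
        try:
            # The request is the first quoted field: "GET /path HTTP/1.0"
            request = line.split('"')[1]
            path = request.split()[1]
        except IndexError:
            continue  # skip lines that do not look like log entries
        counts[top_directory(path)] += 1
    return counts
```

Feed it the lines of your log file and call most_common() on the result to rank your directories. On a flat site every hit lands in the "/" bucket, which is exactly the problem.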

Finally, using directories makes it easier to provide intermediate home pages for subsets of your site. If people just want to buy shoes, they can see the index page for the shoes directory, keeping them from being distracted by other products.

At the very least, use directories to collect different types of documents. I always put all my images in one directory, named images. This makes it easy to share images among multiple pages and makes it easy to remove image hits from my log files before doing log analysis. (You're still counting image hits on your site? Shame!)

Fix up your links

Bad links are the hallmark of a poorly maintained site. The most obvious bad links are those that lead nowhere, victims of Web-rot. A good Webmaster is constantly on the prowl for bad links, updating them to point to the right spot or removing them if needed. This kind of task is tedious at best, and I recommend you automate the process if possible. I like to use MOMspider (see Resources below) to walk my Webs, identifying bad links and helping to keep Web-rot to a minimum. You might also check the Bot Spot for a list of other robots you might find useful.
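If you can't run a full robot, a rough local check still catches the simplest Web-rot. The sketch below is not MOMspider and makes no claim to work the way it does. It assumes your pages live under a single document directory on disk, scans each .html or .htm file for href and src attributes, and reports local links whose target file is missing. The names are hypothetical, and it fetches nothing over the network, so links to other sites still need a real robot.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes in one document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def broken_local_links(doc_root):
    """Return (page, link) pairs for local links whose target is missing."""
    broken = []
    for folder, _dirs, files in os.walk(doc_root):
        for filename in files:
            if not filename.endswith((".html", ".htm")):
                continue
            page = os.path.join(folder, filename)
            collector = LinkCollector()
            with open(page, encoding="utf-8", errors="replace") as handle:
                collector.feed(handle.read())
            for link in collector.links:
                parts = urlsplit(link)
                # Skip links to other servers, mailto: links, and bare #fragments.
                if parts.scheme or parts.netloc or not parts.path:
                    continue
                path = unquote(parts.path)
                if path.startswith("/"):
                    # Root-relative link: resolve against the document root.
                    target = os.path.join(doc_root, path.lstrip("/"))
                else:
                    # Relative link: resolve against the page's own directory.
                    target = os.path.join(folder, path)
                if not os.path.exists(os.path.normpath(target)):
                    broken.append((page, link))
    return broken
```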

A more insidious form of bad linking is the misuse of absolute and relative links. An absolute link includes a server name and path for the referenced resource and can point to any resource on the Web. A relative link explicitly omits the server name (and possibly the path) and is used to point to resources on the current site.

Absolute links should only be used to point to resources on other sites. When linking to pages on your own site, you should only use relative links.

This advice is intended to make your site more portable and maintainable. If all of your local links are relative, you can change your server's name and not impact any page on your site. If you build your directory structure correctly, you can move entire directories of documents around on your server (or even to other servers) and not break any link. You can even take your entire site, press it on a CD, and distribute it for offline viewing without any local links breaking. The effort needed to create relative links is fairly small, but the benefits are many, making your site much easier to grow and extend.
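To show how little effort is involved, here is a small Python sketch built on the standard library's URL tools. It flags absolute links that point back at your own server and computes the relative link from one page to another. The function names and the host name in the usage note are assumptions for illustration only.

```python
import posixpath
from urllib.parse import urlsplit


def is_absolute(link):
    """True if the link names a scheme or a server, e.g. http://host/page."""
    parts = urlsplit(link)
    return bool(parts.scheme or parts.netloc)


def local_absolute_links(links, own_hosts):
    """Return the absolute links that point back at one of our own hosts.

    own_hosts is a set of lowercase host names you consider "this site".
    """
    flagged = []
    for link in links:
        host = urlsplit(link).hostname   # None for relative links
        if host and host in own_hosts:
            flagged.append(link)
    return flagged


def relative_link(from_page, to_page):
    """Relative URL from one site path to another.

    Both arguments are paths from the top of the site, such as
    /shirts/oxford.html and /images/logo.gif.
    """
    return posixpath.relpath(to_page, start=posixpath.dirname(from_page))
```

For example, relative_link("/shirts/oxford.html", "/images/logo.gif") yields ../images/logo.gif, and that link keeps working when the server is renamed or the whole tree is pressed onto a CD.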

Automate your pages

Creating pages takes time, especially well-designed pages. A good site design has a consistent look and feel between pages, which means that there will be a large amount of similar HTML code used over and over on your pages. This boilerplate HTML might include your navigation tools (easily reusable if you use the right directories and relative links, hint, hint), copyright information, contact links, and other common page elements.

In the depths of HTML hell, authors retype this stuff every time they create a new page. Webmasters confined to HTML purgatory know enough to cut and paste this stuff between pages each time they build something new and might even have a template document with the boilerplate inserted in the right places. Angelic Webmasters go one step further, using server-based automation to insert the boilerplate for them as the pages are served.

The heavenly trick to make this work is the server-side include, or SSI. Most servers, and certainly all the Unix-based ones, include the ability to insert text into a document as it is delivered to your visitors. SSI syntax varies from server to server, but the general idea is that you provide the name of a file to include via a comment in your document. The server removes the comment, replacing it with the contents of the file.
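Since the syntax varies, treat the following as an illustration of the idea rather than your server's actual behavior. It is a Python sketch that mimics what the server does, assuming an include comment of the form <!--#include file="footer.html" -->, the style used by NCSA-derived servers. The function name is hypothetical, and a real server handles more directives and error cases than this does.

```python
import os
import re

# Matches an include comment such as <!--#include file="footer.html" -->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(text, base_dir):
    """Replace each include comment with the contents of the named file.

    File names are resolved relative to base_dir, the directory that
    holds the page being served.
    """
    def replace(match):
        path = os.path.join(base_dir, match.group(1))
        with open(path, encoding="utf-8") as handle:
            return handle.read()

    return INCLUDE.sub(replace, text)
```

With your copyright notice in one footer file and a single include comment on every page, changing the notice means editing one file instead of hundreds.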

A slightly more sophisticated SSI retrieves the result of an executed command and inserts it in the document. This is great for automatic indexing and dynamic sites. I once used this trick for a site where folks contributed documents to a shared directory. The top-level page used a command-based SSI to extract the titles of the documents and build an index page. This eliminated the need for contributors to edit the index page, breaking things and stepping on each other's work. It also saved tons of time when maintaining the site because we were exploiting well-constructed links and a good directory structure.
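I no longer have the original command, so the following is only a sketch of the same idea in Python, not the script that site used. It assumes the contributed documents are .html or .htm files in one directory, pulls the text of each title element, and emits a list of links that a command-based SSI could insert into the top-level page. All names here are hypothetical.

```python
import os
import re
from html import escape

TITLE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def page_title(path):
    """Return the document's title, or its file name if it has none."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        match = TITLE.search(handle.read())
    # Collapse line breaks and runs of spaces inside the title.
    title = " ".join(match.group(1).split()) if match else ""
    return title or os.path.basename(path)


def build_index(directory):
    """Return an HTML list linking to every page in the directory by title."""
    items = []
    for name in sorted(os.listdir(directory)):
        if name.endswith((".html", ".htm")) and name != "index.html":
            title = page_title(os.path.join(directory, name))
            # The title is already HTML, so only the file name is escaped.
            items.append('<li><a href="%s">%s</a></li>' % (escape(name), title))
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Because the links are relative file names, the generated index keeps working if the shared directory is moved.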

Secure your directories

This is such a simple tip, but it is overlooked by even the best Webmasters.

Every server has a notion of an index page within a directory: a specially named document that is to be retrieved when the server is presented with the URL of a directory. For many servers, the page named index.html is the index page. If a server finds it in the directory, it gets displayed accordingly.

If that page doesn't exist, the server constructs an index page for the directory that looks a lot like the output of ls, with file names, sizes, and modification dates. Each name is a link to that file in the directory.

It's easy to see why this is a disaster. Everyone has all sorts of old and partially completed pages in their directories. You would never link to this stuff from a real index page, but the server doesn't know better. As a result, everyone gets to view your dirty laundry. The fix is simple: put an index page in every directory, or configure your server to turn off automatic directory listings.

I am always astounded to see directories on major Web sites with this security hole wide open. If I get bored surfing the Web, I'll often truncate URLs to the current directory just to see what I get from the server. I'm never bored with what turns up on an unsecured directory!
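You can audit your own site before someone else does. This Python sketch walks a document tree and lists every directory that lacks an index page. It assumes the index page is named index.html, as in the example above; pass your own server's names if they differ. The function name is made up for illustration.

```python
import os


def directories_without_index(doc_root, index_names=("index.html",)):
    """Return every directory under doc_root that has no index page.

    index_names lists the file names your server treats as index pages.
    """
    exposed = []
    for folder, _dirs, files in os.walk(doc_root):
        if not any(name in files for name in index_names):
            exposed.append(folder)
    return sorted(exposed)
```

Every directory it reports is one where a truncated URL may show visitors a raw file listing, unless your server is configured not to generate one.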

Use correct HTML

Using correct HTML ought to be standard advice for any Webmaster. Disappointingly, years of forgiving browsers and lax HTML standards have produced legions of bad HTML authors.

There are rules for HTML and standards that dictate how it should be used. Unfortunately, browsers are so tolerant of bad HTML that authors get lazy, only fooling with their HTML until it works, not until it works correctly. As a result, we get all sorts of malformed HTML on the Web, with missing end tags, illegal element nesting, erroneous tag attributes, and poorly constructed documents.

First, HTML needs to be correct so your pages look good on as many browsers as possible. Even if Netscape gets it right and Explorer does a passable job of displaying your pages, a user of an old version of AOL or Mosaic might see gibberish. By using correct HTML, your pages have the best chance of being seen by as many people as possible.

More important, correct HTML is maintainable HTML. This may be hard to believe, but you will probably cede control of your site to some other Webmaster who will be tasked with picking up all the stuff you created. If you take the time to get it right, that person will have a much easier time updating and extending your documents. Neglect your HTML now, and you'll make a permanent enemy of the poor soul who has to pick up your pieces.
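A real validator is the right tool for checking your pages, but even a rough check catches the worst offenders. The Python sketch below uses the standard library's HTML parser to flag missing end tags and badly nested elements. It is an approximation of my own, not a validator: the lists of empty elements and of elements whose end tags may be omitted are simplified assumptions, and it knows nothing about which attributes or nestings the standard allows.

```python
from html.parser import HTMLParser

# Elements that never take an end tag (simplified list).
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}
# Elements whose end tag HTML allows you to omit (simplified list).
OPTIONAL_END = {"html", "head", "body", "p", "li", "dt", "dd", "tr", "td", "th", "option"}


class NestingChecker(HTMLParser):
    """Track open elements on a stack and record nesting problems."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, line) for each element still open
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        pass  # a self-closing tag such as <br/> opens nothing

    def handle_endtag(self, tag):
        line = self.getpos()[0]
        if tag in VOID:
            return
        if tag not in [open_tag for open_tag, _ in self.stack]:
            self.problems.append("line %d: </%s> has no matching start tag" % (line, tag))
            return
        # Pop back to the matching start tag, reporting anything left open.
        while self.stack:
            open_tag, open_line = self.stack.pop()
            if open_tag == tag:
                break
            if open_tag not in OPTIONAL_END:
                self.problems.append(
                    "line %d: <%s> from line %d is still open at </%s>"
                    % (line, open_tag, open_line, tag))


def check_html(text):
    """Return a list of nesting problems found in an HTML document."""
    checker = NestingChecker()
    checker.feed(text)
    checker.close()
    for tag, line in checker.stack:
        if tag not in OPTIONAL_END:
            checker.problems.append("line %d: <%s> is never closed" % (line, tag))
    return checker.problems
```

Run it over the same pages your browser displays without complaint. A page that works is not always a page that is correct.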

How does your garden grow?
Can you check off all five tips? How about the five from last month? If so, your site is in tip-top shape, inside and out. If not, take the time to get things right, so that you, your visitors, and your successors can all benefit.

What tips did I miss? Can you suggest a few more that I should share? Write and let me know, and I'll pass them on in a future column. In the same vein, my column on a Webmaster curriculum generated a lot of good feedback. I'll share your thoughts on more required courses for the aspiring Webmaster next month. Until then, happy pruning!

Resources

MOMspider http://www.ics.uci.edu/WebSoft/MOMspider/
Bot Spot http://www.botspot.com
Server side include links http://www.yahoo.com/Computers_and_Internet/Software/Internet/World_Wide_Web/Servers/Server_Side_Scripting/Server_Side_Includes__SSI_/
Sun distinguished engineer Jakob Nielsen's Web design site http://www.useit.com/
Full listing of previous Webmaster columns in SunWorld http://www.sun.com/sunworldonline/common/swol-backissues-columns.html#webmaster
Web server performance/management stories in SunWorld's Site Index http://www.sun.com/sunworldonline/common/swol-siteindex.html#webperf
Web server security stories in SunWorld's Site Index http://www.sun.com/sunworldonline/common/swol-siteindex.html#websec

About the author
Chuck Musciano has been running various Web sites, including the HTML Guru Home Page, since early 1994, serving up HTML tips and tricks to hundreds of thousands of visitors each month. He's been a beta tester and contributor to the NCSA httpd project and speaks regularly on the Internet, World Wide Web, and related topics. Reach Chuck at chuck.musciano@sunworld.com.


URL: http://www.sunworld.com/swol-11-1997/swol-11-webmaster.html