Weeding your Web site -- part 2
A checklist of what to do to keep your Web site healthy and uncluttered
In this second of two columns on how to keep your Web site "healthy," we dig into the "roots" of your Web site -- the crucial parts that your visitors may never see. We give you some simple tips on root-level design and creation that will save you some serious headaches as your Web site grows. (2,000 words)
Web sites are a lot like gardens, often starting small and growing large. Somewhere along the way, we lose control, and the site becomes cluttered, disorganized, and, well, weedy. Like a good gardener, a good Webmaster will periodically prune out the weeds, dead limbs, and runaway growth, allowing new pages and features to thrive on the site.
Last month we looked at five tips to improve your site from an external perspective.
Internal pruning: Get your site up to speed
Good growth stems from good roots. The prettiest of plants will soon wither if it is not well-rooted in good soil. Similarly, the most elegant site will decay if it is not well-engineered and built on solid principles. If you spend all your time tweaking your pages but never take the time to build them with consistent design rules, you'll create a site that is hard to administer, hard to update, and doomed to failure, even as you work to keep it fresh and exciting.
Avoid these problems by making sure you follow my five rules listed below. Check them off as you work on your site. If you can check off these five tips, along with the tips from last month, your site will be beautiful inside and out -- and ready to grow.
Too many Web sites are flat: All the pages, images, CGI scripts, and everything else, are stuffed into the top-level document directory. You can easily spot a "flat" site. It's one where all the URLs to the pages have a server name and a document name, but no intervening path names.
Big deal, you may say. Who cares where the pages are, as long as people can get to them?
If your site has only three or four pages, or even 10 or 15, keeping them in one directory may make sense. In most cases, though, your pages can be logically grouped, and each of those groups should be in a common directory with a shared top-level index page.
Remember that your visitors can see your directory structure. Let
this be a tool that they can exploit. If you are selling clothing
and have pages dedicated to shirts, pants, and shoes, create
directories named shirts, pants, and shoes. When people visit your
site, the URL will help them see where they are going.
More importantly, those URLs will help you keep things organized. If you have hundreds of products but need to update only the pages dealing with shirts, having all those pages in a single directory makes life much easier.
Using directories makes it easier to understand your access logs. It is easy to grep out the hits on individual directories to see if more people are looking for pants instead of shoes. If your pages are all in a single directory, you'll need more creative document names and more complex log analysis scripts to distinguish among the files.
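If grep is not handy, the same per-directory tally can be sketched in a few lines of Python. This is only an illustration: it assumes your server writes Common Log Format entries with the request quoted as "GET /path HTTP/1.0", and the sample log lines and directory names are made up.

```python
import re
from collections import Counter

# Matches the quoted request in a Common Log Format line,
# e.g. "GET /shoes/boots.html HTTP/1.0" (log format is an assumption).
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+)')

def hits_by_directory(log_lines):
    """Count hits per top-level directory; top-level documents count as '/'."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if not match:
            continue  # skip lines that do not look like requests
        path = match.group(1).split("?", 1)[0]
        # Drop the document name, keeping only the directory part of the path.
        dir_part = path.rsplit("/", 1)[0]
        segments = [s for s in dir_part.split("/") if s]
        top = "/" + segments[0] + "/" if segments else "/"
        counts[top] += 1
    return counts

# Hypothetical log entries for a clothing site.
sample = [
    'host1 - - [01/Jan/1997:10:00:00 -0500] "GET /shoes/boots.html HTTP/1.0" 200 2326',
    'host2 - - [01/Jan/1997:10:00:05 -0500] "GET /pants/index.html HTTP/1.0" 200 1811',
    'host3 - - [01/Jan/1997:10:00:09 -0500] "GET /shoes/ HTTP/1.0" 200 954',
    'host4 - - [01/Jan/1997:10:00:12 -0500] "GET /index.html HTTP/1.0" 200 4100',
]
counts = hits_by_directory(sample)
for directory, hits in counts.most_common():
    print(directory, hits)
```

With a flat site, every one of those hits would land in the "/" bucket, and you would be back to decoding document names.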
Finally, using directories makes it easier to provide intermediate
home pages for subsets of your site. If people just want to buy
shoes, they can see the index page for the shoes
directory, keeping them from being distracted by other products.
At the very least, use directories to collect different types of
documents. I always put all my images in one directory, named
images. This makes it easy to share images among
multiple pages and makes it easy to remove image hits from my log
files before doing log analysis. (You're still counting image hits
on your site? Shame!)
A more insidious form of bad linking is the misuse of absolute and relative links. An absolute link includes a server name and path for the referenced resource and can point to any resource on the Web. A relative link explicitly omits the server name (and possibly the path) and is used to point to resources on the current site.
Absolute links should only be used to point to resources on other sites. When linking to pages on your own site, you should only use relative links.
This advice is intended to make your site more portable and maintainable. If all of your local links are relative, you can change your server's name and not impact any page on your site. If you build your directory structure correctly, you can move entire directories of documents around on your server (or even to other servers) and not break any link. You can even take your entire site, press it on a CD, and distribute it for offline viewing without any local links breaking. The effort needed to create relative links is fairly small, but the benefits are many, making your site much easier to grow and extend.
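To see why relative links survive a move, here is a small Python sketch that resolves one relative link the way a browser would, first against one location and then against another. The server names, paths, and the helper local_absolute_links are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical link as it might appear in a page's href attribute.
relative_link = "../images/logo.gif"

# The same page served from two hypothetical locations.
old_page = "http://www.example.com/shoes/index.html"
new_page = "http://shop.example.org/catalog/shoes/index.html"

# The browser resolves the relative link against wherever the page lives.
print(urljoin(old_page, relative_link))
print(urljoin(new_page, relative_link))

def local_absolute_links(hrefs, own_host):
    """Return the absolute links that point back at our own server."""
    return [h for h in hrefs if urlparse(h).netloc.lower() == own_host.lower()]

# Links worth rewriting as relative ones before the server is renamed.
hrefs = ["http://www.example.com/pants/index.html",
         "pants/index.html",
         "http://www.w3.org/"]
print(local_absolute_links(hrefs, "www.example.com"))
```

The relative link follows the page to its new home without any editing. The absolute link to your own server is the one that breaks on moving day.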
In the depths of HTML hell, authors retype this stuff every time they create a new page. Webmasters confined to HTML purgatory know enough to cut and paste this stuff between pages each time they build something new and might even have a template document with the boilerplate inserted in the right places. Angelic Webmasters go one step further, using server-based automation to insert the boilerplate for them as the pages are served.
The heavenly trick to make this work is the server-side include, or SSI. Most servers, and certainly all the Unix-based ones, include the ability to insert text into a document as it is delivered to your visitors. SSI syntax varies from server to server, but the general idea is that you provide the name of a file to include via a comment in your document. The server removes the comment, replacing it with the contents of the file.
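The mechanics can be mimicked in a few lines of Python. This sketch is not how any particular server implements SSI; it only imitates the substitution step. It assumes the directive form used by NCSA httpd and Apache, and the file name and boilerplate text are invented for the example.

```python
import re

# One common directive form (NCSA httpd and Apache); other servers differ.
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')

def expand_includes(page, read_file):
    """Replace each include comment with the text returned by read_file(name)."""
    return INCLUDE.sub(lambda m: read_file(m.group(1)), page)

# Hypothetical boilerplate, held in a dict to keep the sketch self-contained.
# A real server would read the named file from disk.
fragments = {"footer.html": "<address>Webmaster, Example Corp.</address>"}

page = '<body>\n<h1>Shoes</h1>\n<!--#include file="footer.html" -->\n</body>'
expanded = expand_includes(page, fragments.__getitem__)
print(expanded)
```

Change the footer once and every page that includes it picks up the change the next time it is served.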
A slightly more sophisticated SSI retrieves the result of an executed command and inserts it in the document. This is great for automatic indexing and dynamic sites. I once used this trick for a site where folks contributed documents to a shared directory. The top-level page used a command-based SSI to extract the titles of the documents and build an index page. This eliminated the need for contributors to edit the index page, breaking things and stepping on each other's work. It also saved tons of time when maintaining the site because we were exploiting well-constructed links and a good directory structure.
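The script from that site is long gone, so what follows is only a sketch of the kind of program a command-based SSI might run. It assumes the contributed documents are .html files in one directory, each with a title element; the function name build_index and the sample documents are hypothetical.

```python
import re
import tempfile
from pathlib import Path
from urllib.parse import quote

TITLE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def build_index(directory):
    """Return an HTML list linking to each .html document in directory by title."""
    items = []
    for doc in sorted(Path(directory).glob("*.html")):
        if doc.name == "index.html":
            continue  # do not list the index page itself
        match = TITLE.search(doc.read_text(encoding="utf-8", errors="replace"))
        # Collapse whitespace in the title; fall back to the file name.
        title = " ".join(match.group(1).split()) if match else doc.name
        items.append(f'<li><a href="{quote(doc.name)}">{title}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

# Demonstration with two made-up contributed documents.
with tempfile.TemporaryDirectory() as shared:
    Path(shared, "report.html").write_text(
        "<html><head><title>Quarterly Report</title></head></html>")
    Path(shared, "notes.html").write_text(
        "<html><head><TITLE>Meeting\n Notes</TITLE></head></html>")
    index_html = build_index(shared)
    print(index_html)
```

Because the list is rebuilt on every request, a contributor only has to drop a file into the directory for it to show up in the index.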
Every server has a notion of an index page within a directory: a
specially named document that is to be retrieved when the server is
presented with the URL of a directory. For many servers, the page
index.html is the index page. If a server finds
it in the directory, it gets displayed accordingly.
If that page doesn't exist, the server constructs an index page for
the directory that looks a lot like a directory listing: file names,
sizes, and creation dates. Each name is a link to that
file in the directory.
It's easy to see why this is a disaster. Everyone has all sorts of old and partially completed pages in their directories. You would never link to this stuff from a real index page, but the server doesn't know better. As a result, everyone gets to view your dirty laundry.
I am always astounded to see directories on major Web sites with this security hole wide open. If I get bored surfing the Web, I'll often truncate URLs to the current directory just to see what I get from the server. I'm never bored with what turns up on an unsecured directory!
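You can audit your own site for this before a bored visitor does it for you. The Python sketch below walks a document tree and reports every directory that lacks an index page. It assumes the index page is named index.html, which is true for many servers but not all, and the directory layout in the demonstration is invented.

```python
import os
import tempfile

def directories_without_index(doc_root, index_names=("index.html",)):
    """Return directories under doc_root (as relative paths) lacking an index page."""
    exposed = []
    for current, _subdirs, files in os.walk(doc_root):
        if not any(name in files for name in index_names):
            exposed.append(os.path.relpath(current, doc_root))
    return sorted(exposed)

# Demonstration on a made-up document tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "shoes"))
    os.makedirs(os.path.join(root, "images"))
    for name in ("index.html", os.path.join("shoes", "index.html"),
                 os.path.join("images", "logo.gif")):
        open(os.path.join(root, name), "w").close()
    exposed = directories_without_index(root)
    print(exposed)
```

Every directory on the list needs either an index page of its own or a server setting that turns off generated listings.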
There are rules for HTML and standards that dictate how it should be used. Unfortunately, browsers are so tolerant of bad HTML that authors get lazy, only fooling with their HTML until it works, not until it works correctly. As a result, we get all sorts of malformed HTML on the Web, with missing end tags, illegal element nesting, erroneous tag attributes, and poorly constructed documents.
First, HTML needs to be correct so your pages look good on as many browsers as possible. Even if Netscape gets it right and Explorer does a passable job of displaying your pages, a user of an old version of AOL or Mosaic might see gibberish. By using correct HTML, your pages have the best chance of being seen by as many people as possible.
More important, correct HTML is maintainable HTML. This may be hard to believe, but you will probably cede control of your site to some other Webmaster who will be tasked with picking up all the stuff you created. If you take the time to get it right, that person will have a much easier time updating and extending your documents. Neglect your HTML now, and you'll make a permanent enemy of the poor soul who has to pick up your pieces.
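A full validator is the right tool for the job, but even a rough check catches the most common sloppiness. The Python sketch below flags end tags that do not match the most recently opened element. It is deliberately stricter than the HTML standard, which lets you omit some end tags (for paragraphs and list items, for example), and the class and function names are made up for this example.

```python
from html.parser import HTMLParser

# Elements that never take an end tag.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}

class TagBalanceChecker(HTMLParser):
    """Collects mismatched or unclosed tags; a rough check, not a validator."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        elif tag in self.open_tags:
            # Everything opened after the matching start tag was left unclosed.
            while self.open_tags[-1] != tag:
                self.problems.append(
                    f"<{self.open_tags.pop()}> not closed before </{tag}>")
            self.open_tags.pop()
        else:
            self.problems.append(f"</{tag}> has no matching start tag")

def check(html_text):
    """Return a list of human-readable problems found in html_text."""
    checker = TagBalanceChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems + [f"<{t}> never closed" for t in checker.open_tags]

print(check("<p><b>bold text</p>"))
print(check("<p>tidy</p>"))
```

Run something like this over your pages before you hand them to your successor, and fix whatever it reports while you still remember what you meant.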
How does your garden grow?
Can you check off all five tips? How about the five from last month? If so, your site is in tip-top shape, inside and out. If not, take the time to get things right, so that you, your visitors, and your successors can all benefit.
What tips did I miss? Can you suggest a few more that I should share? Write and let me know, and I'll pass them on in a future column. In the same vein, my column on a Webmaster curriculum generated a lot of good feedback. I'll share your thoughts on more required courses for the aspiring Webmaster next month. Until then, happy pruning!
About the author
Chuck Musciano has been running various Web sites, including the HTML Guru Home Page, since early 1994, serving up HTML tips and tricks to hundreds of thousands of visitors each month. He's been a beta tester and contributor to the NCSA httpd project and speaks regularly on the Internet, World Wide Web, and related topics. Reach Chuck at email@example.com.