Filter spam out!

Protect yourself from spam: A practical guide to procmail

By Cameron Laird and Kathryn Soraiz

SunWorld
December 1997

Abstract
Need protection against spam? This article introduces the freeware utility procmail and shows you, step by step, how to create a mail filter. But procmail can be used for much more than simply filtering spam. We'll also give you ideas for more advanced filtering features, like linking e-mail to your pager or sorting useful mail. (2,500 words)


What do most Internet users do about unwanted e-mail? Well, some users keep their e-mail addresses secret, others petition Congress, and quite a few do nothing at all. The wise work smarter, not harder, by using computers to detect and filter out junk e-mail. In this article we'll focus on the last of these alternatives. We'll take you, step by step, through everything you need to know in order to enlist the aid of a Unix host in filtering unwanted e-mail traffic. Follow the directions here, and in the next hour your computer will be automatically and safely blocking spam from your in-box.

Favorable circumstances
There are quite a few ways to automate spam removal for your personal in-box, and the solution you choose will depend on the type of e-mail service you employ. We'll focus on the situations where a Unix machine hosts your e-mail reception. You might use pine, mh, mail, or another command-line utility to read your messages while logged directly onto the host; or perhaps your personal computer connects to Unix through a POP (post office protocol) server to retrieve your correspondence. In either case, we'll show you how to build a useful filter into the Unix machine that delivers your e-mail to you. We call this a "host-based filter."

Can you get an off-the-shelf utility to do your filtering? Yes and no. Point-and-click desktop clients like Eudora, AK-Mail, Pegasus, FirstClass, and Outlook have filtering capabilities, but their use presents a number of liabilities:

  1. They use much more bandwidth than host-based filters.
  2. Their mechanisms are proprietary and poorly portable.
  3. If you are more mobile than your computers, you'll suffer the inconvenience of viewing your e-mail traffic through different filters, depending on where you are.

Perhaps the best relationship between the host-based filters we describe and end-user filters is cooperation. It's entirely practical to cascade different kinds of filters. That is, use each one on the machine to which it applies. Once you've learned the principles of filtering, you'll have a lot of flexibility in how you apply them.


Your first e-mail filter
We rely on procmail for automating e-mail traffic. We'll focus on its installation and use (for an overview of the procmail utility, see Resources below).

First, verify that procmail is installed on your host. Most ISPs that provide shell access make it available as /usr/local/bin/procmail, /var/bin/procmail, or some local variation. Corporate networks are less likely to have procmail, so don't be surprised if you can't find it on your host. The directions available at the procmail home page are easy, so you or your sysadmin will have it up and running in short order.

Once you've confirmed that procmail is available as a Unix executable, take these steps:

  1. Create the file $HOME/.procmailrc with contents:
    	VERBOSE=on
    	MAILDIR=$HOME/mail
    	PMDIR=$HOME/.procmail
    	LOGFILE=$PMDIR/log
    	INCLUDERC=$PMDIR/rc.personal
    
  2. Create the directory $HOME/.procmail if it does not already exist, then create the file $HOME/.procmail/rc.personal with contents:
    	:0:
    	* ^Subject.*cable descrambler
    	discard
    
  3. Create the file $HOME/.forward with contents:
    	"|IFS=' ' && exec /usr/local/bin/procmail -f- || exit 75 #YOUR EMAIL NAME"
    
    This is the step that's likely to involve the most variation from one system to another. In the last item in the Resources list below, we list the .forward commands we've found to work best under a variety of operating systems.
  4. Set necessary permissions with the commands:
    	chmod 644 $HOME/.forward
    	chmod a+x $HOME
    
  5. Test the filter by typing the four lines:
    	sendmail YOUR_EMAIL_ADDRESS
    	Subject: send us money
    	blah, blah
    	.
    
  6. And then the four lines:
    	sendmail YOUR_EMAIL_ADDRESS
    	Subject: send us money for our cable descrambler
    	blah, blah
    	.
    
What you should see now is that you have one new mail message, with the subject line, "send us money." You'll also have a new file, $HOME/mail/discard, that holds the "send us money for our cable descrambler" message.

You're in business. You now have the means to protect yourself from messages you don't want to see.
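
Once both test messages land where you expect, the extra diagnostics that VERBOSE=on writes to the log are no longer needed. A minimal sketch of a quieter $HOME/.procmailrc, assuming the same directory layout as in step 1:

	VERBOSE=off
	MAILDIR=$HOME/mail
	PMDIR=$HOME/.procmail
	LOGFILE=$PMDIR/log
	INCLUDERC=$PMDIR/rc.personal

With these settings procmail should still append a short summary of each delivery to $HOME/.procmail/log, which is the first place to look when a message goes missing.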

Variations on a theme
There are many steps you'll want to take to improve your filter. You can add a second rule, and your ruleset becomes:

	:0:
	* ^Subject.*cable descrambler
	discard

	:0:
	* ^Subject.*adults only
	discard

You can filter out messages from sites notorious for their junk:

	:0:
	* ^From: CyberMarketing@hotmail.com
	discard

or from suspicious-looking people:

	:0:
	* ^From anonymous
	discard

	:0:
	* ^From bulk
	discard

You'll probably want to throw away the electronic equivalent of "Deliver to Resident" messages,

	:0:
	* ^To:.*public.com
	discard

as well as anything courteous enough to announce itself as an unsolicited advertisement:

	:0:
	* ^X-Advert
	discard

Remember, you can combine all these filters and more in a single ruleset. It's entirely practical to have dozens or even hundreds of rules. You're better off starting small, so you can understand any peculiarities of your particular system. It's a good idea to start with rules that specify a destination, such as "discard." This gives you a chance to see what you're tossing out of your in-box, because it's still available in the file $HOME/mail/discard. Eventually you might want such rules as:

	:0:
	* ^Subject.*adults only
	/dev/null

which immediately tosses away the messages it detects. Procmail reads the third line in a rule -- "discard" in previous examples -- as a filename. Every modern Unix observes the convention that /dev/null acts like a file with the special property that anything written to it is ignored. The consequence in this case is to delete irretrievably any e-mail with a "Subject" that includes the phrase "adults only."

Advanced defenses
Simple combinations of the rules above are likely to eliminate at least three-quarters of your unwanted e-mail. For more precise filtering, read about procmail's regular expression matches, shell-command invocation, and other advanced features in the Resources listed below.
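
As one illustration of those features, a single rule can carry several conditions, and a condition can be negated. The sketch below is hedged: friend@example.com is a placeholder address, not one from this article. It assumes procmail's documented behavior that every condition line in a rule must match, that a leading "!" inverts a condition, and that patterns are egrep-style expressions.

	# File the message only if the subject looks like spam AND
	# the sender is not a known correspondent.
	:0:
	* ^Subject:.*(cable descrambler|adults only)
	* !^From:.*friend@example\.com
	discard

Because the second condition exempts a trusted sender, a friend who happens to mention a cable descrambler still reaches your in-box.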

Policy and performance
Installation was easy enough. Now let's have a look at the fine print: policy and performance issues you'll want to know about.

Before you begin experimenting with procmail, verify that it's permitted on your system. We know of no serious hazards associated with using procmail -- in fact, procmail was constructed to be a benign, low-impact Unix utility -- but some organizations have strict policies about installing or using certain kinds of software. You'll need to know what the local rules are before you embark on your procmail adventures.

Scalability is the one negative performance issue associated with procmail. An active e-mail host can be required to process millions of messages a day. Whatever procmail's performance profile, launching it at that rate is likely to affect system performance. "It's a pig," summarizes Karl Denninger, founder of MCSNet, a leading Chicagoland ISP known for its technical sophistication. Procmail was designed for small-scale use, so it can seriously distort operations on a heavily loaded machine.

Despite this, procmail comes out a winner. It may take extra milliseconds to process each incoming message, but that's generally preferable to piling up unwanted e-mail for human users to sort. In fact, sysadmins at the ISPs and corporate operations we contacted unanimously agreed that, despite minor performance and policy issues, they allow and even encourage the use of procmail. MCSNet, for example, is known for its aggressive centralized (non-procmail) anti-spam defenses, which filter hundreds of thousands of unwanted messages daily, and its technical staff is always on the lookout for programs that impact system performance. Yet MCSNet not only permits the use of procmail; its support staff is trained to help users develop their own filters.

Are you ahead yet?
You've invested a little time in reading this article, and maybe you've installed procmail and set up your first filter. It has already begun looking out for your interests. Soon it will be saving you several seconds a day that would otherwise be consumed by deleting junk mail. Your investment will easily pay for itself in the next year, perhaps sooner.

Only the beginning
So far, we've presented procmail as a way to filter out undesirable messages. Now we'll look at the ways procmail can be used to positively enhance the value of your e-mail traffic. You'll probably find that, much as you love filtering spam out, your favorite uses for procmail involve filtering useful e-mail in. To conclude this article, we'll suggest four general uses for your filter:

  1. Host-wide filtering
  2. Connection of e-mail to external actions
  3. Automation of mail-list sorting
  4. Insurance against terrorism

Host-wide filtering protects all users
You've constructed a filter for your e-mail. Other users on your system have the option to do the same. But why not set up a single ruleset to scan and filter all messages (for all users) that arrive at a particular site? Many Internet and online service providers, for example, now advertise spam filtering as a benefit to customers. Generally, this means they discard all inbound e-mail from a list of known spammers. With procmail, much more precision is possible. Sysadmins can use all the techniques we've illustrated above to develop front-end filters for their sites.

Caution: Site-wide automation is a considerably more delicate matter than the individual filters previously discussed. Host-wide filtering entails serious policy and performance considerations. Be conservative, for e-mail is likely to be among your system's most mission-critical mechanisms. Even if you know your goal is site-wide protection, and even if you know you won't be using procmail, we recommend that you begin your experimentation with a personal filter. It's a valuable warm-up.
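
For sysadmins who do go on to a site-wide ruleset, the following is only a sketch. It assumes that your procmail is installed as the local delivery agent and consults /etc/procmailrc before each user's own file, and that every user has a $HOME/mail directory. Check the procmailrc manual page for your version before relying on it.

	# /etc/procmailrc -- applies to every recipient on the host.
	# DROPPRIVS=yes asks procmail to give up root privileges and
	# run the remaining rules as the recipient.
	DROPPRIVS=yes

	# Divert self-declared advertisements into a folder in each
	# user's home directory rather than deleting them outright.
	:0:
	* ^X-Advert
	$HOME/mail/discard

Diverting rather than deleting keeps to the conservative spirit of the caution above: a mistaken match costs a user a look in another folder, not a lost message.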

E-mail and the real world: connecting e-mail to external actions
One entertaining category of use for procmail is as a filter between e-mail in-boxes and the real world. You can, for example, combine procmail with other utilities to:

  1. Page you whenever you receive e-mail from a specific sender, such as your boss, or on a specific subject, such as "URGENT!"

  2. Implement "mailback." This is an automated service used to answer information inquiries. The Web's "brochureware" model has conquered most of the territory mailback ruled just a few years ago. Mailback, however, remains appropriate for specialized services, such as those involving security considerations or expensive connections.

  3. Perform an emergency shutdown, restart modem banks, or launch a backup operation in response to a coded e-mail message.

These operations demonstrate the real-world potential of procmail-filtered e-mail. E-mail is just a mechanism; it doesn't provide a unique or irreplaceable value, so it's usually possible to design a solution without it. But, combined with procmail, e-mail does have the potential to provide a very mature, well-understood, and widely distributed answer for today's asynchronous communication needs.
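
The paging idea in item 1 might look like the sketch below. Both the script $HOME/bin/page-me and the addresses are hypothetical; substitute whatever command or gateway your paging service actually provides.

	# The "c" flag hands a copy of the message to the action, so
	# the original still arrives in your in-box as usual.
	:0 c
	* ^From:.*boss@example\.com
	| $HOME/bin/page-me

	# If your pager has an e-mail gateway, "!" forwards a copy
	# to it instead (the address here is a placeholder).
	:0 c
	* ^Subject:.*URGENT
	! yourpager@pager.example.com

An action line that begins with "|" pipes the message to a program, while one that begins with "!" forwards it to an address.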

Automation of mail sorting
Defense against spam might have led you to procmail. Automatically sorting your incoming e-mail, though, is what will ultimately prove most rewarding.

Most readers are familiar with the notion of multiple "mail folders." We'll briefly introduce the Unix use of this term, and illustrate procmail's role.

Suppose you're a user on a Unix system. You know how to send mail and receive it. You use a command-line utility -- perhaps mail or pine -- to read your mail, which appears in a single list, sorted by date of arrival.

Let's say you subscribe to the SunWorld reader alerts and also to hppd-users, so you can learn about porting software to HP-UX. Take the ruleset you wrote for procmail above and append these rules:

	:0:
	* ^From owner-hppd-users@cv.ruu.nl
	hppd-users

	:0:
	* ^From SunWorld@FDDS.COM
	SunWorld

The SunWorld and hppd-users traffic will no longer appear in your "personal" mailbox. Instead, these messages will automatically be sorted into their own "folders," positioned in the file system as

	$HOME/mail/hppd-users

and

	$HOME/mail/SunWorld

You can access these folders from the command line with invocations like mail -f $HOME/mail/hppd-users, pine -f $HOME/mail/SunWorld, and so on.

The effect is analogous to the difference between having everything piled atop your desk and using individual manila envelopes for separate projects or activities. Focusing your attention productively is a tremendous boost. It will rationalize your correspondence enormously.

That's the idea of mail folders. You'll want to tailor them to your situation: You might customize a filter to sort e-mail from particular users, or by subject; configure a personal computer to access and maintain different folders on your Unix host; or write rules that duplicate certain messages, so copies appear in more than one folder.
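
Two of those customizations can be sketched briefly. The folder name hppd-archive and the address alice@example.com are placeholders chosen for illustration.

	# Keep an archive copy of every hppd-users message; the "c"
	# flag lets the message continue on to later rules.
	:0 c:
	* ^From owner-hppd-users@cv.ruu.nl
	hppd-archive

	# Sort mail from one particular correspondent by sender.
	:0:
	* ^From:.*alice@example\.com
	alice

Place the copying rule ahead of the rule that files hppd-users traffic. Procmail works through a ruleset from top to bottom, and a rule without "c" ends processing for the messages it matches.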

While the ability to filter spam doesn't depend on the details of your software configuration, sorting e-mail into usable folders does, because the rules you write need to match the software you use to read your e-mail. This can be complex, but it's well worth the effort, and you'll soon be enjoying the benefits of automated e-mail sorting.

Terrorism insurance
Electronic harassment and denial-of-service terrorism are realities in today's Internet world. Experience with a filtering infrastructure is a significant advantage in recovering from such attacks. Trying to learn the filtering part of a defense in the middle of an attack, however, is a bit like taking your first bicycle ride in a windstorm. So start now.
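
As a hedged example of what such practice might cover, an attack might involve very large messages or a flood from one address. The threshold and names below (100000 bytes, the folders oversize and flood, the address attacker@example.com) are illustrative assumptions, not recommendations.

	# Set aside anything larger than 100000 bytes for later review.
	:0:
	* > 100000
	oversize

	# Set aside everything from one abusive sender.
	:0:
	* ^From:.*attacker@example\.com
	flood

Rules like these belong near the top of your ruleset so they act before your ordinary sorting.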

Conclusion
If you're receiving e-mail on a Unix host, you can probably use procmail to help clean spam out of your mailbox. Once you've got that down, it's likely that you'll go on to instruct procmail to handle more sophisticated sorting and controlling. The Resources links below detail the possibilities.

Acknowledgement
As with everything we do regarding the Internet, many people contributed to this article. We particularly thank Karl Denninger and Craig Johnston for their time and ideas, and Nancy McGough for her several fine FAQs.


Resources

Newsgroups that often discuss procmail: comp.mail.misc; comp.mail.sendmail; news.admin.net-abuse.email

About the authors
Cameron Laird and Kathryn Soraiz manage their own software consultancy, Network Engineered Solutions, from just outside Houston, TX. Reach Cameron at cameron.laird@sunworld.com.

[(c) Copyright  Web Publishing Inc., an IDG Communications company]

URL: http://www.sunworld.com/swol-12-1997/swol-12-spam.html