Filter spam out!
Protect yourself from spam: A practical guide to procmail
Need protection against spam? This article introduces the freeware utility procmail and shows you, step by step, how to create a mail filter. But procmail can be used for much more than simply filtering spam. We'll also give you ideas for more advanced filtering features, like linking e-mail to your pager or sorting useful mail. (2,500 words)
There are quite a few ways to automate spam removal for your personal in-box, and the solution you choose will depend on the type of e-mail service you employ. We'll focus on situations where a Unix machine hosts your e-mail reception. You might use pine, mh, mail, or another command-line utility to read your messages while logged directly onto the host; or perhaps your personal computer connects to Unix through a POP (post office protocol) server to retrieve your correspondence. In either case, we'll show you how to build a useful filter into the Unix machine that delivers your e-mail to you. We call this a "host-based filter."
Can you get an off-the-shelf utility to do your filtering? Yes and no. Point-and-click desktop clients like Eudora, AK-Mail, Pegasus, FirstClass, and Outlook have filtering capabilities, but their use presents a number of liabilities:
Your first e-mail filter
We rely on procmail for automating e-mail traffic. We'll focus on its installation and use (for an overview of the procmail utility, see Resources below).
First, verify that procmail is installed on your host. Most ISPs that provide shell access make it available as /usr/local/bin/procmail, /var/bin/procmail, or some local variation. Corporate networks are less likely to have procmail, so don't be surprised if you can't find it on your host. The directions available at the procmail home page are easy, so you or your sysadmin will have it up and running in short order.
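One way to run this check from the shell is sketched below. The helper name find_procmail and the list of candidate directories are our own assumptions, not anything procmail defines; adjust them to your host.

```shell
# Sketch: look for a procmail executable on this host.
# find_procmail is a hypothetical helper name; the candidate paths
# are guesses and may not match your system.
find_procmail() {
    # First see whether the shell can find it on the PATH.
    if command -v procmail >/dev/null 2>&1; then
        command -v procmail
        return 0
    fi
    # Otherwise try a few common locations.
    for candidate in /usr/local/bin/procmail /usr/bin/procmail /var/bin/procmail; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

if path=$(find_procmail); then
    echo "procmail found at $path"
else
    echo "procmail not found; ask your sysadmin or see the procmail home page"
fi
```

Whatever path this reports is the one to use later in your .forward file.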
Once you've confirmed that procmail is available as a Unix executable, take these steps:
VERBOSE=on
MAILDIR=$HOME/mail
PMDIR=$HOME/.procmail
LOGFILE=$PMDIR/log
INCLUDERC=$PMDIR/rc.personal
:0:
* ^Subject.*cable descrambler
discard

"|IFS=' ' && exec /usr/local/bin/procmail -f- || exit 75 #YOUR EMAIL NAME"

This is the step that's likely to involve the most variation from one system to another. In the last item in the Resources list below, we list the .forward commands we've found to work best under a variety of operating systems.
chmod 644 $HOME/.forward
chmod a+x $HOME
sendmail YOUR_EMAIL_ADDRESS
Subject: send us money
blah, blah
.

sendmail YOUR_EMAIL_ADDRESS
Subject: send us money for our cable descrambler
blah, blah
.
$HOME/mail/discard, that holds the "send us money for our cable descrambler" message.
You're in business. You now have the means to protect yourself from messages you don't want to see.
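If you would rather review the configuration files before they touch your account, the sketch below stages them in a scratch directory first. STAGE is a variable name we made up for the illustration, and the mail directory is created on the assumption that procmail expects MAILDIR to exist already; the file contents are the ones shown in the steps above.

```shell
# Sketch: stage the configuration files in a scratch directory so they
# can be reviewed before being copied into $HOME.
# STAGE is a hypothetical variable name, not something procmail reads.
STAGE=$(mktemp -d)

# Assumption: procmail wants MAILDIR to exist already, so create it too.
mkdir -p "$STAGE/.procmail" "$STAGE/mail"

# Main configuration. The quoted 'EOF' keeps $HOME and $PMDIR literal,
# so procmail, not this shell, expands them later.
cat > "$STAGE/.procmailrc" <<'EOF'
VERBOSE=on
MAILDIR=$HOME/mail
PMDIR=$HOME/.procmail
LOGFILE=$PMDIR/log
INCLUDERC=$PMDIR/rc.personal
EOF

# Personal ruleset containing the single rule from the walkthrough.
cat > "$STAGE/.procmail/rc.personal" <<'EOF'
:0:
* ^Subject.*cable descrambler
discard
EOF

# Show what was staged.
ls -A "$STAGE" "$STAGE/.procmail"
```

Once the staged files look right, copy them into the matching places under $HOME and continue with the .forward and chmod steps.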
Variations on a theme
There are many steps you'll want to take to improve your filter. You can add a second rule, and your ruleset becomes:
:0:
* ^Subject.*cable descrambler
discard

:0:
* ^Subject.*adults only
discard

You can filter out messages from sites notorious for their junk:
:0:
* ^From: CyberMarketing@hotmail.com
discard
or from suspicious-looking people:
:0:
* ^From anonymous
discard

:0:
* ^From bulk
discard
You'll probably want to throw away the electronic equivalent of "Deliver to Resident" messages,
:0:
* ^To:.*public.com
discard
as well as anything courteous enough to announce itself as an unsolicited advertisement:
:0:
* ^X-Advert
discard
Remember, you can combine all these filters and more in a single ruleset. It's entirely practical to have dozens or even hundreds of rules. You're better off starting small, so you can understand any peculiarities of your particular system. It's a good idea to start with rules that specify a destination, such as "discard." This gives you a chance to see what you're tossing out of your in-box, because it's still available in the file $HOME/mail/discard.
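Before trusting a growing ruleset, it can help to check by hand which headers a condition would catch. The sketch below approximates procmail's matching with grep -E -i, on the understanding that procmail conditions are egrep-style and ignore case by default. It is no substitute for testing with procmail itself; would_match is a helper name we invented, and the sample headers are made up.

```shell
# Sketch: approximate which sample header lines the article's conditions
# would catch. would_match is a hypothetical helper, and grep -E -i only
# approximates procmail's own regular-expression engine.
would_match() {
    # $1 = condition text (without the leading "* "), $2 = one header line
    printf '%s\n' "$2" | grep -E -i -q -- "$1"
}

for header in \
    "Subject: Send us money for our CABLE DESCRAMBLER" \
    "Subject: lunch on Friday?" \
    "X-Advert: yes"
do
    verdict="keep"
    for condition in '^Subject.*cable descrambler' '^Subject.*adults only' '^X-Advert'
    do
        if would_match "$condition" "$header"; then
            verdict="discard"
        fi
    done
    echo "$verdict: $header"
done
```

A header that comes back as "discard" here is one you should expect to find in the discard file once the matching rule is live.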
Eventually you might want such rules as:
:0:
* ^Subject.*adults only
/dev/null

which immediately tosses away the messages it detects. Procmail reads the third line in a rule -- "discard" in previous examples -- as a filename. Every modern Unix observes the convention that /dev/null acts like a file with the special property that anything written to it is ignored. The consequence in this case is to delete irretrievably any e-mail with a "Subject" that includes the phrase "adults only."
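You can see the /dev/null convention for yourself from the shell. This short sketch writes a line to /dev/null and then counts what can be read back; the text written is just an invented sample.

```shell
# Sketch: anything written to /dev/null is thrown away.
echo "Subject: adults only" > /dev/null

# Reading it back yields nothing, so the byte count is zero.
bytes=$(wc -c < /dev/null | tr -d ' ')
echo "bytes stored in /dev/null: $bytes"
```

That is why a rule ending in /dev/null leaves nothing behind to inspect, unlike a rule ending in a file name such as discard.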
Simple combinations of the rules above are likely to eliminate at least three-quarters of your unwanted e-mail. For more precise filtering, read about procmail's regular expression matches, shell-command invocation, and other advanced features in the Resources listed below.
Policy and performance
Installation was easy enough. Now let's look at the fine print: the policy and performance issues you'll want to know about.
Before you begin experimenting with procmail, verify that it's permitted on your system. We know of no serious hazards associated with using procmail -- in fact, procmail was constructed to be a benign, low-impact Unix utility -- but some organizations have strict policies about installing or using certain kinds of software. You'll need to know what the local rules are before you embark on your procmail adventures.
Scalability is the one negative performance issue associated with procmail. An active e-mail host can be required to process millions of messages a day. Whatever procmail's performance profile, launching it at that rate is likely to affect system performance. "It's a pig," summarizes Karl Denninger, founder of MCSNet, a leading Chicago-land ISP known for its technical sophistication. Procmail was designed for small-scale use, so it can seriously distort operations on a heavily loaded machine.
Despite this, procmail comes out a winner. It may take extra milliseconds to process each incoming message, but that's generally preferable to piling up unwanted e-mail for human users to sort. In fact, sysadmins at the ISPs and corporate operations we contacted unanimously agreed that, despite minor performance and policy issues, they allow and even encourage the use of procmail. MCSNet, for example, is known for its aggressive centralized (non-procmail) anti-spam defenses, which filter hundreds of thousands of unwanted messages daily, and the ISP's technical staff is always on the lookout for programs that impact system performance. Yet MCSNet not only permits the use of procmail; its support staff is also trained to help users develop their own filters.
Are you ahead yet?
You've invested a little time in reading this article, and maybe you've installed procmail and set up your first filter. It has already begun looking out for your interests. Soon it will be saving you several seconds a day that would otherwise be consumed by deleting junk mail. Your investment will easily pay for itself in the next year, perhaps sooner.
Only the beginning
So far, we've presented procmail as a way to filter out undesirable messages. Now we'll look at the ways procmail can be used to positively enhance the value of your e-mail traffic. You'll probably find that, much as you love filtering spam out, your favorite uses for procmail involve filtering useful e-mail in. To conclude this article, we'll suggest four general uses for your filter:
Host-wide filtering protects all users
You've constructed a filter for your e-mail. Other users on your system have the option to do the same. But why not set up a single ruleset to scan and filter all messages (for all users) that arrive at a particular site? Many Internet and online service providers, for example, now advertise spam filtering as a benefit to customers. Generally, this means they discard all inbound e-mail from a list of known spammers. With procmail, much more precision is possible. Sysadmins can use all the techniques we've illustrated above to develop front-end filters for their sites.
Caution: Site-wide automation is a considerably more delicate matter than the individual filters previously discussed. Host-wide filtering entails serious policy and performance considerations. Be conservative, for e-mail is likely to be among your system's most mission-critical mechanisms. Even if you know your goal is site-wide protection, and even if you know you won't be using procmail, we recommend that you begin your experimentation with a personal filter. It's a valuable warm-up.
E-mail and the real world: connecting e-mail to external actions
One entertaining category of use for procmail is as a filter between e-mail in-boxes and the real world. You can, for example, combine procmail with other utilities to:
Automation of mail sorting
Defense against spam might have led you to procmail. Automatically sorting your incoming e-mail, though, is what will ultimately prove most rewarding.
Most readers are familiar with the notion of multiple "mail folders." We'll briefly introduce the Unix use of this term, and illustrate procmail's role.
Suppose you're a user on a Unix system. You know how to send mail and receive it. You use a command-line utility -- perhaps mail or pine -- to read your mail, which appears in a single list, sorted by date of arrival.
Let's say you subscribe to the SunWorld reader alerts and also to hppd-users, so you can learn about porting software to HP-UX. Take the ruleset you wrote for procmail above and append these rules:
:0:
* ^From email@example.com
hppd-users

:0:
* ^From SunWorld@FDDS.COM
SunWorld
The SunWorld and hppd-users traffic will no longer appear in your "personal" mailbox. Instead, they will automatically be sorted into their own "folders," positioned in the file system as
$HOME/mail/hppd-users and $HOME/mail/SunWorld.

You can access these folders from the command line with invocations like

mail -f $HOME/mail/hppd-users,
pine -f $HOME/mail/SunWorld, and so on.
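To see at a glance how much traffic has landed in a folder, you can count its messages. The sketch below assumes mbox-style folder files, in which every message begins with a line starting "From " (with a trailing space and no colon); count_messages is a helper name we invented, and the two sample messages are made up.

```shell
# Sketch: count the messages in a folder file.
# Assumes mbox-style folders, where every message starts with a line
# beginning "From " (note the trailing space, no colon).
count_messages() {
    # $1 = path to a folder file
    grep -c '^From ' "$1"
}

# Demonstration on an invented two-message folder in a scratch directory.
DEMO=$(mktemp -d)
cat > "$DEMO/SunWorld" <<'EOF'
From alerts@example.org Mon Jan  5 09:00:00 1998
Subject: first alert

body text

From alerts@example.org Tue Jan  6 09:00:00 1998
Subject: second alert

more body text
EOF

echo "SunWorld: $(count_messages "$DEMO/SunWorld") messages"
```

On a real account you would point count_messages at a file such as $HOME/mail/SunWorld instead of the scratch copy.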
The effect is analogous to the difference between having everything piled atop your desk and using individual manila envelopes for separate projects or activities. Focusing your attention productively is a tremendous boost. It will rationalize your correspondence enormously.
That's the idea of mail folders. You'll want to tailor them to your situation: You might customize a filter to sort e-mail from particular users, or by subject; configure a personal computer to access and maintain different folders on your Unix host; or write rules that duplicate certain messages, so copies appear in more than one folder.
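As one illustration of the duplicating idea, the sketch below stages a rule that uses procmail's "c" (carbon copy) flag, which files a copy in the named folder and lets the original continue through the rest of the ruleset. The sender address and the folder name "urgent" are invented, and RULES is just a scratch file for review.

```shell
# Sketch: stage a rule that files a copy of matching mail in a folder
# while the original continues through the remaining rules.
# The "c" flag asks procmail for a carbon copy; the address and the
# folder name "urgent" are invented for illustration.
RULES=$(mktemp)
cat > "$RULES" <<'EOF'
:0 c:
* ^From.*boss@example.com
urgent
EOF

# Show the staged rule for review.
cat "$RULES"
```

If the rule suits you, append it to your rc.personal ahead of any rule that would otherwise file the same message elsewhere.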
While the ability to filter spam doesn't depend on the details of your software configuration, sorting e-mail into usable folders does, because the rules you write need to match the software you use to read your e-mail. This can be complex, but it's well worth the effort, and you'll soon be enjoying the benefits of automated e-mail sorting.
Electronic harassment and denial-of-service terrorism are realities in today's Internet world. Experience with a filtering infrastructure is a significant advantage in recovering from such attacks. Trying to learn the filtering part of a defense in the middle of an attack, however, is a bit like taking your first bicycle ride in a windstorm. So start now.
If you're receiving e-mail on a Unix host, you can probably use procmail to help clean spam out of your mailbox. Once you've got that down, it's likely that you'll go on to instruct procmail to handle more sophisticated sorting and controlling. The Resources links below detail the possibilities.
As with everything we do regarding the Internet, many people contributed to this article. We particularly thank Karl Denninger and Craig Johnston for their time and ideas, and Nancy McGough for her several fine FAQs.
About the author
Cameron Laird and Kathryn Soraiz manage their own software consultancy, Network Engineered Solutions, from just outside Houston, TX. Reach Cameron at firstname.lastname@example.org.