Filter spam out!
Protect yourself from spam: A practical guide to procmail
Need protection against spam? This article introduces the freeware utility procmail and shows you, step by step, how to create a mail filter. But procmail can be used for much more than filtering spam. We'll also give you ideas for more advanced filtering features, like linking e-mail to your pager or sorting useful mail. (2,500 words)
Favorable circumstances
There are quite a few ways to automate spam removal for your personal in-box, and the
solution you choose will depend on the type of e-mail service you employ.
We'll focus on situations where a Unix machine hosts your e-mail reception.
You might use pine, mh, mail, or another command-line utility to read
your messages while logged directly onto the host; or perhaps your personal
computer connects to Unix through a POP (post office protocol) server to retrieve
your correspondence. In either case, we'll show you how to build a useful
filter into the Unix machine that delivers your e-mail to you. We call this a "host-based
filter."
Can you get an off-the-shelf utility to do your filtering? Yes and no. Point-and-click desktop clients like Eudora, AK-Mail, Pegasus, FirstClass, and Outlook have filtering capabilities, but their use presents a number of liabilities.
Your first e-mail filter
We rely on procmail for automating e-mail traffic. We'll focus on its installation
and use (for an overview of the procmail utility, see Resources below).
First, verify that procmail is installed on your host. Most ISPs that provide shell access make it available as /usr/local/bin/procmail, /var/bin/procmail, or some local variation. Corporate networks are less likely to have procmail, so don't be surprised if you can't find it on your host. The directions available at the procmail home page are easy, so you or your sysadmin will have it up and running in short order.
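One way to run this check is a small shell function. This is only a sketch: the function name find_procmail and the list of candidate directories are our own assumptions, so adjust them for your host.

```shell
#!/bin/sh
# Sketch: report where procmail lives, if anywhere.
# find_procmail is a hypothetical helper; the directories probed at the
# bottom are guesses, not an exhaustive list.

find_procmail() {
    # First choice: whatever the shell finds on the current PATH.
    if command -v procmail >/dev/null 2>&1; then
        command -v procmail
        return 0
    fi
    # Otherwise probe the directories passed as arguments.
    for dir in "$@"; do
        if [ -x "$dir/procmail" ]; then
            printf '%s\n' "$dir/procmail"
            return 0
        fi
    done
    return 1
}

if found=$(find_procmail /usr/local/bin /usr/bin /var/bin); then
    echo "procmail found at $found"
else
    echo "procmail not found; ask your sysadmin"
fi
```

If the script reports a path, use that path wherever the examples in this article mention /usr/local/bin/procmail.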
Once you've confirmed that procmail is available as a Unix executable, take these steps:
1. Create the file $HOME/.procmailrc with contents:
    VERBOSE=on
    MAILDIR=$HOME/mail
    PMDIR=$HOME/.procmail
    LOGFILE=$PMDIR/log
    INCLUDERC=$PMDIR/rc.personal
2. Create the file $HOME/.procmail/rc.personal with contents:
    :0:
    * ^Subject.*cable descrambler
    discard
3. Create the file $HOME/.forward with contents:
    "|IFS=' ' && exec /usr/local/bin/procmail -f- || exit 75 #YOUR EMAIL NAME"
   This is the step that's likely to involve the most variation from one system to another. In the last item in the Resources list below, we list the .forward commands we've found to work best under a variety of operating systems.
4. Set the permissions:
    chmod 644 $HOME/.forward
    chmod a+x $HOME
5. Send yourself a test message:
    sendmail YOUR_EMAIL_ADDRESS
    Subject: send us money

    blah, blah
    .
6. Send yourself a second message, one the filter should catch:
    sendmail YOUR_EMAIL_ADDRESS
    Subject: send us money for our cable descrambler

    blah, blah
    .
7. Check the results. The first message should arrive in your in-box as usual, and there should be a new file, $HOME/mail/discard, that holds the "send us money for our cable descrambler" message.
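The file-creation steps above can be sketched as a single shell function. Treat it as an illustration under stated assumptions, not a definitive installer: the name setup_procmail_filter and its three arguments are hypothetical, the mkdir line reflects our assumption that the directories named in .procmailrc should exist before mail arrives, and the -f- spelling follows the procmail man page.

```shell
#!/bin/sh
# Sketch of the setup steps as a function. Nothing here touches your real
# home directory unless you pass it in explicitly.

setup_procmail_filter() {
    base=$1            # directory to populate, e.g. "$HOME"
    procmail_bin=$2    # e.g. /usr/local/bin/procmail
    user=$3            # your e-mail name, used as the .forward comment

    # Assumption: MAILDIR and PMDIR should exist before the first delivery.
    mkdir -p "$base/mail" "$base/.procmail" || return 1

    # Main configuration. The quoted 'EOF' keeps $HOME and $PMDIR literal,
    # so procmail (not this script) expands them.
    cat > "$base/.procmailrc" <<'EOF'
VERBOSE=on
MAILDIR=$HOME/mail
PMDIR=$HOME/.procmail
LOGFILE=$PMDIR/log
INCLUDERC=$PMDIR/rc.personal
EOF

    # The first rule: file matching messages in the "discard" folder.
    cat > "$base/.procmail/rc.personal" <<'EOF'
:0:
* ^Subject.*cable descrambler
discard
EOF

    # Hand incoming mail to procmail. This heredoc is unquoted on purpose,
    # so the two shell variables are filled in now.
    cat > "$base/.forward" <<EOF
"|IFS=' ' && exec $procmail_bin -f- || exit 75 #$user"
EOF

    # Permissions from the steps above.
    chmod 644 "$base/.forward"
    chmod a+x "$base"
}

# Example call (commented out so that sourcing this file changes nothing):
#   setup_procmail_filter "$HOME" /usr/local/bin/procmail YOUR_EMAIL_NAME
```

Running it first against a scratch directory lets you inspect the three files before you commit to them.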
You're in business. You now have the means to protect yourself from messages you don't want to see.
Variations on a theme
There are many steps you'll want to take to improve your filter.
You can add a second rule, and your ruleset becomes:
    :0:
    * ^Subject.*cable descrambler
    discard

    :0:
    * ^Subject.*adults only
    discard
You can filter out messages from sites notorious for their junk:
    :0:
    * ^From: CyberMarketing@hotmail.com
    discard
or from suspicious-looking people:
    :0:
    * ^From anonymous
    discard

    :0:
    * ^From bulk
    discard
You'll probably want to throw away the electronic equivalent of "Deliver to Resident" messages,
    :0:
    * ^To:.*public.com
    discard
as well as anything courteous enough to announce itself as an unsolicited advertisement:
    :0:
    * ^X-Advert
    discard
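Before adding a pattern to your ruleset, it can help to try it against a message you've saved to a file. The sketch below uses grep -E as a stand-in for procmail's matcher; that is an approximation, so treat the result as a hint only. The name header_matches is hypothetical.

```shell
#!/bin/sh
# Sketch: try a procmail-style header pattern against a saved message.
# header_matches is a hypothetical helper, not part of procmail.

header_matches() {
    pattern=$1         # the text after "* " in a rule
    message_file=$2    # a single message saved as a plain file
    # Keep only the header (everything up to the first blank line), then
    # search it without regard to case, as procmail does by default.
    sed '/^$/q' "$message_file" | grep -Eiq -e "$pattern"
}

# Example (commented out; saved-message.txt is a placeholder name):
#   header_matches '^Subject.*cable descrambler' saved-message.txt \
#       && echo "this rule would catch the message"
```

Because only the header is searched, a phrase that appears in the body of a message will not trigger the rule.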
Remember, you can combine all these filters and more in a single
ruleset. It's entirely practical to have dozens or even hundreds
of rules. You're better off starting small, so you can understand any
peculiarities of your particular system. It's a good idea to start
with rules that specify a destination, such as "discard." This gives
you a chance to see what you're tossing out of your in-box, because
it's still available in the file $HOME/mail/discard.
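To review what the filter has set aside, you can list the subject line of each message in the discard file. This sketch assumes the file is in the usual Unix mailbox format, where every message begins with a line starting with "From "; the name list_discarded_subjects is hypothetical.

```shell
#!/bin/sh
# Sketch: print the Subject: header of every message in a mailbox file.
# list_discarded_subjects is a hypothetical helper.

list_discarded_subjects() {
    mbox=$1
    awk '
        # A "From " line opens a new message and its header.
        /^From /                          { in_header = 1; seen = 0; next }
        # The first blank line ends the header.
        in_header && /^$/                 { in_header = 0 }
        # Print only the first Subject: line inside each header.
        in_header && !seen && /^Subject:/ { print; seen = 1 }
    ' "$mbox"
}

# Example (commented out):
#   list_discarded_subjects "$HOME/mail/discard"
```

If a legitimate subject turns up in the list, loosen or remove the rule that caught it before you trust the filter further.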
Eventually you might want such rules as:
    :0:
    * ^Subject.*adults only
    /dev/null
which immediately tosses away the messages it detects. Procmail reads the third line in a rule -- "discard" in previous examples -- as a filename. Every modern Unix observes the convention that /dev/null
acts like a file with the special property
that anything written to it is ignored. The consequence in this
case is to delete irretrievably any e-mail with a "Subject" that
includes the phrase "adults only."
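If the /dev/null convention is new to you, a two-line experiment at the shell prompt shows it at work. This is a minimal sketch; the sample text is arbitrary.

```shell
#!/bin/sh
# Anything written to /dev/null is thrown away; reading it back yields nothing.
echo "adults only offer" > /dev/null
bytes=$(wc -c < /dev/null)
echo "bytes kept: $((bytes))"
```

The write succeeds, yet nothing is stored, which is why a rule that delivers to /dev/null cannot be undone.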
Advanced defenses
Simple combinations of the rules above are likely to
eliminate at least three-quarters of your unwanted e-mail.
For more precise filtering, read about
procmail's regular expression matches, shell-command
invocation, and other advanced features in the Resources listed
below.
Policy and performance
Installation was easy enough; now let's look at the fine print: the policy
and performance issues you'll want to know about.
Before you begin experimenting with procmail, verify that it's permitted on your system. We know of no serious hazards associated with using procmail -- in fact, procmail was constructed to be a benign, low-impact Unix utility -- but some organizations have strict policies about installing or using certain kinds of software. You'll need to know what the local rules are before you embark on your procmail adventures.
Scalability is the one negative performance issue associated with procmail. An active e-mail host can be required to process millions of messages a day. Whatever procmail's performance profile, launching it at that rate is likely to affect system performance. "It's a pig," summarizes Karl Denninger, founder of MCSNet, a leading Chicagoland ISP known for its technical sophistication. Procmail was designed for small-scale use, so it can seriously distort operations on a heavily loaded machine.
Despite this, procmail comes out a winner. It may take extra milliseconds to process each incoming message, but that's generally preferable to piling up unwanted e-mail for human users to sort. In fact, sysadmins at the ISPs and corporate operations we contacted unanimously agreed that, despite minor performance and policy issues, they allow and even encourage the use of procmail. MCSNet, for example, is known for its aggressive centralized (non-procmail) anti-spam defenses, which filter hundreds of thousands of unwanted messages daily, and its technical staff is always on the lookout for programs that impact system performance. Yet MCSNet not only permits the use of procmail; its support staff is trained to help users develop their own filters.
Are you ahead yet?
You've invested a little time in reading this
article, and maybe you've installed procmail and set up your first filter.
It has already begun looking out for your interests. Soon it will be saving you
several seconds a day that would otherwise be consumed by deleting junk mail.
Your investment will easily pay for itself in the next year, perhaps
sooner.
Only the beginning
So far, we've presented procmail as a way to filter out
undesirable messages. Now we'll look at the ways procmail can be used to
positively enhance the value of your e-mail traffic. You'll probably find that,
much as you love filtering spam out, your favorite uses for procmail
involve filtering useful e-mail in. To conclude this
article, we'll suggest four general uses for your filter:
Host-wide filtering protects all users
You've constructed a filter for your e-mail. Other users on your
system have the option to do the same. But why not set up a single
ruleset to scan and filter all messages (for all users) that arrive
at a particular site? Many Internet and online service providers,
for example, now advertise spam filtering as a benefit to customers.
Generally, this means they discard all inbound e-mail from a list of
known spammers. With procmail, much more precision is possible.
Sysadmins can use all the techniques we've illustrated above to develop front-end
filters for their sites.
Caution: Site-wide automation is a considerably more delicate matter than the individual filters previously discussed. Host-wide filtering entails serious policy and performance considerations. Be conservative, because e-mail is likely to be among your system's most mission-critical mechanisms. Even if you know your goal is site-wide protection, and even if you know you won't be using procmail, we recommend that you begin your experimentation with a personal filter. It's a valuable warm-up.
E-mail and the real world: connecting e-mail to external actions
One entertaining category of use for procmail is as a filter
between e-mail in-boxes and the real world. You can, for example, combine procmail
with other utilities to link e-mail to your pager.
Automation of mail sorting
Defense against spam might have led you to procmail. Automatically
sorting your incoming e-mail, though, is what will ultimately prove most
rewarding.
Most readers are familiar with the notion of multiple "mail folders." We'll briefly introduce the Unix use of this term, and illustrate procmail's role.
Suppose you're a user on a Unix system. You know how to send mail and receive it. You use a command-line utility -- perhaps mail or pine -- to read your mail, which appears in a single list, sorted by date of arrival.
Let's say you subscribe to the SunWorld reader alerts and also to hppd-users, so you can learn about porting software to HP-UX. Take the ruleset you wrote for procmail above and append these rules:
    :0:
    * ^From owner-hppd-users@cv.ruu.nl
    hppd-users

    :0:
    * ^From SunWorld@FDDS.COM
    SunWorld
The SunWorld and hppd-users traffic will no longer appear in your "personal" mailbox. Instead, those messages will automatically be sorted into their own "folders," positioned in the file system as $HOME/mail/hppd-users and $HOME/mail/SunWorld. You can access these folders from the command line with invocations like mail -f $HOME/mail/hppd-users, pine -f $HOME/mail/SunWorld, and so on.
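Once several folders exist, a quick count of the messages in each tells you where your mail is going. The sketch below assumes every regular file in the mail directory is a folder in the usual Unix mailbox format, one "From " line per message; the name count_folder_messages is hypothetical.

```shell
#!/bin/sh
# Sketch: print "folder count" for each mailbox file in a directory.
# count_folder_messages is a hypothetical helper.

count_folder_messages() {
    mail_dir=$1
    for folder in "$mail_dir"/*; do
        # Skip subdirectories and anything else that is not a plain file.
        [ -f "$folder" ] || continue
        # grep -c counts the "From " separator lines, one per message.
        printf '%s %s\n' "$(basename "$folder")" "$(grep -c '^From ' "$folder")"
    done
}

# Example (commented out):
#   count_folder_messages "$HOME/mail"
```

A folder whose count never grows is a sign that the rule feeding it no longer matches the sender's address.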
The effect is analogous to the difference between having everything piled atop your desk and using individual manila envelopes for separate projects or activities. Focusing your attention productively is a tremendous boost. It will rationalize your correspondence enormously.
That's the idea of mail folders. You'll want to tailor them to your situation: You might customize a filter to sort e-mail from particular users, or by subject; configure a personal computer to access and maintain different folders on your Unix host; or write rules that duplicate certain messages, so copies appear in more than one folder.
While the ability to filter spam doesn't depend on the details of your software configuration, sorting e-mail into usable folders does, because the rules you write need to match the software you use to read your e-mail. This can be complex, but it's well worth the effort, and you'll soon be enjoying the benefits of automated e-mail sorting.
Terrorism insurance
Electronic harassment and denial-of-service terrorism are realities
in today's Internet world. Experience with a filtering infrastructure
is a significant advantage in recovering from such
attacks. Trying to learn the filtering part of a defense in the
middle of an attack, however, is a bit like taking your first bicycle
ride in a windstorm. So start now.
Conclusion
If you're receiving e-mail on a Unix host, you can probably use
procmail to help clean spam out of your mailbox. Once you've
got that down, it's likely that you'll go on to instruct
procmail to handle more sophisticated sorting and controlling. The
Resources links below detail the possibilities.
Acknowledgement
As with everything we do regarding the Internet, many people
contributed to this article. We particularly thank Karl Denninger
and Craig Johnston for their time and ideas, and Nancy McGough for
her several fine FAQs.
Resources
About the author
Cameron Laird and Kathryn Soraiz manage their own software consultancy, Network Engineered Solutions, from just outside Houston, TX.
Reach Cameron at cameron.laird@sunworld.com.
URL: http://www.sunworld.com/swol-12-1997/swol-12-spam.html