The Internet is awash in a sea of spam. Email users complain to their mail providers, and the mail providers try to Do Something about the spam. It won't come as a surprise that some of the things mail providers try to do are misguided, not to say evil. This even includes vendors of mail systems, who should know better.
Among the most frequent (and most evil) things people do to try to combat spam are sending synthetic bounces and silently discarding messages.
The term "synthetic bounce" might not be familiar. Some people call this a "delayed bounce." To describe a synthetic bounce, we first have to describe a real bounce. When someone attempts to send a message that the destination server knows it can't (or won't) deliver, the destination server replies with an SMTP permanent error code. These are the codes that start with a five. 550 recipient unknown is one that many people have seen. The sending server formats a delivery failure message and returns it to the sender's individual mailbox. The message has "bounced off" the server for which it was destined; it never got inside.
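To make the "codes that start with a five" concrete, here is a minimal sketch that sorts SMTP reply codes by their first digit. The function name and the category labels are illustrative choices, not part of any mail server's API.

```python
def classify_smtp_reply(code: int) -> str:
    """Classify a three-digit SMTP reply code by its first digit."""
    if not 200 <= code <= 599:
        raise ValueError(f"not an SMTP reply code: {code}")
    first_digit = code // 100
    if first_digit == 2:
        return "accepted"            # e.g. 250 OK
    if first_digit == 3:
        return "intermediate"        # e.g. 354, go ahead and send the data
    if first_digit == 4:
        return "temporary failure"   # the sender may retry later
    return "permanent failure"       # 5xx: a real bounce, e.g. 550
```

A 550 falls in the last group: the destination server never took responsibility for the message, so it is the sending server that reports the failure to its own user.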
Some, maybe many, misguided attempts at spam filtering allow the destination server to accept a message. After it's inside, it gets scanned for virus infection and filtered to see whether it might be spam. If it turns out to be something you don't want to drop into the recipient's mailbox, you have a problem because you have already accepted the message for delivery.
Things go from misguided to evil when you decide to create, that is, synthesize, your own bounce message and send it back. Think about it for a minute; you didn't bounce the message, you caught it, and now you want to throw it back. The problem is, to whom do you throw it? If the message is legit, then you can just send your fake bounce to the return address. But duhhhhh! The reason you want to return the message is that you're pretty sure it isn't legit!
OK, quick... how many spammers and virus writers put their own return addresses on the garbage they send out?
Right in one guess! Zero! If the message isn't legitimate, neither is the return address, and your synthetic bounce will go astray. One of two things will happen. If the world is a lucky place at that moment, it'll come back to you because the return address was completely fake. Now you have Yet Another piece of bogus mail to deal with. That's somewhat fair because you created the bogus mail in the first place.
Spammers are more likely to use legitimate addresses, just not their own! Virus writers pick addresses at random from the address books of machines they've already infected. And, some "fake" addresses will match real mailboxes. So, some innocent third party will get your synthetic bounce. You've wasted their time, stolen their bandwidth, and deposited into their mailbox a piece of junk they can't do anything with. But they're likely to read it and ponder over it to figure that out. For a longer analysis of this problem, check this article: http://www.ironport.com/company/pp_enterprise_it_planet_05-02-2006.html. Do note that the problem is only with synthetic bounces; real, SMTP connection-level bounces go only to the originating server. That's what should happen.
Synthetic bounces are evil!
The problem with silently discarding messages is false positives. We are trying to discard spam, and the spammers are trying to make their junk look legit so we won't discard it. It is inevitable that some legitimate email will be misidentified as spam. How much? Not very much, but what if one of the discarded messages is the one offering you that new job, or that book contract? You never see it. The sender, believing the message has been delivered, thinks you're ignoring it. Bad. Evil, even.
It's also unnecessary. There are effective ways of stopping 70% or more of spam without discarding anything.
Some people would rather risk a few legitimate messages being discarded than deal with the remaining 25-30% of spam, and there are ways to allow them to make that decision without forcing it on every user of a mail domain.
Silently discarding messages is evil.
Stopping spam and virus-infected email should be a two-stage process: SMTP rejection of as much junk as you can identify at the edge of your network and possibly cleaning and tagging within. You can stop more than two-thirds of spam before it ever gets inside your network. That may reduce the volume that you have to clean and tag to a tolerable level.
If your mail gateway server can determine that a particular message cannot or should not be delivered, your server should issue an SMTP permanent error. If a message rejected this way was really legitimate, the sender will get a non-delivery message from his own mail server and will know that he must try another way of contacting the intended recipient.
Messages that cannot be delivered are those that are addressed to unknown users. That means your mail gateway has to know who your legitimate users are. That sounds like a duhhh! statement, but some organizations use a "firewall" server that just relays everything. That creates an unnecessary internal load. If you have a separate mail relay machine at the edge of your network, use something like LDAP to tell it who your users are. That way, it can reject mail for bogus addresses.
Some people erroneously believe that allowing a gateway mail relay to know who their users are somehow makes them more vulnerable to dictionary-attack harvesting. It's not true. If the gateway accepts everything, the spammers will just assume that all the addresses they try are deliverable, and your spam load will go up!
If you have internal-use-only mailing lists, reject mail for those, too. After all, they're for internal use, right?
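The recipient checks above can be sketched as a single decision made while the SMTP connection is still open. This is only an illustration under stated assumptions: the two sets stand in for whatever a real gateway would look up through LDAP, and the addresses, the function name, and the reply text are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical data; a real gateway would query a directory such as LDAP.
KNOWN_USERS: Set[str] = {"alice@example.edu", "bob@example.edu"}
INTERNAL_ONLY_LISTS: Set[str] = {"all-staff@example.edu"}

def rcpt_reply(address: str,
               known_users: Set[str] = KNOWN_USERS,
               internal_lists: Set[str] = INTERNAL_ONLY_LISTS) -> Tuple[int, str]:
    """Return the SMTP reply an edge gateway might give for one recipient."""
    addr = address.strip().lower()
    if addr in internal_lists:
        # Internal-use-only lists are not reachable from outside.
        return (550, "Recipient does not accept mail from outside")
    if addr in known_users:
        return (250, "OK")
    # Unknown user: reject now, so the sending server reports the failure.
    return (550, "Recipient unknown")
```

For example, `rcpt_reply("nobody@example.edu")` gives `(550, "Recipient unknown")`, which is a real bounce rather than a synthetic one.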
Messages that should not be delivered are spam and those with malicious attachments.
You can reject mail from known spam sources by using a "realtime black list" or RBL. These are maintained by subscription services or volunteers, at costs from free to several thousand dollars a year. There are many levels of accuracy and aggressiveness available. I believe one of the best is the Spamhaus SBL+XBL list. It is also one of the least expensive.
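Most DNS-based block lists are queried the same way: reverse the octets of the connecting IPv4 address, append the list's zone, and look the name up. An answer in 127.0.0.0/8 means "listed"; no such name means "not listed". The sketch below assumes that convention. The zone `dnsbl.example.org` is a placeholder, not a real list, so substitute the zone your provider documents.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a block list."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # validates the address
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip: str, zone: str) -> bool:
    """Return True if the block list answers with a 127.x.x.x address."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No such name (or the lookup failed): treat as not listed.
        return False
    return answer.startswith("127.")
```

Note the design choice in `is_listed`: a failed lookup is treated as "not listed", so a DNS outage lets mail through instead of rejecting everything.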
You will have some false positives with RBLs, especially the more aggressive ones. If mail is rejected at the SMTP level, false positives are bad, but not evil, because the sender will be informed that his message wasn't delivered.
Spammers have gotten wise to RBLs, and the bulk of spam these days (2009) is delivered by "bot nets." A bot net is a collection of computers infected by malicious software. The computers are owned by others, but used surreptitiously by the spammer. The spam is actually sent by a computer at the end of a DSL line someplace, making it hard for the RBLs to keep up.
Some "email filter appliances" may be able to detect virus payloads while the SMTP connection is open. If you can detect them, reject them! Again, false positives are bad, but not evil, because the sender will be notified.
There is an open-source program that will do RBL checks, rule-based spam-hunting, and virus scanning, all at the SMTP level. It'll also tag messages that are probably, but not certainly, spam as described below. This program is ASSP, the Anti-Spam SMTP Proxy. It's written in Perl and provides a front-end for your real mail server. Configuring ASSP is not for the faint of heart, but you can do it, and it does work. (However, even ASSP will let some spam through; remember, it's a battle of wits between you and the spammer, and sending spam is how the spammer makes his living.)
The final lesson is, once you've accepted a message for delivery, you may not discard it nor generate a synthetic bounce. Whether it's a frog or a prince, once you've kissed it, it's yours!
If a message that has been accepted at the SMTP level is later found to have a virus infection, remove the infected attachment and deliver the message. Sure, it's probably junk, but it just might be that book contract! Let the recipient decide. (You might also consider tagging the message as described below.)
If a message that's been accepted appears to be spam based on some filtering criterion, tag it as probable spam and deliver it.
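The rule for accepted mail can be written down as a tiny policy function. Whatever the scanners find, the list of actions always ends in delivery; there is no branch that discards or bounces. The function name, the action strings, and the threshold of 5.0 are assumptions made for this sketch only.

```python
from typing import List

def handle_accepted_message(has_infected_attachment: bool,
                            spam_score: float,
                            spam_threshold: float = 5.0) -> List[str]:
    """Decide what to do with a message that was already accepted over SMTP."""
    actions: List[str] = []
    if has_infected_attachment:
        actions.append("remove infected attachment")
    if spam_score >= spam_threshold:
        actions.append("tag as probable spam")
    # Once accepted, the message is yours: never discard, never bounce.
    actions.append("deliver")
    return actions
```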
One option is to deliver mail tagged as spam to a Junk Mail or Quarantine folder that's accessible to the recipient. Recipients should be encouraged to look in the Junk Mail folder from time to time, both to check for misclassified mail (false positives) and to purge the spam. It might not be too misguided to auto-delete mail from the Junk Mail folder after it has been there, say, 30 days.
If your mail recipients use a variety of clients, as is common for ISPs and universities, you probably can't enforce the use of a Junk Mail folder, and some mail programs may not even have such an option. You should still tag messages that appear to be spam and help your users make the best use of those tags. You can publish on the Web instructions for using your particular tags with popular mail programs like Thunderbird and Eudora. One university that's pretty generally on the ball (Nova Southeastern University) prepends "** SPAM Score 7.0 **" (or whatever the number might be) to the subject of a message thought to be spam. Almost any email client can filter on the subject, and even people who choose not to try filtering can see immediately that a message has been tagged as spam. Altering the subject line is likely to be much more useful than adding a custom header that will be effectively invisible to most of your users. Best practice, as exhibited by Nova, is to do both. Most recipients will use the tag in the subject, but detailed information remains available to sophisticated mail users in the form of custom headers.
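Here is a minimal sketch of "do both" using Python's standard `email` package: prepend the visible tag to the subject and record the score in a custom header. The subject format follows the example quoted above; the header name `X-Spam-Score` is an illustrative choice, not necessarily the one Nova uses.

```python
from email.message import EmailMessage

TAG_PREFIX = "** SPAM Score"

def tag_as_spam(msg: EmailMessage, score: float) -> EmailMessage:
    """Prepend a spam tag to the subject and add a custom score header."""
    subject = str(msg.get("Subject", ""))
    if not subject.startswith(TAG_PREFIX):       # don't tag the same message twice
        del msg["Subject"]                       # no error if the header is absent
        msg["Subject"] = f"{TAG_PREFIX} {score:.1f} ** {subject}".strip()
    del msg["X-Spam-Score"]                      # hypothetical header name
    msg["X-Spam-Score"] = f"{score:.1f}"
    return msg
```

A message with the subject "Hello" and a score of 7 comes out with the subject "** SPAM Score 7.0 ** Hello", which any mail client can filter on, plus a header for those who want the number itself.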
I promised you a way to allow some users to auto-delete mail flagged as spam without forcing that choice on everyone. Tagging is it. Instead of a rule that files tagged messages in a Junk Mail folder, those who want to can write a rule that just deletes the messages. Auto-deleting is still a Bad Idea, but if it's the recipient's choice and not some mail administrator's choice, it's not evil.
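On the recipient's side, the rule amounts to one test on the subject line. The sketch below assumes the "** SPAM Score 7.0 **" subject format quoted earlier; the folder names and the `auto_delete` switch are hypothetical stand-ins for whatever a given mail program's filter settings offer.

```python
import re

# Matches a subject that begins with a tag such as "** SPAM Score 7.0 **".
SPAM_TAG = re.compile(r"^\*\* SPAM Score (\d+(?:\.\d+)?) \*\*")

def route_message(subject: str, auto_delete: bool = False) -> str:
    """Return where one user's filter rule would send a message."""
    if SPAM_TAG.match(subject):
        # The individual recipient, not the mail administrator, picks this.
        return "delete" if auto_delete else "Junk Mail"
    return "Inbox"
```

The default keeps tagged mail in a Junk Mail folder; only a user who deliberately sets `auto_delete=True` loses anything.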
So, you see, it is possible to confront spam without being evil.
The best spam is none at all. You don't have to filter it, check for it, or "just hit delete." Because it's so cheap to do so, spammers will try thousands of addresses just to see whether any of them works. But, what they really want are guaranteed valid, working email addresses. There are three things you can do to reduce the probability that they'll get yours:
There's some tension here, because if you have a Web page, you want people to be able to contact you. However, spammers use Web "spiders" to troll the Web looking for email addresses, and you don't want them to find yours. I've written a short article on a bullet-resistant method of cloaking email addresses that may help. (It's "bullet-resistant" because spammers can defeat it. The hope is that they believe there are easier pickings elsewhere.)
Every time you put your email address in a form, you take the chance that it'll be sold as part of an electronic mailing list. If there is no reason for someone to have your email address, provide a bogus address. (Although the domain bogus.com exists, I am pretty sure they know what to do with email addressed to email@example.com. Alternatively, you could try firstname.lastname@example.org, the address the Federal Trade Commission uses to collect spam.)
If you have to give an email address, use a free account that you've set up for the purpose, not your employment email account nor the email account that you use for personal email. Even better, if you can do so, set up one-time accounts; if I buy something from Acme, the email address they get is email@example.com.
When a friend types your email address into a "forward this article" box, sends a "greeting card," or otherwise gives your address to a third party, it increases the opportunity for your address to end up on a spam list. Remember, your friends mean well, so be gentle. Say, "thanks for the interesting article, but in the future, I'd really rather you send a link yourself instead of giving out my email address." Um, and don't you do that with the addresses of others.
Copyright © by Bob Brown. Some rights reserved.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Last updated: 2014-10-02 7:17
This page is an archival record and is no longer being updated.