Counterstrike on Spam

Paul Graham highlighted an interesting concept for fighting spammers: make anti-spam tools launch a counterstrike against the sites that spammers promote. A blacklist would be built to include repeat offenders. When a spam message arrives, the server would check whether the site it promotes is on the blacklist. If it is, the tool would crawl that site, generating useless traffic for the spammer's server and so increasing the cost of sending out spam.
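The blacklist check described above can be sketched in Python. This is a minimal sketch under assumptions: the blacklist is a plain set of domain names, a listed domain also covers its subdomains, and the example domains and helper names (`extract_hosts`, `is_listed`, `counterstrike_targets`) are hypothetical. It only decides which sites in a spam message would qualify for a counterstrike; the retrieval itself is left out.

```python
import re
from urllib.parse import urlparse

# Hypothetical blacklist of repeat-offender domains (illustrative data only).
BLACKLIST = {"cheap-pills.example", "get-rich.example"}

# Rough pattern for http(s) URLs in a plain-text message body (an assumption;
# real mail would also need MIME and HTML decoding first).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_hosts(body: str) -> set[str]:
    """Return the lower-cased host names of every http(s) URL in the body."""
    hosts = set()
    for url in URL_PATTERN.findall(body):
        url = url.rstrip(".,;:!?)")  # drop sentence punctuation glued to the URL
        host = urlparse(url).hostname  # lower-cased; None if there is no host
        if host:
            hosts.add(host)
    return hosts


def is_listed(host: str, listing: set[str]) -> bool:
    """True if the host or any of its parent domains appears in the listing."""
    parts = host.split(".")
    return any(".".join(parts[i:]) in listing for i in range(len(parts)))


def counterstrike_targets(body: str, blacklist: set[str]) -> set[str]:
    """Hosts promoted in a spam message that are on the blacklist."""
    return {h for h in extract_hosts(body) if is_listed(h, blacklist)}


if __name__ == "__main__":
    msg = "Buy now at http://www.cheap-pills.example/offer or see http://news.example.org."
    print(counterstrike_targets(msg, BLACKLIST))
```

Run on the sample message, only `www.cheap-pills.example` is selected, because its parent domain is blacklisted; `news.example.org` is left alone. Matching parent domains keeps a spammer from dodging the list simply by rotating subdomains.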

On its face, the argument seems to work. Some more thoughts on it:

High-volume auto-retrieval would only be practical for users on high-bandwidth connections, but there are enough of those to cause spammers serious trouble.

This could be handled by the mail servers themselves. Most mail servers sit on broadband lines because they need to be always on to receive mail. If such a counterstrike is to work, it has to come from those mail servers.

A refinement to the system would be to also include a whitelist, which would allow publishers to register in order to avoid a counterstrike. One of the difficult issues in dealing with spam is identifying false positives. A false positive occurs when a piece of mail is marked as spam even though it isn't spam. Most false positives arise from publishers' mailing lists, because some of the criteria used to identify spam (lots of URLs, many recipients, a sender that differs from the reply-to address, etc.) are also met by publishers. A whitelist would remove some of those false positives, and over time an increasing number of legitimate sources would be identified.
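A whitelist check placed in front of the spam heuristics can be sketched as follows. This is a minimal sketch under assumptions: the `Message` fields, the thresholds in `looks_like_spam`, the two-out-of-three rule and the whitelisted domain are all hypothetical, and the heuristic only uses the three criteria named in the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str      # address in the From: header
    reply_to: str    # address in the Reply-To: header
    recipients: int  # number of recipients
    url_count: int   # number of URLs in the body


# Hypothetical whitelist of registered publisher domains.
WHITELIST = {"newsletter.example.com"}


def sender_domain(address: str) -> str:
    """Domain part of an e-mail address, lower-cased."""
    return address.rsplit("@", 1)[-1].lower()


def looks_like_spam(msg: Message) -> bool:
    """Toy heuristic: flag the message if at least two of the criteria hold.

    The thresholds (5 URLs, 100 recipients) are illustrative assumptions.
    """
    signals = [
        msg.url_count >= 5,
        msg.recipients >= 100,
        msg.sender.lower() != msg.reply_to.lower(),
    ]
    return sum(signals) >= 2


def classify(msg: Message, whitelist: set[str]) -> str:
    """Return 'spam' or 'ham'; whitelisted publishers are never flagged."""
    if sender_domain(msg.sender) in whitelist:
        return "ham"  # registered publisher: no spam flag, no counterstrike
    return "spam" if looks_like_spam(msg) else "ham"


if __name__ == "__main__":
    newsletter = Message("editor@newsletter.example.com",
                         "replies@newsletter.example.com", 5000, 12)
    print(classify(newsletter, set()))      # heuristics alone: a false positive
    print(classify(newsletter, WHITELIST))  # registered publisher passes
```

The newsletter trips all three criteria, so without the whitelist it is classified as spam; once its domain is registered it passes. Keying the whitelist on the From: domain alone would be easy to forge, so a real deployment would need some way to verify the sender, which this sketch does not attempt.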

There is, however, a need for some level of accountability. Any list (blacklist or whitelist) should be published for everyone to look at, and a policy should set out the steps for being added to or removed from either list.

Another item that needs to be addressed is the user-agent string sent by such filters that fight back. Such a tool should use a popular user-agent, like the one for IE, so that its requests are indistinguishable from other traffic and harder to block. The tool should also send its requests from a range of IP addresses belonging to hosts under the mail server's domain (otherwise, a spam site might simply block the mail server's address to stop the traffic blasts).

In general, I like the concept and hope that someone out there is working on implementing it. It falls within the great tradition of the net routing around problems. Spam has reached the point where it undermines the net as a whole, as spam messages consume increasing amounts of bandwidth. Regulation alone cannot work, because much spam originates in countries beyond US jurisdiction and it would take a very long time to implement any kind of policy that works globally.

The answer to the spam problem must thus be an engineered solution, and the counterstrike approach seems sound. One could envision it being built into mail servers in the future, a step that would ensure wider support.

However, the counterstrike approach will only work for so long, as spammers will find new ways to subvert the system. The question remains what the next step will be. The counterstrike model may squeeze out some of the smaller players, but larger sites might continue to spam. One way to handle them might be to black them out of the net completely: take the blacklist of sites and add it to a web proxy's blocked-site list. The effect would be to black out sites over time, as they are found guilty of spamming. This may be the next level of escalation in the spam wars, and it might get us to the point where, unfortunately, we all end up in gated online communities, blocking out the people who are not willing to play nice. The net may lose some of its freedom in the process, but that, unfortunately, may be the only way to completely eradicate spam in the future.
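Feeding the blacklist into a proxy's blocked-site list can be sketched like this. It is a minimal sketch under assumptions: the blacklist arrives as a text file with one domain per line and `#` comments, the helper names are hypothetical, and the output targets Squid's `dstdomain` ACL file format (one domain per line, where a leading dot matches the domain and all of its subdomains) purely as one example of a proxy list.

```python
from urllib.parse import urlparse


def load_blocklist(lines) -> set[str]:
    """Normalise blacklist lines: drop comments and blanks, lower-case, strip leading dots."""
    blocklist = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip().lower().lstrip(".")
        if entry:
            blocklist.add(entry)
    return blocklist


def is_blocked(url: str, blocklist: set[str]) -> bool:
    """What the proxy would decide: block if the host or a parent domain is listed."""
    host = urlparse(url).hostname
    if host is None:
        return False
    parts = host.split(".")
    return any(".".join(parts[i:]) in blocklist for i in range(len(parts)))


def to_squid_dstdomain(blocklist: set[str]) -> str:
    """Render the list in Squid dstdomain style (leading dot covers subdomains)."""
    return "".join("." + domain + "\n" for domain in sorted(blocklist))


if __name__ == "__main__":
    raw = ["# repeat offenders", "Cheap-Pills.example", "", ".get-rich.example"]
    blocked = load_blocklist(raw)
    print(is_blocked("http://www.cheap-pills.example/buy", blocked))
    print(to_squid_dstdomain(blocked), end="")
```

Sorting the entries makes the generated file stable from one run to the next, which helps if the published list is kept under version control as part of the accountability policy discussed earlier.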
