David Madore's WebLog: Fighting spam (again and again)

As I am decidedly quite sick today, I thought I'd do some spamfighting.

Part one: this blog's comments system. When I launched it initially I thought, naïvely in retrospect, that since I used custom Perl scripts rather than a common package like, say, DotClear, the spambots wouldn't be able to make sense of my forms and I wouldn't receive any spam. Wrong. I get quite a lot of it, really. Since everything is moderated (something I might change again eventually), you never actually see it, but it's a pain (for me) nonetheless. And it's also pretty incomprehensible: why have this entry's comments been loaded with spam, for example, whereas others get none and Google doesn't seem to find any suspicious reverse links? Anyway…

The usual solution is to display an image of distorted text or digits, and ask the reader to type the characters (thus providing work ad nauseam for the Indian coprocessors). Rather than do that, I thought I'd try some simple instructions in English: namely, to copy certain digits and letters in reverse order (so the instructions might say “please enter below the following signs in reverse order: 5d8335”, and the user is expected to type 5338d5). This wouldn't work if everyone started doing the same, because the spambots would soon learn the trick, but I'm trying to take advantage of the fact that I'm the only one using my custom comments system. Now I also don't want to annoy readers by making them type random digits every time they wish to post a comment, so there is a workaround: if your browser has JavaScript enabled, it will “type” the digits for you and you won't even see the input field. If things work correctly, that is (which, Internet Explorer being what it is, isn't very likely, I'm afraid). Again, this won't work if the spambots use a JavaScript interpreter, so I'm taking a gamble here: I don't think they'll run JavaScript code because it'd be rather hard for them to do so and it's probably not worth the effort, and, besides, if they did run JavaScript code in the pages they harvest, they'd be vulnerable to all sorts of attacks (such as using their processing power to compute all sorts of useful things just by making them follow links containing the results of the computations we want).
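
To make the reverse-order challenge concrete, here is a minimal server-side sketch in Python (not the Perl the blog actually runs on). The function names, the `SECRET_KEY` value and the use of an HMAC to carry the challenge in a hidden form field are all assumptions made for illustration; the post does not say how the real scripts remember which digits they asked for.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real deployment would keep this private.
SECRET_KEY = b"change-me"


def new_challenge(length=6):
    """Return `length` random lowercase hex digits to show to the commenter."""
    # token_hex(n) yields 2*n hex digits, so asking for `length` bytes is
    # always enough; keep only the first `length` digits.
    return secrets.token_hex(length)[:length]


def expected_answer(challenge):
    """The commenter must type the challenge in reverse order."""
    return challenge[::-1]


def sign(challenge):
    """HMAC of the challenge, so it can round-trip through a hidden field."""
    return hmac.new(SECRET_KEY, challenge.encode("utf-8"), hashlib.sha256).hexdigest()


def check(challenge, signature, answer):
    """Accept only an untampered challenge answered with its reversal."""
    if not hmac.compare_digest(sign(challenge).encode("utf-8"), signature.encode("utf-8")):
        return False
    typed = answer.strip().lower().encode("utf-8")
    return hmac.compare_digest(typed, expected_answer(challenge).encode("utf-8"))
```

With the example from the text, `expected_answer("5d8335")` is `"5338d5"`. This sketch does nothing about replaying an old challenge and signature pair, which a real form would probably want to limit (with a timestamp, say).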

So, ideally, you should notice no changes in the comments system if you have JavaScript enabled, and, if you don't, you'll just have to type six hex digits to post a comment. I think that's the best I could do to minimize annoyance (the other idea I had in mind was to lay all sorts of blacklisting traps in the page, hidden by CSS, which the spambots would have triggered, getting them banned from the page… but that's hard to tune correctly). Don't hesitate to post a comment to check that it works. And if it doesn't, complain to me by mail (at davidwwwmadoreorg as usual).
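
For comparison, the blacklisting-trap idea mentioned in passing could look roughly like the sketch below. It assumes the form contains extra input fields that CSS hides from human readers, so that only a bot filling in every field would ever submit them. The field names, the in-memory `banned` set and the choice to ban by remote address are hypothetical.

```python
# Hypothetical names of form fields that a stylesheet hides from human readers.
TRAP_FIELDS = ("website", "email_confirm")

# Remote addresses that have triggered a trap (kept in memory for the sketch).
banned = set()


def handle_submission(remote_addr, form):
    """Return "accepted" or "rejected" for one comment submission."""
    if remote_addr in banned:
        return "rejected"
    # A human never sees the trap fields, so any non-blank value is suspicious.
    if any(form.get(name, "").strip() for name in TRAP_FIELDS):
        banned.add(remote_addr)
        return "rejected"
    return "accepted"
```

The tuning problem is visible even here: a browser's form autofill might fill in a hidden field on behalf of a perfectly honest reader, who would then be banned.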

…Which gets me to part two: email. I receive mountainloads of spam. So I have a Bayesian spamfilter. Which, since nearly all the spam I get is in English and very little ham is in English whereas in French it's the converse, has learned to classify not ham and spam but French and English (a much easier problem, really). So any email sent to me in English has a very high probability of being treated as spam (one way to avoid this is to write bugahugathuga somewhere in the subject line, but that's not a satisfactory solution).
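
The language effect is easy to reproduce with a toy naive Bayes classifier. The sketch below is not the filter described above, only a bare word-count model with add-one smoothing, and the six training messages are invented for the illustration: all the spam is English and all the ham is French.

```python
import math
from collections import Counter

# Invented training data: English spam, French ham.
SPAM = [
    "buy cheap pills now",
    "you have won the big prize",
    "click here for the best offer",
]
HAM = [
    "bonjour je viens demain pour le séminaire",
    "merci pour le document sur les corps finis",
    "on se voit lundi pour le café",
]


def train(messages):
    """Count how often each word occurs in a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(message.lower().split())
    return counts


spam_counts = train(SPAM)
ham_counts = train(HAM)
vocab_size = len(set(spam_counts) | set(ham_counts))


def log_score(message, counts, prior):
    """log P(class) + sum of log P(word | class), with add-one smoothing."""
    total = sum(counts.values())
    score = math.log(prior)
    for word in message.lower().split():
        score += math.log((counts[word] + 1) / (total + vocab_size))
    return score


def is_spam(message):
    """True when the spam class scores higher than the ham class."""
    p_spam = len(SPAM) / (len(SPAM) + len(HAM))
    return log_score(message, spam_counts, p_spam) > log_score(message, ham_counts, 1 - p_spam)
```

With this training set, `is_spam("hello can we meet for the seminar")` is true although the message is harmless: the only words the model recognises are “for” and “the”, and it has only ever seen them in spam. A short French message such as “merci pour le café” is classed as ham for the mirror-image reason.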

Part of a satisfactory solution would be moving away from that overspammed email address on ens.fr. Since I bought madore.org, I can, of course, receive email there. Well, I won't make the mistake of opening simply david at that domain: that would get spamrotten again in no time. Instead, I'm explicitly forbidding that address and using various sub-addresses in the form david+something at the domain: the idea being that if one of them starts receiving spam I can easily close it (and figure out how it leaked to the spammers); I can also restrict some to specific senders, or use different filtering techniques according to the destination address.
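
The sub-address policy can be sketched as a small decision function. Everything specific here is made up for the example: the placeholder domain `example.org`, the tags `blog`, `bank` and `oldforum`, and the sender address. Only the overall scheme (bare address forbidden, tags that can be closed or restricted to given senders) comes from the description above.

```python
DOMAIN = "example.org"  # placeholder domain for the sketch

# Hypothetical open tags: each maps to the set of senders allowed to use it,
# or to None when anyone may write to it.
OPEN_TAGS = {
    "blog": None,
    "bank": {"noreply@bank.example"},
}

# Hypothetical tags that leaked to spammers and have been closed.
CLOSED_TAGS = {"oldforum"}


def decide(recipient, sender):
    """Return "accept", or "reject: <reason>", for one recipient address."""
    local, _, domain = recipient.lower().rpartition("@")
    if domain != DOMAIN:
        return "reject: unknown domain"
    user, _, tag = local.partition("+")
    if user != "david":
        return "reject: unknown user"
    if not tag:
        return "reject: bare address is forbidden"
    if tag in CLOSED_TAGS:
        return "reject: sub-address closed"
    if tag not in OPEN_TAGS:
        return "reject: unknown sub-address"
    allowed = OPEN_TAGS[tag]
    if allowed is not None and sender.lower() not in allowed:
        return "reject: sender not allowed for this sub-address"
    return "accept"
```

Because each tag is handed out to one correspondent or one site, the tag of the first address to receive spam also says where the leak happened.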

Unfortunately, these days, we can't simply close or forbid an email address by sending back email bounces: if we do that, we get counted as spammers ourselves (because spam is nearly always sent from forged sender addresses, so the bounce lands in the mailbox of some innocent third party). So all the processing has to be done as the server receives the mail from the sender: this may not seem like a problem, but it is, because the time allotted in that window is small, and because that's not the way email was meant to be treated; my mail transfer agent has various ways of doing this, including one that was invented by another (infamous) mail agent, but merely getting david to reject email at connection time (rather than bouncing) when it is not followed by +something has proved pretty difficult in itself.
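
The difference between rejecting and bouncing is easier to see in a toy SMTP dialogue. The sketch below is not a mail server and has nothing to do with the configuration of any real mail transfer agent: it only maps a list of client commands to replies, refusing the bare address at the RCPT TO stage, before any message body has been accepted. The domain `example.org` and the pattern for valid sub-addresses are assumptions.

```python
import re

# Assumed shape of a valid recipient: david+<tag> at a placeholder domain.
VALID_RCPT = re.compile(r"david\+[a-z0-9._-]+@example\.org")


def smtp_session(commands):
    """Return one reply per client command, for a tiny subset of SMTP."""
    replies = []
    have_rcpt = False
    for line in commands:
        verb = line.upper()
        if verb.startswith(("HELO", "EHLO")):
            replies.append("250 hello")
        elif verb.startswith("MAIL FROM:"):
            have_rcpt = False  # a new transaction starts with no recipients
            replies.append("250 2.1.0 sender ok")
        elif verb.startswith("RCPT TO:"):
            address = line[len("RCPT TO:"):].strip().strip("<>").lower()
            if VALID_RCPT.fullmatch(address):
                have_rcpt = True
                replies.append("250 2.1.5 recipient ok")
            else:
                # Refused while the sending host is still connected:
                # this server never has to generate a bounce message.
                replies.append("550 5.1.1 no such mailbox")
        elif verb == "DATA":
            replies.append("354 go ahead" if have_rcpt else "503 5.5.1 need a valid RCPT first")
        else:
            replies.append("502 5.5.1 command not implemented")
    return replies
```

A 550 reply at this stage leaves the sending host responsible for the failure. If that host is a spammer's, nothing more happens; if it is a legitimate relay, it is the one that tells its own user.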

Pesky things, computers.