As I am decidedly quite sick today, I thought I'd do some spamfighting.
Part one: this blog's comments system. When I launched it, I thought (naïvely, in retrospect) that since I used custom Perl scripts rather than a common package like, say, DotClear, the spambots wouldn't be able to make sense of my forms and I wouldn't receive any spam. Wrong. I get quite a lot of it, really. Since everything is moderated (something I might change again eventually), you never actually see it, but it's a pain (for me) nonetheless. And it's also pretty incomprehensible: why have this entry's comments been loaded with spam, for example, whereas others get none and Google doesn't seem to find any suspicious reverse links? Anyway…
The usual solution is to display an image of distorted text or
digits, and ask the reader to type the characters (thus providing work
ad nauseam for the Indian coprocessors).
Rather than do that, I thought I'd try some simple instructions in
English: namely, to copy certain digits and letters in reverse order
(so the instructions might say: please enter below the following
signs in reverse order: 5d8335, and the user is expected to type
5338d5). This wouldn't work if
everyone started doing the same, because the spambots would soon learn
the trick, but I'm trying to take advantage of the fact that I'm the
only one using my custom comments system. Now I also don't want to
annoy readers by making them type random digits every time they wish
to post a comment, so there is a workaround: if your browser has
JavaScript enabled, it will “type” the digits for you and
you won't even see the input field. If things work
correctly, that is (which, Internet Explorer being what it is, isn't
very likely, I'm afraid). Again, this won't work if the spambots use
a JavaScript interpreter, so I'm taking a gamble here: I don't think
they'll run JavaScript code because it'd be rather hard for them to do
so and it's probably not worth the effort, and, besides, if they do
run JavaScript code in the pages they harvest, they'd be vulnerable to
all sorts of attacks (such as using their processing power to compute
all sorts of useful things just by making them follow links containing
the result of the computations we want).
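
For concreteness, here is a minimal Python sketch of the reverse-order check described above. It is not the blog's actual Perl code: the function names are made up for illustration, and how the challenge is remembered between showing the form and receiving the comment is deliberately left out.

```python
import secrets

HEX_DIGITS = "0123456789abcdef"


def make_challenge(length: int = 6) -> str:
    """Return a random string of hex digits to show to the reader."""
    return "".join(secrets.choice(HEX_DIGITS) for _ in range(length))


def expected_answer(challenge: str) -> str:
    """The reader is asked to type the challenge in reverse order."""
    return challenge[::-1]


def check_answer(challenge: str, answer: str) -> bool:
    """Accept the comment only if the typed answer is the reversed challenge.

    Surrounding whitespace and letter case are ignored (an assumption made
    here to be lenient with human typists).
    """
    return answer.strip().lower() == expected_answer(challenge)
```

With the example from the text, `check_answer("5d8335", "5338d5")` accepts, while echoing back `5d8335` unchanged does not. A script running in the browser could fill in the same reversed string on the reader's behalf, which is the JavaScript workaround mentioned above.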
So, ideally, you should notice no changes in the comments system if
you have JavaScript enabled, and, if you don't, you'll just have to
type six hex digits to post a comment. I think that's the best I
could do to minimize annoyance (the other idea I had in mind was to
lay all sorts of blacklisting traps in the page, hidden by
CSS, which the spambots would have triggered, getting
them banned from the page… but that's hard to tune correctly).
Don't hesitate to post a comment to check that it works. And if it
doesn't, complain to me by mail (at davidwwwmadoreorg
as usual).
…Which gets me to part two: email. I receive mountainloads
of spam. So I have a Bayesian spamfilter. Which, since nearly all
the spam I get is in English and very little ham is in English whereas
in French it's the converse, has learned to classify not ham and spam
but French and English (a much easier problem, really). So any email
sent to me in English has a very high probability of being treated as
spam (one way to avoid this is to write bugahugathuga
somewhere in the subject line, but that's not a satisfactory
solution).
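
To see why this happens, here is a toy naive Bayes classifier in Python. It is only an illustration under made-up assumptions: the training messages are invented, and the real filter is certainly more elaborate. When all the spam is English and all the ham is French, the words that best separate the two classes are simply common English and French words.

```python
import math
from collections import Counter


def train(messages):
    """Count words per label. `messages` is an iterable of (label, text)."""
    words = {"spam": Counter(), "ham": Counter()}
    docs = Counter()
    for label, text in messages:
        docs[label] += 1
        words[label].update(text.lower().split())
    return words, docs


def spam_log_odds(text, words, docs):
    """Log-odds that `text` is spam (naive Bayes, add-one smoothing)."""
    vocab_size = len(set(words["spam"]) | set(words["ham"]))
    spam_total = sum(words["spam"].values()) + vocab_size
    ham_total = sum(words["ham"].values()) + vocab_size
    score = math.log(docs["spam"] / docs["ham"])
    for w in text.lower().split():
        p_spam = (words["spam"][w] + 1) / spam_total
        p_ham = (words["ham"][w] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score


# Invented training set: English spam, French ham.
TRAINING = [
    ("spam", "buy cheap pills now and save money"),
    ("spam", "you have won the lottery claim your prize now"),
    ("ham", "bonjour je te propose de nous voir demain pour le séminaire"),
    ("ham", "merci pour le message je te réponds ce soir"),
]

words, docs = train(TRAINING)
english_ham = "hello i have a question about your page and the proof"
french_text = "je te propose le message"
print(spam_log_odds(english_ham, words, docs) > 0)  # legitimate English mail
print(spam_log_odds(french_text, words, docs) > 0)  # any French text
```

On this toy data the perfectly innocent English message gets a positive spam score and the French one a negative score, purely because of words like "the", "and" and "your" versus "je", "te" and "le".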
Part of a satisfactory solution would be moving away from that
overspammed email address on ens.fr. Since I bought madore.org, I can,
of course, receive email there. Well, I won't make the mistake of
simply opening david at that domain: that would get spamrotten again
in no time. Instead, I'm explicitly forbidding that address and using
various sub-addresses of the form david+something at the domain: the
idea being that if one of them starts receiving spam I can easily
close it (and figure out how it leaked to the spammers); I can also
restrict some to specific senders, or use different filtering
techniques according to the destination address.
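
The sub-address scheme can be sketched as a small policy table. This is a hypothetical Python illustration, not the actual mail configuration: the tags (`blog`, `friends`, `oldlist`) and the sender address are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SubAddress:
    is_open: bool = True  # set to False once the tag has leaked to spammers
    allowed_senders: set = field(default_factory=set)  # empty means anyone


# Hypothetical tags, for illustration only.
POLICY = {
    "blog": SubAddress(),
    "friends": SubAddress(allowed_senders={"alice@example.org"}),
    "oldlist": SubAddress(is_open=False),
}


def split_local_part(local: str):
    """Split 'david+tag' into ('david', 'tag'); tag is None without a '+'."""
    user, plus, tag = local.partition("+")
    return user, (tag if plus else None)


def accepts(local: str, sender: str) -> bool:
    """Decide whether mail for this local part, from this sender, is wanted."""
    user, tag = split_local_part(local)
    if user != "david" or not tag:
        return False  # the bare address is explicitly forbidden
    entry = POLICY.get(tag)
    if entry is None or not entry.is_open:
        return False  # unknown tag, or one that has been closed
    return not entry.allowed_senders or sender.lower() in entry.allowed_senders
```

Closing a leaked address is then a one-line change (`is_open=False`), and the tag itself records which form or correspondent the address was given to.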
Unfortunately, these days, we can't simply close or forbid an email
address by sending back email bounces: if we do that, we get counted
as spammers ourselves (because spam is always sent from forged sender
addresses). So all the processing has to be done as the server
receives the mail from the sender: this may not seem like a problem,
but it is, because the time allotted in that window is small, and
because that's not the way email was meant to be treated; my mail
transfer agent has various ways of doing this, including one that was
invented by another (infamous) mail agent, but merely getting david
to reject email at connection time (rather than bouncing) when it is
not followed by +something has proved pretty difficult in itself.
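
The difference between rejecting and bouncing comes down to answering the sender's RCPT TO command with an error while the SMTP connection is still open, so that no bounce message ever has to be generated. Here is a rough Python sketch of that decision. It is not tied to any particular mail transfer agent, the function name is invented, and the reply texts are only plausible examples.

```python
def rcpt_reply(recipient: str, domain: str = "madore.org") -> str:
    """Return the SMTP reply line to give to a RCPT TO command.

    A 5xx reply makes the *sending* server responsible for telling its
    user about the failure, so this server never sends a bounce to a
    possibly forged sender address.
    """
    local, _, rcpt_domain = recipient.strip("<>").rpartition("@")
    if rcpt_domain.lower() != domain:
        return "550 5.7.1 Relaying denied"
    user, plus, tag = local.partition("+")
    if user.lower() == "david" and plus and tag:
        return "250 2.1.5 OK"
    # Bare 'david' (or 'david+' with nothing after it) is refused outright.
    return "550 5.1.1 No such mailbox"
```

So `rcpt_reply("<david+blog@madore.org>")` gives a 250 reply, whereas `rcpt_reply("<david@madore.org>")` gives a 550 and the message is never accepted in the first place.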
Pesky things, computers.