Roedy Green said:
I had a bit of a fright the other day. I thought for a while I was
under an email denial of service attack. I wondered if I would ever be
able to post even a munged public email address ever again.
There was one scheme I remember being suggested many years back,
which I wish had been carried through. It did contain an interesting
programming problem which would be on-topic here.
Basically it was a distributed mouse-trap scheme, which looked a
little like Usenet news. If you don't know what a mouse trap is,
then it works like this: sys admins create bogus 'cheese' email
addresses, then leave them in places where they know spammers will
look, but where humans are unlikely to use them. For example, clearly
flagged in the sig of a newsgroup posting, or as "mailto:" links on
an obscure web page which only web bots would find interesting. The
only email which will arrive at these addresses is spam - because
only automated software would be dumb enough to collect and use them.
The mailboxes of these addresses are tied to software which analyses
each spam mail as it arrives, creates a signature, then scans legitimate
user mailboxes (and incoming mail) for mail which matches that
sig. The beauty of the scheme is the more successful the spammers are
at harvesting addresses, the more cheese they will get, and the more
networks will be 'alerted' to their spam.
To round the idea off, the original suggestion proposed a kind of
Usenet-like scheme, where sigs which arrive at one mouse trap are
automatically forwarded on to other ISPs and networks - in the same
way that newsgroup messages posted at one news service are then
distributed around the net. This creates a global network of mouse traps
- once a spam arrives at one cheese address, a process of identification
and notification begins.
The programming problem is this: how to create a sig which is short
enough to be practical in such a system, while flexible enough to adapt
to random changes introduced into each message by spammers. For
example, MD5 and SHA-1 are useless - they will only work if the spam
bodies are identical. There needs to be a way of creating an 'imprint'
of a message which works even when the data isn't absolutely identical.
(Kind of like a fuzzy logic hashing algorithm!!)
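
To make the 'imprint' idea concrete, here is a rough sketch of one
known technique for this sort of thing, MinHash over word shingles.
It is not part of the original suggestion; the function names, the
64-slot signature size and the 3-word shingles are all my own
assumptions. The signature stays a fixed length however long the
message is, and two messages which share most of their text agree in
most slots.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word runs in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(text, num_hashes=64, k=3):
    """Fixed-length fuzzy signature: one minimum hash value per seed.

    num_hashes and k are illustrative values, not tuned ones.
    """
    parts = [s.encode("utf-8") for s in shingles(text, k)]
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")
        # Salting the hash with the seed gives a different ordering
        # of the shingles for every slot of the signature.
        signature.append(min(
            int.from_bytes(hashlib.sha256(salt + p).digest()[:8], "big")
            for p in parts
        ))
    return signature


def similarity(sig_a, sig_b):
    """Fraction of matching slots, an estimate of shingle overlap."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)
```

A mouse trap would forward the signature rather than the message,
and each receiving server would treat mail whose similarity to it is
above some cut-off as a match. Where to put that cut-off is a guess
that would need tuning against real mail.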
I'm not sure if that problem is even solvable - although one idea I
had was to focus on the constant aspects of a spam mail. No matter
what garbage they fill the body of the message with, ultimately all
spam (well, almost all!) has to have some kind of contact address -
so you can buy the crap they are selling. Some things like snail
mail addresses and phone numbers are quite inflexible - they can't
be easily randomised. Stuff like email addresses and web addresses
are more flexible - but only to a point. One can easily randomise
the filename part of a URL, but randomising the domain has fewer
possibilities - (because the spammer has to 'own' all the variations
used, and therefore in any practical sense they are unlikely to be
able to employ more than a few dozen different variations.)
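
As a minimal sketch of the "constant aspects" idea, the code below
pulls the domain out of every URL and email address in a message and
uses that set as the sig. The regular expression, the function name
and the spam-shop.example domain are all made up for illustration,
and the pattern only handles http(s) URLs and plain email addresses.

```python
import re

# Hypothetical pattern: a domain that follows "http://", "https://"
# or "@", with any leading "www." skipped.
DOMAIN_RE = re.compile(
    r"(?:https?://|@)(?:www\.)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)",
    re.IGNORECASE,
)


def contact_domains(message):
    """Return the set of lower-cased domains found in the message."""
    return {domain.lower() for domain in DOMAIN_RE.findall(message)}


first = ("Cheap pills! http://www.spam-shop.example/offer/x7f3q "
         "or mail sales@spam-shop.example")
second = "Ch3ap p1lls http://spam-shop.example/promo/99zzk"

# The body text and the URL path differ, but the domain does not.
same_sig = contact_domains(first) == contact_domains(second)
```

The randomised filename part of the URL is simply ignored, so both
messages reduce to the same one-domain sig.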
There is an obvious problem with this... suppose our spammer uses
(e-mail address removed) as their contact address - our software
locates this and realises that the account name part of the address
can be easily manipulated, but the domain cannot... so it then
sends out a message warning all other anti-spam servers to be on
the lookout for mail containing "yahoo.com"...! Whoooops!
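
Here is the failure in miniature, as a sketch. Both addresses are
invented for the example and the function name is hypothetical; the
point is only that a domain-only sig cannot tell the spammer's
webmail account from anybody else's.

```python
import re


def email_domains(message):
    """Return the lower-cased domain of every email address found."""
    found = re.findall(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)", message)
    return {domain.lower() for domain in found}


# Made-up messages: one piece of spam, one perfectly innocent mail.
spam = "Great deals! Reply to seller123@yahoo.com"
legit = "Hi, my new address is aunt.mabel@yahoo.com, write soon."

# The sig drops the account name and keeps only the domain ...
signature = email_domains(spam)

# ... so any mail mentioning an address at that domain matches it.
flagged = bool(signature & email_domains(legit))
```

Here flagged comes out true for the innocent message, which is
exactly the problem.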
Ah well, it would have been a nice idea if it had worked!
-FISH- ><>