Python spam filter: random words?

revyakin

I know fighting spam is like fighting global worming, but still..
50% of spam I get these days contains a random combination of letters
at the end of the subject line. Has anyone tried using that feature in
antispam filters? Since Python is the only language I am more or less
fluent in as an amateur scripter, I was wondering if anyone in this
group has comments on this idea.
Also, is it trivial to make a Python script filter executable from a
generic mail program like OE or NS Messenger?
I am also wondering why spammers add that stuff to their subject lines
anyway.
 
Ben Finney

> I know fighting spam is like fighting global worming, but still..
                                        ^^^^^^^^^^^^^^
Given that some spam contains e-mail worms, the typo is appropriate :)

> 50% of spam I get these days contains a random combination of letters
> at the end of the subject line. Has anyone tried using that feature in
> antispam filters?

My experience has been that this practice is dropping off, since
Bayesian statistical-analysis filters will glide right by random words
as "not statistically significant."

What I'm seeing now is spam with words taken straight from the "likely
good" word lists of Bayesian filters :)

> I am also wondering why spammers add that stuff to their subject lines
> anyway.

To defeat spam filters that check for the occurrence of a known spam
message they've seen before. As noted above, though, these are being
superseded by Bayesian word metric analysis.
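
To see why a random suffix defeats that kind of exact-match check, here is a minimal sketch. It assumes a hypothetical filter that fingerprints each subject line with SHA-256 and blocks any fingerprint it has seen before; the subjects and the helper names `fingerprint` and `is_known_spam` are made up for illustration and do not describe any particular product.

```python
import hashlib

def fingerprint(subject: str) -> str:
    """Hash a normalized subject line (hypothetical exact-match filter)."""
    # Lowercase and collapse whitespace so trivial edits do not change the hash.
    normalized = " ".join(subject.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Fingerprints of spam the filter has already seen (invented example).
known_spam = {fingerprint("Cheap meds now qzxkvb")}

def is_known_spam(subject: str) -> bool:
    """True only if this exact (normalized) subject was seen before."""
    return fingerprint(subject) in known_spam

print(is_known_spam("Cheap meds now qzxkvb"))  # the same message again
print(is_known_spam("Cheap meds now wpjrmt"))  # same pitch, new random suffix
```

The second subject hashes to a different value, so the filter treats it as a message it has never seen, which is presumably the point of the random letters.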
 
Sean 'Shaleh' Perry

> What I'm seeing now is spam with words taken straight from the "likely
> good" word lists of Bayesian filters :)

This was recently discussed on the spambayes list (the nifty Python
implementation of Paul Graham's ideas).

Apparently there are not enough occurrences of such a word to make it
statistically interesting, so spambayes ignores it. Or something like that.
See the thread there for full details.
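
A rough sketch of why a never-before-seen random word carries no weight in this kind of filter. It uses a smoothed per-token estimate that is pulled toward a neutral prior when there is little evidence; this is an assumption for illustration, not a description of spambayes internals. The constants, the toy counts, and the function names are all invented.

```python
from collections import Counter

UNKNOWN_PROB = 0.5     # assumed neutral prior for a token with no history
PRIOR_STRENGTH = 0.45  # assumed weight given to that prior
MIN_DISTANCE = 0.1     # assumed cutoff: tokens closer to 0.5 are ignored

def token_spam_prob(token, spam_counts, ham_counts, n_spam, n_ham):
    """Smoothed estimate of how strongly a token indicates spam."""
    seen = spam_counts[token] + ham_counts[token]
    if seen == 0:
        return UNKNOWN_PROB  # no evidence either way
    spam_ratio = spam_counts[token] / n_spam
    ham_ratio = ham_counts[token] / n_ham
    raw = spam_ratio / (spam_ratio + ham_ratio)
    # Blend the raw ratio with the prior; rare tokens stay near 0.5.
    return (PRIOR_STRENGTH * UNKNOWN_PROB + seen * raw) / (PRIOR_STRENGTH + seen)

def significant_tokens(tokens, spam_counts, ham_counts, n_spam, n_ham):
    """Keep only tokens whose estimate is far enough from neutral."""
    kept = {}
    for tok in set(tokens):
        p = token_spam_prob(tok, spam_counts, ham_counts, n_spam, n_ham)
        if abs(p - UNKNOWN_PROB) >= MIN_DISTANCE:
            kept[tok] = p
    return kept

# Toy training data (made up): token -> number of messages containing it.
spam_counts = Counter({"viagra": 40, "free": 30, "meeting": 1})
ham_counts = Counter({"meeting": 35, "free": 5, "python": 50})

subject = ["free", "viagra", "xqzvkj"]
print(significant_tokens(subject, spam_counts, ham_counts, n_spam=50, n_ham=50))
```

The random token `xqzvkj` has no history, lands exactly on the neutral value, and is dropped before scoring, while the familiar spam words still count.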
 
Terry Reedy

Marc Wilson said:
> In comp.lang.python, (e-mail address removed) (revyakin) wrote in
> <[email protected]>::
>
> |I know fighting spam is like fighting global worming, but still..
> |50% of spam I get these days contains a random combination of letters
> |at the end of the subject line. Has anyone tried using that feature in
> |antispam filters?
>
> How do you detect "random" letters? You can only (programmatically)
> determine that a character sequence is "random" if it doesn't appear in some
> sort of dictionary, and even there you have the risk of false positives due
> to typos, acronyms etc.

Looking at successive letter pairs would go a long way. Out of the
(26+space)**2 combinations, perhaps half occur in real words (e.g., 'qx'
is a giveaway). Using triples would allow inclusion of common
three-letter acronyms as legal.

Terry J. Reedy
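
The letter-pair idea above can be sketched as follows. This is a minimal illustration, not a tested filter: the tiny `KNOWN_WORDS` list stands in for a real corpus of legitimate text, and the names `letter_pairs`, `build_pairs`, and `unseen_pair_ratio` are hypothetical.

```python
def letter_pairs(text):
    """Return successive character pairs, treating non-letters as spaces."""
    words = "".join(c if c.isalpha() else " " for c in text.lower()).split()
    if not words:
        return []
    # Pad with spaces so word-initial and word-final letters form pairs too.
    padded = " " + " ".join(words) + " "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

def build_pairs(corpus_words):
    """Collect every pair that occurs in a list of known-good words."""
    seen = set()
    for word in corpus_words:
        seen.update(letter_pairs(word))
    return seen

def unseen_pair_ratio(text, seen_pairs):
    """Fraction of pairs in text that never occur in the corpus."""
    pairs = letter_pairs(text)
    if not pairs:
        return 0.0
    unseen = sum(1 for p in pairs if p not in seen_pairs)
    return unseen / len(pairs)

# Stand-in corpus (an assumption); a real filter would need far more text.
KNOWN_WORDS = (
    "the quick brown fox jumps over the lazy dog python spam filter "
    "subject line random letters words mail message free offer today"
).split()
SEEN = build_pairs(KNOWN_WORDS)

for candidate in ("free offer today", "qxzvkj"):
    print(candidate, unseen_pair_ratio(candidate, SEEN))
```

A subject-line suffix whose ratio is well above that of ordinary text could be flagged as suspicious. With a corpus this small almost any unusual word scores high, and even with a large one a legitimate name containing a rare pair would be flagged, so the threshold would have to be tuned against real mail.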
 
Marc Wilson

in <[email protected]>::

|> How do you detect "random" letters? You can only (programmatically)
|> determine that a character sequence is "random" if it doesn't appear in some
|> sort of dictionary, and even there you have the risk of false positives due
|> to typos, acronyms etc.
|
|Looking at successive letter pairs would go a long way. Out of the
|(26+space)**2 combinations, perhaps half occur in real words (e.g., 'qx'
|is a giveaway). Using triples would allow inclusion of common
|three-letter acronyms as legal.

For sale today on QXL.com....
--
Marc Wilson

Cleopatra Consultants Limited - IT Consultants
2 The Grange, Cricklade Street, Old Town, Swindon SN1 3HG
Tel: (44/0) 70-500-15051 Fax: (44/0) 870 164-0054
Mail: (e-mail address removed) Web: http://www.cleopatra.co.uk
_________________________________________________________________
Try MailTraq at https://my.mailtraq.com/register.asp?code=cleopatra