pop3 email header classifier?

Robin Becker · Sep 19, 2003

Hi, I'm getting vast numbers of fake upgrade emails containing some kind
of virus. My rather old client can be made to reject these based on some
patterns in the subject line. They're nearly all based on the word
'New', 'Latest', 'Microsoft', 'Patch', 'Pack', ... etc etc.

Is there a python tool that can be made to delete these from my POP3
mail box rather than let my client reject? Quite a few seem to have
semi-valid return addresses so I get postmaster rejects from
(e-mail address removed) etc.

I know about spam-bayes etc, but these things are over 120k each and it
seems pretty pointless to download them (as well as taking about an
hour).

Richie Hindle · Sep 19, 2003

[Robin]

Hi, I'm getting vast numbers of fake upgrade emails containing some kind
of virus. My rather old client can be made to reject these based on some
patterns in the subject line. They're nearly all based on the word
'New', 'Latest', 'Microsoft', 'Patch', 'Pack', ... etc etc.

Is there a python tool that can be made to delete these from my POP3
mail box rather than let my client reject?

I have a webmail application that can be made to delete messages based on
regular expressions, at http://entrian.com/cgi-bin/pop3.py

I wrote it in response to a similar problem, whereby a spammer used my
address as his From address, and I received a couple of thousand bounce
messages a day.

You can set up regular expression filters on To, From and Subject, and set
it to either mark messages for deletion (so you get to review them before
deleting them) or delete them straight away (via the "I'm either brave or
stupid" checkbox, TM).

You can save your filters for later use.

Take EXTREME CARE with this, particularly if you check the "I'm either
brave or stupid" box.

There is no way to recover a deleted message.
Don't sue me if it eats your hamster's emails.

You probably need something like (untested):

From: microsoft|ms\b
Subject: patch|latest|microsoft|update|upgrade|pack
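
As a rough illustration of how filters like these could be applied outside the web tool, here is a minimal Python sketch using the standard `re` module. Matching case-insensitively and requiring both filters to match are assumptions; the post does not say how the webmail application combines them. `matches_filters` is a hypothetical helper name.

```python
import re

# Patterns copied from the post; case-insensitive matching is an assumption.
FILTERS = {
    "From": re.compile(r"microsoft|ms\b", re.IGNORECASE),
    "Subject": re.compile(r"patch|latest|microsoft|update|upgrade|pack",
                          re.IGNORECASE),
}

def matches_filters(headers, require_all=True):
    """Return True if a dict of header name -> value matches the filters.

    require_all=True means every filter must match; False means any one
    match is enough. Which of these the web tool does is not stated.
    """
    results = [bool(rx.search(headers.get(name, "")))
               for name, rx in FILTERS.items()]
    return all(results) if require_all else any(results)
```

Note that `ms\b` matches any word ending in "ms", so an address such as `forms@example.com` would also trigger the From filter. Reviewing the marked messages before deleting them seems wise.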

There's no SSL version of this, so your POP3 account details will pass in
plain text over the internet (in theory my provider has a scheme whereby
you can access the site over SSL using their certificate, but it doesn't
work for some reason - if there's any interest I'll see whether I can make
it work).

(And no, I'm not going to harvest your POP3 account details. They never
even hit the hard drive.)

Robin Becker · Sep 20, 2003

Someone has posted a poplib command line thing on much the same lines in
another thread.

Tim Roberts · Sep 22, 2003

Robin Becker said:
Hi, I'm getting vast numbers of fake upgrade emails containing some kind
of virus. My rather old client can be made to reject these based on some
patterns in the subject line. They're nearly all based on the word
'New', 'Latest', 'Microsoft', 'Patch', 'Pack', ... etc etc.

Is there a python tool that can be made to delete these from my POP3
mail box rather than let my client reject? Quite a few seem to have
semi-valid return addresses so I get postmaster rejects from
(e-mail address removed) etc.

Is your e-mail client actually set up to send a RESPONSE when you receive a
virus attachment? If so, can you please STOP IT AT ONCE?

ALL viruses released in the last 3 years choose random names for both the
sender AND recipient. It is not possible to automatically extract the
infected individual's e-mail address from a virus message. You can find
the address of their e-mail server, but that's all.

By sending a polite "you sent me a virus" message, you are doing NOTHING to
stop the viruses, you are ANNOYING an innocent person, and you are DOUBLING
the e-mail volume damage caused by the virus script kiddies.

I got close to 10,000 helpful and completely bogus "you sent me a virus"
messages during the "SoBig" fiasco.

Robin Becker · Sep 22, 2003

Tim Roberts said:
Is your e-mail client actually set up to send a RESPONSE when you receive a
virus attachment? If so, can you please STOP IT AT ONCE?

I have no virus detection in the client and am deliberately not
rejecting. That was the whole point of my question: I wanted to do
better.

As a point of fact with this SWEN worm, it does seem possible to kill by
a combination of the subject, from address and attachment size. The
spambayes approach would certainly work, but it wouldn't improve my
download times. I estimate I had about 50MB of these things to download
yesterday (i.e. 3-4 hours @ 56k). By employing a kill script I could keep
up fairly easily.

I'm certainly not sending any response or rejecting, I'm using DELE
which should be a sink.
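
A kill rule of the kind described here, combining subject, From address and size, might look like the following sketch. The patterns, the 100,000-byte threshold and the names `Summary`, `is_worm` and `kill` are all assumptions for illustration; the thread only says the messages are "over 120k". `conn` is expected to be a logged-in `poplib.POP3` object.

```python
import re
from typing import NamedTuple

class Summary(NamedTuple):
    num: int       # POP3 message number
    size: int      # size in bytes, as reported by LIST
    sender: str    # value of the From: header
    subject: str   # value of the Subject: header

# Assumed patterns and threshold; tune them against real traffic.
SUBJECT_RX = re.compile(r"new|latest|microsoft|patch|pack", re.IGNORECASE)
FROM_RX = re.compile(r"microsoft|ms\b", re.IGNORECASE)
MIN_SIZE = 100_000

def is_worm(s):
    """All three signals must agree before a message is condemned."""
    return (s.size >= MIN_SIZE
            and bool(SUBJECT_RX.search(s.subject))
            and bool(FROM_RX.search(s.sender)))

def kill(conn, summaries, dry_run=True):
    """Return the numbers of matching messages; DELE them unless dry_run."""
    doomed = [s.num for s in summaries if is_worm(s)]
    if not dry_run:
        for num in doomed:
            conn.dele(num)  # only marks; the server removes them at QUIT
    return doomed
```

DELE sends nothing back to the apparent sender. The marked messages are removed when the session ends with `conn.quit()`, and `conn.rset()` before that point unmarks them.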

Alex Martelli · Sep 22, 2003

Robin said:
Hi, I'm getting vast numbers of fake upgrade emails containing some kind
of virus. My rather old client can be made to reject these based on some
patterns in the subject line. They're nearly all based on the word
'New', 'Latest', 'Microsoft', 'Patch', 'Pack', ... etc etc.

Is there a python tool that can be made to delete these from my POP3
mail box rather than let my client reject? Quite a few seem to have
semi-valid return addresses so I get postmaster rejects from
(e-mail address removed) etc.

I know about spam-bayes etc, but these things are over 120k each and it
seems pretty pointless to download them (as well as taking about an
hour).

I posted an "emergency script" to be used for the purpose -- it
triggers SOLELY on mail size. I have now enhanced it with lots of
options etc, but the basic idea remains that of size-only triggering --
risky but, it IS an emergency. BTW, the "postmaster rejects" are
likely not connected to what you do with the "fake upgrade emails",
alas -- rather, virus senders are now faking "From:" &c addresses,
so everybody's getting lots of bounce msgs for mails they never sent.
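
This is not the script referred to above, just a minimal sketch of the size-only idea, assuming Python 3 and the line format returned by `poplib.POP3.list()`. The function name `oversized` and the default threshold are assumptions.

```python
def oversized(listing, threshold=100_000):
    """Return message numbers whose reported size is >= threshold bytes.

    `listing` is the second element of the tuple returned by
    poplib.POP3.list(): byte strings such as b"3 125000".
    """
    hits = []
    for entry in listing:
        num, size = entry.decode("ascii").split()[:2]
        if int(size) >= threshold:
            hits.append(int(num))
    return hits
```

Because nothing but the size is examined, any large legitimate message is caught too, which is the risk mentioned above.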

Alex

David Mertz · Sep 23, 2003

|Is there a python tool that can be made to delete these from my POP3
|mail box rather than let my client reject?
|I know about spam-bayes etc, but these things are over 120k each and it
|seems pretty pointless to download them (as well as taking about an
|hour).

I do exactly this myself. For my article (about a year ago now) on Spam
filtering, for IBM developerWorks, I developed my own little custom
tool. I've refined it over time, but it remains kinda hackerish and
un(der)documented. Still, I'd be happy to share with anyone
interested... especially if anyone wants to make something nice out of
it for distribution.

The idea of what I do is a hodgepodge. But the general idea is that I
use [poplib] to download ONLY the headers. Those messages that are
convincingly spam based on that get deleted without me ever needing to
download bodies.
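
The headers-only step can be sketched with the POP3 TOP command, which `poplib` exposes as `top(which, howmuch)`. This is an illustration under assumptions, not the spamfilter tool itself: `fetch_headers` is a hypothetical name, and TOP is an optional command, so some servers may not support it.

```python
import email

def fetch_headers(conn):
    """Yield (number, size, Message) per message, fetching headers only.

    conn is a logged-in poplib.POP3 or poplib.POP3_SSL object.
    """
    _resp, listing, _octets = conn.list()
    for entry in listing:
        num, size = (int(x) for x in entry.decode("ascii").split()[:2])
        # TOP n 0 returns the headers and zero lines of the body.
        _resp, lines, _octets = conn.top(num, 0)
        yield num, size, email.message_from_bytes(b"\r\n".join(lines))
```

With a real account this would be driven by something like `conn = poplib.POP3_SSL("pop.example.com")` followed by `conn.user(...)` and `conn.pass_(...)`; the host name is a placeholder.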

As a first line of defense, I have a collection of blacklist and
whitelist patterns (I only use strings and globs, not regexen; though
the latter would be easy to add). These look at specific header fields
in which patterns might occur (or at the whole header, if I wish).
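
A blacklist/whitelist pass with globs could be sketched with the standard `fnmatch` module, as below. The example rules, the `"*"` convention for "any header", the whitelist-wins ordering and the function names are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (header name, glob). A header name of "*" means
# "try every header". Globs are written in lower case.
WHITELIST = [("From", "*@python.org*")]
BLACKLIST = [("Subject", "*security pack*"), ("*", "*microsoft*")]

def _hit(rules, headers):
    """True if any rule's glob matches the header(s) it names."""
    for name, pattern in rules:
        if name == "*":
            values = list(headers.values())
        else:
            values = [headers.get(name, "")]
        if any(fnmatchcase(v.lower(), pattern) for v in values):
            return True
    return False

def verdict(headers):
    """Return 'keep', 'kill' or 'unknown' for a dict of header values."""
    if _hit(WHITELIST, headers):
        return "keep"      # whitelist is checked first (assumed order)
    if _hit(BLACKLIST, headers):
        return "kill"
    return "unknown"       # left for a later stage to decide
```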

But the next line of defense is the usual naive Bayesian style. The
wrinkle here is that I do not use "words" in the headers for analysis,
but rather trigrams (sequences of three characters). I believe that for
headers-only, this is more accurate, although I have not rigorously
tested this. Things like routing IPs and spam mail clients are hard to
pick out by whole words, but trigrams do some magic.
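
To make the trigram idea concrete, here is a deliberately simplified sketch: each header string is reduced to its set of overlapping three-character sequences, and a naive Bayes style log-odds score is summed over the trigrams present. The class name, the add-one smoothing and the presence-only scoring are assumptions, not details of the actual tool.

```python
import math
from collections import Counter

def trigrams(text):
    """All overlapping 3-character sequences in text."""
    return [text[i:i + 3] for i in range(len(text) - 2)]

class TrigramBayes:
    def __init__(self):
        # Number of training headers per class that contain each trigram.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, label, header_text):
        self.counts[label].update(set(trigrams(header_text)))
        self.docs[label] += 1

    def spam_log_odds(self, header_text):
        """Positive means more spam-like, negative more ham-like."""
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tri in set(trigrams(header_text)):
            # Add-one smoothing so unseen trigrams contribute nothing extreme.
            p_spam = (self.counts["spam"][tri] + 1) / (self.docs["spam"] + 2)
            p_ham = (self.counts["ham"][tri] + 1) / (self.docs["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score
```

Trained on a few known-bad and known-good header blocks, a header such as a Received line with a recurring relay IP contributes trigrams even though it contains no dictionary words.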

The other feature of my 'spamfilter' tool is that it knows nothing at
all about specific mail clients. It just sits daemon-like, and
periodically deletes stuff it doesn't like. I check mail from a lot of
different clients, on a lot of different machines; so for me it would be
inconvenient to have the filtering tied to one particular mail
client/machine. My thing just runs and kills, even when I'm out of
town, and checking from internet cafes.
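
The client-independent, daemon-like behaviour amounts to a polling loop. A minimal sketch follows, assuming a hypothetical `check_once` callable that connects, filters and quits; the five-minute default interval and the choice to swallow network errors are assumptions.

```python
import poplib
import time

def poll(check_once, interval=300, max_rounds=None, sleep=time.sleep):
    """Call check_once() every `interval` seconds, surviving transient errors.

    max_rounds=None loops forever; a number is handy for testing.
    Returns the number of rounds that failed.
    """
    rounds = 0
    failures = 0
    while max_rounds is None or rounds < max_rounds:
        try:
            check_once()
        except (OSError, poplib.error_proto):
            failures += 1  # network or protocol hiccup: retry next round
        rounds += 1
        sleep(interval)
    return failures
```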

Yours, David...

--
mertz@ | The specter of free information is haunting the `Net! All the
gnosis | powers of IP- and crypto-tyranny have entered into an unholy
..cx | alliance...ideas have nothing to lose but their chains. Unite
| against "intellectual property" and anti-privacy regimes!
-------------------------------------------------------------------------
