[warning: 99% OT] does anything like this exist ?...

Fred Pacquier

First off, sorry for this message-in-a-bottle-like post... I haven't been
able to phrase my questions well enough to get a meaningful answer from
Google in my research. OTOH, it is standard flattery (but true) that this
group has a bunch of the nicest and most knowledgeable Usenet people
around, and I know for a fact that there are some pretty good spam-
related tools written in Python, so I thought I might get away with it
:)

Yes, it's about spam.

I have a (very old, POP3) email address that's flooded with spam (about
500 msgs per 24 hrs, no thanks to the ISP...) but that I still need to
use for various reasons. Of course it's unmanageable without some sort of
bayesian spam filter.

When I work from home or office or my laptop, tools like Thunderbird or
Pegasus+K9 do the job adequately. But I also frequently need to access it
"on the road" with whatever comes in handy (Webmail etc.), and that
becomes a problem if I've been away for even a short period.

One possible solution I've been mulling over, and looking for, would be
some sort of selective, on-line filter/downloader. I have an always-on,
Linux box at home on a decent DSL line ; on this I could have a daemon to
frequently poll that POP3 address, pulling only the headers (or maybe the
first body KB or so) and running those through a bayesian filter. If a
message comes out "clean" it is left on the server (so I can access it
from wherever I am), if it doesn't it is downloaded to a local mbox and
removed from the server (like getmail.py does) so I can still dig out the
occasional false positive when I'm home...
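
A minimal sketch of that polling step, written for current Python 3 and under stated assumptions: `cull`, `is_spam` and `preview_lines` are hypothetical names, `is_spam` stands in for whatever Bayesian filter is used, and `pop` is any object offering the `stat`/`top`/`retr`/`dele` methods of `poplib.POP3`. It is not taken from any existing tool.

```python
import mailbox
from email import message_from_bytes


def cull(pop, is_spam, spam_mbox_path, preview_lines=20):
    """Archive and delete suspected spam; leave everything else on the server.

    pop is any object with poplib.POP3-style stat/top/retr/dele methods.
    is_spam is a hypothetical callable standing in for the Bayesian filter.
    Returns the list of message numbers marked for deletion.
    """
    count, _size = pop.stat()
    deleted = []
    archive = mailbox.mbox(spam_mbox_path)
    try:
        for number in range(1, count + 1):
            # TOP fetches the headers plus the first few body lines only.
            _resp, lines, _octets = pop.top(number, preview_lines)
            preview = message_from_bytes(b"\r\n".join(lines))
            if not is_spam(preview):
                continue  # clean: stays on the server for webmail access
            # Suspected spam: fetch it in full, save it locally, then delete it.
            _resp, lines, _octets = pop.retr(number)
            archive.add(b"\n".join(lines) + b"\n")
            archive.flush()
            pop.dele(number)
            deleted.append(number)
    finally:
        archive.close()
    return deleted
```

With a real account, `pop` would be a logged-in `poplib.POP3` connection, and the deletions only take effect on the server once `quit()` is called.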

Is there a ready-made tool that fills this need ? If it's in Python so
much the better, but actually I'll run anything else within my means
(even perl :) if it works without my having to tinker with the code... Also,
it's possible that my single-minded approach is misguided and there's a
better way to achieve that goal, so I'm ready to change my mind too :)

TIA - again, sorry for the noise,
fp
 
Cliff Wells

Fred Pacquier said:
I have a (very old, POP3) email address that's flooded with spam (about
500 msgs per 24 hrs, no thanks to the ISP...) but that I still need to
use for various reasons. Of course it's unmanageable without some sort of
bayesian spam filter.

One possible solution I've been mulling over, and looking for, would be
some sort of selective, on-line filter/downloader. I have an always-on,
Linux box at home on a decent DSL line ; on this I could have a daemon to
frequently poll that POP3 address, pulling only the headers (or maybe the
first body KB or so) and running those through a bayesian filter. If a
message comes out "clean" it is left on the server (so I can access it
from wherever I am), if it doesn't it is downloaded to a local mbox and
removed from the server (like getmail.py does) so I can still dig out the
occasional false positive when I'm home...

Rather than doing that, you might consider simply setting up a local
IMAP server on your Linux box. Have a program (such as fetchmail) pull
down your POP3 email, filter it using procmail and feed it into the IMAP
server so that you can then access it from anywhere.
 
Fred Pacquier

Cliff Wells said:
Rather than doing that, you might consider simply setting up a local
IMAP server on your Linux box. Have a program (such as fetchmail)
pull down your POP3 email, filter it using procmail and feed it into
the IMAP server so that you can then access it from anywhere.

Thanks, Cliff. I realize this would be the "standard" way of going about
such things (didn't want to bloat the original post :). However I tend to
think of it as a fallback solution if nothing else comes up, if only for
the following reasons :

* a lot of upfront work : several major packages (fetchmail, procmail,
imapd) I have no hands-on experience with, a lot of reading/learning and a
lot of little things to get just right in each... In my mind the solution
I'm after could be a much simpler affair, but then maybe I'm daydreaming :)

* one more potential security hole on my home machine : I tend to limit
open ports on that one to a bare minimum, my admin skills & available time
being quite limited...

* availability : my ISP doesn't filter spam but otherwise does a good job
of keeping its server up. Depending on my own setup means I could get shut
out when away from home because of a power failure, DSL downtime, a full
disk, or any of a myriad other domestic problems...

I've been paged about so many similar setups in the workplace that I'll
avoid them at home if I can... But then, sometimes you _do_ have to turn to
plan B :)
 
Richie Hindle

[Fred]
I could have a daemon to
frequently poll that POP3 address, pulling only the headers (or maybe the
first body KB or so) and running those through a bayesian filter. If a
message comes out "clean" it is left on the server (so I can access it
from wherever I am), if it doesn't it is downloaded to a local mbox and
removed from the server (like getmail.py does) so I can still dig out the
occasional false positive when I'm home...

Andrew Dalke contributed a program to the SpamBayes wiki that does
exactly this:

http://www.entrian.com/sbwiki/SpamBayesCuller
http://www.entrian.com/sbwiki/RecentChangesOfSpamBayesCuller

"This program, sb_culler, uses SpamBayes to run a POP3 email culler. It
connects to my email servers every few minutes, downloads the emails,
classifies each one, and deletes the spam and viruses. (It makes a
local copy of the spam, just in case.)"

You'll need to download and install the SpamBayes source code as well,
from:


http://sourceforge.net/project/showfiles.php?group_id=61702&package_id=58141

Take either the .tgz or the .zip, not the .exe (which installs a binary
application rather than the source).
 
Maciej Dziardziel

Fred Pacquier wrote:

Is there a ready-made tool that fills this need ?

I don't know of any, but writing one is not a big problem: Python has built-in
support for POP3 and mail processing (examples are in its documentation),
so just download the mail, pipe it through SpamAssassin, and then delete it
from the server.
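
The piping step could look roughly like the sketch below (Python 3). It assumes a `spamassassin` executable on the PATH that reads a message on stdin and writes the marked-up message to stdout, and it relies on the `X-Spam-Flag: YES` header SpamAssassin adds to mail it considers spam. The function name and the `command` parameter are made up for illustration.

```python
import subprocess
from email import message_from_bytes


def looks_like_spam(raw_message, command=("spamassassin",)):
    """Pipe one raw message (bytes) through a filter command, read its verdict.

    command is an assumption: any program that copies the message from stdin
    to stdout and adds an X-Spam-Flag header when it judges the mail to be spam.
    """
    result = subprocess.run(list(command), input=raw_message,
                            stdout=subprocess.PIPE, check=True)
    marked = message_from_bytes(result.stdout)
    return marked.get("X-Spam-Flag", "").strip().upper() == "YES"
```

Starting one process per message is slow for a mailbox receiving hundreds of messages a day, so this only shows the shape of the idea.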
 
Andrew Dalke

Richie said:
Andrew Dalke contributed a program to the SpamBayes wiki that does
exactly this:

http://www.entrian.com/sbwiki/SpamBayesCuller
http://www.entrian.com/sbwiki/RecentChangesOfSpamBayesCuller
You'll need to download and install the SpamBayes source code as well,

I got email about 6 months ago asking if it is okay to include my
sb_culler program with the SpamBayes source code. I haven't checked
to see if that actually happened.

As written it's designed for a programmer to use. You have to
edit the code to change options.

I've added a few new features since then, like reloading the
good_emails file if there are changes. Want to add another to
whitelist based on domain, and to remove emails with a given
subject, like "Sprava nebola dorucena". Perhaps I'll update
the wiki afterwards.
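
A domain whitelist along those lines might be sketched as follows (Python 3). This is an illustration only, not the code described in the post: both function names are hypothetical, and matching subdomains as well as the exact domain is an assumption.

```python
from email.utils import parseaddr


def sender_domain(from_header):
    """Return the lower-cased domain of a From header, or '' if there is none."""
    _name, address = parseaddr(from_header or "")
    if "@" not in address:
        return ""
    return address.rpartition("@")[2].lower()


def is_whitelisted_domain(from_header, domains):
    """True if the sender's domain is a listed domain or one of its subdomains."""
    domain = sender_domain(from_header)
    if not domain:
        return False
    return any(domain == d or domain.endswith("." + d) for d in domains)
```

Since the From header is trivial to forge, a check like this only makes sense for domains that spammers are unlikely to impersonate.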

Andrew
(e-mail address removed)
 
Fred Pacquier

Richie Hindle said:
Andrew Dalke contributed a program to the SpamBayes wiki that does
exactly this:

"This program, sb_culler, uses SpamBayes to run a POP3 email culler.
It connects to my email servers every few minutes, downloads the emails,
classifies each one, and deletes the spam and viruses. (It makes a
local copy of the spam, just in case.)"

Wo-ow ! Talk about being spoiled... :)
Of course I suspected I was not alone in wanting to do this, and that
someone had already done it. But to see it described so exactly (only
better and in fewer words), and done in python too, and using Spambayes, no
less... well, I hadn't dared expect that much. Sort of makes me regret not
asking up front for that flashy pink finish and the leather upholstery, too
:)

Actually I should feel ashamed : such a happy ending will undoubtedly
unleash a tsunami of rabid OT posters on this last peaceful corner of a
beleaguered Usenet, which will subsequently collapse in the next hundred
days, taking down with it the Internet and civilization as we know it. I'm
really sorry folks...

Meanwhile, thanks a lot for the pointer Richie !
 
Fred Pacquier

Andrew Dalke said:
I got email about 6 months ago asking if it is okay to include my
sb_culler program with the SpamBayes source code. I haven't checked
to see if that actually happened.

I just did : it was added to CVS on June 11, same code as on the Wiki,
but apparently not updated since, and not included in the contribs for
the 1.0 release.

I already have a couple of naive questions :

* is the documented change on line 348 enough to run the script with a
current version of Spambayes ?

* does use of sb_culler contribute to the training of the Spambayes db,
or does it assume that it is kept current independently (by means of
normal use by a mail client through the POP3 proxy for instance) ?

Andrew Dalke said:
As written it's designed for a programmer to use. You had to
edit code to change options.

That's OK with me, as long as the code is in Python :)

Andrew Dalke said:
I've added a few new features since then, like reloading the
good_emails file if there are changes. Want to add another to
whitelist based on domain, and to remove emails with a given
subject, like "Sprava nebola dorucena". Perhaps I'll update
the wiki afterwards.

Well, if any of these new additions are available for outside use, I'd
sure appreciate a copy of the updated script...

TIA,
fp
 
Fred Pacquier

Maciej Dziardziel said:
I don't know of any, but writing one is not a big problem: Python has
built-in support for POP3 and mail processing (examples are in its
documentation), so just download the mail, pipe it through SpamAssassin,
and then delete it from the server.

I sort of knew this one would be forthcoming :)

I won't say I haven't been tempted - like everyone else here I've written
my own 'experimental' POP3 checker using the python standard lib, and
several more from the 'yet another' category... but I know my limits.

Also, over the years I've had my share of "unique/original" (heh) ideas,
spending ages to implement them, only to find out afterwards that someone
had already done it, and generally much better... just like Andrew Dalke
with sb_culler in this case.

I'm not looking for a pet project here, just to solve a problem. If I can
do it in a few hours of tinkering with a ready-made answer I certainly
won't try to roll my own. If I were a student, or in that mythical state
between retirement and brain-rot, maybe, but for now it's definitely Plan C
(or D :)
 
Andrew MacIntyre

Cliff Wells said:
Rather than doing that, you might consider simply setting up a local
IMAP server on your Linux box. Have a program (such as fetchmail) pull
down your POP3 email, filter it using procmail and feed it into the IMAP
server so that you can then access it from anywhere.

I've been using Charles Cazabon's getmail (pure Python) for quite some
time in place of fetchmail.
 
Andrew Dalke

Fred said:
* is the documented change on line 348 enough to run the script with a
current version of Spambayes ?

I upgraded to Spambayes 1.0. I think I needed to change something, but
I've forgotten. I've attached my current version (sans passwords and
identifying mailing list information) to this post.

* does use of sb_culler contribute to the training of the Spambayes db,
or does it assume that it is kept current independently (by means of
normal use by a mail client through the POP3 proxy for instance) ?

It does not contribute. It can't. That is, it computes a spam value
for a message but since it doesn't know if that classification is
correct it can't train itself.

Every few months I export all my saved mail (inbox, sent, various
archive folders) as "ham". I also save all the spam that gets through
my filter so I can export it as "spam." I then train SpamBayes on
those two data sets and use the result for my filter program.
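
To illustrate why scoring alone cannot train the filter, here is a toy word-frequency scorer in Python 3. It is emphatically not SpamBayes's algorithm or API, and `ToyFilter` and its methods are invented for this sketch. The point is only that `score` reads the counts while `train` needs a label that someone has to supply.

```python
from collections import Counter


class ToyFilter:
    """Toy stand-in for a Bayesian filter: per-word spam/ham frequencies."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.message_counts = {True: 0, False: 0}

    def train(self, text, is_spam):
        """Update the counts; requires the correct label for the message."""
        self.word_counts[is_spam].update(set(text.lower().split()))
        self.message_counts[is_spam] += 1

    def _rate(self, word, is_spam):
        total = self.message_counts[is_spam]
        return self.word_counts[is_spam][word] / total if total else 0.0

    def score(self, text):
        """Average per-word spam ratio in [0, 1]; never modifies the counts."""
        ratios = []
        for word in set(text.lower().split()):
            spam_rate = self._rate(word, True)
            ham_rate = self._rate(word, False)
            if spam_rate + ham_rate > 0:
                ratios.append(spam_rate / (spam_rate + ham_rate))
        return sum(ratios) / len(ratios) if ratios else 0.5
```

Feeding the filter's own guesses back through `train` would just reinforce its mistakes, which is why the labels come from hand-sorted ham and spam folders.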

Fred said:
Well, if any of these new additions are available for outside use, I'd
sure appreciate a copy of the updated script...

I didn't make those additions to the attached file. Feel free to do
so, and to update the Wiki. Shouldn't be more than a dozen or two
lines of code. Eg, to have a file of blacklisted subjects you could
use WhiteListFrom (which reads lines from a file, and reloads if the
file's timestamp changed) and tweak the test condition.
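
A sketch of such a subject blacklist, modelled on the reload-on-timestamp idea of WhiteListFrom but written for current Python 3 rather than the Python 2.3 of the attached script. `BlackListSubject` is a hypothetical name, the one-phrase-per-line file format is an assumption, and the `(mi, log)` call signature simply mirrors the other tests.

```python
import os


class BlackListSubject:
    """Test: flag a message whose subject contains a phrase listed in a file."""

    def __init__(self, filename):
        self.filename = filename
        self._mtime = None
        self.phrases = []

    def _load_if_needed(self):
        # Re-read the file only when its modification time has changed.
        mtime = os.path.getmtime(self.filename)
        if mtime != self._mtime:
            with open(self.filename, encoding="utf-8") as f:
                self.phrases = [line.strip().lower() for line in f
                                if line.strip()]
            self._mtime = mtime

    def __call__(self, mi, log):
        self._load_if_needed()
        subject = (mi.msg["subject"] or "").lower()
        for phrase in self.phrases:
            if phrase in subject:
                log.pass_test("spam")
                return "subject matches blacklist (%s)" % phrase
        return False
```

It would be registered like the other tests, paired with whichever action should apply to matching messages.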

Enjoy, and thank you for your kind comments regarding my code. That
program has really helped with my email sanity. Thanks also to the
SpamBayes people and to Kevin Altis for his virus killer code I used.

Andrew
(e-mail address removed)


"""sb_culler.py -- remove spam from POP3 servers, leave ham.

I get about 150 spams a day and 12 viruses as background noise. I use
Apple's Mail.app on my laptop, which filters out most of them. But
when I travel my mailbox starts to accumulate crap, which is annoying
over dial-up. Even at home, during peak periods of a recent virus
shedding I got about 30 viruses an hour, and my 10MB mailbox filled up
while I slept!

I have a server machine at home, which can stay up full time. This
program, sb_culler, uses SpamBayes to run a POP3 email culler. It
connects to my email servers every few minutes, downloads the emails,
classifies each one, and deletes the spam and viruses. (It makes a
local copy of the spam, just in case.)

This program is designed for me, a programmer. The structure should
be helpful enough for other programmers, but even configuration must
be done by editing the code.

The virus identification and POP3 manipulation code is based on Kevin
Altis' virus killer code, which I've been gratefully using for the
last several months.

Written by Andrew Dalke, November 2003.
Released into the public domain on 2003/11/22.
Updated 2004/10/26
== NO copyright protection asserted for this code. Share and enjoy! ==

This program requires Python 2.3 or newer.
"""

import sets, traceback, md5, os
import poplib
import posixpath
from email import Header, Utils
from spambayes import mboxutils, hammie
# IsSpam falls back to the configured cutoff, so it needs the options object.
from spambayes.Options import options

import socket
socket.setdefaulttimeout(10)

DO_ACTIONS = 1
VERBOSE_LEVEL = 1

APPEND_TO_FILE = "append_to_file"
DELETE = "delete"
KEEP_IN_MAILBOX = "keep in mailbox"
SPAM = "spam"
VIRUS = "virus"

class Logger:
    def __init__(self):
        self.tests = {}
        self.actions = {}

    def __nonzero__(self):
        return bool(self.tests) and bool(self.actions)

    def pass_test(self, name):
        self.tests[name] = self.tests.get(name, 0) + 1

    def do_action(self, name):
        self.actions[name] = self.actions.get(name, 0) + 1

    def accept(self, text):
        print text

    def info(self, text):
        print text

class MessageInfo:
    """reference to an email message in a mailbox"""
    def __init__(self, mailbox, i, msg, text):
        self.mailbox = mailbox
        self.i = i
        self.msg = msg
        self.text = text

class Filter:
    """if message passes test then do the given action"""
    def __init__(self, test, action):
        self.test = test
        self.action = action

    def process(self, mi, log):
        result = self.test(mi, log)
        if result:
            self.action(mi, log)
            return self.action.descr + " because " + result
        return False


class AppendFile:
    """Action: append message text to the given filename"""
    def __init__(self, filename):
        self.filename = filename
        self.descr = "save to %r then delete" % self.filename
    def __call__(self, mi, log):
        log.do_action(APPEND_TO_FILE)
        if not DO_ACTIONS:
            return
        f = open(self.filename, "a")
        try:
            f.write(mi.text)
        finally:
            f.close()
        mi.mailbox.dele(mi.i)

def DELETE(mi, log):
    """Action: delete message from mailbox"""
    # The name DELETE now refers to this function, not the string constant
    # defined earlier, so log the action under its description instead.
    log.do_action(DELETE.descr)
    if not DO_ACTIONS:
        return
    mi.mailbox.dele(mi.i)
DELETE.descr = "delete"

def KEEP(mi, log):
    """Action: keep message in mailbox"""
    log.do_action(KEEP_IN_MAILBOX)
KEEP.descr = "keep in mailbox"


class Duplicate:
    def __init__(self):
        self.unique = {}
    def __call__(self, mi, log):
        digest = md5.md5(mi.text).digest()
        if digest in self.unique:
            log.pass_test(SPAM)
            return "duplicate"
        self.unique[digest] = 1
        return False

class IllegalDeliveredTo:
    def __init__(self, names):
        self.names = names
    def __call__(self, mi, log):
        fields = mi.msg.get_all("Delivered-To")
        if fields is None:
            return False

        for field in fields:
            field = field.lower()
            for name in self.names:
                if name in field:
                    return False
        log.pass_test(SPAM)
        return "sent to random email"

class SpamAssassin:
    def __init__(self, level = 8):
        self.level = level
    def __call__(self, mi, log):
        if ("*" * self.level) in mi.msg.get("X-Spam-Status", ""):
            log.pass_test(SPAM)
            return "assassinated!"
        return False

class WhiteListFrom:
    """Test: Read a list of email addresses to use as a 'from' whitelist"""
    def __init__(self, filename):
        self.filename = filename
        self._mtime = 0
        self._load_if_needed()

    def _load(self):
        lines = [line.strip().lower() for line in
                 open(self.filename).readlines()]
        self.addresses = sets.Set(lines)

    def _load_if_needed(self):
        mtime = os.path.getmtime(self.filename)
        if mtime != self._mtime:
            print "Reloading", self.filename
            self._mtime = mtime
            self._load()

    def __call__(self, mi, log):
        self._load_if_needed()
        frm = mi.msg["from"]
        realname, frm = Utils.parseaddr(frm)
        status = (frm is not None) and (frm.lower() in self.addresses)
        if status:
            # Log under its own name; counting a whitelist hit as SPAM
            # would inflate the spam statistics printed by main().
            log.pass_test("'from' white list")
            return "it is in 'from' white list"
        return False

class WhiteListSubstrings:
    """Test: Whitelist message if named field contains one of the substrings"""
    def __init__(self, field, substrings):
        self.field = field
        self.substrings = substrings

    def __call__(self, mi, log):
        data = mi.msg[self.field]
        if data is None:
            return False
        for s in self.substrings:
            if s in data:
                log.pass_test("'%s' white list" % (self.field,))
                return "it matches '%s' white list" % (self.field,)
        return False

class IsSpam:
    """Test: use SpamBayes to tell if something is spam"""
    def __init__(self, sb_hammie, spam_cutoff = None):
        self.sb_hammie = sb_hammie
        if spam_cutoff is None:
            spam_cutoff = options["Categorization", "spam_cutoff"]
        self.spam_cutoff = spam_cutoff

    def __call__(self, mi, log):
        prob = self.sb_hammie.score(mi.msg)
        if prob > self.spam_cutoff:
            log.pass_test(SPAM)
            return "it is spam (%4.3f)" % prob
        if VERBOSE_LEVEL > 1:
            print "not spam (%4.3f)" % prob
        return False

# Simple check for executable attachments
def IsVirus(mi, log):
    """Test: a virus is any message with an attached executable

    I've also noticed the viruses come in as wav and midi attachments
    so I trigger on those as well.

    This is a very paranoid detector, since someone might send me a
    binary for valid reasons. I white-list everyone who's sent me
    email before so it doesn't affect me.
    """
    for part in mi.msg.walk():
        if part.get_main_type() == 'multipart':
            continue

        filename = part.get_filename()
        if filename is None:
            if part.get_type() in ["application/x-msdownload",
                                   "audio/x-wav", "audio/x-midi"]:
                # Only viruses send messages to me with these types
                log.pass_test(VIRUS)
                return ("it has a virus-like content-type (%s)" %
                        part.get_type())
        else:
            extensions = "bat com exe pif ref scr vbs wsh".split()
            base, ext = posixpath.splitext(filename)
            if ext[1:].lower() in extensions:
                log.pass_test(VIRUS)
                return "it has a virus-like attachment (%s)" % ext[1:]
    return False


def open_mailbox(server, username, password, debuglevel = 0):
    mailbox = poplib.POP3(server)
    try:
        mailbox.user(username)
        mailbox.pass_(password)
        mailbox.set_debuglevel(debuglevel)
        if VERBOSE_LEVEL > 1:
            count, size = mailbox.stat()
            print "Message count: ", count
            print "Total bytes : ", size

    except:
        mailbox.quit()
        raise
    return mailbox


def _log_subject(mi, log):
    encoded_subject = mi.msg.get('subject')
    try:
        subject, encoding = Header.decode_header(encoded_subject)[0]
    except Header.HeaderParseError:
        log.info("%s Subject cannot be parsed" % (mi.i,))
        return
    if encoding is None or encoding == 'iso-8859-1':
        s = subject
    else:
        s = encoded_subject
    log.info("%s Subject: %r" % (mi.i, s))


class Filters(list):
    def add(self, test, action):
        """short-cut to make a Filter given the test and action"""
        self.append(Filter(test, action))

    def process_mailbox(self, mailbox):
        count, size = mailbox.stat()
        log = Logger()

        for i in range(1, count+1):
            if (i-1) % 10 == 0:
                print " == %d/%d ==" % (i, count)
            # Kevin's code used -1, but -1 doesn't work for one of
            # my POP accounts, while a million does.
            # Don't use retr because that may mark the message as
            # read (so says Kevin's code)
            message_tuple = mailbox.top(i, 1000000)
            text = "\n".join(message_tuple[1])
            msg = mboxutils.get_message(text)

            mi = MessageInfo(mailbox, i, msg, text)

            _log_subject(mi, log)

            for filter in self:
                result = filter.process(mi, log)
                if result:
                    log.accept(result)
                    break
            else:
                # don't know what to do with this so just
                # keep it on the server
                print "From", mi.msg["from"], Utils.parseaddr(mi.msg["from"])
                log.pass_test("unknown")
                log.do_action(KEEP_IN_MAILBOX)
                log.accept("unknown")

        return log

def filter_server( (server, user, pwd), filters):
    if VERBOSE_LEVEL:
        print "=" * 78
        print "Processing %s on %s" % (user, server)

    mailbox = open_mailbox(server, user, pwd)
    try:
        log = filters.process_mailbox(mailbox)
    finally:
        mailbox.quit()
    return log


##### User-specific

import time, sys, urllib

# A simple text interface.

def _unix_stop():
    pass

def _ms_stop():
    # ^C doesn't seem to work correctly in the DOS box
    # so assume any keypress is a break
    if msvcrt.kbhit():
        raise SystemExit()

try:
    import msvcrt
    _check_for_stop = _ms_stop
except ImportError:
    _check_for_stop = _unix_stop

def restart_network():
    # This is called after too many connection failures.
    # That usually means my ISP dropped my DHCP and I need to
    # bounce my Linksys firewall/DHCP/hub.
    # The early return disables the restart; the code below it is kept
    # only as an example.
    return
    print "Network appears to be down. Bringing Linksys down then up..."
    try:
        # Note that this example uses the default password. YMMV.
        urllib.urlopen("http://:[email protected]/Gozila.cgi?pppoeAct=2").read()
        urllib.urlopen("http://:[email protected]/Gozila.cgi?pppoeAct=1").read()
        pass
    except KeyboardInterrupt:
        raise
    except:
        traceback.print_exc()

def wait(t, delta = 10):
    """Wait for 't' seconds"""
    assert delta > 0, delta
    assert t >= 1
    first = True
    for i in range(t, -1, -delta):
        if VERBOSE_LEVEL:
            if not first:
                print "..",
            print i,
            sys.stdout.flush()

        time.sleep(min(i, delta))

        _check_for_stop()

        first = False

    print


def main():
    filters = Filters()

    duplicate = Duplicate()
    filters.add(duplicate, AppendFile("spam2.mbox"))

    # A list of everyone who has emailed me this year.
    # Keep their messages on the server.
    filters.add(WhiteListFrom("good_emails.txt"), KEEP)

    # My mailing lists.
    filters.add(WhiteListSubstrings("subject", [
        'ABCD:',
        '[Python-announce]',
        '[Python]',
        '[Bioinfo]',
        '[EuroPython]',
        ]),
        KEEP)

    filters.add(WhiteListSubstrings("to", [
        "(e-mail address removed)",
        "(e-mail address removed)",
        ]),
        KEEP)

    names = ["john", "", "jon", "johnathan"]
    valid_emails = ([name + "@lectroid.com" for name in names] +
                    [name + "@bigboote.org" for name in names] +
                    ["(e-mail address removed)"])

    filters.add(IllegalDeliveredTo(valid_emails), DELETE)
    filters.add(SpamAssassin(), AppendFile("spam2.mbox"))


    # Get rid of anything which smells like an executable.
    filters.add(IsVirus, DELETE)

    # Use SpamBayes to identify spam. Make a local copy then
    # delete from the server.
    h = hammie.open("cull.spambayes", False, "r")
    filters.add(IsSpam(h, 0.90), AppendFile("spam.mbox"))

    # These are my POP3 accounts.
    server_configs = [("mail.example.com",
                       "(e-mail address removed)", "password"),
                      ("popserver.big.com", "ceo", "12345"), ]


    # The main culling loop.
    error_count = 0
    cumulative_log = {SPAM: 0, VIRUS: 0}
    initial_log = None
    start_time = None # init'ed only after initial_log is created
    while 1:
        error_flag = False
        duplicate.unique.clear() # Hack!
        for server, user, pwd in server_configs:
            try:
                log = filter_server( (server, user, pwd), filters)
            except KeyboardInterrupt:
                raw_input("Press enter to continue. ")
            except StandardError:
                raise
            except:
                error_flag = True
                traceback.print_exc()
                continue

            if VERBOSE_LEVEL > 1 and log:
                print " ** Summary **"
                for x in (log.tests, log.actions):
                    items = x.items()
                    if items:
                        items.sort()
                        for k, v in items:
                            print " %s: %s" % (k, v)
                print

            cumulative_log[SPAM] += log.tests.get(SPAM, 0)
            cumulative_log[VIRUS] += log.tests.get(VIRUS, 0)

        if initial_log is None:
            initial_log = cumulative_log.copy()
            start_time = time.time()
            if VERBOSE_LEVEL:
                print "Stats: %d spams, %d virus" % (
                    initial_log[SPAM], initial_log[VIRUS])
        else:
            if VERBOSE_LEVEL:
                delta_t = time.time() - start_time
                delta_t = max(delta_t, 1) # avoid dividing by zero

                print "Stats: %d spams (%.2f/hr), %d virus (%.2f/hr)" % (
                    cumulative_log[SPAM],
                    (cumulative_log[SPAM] - initial_log[SPAM]) /
                    delta_t * 3600,
                    cumulative_log[VIRUS],
                    (cumulative_log[VIRUS] - initial_log[VIRUS]) /
                    delta_t * 3600)

        if error_flag:
            error_count += 1

        if error_count > 0:
            restart_network()
            error_count = 0

        delay = 10 * 60
        while delay:
            try:
                wait(delay)
                break
            except KeyboardInterrupt:
                print
                while 1:
                    cmd = raw_input("enter, delay, or quit? ")
                    if cmd in ("q", "quit"):
                        raise SystemExit(0)
                    elif cmd == "":
                        delay = 0
                        break
                    elif cmd.isdigit():
                        delay = int(cmd)
                        break
                    else:
                        print "Unknown command."

if __name__ == "__main__":
    main()
 
Fred Pacquier

Andrew MacIntyre said:
I've been using Charles Cazabon's getmail (pure Python) for quite some
time in place of fetchmail.

Yes, I found that hidden gem too after searching high and low for a less
daunting alternative to fetchmail. I use it during the holidays to prevent
my mailboxes from overflowing - easy to set up, works like a charm, and
uses the plain old mbox format so it integrates well with
Mozilla/Thunderbird.

During that time however there is no mail left on the server, and when I
come back there are those huge files to mount and despam in TB, so I'm
looking for something more permanent.
 
Fred Pacquier

Andrew Dalke said:
I upgraded to Spambayes 1.0. I think I needed to change something,
but I've forgotten. I've attached my current version (sans passwords
and identifying mailing list information) to this post.

Thanks for that, Andrew !

Andrew Dalke said:
It does not contribute. It can't. That is, it computes a spam value
for a message but since it doesn't know if that classification is
correct it can't train itself.

Every few months I export all my saved mail (inbox, sent, various
archive folders) as "ham". I also save all the spam that gets through
my filter so I can export it as "spam." I then train SpamBayes on
those two data sets and use the result for my filter program.

I more or less suspected something like that, it sounds natural. Thanks for
the clarification though.

Andrew Dalke said:
I didn't make those additions to the attached file. Feel free to do
so, and to update the Wiki. Shouldn't be more than a dozen or two
lines of code. Eg, to have a file of blacklisted subjects you could
use WhiteListFrom (which reads lines from a file, and reloads if the
file's timestamp changed) and tweak the test condition.

Usual Python mantra : first make it work, then (maybe) make it better. I'll
probably skip the "then make it faster" part :)

Andrew Dalke said:
Enjoy, and thank you for your kind comments regarding my code. That
program has really helped with my email sanity. Thanks also to the
SpamBayes people and to Kevin Altis for his virus killer code I used.

Oh, so he _is_ the Kevin (of PythonCard fame) who appears in the comments,
I'd been wondering... There probably still is an old, unfinished "sample"
of mine in the PyCard CVS, called "fpop.py" :)
 
