Fighting Spam with Python

Discussion in 'Python' started by David MacQuigg, Aug 25, 2005.

  1. Are you as mad about spam as I am? Are you frustrated with the
    pessimism and lack of progress these last two years? Do you have
    faith that an open-source project can do better than the big companies
    competing for a lock-in solution? If so, you might be interested in
    the Open-Mail project.

    I'm writing some scripts to check incoming mail against a registry of
    reputable senders, using the new authentication methods. Python is
    ideal for this because it will give mail-system admins the ability to
    experiment with the different methods, and provide some real-world
    feedback sorely needed by the advocates of each method. So far, we
    have SPF and CSV. See http://purl.net/macquigg/email/python for the
    latest project status.

    I welcome anyone who is interested in helping, especially if you have
    some experience with mail transfer programs, like Sendmail or Postfix,
    or spam filtering programs, like SpamAssassin. My Python may not be
    the best, so I welcome suggestions there also. We need to make these
    scripts a model of clarity.

    --
    Dave
     
    David MacQuigg, Aug 25, 2005
    #1

  2. David MacQuigg wrote:
    > Are you as mad about spam as I am? Are you frustrated with the
    > pessimism and lack of progress these last two years? Do you have
    > faith that an open-source project can do better than the big companies
    > competing for a lock-in solution? If so, you might be interested in
    > the Open-Mail project.
    >
    > I'm writing some scripts to check incoming mail against a registry of
    > reputable senders, using the new authentication methods. Python is
    > ideal for this because it will give mail-system admins the ability to
    > experiment with the different methods, and provide some real-world
    > feedback sorely needed by the advocates of each method. So far, we
    > have SPF and CSV. See http://purl.net/macquigg/email/python for the
    > latest project status.


    You might find www.spambayes.org of interest, in several ways.

    -Peter
     
    Peter Hansen, Aug 25, 2005
    #2

  3. Before you do too much work you should probably check out:

    http://spambayes.sourceforge.net/

    There has already been a lot of work done on this project.

    FYI, Larry

    David MacQuigg wrote:
    > Are you as mad about spam as I am? Are you frustrated with the
    > pessimism and lack of progress these last two years? Do you have
    > faith that an open-source project can do better than the big companies
    > competing for a lock-in solution? If so, you might be interested in
    > the Open-Mail project.
    >
    > I'm writing some scripts to check incoming mail against a registry of
    > reputable senders, using the new authentication methods. Python is
    > ideal for this because it will give mail-system admins the ability to
    > experiment with the different methods, and provide some real-world
    > feedback sorely needed by the advocates of each method. So far, we
    > have SPF and CSV. See http://purl.net/macquigg/email/python for the
    > latest project status.
    >
    > I welcome anyone who is interested in helping, especially if you have
    > some experience with mail transfer programs, like Sendmail or Postfix,
    > or spam filtering programs, like SpamAssassin. My Python may not be
    > the best, so I welcome suggestions there also. We need to make these
    > scripts a model of clarity.
    >
    > --
    > Dave
    >
     
    Larry Bates, Aug 25, 2005
    #3
  4. On Thu, 25 Aug 2005 10:18:37 -0400, Peter Hansen wrote:

    >David MacQuigg wrote:
    >> Are you as mad about spam as I am? Are you frustrated with the
    >> pessimism and lack of progress these last two years? Do you have
    >> faith that an open-source project can do better than the big companies
    >> competing for a lock-in solution? If so, you might be interested in
    >> the Open-Mail project.
    >>
    >> I'm writing some scripts to check incoming mail against a registry of
    >> reputable senders, using the new authentication methods. Python is
    >> ideal for this because it will give mail-system admins the ability to
    >> experiment with the different methods, and provide some real-world
    >> feedback sorely needed by the advocates of each method. So far, we
    >> have SPF and CSV. See http://purl.net/macquigg/email/python for the
    >> latest project status.

    >
    >You might find www.spambayes.org of interest, in several ways.


    Integration of a good spam filter is one of our top priorities.
    Spambayes looks like a good candidate. The key new features needed in
    a spam filter are the ability to extract the sender's identity (not
    that of the latest forwarder), and to factor into the spam score the
    reputation of that identity. We could use some help on this
    integration.
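
A minimal sketch of what "factoring reputation into the spam score" could look like. The function name, the 0-to-1 scales, and the `weight` parameter are all assumptions made for illustration; nothing here is taken from Spambayes or from the Open-Mail scripts.

```python
def adjusted_spam_score(filter_score, reputation, weight=0.5):
    """Blend a content-filter score with a sender reputation.

    filter_score: 0.0 (clearly ham) .. 1.0 (clearly spam), as a
                  statistical filter might report it.
    reputation:   0.0 (bad) .. 1.0 (good), or None when the sender has
                  no authenticated identity or no reputation yet.
    weight:       hypothetical knob for how strongly reputation counts.
    """
    if reputation is None:
        # Unknown sender: fall back to the content filter alone.
        return filter_score
    # A good reputation should lower the score, so invert it first.
    reputation_score = 1.0 - reputation
    return (1.0 - weight) * filter_score + weight * reputation_score
```

With these assumptions, a message the filter scores at 0.8 from a sender with a perfect reputation drops to 0.4, while the same message from an unknown sender stays at 0.8.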

    I guess I should have said a little more about the Open-Mail project.
    We are not focused on developing new authentication or filtering
    methods, but rather, providing a platform that will bring these pieces
    together and allow the mail admin to choose which methods are used and
    in what order. Interoperability has been the main barrier to
    wide-scale use of authentication. Python is superb at gluing these
    pieces together.

    In the flow we envision, the spam filter is the final process, used
    only on the 5% that is hard to classify. 80% will get an immediate
    reject. 15% will get an immediate accept without filtering, because
    the sender is authenticated and has a good reputation. Eventually,
    all reputable senders will join the 15%, and the 5% will shrink to
    where we can ignore it.
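
The flow described above (checks run in an admin-chosen order, with the spam filter as the last resort) might be sketched as follows. This is only an illustration: the message fields, the check functions, the 0.9 threshold, and the sample data are hypothetical and do not come from the project's code.

```python
# Made-up sample data for the sketch.
BLACKLIST = {"192.0.2.66"}
REPUTATION = {"example.com": 0.95}


def blacklist_check(msg):
    """Reject mail from a blacklisted client IP; otherwise no opinion."""
    return "reject" if msg["client_ip"] in BLACKLIST else None


def reputation_check(msg):
    """Reject failed authentication; accept authenticated, reputable senders."""
    if msg["auth_result"] == "fail":
        return "reject"
    reputation = REPUTATION.get(msg["sender_domain"], 0.0)
    if msg["auth_result"] == "pass" and reputation >= 0.9:
        return "accept"
    return None  # no opinion; let a later stage decide


def classify(msg, checks):
    """Run the checks in order; the first definite verdict wins."""
    for check in checks:
        verdict = check(msg)
        if verdict is not None:
            return verdict
    # Nothing decided: this is the hard-to-classify remainder.
    return "filter"
```

The admin's choice of methods and their order is just the `checks` list, for example `classify(msg, [blacklist_check, reputation_check])`. Only messages that come back as `"filter"` would be handed to a statistical filter.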

    --
    Dave
     
    David MacQuigg, Aug 25, 2005
    #4
  5. [David MacQuigg]

    > The key new features needed in a spam filter are the ability to
    > extract the sender's identity (not that of the latest forwarder), and
    > to factor into the spam score the reputation of that identity.


    This will only work if your system is immune to forgeries, while being
    largely widespread.

    > In the flow we envision, the spam filter is the final process, used
    > only on the 5% that is hard to classify. 80% will get an immediate
    > reject. 15% will get an immediate accept without filtering, because
    > the sender is authenticated and has a good reputation. Eventually,
    > all reputable senders will join the 15%, and the 5% will shrink to
    > where we can ignore it.


    It's fun to read statistics about a vision! :)

    > >You might find www.spambayes.org of interest, in several ways.


    Spambayes is surprisingly good as it already stands.

    --
    François Pinard http://pinard.progiciels-bpi.ca
     
    François Pinard, Aug 25, 2005
    #5
  6. On Thu, 25 Aug 2005 13:22:53 -0400, François Pinard wrote:
    >[David MacQuigg]
    >
    >> The key new features needed in a spam filter are the ability to
    >> extract the sender's identity (not that of the latest forwarder), and
    >> to factor into the spam score the reputation of that identity.

    >
    >This will only work if your system is immune to forgeries, while being
    >largely widespread.


    Stopping forgery is what the new authentication methods are all about.
    Getting these methods widely and effectively used is our big
    challenge, and one that I hope to accomplish with my efforts. There
    are a bunch of pieces that need to work together more smoothly.
    That's where Python comes in. There are some challenging constraints,
    like the system has to work without government regulation. I've got a
    first draft of a website for open-mail.org - temporarily at
    http://purl.net/macquigg/email/registry Suggestions are welcome.

    >> In the flow we envision, the spam filter is the final process, used
    >> only on the 5% that is hard to classify. 80% will get an immediate
    >> reject. 15% will get an immediate accept without filtering, because
    >> the sender is authenticated and has a good reputation. Eventually,
    >> all reputable senders will join the 15%, and the 5% will shrink to
    >> where we can ignore it.

    >
    >It's fun to read statistics about a vision! :)


    The 80% is real (see http://messagelabs.com/emailthreats). As to how the
    remaining 20% will split, that's a guess, but one that I think is
    realistic. See http://www.spamhaus.org/effective_filtering.html for
    comparable numbers using only IP blacklists and spam filtering.

    The 5% still needing filtering will be those senders that don't offer
    any authentication or that authenticate with an identity that has not
    yet acquired a reputation.

    >> >You might find www.spambayes.org of interest, in several ways.

    >
    >Spambayes is surprisingly good as it already stands.


    I haven't used Spambayes, but my experience with Spamnix (an offshoot
    of SpamAssassin) is that statistical filters always have a few false
    rejects. In my case, that's about two per week.

    The solution to this problem is a reliable system allowing receivers
    to determine the identity and reputation of an unknown sender. Then
    we can safely ignore the spam.

    -- Dave
     
    David MacQuigg, Aug 26, 2005
    #6
  7. [David MacQuigg]

    > Getting these methods widely and effectively used is our big
    > challenge, and one that I hope to accomplish with my efforts.


    I hope one of these methods, either yours or one of the few others
    that were developed and proposed in recent years, will succeed. It
    might be useful for someone as involved as you are (thanks from all of
    us!) to survey those others, trying to understand why they failed to
    become popular, so as not to repeat the same errors, if any.

    --
    François Pinard http://pinard.progiciels-bpi.ca
     
    François Pinard, Aug 26, 2005
    #7
  8. On Wed, 24 Aug 2005 22:46:28 -0700, rumours say that David MacQuigg <dmq
    at pobox.com> might have written:

    >I'm writing some scripts to check incoming mail against a registry of
    >reputable senders, using the new authentication methods. Python is
    >ideal for this because it will give mail-system admins the ability to
    >experiment with the different methods, and provide some real-world
    >feedback sorely needed by the advocates of each method. So far, we
    >have SPF and CSV. See http://purl.net/macquigg/email/python for the
    >latest project status.


    I am on the side of advocating SPF records --and I am one of the first
    four postmasters in my country's TLD that set up SPF records for two of
    the email domains I'm administrating. SPF is an internet draft now.[1]
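
For readers unfamiliar with SPF records, here is a deliberately simplified sketch of how a receiver might evaluate one. It assumes the record text has already been fetched from DNS and handles only the `ip4`, `ip6`, and `all` mechanisms; the function name and the `"unsupported"` result are inventions of this sketch, and a real checker must also handle `include`, `a`, `mx`, `redirect`, macros, and lookup limits.

```python
import ipaddress

# SPF qualifiers and the results they map to.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record, client_ip):
    """Evaluate a tiny subset of SPF against a connecting IP address."""
    ip = ipaddress.ip_address(client_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier is "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "all":
            return QUALIFIERS[qualifier]
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip in network:
                return QUALIFIERS[qualifier]
            continue
        # Anything else is outside this sketch; refuse to guess.
        return "unsupported"
    return "neutral"  # no mechanism matched
```

Given the record `v=spf1 ip4:192.0.2.0/24 -all`, a connection from 192.0.2.10 evaluates to `pass` and one from 198.51.100.1 to `fail`.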

    Your method is/will_not be free (as in beer), as hinted in
    http://www.ece.arizona.edu/~edatools/home/email/registry/Form-Sender01.htm
    . *That* is a drawback similar to the licensing of Microsoft's
    Sender/Caller-ID scheme. Why not support open, free standards?

    I have developed scripts of my own to perform various consistency
    checks (including SPF lookup) and maintain my own black list (I am
    consulting three RBLs which I have found to be close to my standards,
    but I want to avoid excessive usage of their bandwidth), and although it
    takes some time almost every day overseeing things, I would be very
    timid to support such a free (as in jazz :) scheme. I mean, the
    "reputation" idea is nice, but paying for this reputation won't help its
    spreading.
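
One way to consult several RBLs without hammering them is to cache each answer locally for a while. The sketch below is not the poster's script; the class name, the one-hour TTL, and the injectable `resolver` and `clock` parameters are assumptions. It relies on the usual DNSBL convention of reversing the IPv4 octets and prepending them to the list's zone name, and it only handles IPv4.

```python
import socket
import time


def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name: reversed IPv4 octets, then the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


class CachedBlacklist:
    """Check an IP against several DNSBL zones, caching each answer."""

    def __init__(self, zones, ttl=3600,
                 resolver=socket.gethostbyname, clock=time.time):
        self.zones = zones
        self.ttl = ttl            # seconds to keep a cached answer
        self.resolver = resolver  # injectable so it can be faked in tests
        self.clock = clock
        self._cache = {}          # ip -> (listed, expiry time)

    def is_listed(self, ip):
        now = self.clock()
        cached = self._cache.get(ip)
        if cached is not None and cached[1] > now:
            return cached[0]      # still fresh: no DNS traffic at all
        listed = False
        for zone in self.zones:
            try:
                # Any successful answer means the IP is listed in this zone.
                self.resolver(dnsbl_query_name(ip, zone))
                listed = True
                break
            except socket.gaierror:
                continue          # name not found: not listed in this zone
        self._cache[ip] = (listed, now + self.ttl)
        return listed
```

Note that this sketch treats every resolution failure as "not listed", which is a simplification; a production script would want to tell a timeout apart from a genuine negative answer.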

    Good luck with it as a business, though.


    [1]
    http://www.ietf.org/internet-drafts/draft-schlitt-spf-classic-02.txt
    http://www.ietf.org/internet-drafts/draft-newton-maawg-spf-considerations-00.txt
    --
    TZOTZIOY, I speak England very best.
    "Dear Paul,
    please stop spamming us."
    The Corinthians
     
    Christos Georgiou, Aug 26, 2005
    #8
  9. David MacQuigg <dmq at pobox.com> writes:
    [...]
    > I haven't used Spambayes, but my experience with Spamnix (an offshoot
    > of SpamAssassin) is that statistical filters always have a few false
    > rejects. In my case, that's about two per week.

    [...]

    That is precisely the problem that Bayesian filtering was designed to
    solve.

    AFAIK, SpamAssassin is a non-Bayesian filter. (Though I think I
    heard they were thinking of grafting on Bayesian filtering to their
    existing algorithms, I'm not sure if they did it, or even if that's
    actually a sane thing to do.)

    [David, in an earlier email]
    > reject. 15% will get an immediate accept without filtering, because
    > the sender is authenticated and has a good reputation. Eventually,
    > all reputable senders will join the 15%, and the 5% will shrink to
    > where we can ignore it.


    Two questions you seem to be implicitly assuming particular answers
    to: Is widespread authentication a good thing? Does it solve any
    problem not solved by Bayesian filtering plus good mail client
    support? My first reaction is to answer "no" to both questions, so to
    regard your effort as harmful. Might be interesting to hear why you
    think it's a good thing, though.


    John
     
    John J. Lee, Aug 27, 2005
    #9
  10. On Fri, 26 Aug 2005 10:36:28 -0400, François Pinard wrote:

    >[David MacQuigg]
    >
    >> Getting these methods widely and effectively used is our big
    >> challenge, and one that I hope to accomplish with my efforts.

    >
    >I hope one of these methods, either yours or one of the few others
    >that were developed and proposed in recent years, will succeed.


    I don't have a method, and that is a key part of the strategy. The
    Registry is intended to support all methods. My main technical
    contribution, if you can call it that, is to figure out how we can tie
    these methods into a system where not all participants are using the
    same method. (An interoperability protocol, if you need a fancy
    name.)

    >It might be useful for someone as involved as you are (thanks from all
    >of us!) to survey those others, trying to understand why they failed to
    >become popular, so as not to repeat the same errors, if any.


    The main reason for the current failure is that the effort to achieve
    a common authentication standard has degenerated into a war.

    I did try to find information on other attempts at setting up a
    Registry/Clearinghouse of reputation information. There has been an
    effort by Spamhaus to establish such a registry, but they were
    counting on senders to support it. That seems to me a fatal flaw.

    Our plans are to have *receivers* support the registry via
    subscription fees. Senders will need an incentive, and that will be
    provided by receivers who use the Registry to clear reputable mail,
    and send the rest to a spam filter.

    There are also some successful proprietary systems, like IronPort
    Senderbase, that I think are similar, but I don't know the details.
    You have to pay them big bucks for a "spam appliance".

    --
    Dave
     
    David MacQuigg, Aug 27, 2005
    #10
  11. Re: Licensing and Other Questions

    On Sat, 27 Aug 2005 01:35:58 +0300, Christos Georgiou wrote:

    >Your method is/will_not be free (as in beer), as hinted in
    >http://www.ece.arizona.edu/~edatools/home/email/registry/Form-Sender01.htm
    >. *That* is a drawback similar to the licensing of Microsoft's
    >Sender/Caller-ID scheme. Why not support open, free standards?


    These are fees for services, not license fees. I don't know how you
    could miss that. The code is offered under the Python license, which
    is the least restrictive of any license I know about.

    One of my goals is to provide an open-source version of what big
    companies are now paying millions for - spam appliances with
    proprietary methods.


    On Fri, 26 Aug 2005 23:20:05 GMT, John J. Lee wrote:
    >[David, in an earlier email]
    >> reject. 15% will get an immediate accept without filtering, because
    >> the sender is authenticated and has a good reputation. Eventually,
    >> all reputable senders will join the 15%, and the 5% will shrink to
    >> where we can ignore it.

    >
    >Two questions you seem to be implicitly assuming particular answers
    >to: Is widespread authentication a good thing? Does it solve any
    >problem not solved by Bayesian filtering plus good mail client
    >support? My first reaction is to answer "no" to both questions, so to
    >regard your effort as harmful. Might be interesting to hear why you
    >think it's a good thing, though.


    I really didn't intend for this to be a discussion of the merits of
    filtering vs authentication. I worry this will be a long discussion,
    with no satisfactory conclusion, so I suggest we move these topics to
    one of the email security forums. My conclusion, after participating
    in many such discussions, is that both filtering and authentication
    are necessary tools, and a complete system should have both.

    --
    Dave
     
    David MacQuigg, Aug 27, 2005
    #11