python spam filter: random words?

Discussion in 'Python' started by revyakin, Aug 11, 2003.

  1. revyakin

    revyakin Guest

    I know fighting spam is like fighting global worming, but still..
    50% of spam I get these days contains a random combination of letters
    at the end of the subject line. Has anyone tried using that feature in
    antispam filters? Since Python is the only language I am more or less
    fluent in as an amateur scripter, I was wondering if anyone in this
    group has comments on this idea.
    Also, is it trivial to make a Python script filter executable from a
    generic mail program like OE, or NS messenger?
    I am also wondering why spammers add that stuff to their subject lines
    anyway.
     
    revyakin, Aug 11, 2003
    #1

  2. Ben Finney

    Ben Finney Guest

    On 10 Aug 2003 18:13:53 -0700, revyakin wrote:
    > I know fighting spam is like fighting global worming, but still..

    ^^^^^^^^^^^^^^
    Given that some spam contains e-mail worms, the typo is appropriate :)

    > 50% of spam I get these days contains a random combination of letters
    > at the end of the subject line. Has anyone tried using that feature in
    > antispam filters?


    My experience has been that this practice is dropping off, since
    Bayesian statistical-analysis filters will glide right by random words
    as "not statistically significant."

    What I'm seeing now is spam with words taken straight from the "likely
    good" word lists of Bayesian filters :)

    > I am also wondering why spammers add that stuff to their subject lines
    > anyway.


    To defeat spam filters that check for the occurrence of a known spam
    message they've seen before. As noted above, though, these are being
    superseded by Bayesian word metric analysis.
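
To illustrate the point about known-message filters, here is a minimal sketch, assuming a filter that stores a hash of each spam subject line it has already seen. The names `fingerprint` and `with_random_suffix` and the sample subject are made up for the example; real filters of this kind may hash the body or use fuzzier signatures.

```python
import hashlib
import random
import string


def fingerprint(subject: str) -> str:
    """Hash the whole subject line; any changed character changes the hash."""
    return hashlib.sha256(subject.encode("utf-8")).hexdigest()


def with_random_suffix(subject: str, rng: random.Random, length: int = 8) -> str:
    """Append a random lowercase string, as the spam described above does."""
    suffix = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return f"{subject} {suffix}"


rng = random.Random(0)
base = "Lose weight fast"  # hypothetical spam subject

# The filter has seen one copy of the campaign and remembers its hash.
known_spam = {fingerprint(with_random_suffix(base, rng))}

# A second copy gets a different suffix, so an exact-match lookup misses it.
second_copy = with_random_suffix(base, rng)
is_caught = fingerprint(second_copy) in known_spam
print(second_copy, "caught" if is_caught else "missed")
```

Each copy of the same message hashes differently, which is why an exact-match list of known spam does not help against this trick.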

    --
    Ben Finney
     
    Ben Finney, Aug 11, 2003
    #2

  3. Sean 'Shaleh' Perry

    On Sunday 10 August 2003 18:28, Ben Finney wrote:
    > What I'm seeing now is spam with words taken straight from the "likely
    > good" word lists of Bayesian filters :)
    >


    This was recently discussed on the spambayes list (the nifty Python
    implementation of Paul Graham's ideas).

    Apparently there are not enough uses of the word to make it statistically
    interesting so spambayes ignores it. Or something like that. See the thread
    there for full details.
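
A rough sketch of the "not enough uses" idea, assuming a simplified Graham-style per-token estimate with a minimum-occurrence cutoff. This is not spambayes' actual code; the function name, the cutoff of 5, and the sample counts are all assumptions made for illustration.

```python
from collections import Counter

MIN_OCCURRENCES = 5  # assumed cutoff; tune for your own corpus


def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Return an estimate of P(spam | token), or None if the token is too rare."""
    s = spam_counts[token]  # Counter returns 0 for tokens never seen
    h = ham_counts[token]
    if s + h < MIN_OCCURRENCES:
        return None  # too little evidence; the caller skips this token
    spam_rate = s / n_spam
    ham_rate = h / n_ham
    return spam_rate / (spam_rate + ham_rate)


# Hypothetical training counts from 100 spam and 100 non-spam messages.
spam_counts = Counter({"viagra": 40, "xqzvkpw": 1})
ham_counts = Counter({"viagra": 1, "meeting": 30})

for tok in ("viagra", "meeting", "xqzvkpw"):
    print(tok, token_spam_probability(tok, spam_counts, ham_counts, 100, 100))
```

A random string typically shows up once and never again, so it falls under the cutoff and contributes nothing to the score either way.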
     
    Sean 'Shaleh' Perry, Aug 11, 2003
    #3
  4. Terry Reedy

    Terry Reedy Guest

    "Marc Wilson" <> wrote in message
    news:p...
    > In comp.lang.python, (revyakin) (revyakin) wrote in
    > <>::
    >
    > |I know fighting spam is like fighting global worming, but still..
    > |50% of spam I get these days contains a random combination of letters
    > |at the end of the subject line. Has anyone tried using that feature in
    > |antispam filters?
    >
    > How do you detect "random" letters? You can only (programmatically)
    > determine that a character sequence is "random" if it doesn't appear in some
    > sort of dictionary, and even there you have the risk of false positives due
    > to typos, acronyms etc.


    Looking at successive letter pairs would go a long way. Out of the
    (26+space)**2 combinations, perhaps half occur in real words (i.e., 'qx'
    is a giveaway). Using triples would allow inclusion of common
    three-letter acronyms as legal.
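
A minimal sketch of the letter-pair idea, assuming the set of "legal" pairs is collected from a list of real words. The tiny word list and the function names here are placeholders; in practice you would build the set from a large dictionary or from your own non-spam mail.

```python
def letter_pairs(word):
    """Return the successive letter pairs of a word, padded with spaces."""
    padded = f" {word.lower()} "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def build_known_pairs(words):
    """Collect every letter pair that occurs in the given real words."""
    known = set()
    for w in words:
        known.update(letter_pairs(w))
    return known


def unseen_pair_ratio(token, known_pairs):
    """Fraction of the token's letter pairs never seen in real words."""
    pairs = letter_pairs(token)
    unseen = sum(1 for p in pairs if p not in known_pairs)
    return unseen / len(pairs)


# Placeholder word list, far too small for real use.
sample_words = ["free", "money", "offer", "meeting", "python",
                "filter", "question", "quick"]
known = build_known_pairs(sample_words)

for token in ("free", "qxzvk"):
    print(token, unseen_pair_ratio(token, known))
```

A token whose ratio is above some threshold would be flagged as probably random. The same code extends to triples by slicing three characters instead of two, at the cost of needing a much larger word list to fill the table.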

    Terry J. Reedy
     
    Terry Reedy, Aug 11, 2003
    #4
  5. Marc Wilson

    Marc Wilson Guest

    In comp.lang.python, "Terry Reedy" <> (Terry Reedy) wrote
    in <>::

    |> How do you detect "random" letters? You can only (programmatically)
    |> determine that a character sequence is "random" if it doesn't appear in some
    |> sort of dictionary, and even there you have the risk of false positives due
    |> to typos, acronyms etc.
    |
    |Looking at successive letter pairs would go a long way. Out of the
    |(26+space)**2 combinations, perhaps half occur in real words (i.e., 'qx'
    |is a giveaway). Using triples would allow inclusion of common
    |three-letter acronyms as legal.

    For sale today on QXL.com....
    --
    Marc Wilson

    Cleopatra Consultants Limited - IT Consultants
    2 The Grange, Cricklade Street, Old Town, Swindon SN1 3HG
    Tel: (44/0) 70-500-15051 Fax: (44/0) 870 164-0054
    Mail: Web: http://www.cleopatra.co.uk
    _________________________________________________________________
    Try MailTraq at https://my.mailtraq.com/register.asp?code=cleopatra
     
    Marc Wilson, Aug 12, 2003
    #5