Spam Filter Pattern Matching

Discussion in 'Perl Misc' started by mossoft, Jan 30, 2004.

  1. mossoft

    mossoft Guest

    I use SpamAssassin as a spam detector; the rules for the Bayes filter
    appear to be Perl-based.
    I need a rule which detects a string in the subject like "Re: ABCDE,
    random three words", where the ABCDE bit can be between 2 and 8
    uppercase characters, and I came up with:

    /Re: [A-Z]{2,8}, .{1,20}? .{1,20}? .{1,20}?/i

    Does this look about right to all you experts?

    Ta.

    M.
     
    mossoft, Jan 30, 2004
    #1

  2. Dan Wilga

    Dan Wilga Guest

    In article <>,
    (mossoft) wrote:

    > I use SpamAssassin as a spam detector; the rules for the Bayes filter
    > appear to be Perl-based.
    > I need a rule which detects a string in the subject like "Re: ABCDE,
    > random three words", where the ABCDE bit can be between 2 and 8
    > uppercase characters, and I came up with:
    >
    > /Re: [A-Z]{2,8}, .{1,20}? .{1,20}? .{1,20}?/i


    The one I wrote yesterday (but haven't tested yet) is:

    ^Re:\s[A-Z][A-Z]+,(\s[a-z]+){3}

    I'd rather not assume the CAPS part will be from 2-8 chars, or that any
    of the individual words will be from 1-20 chars.

    In my experience, these subjects always have all lowercase alphas in the
    three words after the comma, so using "." here is overkill, IMHO.
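
A quick way to compare the two patterns is to run both against a few subjects. The sketch below uses Python's `re` module as a stand-in for Perl (an assumption; the constructs used here behave the same way in both), and the sample subjects are invented for illustration. Two things it highlights: the `/i` flag on the original pattern makes `[A-Z]` match lowercase letters too, and without `^` the original can match anywhere in the subject.

```python
import re

# Patterns taken from the thread. re.IGNORECASE plays the role of Perl's /i.
original = re.compile(r"Re: [A-Z]{2,8}, .{1,20}? .{1,20}? .{1,20}?", re.IGNORECASE)
stricter = re.compile(r"^Re:\s[A-Z][A-Z]+,(\s[a-z]+){3}")

# Hypothetical subjects, made up for this comparison.
subjects = [
    "Re: QWXZ, purple monkey dishwasher",  # the spam shape described above
    "Re: lunch, are you free today",       # ordinary reply, lowercase first word
    "Fwd: Re: ABCD, one two three",        # same shape, but not at the start
]

# One (original_matches, stricter_matches) pair per subject.
results = [(bool(original.search(s)), bool(stricter.search(s))) for s in subjects]

for subject, (orig_hit, strict_hit) in zip(subjects, results):
    print(f"{subject!r}: original={orig_hit} stricter={strict_hit}")
```

With these samples the original pattern matches all three subjects, while the anchored, case-sensitive one matches only the first.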

    I've also found when writing regexps that \s is your friend. It's almost
    always preferable to use \s (or even \s+), rather than assume the
    character will be a real space. It might be a tab or a carriage return.
    Granted, it's not too likely in an email subject, but as a general rule
    it's very often true, and costs next to nothing.
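
To illustrate the `\s` point, here is a small sketch, again in Python's `re` as a stand-in for Perl. The `literal_space` and `one_or_more` variants are not from the thread; they are hypothetical rewrites of the pattern above, used only to show the difference between a literal space, `\s`, and `\s+`. The sample subjects are invented.

```python
import re

# Hypothetical variant: literal spaces instead of \s.
literal_space = re.compile(r"^Re: [A-Z][A-Z]+,( [a-z]+){3}")
# The pattern as posted, using \s.
any_whitespace = re.compile(r"^Re:\s[A-Z][A-Z]+,(\s[a-z]+){3}")
# Hypothetical variant: \s+ wherever the posted pattern has \s.
one_or_more = re.compile(r"^Re:\s+[A-Z][A-Z]+,(\s+[a-z]+){3}")

tabbed = "Re:\tQWXZ, purple monkey dishwasher"          # tab after "Re:"
double_spaced = "Re: QWXZ,  purple monkey dishwasher"   # two spaces after the comma

tabbed_hits = [bool(p.search(tabbed)) for p in (literal_space, any_whitespace, one_or_more)]
double_hits = [bool(p.search(double_spaced)) for p in (literal_space, any_whitespace, one_or_more)]

print("tab after Re: ->", tabbed_hits)
print("double space  ->", double_hits)
```

Only the `\s` and `\s+` versions survive the tab, and only the `\s+` version survives the doubled space.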

    --
    Dan Wilga
    ** Remove the -MUNGE in my address to reply **
     
    Dan Wilga, Jan 30, 2004
    #2

  3. Dan Wilga

    Dan Wilga Guest

    In article <>,
    Dan Wilga <> wrote:

    > The one I wrote yesterday (but haven't tested yet) is:
    >
    > ^Re:\s[A-Z][A-Z]+,(\s[a-z]+){3}


    No sooner did I write the above than I got a piece of spam with an
    apostrophe in the three words at the end :-(.

    Perhaps this would work better:

    ^Re:\s[A-Z][A-Z]+,(\s[a-z\']+){3}
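
A minimal check of the revised pattern, sketched in Python's `re` as a stand-in for Perl; the sample subject is invented. It also shows that the backslash before the apostrophe is harmless but not required inside a character class, so `[a-z']` behaves the same as `[a-z\']`.

```python
import re

without_apostrophe = re.compile(r"^Re:\s[A-Z][A-Z]+,(\s[a-z]+){3}")
with_apostrophe = re.compile(r"^Re:\s[A-Z][A-Z]+,(\s[a-z\']+){3}")
# Same class without the backslash; an equivalent spelling, not from the thread.
unescaped = re.compile(r"^Re:\s[A-Z][A-Z]+,(\s[a-z']+){3}")

# Hypothetical subject with an apostrophe in one of the three words.
subject = "Re: QWXZ, don't miss out"

old_hit = bool(without_apostrophe.search(subject))
new_hit = bool(with_apostrophe.search(subject))
unescaped_hit = bool(unescaped.search(subject))

print(f"without apostrophe: {old_hit}")
print(f"with apostrophe:    {new_hit}")
print(f"unescaped class:    {unescaped_hit}")
```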

    --
    Dan Wilga
    ** Remove the -MUNGE in my address to reply **
     
    Dan Wilga, Jan 30, 2004
    #3
