negative backreference?

Discussion in 'Perl Misc' started by eric.hall@gmail.com, Mar 4, 2005.

  1. Guest

    I'm a relative newbie with perl/regexp

    I'm trying to write a rule for SpamAssassin that looks at the top-most
    Received header, checks whether the HELO identifier and the reverse DNS
    hostname are the same, and applies a weight accordingly.

    It's easy to see if they are the same, using an internal debug header
    and a backreference. Assume HEADER is of the form "rdns=hostname
    helo=hostname" then the simple rule of:

    HEADER =~ /rdns=(.*) helo=\1/

    will match when they are the same. But I need to match when they are
    different.
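A quick way to see the equality rule in action is to run it against a few sample strings in the "rdns=hostname helo=hostname" form described above. The hostnames below are made up for illustration. Because nothing follows \1 in the pattern, the rule also fires when the HELO name merely begins with the rDNS name, which is worth knowing before inverting it:

```perl
use strict;
use warnings;

# Hypothetical sample values in the "rdns=hostname helo=hostname" form.
my @headers = (
    'rdns=mail.example.com helo=mail.example.com',   # identical
    'rdns=mail.example.com helo=smtp.example.com',   # different
    'rdns=mail helo=mail.example.com',               # different, shared prefix
);

my %same;
for my $h (@headers) {
    # The equality rule from the question: \1 must repeat what (.*) captured.
    $same{$h} = $h =~ /rdns=(.*) helo=\1/ ? 1 : 0;
    printf "%-45s => %s\n", $h, $same{$h} ? 'same' : 'different';
}
```

The third line is reported as "same" even though the names differ, because \1 only has to match the start of the HELO value.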

    I've tried negative look-ahead of various forms, but nothing seems to
    work correctly when backreferences are included. Is there a way out of
    this hole?

    Perl 5.8.1 on SuSE Linux Professional 9.0, if it matters.

    Thanks
     
    , Mar 4, 2005
    #1

  2. Anno Siegel Guest

    <> wrote in comp.lang.perl.misc:
    > I'm a relative newbie with perl/regexp
    >
    > I'm trying to write a rule for SpamAssassin that looks at the top-most
    > Received header, checks whether the HELO identifier and the reverse DNS
    > hostname are the same, and applies a weight accordingly.
    >
    > It's easy to see if they are the same, using an internal debug header
    > and a backreference. Assume HEADER is of the form "rdns=hostname
    > helo=hostname" then the simple rule of:
    >
    > HEADER =~ /rdns=(.*) helo=\1/
    >
    > will match when they are the same. But I need to match when they are
    > different.
    >
    > I've tried negative look-ahead of various forms, but nothing seems to
    > work correctly when backreferences are included. Is there a way out of
    > this hole?


    What have you tried? Negative lookahead should work just fine for this.

    Anno
     
    Anno Siegel, Mar 4, 2005
    #2

  3. Guest

    I've tried wrapping just the "rdns" part in a negative look-ahead, and
    I've tried wrapping the whole thing, and neither produces a match when
    the values are different.

    Here's the first test, using non-matching names:

    #!/usr/bin/perl

    $_ = "[ rdns=hostname1 helo=hostname2 ]";

    if ( /^[^\]]+ (?!rdns=(.*) helo=\1)/ ) {
        print "got <$1>\n";
    }

    $ test.pl
    got <>

    That looks like it works, but it produces the same results ("got <>")
    even when rdns and helo are the same, meaning that the test seems to
    err out in all cases.

    Am I trapping the wrong output or something?
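To see why this attempt "matches" every time, it helps to print what the whole pattern actually matched. In the sketch below, $-[0] and $+[0] are Perl's built-in start and end offsets of the last successful match. The greedy [^\]]+ backtracks only as far as the last space before "]", and at that point the negative lookahead is tested against "]", which never starts with "rdns=". So the lookahead always succeeds, and since captures inside a successful negative lookahead are never kept, $1 stays undefined:

```perl
use strict;
use warnings;

my @results;
for my $s ('[ rdns=hostname1 helo=hostname2 ]',
           '[ rdns=hostname1 helo=hostname1 ]') {
    if ($s =~ /^[^\]]+ (?!rdns=(.*) helo=\1)/) {
        # $-[0] and $+[0]: start and end offsets of the whole match.
        my $matched = substr($s, $-[0], $+[0] - $-[0]);
        push @results, [$matched, $1];
        printf "matched <%s>, \$1 is %s\n",
            $matched, defined $1 ? "<$1>" : 'undef';
    }
}
```

Both strings match up to just before the closing bracket, and in both cases $1 is undef, which is why the original test printed "got <>" regardless of the input.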
     
    , Mar 4, 2005
    #3
  4. Guest

    ^[^\]]+ rdns=(\S*) helo=(?!\1) captures hostname1, as needed. From my
    admittedly limited understanding, it does not appear that the negative
    look-ahead is being interpreted as such, and this gobbledygook should
    not work.

    I'm content to live with the mystery, but if somebody could explain it
    or reference material that says why it works, I'd be appreciative.
     
    , Mar 4, 2005
    #4
  5. Anno Siegel Guest

    <> wrote in comp.lang.perl.misc:

    Please give an attribution and some context in your reply.

    > I've tried wrapping just the "rdns" part in a negative look-ahead, and
    > I've tried wrapping the whole thing, and neither produces a match when
    > the values are different.
    >
    > Here's the first test, using non-matching names:
    >
    > #!/usr/bin/perl
    >
    > $_ = "[ rdns=hostname1 helo=hostname2 ]";
    >
    > if ( /^[^\]]+ (?!rdns=(.*) helo=\1)/ ) {
    > print "got <$1>\n";
    > }
    >
    > $ test.pl
    > got <>
    >
    > That looks like it works, but it produces the same results ("got <>")
    > even when rdns and helo are the same, meaning that the test seems to
    > err out in all cases.


    To me it doesn't look at all like it works, given that it failed to
    capture anything in $1.

    It's really rather simple. The test for equality can be done with just a
    backreference, without lookahead:

    for ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA' ) {
        print "got <$1>\n" if /rdns=(.*) helo=\1/;
    }

    That reports "AAA", the case where both are equal. Now turn the sense
    of the test around, wrapping the backreference in a negative lookahead:

    for ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA' ) {
        print "got <$1>\n" if /rdns=(.*) helo=(?!\1)/;
    }

    Now it reports "BBB". That's it.
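Anno's lookahead version can be checked against one more case: a HELO name that differs from the rDNS name only by extra trailing characters (the sample strings are made up). Because (?!\1) only asks whether the text after "helo=" begins with the captured name, such a pair is treated as equal and slips through:

```perl
use strict;
use warnings;

my %differs;
for my $s ('rdns=AAA helo=AAA', 'rdns=BBB helo=AAA', 'rdns=AAA helo=AAAB') {
    # Anno's negative-lookahead test, unchanged.
    $differs{$s} = $s =~ /rdns=(.*) helo=(?!\1)/ ? 1 : 0;
    print "$s => ", ($differs{$s} ? "different (got <$1>)" : 'no match'), "\n";
}
```

The last pair is different, yet the pattern does not match it, because nothing after \1 checks where the HELO name ends.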

    Anno
     
    Anno Siegel, Mar 4, 2005
    #5
  6. [A complimentary Cc of this posting was sent to
    Anno Siegel
    <-berlin.de>], who wrote in article <d0aq0m$a3t$-Berlin.DE>:
    > That reports "AAA", the case where both are equal. Now turn the sense
    > of the test around, wrapping the backreference in a negative lookahead:
    >
    > for ( ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA') ) {
    > print "got <$1>\n" if /rdns=(.*) helo=(?!\1)/;
    > }
    >
    > Now it reports "BBB". That's it.


    I do not think so. One needs some anchor at the end. Something like

    /rdns=(\w+) helo=(?!\1\b)/;

    (mutatis mutandis). Matching anything other than \w+ will need
    something more complicated than \b... In a perfect world, one would
    use something like my (proposed) onion rings:

    /rdns=(\S*) helo=(?& \S* & (?!\1)/;
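Ilya's anchored test can be tried side by side with a variant adapted for real hostnames (the "mutatis mutandis" part). Note that (\w+) cannot capture a dotted name such as mail.example.com, so the first pattern simply fails to match those lines at all. The second pattern is an adaptation assumed here, not one proposed in the thread: it captures \S* and uses a nested lookahead (?!\S) instead of \b, so "equal" means the whole whitespace-delimited HELO token equals the captured name. The sample strings are made up:

```perl
use strict;
use warnings;

my @cases = (
    'rdns=AAA helo=AAA',
    'rdns=AAA helo=AAAB',
    'rdns=mail.example.com helo=mail.example.com',
    'rdns=mail.example.com helo=mail.example.com.au',
);

my (%word, %token);
for my $s (@cases) {
    # Ilya's version: \b stops AAA from matching the start of AAAB.
    $word{$s} = $s =~ /rdns=(\w+) helo=(?!\1\b)/ ? 1 : 0;
    # Adapted version (assumption): hostnames are runs of non-space
    # characters, and \1 only counts as equal if no \S follows it.
    $token{$s} = $s =~ /rdns=(\S*) helo=(?!\1(?!\S))/ ? 1 : 0;
    printf "%-46s \\w+/\\b: %d   \\S*/(?!\\S): %d\n", $s, $word{$s}, $token{$s};
}
```

Both patterns handle AAA versus AAAB, but only the \S* variant reports the last, dotted pair as different.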

    Hope this helps,
    Ilya
     
    Ilya Zakharevich, Mar 5, 2005
    #6
  7. Anno Siegel Guest

    Ilya Zakharevich <> wrote in comp.lang.perl.misc:
    > [A complimentary Cc of this posting was sent to
    > Anno Siegel
    > <-berlin.de>], who wrote in article
    > <d0aq0m$a3t$-Berlin.DE>:
    > > That reports "AAA", the case where both are equal. Now turn the sense
    > > of the test around, wrapping the backreference in a negative lookahead:
    > >
    > > for ( ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA') ) {
    > > print "got <$1>\n" if /rdns=(.*) helo=(?!\1)/;
    > > }
    > >
    > > Now it reports "BBB". That's it.

    >
    > I do not think so. One needs some anchor at the end. Something like
    >
    > /rdns=(\w+) helo=(?!\1\b)/;


    That's right.

    > (mutatis mutandis). Matching anything other than \w+ will need
    > something more complicated than \b... In a perfect world, one would
    > use something like my (proposed) onion rings:
    >
    > /rdns=(\S*) helo=(?& \S* & (?!\1)/;


    Is that proposal available somewhere? I'm not sure how "(?&" is supposed
    to work. Should the parens balance?

    Anno
     
    Anno Siegel, Mar 5, 2005
    #7
  8. [A complimentary Cc of this posting was sent to
    Anno Siegel
    <-berlin.de>], who wrote in article <d0c48i$39r$-Berlin.DE>:
    > > (mutatis mutandis). Matching anything other than \w+ will need
    > > something more complicated than \b... In a perfect world, one would
    > > use something like my (proposed) onion rings:
    > >
    > > /rdns=(\S*) helo=(?& \S* & (?!\1)/;


    /rdns=(\S*) helo=(?& \S* & (?!\1))/

    maybe even

    /rdns=(\S*) helo=(?& \S* &! \1)/

    > Is that proposal available somewhere?


    I think so. google for it...

    > I'm not sure how "(?&" is supposed to work.


    A & B & C & D ...

    B should match a substring of what A matched, C should match a
    substring of what B matched etc... One can replace & by &! (negating
    the following group). Actually, another, "anchored", flavor is useful
    in other situations: one where B should match *exactly* the string
    which A matched (and not a substring thereof). [example above uses
    the second flavor]

    It was never clear to me how to distinguish these two flavors; maybe
    something as simple as && vs &...
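The onion-ring syntax is only a proposal and is not valid Perl; since Perl 5.10, (?&NAME) means recursion into a named group, so the proposed spelling would clash with it. As a rough sketch of the anchored &! flavour in /rdns=(\S*) helo=(?& \S* &! \1)/, assuming Perl 5.10 or later (the thread's 5.8.1 lacks (*FAIL)), one can let \S* match the HELO token first and then veto the result with a code-block condition: (?(?{ ... })(*FAIL)) fails whenever that token is exactly equal to $1. The (?!\S) stops \S* from backtracking to a shorter, trivially different substring. This only emulates the described semantics; it is not Ilya's proposed implementation:

```perl
use strict;
use warnings;
use 5.010;   # (*FAIL) and say() need Perl 5.10 or later

# Approximates the proposed (?& \S* &! \1): take the whole \S* token
# after helo=, then fail if that token is exactly what (\S*) captured.
my $differs = qr/rdns=(\S*) helo=(\S*)(?!\S)(?(?{ $2 eq $1 })(*FAIL))/;

my %result;
for my $s ('rdns=AAA helo=AAA', 'rdns=AAA helo=AAAB', 'rdns=BBB helo=AAA') {
    $result{$s} = $s =~ $differs ? "different <$1> <$2>" : 'same';
    say "$s => $result{$s}";
}
```

Capturing both names and comparing with ne in ordinary Perl code is simpler still, but that needs more than a single regex, which a SpamAssassin header rule does not give you.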

    > Should the parens balance?


    Sure, thanks.

    Yours,
    Ilya
     
    Ilya Zakharevich, Mar 9, 2005
    #8
  9. Anno Siegel Guest

    Ilya Zakharevich <> wrote in comp.lang.perl.misc:
    > [A complimentary Cc of this posting was sent to
    > Anno Siegel
    > <-berlin.de>], who wrote in article
    > <d0c48i$39r$-Berlin.DE>:
    > > > (mutatis mutandis). Matching anything other than \w+ will need
    > > > something more complicated than \b... In a perfect world, one would
    > > > use something like my (proposed) onion rings:
    > > >
    > > > /rdns=(\S*) helo=(?& \S* & (?!\1)/;

    >
    > /rdns=(\S*) helo=(?& \S* & (?!\1))/
    >
    > maybe even
    >
    > /rdns=(\S*) helo=(?& \S* &! \1)/
    >
    > > Is that proposal available somewhere?

    >
    > I think so. google for it...


    I tried. In the presence of a number of _State of the Onion_s the puns
    overwhelmed me.

    > > I'm not sure how "(?&" is supposed to work.

    >
    > A & B & C & D ...
    >
    > B should match a substring of what A matched, C should match a
    > substring of what B matched etc... One can replace & by &! (negating
    > the following group).


    Ah... Now I'm getting the name too -- successive substrings.

    > Actually, another, "anchored", flavor is useful
    > in other situations: one where B should match *exactly* the string
    > which A matched (and not a substring thereof). [example above uses
    > the second flavor]


    With infinitesimal onion rings...

    > It was never clear to me how to distinguish these two flavors; maybe
    > something as simple as && vs &...


    Hard to remember which is which in a not-too-often-used construct.
    How about =& ?

    Anno
     
    Anno Siegel, Mar 9, 2005
    #9
  10. [A complimentary Cc of this posting was sent to
    Anno Siegel
    <-berlin.de>], who wrote in article <d0nlnj$6c1$-Berlin.DE>:
    > > > I'm not sure how "(?&" is supposed to work.

    > >
    > > A & B & C & D ...
    > >
    > > B should match a substring of what A matched, C should match a
    > > substring of what B matched etc... One can replace & by &! (negating
    > > the following group).

    >
    > Ah... Now I'm getting the name too -- successive substrings.
    >
    > > Actually, another, "anchored", flavor is useful
    > > in other situations: one where B should match *exactly* the string
    > > which A matched (and not a substring thereof). [example above uses
    > > the second flavor]

    >
    > With infinitesimal onion rings...
    >
    > > It was never clear to me how to distinguish these two flavors; maybe
    > > something as simple as && vs &...

    >
    > Hard to remember which is which in a not-too-often-used construct.
    > How about =& ?


    The idea of &= is appealing indeed. Made uniform, the set may become

    &~ &= &!~ &!=

    or just

    & &= &! &!=

    Hard to decide between these two...

    Thanks,
    Ilya
     
    Ilya Zakharevich, Mar 10, 2005
    #10