negative backreference?

eric.hall · Mar 4, 2005

I'm a relative newbie with Perl regexps.

I'm trying to write a rule for SpamAssassin that looks at the top-most
Received header, checks whether the HELO identifier and the reverse DNS
hostname are the same, and applies a weight accordingly.

It's easy to see if they are the same, using an internal debug header
and a backreference. Assume HEADER is of the form "rdns=hostname
helo=hostname" then the simple rule of:

HEADER =~ /rdns=(.*) helo=\1/

will match when they are the same. But I need to match when they are
different.

I've tried negative look-ahead of various forms, but nothing seems to
work correctly when backreferences are included. Is there a way out of
this hole?

Perl 5.8.1 on SuSE Linux Professional 9.0, if it matters.

Thanks

Anno Siegel · Mar 4, 2005

What have you tried? Negative lookahead should work just fine for this.

Anno

eric.hall · Mar 4, 2005

I've tried wrapping just the "rdns" part in a negative look-ahead, and
I've tried wrapping the whole thing, and neither produces a match when
the values are different.

Here's the first test, using non-matching names:

#!/usr/bin/perl

$_ = "[ rdns=hostname1 helo=hostname2 ]";

if ( /^[^\]]+ (?!rdns=(.*) helo=\1)/ ) {
    print "got <$1>\n";
}

$ test.pl
got <>

That looks like it works, but it produces the same results ("got <>")
even when rdns and helo are the same, meaning that the test seems to
err out in all cases.

Am I trapping the wrong output or something?

eric.hall · Mar 4, 2005

^[^\]]+ rdns=(\S*) helo=(?!\1) returns hostname1, as needed. From my
admittedly limited understanding, it does not appear that the negative
look-ahead is being interpreted as such, and this gobbledygook should
not work.

I'm content to live with the mystery, but if somebody could explain it
or reference material that says why it works, I'd be appreciative.

Anno Siegel · Mar 4, 2005

Please give an attribution and some context in your reply.

eric.hall said:
That looks like it works, but it produces the same results ("got <>")
even when rdns and helo are the same, meaning that the test seems to
err out in all cases.

To me it doesn't look at all like it works, given that it failed to
capture anything in $1.
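
As a rough illustration of why the first attempt prints "got <>" for any
input: the leading [^\]]+ can back off to the last space before "]", where
"rdns=" does not follow, so the negative lookahead succeeds trivially, and
a group inside a negative lookahead is not set after a successful match.
The helper name first_attempt below is made up for this sketch; the
pattern and the sample header are the ones from the test script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns '' when the pattern matches without setting $1,
# the captured text when it matches with a capture, and undef on no match.
sub first_attempt {
    my ($header) = @_;
    return undef unless $header =~ /^[^\]]+ (?!rdns=(.*) helo=\1)/;
    return defined $1 ? $1 : '';
}

# One header with different names, one with identical names.
for my $header ( '[ rdns=hostname1 helo=hostname2 ]', '[ rdns=hostname1 helo=hostname1 ]' ) {
    my $got = first_attempt($header);
    print "$header => ", ( defined $got ? "matched, got <$got>" : "no match" ), "\n";
}
```

Both headers report a match with an empty capture, so this pattern cannot
tell the two cases apart.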

It's really rather simple. The test for equality can be done with just a
backreference, without lookahead:

for ( ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA') ) {
    print "got <$1>\n" if /rdns=(.*) helo=\1/;
}

That reports "AAA", the case where both are equal. Now turn the sense
of the test around, wrapping the backreference in a negative lookahead:

for ( ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA') ) {
    print "got <$1>\n" if /rdns=(.*) helo=(?!\1)/;
}

Now it reports "BBB". That's it.
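
For anyone who wants to run the two tests side by side, here is a small
sketch. The sub names same_name and different_name are invented for the
example; the patterns and the two sample strings are the ones above. It
says nothing about inputs other than these two samples.

```perl
use strict;
use warnings;

# Hypothetical helpers: each returns the captured rdns name on a match, undef otherwise.
sub same_name      { return $_[0] =~ /rdns=(.*) helo=\1/     ? $1 : undef }
sub different_name { return $_[0] =~ /rdns=(.*) helo=(?!\1)/ ? $1 : undef }

for my $header ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA' ) {
    my $same = same_name($header);
    my $diff = different_name($header);
    print "$header: same=", ( defined $same ? $same : '-' ),
        " different=", ( defined $diff ? $diff : '-' ), "\n";
}
```

On these two strings the first helper captures AAA only for the first
header, and the second captures BBB only for the second.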

Anno

Ilya Zakharevich · Mar 5, 2005

[A complimentary Cc of this posting was sent to Anno Siegel]

That reports "AAA", the case where both are equal. Now turn the sense
of the test around, wrapping the backreference in a negative lookahead:

for ( ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA') ) {
    print "got <$1>\n" if /rdns=(.*) helo=(?!\1)/;
}

Now it reports "BBB". That's it.

Do not think so. One needs some anchor at the end. Something like

/rdns=(\w+) helo=(?!\1\b)/;

(mutatis mutandis). Having a different match than \w+ will lead to more
complicated stuff than \b... In a perfect life, one would use something
like my (proposed) onion rings:

/rdns=(\S*) helo=(?& \S* & (?!\1)/;

Hope this helps,
Ilya

Anno Siegel · Mar 5, 2005

Ilya Zakharevich said:

Do not think so. One needs some anchor at the end. Something like

/rdns=(\w+) helo=(?!\1\b)/;

That's right.
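
A quick sketch of the case the anchor is for: a HELO name that merely
starts with the rdns name. The sub names and the 'AAAB' sample are made up
for illustration; the two patterns are the ones quoted in this thread.

```perl
use strict;
use warnings;

# Hypothetical helpers: 1 if the pattern reports "names differ", 0 otherwise.
sub differs_unanchored { return $_[0] =~ /rdns=(.*) helo=(?!\1)/     ? 1 : 0 }
sub differs_anchored   { return $_[0] =~ /rdns=(\w+) helo=(?!\1\b)/ ? 1 : 0 }

# 'AAAB' is an invented sample whose prefix equals the rdns name.
for my $header ( 'rdns=AAA helo=AAA', 'rdns=AAA helo=AAAB', 'rdns=BBB helo=AAA' ) {
    printf "%-20s unanchored=%d anchored=%d\n",
        $header, differs_unanchored($header), differs_anchored($header);
}
```

Without the \b, 'rdns=AAA helo=AAAB' is treated as "same", because \1
matches the first three characters of the HELO name and the lookahead
rejects the position.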

(mutatis mutandis). Having a different match than \w+ will lead to more
complicated stuff than \b... In a perfect life, one would use something
like my (proposed) onion rings:

/rdns=(\S*) helo=(?& \S* & (?!\1)/;

Is that proposal available somewhere? I'm not sure how "(?&" is supposed
to work. Should the parens balance?

Anno

Ilya Zakharevich · Mar 9, 2005

/rdns=(\S*) helo=(?& \S* & (?!\1))/

maybe even

/rdns=(\S*) helo=(?& \S* &! \1)/

Is that proposal available somewhere?

I think so. Google for it...

I'm not sure how "(?&" is supposed to work.

A & B & C & D ...

B should match a substring of what A matched, C should match a
substring of what B matched etc... One can replace & by &! (negating
the following group). Actually, another, "anchored", flavor is useful
in other situations: one where B should match *exactly* the string
which A matched (and not a substring thereof). [example above uses
the second flavor]

It was never clear to me how to distinguish these two flavors; maybe
something as simple as && vs &...
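
Since the (?& ... ) form is only a proposal in this thread, here is a
sketch of how the "anchored" intent might be approximated with ordinary
nested lookaheads when the names are runs of \S rather than \w. This is
one possible reading of the idea, not the proposed construct itself; the
sub name helo_differs and the example.com hostnames are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the helo name is not exactly the rdns name, 0 otherwise.
# (?!\1(?!\S)) rejects the position only when \1 is followed by whitespace
# or end of string, i.e. when the whole helo name equals the rdns name.
sub helo_differs {
    my ($header) = @_;
    return $header =~ /rdns=(\S+) helo=(?!\1(?!\S))/ ? 1 : 0;
}

for my $header (
    '[ rdns=mail.example.com helo=mail.example.com ]',
    '[ rdns=mail.example.com helo=mail.example.com.invalid ]',
    '[ rdns=hostname1 helo=hostname2 ]',
) {
    print helo_differs($header) ? "differ: " : "same:   ", $header, "\n";
}
```

The inner (?!\S) plays the role that \b plays for \w+ names: it stops a
HELO name that merely begins with the rdns name from counting as equal.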

Should the parens balance?

Sure, thanks.

Yours,
Ilya

Anno Siegel · Mar 9, 2005

Ilya Zakharevich said:
Is that proposal available somewhere?

I think so. google for it...

I tried. In the presence of a number of _State of the Onion_s the puns
overwhelmed me.

A & B & C & D ...

B should match a substring of what A matched, C should match a
substring of what B matched etc... One can replace & by &! (negating
the following group).

Ah... Now I'm getting the name too -- successive substrings.

Actually, another, "anchored", flavor is useful
in other situations: one where B should match *exactly* the string
which A matched (and not a substring thereof). [example above uses
the second flavor]

With infinitesimal onion rings...

It was never clear to me how to distinguish these two flavors; maybe
something as simple as && vs &...

Hard to remember which is which in a not-too-often-used construct.
How about =& ?

Anno

Ilya Zakharevich · Mar 10, 2005

A & B & C & D ...

B should match a substring of what A matched, C should match a
substring of what B matched etc... One can replace & by &! (negating
the following group).

Ah... Now I'm getting the name too -- successive substrings.

Actually, another, "anchored", flavor is useful
in other situations: one where B should match *exactly* the string
which A matched (and not a substring thereof). [example above uses
the second flavor]

With infinitesimal onion rings...

It was never clear to me how to distinguish these two flavors; maybe
something as simple as && vs &...

Hard to remember which is which in a not-too-often-used construct.
How about =& ?

The idea of &= is appealing indeed. Uniformizing, it may become

&~ &= &!~ &!=

or just

& &= &! &!=

Hard to decide between these two...

Thanks,
Ilya
