negative backreference?

eric.hall

I'm a relative newbie with Perl and regular expressions.

I'm trying to write a rule for SpamAssassin that looks at the top-most
Received header, checks whether the HELO identifier and the reverse DNS
hostname are the same, and applies a weight accordingly.

It's easy to see if they are the same, using an internal debug header
and a backreference. Assume HEADER is of the form "rdns=hostname
helo=hostname" then the simple rule of:

HEADER =~ /rdns=(.*) helo=\1/

will match when they are the same. But I need to match when they are
different.

I've tried negative look-ahead of various forms, but nothing seems to
work correctly when backreferences are included. Is there a way out of
this hole?

Perl 5.8.1 on SuSE Linux Professional 9.0, if it matters.

Thanks
 
Anno Siegel

What have you tried? Negative lookahead should work just fine for this.

Anno
 
eric.hall

I've tried wrapping just the "rdns" part in a negative look-ahead, and
I've tried wrapping the whole thing, and neither produces a match when
the values are different.

Here's the first test, using non-matching names:

#!/usr/bin/perl

$_ = "[ rdns=hostname1 helo=hostname2 ]";

if ( /^[^\]]+ (?!rdns=(.*) helo=\1)/ ) {
print "got <$1>\n";
}

$ test.pl
got <>

That looks like it works, but it produces the same results ("got <>")
even when rdns and helo are the same, meaning that the test seems to
err out in all cases.

Am I trapping the wrong output or something?
 
eric.hall

^[^\]]+ rdns=(\S*) helo=(?!\1) returns hostname1, as needed. From my
admittedly limited understanding, it does not appear that the negative
look-ahead is being interpreted as such, and this gobbledygoo should
not work.

I'm content to live with the mystery, but if somebody could explain it
or reference material that says why it works, I'd be appreciative.
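
For readers following along, here is a minimal sketch of what that pattern does. It assumes the debug header looks exactly like the test string used earlier in the thread; the helper name `probe` is made up for illustration. The point it tries to show is that `(?!\1)` is zero-width: it only checks that the text after `helo=` does not start with the captured rdns name, and it consumes nothing, so `$&` ends right after `helo=`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns ($1, $&) when the pattern matches,
# or an empty list when it does not.
sub probe {
    my ($header) = @_;
    return unless $header =~ /^[^\]]+ rdns=(\S*) helo=(?!\1)/;
    return ($1, $&);
}

for my $header ("[ rdns=hostname1 helo=hostname2 ]",
                "[ rdns=hostname1 helo=hostname1 ]") {
    my ($rdns, $matched) = probe($header);
    if (defined $rdns) {
        # The lookahead succeeded: the helo name does not begin with $rdns.
        print "different: rdns=<$rdns>, text consumed=<$matched>\n";
    } else {
        # The lookahead failed, so the whole match failed.
        print "no match (names equal): $header\n";
    }
}
```

The capture group sits outside the lookahead here, which is why `$1` is filled in whenever the pattern matches.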
 
Anno Siegel

Please give an attribution and some context in your reply.

> I've tried wrapping just the "rdns" part in a negative look-ahead, and
> I've tried wrapping the whole thing, and neither produces a match when
> the values are different.
>
> Here's the first test, using non-matching names:
>
> #!/usr/bin/perl
>
> $_ = "[ rdns=hostname1 helo=hostname2 ]";
>
> if ( /^[^\]]+ (?!rdns=(.*) helo=\1)/ ) {
> print "got <$1>\n";
> }
>
> $ test.pl
> got <>
>
> That looks like it works, but it produces the same results ("got <>")
> even when rdns and helo are the same, meaning that the test seems to
> err out in all cases.

To me it doesn't look at all like it works, given that it failed to
capture anything in $1.

It's really rather simple. The test for equality can be done with just a
backreference, without lookahead:

for ( ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA') ) {
print "got <$1>\n" if /rdns=(.*) helo=\1/;
}

That reports "AAA", the case where both are equal. Now turn the sense
of the test around, wrapping the backreference in a negative lookahead:

for ( ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA') ) {
print "got <$1>\n" if /rdns=(.*) helo=(?!\1)/;
}

Now it reports "BBB". That's it.

Anno
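
A small sketch of why the first attempt prints "got <>" for every input, under the assumption that the header has the bracketed form used in the test script (the helper name `first_attempt` is invented for this illustration). The greedy `[^\]]+` runs up to the last space before `]`, and the lookahead is then tried at `]`, where `rdns=` cannot possibly match. The negative lookahead therefore succeeds trivially, and the group inside it never captures anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: runs the pattern from the first test and
# returns ($1, $&) on a match. $1 stays undefined because its group
# lives inside the negative lookahead.
sub first_attempt {
    my ($header) = @_;
    return unless $header =~ /^[^\]]+ (?!rdns=(.*) helo=\1)/;
    return ($1, $&);
}

for my $header ("[ rdns=hostname1 helo=hostname2 ]",
                "[ rdns=hostname1 helo=hostname1 ]") {
    my ($captured, $matched) = first_attempt($header);
    # Both headers match, and in both cases the matched text runs
    # right up to the closing bracket.
    printf "captured=<%s> matched=<%s>\n",
        defined $captured ? $captured : 'undef', $matched;
}
```

In other words, the lookahead is tested at a position where the `rdns=... helo=...` text has already been consumed, so it says nothing about whether the two names agree.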
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to Anno Siegel]

> That reports "AAA", the case where both are equal. Now turn the sense
> of the test around, wrapping the backreference in a negative lookahead:
>
> for ( ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA') ) {
> print "got <$1>\n" if /rdns=(.*) helo=(?!\1)/;
> }
>
> Now it reports "BBB". That's it.

Do not think so. One needs some anchor at the end. Something like

/rdns=(\w+) helo=(?!\1\b)/;

(mutatis mutandis). Having different match than \w+ will lead to more
complicated stuff than \b... In perfect life, one would use something
like my (proposed) onion rings:

/rdns=(\S*) helo=(?& \S* & (?!\1)/;

Hope this helps,
Ilya
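
To illustrate the anchoring point, here is a hedged sketch that compares the unanchored lookahead with one possible anchored variant. It assumes the names are whitespace-delimited tokens, so it uses `\S+` and ends the lookahead with `(?!\S)` in place of `\b`; that choice and the helper names `differ_naive` and `differ_anchored` are assumptions of this example, not something proposed in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Unanchored version: a helo name that merely STARTS with the rdns
# name makes the lookahead fail, so the names look "equal".
sub differ_naive {
    my ($header) = @_;
    return $header =~ /rdns=(.*) helo=(?!\1)/ ? 1 : 0;
}

# Anchored version (assumption: names are runs of non-whitespace).
# The lookahead only fails if \1 is followed by whitespace or the end
# of the string, i.e. if the helo token is exactly the rdns name.
sub differ_anchored {
    my ($header) = @_;
    return $header =~ /rdns=(\S+) helo=(?!\1(?!\S))/ ? 1 : 0;
}

my @cases = (
    'rdns=AAA helo=AAA',                        # equal
    'rdns=BBB helo=AAA',                        # different
    'rdns=AAA helo=AAAB',                       # helo starts with rdns
    'rdns=mail.example helo=mail.example.com',  # same, with dots
);

for my $header (@cases) {
    printf "%-42s naive=%d anchored=%d\n",
        $header, differ_naive($header), differ_anchored($header);
}
```

The last two cases are where the two versions disagree: the naive pattern treats them as equal, the anchored one reports them as different.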
 
Anno Siegel

Ilya Zakharevich said:
> [A complimentary Cc of this posting was sent to Anno Siegel]
>
>> That reports "AAA", the case where both are equal. Now turn the sense
>> of the test around, wrapping the backreference in a negative lookahead:
>>
>> for ( ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA') ) {
>> print "got <$1>\n" if /rdns=(.*) helo=(?!\1)/;
>> }
>>
>> Now it reports "BBB". That's it.
>
> Do not think so. One needs some anchor at the end. Something like
>
> /rdns=(\w+) helo=(?!\1\b)/;

That's right.

> (mutatis mutandis). Having different match than \w+ will lead to more
> complicated stuff than \b... In perfect life, one would use something
> like my (proposed) onion rings:
>
> /rdns=(\S*) helo=(?& \S* & (?!\1)/;

Is that proposal available somewhere? I'm not sure how "(?&" is supposed
to work. Should the parens balance?

Anno
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to Anno Siegel]

/rdns=(\S*) helo=(?& \S* & (?!\1))/

maybe even

/rdns=(\S*) helo=(?& \S* &! \1)/

> Is that proposal available somewhere?

I think so. google for it...

> I'm not sure how "(?&" is supposed to work.

A & B & C & D ...

B should match a substring of what A matched, C should match a
substring of what B matched etc... One can replace & by &! (negating
the following group). Actually, another, "anchored", flavor is useful
in other situations: one where B should match *exactly* the string
which A matched (and not a substring thereof). [example above uses
the second flavor]

It was never clear to me how to distinguish these two flavors; maybe
something as simple as && vs &...

> Should the parens balance?

Sure, thanks.

Yours,
Ilya
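
Since the onion-ring syntax is only a proposal, the "anchored" flavor of `\S* &! \1` can be approximated in ordinary Perl by doing the work in two steps: let one pattern pick out both tokens, then compare the whole helo token against the rdns token. The sketch below is just that approximation, assuming whitespace-delimited names; `helo_differs` is a made-up name and nothing here implements the proposed syntax itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns 1 if the helo token differs from the rdns token, 0 if they
# are identical, and undef if the header has no rdns=/helo= pair.
sub helo_differs {
    my ($header) = @_;

    # Step "A": pick out the two tokens with plain captures.
    my ($rdns, $helo) = $header =~ /rdns=(\S*) helo=(\S*)/
        or return;

    # Step "&! \1", anchored flavor: the ENTIRE helo token must not
    # be the rdns name, so a plain string comparison is enough.
    return $helo ne $rdns ? 1 : 0;
}

for my $header ('rdns=AAA helo=AAA',
                'rdns=BBB helo=AAA',
                'rdns=AAA helo=AAAB') {
    printf "%-20s differs=%d\n", $header, helo_differs($header);
}
```

This only helps where the check can run as code; a context that accepts nothing but a single regular expression still needs a lookahead-based pattern.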
 
Anno Siegel

Ilya Zakharevich said:
> [A complimentary Cc of this posting was sent to Anno Siegel]
>
> /rdns=(\S*) helo=(?& \S* & (?!\1))/
>
> maybe even
>
> /rdns=(\S*) helo=(?& \S* &! \1)/
>
>> Is that proposal available somewhere?
>
> I think so. google for it...

I tried. In the presence of a number of _State of the Onion_s the puns
overwhelmed me.

> A & B & C & D ...
>
> B should match a substring of what A matched, C should match a
> substring of what B matched etc... One can replace & by &! (negating
> the following group).

Ah... Now I'm getting the name too -- successive substrings.

> Actually, another, "anchored", flavor is useful
> in other situations: one where B should match *exactly* the string
> which A matched (and not a substring thereof). [example above uses
> the second flavor]

With infinitesimal onion rings...

> It was never clear to me how to distinguish these two flavors; maybe
> something as simple as && vs &...

Hard to remember which is which in a not-too-often-used construct.
How about =& ?

Anno
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to Anno Siegel]

>> A & B & C & D ...
>>
>> B should match a substring of what A matched, C should match a
>> substring of what B matched etc... One can replace & by &! (negating
>> the following group).
>
> Ah... Now I'm getting the name too -- successive substrings.
>
>> Actually, another, "anchored", flavor is useful
>> in other situations: one where B should match *exactly* the string
>> which A matched (and not a substring thereof). [example above uses
>> the second flavor]
>
> With infinitesimal onion rings...
>
>> It was never clear to me how to distinguish these two flavors; maybe
>> something as simple as && vs &...
>
> Hard to remember which is which in a not-too-often-used construct.
> How about =& ?

The idea of &= is appealing indeed. Uniformizing, it may become

&~ &= &!~ &!=

or just

& &= &! &!=

Hard to decide between these two...

Thanks,
Ilya
 
