Lookahead peculiarity

Paul Kaletta

Hi,

I think I don't understand something about negative lookahead in
regular expressions.

This program

if ($string =~ /-+(?!foo)/) {
    print "Does match!\n";
} else {
    print "Doesn't match!\n";
}

reports that $string = "-foo" doesn't match. That's what I expected.

But why does $string = "--foo" match? It seems to me that negative
look-ahead doesn't work after a quantifier like "+".

Of course

"--foo" =~ /-+(.*)/;
print $1;

prints "foo". I figured that by just replacing (.*) with (?!foo) I
could create a regex that doesn't match "--foo". I simply can not
figure it out.

Thanks,

Paul
 
Paul Lalli

Paul said:
I think I don't understand something about negative lookahead in
regular expressions.

This program

if ($string =~ /-+(?!foo)/) {
    print "Does match!\n";
} else {
    print "Doesn't match!\n";
}

reports that $string = "-foo" doesn't match. That's what I expected.

But why does $string = "--foo" match? It seems to me that negative
look-ahead doesn't work after a quantifier like "+".

It works exactly as it's supposed to, actually. :) You have to think
about what you actually asked the regular expression to check, not what
your intent was. You asked:
"Does $string contain one or more dashes, not followed by foo?"
The answer is yes. It very specifically found ONE dash - which meets
the qualification of "one or more dashes" - that was not followed by
'foo'. That one dash was instead followed by another dash.

Paul said:
Of course

"--foo" =~ /-+(.*)/;
print $1;

prints "foo". I figured that by just replacing (.*) with (?!foo) I
could create a regex that doesn't match "--foo". I simply can not
figure it out.

The problem you're experiencing is that regular expressions TRY to match.
If you were to run this through
use re 'debug';
you would likely see that it first tried to let -+ match as many dashes
as possible. That is, both of them. But at that point, it saw that
the pattern was going to fail, because the next token - 'foo' - failed
the "not followed by foo" check. So the regexp then
backtracks, and lets -+ match one less than it first tried. That is,
exactly one dash. At this point, the pattern is successful, because
the next token is not foo (it's another dash), and so the entire
pattern succeeds.
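
The backtracking described above can be observed without the debugger by capturing what -+ ended up matching. This is only a minimal sketch: the helper name matched_text is made up for illustration, and the capture group around -+ is added purely to expose the matched text.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the dashes that -+ finally matched,
# or undef if the whole pattern failed.
sub matched_text {
    my ($string) = @_;
    return $string =~ /(-+)(?!foo)/ ? $1 : undef;
}

for my $string ('-foo', '--foo', '--bar') {
    my $m = matched_text($string);
    print "$string: ", (defined $m ? "matched '$m'" : "no match"), "\n";
}
```

For "--foo" this reports a match of a single '-', which is the dash the engine settled on after giving one back. For "--bar" both dashes are matched, because no backtracking is needed.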

I hope this clarifies it for you.

Paul Lalli
 
Paul Kaletta

Ah. Thanks Paul! Now it's clear to me why my regex didn't work as I
imagined. But anyway, what is the right way to match at least one dash
not followed by "foo"?

Regards,

Paul Kaletta
 
Peter Scott

Paul Kaletta said:
if ($string =~ /-+(?!foo)/) {
    print "Does match!\n";
} else {
    print "Doesn't match!\n";
}

reports that $string = "-foo" doesn't match. That's what I expected.

But why does $string = "--foo" match? It seems to me that negative
look-ahead doesn't work after a quantifier like "+".

Backtracking. After matching -- it then sees if the next thing is not
foo. But it is. So it gives up a state and then sees that the next thing
is -foo, which is not foo. Bingo.

You can force backtracking not to occur. perldoc perlre:

$ perl -le 'for ("-foo", "--foo") { for my $re (qr/-+(?!foo)/, qr/(?>-+)(?!foo)/) { print "$_ ", (/$re/ ? "matches $re" : "does not match $re") } }'
-foo does not match (?-xism:-+(?!foo))
-foo does not match (?-xism:(?>-+)(?!foo))
--foo matches (?-xism:-+(?!foo))
--foo does not match (?-xism:(?>-+)(?!foo))
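
As a hedged aside, assuming Perl 5.10 or later is available: perlre documents the possessive quantifier -++ as equivalent to the atomic group (?>-+), so the same "no backtracking" behaviour can be sketched like this (the sub name is hypothetical):

```perl
use strict;
use warnings;

# Assumes Perl 5.10+: the possessive -++ never gives back a dash it has
# consumed, just like the atomic group (?>-+).
sub dashes_not_before_foo {
    my ($string) = @_;
    return $string =~ /-++(?!foo)/ ? 1 : 0;
}

for my $string ('-foo', '--foo', '--bar') {
    print "$string: ",
        (dashes_not_before_foo($string) ? "matches" : "does not match"), "\n";
}
```

Of these three strings, only "--bar" matches.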
 
A. Sinan Unur

Paul Kaletta said:
Ah. Thanks Paul!

Please quote some context when you reply.

Paul Kaletta said:
Now it's clear to me why my regex didn't work as I imagined. But anyway,
what is the right way to match at least one dash not followed by
"foo"?

perldoc perlre discusses this (right where you find the documentation for
(?!pattern)). Anyway:

#!/usr/bin/perl

use strict;
use warnings;

my @strings = qw( --foo --bar );

for my $s ( @strings ) {
    if ( $s =~ /-+(\w+)/ and $1 ne 'foo' ) {
        print "$s matches\n";
    }
}
__END__

if you'd rather avoid using $'.

Sinan
 
Dave Weaver

Paul Kaletta said:
if ($string =~ /-+(?!foo)/) {
    print "Does match!\n";
} else {
    print "Doesn't match!\n";
}

reports that $string = "-foo" doesn't match. That's what I expected.

But why does $string = "--foo" match? It seems to me that negative
look-ahead doesn't work after a quantifier like "+".


Your pattern is looking for a "-" (one or more times) that is not
immediately followed by "foo".

Now consider the string you're matching against:

"--foo"
 ^
This "-" is not immediately followed by "foo", therefore the pattern
matches.
 
Xicheng

Paul said:
Ah. Thanks Paul! Now it's clear to me why my regex didn't work as I
imagined. But anyway, what is the right way to match at least one dash
not followed by "foo"?

You may try negative look-behind instead:

print "match\n" if /(?<!-)foo/;
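
A quick sketch of how this look-behind behaves on strings like the ones in this thread (the helper name is made up for illustration). Note that the pattern looks for "foo" rather than for dashes, so it may be answering a slightly different question than "at least one dash not followed by foo":

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the look-behind pattern above:
# true if the string contains a "foo" not immediately preceded by a dash.
sub foo_without_dash {
    my ($string) = @_;
    return $string =~ /(?<!-)foo/ ? 1 : 0;
}

for my $string ('--foo', '--bar', 'foo') {
    print "$string: ", (foo_without_dash($string) ? "match" : "no match"), "\n";
}
```

Here "--foo" and "--bar" both fail to match (the latter because it contains no "foo" at all), while a bare "foo" matches.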

Xicheng
 
Anno Siegel

Paul Kaletta said:
Ah. Thanks Paul! Now it's clear to me why my regex didn't work as I
imagined. But anyway, what is the right way to match at least one dash
not followed by "foo"?

Match a single dash followed by anything that isn't a sequence of more
dashes followed by "foo":

/-(?!-*foo)/
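
A small check of this pattern against a few sample strings, as a sketch only (the sub name and the extra test strings are assumptions for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: true if some dash is NOT followed by
# "zero or more dashes, then foo".
sub dash_not_leading_to_foo {
    my ($string) = @_;
    return $string =~ /-(?!-*foo)/ ? 1 : 0;
}

for my $string ('-foo', '--foo', '---foo', '--bar', '-foo--bar') {
    print "$string: ",
        (dash_not_leading_to_foo($string) ? "matches" : "does not match"), "\n";
}
```

The first three strings do not match, since every dash in them leads (through more dashes) to "foo". "--bar" matches, and so does "-foo--bar", because of its later run of dashes before "bar".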

Anno
 
Lukas Mai

Paul Kaletta said:
Ah. Thanks Paul! Now it's clear to me why my regex didn't work as I
imagined. But anyway, what is the right way to match at least one dash
not followed by "foo"?

/(?>-+)(?!foo)/

TMTOWTDI, Lukas
 
