lookahead bug (at least in 5.8.4)

use63net

'ab' =~ /(?=.*a)b/; # How the heck is this false?
'ab' =~ /(?=a)b/; # Nope, but still looks like it should match
'Xab' =~ /(?=.*a)b/; # Nice try, but putting in any other characters doesn't work.

'ab' =~ /(?=.*b)a/; # At least this matches as expected.
 
Bart Lateur

'ab' =~ /(?=.*a)b/; # How the heck is this false?
'ab' =~ /(?=a)b/; # Nope, but still looks like it should match
'Xab' =~ /(?=.*a)b/; # Nice try, but putting in any other characters doesn't work.

'ab' =~ /(?=.*b)a/; # At least this matches as expected.

The bug is in you. Your expectations are wrong. Let's go over them one
at a time.

/(?=.*a)b/

In English: find a location where you can next match a "b", and where
somewhere down the line (looking from the front of the match), there's
an "a". So this means that there should be an "a" following the "b".
Nope, not in this string. So it doesn't match.

/(?=a)b/;

Find a location where you can match a "b", and where the next character
is "a". So this one character should be both an "a" and a "b". That
can't be, so it is always false.

'Xab' =~ /(?=.*a)b/;

There's still no "a" following the "b".

'ab' =~ /(?=.*b)a/

Yes, there is a 'b' further down where it matches an "a". So this
matches.


If you want to match both an "a" somewhere down the line, and ditto with
a "b", try

/(?=.*a)(?=.*b)/

or maybe better

/^(?=.*a)(?=.*b)/s

The /^/ is not absolutely necessary, but it'll prevent useless trying
over and over again on failure. Either it should work from the start of
the string, or it won't work at all.
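The suggestion above can be tried out with a short script. This is only a sketch: the helper name `has_a_and_b` and the test strings are made up for illustration and are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the string contains both an "a" and a
# "b", in either order. Both lookaheads are tested from the start of
# the string, so neither one consumes any characters.
sub has_a_and_b {
    my ($str) = @_;
    return $str =~ /^(?=.*a)(?=.*b)/s ? 1 : 0;
}

for my $str ('ab', 'ba', 'Xab', 'aXX', "b\na") {
    (my $shown = $str) =~ s/\n/\\n/g;    # make the newline visible
    printf "%-8s %s\n", "'$shown'", has_a_and_b($str) ? 'match' : 'no match';
}
```

Only 'aXX' fails, because it has no "b". The last string shows why the /s modifier is there: without it, "." would not cross the newline and the "a" on the second line would never be seen.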
 
use63net

Nope, not my expectations. It's how I took it worked from the Perl
Cookbook - near the bottom of page 212:
 
xhoster

Nope, not my expectations.

What is not your expectations? Please quote some context.

It's how I took it worked from the Perl
Cookbook - near the bottom of page 212:

I think that newer versions of the Perl Cookbook have that fixed to
something like /^(?=.*ALPHA).*BETA/s. Anyway, if you try to understand
what those funny characters do, instead of just copying things by rote,
then you would not be so easily tripped up by other people's errors.
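To see what the extra .* changes, here is a small comparison of the two forms of the pattern mentioned in this thread. It is a sketch, not the Cookbook's own code: the variable names and the test strings are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Without .* before BETA, BETA has to start right at the beginning of
# the string, because the lookahead does not move the current position.
my $anchored_beta = qr/^(?=.*ALPHA)BETA/s;

# With .* before BETA, BETA may start anywhere in the string.
my $beta_anywhere = qr/^(?=.*ALPHA).*BETA/s;

for my $str ('ALPHABETA', 'BETAALPHA', 'ALPHA only') {
    printf "%-10s  without .*: %-3s  with .*: %-3s\n", $str,
        ($str =~ $anchored_beta ? 'yes' : 'no'),
        ($str =~ $beta_anywhere ? 'yes' : 'no');
}
```

'ALPHABETA' matches only the second form, 'BETAALPHA' matches both, and 'ALPHA only' matches neither.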

Xho
 
Tad McClellan

'ab' =~ /(?=.*a)b/; # How the heck is this false?


The 2nd word of the description in the docs explains that:

A zero-width positive look-ahead assertion.

I'll use <> to mark the regex engine's current position.

We begin at the start of the string:

<>ab

We do a successful *zero-width* look ahead on .*a

<>ab

(the current position did not advance because that is what zero-width means)

Now we need to match a "b" next, but there is an "a" next. Cannot match here.

Since the pattern is not anchored, we advance one character

a<>b

and try again.

But now we can't match the .*a lookahead expression, so the match must fail.

'ab' =~ /(?=a)b/; # Nope, but still looks like it should match
'Xab' =~ /(?=.*a)b/; # Nice try, but putting in any other characters doesn't work.


Same reason for both of those.

'ab' =~ /(?=.*b)a/; # At least this matches as expected.


Let's do that one too:

<>ab

We do a successful zero-width look ahead on .*b

<>ab

Now we need to match an "a" next, and there is an "a" next. Match succeeds.
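The zero-width behaviour walked through above can be checked by asking Perl where a match starts and ends. The following is a sketch: `describe_match` is a hypothetical helper written for this illustration, and it relies on the built-in @- and @+ arrays, which hold the start and end offsets of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: report what was matched and at which offsets,
# or 'no match' if the pattern fails.
sub describe_match {
    my ($str, $pattern) = @_;
    if ($str =~ $pattern) {
        my ($start, $end) = ($-[0], $+[0]);
        return sprintf "matched '%s' at offsets %d..%d",
            substr($str, $start, $end - $start), $start, $end;
    }
    return 'no match';
}

print describe_match('ab', qr/(?=.*b)a/), "\n";
print describe_match('ab', qr/(?=.*a)b/), "\n";
print describe_match('ba', qr/(?=.*a)b/), "\n";
```

In the first and third cases the matched text is a single character at offsets 0..1, even though the lookahead inspected the whole string: the lookahead contributes nothing to the match itself. The second case is the one from the original question and reports no match.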
 
Peter J. Holzer

Nope, not my expectations. It's how I took it worked

What you "take how it works" are your expectations, no?

from the Perl Cookbook - near the bottom of page 212:

So far it's correct

like /ALPHA/ && /BETA/"

But this is at least ambiguous. What it meant is that both match at the
same position.

/^(?=.*ALPHA)BETA/s

There is no position in "ALPHABETA" where both /.*ALPHA/ and /BETA/ match:

At position 0, /.*ALPHA/ matches, but /BETA/ doesn't.
At position 5, /BETA/ matches, but /.*ALPHA/ doesn't.
At all other positions, neither /.*ALPHA/ nor /BETA/ matches.

OTOH, in the string "BETAALPHA",

At position 0, /.*ALPHA/ matches "BETAALPHA", and /BETA/ matches "BETA".
At position 1, /.*ALPHA/ matches "ETAALPHA", but /BETA/ doesn't match.
At position 2, /.*ALPHA/ matches "TAALPHA", but /BETA/ doesn't match.
At position 3, /.*ALPHA/ matches "AALPHA", but /BETA/ doesn't match.
At position 4, /.*ALPHA/ matches "ALPHA", but /BETA/ doesn't match.
At all other positions, neither /.*ALPHA/ nor /BETA/ matches.

(of course, since we already had a match at position 0, the whole
pattern matches and the other positions aren't tried)
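The position-by-position analysis above can be reproduced mechanically. This is a sketch under one assumption: "matches at position N" is modelled by testing each pattern against the rest of the string starting at offset N, anchored with \A. The helper name `positions_where_both_match` is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset at which both /.*ALPHA/ and
# /BETA/ match when each is pinned to that offset.
sub positions_where_both_match {
    my ($str) = @_;
    my @hits;
    for my $i (0 .. length($str) - 1) {
        my $rest = substr($str, $i);
        push @hits, $i if $rest =~ /\A.*ALPHA/s && $rest =~ /\ABETA/;
    }
    return @hits;
}

for my $str ('ALPHABETA', 'BETAALPHA') {
    my @hits = positions_where_both_match($str);
    printf "%s: %s\n", $str, @hits ? "both match at @hits" : 'no common position';
}
```

"ALPHABETA" has no offset where both patterns match, while "BETAALPHA" has exactly one, offset 0, which is where the real regex engine finds its match.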

hp
 
