Small confusion about negative lookbehind


david.karr

I'm writing a small test program to illustrate several aspects of
regular expressions. In the section illustrating "lookaround"s, I
found something I didn't understand. My testing is with JDK 1.4.2.

My candidate string is "ab".

The expressions I'm testing this string against are the following,
along with whether the string matched or not:

a(?=b) // succeeds
(?=a)b // fails
(?<=a)b // succeeds
a(?<=b) // fails
(?<!x)b // succeeds
a(?<!x) // succeeds(!)
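The original test program is not shown. A minimal sketch that reproduces the six results might look like the following. It assumes "succeeds" means Matcher.find() locates the pattern somewhere in "ab". With Matcher.matches(), which must consume the whole input, even pattern 1 would fail. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundTable {
    // Returns the first substring of input matched by regex, or null if none.
    // Uses Matcher.find(): "does the pattern occur anywhere in the input?"
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] patterns = {
            "a(?=b)", "(?=a)b", "(?<=a)b", "a(?<=b)", "(?<!x)b", "a(?<!x)"
        };
        for (int i = 0; i < patterns.length; i++) {
            String found = firstMatch(patterns[i], "ab");
            System.out.println(patterns[i] + "  ->  "
                + (found == null ? "fails" : "succeeds, matched \"" + found + "\""));
        }
    }
}
```

Printing the matched text, not just true/false, already shows that the lookaround part never ends up in the match.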

Looking at these, I first wonder what exactly is the semantic
difference between a "lookbehind" and "lookahead" construct. The
syntactic difference is obvious, but why pattern 1 succeeds and
pattern 2 fails is a little hazy to me. The one that really bothers me,
however, is pattern 6. Despite the lack of clarity I have
in how this is supposed to work, I was pretty certain that this pattern
would fail.

I could use some clarification of these constructs.
 

Lasse Reichstein Nielsen

> I'm writing a small test program to illustrate several aspects of
> regular expressions. In the section illustrating "lookaround"s, I
> found something I didn't understand. My testing is with JDK 1.4.2.

Hey, I didn't even know about look-behinds :)

> My candidate string is "ab".

> The expressions I'm testing this string against are the following,
> which also lists whether the string matched or not ....
> Looking at these, I first wonder what exactly is the semantic
> difference between a "lookbehind" and "lookahead" construct.

Both are zero-width predicates, which means (kind of) that they match
not a character, but the position between characters. Think of a string
as not just a sequence of characters, but as alternating characters
and in-between positions. These positions are where the cursor is
when you type (if you use a bar cursor, not a block, obviously :).

Regular expressions describe not only strings, but also the positions
between the characters in strings, e.g. "\b", which matches a position
at a word boundary (a word character on one side, a non-word character
or the start/end of the string on the other). The look-around patterns
work just the same.
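To see that a zero-width match really is a position, here is a small sketch (class and method names made up for illustration) that lists where "\b", "(?=b)" and "(?<=b)" match in "ab cd". Every such match has start() equal to end(), so only the start index is collected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthPositions {
    // Returns the start index of every match of regex in input, comma-separated.
    // For a zero-width pattern, start() == end(): the "match" is a position.
    static String positions(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(m.start());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Positions in "ab cd":  0 a 1 b 2 ' ' 3 c 4 d 5
        System.out.println(positions("\\b", "ab cd"));     // 0,2,3,5
        System.out.println(positions("(?=b)", "ab cd"));   // 1 (next char is b)
        System.out.println(positions("(?<=b)", "ab cd"));  // 2 (previous char is b)
    }
}
```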

The exact predicate determines how the position is matched. For a
look-ahead, the zero-width position is matched if the following
characters are matched by the look-ahead expression. For a
look-behind, the zero-width position is matched if the preceding
characters match the look-behind expression.

So, "a(?=b)" matches an "a" followed by a zero-width string which is
followed by a "b". The matched substring of "ab" is "a".

"(?=a)b" matches a zero-width string which is followed by an "a",
followed by a "b". Since no position can be followed by both an "a"
and a "b", no string will match.
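The point that the look-ahead and the following "b" both test the same character can be checked directly. A minimal sketch (hypothetical class name) using Matcher.find() on a few inputs:

```java
import java.util.regex.Pattern;

public class LookaheadSameChar {
    // True if regex occurs anywhere in input (Matcher.find() semantics).
    static boolean occurs(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = { "ab", "ba", "aab", "abab" };
        for (int i = 0; i < inputs.length; i++) {
            // (?=a) and the literal b inspect the SAME character, which
            // cannot be both 'a' and 'b', so every line prints false.
            System.out.println("(?=a)b on " + inputs[i] + ": " + occurs("(?=a)b", inputs[i]));
        }
        // When the consumed character agrees with the look-ahead, it matches.
        System.out.println("(?=a)a on ab: " + occurs("(?=a)a", "ab"));   // true
        // "an a that is followed by b": the b goes after the a, inside the look-ahead.
        System.out.println("(?=ab)a on ab: " + occurs("(?=ab)a", "ab")); // true
    }
}
```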

"(?<=a)b" matches a zero-width string preceded by an "a", followed
by a "b". The matched substring of "ab" is "b".
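The claim that the look-around part is checked but not consumed shows up in the match boundaries. A short sketch (hypothetical names) that reports the matched text together with Matcher.start() and Matcher.end() for "a(?=b)", "(?<=a)b", and plain "ab":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundSpans {
    // Describes the first match as "text" [start,end), or "none".
    static String span(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) return "none";
        return "\"" + m.group() + "\" [" + m.start() + "," + m.end() + ")";
    }

    public static void main(String[] args) {
        // The look-ahead checks the "b" but does not consume it.
        System.out.println(span("a(?=b)", "ab"));   // "a" [0,1)
        // The look-behind checks the "a" but does not consume it.
        System.out.println(span("(?<=a)b", "ab"));  // "b" [1,2)
        // Without look-around, both characters are part of the match.
        System.out.println(span("ab", "ab"));       // "ab" [0,2)
    }
}
```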

"a(?<=b)" matches an "a" followed by a zero-width string preceded by
a "b". But the position right after the "a" is always preceded by that
"a", never by a "b", so it fails for any string.
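A quick way to confirm that the look-behind inspects the "a" that was just consumed: "a(?<=b)" fails even on "ba", while moving the look-behind in front of the "a" gives the "a preceded by b" reading. A minimal sketch with a hypothetical class name:

```java
import java.util.regex.Pattern;

public class LookbehindAfterConsume {
    // True if regex occurs anywhere in input (Matcher.find() semantics).
    static boolean occurs(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // After "a" is consumed, the look-behind sees that same "a",
        // so requiring it to be "b" never succeeds, even on "ba".
        System.out.println(occurs("a(?<=b)", "ab"));  // false
        System.out.println(occurs("a(?<=b)", "ba"));  // false
        // A look-behind that agrees with the consumed character succeeds.
        System.out.println(occurs("a(?<=a)", "ab"));  // true
        // "an a preceded by b" needs the look-behind BEFORE the a.
        System.out.println(occurs("(?<=b)a", "ba"));  // true
    }
}
```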

> (?<!x)b // succeeds

"(?<!x)b" matches a zero-width string not preceded by an "x",
followed by a "b". The matched substring of "ab" is "b".

> a(?<!x) // succeeds(!)

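"a(?<!x)" matches an "a" followed by a zero-width string not preceded
by an "x". This matches the string "a", even as a substring of "ab".
The character before that position is the "a" itself, which is never
an "x", so the look-behind always succeeds and the pattern behaves
exactly like plain "a".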
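The placement is what matters here. A small sketch (hypothetical names) comparing "a(?<!x)" with "(?<!x)a" on "xa" and "ya" shows that only the second form means "an a not preceded by x":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegativeLookbehindPlacement {
    // Returns the index where the first match starts, or -1 if none.
    static int firstStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // a(?<!x): the look-behind sees the "a" just matched, never an "x",
        // so it behaves like plain "a" and matches even in "xa".
        System.out.println(firstStart("a(?<!x)", "xa"));  // 1
        // (?<!x)a: the look-behind runs before the "a", so it really
        // means "an a that is not preceded by x".
        System.out.println(firstStart("(?<!x)a", "xa"));  // -1
        System.out.println(firstStart("(?<!x)a", "ya"));  // 1
    }
}
```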

/L
 

hiwa

a(?=b) There is a 'b' after me, 'a' //succeeds, matches 'a' of "ab"
(?=a)b The char ahead must be 'a', yet I consume it as 'b' //fails with "ab" (and any string)
(?<=a)b There is an 'a' before me, 'b' //succeeds, matches 'b' of "ab"
a(?<=b) The char behind, i.e. the 'a' I just matched, must be 'b' //fails with "ab" (and any string)
(?<!x)b There is no 'x' before 'b' //succeeds, matches 'b' of "ab"
a(?<!x) The char behind, i.e. the 'a' I just matched, is not 'x' //succeeds, matches 'a' of "ab"