Small confusion about negative lookbehind

david.karr · May 30, 2005

I'm writing a small test program to illustrate several aspects of
regular expressions. In the section illustrating "lookaround"s, I
found something I didn't understand. My testing is with JDK 1.4.2.

My candidate string is "ab".

The expressions I'm testing this string against are the following,
which also lists whether the string matched or not

a(?=b) // succeeds
(?=a)b // fails
(?<=a)b // succeeds
a(?<=b) // fails
(?<!x)b // succeeds
a(?<!x) // succeeds(!)

Looking at these, I first wonder what exactly is the semantic
difference between a "lookbehind" and "lookahead" construct. The
syntactic difference is obvious, but I find the question of why pattern
1 succeeds and pattern 2 fails is a little hazy. The one that really
bothers me, however, is pattern 6. Despite the lack of clarity I have
in how this is supposed to work, I was pretty certain that this pattern
would fail.

I could use some clarification of these constructs.

Lasse Reichstein Nielsen · May 31, 2005

I'm writing a small test program to illustrate several aspects of
regular expressions. In the section illustrating "lookaround"s, I
found something I didn't understand. My testing is with JDK 1.4.2.

Hey, I didn't even know about look-behinds.

My candidate string is "ab".

The expressions I'm testing this string against are the following,
which also lists whether the string matched or not ....
Looking at these, I first wonder what exactly is the semantic
difference between a "lookbehind" and "lookahead" construct.

Both are zero-width predicates, which means (kind of) that they match
not a character, but the position between characters. See a string
as not just a sequence of characters, but as alternating characters
and in-between positions. These positions are where the cursor is
when you write (if you use a bar cursor, not a block, obviously).

Regular expressions describe not only strings, but also the positions
between the chars in strings, e.g. "\b", which matches a position that
is at a word boundary (word character on one side, non-word character
on the other). The look-around patterns work just the same.

The exact predicate determines how the position is matched. For a
look-ahead, the zero-width position is matched if the following
characters are matched by the look-ahead expression. For a
look-behind, the zero-width position is matched if the preceding
characters are matched by the look-behind expression.
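To make the "position, not character" idea concrete, here is a minimal sketch that lists where some zero-width patterns match. The class name ZeroWidthPositions and the helper positions() are made up for illustration, and it assumes a JDK recent enough to have StringBuilder (so newer than the 1.4.2 mentioned above).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
final class ZeroWidthPositions {
    // Collects the start index of every match, separated by spaces.
    // Rejects patterns that consume characters, since the point is
    // to look only at zero-width matches.
    static String positions(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            if (m.start() != m.end()) {
                throw new IllegalArgumentException("pattern consumed characters: " + regex);
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(m.start());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Word boundaries in "ab cd".
        System.out.println(positions("\\b", "ab cd"));
        // A look-ahead for "b" and a look-behind for "a" in "ab".
        System.out.println(positions("(?=b)", "ab"));
        System.out.println(positions("(?<=a)", "ab"));
    }
}
```

In "ab", the look-ahead for "b" and the look-behind for "a" both pick out the same spot, index 1, which is the position between the two characters. Neither match contains any characters.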

So, "a(?=b)" matches an "a" followed by a zero-width string which is
followed by a "b". The matched substring of "ab" is "a".

"(?=a)b" matches a zero-width string whose next character is an "a",
and then requires a "b" starting at that same position. Since the next
character cannot be both an "a" and a "b", no string will match.
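A short sketch of why the look-ahead leaves the cursor where it was: whatever comes after "(?=a)" in the pattern has to match starting at the very "a" the look-ahead just inspected. The class name LookaheadDoesNotConsume and the helper firstMatch() are invented for this example, and it uses find() on the assumption that the original test searched for a match rather than requiring the whole string to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
final class LookaheadDoesNotConsume {
    // Returns the first matched substring, or null if find() locates nothing.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The look-ahead sees "a" but does not move past it,
        // so the "b" is then tried against that same "a".
        System.out.println(firstMatch("(?=a)b", "ab"));
        // Asking for the "a" itself (or any character) after the look-ahead works.
        System.out.println(firstMatch("(?=a)a", "ab"));
        System.out.println(firstMatch("(?=a).", "ab"));
    }
}
```

The first call finds nothing, while the other two match the "a", which is the character the look-ahead only peeked at.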

"(?<=a)b" matches a zero-width string preceded by an "a", followed
by a "b". The matched substring of "ab" is "b".

"a(?<=b)" matches an "a" followed by a zero-width string preceded by
a "b". The character just before that position is the "a" that was
just matched, so it can never be a "b", and the pattern fails for any
string.

(?<!x)b // succeeds

"(?<!x)b" matches a zero-width string not preceded by an "x",
followed by a "b". The matched substring of "ab" is "b".

a(?<!x) // succeeds(!)

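"a(?<!x)" matches an "a" followed by a zero-width string not preceded
by an "x". The character just before that position is the "a" itself,
which is never an "x", so the look-behind always holds. This matches
the string "a", even as a substring of "ab".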
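The difference the placement makes shows up once the input actually contains an "x". The sketch below compares the look-behind written after the "a" with the look-behind written before it. The class name LookbehindPlacement and the helper found() are invented for illustration, and find() is again an assumption about how the original test was run.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
final class LookbehindPlacement {
    // True if the pattern matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Look-behind after the "a": it inspects the "a" itself, never the "x".
        System.out.println("a(?<!x) on \"xa\": " + found("a(?<!x)", "xa"));
        // Look-behind before the "a": it inspects the character in front of the "a".
        System.out.println("(?<!x)a on \"xa\": " + found("(?<!x)a", "xa"));
        System.out.println("(?<!x)a on \"ab\": " + found("(?<!x)a", "ab"));
    }
}
```

So a pattern meant to say "an a that does not come right after an x" would put the look-behind first, as in "(?<!x)a". With the look-behind last it is a test that can never fail.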

/L

hiwa · May 31, 2005

a(?=b) An 'a' with a 'b' right after it //succeeds, matches 'a' of "ab"
(?=a)b A 'b' at a position where the next char is 'a' //fails with "ab": the next char cannot be both
(?<=a)b A 'b' with an 'a' right before it //succeeds, matches 'b' of "ab"
a(?<=b) An 'a' after which the previous char is 'b' //fails with "ab": that char is the 'a' itself
(?<!x)b A 'b' with no 'x' right before it //succeeds, matches 'b' of "ab"
a(?<!x) An 'a' after which the previous char is not 'x' //succeeds, matches 'a' of "ab": that char is the 'a' itself
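The whole table can be reproduced with a small harness along these lines. It is only a sketch: the class name LookaroundTable and the helper firstMatch() are invented here, and it assumes the original results came from Matcher.find() (with Matcher.matches(), which requires the whole of "ab" to match, every one of these patterns would fail).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
final class LookaroundTable {
    // Returns the first matched substring, or null if find() locates nothing.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] patterns = {
            "a(?=b)", "(?=a)b", "(?<=a)b", "a(?<=b)", "(?<!x)b", "a(?<!x)"
        };
        // Run each pattern against the candidate string from the question.
        for (int i = 0; i < patterns.length; i++) {
            String hit = firstMatch(patterns[i], "ab");
            System.out.println(patterns[i] + " -> "
                + (hit == null ? "no match" : "matched \"" + hit + "\""));
        }
    }
}
```

Printing the matched substring rather than just true or false makes it easier to see that the look-around part never contributes characters to the match.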
