Small confusion about negative lookbehind

Discussion in 'Java' started by david.karr@wamu.net, May 30, 2005.

  1. Guest

    I'm writing a small test program to illustrate several aspects of
    regular expressions. In the section illustrating "lookaround"s, I
    found something I didn't understand. My testing is with JDK 1.4.2.

    My candidate string is "ab".

    The expressions I'm testing this string against are the following,
    which also lists whether the string matched or not

    a(?=b) // succeeds
    (?=a)b // fails
    (?<=a)b // succeeds
    a(?<=b) // fails
    (?<!x)b // succeeds
    a(?<!x) // succeeds(!)

    Looking at these, I first wonder what exactly is the semantic
    difference between a "lookbehind" and "lookahead" construct. The
    syntactic difference is obvious, but I find the question of why pattern
    1 succeeds and pattern 2 fails is a little hazy. The one that really
    bothers me, however, is pattern 6. Despite the lack of clarity I have
    in how this is supposed to work, I was pretty certain that this pattern
    would fail.

    I could use some clarification of these constructs.
     
    , May 30, 2005
    #1

  2. writes:

    > I'm writing a small test program to illustrate several aspects of
    > regular expressions. In the section illustrating "lookaround"s, I
    > found something I didn't understand. My testing is with JDK 1.4.2.


    Hey, I didn't even know about look-behinds :)

    > My candidate string is "ab".
    >
    > The expressions I'm testing this string against are the following,
    > which also lists whether the string matched or not

    ....
    > Looking at these, I first wonder what exactly is the semantic
    > difference between a "lookbehind" and "lookahead" construct.


    Both are zero-width predicates, which means (roughly) that each one
    matches not a character, but a position between characters. Think of
    a string not just as a sequence of characters, but as alternating
    characters and in-between positions. These positions are where the
    cursor sits when you type (if you use a bar cursor, not a block,
    obviously :).

    Regular expressions describe not only the characters of a string, but
    also the positions between them, e.g. "\b", which matches a position
    at a word boundary (word character on one side, non-word character
    on the other). The look-around patterns work the same way.
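
    To make the "position, not character" idea concrete, here is a minimal
    sketch that lists where "\b" matches. The class and method names are
    made up for illustration; only java.util.regex is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Returns the offsets of every "\b" match in the input, comma-separated.
    // (Hypothetical helper, not part of any library.)
    static String boundaryPositions(String input) {
        Matcher m = Pattern.compile("\\b").matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Each match is empty: start() == end(), no character is consumed.
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(m.start());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Positions between characters of "ab cd" are numbered 0..5.
        System.out.println(boundaryPositions("ab cd")); // 0,2,3,5
    }
}
```

    Every match is an empty string located at an offset between two
    characters (or at either end of the input).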

    The exact predicate determines how the position is matched. For a
    look-ahead, the zero-width position matches if the characters that
    follow it match the look-ahead expression. For a look-behind, the
    zero-width position matches if the characters that precede it match
    the look-behind expression.
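
    As a rough illustration, a bare look-ahead and a bare look-behind can
    pick out the very same position in "ab" while looking in opposite
    directions. This is only a sketch; firstMatchSpan is a hypothetical
    helper name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundPositions {
    // Returns "start-end" for the first find(), or "none" if nothing matches.
    // (Hypothetical helper for illustration.)
    static String firstMatchSpan(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() + "-" + m.end() : "none";
    }

    public static void main(String[] args) {
        // Position 1 is the gap between 'a' and 'b'.
        System.out.println(firstMatchSpan("(?=b)", "ab"));  // 1-1: 'b' follows the gap
        System.out.println(firstMatchSpan("(?<=a)", "ab")); // 1-1: 'a' precedes the gap
        // The look-ahead checks for 'b' but does not consume it.
        System.out.println(firstMatchSpan("a(?=b)", "ab")); // 0-1
    }
}
```

    Both bare predicates report an empty match (start equals end), which
    is what "zero-width" means in practice.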

    So, "a(?=b)" matches an "a" followed by a zero-width string which is
    followed by a "b". The matched substring of "ab" is "a".

    "(?=a)b" matches a zero-width string which is followed by an "a",
    followed by a "b". Since no position can be followed by both an "a"
    and a "b", no string will match.

    "(?<=a)b" matches a zero-width string preceded by an "a", followed
    by a "b". The matched substring of "ab" is "b".

    "a(?<=b)" matches an "a" followed by a zero-width string preceded by
    a "b". The character before that position is always the "a" that was
    just matched, never a "b", so it fails for every string.

    > (?<!x)b // succeeds


    "(?<!x)b" matches a zero-width string not preceded by an "x",
    followed by a "b". The matched substring of "ab" is "b".

    > a(?<!x) // succeeds(!)


    "a(?<!x)" matches an "a" followed by a zero-width string not preceded
    by an "x". The character before that position is the "a" itself, which
    is never an "x", so the look-behind always succeeds. This matches the
    string "a", even as a substring of "ab".
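
    For anyone who wants to check the six cases, here is a small sketch
    that runs them against "ab". It assumes the original test used
    Matcher.find() (a substring search); the class and helper names are
    invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundTable {
    // Returns the substring found by the first find(), or null if none.
    // (Hypothetical helper for illustration.)
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] patterns = {
            "a(?=b)", "(?=a)b", "(?<=a)b", "a(?<=b)", "(?<!x)b", "a(?<!x)"
        };
        for (int i = 0; i < patterns.length; i++) {
            String found = firstMatch(patterns[i], "ab");
            System.out.println(patterns[i] + " -> "
                    + (found == null ? "no match" : "matched \"" + found + "\""));
        }
        // Placing the look-behind before the "a" asks a different question:
        // now it is the character in front of the "a" that gets inspected.
        System.out.println(firstMatch("(?<!x)a", "xa")); // null
        System.out.println(firstMatch("a(?<!x)", "xa")); // a
    }
}
```

    Note that Pattern.matches("a(?=b)", "ab") returns false, because
    matches() requires the pattern to consume the whole input and the
    look-ahead consumes nothing. The results quoted above only line up
    with find().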

    /L
    --
    Lasse Reichstein Nielsen -
    DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
    'Faith without judgement merely degrades the spirit divine.'
     
    Lasse Reichstein Nielsen, May 31, 2005
    #2

  3. hiwa Guest

    wrote in message news:<>...
    > I'm writing a small test program to illustrate several aspects of
    > regular expressions. In the section illustrating "lookaround"s, I
    > found something I didn't understand. My testing is with JDK 1.4.2.
    >
    > My candidate string is "ab".
    >
    > The expressions I'm testing this string against are the following,
    > which also lists whether the string matched or not
    >
    > a(?=b) // succeeds
    > (?=a)b // fails
    > (?<=a)b // succeeds
    > a(?<=b) // fails
    > (?<!x)b // succeeds
    > a(?<!x) // succeeds(!)
    >
    > Looking at these, I first wonder what exactly is the semantic
    > difference between a "lookbehind" and "lookahead" construct. The
    > syntactic difference is obvious, but I find the question of why pattern

    a(?=b) An 'a' with a 'b' right after it //succeeds, matches 'a' of "ab"
    (?=a)b A 'b' starting where the next character is 'a' //fails with "ab"
    (?<=a)b A 'b' with an 'a' right before it //succeeds, matches 'b' of "ab"
    a(?<=b) An 'a' ending where the previous character is 'b' //fails with "ab"
    (?<!x)b A 'b' with no 'x' right before it //succeeds, matches 'b' of "ab"
    a(?<!x) An 'a' ending where the previous character is not 'x' //succeeds, matches 'a' of "ab"
    > 1 succeeds and pattern 2 fails is a little hazy. The one that really
    > bothers me, however, is pattern 6. Despite the lack of clarity I have
    > in how this is supposed to work, I was pretty certain that this pattern
    > would fail.
    >
    > I could use some clarification of these constructs.
     
    hiwa, May 31, 2005
    #3