lookahead bug (at least in 5.8.4)

Discussion in 'Perl Misc' started by use63net@yahoo.com, Jun 11, 2007.

  1. Guest

    'ab' =~ /(?=.*a)b/; # How the heck is this false?
    'ab' =~ /(?=a)b/; # Nope, but still looks like it should match
    'Xab' =~ /(?=.*a)b/; # Nice try, but putting in any other characters doesn't work.

    'ab' =~ /(?=.*b)a/; # At least this matches as expected.
     
    , Jun 11, 2007
    #1

  2. Bart Lateur Guest

    wrote:

    >'ab' =~ /(?=.*a)b/; # How the heck is this false?
    >'ab' =~ /(?=a)b/; # Nope, but still looks like it should match
    >'Xab' =~ /(?=.*a)b/; # Nice try, but putting in any other characters
    >doesn't work.
    >
    >'ab' =~ /(?=.*b)a/; # At least this matches as expected.


    The bug is in you. Your expectations are wrong. Let's go over them one
    at a time.

    /(?=.*a)b/

    In English: find a position where a "b" can be matched next, and where,
    looking ahead from that same position, there is an "a" somewhere down
    the line. Since the character at that position must be the "b", the "a"
    has to come after the "b". There is none in this string, so it doesn't
    match.

    /(?=a)b/;

    Find a location where you can match a "b", and where the next character
    is "a". So this one character should be both an "a" and a "b". That
    can't be, so it is always false.

    'Xab' =~ /(?=.*a)b/;

    There's still no "a" following the "b".

    'ab' =~ /(?=.*b)a/

    Yes, at the position where the "a" matches, there is a "b" further down
    the line. So this matches.
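
A quick way to check the four cases above is simply to run them. This is only a sketch; the table layout and the variable names are made up for the example, not taken from the original post.

```perl
use strict;
use warnings;

# Each entry: [ string, pattern ], taken from the four cases in this thread.
my @cases = (
    [ 'ab',  qr/(?=.*a)b/ ],
    [ 'ab',  qr/(?=a)b/   ],
    [ 'Xab', qr/(?=.*a)b/ ],
    [ 'ab',  qr/(?=.*b)a/ ],
);

# Record 1 for a match and 0 for a failure, printing one row per case.
my @results;
for my $case (@cases) {
    my ($string, $re) = @$case;
    my $matched = $string =~ $re ? 1 : 0;
    push @results, $matched;
    printf "%-6s %s\n", "'$string'", $matched ? 'matches' : 'does not match';
}
```

Only the last case should report a match, for the reasons given above.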


    If you want to match both an "a" somewhere down the line, and ditto with
    a "b", try

    /(?=.*a)(?=.*b)/

    or maybe better

    /^(?=.*a)(?=.*b)/s

    The ^ is not strictly necessary, but it stops the engine from retrying
    at every later position after a failure. If the lookaheads don't succeed
    at the start of the string, they won't succeed anywhere else either.
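
To see that the two-lookahead form doesn't care about order, here is a small sketch. The helper name has_a_and_b and the test strings are hypothetical, chosen just for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string contains both an "a" and a "b",
# in any order. Both lookaheads start at the beginning of the string
# because of ^, and /s lets . cross newlines.
sub has_a_and_b {
    my ($string) = @_;
    return $string =~ /^(?=.*a)(?=.*b)/s ? 1 : 0;
}

for my $string ('ab', 'ba', "a\nb", 'aaa', 'xyz') {
    (my $shown = $string) =~ s/\n/\\n/g;   # make the newline visible
    printf "%-6s %s\n", $shown, has_a_and_b($string) ? 'yes' : 'no';
}
```

Neither lookahead consumes anything, so each one scans the whole string independently from the start.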

    --
    Bart.
     
    Bart Lateur, Jun 11, 2007
    #2

  3. Guest

    Nope, not my expectations. That's how I understood it to work from the
    Perl Cookbook, near the bottom of page 212:
    -----------------
    True if both /ALPHA/ and /BETA/ match, but may overlap ...
    like /ALPHA/ && /BETA/"
    /^(?=.*ALPHA)BETA/s
    ----------------

    # I'd remove the "^", but it still doesn't match anything like
    "ALPHABETA"
     
    , Jun 11, 2007
    #3
  4. Guest

    wrote:
    > Nope, not my expectations.


    What is not your expectation? Please quote some context.


    > It's how I took it worked from the Perl
    > Cookbook - near the bottom of page 212:
    > -----------------
    > True if both /ALPHA/ and /BETA/ match, but may overlap ...
    > like /ALPHA/ && /BETA/"
    > /^(?=.*ALPHA)BETA/s
    > ----------------
    >
    > # I'd remove the "^", but it still doesn't match anything like
    > "ALPHABETA"


    I think that newer versions of the Perl Cookbook have that fixed to
    something like /^(?=.*ALPHA).*BETA/s. Anyway, if you try to understand
    what those funny characters do, instead of just copying things by rote,
    then you will not be so easily tripped up by other people's errors.
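
For anyone who wants to compare the two, here is a sketch that runs the pattern as quoted from the book and the version with the extra .* against a few strings. The test strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $as_quoted = qr/^(?=.*ALPHA)BETA/s;     # pattern as quoted from the book
my $with_star = qr/^(?=.*ALPHA).*BETA/s;   # with .* added before BETA

for my $string ('ALPHABETA', 'BETAALPHA', 'xxALPHAxxBETAxx', 'BETA only') {
    printf "%-16s as quoted: %-3s  with .*: %s\n",
        $string,
        ($string =~ $as_quoted ? 'yes' : 'no'),
        ($string =~ $with_star ? 'yes' : 'no');
}
```

The quoted form needs BETA to begin the string, while the form with .* accepts ALPHA and BETA anywhere and in either order.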

    Xho

     
    , Jun 11, 2007
    #4
  5. <> wrote:

    > 'ab' =~ /(?=.*a)b/; # How the heck is this false?



    The 2nd word of the description in the docs explains that:

    A zero-width positive look-ahead assertion.

    I'll use <> to mark the regex engine's current position.

    We begin at the start of the string:

    <>ab

    We do a successful *zero-width* look ahead on .*a

    <>ab

    (the current position did not advance because that is what zero-width means)

    Now we need to match a "b" next, but there is an "a" next. Cannot match here.

    Since the pattern is not anchored, we advance one character

    a<>b

    and try again.

    But now we can't match the .*a lookahead expression, because no "a"
    remains ahead of the current position. The same holds at the end of the
    string, so the match must fail.


    > 'ab' =~ /(?=a)b/; # Nope, but still looks like it should match
    > 'Xab' =~ /(?=.*a)b/; # Nice try, but putting in any other characters
    > doesn't work.



    Same reason for both of those.


    > 'ab' =~ /(?=.*b)a/; # At least this matches as expected.



    Let's do that one too:

    <>ab

    We do a successful zero-width look ahead on .*b

    <>ab

    Now we need to match an "a" next, and there is an "a" next. Match succeeds.
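
The zero-width part can be checked directly: after a successful match, $-[0] and $+[0] hold the start and end offsets of what was actually consumed. A minimal sketch (the variable names are mine):

```perl
use strict;
use warnings;

my $string = 'ab';
my ($start, $end, $text);

if ($string =~ /(?=.*b)a/) {
    # $-[0] and $+[0] are the offsets of the whole match. The lookahead
    # inspected "ab" but consumed nothing, so only "a" is in the match.
    ($start, $end) = ($-[0], $+[0]);
    $text = substr($string, $start, $end - $start);
    printf "matched '%s' from offset %d to %d\n", $text, $start, $end;
}
```

The matched text is just the "a" at offset 0; the "b" seen by the lookahead is not part of it.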


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad McClellan, Jun 11, 2007
    #5
  6. On 2007-06-11 17:56, <> wrote:
    > Nope, not my expectations. It's how I took it worked


    How you "took it to work" is your expectation, no?

    > from the Perl Cookbook - near the bottom of page 212:
    > -----------------
    > True if both /ALPHA/ and /BETA/ match, but may overlap ...


    So far it's correct

    > like /ALPHA/ && /BETA/"


    But this is at least ambiguous. What the pattern actually requires is
    that both match starting at the same position.

    > /^(?=.*ALPHA)BETA/s
    > ----------------
    >
    > # I'd remove the "^", but it still doesn't match anything like
    > "ALPHABETA"
    >


    There is no position in "ALPHABETA" where both /.*ALPHA/ and /BETA/ match:

    At position 0, /.*ALPHA/ matches, but /BETA/ doesn't.
    At position 5, /BETA/ matches, but /.*ALPHA/ doesn't.
    At all other positions, neither /.*ALPHA/ nor /BETA/ matches.

    OTOH, in the string "BETAALPHA",

    At position 0, /.*ALPHA/ matches "BETAALPHA", and /BETA/ matches "BETA".
    At position 1, /.*ALPHA/ matches "ETAALPHA", but /BETA/ doesn't match.
    At position 2, /.*ALPHA/ matches "TAALPHA", but /BETA/ doesn't match.
    At position 3, /.*ALPHA/ matches "AALPHA", but /BETA/ doesn't match.
    At position 4, /.*ALPHA/ matches "ALPHA", but /BETA/ doesn't match.
    At all other positions, neither /.*ALPHA/ nor /BETA/ matches.

    (of course, since we already had a match at position 0, the whole
    pattern matches and the other positions aren't tried)
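
The tables above can be reproduced mechanically. The sketch below tries both sub-patterns at every offset; anchoring each test with ^ on substr($string, $pos) stands in for "matches at this position", which is fine here because nothing in these patterns looks backwards. The sub name positions_where_both_match is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offsets at which both /.*ALPHA/ and /BETA/
# match when anchored at that offset, printing one row per offset on the way.
sub positions_where_both_match {
    my ($string) = @_;
    my @both;
    for my $pos (0 .. length($string)) {
        my $rest  = substr($string, $pos);
        my $alpha = $rest =~ /^.*ALPHA/s ? 1 : 0;
        my $beta  = $rest =~ /^BETA/     ? 1 : 0;
        printf "%-10s pos %d: .*ALPHA %-3s BETA %s\n",
            $string, $pos, ($alpha ? 'yes' : 'no'), ($beta ? 'yes' : 'no');
        push @both, $pos if $alpha && $beta;
    }
    return @both;
}

my @in_alphabeta = positions_where_both_match('ALPHABETA');
my @in_betaalpha = positions_where_both_match('BETAALPHA');
```

Unlike the real regex engine, this tries every offset instead of stopping at the first success, so it shows the full table for each string.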

    hp

    --
    _ | Peter J. Holzer | I know I'd be respectful of a pirate
    |_|_) | Sysadmin WSR | with an emu on his shoulder.
    | | | |
    __/ | http://www.hjp.at/ | -- Sam in "Freefall"
     
    Peter J. Holzer, Jun 12, 2007
    #6
