Strange result with Regexp

Discussion in 'Javascript' started by howa, Apr 2, 2008.

  1. howa

    howa Guest

    E.g.

    var s = "12345d";
    document.write("s="+s+ ", " + s.replace(/[0-9]*/g,'x'));


    It shows:

    s=12345d, xxdx

    while I expect

    xd

    Any suggestions?

    Thanks.
    howa, Apr 2, 2008
    #1

  2. howa writes:

    > var s = "12345d";
    > document.write("s="+s+ ", " + s.replace(/[0-9]*/g,'x'));


    > It shows:
    >
    > s=12345d, xxdx


    I would have expected xdx, but your result is equally valid.
    The regular expression /[0-9]*/ matches *zero* or more digits.
    Change it to /[0-9]+/.
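
    A minimal sketch comparing the two quantifiers on the string from the question. The variable names are only illustrative, and console.log stands in for document.write so the snippet also runs outside a browser:

    ```javascript
    // Compare '*' (zero or more) with '+' (one or more) on the same input.
    var s = "12345d";
    var withStar = s.replace(/[0-9]*/g, "x"); // '*' can also match the empty string
    var withPlus = s.replace(/[0-9]+/g, "x"); // '+' needs at least one digit
    console.log("star: " + withStar);
    console.log("plus: " + withPlus); // "xd"
    ```

    With '+', the empty positions around the 'd' no longer count as matches, so only the run of digits is replaced.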

    /L
    --
    Lasse Reichstein Nielsen -
    DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
    'Faith without judgement merely degrades the spirit divine.'
    Lasse Reichstein Nielsen, Apr 2, 2008
    #2

  3. pr

    pr Guest

    howa wrote:
    > E.g.
    >
    > var s = "12345d";
    > document.write("s="+s+ ", " + s.replace(/[0-9]*/g,'x'));
    >
    >
    > It shows:
    >
    > s=12345d, xxdx
    >
    > while I expect
    >
    > xd
    >
    > Any suggestions?


    As Lasse says, '*' matches zero or more. In theory, globally replacing a
    zero-length string should be an infinite task. In practice
    (fortunately), the regular expression engine avoids consecutive
    zero-length matches. Therefore you have one 5-digit match and two
    0-digit matches, one each side of the 'd'.

    These examples look even odder:

    "d".replace(/[0-9]*/g, "x") // xdx
    "dddd".replace(/[0-9]*/g, "x") // xdxdxdxdx

    To preserve your sanity :) try to consider '*' as a last resort. And
    only use that 'g' flag if you mean it.
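
    One way to see the individual matches is to pass a function as the replacement. This is only a sketch; the names matches and out are made up for the example:

    ```javascript
    // Record every match and its offset while replacing.
    var matches = [];
    var out = "12345d".replace(/[0-9]*/g, function (match, offset) {
      matches.push({ text: match, offset: offset });
      return "x";
    });
    console.log(out);
    // One 5-digit match at offset 0, then empty matches at offsets 5 and 6.
    console.log(JSON.stringify(matches));
    ```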
    pr, Apr 2, 2008
    #3
  4. Lee

    Lee Guest

    howa said:
    >
    >E.g.
    >
    >var s = "12345d";
    >document.write("s="+s+ ", " + s.replace(/[0-9]*/g,'x'));
    >
    >
    >It shows:
    >
    >s=12345d, xxdx
    >
    >while I expect
    >
    >xd
    >
    >Any suggestions?


    You don't really want to be specifying "zero or more",
    or even "one or more". Simply replace *each individual*
    digit with an "x", allowing the "g" flag to do the work:

    replace(/[0-9]/g,"x")
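
    For illustration, a small sketch of that call on the original string (perDigit is just an example name). Note that it yields one "x" per digit, which is not the "xd" the original post asked for:

    ```javascript
    // Each digit is its own match, so each digit becomes one "x".
    var perDigit = "12345d".replace(/[0-9]/g, "x");
    console.log(perDigit); // "xxxxxd"
    ```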


    --
    Lee, Apr 2, 2008
    #4
  5. howa wrote:
    > E.g.
    >
    > var s = "12345d";
    > document.write("s="+s+ ", " + s.replace(/[0-9]*/g,'x'));
    >
    >
    > It shows:
    >
    > s=12345d, xxdx
    >
    > while I expect
    >
    > xd
    >
    > Any suggestions?


    Remove 'g' modifier from regexp and you will get your xd
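
    A short sketch of this, with one caveat that is easy to miss: without 'g' only the first match is replaced, and that first match may be empty when the string does not start with a digit. The variable names are illustrative:

    ```javascript
    // Without 'g', only the first match is replaced.
    var once = "12345d".replace(/[0-9]*/, "x");
    console.log(once); // "xd"

    // If the string starts with a non-digit, the first match is the
    // empty string at index 0, so the digits are left untouched.
    var leading = "d12345".replace(/[0-9]*/, "x");
    console.log(leading); // "xd12345"
    ```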
    Alexey Kulentsov, Apr 2, 2008
    #5
  6. pr wrote:
    > howa wrote:
    >> var s = "12345d";
    >> document.write("s="+s+ ", " + s.replace(/[0-9]*/g,'x'));
    >>
    >> It shows:
    >>
    >> s=12345d, xxdx
    >>
    >> while I expect
    >>
    >> xd

    >
    > [...] In theory, globally replacing a zero-length string should be an
    > infinite task. In practice (fortunately), the regular expression engine
    > avoids consecutive zero-length matches. Therefore you have one 5-digit
    > match and two 0-digit matches, one each side of the 'd'.


    Not at all. In theory, there is an ε (epsilon) production; please read
    about Regular Grammars:

    http://en.wikipedia.org/wiki/Formal_grammar#Regular_grammars

    In practice, Regular Expressions match *non-overlapping* occurrences of the
    pattern in the string which means that even with global matching no position
    is visited twice by the matcher; please read ECMA-262 Ed. 3 Final, section
    15.5.4.11:

    http://www.ecmascript.org/docs.php

    Here is what happens, in a nutshell (I used `^' to indicate the next
    possible match, and `ε' for the empty word/string to be matched):

    0. Input string: "12345d"
    Regular Expression: /[0-9]*/g --> lastIndex=0
    Replacement string: "x"

    1. Find matches for the Regular Expression.

    position  0 1 2 3 4 5 6
              ε1ε2ε3ε4ε5εdε
               ^ ^ ^ ^ ^
    (/[0-9]*/, lastIndex=0) --> ("12345", index=0, lastIndex=5)

    Greedy matching, so the longest match wins.
    The global flag is set, continue.

    2. Find more matches for the Regular Expression.

    position  0 1 2 3 4 5 6
              ε1ε2ε3ε4ε5εdε
                        ^
    (/[0-9]*/, lastIndex=5) --> (ε, index=5, lastIndex=5)

    The longest and only possible match that remains is the empty string;
    next possible match after position 4.
    The global flag is set, continue.

    3. Find more matches for the Regular Expression.

    position  0 1 2 3 4 5 6
              ε1ε2ε3ε4ε5εdε
                          ^
    (/[0-9]*/, lastIndex=6) --> (ε, index=6, lastIndex=6)

    (lastIndex was incremented from 5 to 6 after the previous empty match.)

    The longest and only possible match that remains is the empty string;
    next possible match after position 5.
    The global flag is set, continue.

    4. Find more matches for the Regular Expression.

    position  0 1 2 3 4 5 6
              ε1ε2ε3ε4ε5εdε
                           ^
    (/[0-9]*/, lastIndex=7) --> (null, lastIndex=0)

    (lastIndex was incremented from 6 to 7 after the previous empty match.)

    End of string, no further matches possible.

    5. Found matches:

    ("12345", index=0, lastIndex=5),
    (ε, index=5, lastIndex=5),
    (ε, index=6, lastIndex=6)

    6. Replace all matches with the replacement string each.

    position  0 1 2 3 4 5 6
              ε1ε2ε3ε4ε5εdε

    Result:    x        xdx

    7. Result: "xxdx"

    You can confirm this when evaluating the return value of
    "12345d".match(/[0-9]*/g) -- as defined in the Specification -- which is
    ["12345", "", ""] whereas the matches "" can be understood as those
    literally matching ε, the empty word/string.
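
    The steps above can be sketched as a hand-written loop. This is an approximation for illustration, not the specification's algorithm verbatim: replaceAllManually, found, and cursor are made-up names, and only the pattern's source is copied, so other flags are ignored.

    ```javascript
    // Rough re-creation of a global replace using exec() and lastIndex.
    function replaceAllManually(input, pattern, replacement) {
      // Work on a fresh global copy so the caller's lastIndex is untouched.
      var re = new RegExp(pattern.source, "g");
      var found = [];
      var result = "";
      var cursor = 0; // end of the previous match in the input
      var m;
      while ((m = re.exec(input)) !== null) {
        found.push({ text: m[0], index: m.index });
        // Copy the unmatched text before this match, then the replacement.
        result += input.slice(cursor, m.index) + replacement;
        cursor = m.index + m[0].length;
        if (m[0].length === 0) {
          re.lastIndex++; // step past an empty match so the loop terminates
        }
      }
      return { found: found, result: result + input.slice(cursor) };
    }

    var trace = replaceAllManually("12345d", /[0-9]*/, "x");
    console.log(JSON.stringify(trace.found));
    console.log(trace.result); // "xxdx"
    ```

    The three recorded matches sit at indexes 0, 5 and 6, which lines up with the ["12345", "", ""] returned by match().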


    HTH

    PointedEars
    --
    Anyone who slaps a 'this page is best viewed with Browser X' label on
    a Web page appears to be yearning for the bad old days, before the Web,
    when you had very little chance of reading a document written on another
    computer, another word processor, or another network. -- Tim Berners-Lee
    Thomas 'PointedEars' Lahn, Apr 3, 2008
    #6
  7. pr

    pr Guest

    Thomas 'PointedEars' Lahn wrote:
    > pr wrote:
    >> [...] In theory, globally replacing a zero-length string should be an
    >> infinite task. In practice (fortunately), the regular expression engine
    >> avoids consecutive zero-length matches. Therefore you have one 5-digit
    >> match and two 0-digit matches, one each side of the 'd'.

    >
    > Not at all. In theory, there is an ε (epsilon) production; please read
    > about Regular Grammars:
    >
    > http://en.wikipedia.org/wiki/Formal_grammar#Regular_grammars


    I didn't know about those.

    >
    > In practice, Regular Expressions match *non-overlapping* occurrences of the
    > pattern in the string which means that even with global matching no position
    > is visited twice by the matcher; please read ECMA-262 Ed. 3 Final, section
    > 15.5.4.11:


    Are you going to tell me that zero-length strings can overlap? Is that
    another mathematics thing?

    >
    > http://www.ecmascript.org/docs.php
    >


    15.5.4.10:

    | If regexp.global is true: Set the regexp.lastIndex property to 0 and
    | invoke RegExp.prototype.exec repeatedly until there is no match. If
    | there is a match with an empty string (in other words, if the value
    | of regexp.lastIndex is left unchanged), increment regexp.lastIndex
    | by 1.

    and 15.10.2.5

    | Step 1 of the RepeatMatcher's closure d states that, once the
    | minimum number of repetitions has been satisfied, any more
    | expansions of Atom that match the empty string are not considered
    | for further repetitions. This prevents the regular expression engine
    | from falling into an infinite loop on patterns such
    | as:
    |
    | /(a*)*/.exec("b")
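
    A quick way to check the quoted note in an engine, as a sketch (nested is just an example name): the call returns rather than hanging, and the overall match is the empty string at index 0.

    ```javascript
    // The nested quantifier could repeat an empty match forever;
    // the engine is required to stop instead.
    var nested = /(a*)*/.exec("b");
    console.log(nested !== null);   // true: the call returned with a match
    console.log(nested[0].length);  // 0: the overall match is empty
    console.log(nested.index);      // 0: it was found at the start of "b"
    ```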

    > Here is what happens, in a nutshell (I used `^' to indicate the next
    > possible match, and `ε' for the empty word/string to be matched):
    >

    [...]

    Your explanation is more detailed but I don't think it says anything
    mine didn't. Seems one of us misread.
    >
    > You can confirm this when evaluating the return value of
    > "12345d".match(/[0-9]*/g) -- as defined in the Specification -- which is
    > ["12345", "", ""] whereas the matches "" can be understood as those
    > literally matching ε, the empty word/string.


    Exactly; 'one 5-digit match and two 0-digit matches', since the
    expression matched zero or more digits. Or, to put it another way:

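    (function () {
      var s = "12345d";
      var re = /[0-9]*/g, results;
      // Step through the matches by hand; confirm() shows
      // "'match' | index | lastIndex" and Cancel stops the loop.
      while ((results = re.exec(s)) &&
             confirm(["'" + results[0] + "'", results.index,
                      re.lastIndex].join(" | ") + "\n")) {
        if (results[0].length == 0) {
          // exec() leaves lastIndex unchanged after an empty match,
          // so advance it manually to avoid looping forever.
          re.lastIndex++;
        }
      }
    })();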
    pr, Apr 4, 2008
    #7
  8. pr wrote:
    > Thomas 'PointedEars' Lahn wrote:
    >> pr wrote:
    >>> [...] In theory, globally replacing a zero-length string should be an
    >>> infinite task. In practice (fortunately), the regular expression engine
    >>> avoids consecutive zero-length matches. Therefore you have one 5-digit
    >>> match and two 0-digit matches, one each side of the 'd'.

    >> Not at all. [...]
    >> In practice, Regular Expressions match *non-overlapping* occurrences of the
    >> pattern in the string which means that even with global matching no position
    >> is visited twice by the matcher; please read ECMA-262 Ed. 3 Final, section
    >> 15.5.4.11:

    >
    > Are you going to tell me that zero-length strings can overlap? Is that
    > another mathematics thing?


    I was talking about patterns in the string, not about strings. IOW,

    (ab|abc)

    matches only "ab" in "abcd", not also "abc", because these two patterns in
    the string overlap. This is accomplished quite simply by continuing to match
    at the endIndex of the previous match, and not at its index. Which is the
    reason why one observes the result of "xxdx".
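
    A small sketch of that behaviour, using exec() so that lastIndex is visible (alt, first and second are illustrative names):

    ```javascript
    // With 'g', each exec() call resumes at lastIndex, i.e. at the
    // end of the previous match, not at its start.
    var alt = /(ab|abc)/g;
    var first = alt.exec("abcd");
    console.log(first[0], first.index, alt.lastIndex); // "ab" 0 2

    var second = alt.exec("abcd"); // the search resumes at index 2 ("cd")
    console.log(second); // null: "abc" is never reported

    console.log("abcd".match(/(ab|abc)/g).length); // 1
    ```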

    >> http://www.ecmascript.org/docs.php

    >
    > 15.5.4.10:
    >
    > | If regexp.global is true: Set the regexp.lastIndex property to 0 and
    > | invoke RegExp.prototype.exec repeatedly until there is no match. If
    > | there is a match with an empty string (in other words, if the value
    > | of regexp.lastIndex is left unchanged), increment regexp.lastIndex
    > | by 1.
    >
    > and 15.10.2.5
    >
    > | Step 1 of the RepeatMatcher's closure d states that, once the
    > | minimum number of repetitions has been satisfied, any more
    > | expansions of Atom that match the empty string are not considered
    > | for further repetitions. This prevents the regular expression engine
    > | from falling into an infinite loop on patterns such
    > | as:
    > |
    > | /(a*)*/.exec("b")


    What you said is quite different from that. It has nothing to do with
    "consecutive zero-length matches". As I have shown, there are consecutive
    zero-length matches that are considered.

    In plain English, the above paragraph merely says that once the matcher has
    tried to match the empty word (length=0), it stops and continues at the
    position of the next occurrence of the pattern in the string, as I have shown.

    >> Here is what happens, in a nutshell (I used `^' to indicate the next
    >> possible match, and `ε' for the empty word/string to be matched):

    > [...]
    >
    > Your explanation is more detailed but I don't think it says anything
    > mine didn't.


    Yes, it does.

    > Seems one of us misread.


    Yes, you did.


    PointedEars
    --
    var bugRiddenCrashPronePieceOfJunk = (
    navigator.userAgent.indexOf('MSIE 5') != -1
    && navigator.userAgent.indexOf('Mac') != -1
    ) // Plone, register_function.js:16
    Thomas 'PointedEars' Lahn, Apr 4, 2008
    #8