JSON.parse

Discussion in 'Javascript' started by Douglas Crockford, Dec 29, 2005.

  1. There is a new version of JSON.parse in JavaScript. It is vastly
    faster and smaller than the previous version. It uses a single call to
    eval to do the conversion, guarded by a single regexp test to assure
    that it is safe.

    JSON.parse = function (text) {
        return (/^(\s|[,:{}\[\]]|"(\\["\\bfnrtu]|[^\x00-\x1f"\\])*"|-?\d+(\.\d*)?([eE][+-]?\d+)?|true|false|null)+$/.test(text))
            && eval('(' + text + ')');
    };

    It is ugly, but it is really efficient. See
    http://www.crockford.com/JSON/js.html
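
    To make the guard concrete, here is a minimal sketch of which texts the
    regexp from this post lets through to eval. The helper name
    `looksLikeJson` and the sample inputs are assumptions for illustration,
    not part of the original code.

```javascript
// Hypothetical helper: wraps the regexp from the post so the guard can be
// tried on its own, without calling eval.
function looksLikeJson(text) {
    return /^(\s|[,:{}\[\]]|"(\\["\\bfnrtu]|[^\x00-\x1f"\\])*"|-?\d+(\.\d*)?([eE][+-]?\d+)?|true|false|null)+$/.test(text);
}

// Only JSON punctuation, strings, numbers and the three literals pass.
var accepted = looksLikeJson('{"name": "x", "list": [1, 2.5, true, null]}');

// Identifiers, calls and single-quoted strings are rejected.
var rejectedCall = looksLikeJson('alert(1)');
var rejectedKey = looksLikeJson('{name: 1}');
var rejectedQuote = looksLikeJson("['x']");
```

    The guard checks tokens, not structure, so a text such as `1 2` passes
    the test and eval then throws a SyntaxError.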
    Douglas Crockford, Dec 29, 2005
    #1

  2. VK

    Douglas Crockford wrote:
    > (/^(\s|[,:{}\[\]]|"(\\["\\bfnrtu]|[^\x00-\x1f"\\])*"|-?\d+(\.\d*)?([eE][+-]?\d+)?|true|false|null)+$/.test(text))
    > && eval('(' + text + ')');
    > };


    Far from being a RegExp guru - truly, sincerely not :-

    Aren't static RegExps more efficient at run time if they are
    precompiled?

    var re = /regexp/;
    ....
    re.test(string);
    ....
    VK, Dec 29, 2005
    #2

  3. Douglas Crockford writes:

    ....
    > guarded by a single regexp test to assure that it is safe.
    >
    > JSON.parse = function (text) {
    > return (/^(\s|[,:{}\[\]]|"(\\["\\bfnrtu]|[^\x00-\x1f"\\])*"|-?\d+(\.\d*)?([eE][+-]?\d+)?|true|false|null)+$/.test(text))


    Looks reasonable (but a comment stating what it is supposed to match
    would make it much more readable :)

    For efficiency, I'd change \s to \s+.

    If the regexp doesn't match, then false is returned. This can also
    be the value of the JSON expression. Perhaps it would be safer to
    return undefined if the test fails, i.e.,
    re.test(text) ? eval("("+text+")") : undefined;
    or
    if(re.test(text)) { return eval("("+text+")"); }

    Also, you could move the creation of the RegExp object out of the
    function, and reuse it for each call, instead of creating a new,
    lengthy, RegExp for each call. However, that is only important if
    calls are frequent, which they probably shouldn't be anyway.
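
    Both suggestions can be sketched together as follows. The names
    `safeParse` and `JSON_SHAPE` are made up for this illustration; the
    regexp itself is the one from the original post.

```javascript
// Sketch only: `safeParse` and `JSON_SHAPE` are hypothetical names.
// The regexp is created once here and reused by every call.
var JSON_SHAPE = /^(\s|[,:{}\[\]]|"(\\["\\bfnrtu]|[^\x00-\x1f"\\])*"|-?\d+(\.\d*)?([eE][+-]?\d+)?|true|false|null)+$/;

function safeParse(text) {
    // Return undefined (not false) when the guard rejects the text, so a
    // JSON text of `false` stays distinguishable from a failed test.
    if (JSON_SHAPE.test(text)) {
        return eval('(' + text + ')');
    }
    return undefined;
}

var parsedFalse = safeParse('false');      // the JSON value false
var rejected = safeParse('alert(1)');      // guard fails, so undefined
var parsedObject = safeParse('{"a": [1, 2]}');
```

    With this shape a caller can compare the result against undefined to
    detect a rejected text, since undefined is not a JSON value.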

    /L
    --
    Lasse Reichstein Nielsen -
    DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
    'Faith without judgement merely degrades the spirit divine.'
    Lasse Reichstein Nielsen, Dec 29, 2005
    #3
  4. Lasse Reichstein Nielsen wrote in comp.lang.javascript:

    > Also, you could move the creation of the RegExp object out of the
    > function, and reuse it for each call, instead of creating a new,
    > lengthy, RegExp for each call. However, that is only important if
    > calls are frequent, which they probably shouldn't be anyway.
    >


    VK stated something similar to this too.

    But AIUI a RegExp that comes from a /.../ expression is (supposed
    to be) compiled when the function body is compiled, IOW it only
    happens once, the /.../ expression having been replaced with a
    compiled RegExp object.

    Rob.
    --
    http://www.victim-prime.dsl.pipex.com/
    Rob Williscroft, Dec 29, 2005
    #4
  5. Guest

    Lasse Reichstein Nielsen wrote:
    > Looks reasonable (but a comment stating what it is supposed to match
    > would make it much more readable :)
    >
    > For efficiency, I'd change \s to \s+.


    Please oh please, don't sacrifice unambiguity for grammatical
    correctness. A . matches any single character. The above would mean a
    sequence of one or more whitespace characters followed by a single
    arbitrary character (...and then the rest of the regexp), which not only
    slows the regexp down instead of speeding it up, but also changes its
    meaning. It took me a while to understand that the . marks the end of
    the sentence here.

    For efficiency, I'd change "\s" to "\s+".

    Sure the rules of English state the final dot should go INSIDE the
    quotation marks, but that would be even worse.
    , Dec 29, 2005
    #5
  6. VK

    Douglas Crockford wrote:
    > There is a new version of JSON.parse in JavaScript. It is vastly
    > faster and smaller than the previous version. It uses a single call to
    > eval to do the conversion, guarded by a single regexp test to assure
    > that it is safe.
    >
    > JSON.parse = function (text) {
    > return (/^(\s|[,:{}\[\]]|"(\\["\\bfnrtu]|[^\x00-\x1f"\\])*"|-?\d+(\.\d*)?([eE][+-]?\d+)?|true|false|null)+$/.test(text))
    > && eval('(' + text + ')');
    > };
    >
    > It is ugly, but it is really efficient. See
    > http://www.crockford.com/JSON/js.html


    Semi-irrelevant to this post but important to know:

    1. JSON engine versioning
    Would the above be considered JSON 1.01, JSON 1.1, JSON 2.0, or
    something else?
    It is crucial for benchmark references and proper download refs.

    2. In light of recent events (like JSON becoming one of the official
    data interfaces of Yahoo!), does the author plan to change the
    licensing in any way? (I hope not.)

    3. While leaving the JSON engine in the public domain, would it be
    possible to narrow the covering license? So far JSON goes under the
    proprietary "The Software shall be used for Good, not Evil." As good as
    it is, would it be possible to move the software under one of the more
    legally specific free software licenses, like the GNU General Public
    License or another well-defined copyleft license? If that is not
    desirable, could the author elaborate on the definition of Evil as
    applied to JSON? Say, non-ECMA-compliant code or no Firefox support -
    would that be evil? Or does the license mean Evil in the social and
    religious senses only?

    I'm not trying to be nasty - but sometimes a dot counts for big
    troubles.
    VK, Dec 29, 2005
    #6
  7. Thomas 'PointedEars' Lahn wrote in comp.lang.javascript:

    > Rob Williscroft wrote:
    >
    >> Lasse Reichstein Nielsen wrote [...]:
    >>> Also, you could move the creation of the RegExp object out of the
    >>> function, and reuse it for each call, instead of creating a new,
    >>> lengthy, RegExp for each call. However, that is only important if
    >>> calls are frequent, which they probably shouldn't be anyway.

    >>
    >> VK stated something similar to this too.
    >>
    >> But AIUI a RegExp that comes from a /.../ expression is (supposed
    >> to be) compiled when the function body is compiled, IOW it only
    >> happens once, the /.../ expression having been replaced with a
    >> compiled RegExp object.

    >
    > `/.../' is equivalent to `new RegExp(...)', see ECMAScript (ES) 3,
    > 7.8.5. There is a RegExp object created on each call and GC'd shortly
    > after, so it is more efficient to create that object once and make it
    > globally available. To avoid spoiling the global namespace and attach
    > the object reference to the method that uses it, I wrote
    >
    > JSON.parse = function(...) { ... JSON.parse.rx ... };
    > JSON.parse.rx = /.../


    Thanks for the reference,

    Standard ECMA-262 3rd Edition - December 1999

    7.8.5 Regular Expression Literals

    A regular expression literal is an input element that is converted
    to a RegExp object (section 15.10) when it is scanned. The object
    is created before evaluation of the containing program or function
    begins. Evaluation of the literal produces a reference to that object;
    it does not create a new object. ...

    The above confirms my "AIUI" above, and confirms that there *isn't*
    a "new RegExp object created on each call".
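
    One way to make the question moot is the function-property pattern
    quoted above, where the RegExp object is created exactly once no matter
    how an engine treats literals. In this sketch `parseGuarded` is a
    hypothetical stand-in for JSON.parse, and the short pattern is a
    placeholder rather than the full JSON regexp. The identity check on
    `makeLiteral` simply reports what the running engine does; no
    particular result is assumed.

```javascript
// Hypothetical stand-in for JSON.parse; the guard lives on the function
// itself, so it is created once when the assignment below runs.
function parseGuarded(text) {
    return parseGuarded.rx.test(text) ? eval('(' + text + ')') : undefined;
}
// Placeholder pattern (digits only), not the full JSON regexp.
parseGuarded.rx = /^\d+$/;

// Observes whether a literal inside a function yields the same object on
// every call; the answer is engine-dependent.
function makeLiteral() {
    return /x/;
}
var literalIsShared = makeLiteral() === makeLiteral();

var rxBefore = parseGuarded.rx;
var parsed = parseGuarded('42');
var rxAfter = parseGuarded.rx;
```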

    Has this version (ECMA-262) been superseded?
    >
    > However, it should be taken into account that RegExp.prototype.test()
    > is doing very much the same as RegExp.prototype.exec() does (ES3,
    > 15.10.6.3) and so it may not be wise to use a globally available
    > RegExp object that retains the status of the last match.
    >


    This shouldn't be a problem for a RegExp that only ever has test()
    called on it (as with the OP's code) as AFAICT exec() will only
    ever reset the lastIndex property to 0 (which is the default anyway).

    Rob.
    --
    http://www.victim-prime.dsl.pipex.com/
    Rob Williscroft, Dec 29, 2005
    #7
  8. Rob Williscroft wrote:

    > Thomas 'PointedEars' Lahn wrote [...]:
    >> Rob Williscroft wrote:
    >>> But AIUI a RegExp that comes from a /.../ expression is (supposed
    >>> to be) compiled when the function body is compiled, IOW it only
    >>> happens once, the /.../ expression having been replaced with a
    >>> compiled RegExp object.

    >>
    >> `/.../' is equivalent to `new RegExp(...)', see ECMAScript (ES) 3,
    >> 7.8.5. There is a RegExp object created on each call and GC'd shortly
    >> after, so it is more efficient to create that object once and make it
    >> globally available. To avoid spoiling the global namespace and attach
    >> the object reference to the method that uses it, I wrote
    >>
    >> JSON.parse = function(...) { ... JSON.parse.rx ... };
    >> JSON.parse.rx = /.../

    >
    > Thanks for the reference,
    >
    > Standard ECMA-262 3rd Edition - December 1999
    >
    > 7.8.5 Regular Expression Literals
    >
    > A regular expression literal is an input element that is converted
    > to a RegExp object (section 15.10) when it is scanned. The object
    > is created before evaluation of the containing program or function
    > begins. Evaluation of the literal produces a reference to that object;
    > it does not create a new object. ...
    >
    > The above confirms my "AIUI" above, and confirms that there *isn't*
    > a "new RegExp object created on each call".


    Yes, indeed. Somehow I overlooked the following sentences all the time,
    and it appears I was not the only one here. Thank /you/ for pointing that
    out.

    > Has this version (ECMA-262)


    It is ECMA-262 (ECMAScript) _Edition_ 3, actually.

    > been superseded?


    There are PDF and Microsoft Word versions of the ECMAScript Language
    Specification that have 3 more pages (ref. PDF versions), are titled
    "Edition 3 Final" and are dated March 24, 2000 inside. (They refer to
    themselves being downloadable from ftp.ecma.ch. However, [ftp.]ecma.ch
    is no longer and ftp.ecma-international.org appears not to provide
    access with anonymous login.)

    These can be downloaded from

    <URL:http://www.mozilla.org/js/language/>

    Although it does not appear to include the required corrections mentioned
    in the errata, the "Final" addition and the date indicate that this is the
    latest revision published by the ECMA; it is unclear why only the December
    1999 revision is linked on ecma-international.org. (Maybe the mozilla.org
    folks have access to more recent information on ECMA's FTP server because
    the Mozilla Foundation is an ECMA member.) A text comparison between the
    two revisions I did today is inconclusive as yet.

    However, whether it should be considered normative or not, that latest
    revision says the same as its predecessor; you are correct.

    >> However, it should be taken into account that RegExp.prototype.test()
    >> is doing very much the same as RegExp.prototype.exec() does (ES3,
    >> 15.10.6.3) and so it may not be wise to use a globally available
    >> RegExp object that retains the status of the last match.

    >
    > This shouldn't be a problem for a RegExp that only ever has test()
    > called on it (as with the OP's code) as AFAICT exec() will only
    > ever reset the lastIndex property to 0 (which is the default anyway).


    No, it could pose a problem since the next match will start from the
    position the `lastIndex' property indicates. The value of that property is
    reset to 0 iff "I < 0 or I > length" (15.10.6.2.6.), where according to
    step 2 `length' refers to the length of the string the method is passed.
    It is unclear what `I' refers to; known implementations suggest that this
    is a typo not covered in the errata and actually `i' is meant. If we
    assume this, `i' would be the value of ToInteger(lastIndex), according to
    step 4, which is in fact the behavior of those implementations. That means
    previous calls of RegExp.prototype.exec() on the same RegExp object do
    affect the current call on the same object, unless

    | 5. If the global property is false, let i = 0.

    According to 15.10.4.1,

    | The global property of the newly constructed object is set to a Boolean
    | value that is true if F contains the character "g" and false otherwise.

    So it does not pose a problem _here_, as Douglas is not using a global
    expression (and the expression is anchored on both sides anyway.)
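
    The role of the global flag can be checked with a short sketch. The
    patterns and variable names are illustrative only; the behaviour shown
    is that of lastIndex as described in the steps quoted above.

```javascript
// With the g flag, test() resumes from lastIndex, so repeated calls on the
// same object and the same string can alternate between true and false.
var globalRe = /a/g;
var g1 = globalRe.test('a');          // matches at index 0, lastIndex becomes 1
var afterFirst = globalRe.lastIndex;
var g2 = globalRe.test('a');          // starts at index 1, fails, lastIndex reset to 0
var g3 = globalRe.test('a');          // starts at index 0 again

// Without the g flag every call starts at index 0.
var plainRe = /a/;
var p1 = plainRe.test('a');
var p2 = plainRe.test('a');
```

    A shared RegExp object is therefore only stateful across calls when it
    carries the g flag, which the JSON guard does not.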


    PointedEars
    Thomas 'PointedEars' Lahn, Dec 29, 2005
    #8
