JSON.parse

Douglas Crockford

There is a new version of JSON.parse in JavaScript. It is vastly
faster and smaller than the previous version. It uses a single call to
eval to do the conversion, guarded by a single regexp test to assure
that it is safe.

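JSON.parse = function (text) {
    // Allow eval() only if the text consists entirely of JSON tokens:
    // whitespace, punctuation, strings, numbers, true, false and null.
    // The expression must start on the "return" line, or automatic
    // semicolon insertion makes the function return undefined.
    return (/^(\s|[,:{}\[\]]|"(\\["\\bfnrtu]|[^\x00-\x1f"\\])*"|-?\d+(\.\d*)?([eE][+-]?\d+)?|true|false|null)+$/.test(text)) &&
        eval('(' + text + ')');
};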

It is ugly, but it is really efficient. See
http://www.crockford.com/JSON/js.html
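For experimenting with what the guard accepts, here is a minimal self-contained sketch of the same guard-then-eval idea. It assumes a hypothetical function name, `guardedParse`, kept apart from `JSON.parse` because current engines already provide a built-in `JSON.parse` that should not be overwritten; the regexp is copied unchanged from the post.

```javascript
// Hypothetical name; the pattern is copied unchanged from the post above.
var JSON_TOKENS =
    /^(\s|[,:{}\[\]]|"(\\["\\bfnrtu]|[^\x00-\x1f"\\])*"|-?\d+(\.\d*)?([eE][+-]?\d+)?|true|false|null)+$/;

function guardedParse(text) {
    // Only eval() text made up entirely of JSON tokens. The parentheses
    // make a leading "{" parse as an object literal, not a block.
    return JSON_TOKENS.test(text) && eval('(' + text + ')');
}

var value = guardedParse('{"a": [1, -2.5e3, true, null], "b": "x\\ny"}');
console.log(value.a[1]);               // -2500
console.log(guardedParse('alert(1)')); // false: "alert" and "(" are not JSON tokens
```

Note that the guard only checks which tokens occur, not the order they occur in: text such as `]` passes the test, and eval() then throws a SyntaxError instead of returning a value.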
 
VK

Douglas Crockford wrote:
(/^(\s|[,:{}\[\]]|"(\\["\\bfnrtu]|[^\x00-\x1f"\\])*"|-?\d+(\.\d*)?([eE][+-]?\d+)?|true|false|null)+$/.test(text))
&& eval('(' + text + ')');
};

Far from being a RegExp guru (truly, sincerely not :-)

In the case of a static RegExp, wouldn't it be more efficient at runtime
to precompile it?

var re = /regexp/; // placeholder pattern, created once up front
// ...
re.test(string);   // the same object is reused on every call
// ...
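A sketch of one way to follow this suggestion without adding another global variable: the regexp is created once inside an immediately invoked function, and the returned parser closes over it. The name `parse` is illustrative only, not part of Crockford's code.

```javascript
// Illustrative only: "re" is created once, when the outer function runs,
// and every later call to parse() reuses that same object.
var parse = (function () {
    var re = /^(\s|[,:{}\[\]]|"(\\["\\bfnrtu]|[^\x00-\x1f"\\])*"|-?\d+(\.\d*)?([eE][+-]?\d+)?|true|false|null)+$/;
    return function (text) {
        return re.test(text) && eval('(' + text + ')');
    };
}());

console.log(parse('[1, 2, 3]').length); // 3
```

Whether this saves anything in practice depends on how the engine treats regexp literals inside functions, which the rest of the thread goes on to discuss.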
 
Lasse Reichstein Nielsen

....
guarded by a single regexp test to assure that it is safe.

JSON.parse = function (text) {
return (/^(\s|[,:{}\[\]]|"(\\["\\bfnrtu]|[^\x00-\x1f"\\])*"|-?\d+(\.\d*)?([eE][+-]?\d+)?|true|false|null)+$/.test(text))

Looks reasonable (but a comment stating what it is supposed to match
would make it much more readable :)

For efficiency, I'd change \s to \s+.

If the regexp doesn't match, then false is returned. This can also
be the value of the JSON expression. Perhaps it would be safer to
return undefined if the test fails, i.e.,
return re.test(text) ? eval("("+text+")") : undefined;
or
if(re.test(text)) { return eval("("+text+")"); }
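The ambiguity Lasse describes can be seen in a small sketch. `parseOrUndefined` is a hypothetical name, and the regexp is Crockford's, unchanged.

```javascript
// Hypothetical helper following the suggestion above: a failed test
// yields undefined, which is not a JSON value, so it cannot be confused
// with the result of parsing the JSON text "false".
var JSON_TOKENS =
    /^(\s|[,:{}\[\]]|"(\\["\\bfnrtu]|[^\x00-\x1f"\\])*"|-?\d+(\.\d*)?([eE][+-]?\d+)?|true|false|null)+$/;

function parseOrUndefined(text) {
    return JSON_TOKENS.test(text) ? eval('(' + text + ')') : undefined;
}

console.log(parseOrUndefined('false'));    // false (valid JSON)
console.log(parseOrUndefined('alert(1)')); // undefined (rejected by the guard)
```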

Also, you could move the creation of the RegExp object out of the
function, and reuse it for each call, instead of creating a new,
lengthy, RegExp for each call. However, that is only important if
calls are frequent, which they probably shouldn't be anyway.

/L
 
Rob Williscroft

Lasse Reichstein Nielsen wrote in comp.lang.javascript:
Also, you could move the creation of the RegExp object out of the
function, and reuse it for each call, instead of creating a new,
lengthy, RegExp for each call. However, that is only important if
calls are frequent, which they probably shouldn't be anyway.

VK stated something similar to this too.

But AIUI a RegExp that comes from a /.../ expression is (supposed
to be) compiled when the function body is compiled, IOW it only
happens once, the /.../ expression having been replaced with a
compiled RegExp object.

Rob.
 
bwucke

Lasse Reichstein Nielsen wrote:
Looks reasonable (but a comment stating what it is supposed to match
would make it much more readable :)

For efficiency, I'd change \s to \s+.

Please, oh please, don't sacrifice unambiguity for grammatical correctness.
In a regexp, "." matches any single character. The above would mean a
sequence of one or more whitespace characters followed by a single
arbitrary character (...and then the rest of the regexp), which not only
slows the regexp down instead of speeding it up, but also changes its
meaning.
It took me a while to understand that the . marks the end of the sentence
here.

For efficiency, I'd change "\s" to "\s+".

Sure, the rules of English state that the final dot should go INSIDE the
quotation marks, but that would be even worse.
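To make the ambiguity concrete, a small sketch with two toy patterns (anchored, and much simpler than Crockford's regexp) showing that a trailing "." after "\s+" is a wildcard, not punctuation:

```javascript
// "\s+" and "\s+." are different patterns: in the second, "." matches
// any single character other than a line terminator.
var oneOrMoreSpaces = /^\s+$/;
var spacesThenAnyChar = /^\s+.$/;

console.log(oneOrMoreSpaces.test('   '));   // true
console.log(spacesThenAnyChar.test('   ')); // true: the last space matches "."
console.log(oneOrMoreSpaces.test('  x'));   // false
console.log(spacesThenAnyChar.test('  x')); // true: "." matches "x"
```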
 
VK

Douglas said:
There is a new version of JSON.parse in JavaScript. It is vastly
faster and smaller than the previous version. It uses a single call to
eval to do the conversion, guarded by a single regexp test to assure
that it is safe.

JSON.parse = function (text) {
return (/^(\s|[,:{}\[\]]|"(\\["\\bfnrtu]|[^\x00-\x1f"\\])*"|-?\d+(\.\d*)?([eE][+-]?\d+)?|true|false|null)+$/.test(text))
&& eval('(' + text + ')');
};

It is ugly, but it is really efficient. See
http://www.crockford.com/JSON/js.html

Semi-irrelevant to this post but important to know:

1. JSON engine versioning
Should the above be considered JSON 1.01, JSON 1.1, JSON 2.0, or
something else?
It is crucial for benchmark references and proper download refs.

2. In light of recent events (such as JSON becoming one of the official
data interfaces of Yahoo!), does the author plan to change the licensing
in any way? (I hope not.)

3. Leaving the JSON engine in the public domain, would it be possible to
narrow the covering license? So far JSON is covered by the proprietary
clause "The Software shall be used for Good, not Evil." As good as that
is, would it be possible to move the software under one of the more
legally specific free software licenses, like the GNU General Public
License or another well-defined copyleft license? If that is not
desirable, could the author clarify the definition of Evil as applied to
JSON? Say, would non-ECMA-compliant code or lacking Firefox support count
as evil? Or does the license mean Evil in the social and religious sense
only?

I'm not trying to be nasty, but sometimes a single dot can cause big
trouble.
 
Rob Williscroft

Thomas 'PointedEars' Lahn wrote in comp.lang.javascript:
Rob said:
Lasse Reichstein Nielsen wrote [...]:
Also, you could move the creation of the RegExp object out of the
function, and reuse it for each call, instead of creating a new,
lengthy, RegExp for each call. However, that is only important if
calls are frequent, which they probably shouldn't be anyway.

VK stated something similar to this too.

But AIUI a RegExp that comes from a /.../ expression is (supposed
to be) compiled when the function body is compiled, IOW it only
happens once, the /.../ expression having been replaced with a
compiled RegExp object.

`/.../' is equivalent to `new RegExp(...)', see ECMAScript (ES) 3,
7.8.5. There is a RegExp object created on each call and GC'd shortly
after, so it is more efficient to create that object once and make it
globally available. To avoid spoiling the global namespace and attach
the object reference to the method that uses it, I wrote

JSON.parse = function(...) { ... JSON.parse.rx ... };
JSON.parse.rx = /.../
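A runnable sketch of the pattern described above, with the regexp stored as a property of the parsing function. `MyJSON` is a hypothetical stand-in object so that the built-in `JSON` of current engines is left alone; the regexp is Crockford's.

```javascript
// MyJSON is a hypothetical stand-in, not Crockford's JSON object.
var MyJSON = {};

MyJSON.parse = function (text) {
    // The regexp is looked up through the function object at call time.
    return MyJSON.parse.rx.test(text) && eval('(' + text + ')');
};

// Created once, when this statement runs, then shared by all calls.
MyJSON.parse.rx =
    /^(\s|[,:{}\[\]]|"(\\["\\bfnrtu]|[^\x00-\x1f"\\])*"|-?\d+(\.\d*)?([eE][+-]?\d+)?|true|false|null)+$/;

console.log(MyJSON.parse('{"ok": true}').ok); // true
```

Compared with hiding the regexp in a closure, the property is visible and can be replaced from outside (for example `MyJSON.parse.rx = /x/`), which can be either a convenience or a hazard.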

Thanks for the reference,

Standard ECMA-262 3rd Edition - December 1999

7.8.5 Regular Expression Literals

A regular expression literal is an input element that is converted
to a RegExp object (section 15.10) when it is scanned. The object
is created before evaluation of the containing program or function
begins. Evaluation of the literal produces a reference to that object;
it does not create a new object. ...

The above confirms my "AIUI" above, and confirms that there *isn't*
a "new RegExp object created on each call".
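The quoted ES3 wording can be checked in a given engine by comparing the objects that two evaluations of the same literal produce. One caveat the thread predates: ECMAScript 5 changed 7.8.5 so that a regular expression literal creates a new RegExp object each time it is evaluated, so an ES5-or-later engine is expected to report two distinct objects here. The function name is illustrative.

```javascript
// Returns whatever object evaluating this regexp literal produces.
function literalRegExp() {
    return /abc/;
}

var first = literalRegExp();
var second = literalRegExp();

// Under the ES3 text quoted above this would be true (one shared object);
// ES5 and later require a new object per evaluation, so it is false there.
console.log(first === second);
console.log(first.source === second.source); // true: same pattern either way
```

Under ES5 and later, then, moving the regexp out of the function does avoid creating one object per call, although engines may still reuse the compiled pattern internally.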

Has this version (ECMA-262) been superseded?

However, it should be taken into account that RegExp.prototype.test()
is doing very much the same as RegExp.prototype.exec() does (ES3,
15.10.6.3) and so it may not be wise to use a globally available
RegExp object that retains the status of the last match.

This shouldn't be a problem for a RegExp that only ever has test()
called on it (as with the OP's code) as AFAICT exec() will only
ever reset the lastIndex property to 0 (which is the default anyway).

Rob.
 
Thomas 'PointedEars' Lahn

Rob said:
Thomas 'PointedEars' Lahn wrote [...]:
`/.../' is equivalent to `new RegExp(...)', see ECMAScript (ES) 3,
7.8.5. There is a RegExp object created on each call and GC'd shortly
after, so it is more efficient to create that object once and make it
globally available. To avoid spoiling the global namespace and attach
the object reference to the method that uses it, I wrote

JSON.parse = function(...) { ... JSON.parse.rx ... };
JSON.parse.rx = /.../

Thanks for the reference,

Standard ECMA-262 3rd Edition - December 1999

7.8.5 Regular Expression Literals

A regular expression literal is an input element that is converted
to a RegExp object (section 15.10) when it is scanned. The object
is created before evaluation of the containing program or function
begins. Evaluation of the literal produces a reference to that object;
it does not create a new object. ...

The above confirms my "AIUI" above, and confirms that there *isn't*
a "new RegExp object created on each call".

Yes, indeed. Somehow I kept overlooking the following sentences, and it
appears I was not the only one here. Thank /you/ for pointing that
out.
Has this version (ECMA-262)

It is ECMA-262 (ECMAScript) _Edition_ 3, actually.
been superseded?

There are PDF and Microsoft Word versions of the ECMAScript Language
Specification that have 3 more pages (comparing the PDF versions), are
titled "Edition 3 Final", and are dated March 24, 2000 inside. (They state
that they can be downloaded from ftp.ecma.ch. However, [ftp.]ecma.ch no
longer exists, and ftp.ecma-international.org appears not to allow
anonymous login.)

These can be downloaded from

<URL:http://www.mozilla.org/js/language/>

Although it does not appear to include the required corrections mentioned
in the errata, the "Final" suffix and the date indicate that this is the
latest revision published by ECMA; it is unclear why only the December
1999 revision is linked on ecma-international.org. (Maybe the mozilla.org
folks have access to more recent information on ECMA's FTP server because
the Mozilla Foundation is an ECMA member.) A text comparison of the two
revisions that I did today is still inconclusive.

However, whether it should be considered normative or not, that latest
revision says the same as its predecessor; you are correct.

This shouldn't be a problem for a RegExp that only ever has test()
called on it (as with the OP's code) as AFAICT exec() will only
ever reset the lastIndex property to 0 (which is the default anyway).

No, it could pose a problem since the next match will start from the
position the `lastIndex' property indicates. The value of that property is
reset to 0 iff "I < 0 or I > length" (15.10.6.2, step 6), where according to
step 2 `length' refers to the length of the string the method is passed.
It is unclear what `I' refers to; known implementations suggest that this
is a typo not covered in the errata and actually `i' is meant. If we
assume this, `i' would be the value of ToInteger(lastIndex), according to
step 4, which is in fact the behavior of those implementations. That means
previous calls of RegExp.prototype.exec() on the same RegExp object do
affect the current call on the same object, unless

| 5. If the global property is false, let i = 0.

According to 15.10.4.1,

| The global property of the newly constructed object is set to a Boolean
| value that is true if F contains the character "g" and false otherwise.

So it does not pose a problem _here_, as Douglas is not using a global
expression (and the expression is anchored on both sides anyway).
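A small sketch of the distinction drawn above, using toy patterns: a non-global regexp behaves the same on every test() call, while a regexp with the "g" flag carries its position over from one call to the next through lastIndex. Crockford's pattern has no "g" flag, so it falls into the first case.

```javascript
// Non-global: test() always starts at index 0 and leaves lastIndex at 0,
// so repeated calls on the same string give the same answer.
var plain = /^\d+$/;
console.log(plain.test('123'), plain.lastIndex); // true 0
console.log(plain.test('123'), plain.lastIndex); // true 0

// Global: a successful test() moves lastIndex past the match, and the
// next call on the same object resumes searching from there.
var globalRe = /\d+/g;
console.log(globalRe.test('123'), globalRe.lastIndex); // true 3
console.log(globalRe.test('123'), globalRe.lastIndex); // false 0
```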


PointedEars
 
