When does whitespace in JavaScript matter?

Peter Michaux

Hi,

I'm thinking about code minimization. I can think of a few places where
whitespace matters

a + ++b
a++ + b
a - --b
a-- -b
when a line ends without a semi-colon in which case the new line
character matters.

Any others?

Thanks,
Peter
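
As a quick illustration of the four operator cases in the question, the sketch below compares the spaced forms with what is left once the spaces are stripped. The function names and argument values are made up for the example. The behavior follows from the tokenizer always taking the longest possible token, so a+++b is read as a++ + b.

```javascript
// Hypothetical helpers; names and values are for illustration only.
function plusSpaced(a, b) {
  return a + ++b; // pre-increment b, then add
}

function plusSqueezed(a, b) {
  return a+++b; // tokenized as (a++) + b: a is post-incremented, b is untouched
}

function minusSpaced(a, b) {
  return a - --b; // pre-decrement b, then subtract
}

function minusSqueezed(a, b) {
  return a---b; // tokenized as (a--) - b
}

const operatorResults = {
  plusSpaced: plusSpaced(1, 2),       // 1 + 3
  plusSqueezed: plusSqueezed(1, 2),   // 1 + 2
  minusSpaced: minusSpaced(5, 2),     // 5 - 1
  minusSqueezed: minusSqueezed(5, 2), // 5 - 2
};
console.log(operatorResults);
```

Under the same longest-match rule, the second and fourth forms in the list (a++ + b and a-- -b) should keep their meaning when squeezed, while the first and third change it.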
 
Randy Webb

Peter Michaux said the following on 1/8/2007 2:58 PM:
Hi,

I'm thinking about code minimization. I can think of a few places where
whitespace matters

In those 4 instances where whitespace will matter I always wrap it in
parentheses to remove the whitespace issue.
a + ++b
a++ + b
a - --b
a-- -b
when a line ends without a semi-colon in which case the new line
character matters.

That is not always true though:

function someFunction()
{

The newline character there is irrelevant. But, if you replace all new
lines with ;\n then you break the code. Line feeds and semicolons are a
spot where you have to either do it manually or write a JS parser.

Another example where the newline can't be summarily replaced but the
newline doesn't matter is in a loop:

for (i in something)
while(yourWifeIsntPregnant)

Whether the { is on the same line or the next line doesn't matter, the
newline is still irrelevant yet there is no semicolon at the end of it.

You can't summarily say "don't put a semicolon after )" either. Think of
anonymous functions....()
You could check ) to see if it is () or ) but it can still get you in
trouble with a function with no parameter:

function myFunction()
{

}

Here the newline character can't be replaced with ;newline. But if it's
a global anonymous function and you want to remove newlines, then you
*must* have the semicolon or the code breaks.

var x = (function(){
//some code here
})();

Can't remove the semicolon and the newline, one has to be there.

Semicolons and newlines will become your new nightmare :)
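
The newline cases above can be checked mechanically. This is only a sketch: parses is a hypothetical helper that leans on the Function constructor to ask the engine whether a piece of source text is syntactically valid, and the sample strings are invented for the example.

```javascript
// Returns true when `source` parses as a function body, false on a SyntaxError.
// new Function only compiles the text here; nothing is executed.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// Two statements separated only by a newline (no semicolons).
const original = "var a = 1\nvar b = 2";
const newlinesDropped = original.replace(/\n/g, " ");
const newlinesToSemicolons = original.replace(/\n/g, ";");

// A declaration with the brace on its own line, as in the post.
const declaration = "function someFunction()\n{\n}";
const declarationBroken = declaration.replace(/\n/g, ";\n");

const asiChecks = {
  original: parses(original),                         // newline ends the statement
  newlinesDropped: parses(newlinesDropped),           // statements run together
  newlinesToSemicolons: parses(newlinesToSemicolons), // explicit terminator
  declaration: parses(declaration),                   // newline is irrelevant here
  declarationBroken: parses(declarationBroken),       // blind ;\n replacement
};
console.log(asiChecks);
```

The first sample survives only while either the newline or a semicolon is present, and the declaration breaks as soon as every newline is blindly turned into ;\n, which is the trade-off described above.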
 
Peter Michaux

Randy said:
Peter Michaux said the following on 1/8/2007 2:58 PM:

Semicolons and newlines will become your new nightmare :)

I imagine so. I may just decide to leave new lines in place.

It's more the comments, blank lines and beginning or mid-line spaces
that are a bigger concern. I have been using jslint to check for
problems before jsmin but jslint has many programmer preferences built
in and not just outright problems that will be harmful when minimizing.
I suppose I could strip jslint down.

What I really want is a command I can type inside a directory that
checks if the javascript files in that directory are minimizable and if
so minimizes them. Following that step all the files are gzipped. This
will save Apache the trouble of having to use mod_deflate for every
single request for a file. I don't know why they didn't build caching
for this into mod_deflate but since I have to do the minimize step
anyway I can also piggy back the compression step on the same command.
I know one thing for sure: I don't want to have to do all this
deployment stuff by hand starting with http://jslint.com anymore.

Peter
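
A rough sketch of the precompression half of such a command, assuming Node.js and its built-in fs, path and zlib modules (none of which are mentioned in the thread). precompressDirectory is a hypothetical name, and the check-and-minimize step is left out because it depends on the tools chosen.

```javascript
const fs = require("fs");
const path = require("path");
const zlib = require("zlib");

// Hypothetical helper: gzip every .js file in `dir`, writing name.js.gz next
// to the source file. Returns the list of files written.
function precompressDirectory(dir) {
  const written = [];
  for (const name of fs.readdirSync(dir)) {
    if (!name.endsWith(".js")) continue; // skips other files, including *.js.gz
    const sourcePath = path.join(dir, name);
    const source = fs.readFileSync(sourcePath);
    const targetPath = sourcePath + ".gz";
    fs.writeFileSync(targetPath, zlib.gzipSync(source, { level: 9 }));
    written.push(targetPath);
  }
  return written;
}
```

Calling precompressDirectory(".") from the directory in question would write the .gz files. Getting the server to hand those out instead of compressing on every request is a separate configuration matter not shown here.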
 
VK

Peter said:
Hi,

I'm thinking about code minimization. I can think of a few places where
whitespace matters

a + ++b
a++ + b
a - --b
a-- -b
when a line ends without a semi-colon in which case the new line
character matters.

Any others?

I guess Books of ECMA, Book of Tokens is the first place to look for
(ECMAScript 3rd ed., sec.7, "Lexical Conventions")

Besides that there is one cross-browser exception to add, but the book
above first.
 
Peter Michaux

VK said:
I guess Books of ECMA, Book of Tokens is the first place to look for
(ECMAScript 3rd ed., sec.7, "Lexical Conventions")

Besides that there is one cross-browser exception to add, but the book
above first.

What is the exception?

Peter
 
VK

Peter said:
What is the exception?

You are free to disregard it, because it is not officially required by
the specs; it is just an exploit of the internal tokenizer mechanics. Yet
if it's for public use, then you may account for it as well. Anyway, for
long string literals (too long to place on one line), instead of
concatenation one sometimes uses the backslash trick:

var longString = "aaaaaaaaaaaaaaaaaaaa\
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb\
cccccccccccccccccccccccccccccccccc"

If the backslash is the very last character before the line break, then
the tokenizer will be happy.

Saves a hell of a lot of cycles (no runtime concatenation) - but "hacky" of
course.
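
Whether a given engine accepts this form is exactly what the rest of the thread argues about. The sketch below assumes an engine that does accept a backslash immediately followed by a line break (the 5th and later editions of ECMA-262 define this as a line continuation; the 3rd edition under discussion does not). It also shows why the trick is whitespace-sensitive: a single space between the backslash and the line break is enough to break it. evaluate is a hypothetical helper, and the strings are shortened for the example.

```javascript
// Evaluate source text in isolation so a SyntaxError can be caught
// instead of aborting the whole script.
function evaluate(source) {
  try {
    return { ok: true, value: new Function(source)() };
  } catch (error) {
    if (error instanceof SyntaxError) return { ok: false, value: undefined };
    throw error;
  }
}

const BACKSLASH = "\\";
const NEWLINE = "\n";

// Backslash is the very last character before the line break.
const continued = 'return "aaa' + BACKSLASH + NEWLINE + 'bbb";';

// Same text, but with one space between the backslash and the line break.
const trailingSpace = 'return "aaa' + BACKSLASH + " " + NEWLINE + 'bbb";';

const continuationResults = {
  continued: evaluate(continued),
  trailingSpace: evaluate(trailingSpace),
};
console.log(continuationResults);
```

In an engine that accepts the continuation, the backslash and line break contribute nothing to the string value. A tool that strips or adds trailing whitespace therefore has to leave such lines alone.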
 
Richard Cornford

Peter said:
What is the exception?

Haven't you understood yet that VK has no understanding of javascript
and so does not know what is supposed to happen and what is not?
Whatever answer he gives you (if any) it is either going to be one of
his misunderstandings, or it will be totally irrelevant to the subject of
determining which whitespace can safely be removed from javascript
source code.

It must also be worth pointing out that the more you encourage VK to
post (which is what you have just done) the more he will act to squander
the resources of the group in correcting his nonsense. Which will have
the consequence of denying you the time of the people who could
otherwise maybe usefully respond to your questions.

Richard.
 
Dr J R Stockton

In comp.lang.javascript message <[email protected]>, Mon, 8 Jan 2007
23:11:39, Richard Cornford wrote:
It must also be worth pointing out that the more you encourage VK to
post (which is what you have just done) the more he will act to squander
the resources of the group in correcting his nonsense. Which will have
the consequence of denying you the time of the people who could
otherwise maybe usefully respond to your questions.

It's a pity that you did not realise that earlier, while you were
responsible for maintaining the newsgroup FAQ.

ISTM that Randy may be heading in the same direction, although he has
not yet got unduly far along it.
 
VK

Peter said:
Hi,

I'm thinking about code minimization. I can think of a few places where
whitespace matters

a + ++b
a++ + b
a - --b
a-- -b
when a line ends without a semi-colon in which case the new line
character matters.

Any others?
------------------
I guess Books of ECMA, Book of Tokens is the first place to look for
(ECMAScript 3rd ed., sec.7, "Lexical Conventions")

Besides that there is one cross-browser exception to add, but the book
above first.
------------------
You are free to disregard it, because it is not officially required by
the specs; it is just an exploit of the internal tokenizer mechanics. Yet
if it's for public use, then you may account for it as well. Anyway, for
long string literals (too long to place on one line), instead of
concatenation one sometimes uses the backslash trick:

var longString = "aaaaaaaaaaaaaaaaaaaa\
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb\
cccccccccccccccccccccccccccccccccc"
------------------
Cornford's bias:
Haven't you understood yet that VK has no understanding of javascript
and so does not know what is supposed to happen and what is not?
Whatever answer he gives you (if any) it is either going to be one of
his misunderstandings, or it will be totally irrelevant to the subject of
determining which whitespace can safely be removed from javascript
source code.

It must also be worth pointing out that the more you encourage VK to
post (which is what you have just done) the more he will act to squander
the resources of the group in correcting his nonsense. Which will have
the consequence of denying you the time of the people who could
otherwise maybe usefully respond to your questions.

Now would you mind pointing out a single reason for your regular furious
bias in this particular case? OT? Wrong data? Bad breakfast? The last
option seems to be the only suitable one.
 
Peter Michaux

You are free to disregard it, because it is not officially required by
the specs; it is just an exploit of the internal tokenizer mechanics. Yet
if it's for public use, then you may account for it as well. Anyway, for
long string literals (too long to place on one line), instead of
concatenation one sometimes uses the backslash trick:

var longString = "aaaaaaaaaaaaaaaaaaaa\
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb\
cccccccccccccccccccccccccccccccccc"

I have seen this slash at the end of the line but have never used it
myself because I haven't verified its validity. Good to know that it
is something to be careful about before using or completely avoiding.
Thanks.

Peter
 
Richard Cornford

Peter said:
VK wrote:

I have seen this slash at the end of the line but have never used it
myself because I haven't verified its validity.

It is not valid at all. The tokenising/syntax rules for ECMAScript
explicitly forbid line terminator characters from appearing inside
string and regular expression literals. Using this half-ass hack
results in code that should not be expected to work anywhere and the
observation that it can work in two or three environments should not
bring the expectation that it would work in any others (and certainly
not all).

Good to know that it is something to be careful about
before using or completely avoiding.

It is not something to be careful about, as it should never be done in
the first place.

I did tell you that VK would either be wrong or be irrelevant. This
time he went for irrelevant.

Noted.

Richard.
 
Dr J R Stockton

In comp.lang.javascript message <[email protected]
glegroups.com>, Wed, 10 Jan 2007 23:20:55, Peter Michaux
I have seen this slash at the end of the line but have never used it
myself because I haven't verified its validity. Good to know that it
is something to be careful about before using or completely avoiding.


It is something which should be known about.

RC, for instance, tends to express things purely from the point of view
of an *author*. A pure author does not need to know of such
continuation (though it could be useful in an internal test hack, where
known to work).

However, since it works in strings in IE, a *reader* of code may come
across it - and should understand its browser-dependent implications.
 
VK

var longString = "aaaaaaaaaaaaaaaaaaaa\
It is not valid at all. The tokenising/syntax rules for ECMAScript
explicitly forbid line terminator characters from appearing inside
string and regular expression literals. Using this half-ass hack
results in code that should not be expected to work anywhere and the
observation that it can work in two or three environments should not
bring the expectation that it would work in any others (and certainly
not all).

As I pointed out earlier, this is not an exploit of a particular bug on a
particular platform. This is an exploit of the core mechanics of any
ECMAScript-compliant code parser. The curious ones may read through the
section 7 of ECMAScript 3rd ed.
This way any ECMAScript-compliant parser will demonstrate this behavior
- or it is not ECMAScript-compliant. This is why "backslashed strings"
are equally supported by say IE, Firefox, Opera and by going back in
the history by Netscape 4.x, 3.x and 2.x

Respectively "supported" is not really a correct term as there is not
an extra feature to support here. "Vulnerable" would be more correct but
too scary sounding :)

It also means that the "hack" term is not fully applicable here. It is
no more a hack than, say, returning some other object from the
constructor instead of [this] or placing an anonymous function
expression as a with() argument. "Exploitation of mal-documented engine
features" is more suitable.

I am not propagandizing backslashed strings usage. I just want to make
clear the nature of this phenomenon, because in a few follow up posts
it was implicitly suggested that it is some IE/JScript-only bug, like
"since it works in strings in IE".
It "works" for all existing/ever existed UAs with javascript
support.
 
VK

As I pointed earlier, this is not an exploit of a particular bug on a
Please quote the exact part of section 7 which backs up your claim.

The claim of "backslashed string" supported by all browsers cannot be
proved by quoting Books Of ECMA. It has to be proved by testing.
Either provide a browser where it fails, or let's agree on this
starting point. Then we can advance further by looking what specs
contradiction made it possible.

P.S. If you want to do some historical research as well, Netscape 4.x
can be downloaded at
<http://browser.netscape.com/ns8/download/archive.jsp>
 
Randy Webb

VK said the following on 1/12/2007 9:00 AM:
The claim of "backslashed string" supported by all browsers cannot be
proved by quoting Books Of ECMA.

That wasn't what was asked. You were being asked to please quote the
part of Section 7 that backs up your claim that it is an "exploit of the
core mechanics of any ECMAScript-compliant code parser". You then
pointed people to read Section 7. Now that you are being challenged to
back up that assertion you are finally admitting that nothing in Section
7 backs up your claim?

Besides, how can you claim it is some kind of "ECMAScript code parser"
issue and then tell people they can download NN4 to test it? That is,
well, blissful ignorance at best, as NN4 was almost at the end of its
line before the current edition of ECMAScript was released.
 
VK

That wasn't my point.

In this case you should quote more accurately the statements you are
referring to. In the post I answered to that was:

Please quote the exact part of section 7 which backs up your claim.

Respectively I took "backs up your claim" as referring to the statement
right above your text, not the one at the beginning or middle of the
quoted block. And for the last statement about the universal
backslashed strings support - quoting ECMAScript specs is indeed
pointless.

As I have some experience with the specifics of clj discussions :), you
will not get off so easily by putting the opponent on the defensive.
So far other people have made statements which are not supported by any
facts: namely that backslashed strings are supported by IE only or "by
two or three browsers".
I say that it is supported by all browsers that exist or have ever
existed, and this statement is rather easy to check. So first we dismiss
the false statement of narrow support for such strings - or someone
points to an actual browser without such support. One step at a time, OK?
And at the current step _I_ have nothing to prove: whoever believes it
may fail somewhere, let them search for such a browser. Or simply admit:
- Yes, you are correct, backslashed strings are supported on all
current and historical UAs.

Then it will be the time to read the specs.
 
Richard Cornford

VK said:
As I pointed earlier, this is not an exploit of a particular bug
on a particular platform. This is an exploit of the core mechanics
of any ECMAScript-compliant code parser.

Nonsense. It relies entirely on a script parser electing not to impose
the syntax rules as specified in ECMA 262.

The curious ones may read through
the section 7 of ECMAScript 3rd ed.

And they may easily understand it better than you do. For example, where
ECMA 262 first edition says:-

Note that a LineTerminator character cannot appear in a string literal,
even if preceded by a backslash \. The correct way to cause a line
terminator character to be part of the string value of a string literal
is to use an escape sequence such as \n or \u000A.

- the second edition says:-

Note that a LineTerminator character cannot appear in a string literal,
even if preceded by a backslash \. The correct way to cause a line
terminator character to be part of the string value of a string literal
is to use an escape sequence such as \n or \u000A.

- and the third, and current, edition says:-

NOTE: A LineTerminator character cannot appear in a string literal, even
if preceded by a backslash \. The correct way to cause a line terminator
character to be part of the string value of a string literal is to use
an escape sequence such as \n or \u000A.

- they may realise that line terminators are _explicitly_ forbidden from
appearing in a string literal, even when preceded by a backslash
character.

This way any ECMAScript-compliant parser will demonstrate
this behavior - or it is not ECMAScript-compliant.

Nonsense. This cannot be expected to work in any ECMA 262 compliant
script engine.

This is why "backslashed strings" are equally supported by say
IE, Firefox, Opera and by going back in the history by Netscape
4.x, 3.x and 2.x

That is just 3 script environments.

Respectively "supported" is not really a correct term as there
is not an extra feature to support here. "Vulnerable" would be
more correct but too scary sounding :)

You have never been qualified to judge.

It also means that the "hack" term is not fully applicable here.

Why not? When a construct cannot be expected to work at all, using it
because it has been observed to work in some environments is a "hack".

It is no more a hack than, say, returning some other object from
the constructor instead of [this] or placing an anonymous function
expression as a with() argument. "Exploitation of mal-documented
engine features" is more suitable.

You just don't understand the specification, so you cannot judge what is
"mal-documented" and what is not. However, the note about line
terminators in string literals seems fairly unambiguous to me.

I am not propagandizing backslashed strings usage.

You are suggesting that it is a credible subject for consideration, when
it is something that should never have been expected to work, and so
should never have been attempted in a general context.

I just want to make clear the nature of this phenomenon,

You have already misattributed it.

because in a few follow up posts it was implicitly suggested
that it is some IE/JScript-only bug,

Not in my experience.

like "since it works in strings in IE".
It "works" for all existing/ever existed UAs with javascript
support.

It does not work in the NetFront browser, to name just one (and one is
sufficient to prove the assertion "It "works" for all existing/ever
existed UAs with javascript support" as being false). NetFront claims to
have an ECMA 262, 3rd Ed. compliant script engine, and its regarding an
ECMAScript syntax error as a syntax error does not contradict that.

Richard.
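
For reference, the forms that the quoted NOTE does allow can be sketched as follows: an escape sequence when a line break is wanted in the string value, and ordinary concatenation when a long literal merely needs to be split across source lines. The variable names and shortened strings are invented for the example.

```javascript
// A line break inside the string value, written with escape sequences.
const viaEscape = "first line\nsecond line";
const viaUnicodeEscape = "first line\u000Asecond line";

// A long literal split across source lines with ordinary concatenation;
// no line terminator ever appears inside a string literal.
const longString = "aaaa" +
  "bbbb" +
  "cccc";

console.log(viaEscape === viaUnicodeEscape, longString);
```

Both escape forms produce the same string value, and the concatenated version does not depend on any parser leniency.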
 
Ray

VK said:
As I pointed earlier, this is not an exploit of a particular bug on a
particular platform. This is an exploit of the core mechanics of any
ECMAScript-compliant code parser. The curious ones may read through the
section 7 of ECMAScript 3rd ed.

Did you actually read the spec itself? I did. It says on page 20: "A
'LineTerminator' character cannot appear in a string literal, even if
preceded by a backslash \. The correct way to cause a line terminator
character to be part of the string value of a string literal is to use
an escape sequence such as \n or \u000A."

You can download the 3rd edition here:
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf

This way any ECMAScript-compliant parser will demonstrate this behavior
- or it is not ECMAScript-compliant.

Nah, that's just plain wrong. See the document again.

http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf

Cheers
Ray
 
John G Harris

- and the third, and current, edition says:-

NOTE: A LineTerminator character cannot appear in a string literal, even
if preceded by a backslash \. The correct way to cause a line terminator
character to be part of the string value of a string literal is to use
an escape sequence such as \n or \u000A.
<snip>

The <backslash><line end> convention is needed by C macros. Once some
fool put it into a browser it became extremely difficult to take it out
again.

Sorry for the interruption :)

John
 
