Identifiers - UnicodeEscapeSequence


Asen Bozhilov

The specification permits `\UnicodeEscapeSequence` to be used in an
IdentifierName. But it also says:

| Unicode escape sequences are also permitted in identifiers, where they
| contribute a single character to the identifier, as computed by the CV
| of the UnicodeEscapeSequence. The \ preceding the UnicodeEscapeSequence
| does not contribute a character to the identifier. A
| UnicodeEscapeSequence cannot be used to put a character into an
| identifier that would otherwise be illegal. In other words, if a
| \UnicodeEscapeSequence sequence were replaced by its
| UnicodeEscapeSequence's CV, the result must still be a valid Identifier
| that has the exact same sequence of characters as the original
| Identifier.

As I understand it, if I type:

var \\u0069\\u0066; //var if;

then, since `if` is a ReservedWord, the example above should throw a
SyntaxError.

try {
  eval('var \\u0069\\u0066;'); // var if;
} catch (e) {
  window.alert(e instanceof SyntaxError);
}

Firefox 3.5.7 - No error
IE6 - true
Chrome 4.0 - No error
Opera 9.64 - No error
Safari 4.0 - No error
Rhino 1.7R2 - No error
DMDScript 1.02 - true

try {
  eval('var \\u0030;'); // var 0;
} catch (e) {
  window.alert(e instanceof SyntaxError);
}

Firefox 3.5.7 - true
IE6 - true
Chrome 4.0 - true
Opera 9.64 - No error
Safari 4.0 - true
Rhino 1.7R2 - No error
DMDScript 1.02 - No error
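The two tests above can also be run outside a browser. This is a minimal sketch, assuming `console.log` in place of `window.alert`; the helper name `checkSnippet` is hypothetical. It uses `new Function` instead of `eval`: that compiles the source as a function body without running it, so only parse-time errors are reported. Results depend on the engine, so no output is claimed for the two snippets.

```javascript
// Hypothetical helper: compiles a snippet and reports the outcome in
// the same spirit as the results listed above.
function checkSnippet(source) {
  try {
    // new Function parses the source as a function body but does not
    // call it, so only errors raised while parsing are caught.
    new Function(source);
    return "No error";
  } catch (e) {
    return e instanceof SyntaxError ? "SyntaxError" : e.name;
  }
}

// As in the eval tests, the backslashes are doubled inside the string
// literals, so the parsed source contains \u0069\u0066 and \u0030.
["var \\u0069\\u0066;", "var \\u0030;"].forEach(function (src) {
  console.log(src + " -> " + checkSnippet(src));
});
```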

My question is: what is the proper behavior according to the
specification? I think `var \\u0069\\u0066;` should throw a
SyntaxError.

Thanks.
 

Scott Sauyet

Asen Bozhilov said:
My question is: what is the proper behavior according to the
specification? I think `var \\u0069\\u0066;` should throw a
SyntaxError.

I don't know the spec well enough to answer. But I'm wondering if you
would expect an error from this as well:

window["if"] = 10;

I can't see why either should throw an error. The only reason to
disallow a reserved word as an identifier name is to make it unambiguous
to the ES engine what is meant by the term. There is no provision for
writing keywords with Unicode escapes, is there? If not, then there is
no ambiguity about what "\\u0069\\u0066" should represent.
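Scott's example differs from the first one in that the reserved word only appears inside a string literal, and a property name given in brackets is a string, never an Identifier. A small sketch, assuming a plain object `o` stands in for `window` so it runs outside a browser:

```javascript
// A plain object stands in for window so the sketch runs anywhere.
const o = {};

// Bracket notation takes an expression, here a string literal, so the
// parser never has to treat "if" as an Identifier.
o["if"] = 10;

// Inside a string literal, \u0069\u0066 is simply another way to
// write the two characters "if".
console.log("\u0069\u0066" === "if"); // true
console.log(o["\u0069\u0066"]);       // 10
```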

-- Scott
 

Lasse Reichstein Nielsen

Asen Bozhilov said:
The specification permits `\UnicodeEscapeSequence` to be used in an
IdentifierName. But it also says:

| Unicode escape sequences are also permitted in identifiers,
| where they contribute a single character to the
| identifier, as computed by the CV of the
| UnicodeEscapeSequence.

This is the important part. It allows Unicode escapes in identifiers.
There is no similar statement for any of the reserved words, so
Unicode escapes cannot be used in a keyword.
| The \ preceding the UnicodeEscapeSequence does not contribute a
| character to the identifier. A UnicodeEscapeSequence cannot be used to
| put a character into an identifier that would otherwise be illegal. In
| other words, if a \UnicodeEscapeSequence sequence were replaced by its
| UnicodeEscapeSequence's CV, the result must still be a valid Identifier
| that has the exact same sequence of characters as the original
| Identifier.

As I understand it, if I type:

var \\u0069\\u0066; //var if;

(I assume it should be single backslashes when not in a string :)
then, since `if` is a ReservedWord, the example above should throw a
SyntaxError.

No. While 'if' is a keyword, it is only the sequence U+0069 U+0066
that is recognized as the 'if' keyword. Unicode escapes are not allowed
as parts of keywords. The above, correctly, declares a variable called
'if' - because "\u0069\u0066" matches the production of an identifier
and it doesn't match the production of any reserved word.

The inputs "if" and "i\u0066" are different sequences of characters.
They are parsed differently. The latter is parsed as an identifier.
An identifier is represented as a sequence of code points. It just
happens that "i\u0066", "\u0069f" and "\u0069\u0066" all parse to
identifiers represented by U+0069 U+0066, whereas "if" does not parse to
an identifier at all.

....
My question is: what is the proper behavior according to the
specification?
Yes.

I think `var \\u0069\\u0066;` should throw
a SyntaxError.

The operative part of the ECMA-262 standard is in section 7.6, which
you quote. It allows escape sequences in identifiers. No such
allowance is given for keywords or other reserved words, so anything
containing a Unicode escape is not a keyword.

/L
 

Thomas 'PointedEars' Lahn

Lasse said:
This is the important part. It allows Unicode escapes in identifiers.

But none that would not be allowed if the character were included verbatim.
There is no similar statement for any of the reserved words, so
Unicode escapes cannot be used in a keyword.

You have got it backwards.

I do not think it can be worded more clearly.
(I assume it should be single backslashes when not in a string :)

Why, the double backslashes are legal, too. However, the resulting value
would still not be an /Identifier/, barring language extensions.

True, but the program ought to be syntactically in error nonetheless.
While 'if' is a keyword, it is only the sequence U+0069 U+0066
that is recognized as the 'if' keyword. Unicode escapes are not allowed
as parts of keywords. The above, correctly, declares a variable called
'if' - because "\u0069\u0066" matches the production of an identifier
and it doesn't match the production of any reserved word.
The inputs, "if" and "i\u0066" are different sequences of characters.
They are parsed differently. The latter is parsed as an identifier.

Your logic is flawed, because escape sequences are converted into the
corresponding Unicode characters (the character is the Computed Value)
*before* tokenization takes place, and the syntactic grammar is only
applied after that:

| 5.1.4
|
| [...]
| When a stream of characters is to be parsed as an ECMAScript program, it
| is first converted to a stream of input elements by repeated application
| of the lexical grammar; this stream of input elements is then parsed by
| a single application of the syntactic grammar. The program is
| syntactically in error if the tokens in the stream of input elements
| cannot be parsed as a single instance of the goal nonterminal /Program/,
| with no tokens left over.

/UnicodeEscapeSequence/ is a goal symbol of the lexical grammar as is
/Keyword/; /IfStatement/ is a goal symbol of the syntactic grammar.

As a result, the first application of the lexical grammar ought to cause

var \u0069\u0066

to become

var if

and the second application of the lexical grammar ought to cause `if' to be
parsed as a /Keyword/:

| Keyword :: one of
| [...] if [...]

Then, application of the syntactic grammar ought to cause

var if

to be recognized as theoretically producible by

VariableStatement :
    var VariableDeclarationList ;

VariableDeclarationList :
    VariableDeclaration

VariableDeclaration :
    Identifier Initialiser_opt

which ought to fail because the token `if' has already been determined to
be a /Keyword/, not an /Identifier/, and no other production of the
syntactic grammar would be applicable.

Therefore, the program ought to be considered syntactically in error. That
it might not be could only be attributed to a proprietary extension. Hence
the clarification quoted above:

| A UnicodeEscapeSequence cannot be used to put a character into an
| identifier that would otherwise be illegal. [...]
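Thomas's reading can be sketched by moving the decoding step in front of the keyword check, so that `\u0069\u0066` and `if` produce the same Keyword token, which then fails the VariableDeclaration production. Again a simplified, ASCII-only sketch with hypothetical names and an abbreviated reserved-word list; `checkVarName` stands in for the `var` production only and is not how any engine is structured.

```javascript
// Abbreviated list; ES3 section 7.5.1 defines many more reserved words.
const RESERVED_SAMPLE = ["if", "var", "for", "while", "function", "return"];

// Step 1: replace each \uXXXX escape with its CV before anything else.
function decodeCV(raw) {
  return raw.replace(/\\u([0-9A-Fa-f]{4})/g, function (_, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

// Step 2: classify the decoded characters, so "\u0069\u0066" and
// "if" yield the same Keyword token.
function classifyDecoded(raw) {
  const value = decodeCV(raw);
  const type = RESERVED_SAMPLE.includes(value) ? "Keyword" : "Identifier";
  return { type: type, value: value };
}

// Step 3 (syntactic grammar): VariableDeclaration needs an Identifier,
// so a Keyword token in that position is a syntax error.
function checkVarName(raw) {
  const token = classifyDecoded(raw);
  if (token.type !== "Identifier") {
    throw new SyntaxError("expected an Identifier after var, got " +
      token.type + " '" + token.value + "'");
  }
  return token.value;
}

try {
  checkVarName("\\u0069\\u0066");
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
console.log(checkVarName("\\u0078")); // x
```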


PointedEars
 

Thomas 'PointedEars' Lahn

Thomas said:
Why, the double backslashes are legal, too.

Ignore that, I went too far here.

| IdentifierStart ::
| UnicodeLetter
| $
| _
| \ UnicodeEscapeSequence
|
| [...]
| UnicodeEscapeSequence ::
| u HexDigit HexDigit HexDigit HexDigit
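The grammar quoted above also shows why the doubled backslashes cannot form an identifier: `\ UnicodeEscapeSequence` is exactly one backslash followed by `u` and four hex digits. A sketch of that check plus the CV rule from section 7.6, assuming ASCII-only letters and digits (the real UnicodeLetter and IdentifierPart classes are far larger, so e.g. `\u00e9` is wrongly rejected here); `isIdentifierNameSource` is a hypothetical name. It checks IdentifierName only; whether reserved words are then excluded is the question this thread is about.

```javascript
// IdentifierStart :: UnicodeLetter | $ | _ | \ UnicodeEscapeSequence
// UnicodeEscapeSequence :: u HexDigit HexDigit HexDigit HexDigit
// Simplified: UnicodeLetter is narrowed to ASCII letters, and
// IdentifierPart to IdentifierStart plus ASCII digits.
const NAME_SOURCE =
  /^(?:[A-Za-z$_]|\\u[0-9A-Fa-f]{4})(?:[A-Za-z0-9$_]|\\u[0-9A-Fa-f]{4})*$/;
const NAME_CHARS = /^[A-Za-z$_][A-Za-z0-9$_]*$/;

function isIdentifierNameSource(raw) {
  // The escapes must match the lexical grammar quoted above.
  if (!NAME_SOURCE.test(raw)) {
    return false;
  }
  // The CV rule: with every escape replaced by its character, the
  // result must still be a valid name (so \u0030 cannot start one).
  const decoded = raw.replace(/\\u([0-9A-Fa-f]{4})/g, function (_, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
  return NAME_CHARS.test(decoded);
}

console.log(isIdentifierNameSource("\\u0069\\u0066"));     // true
console.log(isIdentifierNameSource("\\\\u0069\\\\u0066")); // false
console.log(isIdentifierNameSource("\\u0030"));            // false
```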


PointedEars
 

Asen Bozhilov

Lasse said:
Asen Bozhilov writes:

(I assume it should be single backslashes when not in a string :)

Yes. It should be:

var \u0069\u0066;

The double backslashes are there because I copied the code from the
string passed to `eval'. That was my mistake.
No. While 'if' is a keyword, it is only the sequence U+0069 U+0066
that is recognized as the 'if' keyword. Unicode escapes are not allowed
as parts of keywords. The above, correctly, declares a variable called
'if' - because "\u0069\u0066" matches the production of an identifier
and it doesn't match the production of any reserved word.

I agree with this reading of the specification.

| 6 Source Text
| [...]
| In string literals, regular expression literals and identifiers,
| any character (code point) may also be expressed as a
| Unicode escape sequence consisting of six characters,
| namely \u plus four hexadecimal digits.

You are correct, and the next example proves it:

try {
  \u0069\u0066 (true);
} catch (e) {
  window.alert(e instanceof ReferenceError); // true
}

`\u0069\u0066 (true);` is evaluated as an `ExpressionStatement` (a call
to a function named `if`) terminated by an explicit semicolon, instead of
as an `IfStatement` whose body is the `EmptyStatement` `;`.
The operative part of the ECMA-262 standard is in section 7.6, which
you quote. It allows escape sequences in identifiers. No such
allowance is given for keywords or other reserved words, so anything
containing a Unicode escape is not a keyword.

What confuses me is this:

| A UnicodeEscapeSequence cannot be
| used to put a character into an identifier that
| would otherwise be illegal. In other words, if a
| \UnicodeEscapeSequence sequence were replaced by
| its UnicodeEscapeSequence's CV,
| the result must still be a
| valid Identifier

As I understand it, if I replace:

var \u0069\u0066;

with the character values (CV), I will get:

var if;

And the lexical grammar for `Identifier` does not allow an identifier
named `if`:

Identifier ::
    IdentifierName but not ReservedWord

because `if` is a keyword, and keywords are part of `7.5.1 Reserved Words`.

Thanks for this comment, but why doesn't the specification say anything
explicit about this case?
 

Lasse Reichstein Nielsen

Thomas 'PointedEars' Lahn said:
....
I do not think it can be worded more clearly.

I must admit that, on second thought, I tend to agree with that
interpretation.
However, it seems that IE is the only browser that agrees. All of
Opera, Firefox, Chrome and Safari accept \u0069\u0066 as an identifier.

/L
 
