Are ”extended characters” safe in identifiers?

Discussion in 'Javascript' started by Jukka K. Korpela, May 16, 2011.

  1. The syntax of ECMAScript has allowed “extended characters” in
    identifiers since 3rd edition (1999). This means, among other things,
    allowing any Unicode letters, like Greek, Arabic, and Cyrillic letters
    as well as e.g. Chinese ideographs. As far as I can see, this has been
    supported in web browsers for a long time (e.g., ever since IE 5.5).

    So is it really safe to use them, writing, say

    var π = Math.PI;
    var ผลบวก = 0;
    function Götterdämmerung()

    or are there some pitfalls? Various coding conventions as well as
    practical editing issues (you can’t be sure of always being able to edit
    your code on a Unicode-enabled editor) aside, is there still some real
    technical reason to stick to the A–Z, a–z, 0–9, ”$”, ”_” repertoire?

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, May 16, 2011
    #1

  2. Jukka K. Korpela wrote:

    > The syntax of ECMAScript has allowed “extended characters” in identifiers since 3rd edition (1999). This means, among other things, allowing any Unicode letters, like Greek, Arabic, and Cyrillic letters as well as e.g. Chinese ideographs. As far as I can see, this has been supported in web browsers for a long time (e.g., ever since IE 5.5).
    >
    > So is it really safe to use them, writing, say
    >
    > var π = Math.PI;
    > var ผลบวก = 0;
    > function Götterdämmerung()
    >
    > or are there some pitfalls? Various coding conventions as well as practical editing issues (you can’t be sure of always being able to edit your code on a Unicode-enabled editor) aside, is there still some real technical reason to stick to the A–Z, a–z, 0–9, ”$”, ”_” repertoire?


    I think it is technically safe, but I don't see people doing that,
    either in JavaScript or in other languages like C# which also allow
    more than ASCII letters in identifiers. But the reason is probably
    coding conventions, editing issues, and keeping code readable and
    understandable internationally. And partly, maybe, ignorance of the
    fact that more than ASCII can be used.


    --

    Martin Honnen
    http://msmvps.com/blogs/martin_honnen/
     
    Martin Honnen, May 16, 2011
    #2

  3. Jukka K. Korpela wrote:

    > The syntax of ECMAScript has allowed “extended characters” in
    > identifiers since 3rd edition (1999). This means, among other things,
    > allowing any Unicode letters, like Greek, Arabic, and Cyrillic letters
    > as well as e.g. Chinese ideographs. As far as I can see, this has been
    > supported in web browsers for a long time (e.g., ever since IE 5.5).
    >
    > So is it really safe to use them, writing, say
    >
    > var π = Math.PI;
    > var ผลบวก = 0;
    > function Götterdämmerung()
    >
    > or are there some pitfalls? Various coding conventions as well as
    > practical editing issues (you can’t be sure of always being able to edit
    > your code on a Unicode-enabled editor) aside, is there still some real
    > technical reason to stick to the A–Z, a–z, 0–9, ”$”, ”_” repertoire?


    Perhaps misconfigured Web servers still declaring ISO-8859-1 by default is a
    reason why few people use characters beyond U+007F or U+00FF.

    As for practical editing issues, it is not only the editor, but also the
    input method that needs to be available and to allow for easy typing. At
    least on my current X.org keyboard setup it is considerably harder to type
    `π' than `pi' (except in GNOME applications where I could type C-S-u
    $HEXCODEPOINT; but that would still be four keypresses more). (I really
    don't seem to need the THORN letters, so I could define GREEK LETTER … PI
    for M-P instead. But not all people can do this, and even if they could
    they may not want to.)


    PointedEars
    --
    Danny Goodman's books are out of date and teach practices that are
    positively harmful for cross-browser scripting.
    -- Richard Cornford, cljs, <cife6q$253$1$> (2004)
     
    Thomas 'PointedEars' Lahn, May 17, 2011
    #3
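
The encoding pitfall raised in post #3 can be reproduced in a few lines. What follows is only a minimal sketch, assuming Node.js (`Buffer` is a Node API, not part of ECMAScript), and the variable names are made up for illustration. It shows what becomes of the two UTF-8 bytes of π when they are read as ISO-8859-1, and that a Unicode escape lets the same identifier be spelled in pure ASCII.

```javascript
// UTF-8 encodes U+03C0 (GREEK SMALL LETTER PI) as the two bytes 0xCF 0x80.
var utf8Bytes = Buffer.from("\u03C0", "utf8");

// Reading those bytes as ISO-8859-1 yields two unrelated characters:
// U+00CF (I with diaeresis) followed by the control character U+0080.
var misread = utf8Bytes.toString("latin1");

console.log(utf8Bytes.length);               // 2
console.log(misread.length);                 // 2
console.log(misread === "\u00CF\u0080");     // true

// ASCII-only spelling of the same identifier, using a Unicode escape.
var \u03C0 = Math.PI;
console.log(\u03C0 === Math.PI);             // true
```

The escaped form survives any ASCII-compatible encoding, at an obvious cost in readability.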
  4. Tim Streater

    Re: Are ำextended charactersำ safe in identifiers?

    In article <>,
    Thomas 'PointedEars' Lahn <> wrote:

    > As for practical editing issues, it is not only the editor, but also the
    > input method that needs to be available and to allow for easy typing. At
    > least on my current X.org keyboard setup it is considerably harder to type
    > `น' than `pi' (except in GNOME applications where I could type C-S-u
    > $HEXCODEPOINT; but that would still be four keypresses more). (I really
    > don't seem to need the THORN letters, so I could define GREEK LETTER ษ PI
    > for M-P instead. But not all people can do this, and even if they could
    > they may not want to.)


    On my Mac น is option-p (alt-p if you prefer) - in all applications.

    --
    Tim

    "That excessive bail ought not to be required, nor excessive fines imposed,
    nor cruel and unusual punishments inflicted" -- Bill of Rights 1689
     
    Tim Streater, May 17, 2011
    #4
  5. Re: Are ำextended charactersำ safe in identifiers?

    Tim Streater wrote:

    > Thomas 'PointedEars' Lahn <> wrote:
    >> As for practical editing issues, it is not only the editor, but also the
    >> input method that needs to be available and to allow for easy typing. At
    >> least on my current X.org keyboard setup it is considerably harder to
    >> type `น' than `pi' (except in GNOME applications where I could type C-S-u
    >> $HEXCODEPOINT; but that would still be four keypresses more). (I really
    >> don't seem to need the THORN letters, so I could define GREEK LETTER ษ PI
    >> for M-P instead. But not all people can do this, and even if they could
    >> they may not want to.)

    >
    > On my Mac น is option-p (alt-p if you prefer) - in all applications.


    But what good is a handy input method if you have the wrong application?
    For example, you did not post GREEK SMALL LETTER PI, but something else
    (UniView says, U+0E19 THAI CHARACTER NO NU; you even managed to mangle my
    proper Unicode pi and ellipsis when quoting them.)

    So I think we can add lack of proper Unicode support in some newsreaders to
    the list of technical reasons for not using non-ASCII characters in source
    code ;-)


    PointedEars
    --
    var bugRiddenCrashPronePieceOfJunk = (
    navigator.userAgent.indexOf('MSIE 5') != -1
    && navigator.userAgent.indexOf('Mac') != -1
    ) // Plone, register_function.js:16
     
    Thomas 'PointedEars' Lahn, May 17, 2011
    #5
  6. Tim Streater

    Re: Are ำextended charactersำ safe in identifiers?

    In article <>,
    Thomas 'PointedEars' Lahn <> wrote:

    > Tim Streater wrote:
    >
    > > Thomas 'PointedEars' Lahn <> wrote:
    > >> As for practical editing issues, it is not only the editor, but also the
    > >> input method that needs to be available and to allow for easy typing. At
    > >> least on my current X.org keyboard setup it is considerably harder to
    > >> type `น' than `pi' (except in GNOME applications where I could type C-S-u
    > >> $HEXCODEPOINT; but that would still be four keypresses more). (I really
    > >> don't seem to need the THORN letters, so I could define GREEK LETTER ษ PI
    > >> for M-P instead. But not all people can do this, and even if they could
    > >> they may not want to.)

    > >
    > > On my Mac น is option-p (alt-p if you prefer) - in all applications.

    >
    > But what good is a handy input method if you have the wrong application?
    > For example, you did not post GREEK SMALL LETTER PI, but something else
    > (UniView says, U+0E19 THAI CHARACTER NO NU; you even managed to mangle my
    > proper Unicode pi and ellipsis when quoting them.)
    >
    > So I think we can add lack of proper Unicode support in some newsreaders to
    > the list of technical reasons for not using non-ASCII characters in source
    > code ;-)


    Yes, MT-NewsWatcher does seem to have some issues in this regard (it
    claims to send UTF-8 but is obviously lying). Shame really as it's quite
    good in most other respects for my purposes.

    --
    Tim

    "That excessive bail ought not to be required, nor excessive fines imposed,
    nor cruel and unusual punishments inflicted" -- Bill of Rights 1689
     
    Tim Streater, May 17, 2011
    #6
  7. Erwin Moller

    Re: Are ำextended charactersำ safe in identifiers?

    On 5/17/2011 10:24 PM, Tim Streater wrote:
    > In article <>,
    > Thomas 'PointedEars' Lahn <> wrote:
    >
    >> As for practical editing issues, it is not only the editor, but also
    >> the input method that needs to be available and to allow for easy
    >> typing. At least on my current X.org keyboard setup it is considerably
    >> harder to type `น' than `pi' (except in GNOME applications where I
    >> could type C-S-u $HEXCODEPOINT; but that would still be four
    >> keypresses more). (I really don't seem to need the THORN letters, so I
    >> could define GREEK LETTER ษ PI for M-P instead. But not all people can
    >> do this, and even if they could they may not want to.)

    >
    > On my Mac น is option-p (alt-p if you prefer) - in all applications.
    >


    Did anybody else notice the change in the topic when Tim replied?

    The original quotes around "extended characters" have been replaced by
    something else.
    Funny, considering the discussion at hand. ;-)

    Regards,
    Erwin Moller

    --
    "That which can be asserted without evidence, can be dismissed without
    evidence."
    -- Christopher Hitchens
     
    Erwin Moller, May 18, 2011
    #7
  8. Tim Streater

    Re: Are ำextended charactersำ safe in identifiers?

    In article <4dd36c43$0$49038$4all.nl>,
    Erwin Moller
    <> wrote:

    > On 5/17/2011 10:24 PM, Tim Streater wrote:
    > > In article <>,
    > > Thomas 'PointedEars' Lahn <> wrote:
    > >
    > >> As for practical editing issues, it is not only the editor, but also
    > >> the input method that needs to be available and to allow for easy
    > >> typing. At least on my current X.org keyboard setup it is considerably
    > >> harder to type `น' than `pi' (except in GNOME applications where I
    > >> could type C-S-u $HEXCODEPOINT; but that would still be four
    > >> keypresses more). (I really don't seem to need the THORN letters, so I
    > >> could define GREEK LETTER ษ PI for M-P instead. But not all people can
    > >> do this, and even if they could they may not want to.)

    > >
    > > On my Mac น is option-p (alt-p if you prefer) - in all applications.

    >
    > Did anybody else notice the change in the topic when Tim replied?
    >
    > The original quotes around "extended characters" have been replaced by
    > something else.
    > Funny, considering the discussion at hand. ;-)


    Quite so :)

    --
    Tim

    "That excessive bail ought not to be required, nor excessive fines imposed,
    nor cruel and unusual punishments inflicted" -- Bill of Rights 1689
     
    Tim Streater, May 18, 2011
    #8
  9. Re: Are ”extended characters” safe in identifiers?

    In comp.lang.javascript message <iqqs00$2u5$>, Mon, 16
    May 2011 12:49:51, Jukka K. Korpela <> posted:

    >The syntax of ECMAScript has allowed “extended characters” in
    >identifiers since 3rd edition (1999). This means, among other things,
    >allowing any Unicode letters, like Greek, Arabic, and Cyrillic letters
    >as well as e.g. Chinese ideographs. As far as I can see, this has been
    >supported in web browsers for a long time (e.g., ever since IE 5.5).
    >
    >So is it really safe to use them, writing, say
    >
    >var ? = Math.PI;
    >var ????? = 0;
    >function Götterdämmerung()
    >
    >or are there some pitfalls? Various coding conventions as well as
    >practical editing issues (you can’t be sure of always being able to
    >edit your code on a Unicode-enabled editor) aside, is there still some
    >real technical reason to stick to the A–Z, a–z, 0–9, ”$”, ”_”
    >repertoire?



    It creates interesting possibilities of writing code which looks
    incorrect but will execute, or /vice versa/ - for example, "while" can
    be used as an ordinary identifier, but not as a reserved word, if the
    third character is \u2170. In at least common fonts, that numeric
    character is likely to look very much like \x69.

    One can likewise attack 'var' and 'extends'. And \u03bf or \u0531 can
    be used in 'for'.

    One can presumably defeat Google Translate by exchanging visually
    equivalent Greek, Cyrillic, and Latin characters.

    Code might be visually obfuscated by renaming one's variables to
    incorporate, or comprise, various non-inking characters - especially
    \u008d.

    ENTIRELY UNTESTED.

    But the French are rather proud of their language, and IIRC a
    well-placed accent can completely change the meaning of a word - I can
    understand a French programmer wanting to use accented identifiers.

    --
    (c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
    Web <http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
    Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
    Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)
     
    Dr J R Stockton, May 18, 2011
    #9
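
The look-alike scenario in post #9 was posted as untested. One rough way to try it is sketched below, assuming an environment where `new Function` is permitted; the constructor is used only to parse the source, and nothing is executed. `parses` and `lookalike` are illustrative names, not taken from the thread.

```javascript
// Returns true if `source` parses as a function body, false on a SyntaxError.
function parses(source) {
  try {
    new Function(source); // parse only; the resulting function is never called
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else (e.g. a policy forbidding eval) is not our concern
  }
}

// "while" with U+2170 SMALL ROMAN NUMERAL ONE in place of the letter "i".
var lookalike = "wh\u2170le";

console.log(parses("var while = 1;"));               // reserved word: false
console.log(parses("var " + lookalike + " = 1;"));   // ordinary identifier: true
console.log(lookalike === "while");                  // false
```

U+2170 has General Category Nl (letter number), which is why it is accepted where a letter is; a character from a symbol or control category would not be.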
  10. Re: Are ”extended characters” safe in identifiers?

    18.5.2011 20:53, Dr J R Stockton wrote:

    > It creates interesting possibilities of writing code which looks
    > incorrect but will execute, or /vice versa/ - for example, "while" can
    > be used as an ordinary identifier, but not as a reserved word, if the
    > third character is \u2170.


    Non-Ascii characters in identifiers could be used for a variety of
    purposes, yes. There has been a lot of discussion about similar issues
    with non-Ascii characters in domain names, where the risk (both
    probability and possible damage) of intentionally caused confusion is
    much greater.

    With identifiers in JavaScript, the risks are already with us, without
    any precautions like the complex rules for domain names (e.g. rules
    against mixing letters from different writing systems in a word). So I
    don't think the risks could be used as an argument against appropriate use.

    I was somewhat surprised at seeing that both http://www.jslint.com/ and
    http://jshint.com/ apparently report any non-Ascii characters in
    identifiers as errors, without even offering any option to allow them.

    > But the French are rather proud of their language, and IIRC a
    > well-placed accent can completely change the meaning of a word - I can
    > understand a French programmer wanting to use accented identifiers.


    It's not that common to find French word pairs that differ only in the
    use of accents. In Swedish or Finnish, it's much easier, and letters
    like å and ä aren't treated as letters with accents but as separate
    letters of the alphabet. But Greek, Bulgarian, Thai, and Japanese are
    better examples of languages that need non-Ascii letters.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, May 19, 2011
    #10
  11. Re: Are ”extended characters” safe in identifiers?

    In comp.lang.javascript message <ir25is$3ee$>, Thu, 19
    May 2011 07:16:29, Jukka K. Korpela <> posted:
    >18.5.2011 20:53, Dr J R Stockton wrote:


    >I was somewhat surprised at seeing that both http://www.jslint.com/ and
    >http://jshint.com/ apparently report any non-Ascii characters in
    >identifiers as errors, without even offering any option to allow them.


    Systems of US origin; such is to be expected. Granted, my own site is
    entirely 7-bit, but that is because I still have some old but valued
    editing tools.

    >> But the French are rather proud of their language, and IIRC a
    >> well-placed accent can completely change the meaning of a word - I can
    >> understand a French programmer wanting to use accented identifiers.

    >
    >It's not that common to find French word pairs that differ only in the
    >use of accents. In Swedish or Finnish, it's much easier, and letters
    >like å and ä aren't treated as letters with accents but as separate
    >letters of the alphabet. But Greek, Bulgarian, Thai, and Japanese are
    >better examples of languages that need non-Ascii letters.


    Much harder for me. The only Finnish I know is "Eskimo, kiitos", and I
    strongly suspect the first of being only a product name. I know two
    words fewer of Swedish, Bulgarian, Thai, and Japanese, and little more
    of Greek.

    --
    (c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
    Web <http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
    Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
    Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)
     
    Dr J R Stockton, May 20, 2011
    #11
  12. Re: Are ”extended characters” safe in identifiers?

    20.5.2011 23:28, Dr J R Stockton wrote:

    >> I was somewhat surprised at seeing that both http://www.jslint.com/ and
    >> http://jshint.com/ apparently report any non-Ascii characters in
    >> identifiers as errors, without even offering any option to allow them.

    >
    > Systems of US origin; such is to be expected.


    Well, I would have expected that people who write software for checking
    program source would apply the standard of the programming language.
    Those linters report non-Ascii letters in identifiers as _errors_. I
    would accept a warning, though an informative diagnostic might be optimal.

    >The only Finnish I know is "Eskimo, kiitos", and I
    > strongly suspect the first of being only a product name.


    Yes, it is a trademark.

    If people decide to use words from a language other than English in
    identifiers, then there are two very different issues:

    1) For languages written in Latin letters, you _could_ stick to Ascii
    and use replacements like "a" or "ae" for "ä". This is what people
    commonly do in programming, but it distorts the words and may force the
    programmer to think about potential confusion and to select the words in
    an unnatural way. The level of distortion depends on language.

    2) For languages written using other alphabets, there really isn't much
    of an option - except perhaps transliteration or transcription.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, May 22, 2011
    #12
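
The two ASCII replacement styles described in post #12 ("a" versus "ae" for "ä") could be mechanised roughly as follows. This is a sketch only: the function names and the small replacement table are assumptions made for illustration, and `String.prototype.normalize` requires an ES2015 or later engine.

```javascript
// Hypothetical digraph table; a real one would depend on the language.
var DIGRAPHS = {
  "\u00E4": "ae", // a with diaeresis
  "\u00F6": "oe", // o with diaeresis
  "\u00FC": "ue", // u with diaeresis
  "\u00E5": "aa", // a with ring above
  "\u00DF": "ss"  // sharp s
};

// Style 1: drop the diacritic, keeping the base letter ("a" for "ä").
function stripDiacritics(name) {
  return name
    .normalize("NFD")                  // split letters into base + combining marks
    .replace(/[\u0300-\u036F]/g, "");  // remove combining diacritical marks
}

// Style 2: use digraphs where the table has one, then fall back to style 1.
function asciiFallback(name) {
  var replaced = name
    .normalize("NFC")                  // make sure precomposed letters are used
    .replace(/[\u00E4\u00F6\u00FC\u00E5\u00DF]/g, function (ch) {
      return DIGRAPHS[ch];
    });
  return stripDiacritics(replaced);
}

console.log(stripDiacritics("G\u00F6tterd\u00E4mmerung")); // Gotterdammerung
console.log(asciiFallback("G\u00F6tterd\u00E4mmerung"));   // Goetterdaemmerung
```

Both outputs are valid ASCII identifiers, and both illustrate the distortion the post describes: distinct words can collapse into the same spelling.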
  13. Ry Nohryb

    Re: Are ”extended characters” safe in identifiers?

    On May 22, 2:49 pm, "Jukka K. Korpela" <> wrote:
    > 20.5.2011 23:28, Dr J R Stockton wrote:
    >
    > >> I was somewhat surprised at seeing that both http://www.jslint.com/ and
    > >> http://jshint.com/ apparently report any non-Ascii characters in
    > >> identifiers as errors, without even offering any option to allow them.

    >
    > > Systems of US origin; such is to be expected.

    >
    > Well, I would have expected that people who write software for checking
    > program source would apply the standard of the programming language.
    > Those linters report non-Ascii letters in identifiers as _errors_. I
    > would accept a warning, though an informative diagnostic might be optimal.
    >
    > >The only Finnish I know is "Eskimo, kiitos", and I
    > > strongly suspect the first of being only a product name.

    >
    > Yes, it is a trademark.
    >
    > If people decide to use words from a language other than English in
    > identifiers, then there are two very different issues:
    >
    > 1) For languages written in Latin letters, you _could_ stick to Ascii
    > and use replacements like "a" or "ae" for "ä". This is what people
    > commonly do in programming, but it distorts the words and may force the
    > programmer to think about potential confusion and to select the words in
    > an unnatural way. The level of distortion depends on language.
    >
    > 2) For languages written using other alphabets, there really isn't much
    > of an option - except perhaps transliteration or transcription.


    I think you can use for that any utf8 char that's not longer than 2
    bytes/16 bits, is that right ?

    For example you can use π or μ but you can't use ∆ or ∑

    I've been using function ƒ () { ... } for a while in some <script>s,
    but I discovered it was not being parsed properly in some (older)
    browsers (I can't recall which, exactly), so I've had to return to
    function f () {}.
    --
    Jorge.
     
    Ry Nohryb, May 23, 2011
    #13
  14. Re: Are ”extended characters” safe in identifiers?

    Mon, 23 May 2011 02:31:33 -0700 (PDT), /Ry Nohryb/:

    > I think you can use for that any utf8 char that's not longer than 2
    > bytes/16 bits, is that right ?


    I believe you're thinking not of "utf8 char" but of a Unicode
    character which could be represented as a single UTF-16 unit (2 bytes).

    > For example you can use π or μ but you can't use ∆ or ∑


    I think all of these are fine as they are represented using single
    UTF-16 unit. I don't know whether using surrogate code points is
    permitted or restricted in identifiers, however.

    > I've been using function ƒ () { ... } for a while in some <script>s,
    > but I discovered it was not being parsed properly in some (older)
    > browsers (I can't recall which, exactly), so I've had to return to
    > function f () {}.


    --
    Stanimir
     
    Stanimir Stamenkov, May 23, 2011
    #14
  15. Re: Are ”extended characters” safe in identifiers?

    23.5.2011 12:42, Stanimir Stamenkov wrote:
    > Mon, 23 May 2011 02:31:33 -0700 (PDT), /Ry Nohryb/:
    >
    >> I think you can use for that any utf8 char that's not longer than 2
    >> bytes/16 bits, is that right ?

    >
    > I believe you're thinking not of "utf8 char" but of a Unicode character
    > which could be represented as a single UTF-16 unit (2 bytes).


    I can’t read minds, but in any case, the UTF-8 encoding has nothing to
    do with the issue.

    The following characters are allowed in identifiers according to ECMA
    262: letters, ”$”, ”_”, digits, combining marks, connector punctuation,
    ZWNJ, and ZWJ. The concepts here are to be understood in Unicode sense,
    e.g. ”letter” means any Unicode character defined to be a letter by its
    General Category property.

    >> For example you can use π or μ but you can't use ∆ or ∑

    >
    > I think all of these are fine


    No, ∆ (U+2206 INCREMENT) and ∑ (U+2211 N-ARY SUMMATION) are not allowed
    in identifiers. Their General Category is Symbol, Math.


    > as they are represented using single
    > UTF-16 unit. I don't know whether using surrogate code points is
    > permitted or restricted in identifiers, however.
    >
    >> I've been using function ƒ () { ... } for a while in some <script>s,
    >> but I discovered it was not being parsed properly in some (older)
    >> browsers (I can't recall which, exactly), so I've had to return to
    >> function f () {}.


    I guess it must have been long ago. But I don’t think the use of ”ƒ” as
    a function name is a particularly good example of the needs for, or even
    benefits of, using ”extended” characters in identifiers. In mathematics,
    ”ƒ” is used as a generic symbol of a function.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, May 23, 2011
    #15
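
The category rule spelled out in post #15 can be approximated with a regular expression. This is a sketch under several assumptions: it relies on Unicode property escapes (ES2018 or later), so it follows the engine's own Unicode version rather than Unicode 3.0; it ignores `\uXXXX` escapes inside identifiers; and it does not exclude reserved words. `IDENTIFIER_NAME` and `looksLikeIdentifierName` are names invented here.

```javascript
// Approximation of the identifier grammar as described in the post.
// Start: letters (L), letter numbers (Nl), "$", "_".
// Rest:  the above plus combining marks (Mn, Mc), digits (Nd),
//        connector punctuation (Pc), ZWNJ (U+200C) and ZWJ (U+200D).
var IDENTIFIER_NAME =
  /^[\p{L}\p{Nl}$_][\p{L}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}$_\u200C\u200D]*$/u;

function looksLikeIdentifierName(text) {
  return IDENTIFIER_NAME.test(text);
}

// pi, mu, f with hook, INCREMENT, N-ARY SUMMATION, and a digit-first name.
var samples = ["\u03C0", "\u03BC", "\u0192", "\u2206", "\u2211", "2x"];
samples.forEach(function (s) {
  var codePoint = "U+" + s.codePointAt(0).toString(16).toUpperCase();
  console.log(codePoint, looksLikeIdentifierName(s));
});

// Fitting in one 16-bit code unit is not the criterion: U+2206 does fit,
// but its General Category is Sm (Symbol, Math), so it is rejected.
console.log("\u2206".length, "\u2206".charCodeAt(0)); // 1 8710
```

The first three samples are letters and pass; the two mathematical symbols and the digit-first name fail, which matches the parse error reported for ∆ in the thread.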
  16. Ry Nohryb

    Re: Are ”extended characters” safe in identifiers?

    On May 23, 11:42 am, Stanimir Stamenkov <> wrote:
    > Mon, 23 May 2011 02:31:33 -0700 (PDT), /Ry Nohryb/:
    >
    > > I think you can use for that any utf8 char that's not longer than 2
    > > bytes/16 bits, is that right ?

    >
    > I believe you're thinking not of "utf8 char" but of a Unicode
    > character which could be represented as a single UTF-16 unit (2 bytes).


    Right, yes, I would think so too, but then, why does this
    <http://jorgechamorro.com/test.html> throw a parse error @ line 12 (and not @
    line 11), even when '∆'.charCodeAt(0) is 8710 ?

    > > For example you can use π or μ but you can't use ∆ or ∑

    >
    > I think all of these are fine as they are represented using single
    > UTF-16 unit.  I don't know whether using surrogate code points is
    > permitted or restricted in identifiers, however.
    >
    > > I've been using function ƒ () { ... } for a while in some <script>s,
    > > but I discovered it was not being parsed properly in some (older)
    > > browsers (I can't recall which, exactly), so I've had to return to
    > > function f () {}.

    >
    > --
    > Stanimir


    --
    Jorge.
     
    Ry Nohryb, May 23, 2011
    #16
  17. Ry Nohryb

    Re: Are ”extended characters” safe in identifiers?

    On May 23, 1:38 pm, "Jukka K. Korpela" <> wrote:
    > 23.5.2011 12:42, Stanimir Stamenkov wrote
    > > Mon, 23 May 2011 02:31:33 -0700 (PDT), /Ry Nohryb/:

    >
    > >> I think you can use for that any utf8 char that's not longer than 2
    > >> bytes/16 bits, is that right ?

    >
    > > I believe you're thinking not of "utf8 char" but of a Unicode character
    > > which could be represented as a single UTF-16 unit (2 bytes).

    >
    > I can’t read minds, but in any case, the UTF-8 encoding has nothing to
    > do with the issue.
    >
    > The following characters are allowed in identifiers according to ECMA
    > 262: letters, ”$”, ”_”, digits, combining marks, connector punctuation,
    > ZWNJ, and ZWJ. The concepts here are to be understood in Unicode sense,
    > e.g. ”letter” means any Unicode character defined to be a letter by its
    > General Category property.
    >
    > >> For example you can use π or μ but you can't use ∆ or ∑

    >
    > > I think all of these are fine

    >
    > No, ∆ (U+2206 INCREMENT) and ∑ (U+2211 N-ARY SUMMATION) are not allowed
    > in identifiers. Their General Category is Symbol, Math.


    Oh, yeah, I see, so *that* was it (*not a letter*)! Thanks!
    --
    Jorge.
     
    Ry Nohryb, May 23, 2011
    #17
  18. Re: Are ”extended characters” safe in identifiers?

    Mon, 23 May 2011 04:41:59 -0700 (PDT), /Ry Nohryb/:
    > On May 23, 11:42 am, Stanimir Stamenkov wrote:
    >
    >> I believe you're thinking not of "utf8 char" but of a Unicode
    >> character which could be represented as a single UTF-16 unit (2 bytes).

    >
    > Right, yes, I would think so too, but then, why does this
    > <http://jorgechamorro.com/test.html> throw a parse error @ line 12 (and not @
    > line 11), even when '∆'.charCodeAt(0) is 8710 ?


    It seems Jukka Korpela has already pointed out that ∆ is not a legal
    identifier character (I failed to check that).

    --
    Stanimir
     
    Stanimir Stamenkov, May 23, 2011
    #18
  19. Ry Nohryb

    Re: Are ”extended characters” safe in identifiers?

    On May 23, 1:38 pm, "Jukka K. Korpela" <> wrote:
    > > Mon, 23 May 2011 02:31:33 -0700 (PDT), /Ry Nohryb/:

    >
    > >> I've been using function ƒ () { ... } for a while in some <script>s,
    > >> but I discovered it was not being parsed properly in some (older)
    > >> browsers (I can't recall which, exactly), so I've had to return to
    > >> function f () {}.

    >
    > I guess it must have been long ago. But I don’t think the use of ”ƒ” as
    > a function name is a particularly good example of the needs for, or even
    > benefits of, using ”extended” characters in identifiers. In mathematics,
    > (...)


    Ha!, so if

    > ”ƒ” is used as a generic symbol of a function.


    why is function ƒ (){} not a good use for it?

    And ∑ and ∆ for example, would have been good names too, for summation
    and increment, don't you think so ?
    --
    Jorge.
     
    Ry Nohryb, May 23, 2011
    #19
  20. Re: Are ”extended characters” safe in identifiers?

    23.5.2011 12:42, Stanimir Stamenkov wrote:

    > I don't know whether using surrogate code points is
    > permitted or restricted in identifiers, however.


    This is defined somewhat implicitly, since clause 7.6 of ECMA 262 says:
    ”The characters in the specified categories in version 3.0 of the
    Unicode standard must be treated as in those categories by all
    conforming ECMAScript implementations.” An implementation does not need
    to support characters added after Unicode 3.0 in identifiers. And in
    Unicode 3.0, all characters were in BMP, i.e. directly representable as
    16-bit code units.

    In practice, Firefox, IE, Opera, and Chrome all seem to limit identifier
    characters to those for which support is required in the standard. So
    the Phoenician letter \uD802\uDD0E won’t do in an identifier (even though
    it’s OK in a string literal), making Phoenician programmers very sad.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, May 23, 2011
    #20
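
To make the surrogate-pair point in post #20 concrete, the sketch below inspects the letter quoted there. It assumes an ES2018 or later engine (for `codePointAt` and the `\p{L}` escape), and `letter` is just an illustrative name. The final check only reports what the engine running it does. Later editions of the standard define identifiers in terms of code points, so a current engine may well accept what the 2011 browsers rejected.

```javascript
// The Phoenician letter from the post, written as a UTF-16 surrogate pair.
var letter = "\uD802\uDD0E";

console.log(letter.length);                      // 2 (code units, not characters)
console.log(letter.charCodeAt(0).toString(16));  // d802 (high surrogate only)
console.log(letter.codePointAt(0).toString(16)); // 1090e (the actual code point)
console.log(/^\p{L}$/u.test(letter));            // true: it is a letter

// Ask the current engine whether it accepts the letter as an identifier.
// No expected result is stated, since it depends on the engine's vintage.
var accepted;
try {
  new Function("var " + letter + " = 1;"); // parse only
  accepted = true;
} catch (e) {
  accepted = false;
}
console.log("accepted as identifier here:", accepted);
```

The letter is a letter by General Category, but it lies outside the BMP and so needs two 16-bit code units, which is exactly what an implementation limited to the Unicode 3.0 repertoire cannot match.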