C++ grammar: universal-character-name in identifiers

Francesco

Hi there,
sorry for posting this as a separate thread, but the other one got off
on the wrong foot.

After posting (there) that C++ program with Chinese characters used as
identifiers, I began to wonder: what if those identifiers aren't
really valid?

So I set out to check whether that program was really valid C++, as I
had prematurely claimed.

Searching the web, I couldn't find any source that settled this issue.
I was looking for a Unicode table classifying characters as "digit",
"alphabetic" and so on, and I couldn't find one; maybe such a table
doesn't even exist. I found an online interface to a Chinese-character
database reporting codes, stroke classifications and so on, but that
was about it.

Then, browsing my copy of TC++PL, my eye fell on the grammar.

An identifier is defined like this:
-------
identifier:
    nondigit
    identifier nondigit
    identifier digit
-------
and also:
-------
nondigit: one of
    universal-character-name
    _ a b c [...] x y z
      A B C [...] X Y Z
-------

Of course, there is a universal-character-name for each digit,
punctuation mark and so on, but since those are defined as specific
grammar items (e.g. "digit", "preprocessing-op-or-punc" and so on), I
assume that "one of universal-character-name" excludes those
characters by definition.

So does that mean that "universal-character-name" stands for [a
representation of] _any_ character other than those defined by other
parts of the grammar, even if it represents a digit in some other
language?

For instance, take the character 二 ("two"); in case your font lacks
it, the glyph looks like an equals sign "=".

That's a digit in Chinese: does C++ consider it a digit or a nondigit?
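To make the question concrete, here is a minimal C++ sketch of the identifier grammar quoted above, working on already-decoded Unicode code points rather than on the `\uXXXX` spelling. The names `is_digit`, `is_nondigit` and `is_identifier` are hypothetical, and `is_nondigit` deliberately encodes the assumption from this post (any non-ASCII character counts as a universal-character-name), which is a simplification rather than the standard's exact rule. Since `digit` in this grammar covers only `0` to `9`, 二 can only ever match `nondigit`.

```cpp
#include <cstddef>
#include <string>

// 'digit' in the grammar is only the ten ASCII digits.
bool is_digit(char32_t c) {
    return c >= U'0' && c <= U'9';
}

// 'nondigit': '_', the ASCII letters, or a universal-character-name.
// ASSUMPTION (the hypothesis from the post, not the standard's rule):
// any code point outside 7-bit ASCII is accepted as a universal-character-name.
bool is_nondigit(char32_t c) {
    if (c == U'_') return true;
    if (c >= U'a' && c <= U'z') return true;
    if (c >= U'A' && c <= U'Z') return true;
    return c > 0x7F;
}

// identifier: nondigit | identifier nondigit | identifier digit
// i.e. one nondigit followed by any mix of nondigits and digits.
bool is_identifier(const std::u32string& s) {
    if (s.empty() || !is_nondigit(s[0])) return false;
    for (std::size_t i = 1; i < s.size(); ++i) {
        if (!is_nondigit(s[i]) && !is_digit(s[i])) return false;
    }
    return true;
}
```

Under this model `U"\u4E8C"` and `U"x\u4E8C2"` are identifiers, while `U"1x"` is not, because an identifier must start with a nondigit.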

Thank you for your attention,
best regards,
Francesco
 
Alf P. Steinbach

* Francesco:
[...]

The short of it is, as James Kanze remarked in the other thread today (or
was it yesterday?), that while C++ formally supports general Unicode in
names, and did so before Java, most compilers don't support it.

The characters formally accepted by C++ are the set defined by some ISO
standard, IIRC the one used for e.g. JavaScript, and I believe also Java.

There's an appendix at the back of the standard with more information, but
essentially: don't use them, not even Western-language characters such as ÆØÅ.
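As a rough illustration of what checking against such a fixed list involves, a compiler could keep the allowed code points as sorted, non-overlapping ranges and binary-search them. The `CodeRange` type, the `in_allowed_set` function and the handful of ranges below are hypothetical and heavily abbreviated (a few Latin-1 letters such as Æ, Ø and Å, plus the core CJK ideograph block); they are not a copy of the table in the standard's appendix.

```cpp
#include <algorithm>
#include <array>

// A closed range [first, last] of code points.
struct CodeRange {
    char32_t first;
    char32_t last;
};

// HYPOTHETICAL, abbreviated table for illustration only; sorted by 'first'
// and non-overlapping. The real list is the one in the standard's appendix.
constexpr std::array<CodeRange, 4> kAllowed{{
    {0x00C0, 0x00D6},   // Latin-1 capitals, A-grave .. O-diaeresis
    {0x00D8, 0x00F6},   // skips U+00D7 MULTIPLICATION SIGN
    {0x00F8, 0x00FF},   // skips U+00F7 DIVISION SIGN
    {0x4E00, 0x9FA5},   // CJK Unified Ideographs (core block)
}};

// Binary search: find the first range starting after c; the range just
// before it is the only one that could contain c.
bool in_allowed_set(char32_t c) {
    auto it = std::upper_bound(kAllowed.begin(), kAllowed.end(), c,
        [](char32_t value, const CodeRange& r) { return value < r.first; });
    if (it == kAllowed.begin()) return false;
    --it;
    return c <= it->last;
}
```

With this toy table, Æ (U+00C6) and 二 (U+4E8C) are accepted, while the multiplication sign (U+00D7) and plain ASCII letters are not, since ASCII letters are covered by the basic grammar instead.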


Cheers & hth.,

- Alf
 
Francesco

[...]

There's an appendix at the back of the standard that has some more info, but
essentially: don't use it, not even Western language characters such as ÆØÅ.

Fine, I won't use them in real code.

The purpose of my post was to check whether the code I posted with
Chinese identifiers was really valid, and, while I was at it, to
understand the point completely.

Now I see from your post that I should look up that appendix to settle
this point; I thought my reasoning above was enough to conclude that
"all characters except those otherwise specified by this grammar" are
valid identifier characters. I'll dig into the appendix.

Thanks a lot,
Francesco
 
James Kanze

[...]
It depends. It can't be used as part of a number, but it is
legal in an identifier (even as the first character of an
identifier).
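For instance, assuming a compiler that accepts universal-character-names in identifiers (recent GCC and Clang do; as noted in this thread, many compilers did not), 二 can be spelled `\u4E8C` and used both as the first character of a name and later in it. The names below are purely illustrative.

```cpp
// ASSUMPTION: the compiler accepts universal-character-names in identifiers
// (recent GCC and Clang do; many older compilers did not).

// U+4E8C (Chinese "two") as the first, and only, character of a name.
const int \u4E8C = 2;

// ...and in a non-initial position, after an ASCII letter.
const int x\u4E8C = 20;

int add_two(int n) {
    return n + \u4E8C;   // the variable declared above, not a number
}
```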
Fine, I won't use them in real code.

In portable code. I think they work in VC++.
 
Francesco

    [...]

It depends.  It can't be used as part of a number, but it is
legal in an identifier (even as the first character of an
identifier).

Thank you for the confirmation, James; that's just what I was looking
for to reassure myself about the Chinese program I posted. About your
"It depends": I suppose you meant that some isdigit() function could
return true for that character, which would be good, I suppose.
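For what it's worth, the standard wide-character classifier does not treat 二 as a digit: as far as I know, the C standard (which C++ inherits through `<cwctype>`) defines `iswdigit` to match only the decimal-digit characters `0` to `9`. The wrapper name `is_wide_digit` is hypothetical.

```cpp
#include <cwctype>

// Thin wrapper over the standard classifier. iswdigit is specified to match
// only the decimal-digit characters '0'..'9', so ideographic numerals such as
// U+4E8C are not reported as digits by it.
bool is_wide_digit(wchar_t c) {
    return std::iswdigit(static_cast<std::wint_t>(c)) != 0;
}
```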
In portable code.  I think they work in VC++.

Oh yes, of course; I should have written "in portable code" up there.

Thanks again,
Francesco
 
