Is there still a problem with class names using non ASCII characters?

Bamako sur Seine

Well, the title says it all...

Is there still a problem with class names using non ASCII characters?

Will the files created using these non ASCII characters be portable?

Any input or experience welcome.
 
Lew

Bamako said:
Is there still a problem with class names using non ASCII [sic] characters?

There's no problem, they're just forbidden by the language spec.

According to the JLS[1]:
An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter. ...
The Java letters include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or \u0024). The $ character should be used only in mechanically generated source code or, rarely, to access preexisting names on legacy systems.
Will the files created using these non ASCII [sic] characters be portable?

One hopes that you store your source files as Unicode, preferably UTF-8, since
"non-ASCII" characters can appear throughout your source. If not, that's your
bad.

ASCII and Unicode are portable to all systems that support ASCII and Unicode,
respectively, and only to such systems.

From the JLS:
Programs are written using the Unicode character set. Information about this character set and its associated character encodings may be found at:

http://www.unicode.org

[1] <http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html>
 
Bamako sur Seine

Bamako said:
Is there still a problem with class names using non ASCII [sic] characters?

There's no problem, they're just forbidden by the language spec.

According to the JLS[1]:
An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter. ...
The Java letters include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or \u0024). The $ character should be used only in mechanically generated source code or, rarely, to access preexisting names on legacy systems.

include does not mean limited to...

http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.8

Letters and digits may be drawn from the entire Unicode character set,
which supports most writing scripts in use in the world today,
including the large sets for Chinese, Japanese, and Korean. This
allows programmers to use identifiers in their programs that are
written in their native languages.
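
To make that concrete, here is a minimal sketch that tests candidate names against the "Java letter" and "Java letter-or-digit" rules through `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart`. The class name `IdentifierCheck` and the sample strings are only illustrations, and the check does not reject keywords such as `class`. The literals use \uXXXX escapes so the source file itself stays pure ASCII.

```java
public class IdentifierCheck {

    // True if s is a non-empty sequence of Java letters and digits
    // whose first code point is a Java letter. Keywords are NOT rejected.
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk by code point so supplementary characters are handled too.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {
            "papierM\u00e2ch\u00e9",   // accented Latin letters
            "\u65e5\u672c\u8a9e",       // three CJK ideographs
            "2fast",                    // starts with a digit
            "with space"                // contains a space
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isJavaIdentifier(s));
        }
    }
}
```

The accented and CJK samples pass and the last two fail, which matches the paragraph quoted above: the ASCII letters are only part of the set of Java letters.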

My question is how well the following section is implemented (and
whether its interpretations are mutually compatible) to ensure
portability:

A package name component or class name might contain a character that
cannot correctly appear in a host file system's ordinary directory
name, such as a Unicode character on a system that allows only ASCII
characters in file names. As a convention, the character can be
escaped by using, say, the @ character followed by four hexadecimal
digits giving the numeric value of the character, as in the \uxxxx
escape (§3.3), so that the package name:

children.activities.crafts.papierM\u00e2ch\u00e9

which can also be written using full Unicode as:

children.activities.crafts.papierMâché

might be mapped to the directory name:

children/activities/crafts/papierM@00e2ch@00e9

If the @ character is not a valid character in a file name for some
given host file system, then some other character that is not valid in
a identifier could be used instead.
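
For illustration, the mapping described in that section could be sketched as below. This is only my reading of the convention, not something any particular compiler or class loader is known to do: `PackageDirMapper` and `toDirectoryName` are made-up names, and the sketch assumes that every character outside ASCII is escaped as `@` plus four hex digits, one escape per UTF-16 code unit as with \uxxxx.

```java
public class PackageDirMapper {

    // Maps a package name to a directory name, escaping every non-ASCII
    // UTF-16 code unit as '@' followed by four lowercase hex digits.
    // Assumption: '@' is acceptable in file names on the host file system.
    static String toDirectoryName(String packageName) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < packageName.length(); i++) {
            char c = packageName.charAt(i);
            if (c == '.') {
                sb.append('/');                 // package separator becomes a path separator
            } else if (c < 0x80) {
                sb.append(c);                   // plain ASCII is copied unchanged
            } else {
                sb.append('@').append(String.format("%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The JLS example, written with escapes so this source stays ASCII.
        String pkg = "children.activities.crafts.papierM\u00e2ch\u00e9";
        System.out.println(toDirectoryName(pkg));
    }
}
```

Applied to the JLS example it produces children/activities/crafts/papierM@00e2ch@00e9. Whether two tools would agree on such a scheme is exactly my question, since the spec only calls it a convention.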
 
Lew

Bamako said:
Bamako said:
Is there still a problem with class names using non ASCII [sic] characters?
There's no problem, they're just forbidden by the language spec.

According to the JLS[1]:
An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter. ...
The Java letters include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or \u0024). The $ character should be used only in mechanically generated source code or, rarely, to access preexisting names on legacy systems.

include does not mean limited to...
http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.8
Letters and digits may be drawn from the entire Unicode character set,
which supports most writing scripts in use in the world today,
including the large sets for Chinese, Japanese, and Korean. This
allows programmers to use identifiers in their programs that are
written in their native languages.

In that case, there is no problem using non-ASCII characters for class names.
Was there ever?
My question is how well the following section is implemented (and
whether its interpretations are mutually compatible) to ensure
portability:

If it's a conformant Java implementation, it's perfectly implemented.
 
Roedy Green

Is there still a problem with class names using non ASCII characters?

the class file format allows UTF-8 class names, but *.java source
files may be hosted by an OS that does not support arbitrary UTF-8
strings as filenames. If you use them, you also make it difficult for
American programmers who are not familiar with editing non-ASCII text.
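
A rough way to see the file name side of this is to ask whether a class name can be encoded in a given charset at all. The sketch below uses `CharsetEncoder.canEncode`. The class name `FileNameEncodingCheck` and the sample name are made up, and the check is only a proxy: it assumes the host encodes file names in the charset you test, which the JVM does not promise.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FileNameEncodingCheck {

    // True if every character of the name can be encoded in the charset.
    // This says nothing about what the file system actually accepts.
    static boolean encodableIn(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        String className = "Caf\u00e9"; // hypothetical class name, escaped to keep this source ASCII
        System.out.println("US-ASCII: " + encodableIn(className, StandardCharsets.US_ASCII));
        System.out.println("UTF-8:    " + encodableIn(className, StandardCharsets.UTF_8));
        // The default charset governs file contents; file names may use another encoding.
        System.out.println("default:  " + encodableIn(className, Charset.defaultCharset()));
    }
}
```

If the name cannot be encoded in the charset the host uses for file names, the matching .java and .class file names are likely to be a problem there, however valid the identifier is.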
 
RedGrittyBrick

Roedy said:
the class file format allows UTF-8 class names, but *.java source
files may be hosted by an OS that does not support arbitrary UTF-8
strings as filenames. If you use them, you also make it difficult for
American programmers who are not familiar with editing non-ASCII text.

American?

I can't think why you'd include Brazilians but exclude inhabitants of
Portugal (for example).

North American?

Are not Ozians, Kiwis, Brits etc worthy of consideration too?

http://en.wikipedia.org/wiki/Anglosphere
 
Lew

RedGrittyBrick said:
American?

I can't think why you'd include Brazilians but exclude inhabitants of
Portugal (for example).

North American?

Are not Ozians, Kiwis, Brits etc worthy of consideration too?

http://en.wikipedia.org/wiki/Anglosphere

"American" conventionally refers to U.S. residents, not Western Hemispherians.
Perhaps Roedy is alluding to the stereotype of American (as in U.S.)
provincialism, a flaw to which the rest of the "Anglosphere" is perhaps
arguably more immune.
 
RedGrittyBrick

Lew said:
"American" conventionally refers to U.S. residents, not Western
Hemispherians. Perhaps Roedy is alluding to the stereotype of American
(as in U.S.) provincialism, a flaw to which the rest of the
"Anglosphere" is perhaps arguably more immune.

Ah yes, since I think Roedy is Canadian, or at least has a Canadian Mind
under his control somewhere [1], I assumed he meant American in a more
general sense.

Perhaps some Canadians still have a lingering resentment of US attempts
to "liberate" them [2] and are therefore happy to point out skills that
may be less common south of the border ;-)
 
Roedy Green

North American?

Are not Ozians, Kiwis, Brits etc worthy of consideration too?

not to the same extent. For example when I was in Britain, many
stores posted prices in Euros. Brits are closer to Europe, and hence
more aware of what the rest of the world is doing. Australians and
Kiwis have lots of contact with South East Asia and trade with the
whole world. Many Americans have never left the USA in their whole
lives. I don't think that is true of many other industrialised
countries.

Because of their military and political power, Americans tend to
discount citizens of other countries as mattering. You sometimes see
an attitude that if those damn fool foreigners decorate their letters
with faggy accents, that's their problem. Damned if I will go out of
my way to accommodate them. The only people that matter are Americans.
 
Twisted

Perhaps some Canadians still have a lingering resentment of US attempts
to "liberate" them [2] and are therefore happy to point out skills that
may be less common south of the border ;-)

A simpler and probably the correct explanation is that Canadians are
compensating for resented American cultural imperialism and
Americanism assumptions by being showoffs from time to time.
 
Oliver Wong

Bamako sur Seine said:
Well, the title says it all...

Is there still a problem with class names using non ASCII characters?

Will the files created using these non ASCII characters be portable?

Any input or experience welcome.

I'm not sure there was ever a problem, at least from Java (there may
be problems coming from your OS). For example, I created this class file
with CJK characters in its name, and it compiled fine in Eclipse, but it
would not run under English WinXP (maybe if I'm really bored, I'll one day
try running it under Japanese WinXP).

<code>
public class ?? {

    public static void main(String[] args) {
        System.out.println("Hello World!");
        // Print the class's own (CJK) name.
        System.out.println(??.class.getName());
    }
}

</code>

- Oliver
 
Mike Schilling

Lew said:
Bamako said:
Bamako sur Seine wrote:
Is there still a problem with class names using non ASCII [sic]
characters?
There's no problem, they're just forbidden by the language spec.

According to the JLS[1]:

An identifier is an unlimited-length sequence of Java letters and Java
digits, the first of which must be a Java letter.
...
The Java letters include uppercase and lowercase ASCII Latin letters
A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical
reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or
\u0024). The $ character should be used only in mechanically generated
source code or, rarely, to access preexisting names on legacy systems.

include does not mean limited to...
http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.8
Letters and digits may be drawn from the entire Unicode character set,
which supports most writing scripts in use in the world today,
including the large sets for Chinese, Japanese, and Korean. This
allows programmers to use identifiers in their programs that are
written in their native languages.

In that case, there is no problem using non-ASCII characters for class
names. Was there ever?

It used to be impossible to refer to non-ASCII classes in a jar's manifest
file; see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4260472. That
made it tricky for the main class of an executable jar to be non-ASCII.

This has apparently been fixed.
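
If you want to check your own JDK, a small experiment is to write a manifest with a non-ASCII Main-Class through `java.util.jar.Manifest` and read it back. This is only a sketch: `ManifestRoundTrip` and the class name `app.Caf\u00e9` are made up, and it tests the manifest API in memory, not the `jar` tool or the launcher.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestRoundTrip {

    // Writes a manifest with the given Main-Class to memory, parses it
    // again, and returns the Main-Class value that was read back.
    static String roundTripMainClass(String mainClass) {
        try {
            Manifest written = new Manifest();
            Attributes attrs = written.getMainAttributes();
            // Without Manifest-Version, Manifest.write emits no main attributes.
            attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
            attrs.put(Attributes.Name.MAIN_CLASS, mainClass);

            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            written.write(buffer);

            Manifest parsed = new Manifest(new ByteArrayInputStream(buffer.toByteArray()));
            return parsed.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "app.Caf\u00e9"; // hypothetical main class, escaped to keep this source ASCII
        String readBack = roundTripMainClass(original);
        System.out.println("survived round trip: " + original.equals(readBack));
    }
}
```

If the name comes back unchanged, the manifest API at least is not the obstacle on that JDK.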
 
