javac -encoding problem and/or glaring bug ?

java · Aug 14, 2007

Hi:

Consider this file, saved to disk as utf-8, no BOM.
---------------------------------------------------
public class x
{
    public static void main (String args[])
    {
        System.out.println("\u0222");
    }
}
--------------------------------------------------------

By the way, unicode 0x0222 looks like a funky eight --> Ȣ
You may not see it in this news post because of your
newsreader, doesn't matter.

While compiling I've tried all of:

javac x.java
javac -encoding utf-8 x.java
javac -encoding utf8 x.java
javac -encoding UTF-8 x.java
javac -encoding UTF8 x.java

Using:
JDK 1.5, on both linux and osx (same problem)

If you run this (regardless of how you compile it), you
will see '?' instead of the proper unicode character
(regardless of output device, even if you output to a
unicode capable terminal that can properly render
0x0222, you still see '?').

Am I missing something or is this like the biggest most
retarded bug ever ?

--j

Real Gagnon · Aug 14, 2007

If you run this (regardless of how you compile it), you
will see '?' instead of the proper unicode character
(regardless of output device, even if you output to a
unicode capable terminal that can properly render
0x0222, you still see '?'

Try to run it with :

java -Dfile.encoding=UTF8 x

Bye.

java · Aug 14, 2007

Try to run it with :

java -Dfile.encoding=UTF8 x

Ok, I tried that and that solved the problem.

But why ?

javac -encoding UTF8 x.java --> x.class

Now, shouldn't x.class be entirely self contained ? It's not
java source anymore.

So why do I have to set this property ? Is it because
the PrintStream (System.out) uses this "file.encoding"
property internally ?

Background:
This becomes tricky when I have differently encoded web pages
(say jsp's) on the server at the same time (all of which print
debugging messages using System.out)

-j

Thomas Fritsch · Aug 14, 2007

java said:
Consider this file, saved to disk as utf-8, no BOM.
---------------------------------------------------
public class x
{
public static void main (String args[])
{
System.out.println("\u0222");
}
} [...]
While compiling I've tried all of:

javac x.java
javac -encoding utf-8 x.java
javac -encoding utf8 x.java
javac -encoding UTF-8 x.java
javac -encoding UTF8 x.java

Using:
JDK 1.5, on both linux and osx (same problem)

If you run this (regardless of how you compile it), you
will see '?' instead of the proper unicode character
(regardless of output device, even if you output to a
unicode capable terminal that can properly render
0x0222, you still see '?'

Am I missing something

Aaahm, yes.
Your *source* contains only harmless ASCII characters.
Remember, \ u 0 2 2 are in range 0x0020...0x007F, where ASCII is
identical to UTF-8. Therefore all your effort to make the compiler
understand UTF-8 is pointless. (sorry)
Your problem is not a compile-problem (javac), but a runtime-problem
(java). Real Gagnon already told how to parametrize java to use UTF-8.
But even that might not solve your problem, if the font used by your
terminal doesn't contain a rendering for the 0x0222 character.

By the way: Even my "Arial Unicode MS" font, which contains all of the
greek, cyrillic, armenian, chinese etc characters, has no renderings in
the range 0x0220..0x024F.
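
To check what the runtime has picked up from the environment, a small diagnostic along these lines may help. It is only a sketch: the class name ShowDefaults and the helper canEncode are made up for illustration, and it assumes System.out follows the JVM default charset, as it does on the JDK discussed in this thread.

```java
import java.nio.charset.Charset;

public class ShowDefaults {
    // True if the given charset has a byte sequence for the character.
    static boolean canEncode(char c, Charset cs) {
        return cs.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        // The charset the JVM derived from the environment (or from -Dfile.encoding).
        Charset cs = Charset.defaultCharset();
        System.out.println("file.encoding     = " + System.getProperty("file.encoding"));
        System.out.println("default charset   = " + cs.name());
        // If this prints false, println("\u0222") can only produce a replacement '?'.
        System.out.println("can encode U+0222 = " + canEncode('\u0222', cs));
    }
}
```

Running it once plainly and once with -Dfile.encoding=UTF8 should show whether the default charset is what stands between the string and the terminal.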

Real Gagnon · Aug 14, 2007

This becomes tricky when I have differently encoded web pages
(say jsp's) on the server at the same time (all of which print
debugging messages using System.out)

The "file.encoding" trick may be OK for a small console program, but
probably not for a server. You may want to use a special PrintStream instead.

See http://www.rgagnon.com/javadetails/java-0046.html

Bye.
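
A minimal sketch of the "special PrintStream" idea, for readers who cannot follow the link. The class name Utf8Out and the helper utf8 are hypothetical, not taken from the linked page; the only library call relied on is the PrintStream(OutputStream, boolean, String) constructor.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Out {
    // Wrap a stream in a PrintStream that always encodes text as UTF-8,
    // regardless of the JVM's default charset.
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is one of the charsets every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        out.println("\u0222"); // written as the bytes C8 A2 plus a line separator
    }
}
```

Because the encoding is fixed per stream rather than per JVM, this does not depend on how the server process was started. Whether the character is displayed correctly still depends on the terminal or log viewer decoding the bytes as UTF-8.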

Roedy Green · Aug 17, 2007

You are confusing the
encoding of the Java source with the
encoding you want for the console output.

To configure the encoding of your Java source, see
http://mindprod.com/jgloss/encoding.html#SOURCE

To configure the default encoding
of your console and files, see
http://mindprod.com/jgloss/encoding.html#CONSOLE

P.S. "x" is not a suitable class name.
Classes should begin with a capital letter.

Juha Laiho · Aug 19, 2007

java said:
Ok, I tried that and that solved the problem.

But why ?

javac -encoding UTF8 x.java --> x.class

Now, shouldn't x.class be entirely self contained ? It's not
java source anymore.

So why do I have to set this property ? Is it because
the PrintWriter (System.out) uses this "file.encoding"
property internally ?

That is because the JVM runtime attempts to find out what
character encoding the environment outside the JVM uses, and
apparently in your environment it gets a native character set
of something other than UTF-8.

So, even if you have funky UTF-8 characters in your source,
Java may be able to print them out in environments with some
other native character encoding, if that other encoding
happens to have a code point for the same character glyph.

For example, source code with UTF-8 may contain the byte
sequence [0xc3, 0xa4], signifying lower-case a-diaeresis
character glyph. Now, if that source code is compiled
properly, letting the compiler know that the source is in UTF-8
character set, and subsequently the code is run in an environment
with ISO-8859-1 character set, the program will output just
one byte, 0xE4. Also, if the same code is run in an environment
configured for plain US-ASCII character set, it will output
only a question mark (as US-ASCII character set does not have
a glyph for the a-diaeresis character).
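
The byte values above can be reproduced without touching the environment by encoding the same string in each charset explicitly. This is an illustrative sketch only: the class name EncodeDemo and the helper hex are made up, and String.getBytes(Charset) needs Java 6 or later.

```java
import java.nio.charset.Charset;

public class EncodeDemo {
    // Encode a string in the named charset and return the bytes as hex.
    // Characters the charset cannot represent become its replacement byte.
    static String hex(String s, String charsetName) {
        byte[] bytes = s.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String aDiaeresis = "\u00e4";
        System.out.println("UTF-8:      " + hex(aDiaeresis, "UTF-8"));      // C3 A4
        System.out.println("ISO-8859-1: " + hex(aDiaeresis, "ISO-8859-1")); // E4
        System.out.println("US-ASCII:   " + hex(aDiaeresis, "US-ASCII"));   // 3F, which is '?'
    }
}
```

The same thing happens to "\u0222" from the original post under ISO-8859-1 or US-ASCII: neither has a code for it, so the output is 3F.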
