JNI / localization / filenames


mwaller

I'm having a bit of an issue with filenames on Korean systems.
We use the native file dialogs on Win32 (the customer insisted), and
consequently use JNI to access them. This makes life a little
complicated as we have to dip into JNI to get the filename, use it in
various classes in Java to create temporary files for the customised
file format, then save the actual file in JNI.
Following the guidelines in Sheng Liang's 'The Java Native Interface' book,
I'm using String getBytes() and new String(byte[]) to convert via
the default locale, instead of NewStringUTF.
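A minimal sketch of that round trip (the filename is hypothetical; the real one comes from the native dialog):

```java
public class DefaultCharsetRoundTrip {
    public static void main(String[] args) {
        // Hypothetical filename, standing in for one from the Win32 dialog.
        String name = "report.txt";
        // Per the book's advice: convert with the platform default encoding,
        // so the bytes match what native code expects in this locale.
        byte[] localeBytes = name.getBytes();
        String roundTripped = new String(localeBytes);
        System.out.println(name.equals(roundTripped)); // true when representable
    }
}
```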
I can now save files on Korean W2K, but the resultant file cannot be
opened by double clicking unless it is renamed (in DOS) first. (my
'tester' is on the other side of the world as I haven't got a Korean
Windows to hand, and this is what he reported.) Opening the
application then opening the file works.

Any ideas?

mlw
 

Jon A. Cruz

mwaller said:
Following the guidelines in Sheng Liang's 'The Java Native Interface' book,
I'm using String getBytes() and new String(byte[]) to convert via
the default locale, instead of NewStringUTF.

That just sounds like bad advice.

I'd recommend always staying Unicode as long as possible. Even Microsoft
does this.

I can now save files on Korean W2K, but the resultant file cannot be
opened by double clicking unless it is renamed (in DOS) first. (my
'tester' is on the other side of the world as I haven't got a Korean
Windows to hand, and this is what he reported.) Opening the
application then opening the file works.

Ahhh. MS Windows.

Did you know that Windows has an API call to convert between 16-bit
Unicode and the local code page?

WideCharToMultiByte and MultiByteToWideChar. Just pass it CP_ACP.

So use the UTF-16 versions of JNI string calls, not the UTF-8 versions.
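To see why staying Unicode works: the 16-bit values in a Java String are exactly what GetStringChars() hands to native code, and exactly what the wide (W) Win32 calls expect. A sketch with a hypothetical Korean filename:

```java
public class Utf16Units {
    public static void main(String[] args) {
        // "한글.txt", written with escapes; a hypothetical Korean filename.
        String name = "\uD55C\uAE00.txt";
        // Each char below is one of the 16-bit units GetStringChars() exposes
        // to native code -- directly usable as WCHARs by fooW calls.
        for (char c : name.toCharArray()) {
            System.out.printf("%04X ", (int) c);
        }
        System.out.println(); // prints: D55C AE00 002E 0074 0078 0074
    }
}
```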



Oh, and don't listen to them: avoid TCHAR, _T and #define UNICODE.
(Microsoft doesn't use them, due to problems.)

If you can keep all file creation and writing in JNI code, things might
be simpler. There are some long-standing bugs in Sun's VM regarding
non-ASCII filenames on Windows NT systems.

Also... be aware that most Win32 calls are really just macros that resolve
foo to fooA or fooW (for ANSI or Wide, respectively). Keep all your
characters in JNI as explicit 16-bit UTF-16 characters; then you can
call the fooW calls directly on Windows NT, Windows 2000 and Windows XP. On
Windows 95, Windows 98 and Windows ME you'll have to fall back to
converting wide-to-multibyte, calling the fooA versions, then converting the
results back from multibyte to wide-char.
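That wide-to-multibyte fallback can be mimicked on the Java side with a Charset. "MS949" (Windows code page 949, the Korean ANSI code page) is an assumed charset name here, with a UTF-8 fallback so the sketch runs on any JDK:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AnsiCodePageSketch {
    public static void main(String[] args) {
        // "MS949" availability depends on the JDK build, hence the fallback.
        Charset acp = Charset.isSupported("MS949")
                ? Charset.forName("MS949")
                : StandardCharsets.UTF_8;
        String name = "\uD55C\uAE00.txt"; // "한글.txt", a hypothetical filename
        byte[] ansi = name.getBytes(acp);    // ~ WideCharToMultiByte(CP_ACP, ...)
        String back = new String(ansi, acp); // ~ MultiByteToWideChar(CP_ACP, ...)
        System.out.println(name.equals(back)); // true: lossless round trip
    }
}
```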
 

Chris Uppal

Jon said:
So use the UTF-16 versions of JNI string calls, not the UTF-8 versions.

But, whatever you do, don't forget that what Sun calls "UTF8" in JNI is nothing
of the kind.

It's a different encoding (admittedly similar in many ways) and you'll have to
convert it to whatever encoding of Unicode Windows uses.
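One way to see the difference without touching JNI: DataOutputStream.writeUTF() uses the same "modified UTF-8" as JNI, so NUL comes out as the two bytes C0 80 rather than standard UTF-8's single 00 byte:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {
    public static void main(String[] args) throws IOException {
        String nul = "\u0000";
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // writeUTF emits a 2-byte length prefix, then modified UTF-8 bytes.
        new DataOutputStream(bos).writeUTF(nul);
        byte[] modified = bos.toByteArray();
        byte[] standard = nul.getBytes(StandardCharsets.UTF_8);
        System.out.printf("modified: %02X %02X%n",
                modified[2] & 0xFF, modified[3] & 0xFF); // modified: C0 80
        System.out.println("standard: " + standard.length + " byte(s)"); // 1
    }
}
```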

-- chris
 

Jon A. Cruz

Chris said:
But, whatever you do, don't forget that what Sun calls "UTF8" in JNI is nothing
of the kind.

It's a different encoding (admittedly similar in many ways) and you'll have to
convert it to whatever encoding of Unicode Windows uses.


But...

That's just another reason to avoid it, as I had just said.


Keep to NewString, GetStringLength, GetStringChars, and use jchar.
 

Chris Uppal

Jon said:
Chris said:
But, whatever you do, don't forget that what Sun calls "UTF8" in JNI is
nothing of the kind.
[...]

But...

That's just another reason to avoid it, as I had just said.

Sorry; I meant it to be read as adding to your point, not contradicting it.

OTOH. There is no such thing as UTF-16 in JNI. Just the option of using 16-bit
quantities to represent Java 16-bit 'char's directly -- but that's not UTF-16.
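A quick illustration of the distinction: a Java String may contain an unpaired surrogate, which no valid UTF-16 sequence can, so a real Unicode encoder has to replace it:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LoneSurrogate {
    public static void main(String[] args) {
        // A lone high surrogate is a perfectly legal Java String...
        String lone = "\uD800";
        // ...but not valid UTF-16, so getBytes() substitutes the charset's
        // replacement byte, '?' (0x3F), rather than encoding it.
        byte[] utf8 = lone.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(utf8)); // prints [63]
    }
}
```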

(I really wish Sun hadn't started this wretched idea of referring to its
non-standard, short-sighted, crippled, character encodings by the same name as
"proper" industry standards.)

-- chris
 

Jon A. Cruz

Chris said:
OTOH. There is no such thing as UTF-16 in JNI. Just the option of using 16-bit
quantities to represent Java 16-bit 'char's directly -- but that's not UTF-16.

Sun experts in the area would disagree with you on that. They definitely
consider it UTF-16 and not UCS-2, and are continually adding more
support for UTF-16 details in implementations, including more support of
surrogate pairs, etc. JSR 204 has more info.
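For what it's worth, the JSR 204 work (which shipped in J2SE 5.0) shows up in methods like codePointCount(): a supplementary character such as U+1D11E (musical G clef) occupies two Java chars but is one code point:

```java
public class SupplementaryChar {
    public static void main(String[] args) {
        // U+1D11E needs a surrogate pair in UTF-16.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(clef.length());                         // 2 chars
        System.out.println(clef.codePointCount(0, clef.length())); // 1 code point
    }
}
```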
 

Chris Uppal

Jon said:
Sun experts in the area would disagree with you on that. They definitely
consider it UTF-16 and not UCS-2, and are continually adding more
support for UTF-16 details in implementations, including more support of
surrogate pairs, etc. JSR 204 has more info.

I don't think so. Looking at JSR 204 (thanks for the pointer; I hadn't seen it
before), I get the impression that they are attempting to come to terms with the
fact that Java and Unicode don't match.

--- the intro to JSR 204 ---
Version 3.1 of the Unicode standard is the first one to define characters that
cannot be described by single 16-bit code points and thus the standard breaks a
fundamental assumption of the Java programming language and APIs. This JSR
defines the necessary adjustments to the Java APIs to enable support for such
characters and enables the Java platform to continue to track the Unicode
standard.
----------

Unfortunately the rest of the JSR paper doesn't seem to provide much
information.

However, it seems clear that they are following the Unicode standard's very
unfortunate wording and thinking of characters with code points >= 2**16 as
somehow "additional", maybe not "real Unicode characters".

Unicode cannot be represented by 16-bit characters.

Sequences of Unicode characters (code points up to 21 bits) can be represented
as sequences of 16-bit quantities using UTF-16. However, neither Java Strings
nor the arrays of jchar manipulated by JNI are in this encoding. Java/JNI use a
direct encoding of Java's 16-bit characters as (probably) "unsigned short" in
JNI. That isn't UTF-16. Java's encoding is neither upward nor downward
compatible with UTF-16 (though there are many sequences of characters that are
encoded the same way in both).

Granted, a Java String could be used to hold a UTF-16 sequence (but then so
could a char[], a short[], a byte[], or -- hell -- even a double[], since
it's only a string of bytes). But the Java "char"s in such a sequence are not
the same as the Unicode "character"s in the same collection of bits.

That's why I say that Java doesn't support Unicode, and that Java Strings, and
char[]s, are *not* UTF-16.

The good people working on JSR204 will have to find a way to work this out.
I'd guess that they'll introduce new APIs for using Strings and char[]s to hold
UTF16-encoded data, and have int-returning methods that (e.g.) know how to do
the decoding to find the (say) 8th Unicode character in a String. The use of
the "char" primitive datatype will start to look very dodgy indeed. With luck
they'll also define a few UnicodeString classes which separate the interface
from the representation, and (internally) encode the Unicode data in
programmer-selectable ways.
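(As it turned out, J2SE 5.0 did add exactly such int-based methods. A small sketch using one of them to find the char index of the Nth Unicode character:)

```java
public class NthCodePoint {
    public static void main(String[] args) {
        // "a" + U+1D11E + "b": three Unicode characters, four Java chars.
        String s = "a" + new String(Character.toChars(0x1D11E)) + "b";
        // Char index of the third Unicode character (code point #2, 0-based).
        int idx = s.offsetByCodePoints(0, 2);
        System.out.println(idx);           // 3: the clef occupies two chars
        System.out.println(s.charAt(idx)); // b
    }
}
```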

However, none of that has happened yet. When it does, it will introduce another
boat-load of complexity into the Java programmer's life, and (partially)
invalidate a load of text-handling code that already exists. They are going
to have a very hard time trying to sell this stuff to the community, and their
job won't be made easier by the fact that Sun has traditionally blurred the
differences between the Java APIs and real Unicode -- such as the many APIs
that falsely claim to talk UTF-8.

-- chris
 
