Garbage from resourceBundle.getObject() for Japanese

Delia

Hi Group,

I set up properties files to match different locales, and use the
resourceBundle.getObject() function to retrieve the correct strings.

This works great, but not with Japanese.
With Japanese, I get a string of Unicode characters back from getObject()
that doesn't seem to correspond to the Japanese characters I have saved in my
.properties file.

I tried saving the .properties file in UTF-8 and in 16-bit Unicode (big and
little endian), but I just get varying strings of characters, and I can't
figure out where they come from.


Example:

I went to babelfish and typed "File".

It came back with a Japanese word.

I pasted that Japanese word into a .properties file (using Windows
notepad.exe) and saved it as UTF-8.

I loaded that string with getObject() and it returned:
ã\u0083\u0095ã\u0082¡ã\u0082¤ã\u0083«

Whereas the actual Unicode values should be:
30D5 30A1 30A4 30EB

I can't see enough of a relationship between these two sets of values to
work out what it's trying to do.

Help?

-Delia
 
Thomas Weidenfeller

Delia said:
I tried saving the .properties file in UTF-8 and in 16-bit Unicode (big and
little endian), but I just get varying strings of characters, and I can't
figure out where they come from.

In general, I would suggest you read the documentation for Properties.
There you will find out the following:

A properties file is supposed to be in ISO Latin-1, and nothing else. If
you need characters in a properties file outside the Latin-1 range you
need to use the \u.... notation to enter the codes for the characters.
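
To make the connection to the output in the question concrete, here is a minimal sketch. The key name menu.file and the class and method names are made up for illustration, and it assumes the file was saved as UTF-8 without a byte order mark. Properties.load(InputStream) decodes each byte as one Latin-1 character, so every three-byte UTF-8 sequence comes back as three unrelated characters, while the escaped form comes back intact:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class Latin1PropertiesDemo {
    // Loads properties from raw file bytes through the InputStream overload,
    // which always decodes the bytes as ISO 8859-1 (Latin-1).
    static String loadValue(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    // Renders each UTF-16 code unit as four hex digits, separated by spaces.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // The four code points from the question: 30D5 30A1 30A4 30EB.
        String japanese = "\u30D5\u30A1\u30A4\u30EB";

        // Case 1: the raw Japanese text saved as UTF-8 (hypothetical key name).
        byte[] utf8File = ("menu.file=" + japanese).getBytes(StandardCharsets.UTF_8);

        // Case 2: the same value written with backslash-u escapes, plain ASCII on disk.
        byte[] escapedFile = "menu.file=\\u30D5\\u30A1\\u30A4\\u30EB"
                .getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(toHex(loadValue(utf8File, "menu.file")));
        System.out.println(toHex(loadValue(escapedFile, "menu.file")));
    }
}
```

The first line printed is 00E3 0083 0095 00E3 0082 00A1 00E3 0082 00A4 00E3 0083 00AB, which is the UTF-8 byte sequence E3 83 95, E3 82 A1, E3 82 A4, E3 83 AB with each byte treated as its own character. The second line is 30D5 30A1 30A4 30EB.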

If you don't want to type all that by hand, use the native2ascii tool
(comes with the JDK) to convert some non Latin-1 file (e.g. UTF-8) to
Latin-1 with \u escapes.

I suggest you add the conversion via native2ascii to your build system,
so the conversion is automated whenever you change the input file.
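
If it helps to see what that conversion amounts to, here is a rough sketch of the escaping step in Java. This is not the native2ascii source, only an approximation of its effect; the class and method names are hypothetical, and it assumes the input has already been decoded into a String with the right encoding:

```java
class AsciiEscaper {
    // Replaces every char outside 7-bit ASCII with a backslash-u escape of its
    // UTF-16 code unit. Characters outside the BMP are stored as two code units
    // and therefore become two escapes, which Properties reads back as a pair.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical properties line with the word from the question.
        String line = "menu.file=\u30D5\u30A1\u30A4\u30EB";
        System.out.println(escapeNonAscii(line));
    }
}
```

Running it prints the line with the value written as four backslash-u escapes (30D5, 30A1, 30A4, 30EB), which is the form a Latin-1 properties file can carry safely.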

/Thomas
 
Delia

Thomas said:
In general, I would suggest you read the documentation for Properties.
There you will find out the following:

A properties file is supposed to be in ISO Latin-1, and nothing else.
If you need characters in a properties file outside the Latin-1 range
you need to use the \u.... notation to enter the codes for the
characters.

If you don't want to type all that by hand, use the native2ascii tool
(comes with the JDK) to convert some non Latin-1 file (e.g. UTF-8) to
Latin-1 with \u escapes.

I suggest you add the conversion via native2ascii to your build
system, so the conversion is automated whenever you change the input
file.

/Thomas

You are indeed correct about the documentation.

I found it and it says what you say.

** gets on soapbox **

I just had a hard time bringing myself to believe that Java I18N would
be so limited when it has all the resources of Taligent folded into it.

Everyone always pushes how web-enabled Java is and how great it is with
I18N/L10N. I was thinking there had to be some update or better way of
doing something that is integral to the _world_ wide web and I18N.

** off soapbox **
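
On the question of a better way: one option, assuming Java 6 or later, is the Properties.load(Reader) overload, which lets the caller pick the encoding instead of being tied to Latin-1. The sketch below uses a hypothetical class name and key. As far as I know, Java 9 and later also read ResourceBundle properties files as UTF-8 by default, but check the documentation for the version you are on.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class Utf8PropertiesLoader {
    // Reads a properties stream as UTF-8 by wrapping it in a Reader,
    // so the bytes are decoded before Properties parses the text.
    static Properties loadUtf8(InputStream in) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Stand-in for a UTF-8 encoded .properties file (hypothetical key name).
        byte[] utf8File = "menu.file=\u30D5\u30A1\u30A4\u30EB"
                .getBytes(StandardCharsets.UTF_8);
        String value = loadUtf8(new ByteArrayInputStream(utf8File)).getProperty("menu.file");

        // Print the code points in hex to avoid depending on the console encoding.
        for (char c : value.toCharArray()) {
            System.out.printf("%04X ", (int) c);
        }
        System.out.println();
    }
}
```

With this approach the file can stay in UTF-8 and no escaping step is needed, at the cost of loading the Properties yourself rather than going through the default ResourceBundle lookup.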
 
Todd Carnes

I work with Japanese text occasionally & I would just like to add that you
might want to be careful about cutting & pasting what you get from Babel
and then "assuming" it's Unicode. Most likely it's not. There are
several different encodings for Japanese & if I were forced to choose
which I thought was most prevalent, I'd have to say that Unicode is the
least used encoding scheme. I think you're much more likely to run into
Shift-JIS or EUC-JP encoding; at least that's what I usually see on the web.
 
Michael Borgwardt

Todd said:
I work with Japanese text occasionally & I would just like to add that you
might want to be careful about cutting & pasting what you get from Babel
and then "assuming" it's Unicode. Most likely it's not.

Yes, it definitely is. Clipboard functionality doesn't just transfer bytes
(and why the hell would the browser return the binary HTML source when
asked for the marked characters?), it is quite complex and knows different
data types. The clipboard contains abstract characters, just like Java.
It's up to the source and target applications to provide and process them.
 
