native2ascii

wkijava

I have two questions regarding native2ascii:

1. I receive an xls file from a Japanese Windows user. It has two
columns: the key for a properties file and the Japanese text value for
that key. I intend to create a properties file from this, which I can
run through native2ascii. But if I export the xls file as Unicode text,
the (Latin) key values also get converted into 2-byte characters; this
results in a space between the characters, thus corrupting the key:
e.g. bws.welcome becomes b w s . w e l c o m e . How can I solve this
issue?

2. If I create a text file with the Japanese text only and save it as
Unicode, I can run it through native2ascii. But whatever encoding I
use, the Unicode values generated are not valid and browsers don't
display anything resembling Japanese. The encodings I tried are: SJIS,
MS932, EUC_JP and ISO2022JP. JISAutoDetect does not work; the command
complains that the encoding is not found. I use SDK 1.4.2_06.
My Japanese colleague says that he has a standard Japanese Windows
with no special settings. SJIS works best, but the translation still
contains values which are obviously not Unicode. Any suggestions?

Kind regards,
Wolfgang
 
Thomas Weidenfeller

wkijava said:
1. I receive an xls file from a Japanese Windows user. It has two
columns: the key for a properties file and the Japanese text value for
that key. I intend to create a properties file from this, which I can
run through native2ascii. But if I export the xls file as Unicode text,

There is not one "Unicode text" format. There are a couple of Unicode
encodings like UTF-8, UTF-16LE, UTF-16BE. Figure out what you really got.

wkijava said:
(Latin) key values also get converted into 2-byte characters; this
results in a space between the characters, thus corrupting the key:
e.g. bws.welcome becomes b w s . w e l c o m e . How can I solve this
issue?

Up to this point there is no issue. You have likely written a file in
UTF-16 (LE or BE).
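
To illustrate why the keys only seem to gain spaces: in UTF-16 every Latin letter takes two bytes, one of which is zero, and a viewer that assumes a single-byte encoding shows that zero byte as a gap. A minimal sketch in modern Java (the class name Utf16Bytes is made up for this example, and StandardCharsets needs Java 7 or later, not the SDK 1.4.2 from the question):

```java
import java.nio.charset.StandardCharsets;

public class Utf16Bytes {
    /** Formats bytes as space-separated two-digit hex values. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Encode a Latin-only key the way a UTF-16LE text export would.
        byte[] utf16le = "bws".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(hex(utf16le)); // prints: 62 00 77 00 73 00

        // Reading the same bytes with a single-byte charset yields six
        // characters: each letter followed by a NUL that looks like a gap.
        String misread = new String(utf16le, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length()); // prints: 6
    }
}
```

The key itself is intact; it only looks corrupted when the file is read with the wrong encoding.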

wkijava said:
2. If I create a text file with the Japanese text only and save it as
Unicode, I can run it through native2ascii. But whatever encoding I
use, the Unicode values generated are not valid and browsers don't
display anything resembling Japanese. The encodings I tried are: SJIS,
MS932, EUC_JP and ISO2022JP. JISAutoDetect does not work; the command
complains that the encoding is not found.

a) Why do you think that should work? You have just told us that you
saved the file in some Unicode encoding. None of the above encodings
is a Unicode encoding.

b) Why do you think a browser should display any Japanese from the
output of native2ascii? native2ascii does not write HTML. The Japanese
characters in the native2ascii output are not HTML character entities.
They are written as Java's \u Unicode escape sequences in plain ASCII.
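
As a rough illustration of what such escape sequences look like, here is a small sketch that turns every non-ASCII character into a \uXXXX escape and leaves ASCII alone. The class and method names are invented for this example, and it is only an approximation of the idea, not native2ascii's actual implementation:

```java
public class UnicodeEscaper {
    /** Replaces each char above 0x7f with a backslash-u escape of four hex digits. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c); // plain ASCII passes through unchanged
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The value is the word for "Japanese language", written with
        // escapes so that this source file itself stays pure ASCII.
        System.out.println(escape("bws.welcome=\u65e5\u672c\u8a9e"));
    }
}
```

Running it prints the line bws.welcome=\u65e5\u672c\u8a9e, which is plain ASCII. A properties loader turns the escapes back into Japanese characters; a browser shown that text directly would just display the backslashes.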

wkijava said:
contains values which are obviously not Unicode. Any suggestions?

Specify the Unicode encoding in which you have written your data as
argument to native2ascii.

/Thomas
 
wkijava

Well, actually your tip helped. By the way, Excel only offers export
to 'Unicode text'; it's not more specific. But I have to admit that I
misunderstood the native2ascii documentation. I ran it with encoding
UTF-16 (just a guess) and it worked. The properties file is now
processed properly and the web application shows Japanese characters.
Thanks very much.
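
For anyone who would rather do the whole conversion in Java, the steps can be sketched roughly as follows. This assumes the export is tab-separated UTF-16 text with a byte order mark and that no cell contains tabs or quotes; the class name SheetToProperties and the command-line arguments are made up for the example, and it needs Java 7 or later:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Properties;

public class SheetToProperties {
    /** Builds a Properties object from lines of the form key TAB value. */
    static Properties parse(List<String> lines) {
        Properties props = new Properties();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab <= 0) {
                continue; // skip blank lines and lines without a key
            }
            props.setProperty(line.substring(0, tab).trim(),
                              line.substring(tab + 1).trim());
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Java's "UTF-16" charset reads the byte order mark to choose
        // between little-endian and big-endian decoding.
        List<String> lines =
                Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_16);
        try (OutputStream out = Files.newOutputStream(Paths.get(args[1]))) {
            // store() writes ISO 8859-1 and escapes everything outside
            // printable ASCII, so no separate native2ascii run is needed.
            parse(lines).store(out, null);
        }
    }
}
```

Note that Properties does not keep the keys in their original order, so the output file will list them in a different sequence than the spreadsheet.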
 
Roedy Green

wkijava said:
1. I receive an xls file from a Japanese Windows user. It has two
columns: the key for a properties file and the Japanese text value for
that key. I intend to create a properties file from this, which I can
run through native2ascii. But if I export the xls file as Unicode text,
the (Latin) key values also get converted into 2-byte characters; this
results in a space between the characters, thus corrupting the key:
e.g. bws.welcome becomes b w s . w e l c o m e . How can I solve this
issue?

Your file is partly encoded in one way and partly in another.
native2ascii is not prepared to deal with that. You will have to write
some custom code to open the file and read the two different sections
with different encodings, or split the file in two and process the two
halves conventionally.

The fileio amanuensis will generate a skeleton program to read the
pieces.

See http://mindprod.com/applets/fileio.html

--
Bush crime family lost/embezzled $3 trillion from Pentagon.
Complicit Bush-friendly media keeps mum. Rumsfeld confesses on video.
http://www.infowars.com/articles/us/mckinney_grills_rumsfeld.htm

Canadian Mind Products, Roedy Green.
See http://mindprod.com/iraq.html photos of Bush's war crimes
 
Roedy Green

wkijava said:
2. If I create a text file with the Japanese text only and save it as
Unicode, I can run it through native2ascii. But whatever encoding I
use, the Unicode values generated are not valid and browsers don't
display anything resembling Japanese. The encodings I tried are: SJIS,
MS932, EUC_JP and ISO2022JP. JISAutoDetect does not work; the command
complains that the encoding is not found. I use SDK 1.4.2_06.
My Japanese colleague says that he has a standard Japanese Windows
with no special settings. SJIS works best, but the translation still
contains values which are obviously not Unicode. Any suggestions?

I suggest you have a peek at some files on the net that do display
Japanese correctly. I suspect they may start with a Unicode header
and then flip to Japanese for the body, or something similar.

There needs to be a way of noting the encoding of a file in some
standard way.
Kludges include:

1. a companion file with the name originalfile.encoding

2. embedded code in first few bytes.

3. hiding it in the filename somewhere.

I think the world is resisting this hoping that non-Unicode encodings
will disappear. See http://mindprod.com/jgloss/encoding.html
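
One form of the "embedded code in first few bytes" idea already exists for Unicode files: the byte order mark. A minimal sketch of checking for it, assuming only the UTF-8 and UTF-16 marks matter (the class name BomSniffer is invented here, and UTF-32, whose little-endian mark also starts with FF FE, is deliberately ignored):

```java
public class BomSniffer {
    /** Returns the charset name implied by a leading byte order mark, or null if there is none. */
    static String detect(byte[] head) {
        if (head.length >= 3 && (head[0] & 0xff) == 0xEF
                && (head[1] & 0xff) == 0xBB && (head[2] & 0xff) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xff) == 0xFF && (head[1] & 0xff) == 0xFE) {
            return "UTF-16LE";
        }
        if (head.length >= 2 && (head[0] & 0xff) == 0xFE && (head[1] & 0xff) == 0xFF) {
            return "UTF-16BE";
        }
        return null; // no mark: the encoding has to be known some other way
    }

    public static void main(String[] args) {
        // First bytes of a little-endian UTF-16 file that starts with "b".
        byte[] head = {(byte) 0xFF, (byte) 0xFE, 0x62, 0x00};
        System.out.println(detect(head)); // prints: UTF-16LE
    }
}
```

Legacy encodings such as SJIS or EUC_JP carry no such mark, which is why a result of null says nothing about which of them a file uses.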

 
