Problems displaying Japanese characters in alert boxes


Thomas Kraft

Hi List,

I have a problem displaying Asian characters (Japanese or Chinese) in a
JavaScript alert box.

I have: An Excel file from my customer with Japanese and Chinese strings.
I need: A JavaScript alert box which shows exactly the same strings.

What I tried:

- I tried to put these strings directly as a string into the alert box.
This shows strange characters. For example:
alert('ブ'); shows three characters, and 'ブ'.length is also 3.

- I tried some encodings I found in the history of this group, like
alert('\u23455'); or alert('&#23455;'); without any success. The \u
variant showed two characters (an empty box and a 5), and the & variant
showed exactly &#23455;, so the escaping did not work with either variant.

What I found:

Something like alert(String.fromCharCode(55038)); shows me some
(chinese?) character in the alert box.

What I try to reach now:

I need to know how I will be able to convert these strings from Excel to
values I can use in this fromCharCode() function (if this is somehow
possible).

When I save this single Japanese character from above in a text file
and save this to hard disk and look at this file with a hex editor, it shows
this:

E3 A3 96

I spent hours converting this 3-byte hex string to decimal numbers
and putting them into the alert box to see whether this Japanese character would appear,
but without success. I saw a lot of funny Asian symbols as well as some
Arabic symbols, but never managed to see this specific character.

Does anyone have an idea how to find out which code I need to see my Japanese
and Chinese characters? Any help will be highly appreciated.

Thanks in advance,

Thomas Kraft
 

Thomas 'PointedEars' Lahn

Thomas said:

Hello. This is a Usenet newsgroup, not a mailing list.
I have: An Excel file from my customer with Japanese and Chinese strings.

That would imply data in either two different encodings where one set of
characters is displayed wrong, or Unicode characters encoded in a UTF.
I need: A JavaScript alert box which shows exactly the same strings.

What I tried:

- I tried to put these strings directly as a string into the alert box.
This shows strange characters. For example:
alert('ブ'); shows three characters, and 'ブ'.length is also 3.

If you paste the character verbatim, the target resource encoding must match
the original encoding (provided the clipboard preserves the encoding), and
the proper encoding needs to be declared in the Content-Type HTTP header of
the relevant resource; maybe that is required also for the including
resource. The Content-Type `meta' element of an HTML document, if included,
should not differ from the header value (with HTTP, the header should take
precedence; without it there is no header, so the `meta' element is
supposedly interpreted).

Also, the script engine that you are using needs to be Unicode-aware, i.e.
it has to be JavaScript 1.3 or JScript 3.0 to 5.0, or it has to implement
the ECMAScript Language Specification, Edition 3 (e.g. JavaScript 1.5+,
JScript 5.5+).

You have not said which HTML user agent(s) you have tested with.
- I tried some encodings I found in the history of this group, like
alert('\u23455'); or alert('&#23455;'); without any success.

Neither one is supposed to work. `\u' must be followed by the *hexadecimal*
notation of the Unicode code point for a proper Unicode literal. `&#'
starts a character reference in SGML-based *markup* languages; it would only
work if the script code was pre-parsed by an XML parser as it was part of
the content of an XHTML `script' element served with an XML MIME media type.
And it would have to be followed by the decimal notation of the codepoint
or `x' and the hexadecimal notation then, whereas the former is more compatible.

[en] http://jibbering.com/faq/
[de] http://www.dodabo.de/charset/index.html
The \u variant showed two characters (empty box and a 5)

No surprise here. The specified and implemented syntax is \uhhhh, with `h'
designating a hexadecimal digit, and you have used five digits. So you have
tried to display the equivalent of '\u2345' + '5'. U+2345 is named "APL
FUNCTIONAL SYMBOL LEFTWARDS VANE" (Miscellaneous Technical). You need a
special Unicode font for that character range in order to display the
character anywhere. As a placeholder, user agents are known to show either
a rectangle glyph (MSHTML-based) or a question mark (Gecko-based).
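
To illustrate the \uhhhh rule above, here is a minimal JavaScript sketch. It assumes that the 23455 in the original attempt was meant as a decimal code point; the variable names are made up for illustration. Under that assumption the matching escape is the four-digit hexadecimal form \u5B9F.

```javascript
// '\u' consumes exactly four hexadecimal digits; a fifth digit is an ordinary character.
var tooLong = '\u23455';                 // same string as '\u2345' + '5'

// Assumption: 23455 was intended as a decimal code point.
var decimalCodePoint = 23455;
var hex = decimalCodePoint.toString(16).toUpperCase();   // hexadecimal digits for the escape
var fromDecimal = String.fromCharCode(decimalCodePoint); // builds the character from the decimal value

console.log(tooLong.length);             // 2
console.log(hex);                        // 5B9F
console.log(fromDecimal === '\u5B9F');   // true
```

So alert('\u5B9F') and alert(String.fromCharCode(23455)) should both show the same single character, provided a font for it is available.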
and the & variant showed exactly &#23455;, so the escaping did not work
with either variant.

BAD. Broken as designed.
Something like alert(String.fromCharCode(55038)); shows me some
(chinese?) character in the alert box.

U+D6FE (훾; decimal codepoint 55038) has no name but belongs to the "Hangul
Syllables" character range. "Hangul [...] is the native alphabet of the
Korean language. [...] It is the official script of North Korea, South
Korea and the Yanbian Korean Autonomous Prefecture of China." (Wikipedia)
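
As a rough sketch of how the decimal value passed to String.fromCharCode() maps to the U+hhhh notation used above: the helper names toCodePointLabel and isHangulSyllable are hypothetical, and the range check assumes the assigned Hangul Syllables range U+AC00 to U+D7A3.

```javascript
// Hypothetical helper: format a decimal code unit as a "U+hhhh" label.
function toCodePointLabel(codeUnit) {
  var hex = codeUnit.toString(16).toUpperCase();
  while (hex.length < 4) { hex = '0' + hex; }  // pad to at least four hex digits
  return 'U+' + hex;
}

// Hypothetical helper: assumes the assigned Hangul Syllables range U+AC00..U+D7A3.
function isHangulSyllable(codeUnit) {
  return codeUnit >= 0xAC00 && codeUnit <= 0xD7A3;
}

var label = toCodePointLabel(55038);
var hangul = isHangulSyllable(55038);
var sameChar = String.fromCharCode(55038) === '\uD6FE';

console.log(label, hangul, sameChar);  // U+D6FE true true
```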
What I try to reach now:
              ^^^^^
Being a fellow German, I think you don't mind if I tell you that IMHO the
proper word in this context is "(to) realise/realize" or simply "(to) do" :)

BTW, did you know that there is de.comp.lang.javascript?
I need to know how I will be able to convert these strings from Excel to
values I can use in this fromCharCode() function (if this is somehow
possible).

I don't think you need to.
When I save this single Japanese character from above in a text file

IIRC, Excel (2003) saves to a text file encoded with UTF-16LE. Maybe you
need to convert this to UTF-8 if your HTML document or script resource is
UTF-8 encoded; you can use Notepad for that.
and save this to hard disk and look at this file with a hex editor, it shows
this:

E3 A3 96

According to <http://people.w3.org/rishida/scripts/uniview/conversion>,
those are the UTF-8 code units for U+38D6, the "Han ideograph" from the
character range "CJK Unified Ideographs Extension A"; so probably Chinese.
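
For illustration, the three bytes can also be decoded by hand. The sketch below assumes a well-formed three-byte UTF-8 sequence (bit pattern 1110xxxx 10xxxxxx 10xxxxxx) and does no validation; the function name decodeUtf8ThreeBytes is hypothetical.

```javascript
// Hypothetical helper: decode one well-formed three-byte UTF-8 sequence.
// Layout: 1110xxxx 10xxxxxx 10xxxxxx -> 4 + 6 + 6 payload bits.
function decodeUtf8ThreeBytes(b1, b2, b3) {
  return ((b1 & 0x0F) << 12) |  // low 4 bits of the lead byte
         ((b2 & 0x3F) << 6)  |  // low 6 bits of the first continuation byte
          (b3 & 0x3F);          // low 6 bits of the second continuation byte
}

var codePoint = decodeUtf8ThreeBytes(0xE3, 0xA3, 0x96);
var asHex = codePoint.toString(16).toUpperCase();

console.log(codePoint);  // 14550
console.log(asHex);      // 38D6
```

The decimal result is the value that String.fromCharCode() expects, and the hexadecimal form is what goes after \u in a string literal.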


HTH

PointedEars
 

Thomas Kraft

Hello. This is a Usenet newsgroup, not a mailing list.

Oops, sorry, but I am used to writing to mailing lists ;)

[...]
If you paste the character verbatim, the target resource encoding must match
the original encoding (provided the clipboard preserves the encoding), and
the proper encoding needs to be declared in the Content-Type HTTP header of
the relevant resource; maybe that is required also for the including
resource. The Content-Type `meta' element of an HTML document, if included,
should not differ from the header value (with HTTP, the header should take
precedence; without it there is no header, so the `meta' element is
supposedly interpreted).

OK, this may be a problem, because I can only edit the content of the
HTML file; I will not be able to change the headers of the files.
Also, the script engine that you are using needs to be Unicode-aware, i.e.
it has to be JavaScript 1.3 or JScript 3.0 to 5.0, or it has to implement
the ECMAScript Language Specification, Edition 3 (e.g. JavaScript 1.5+,
JScript 5.5+).

You have not said which HTML user agent(s) you have tested with.

I tested with Firefox 2.0.0.11 on Windows and Linux, Internet
Explorer 6.0.2900.[...] on Windows, Opera 9.25 on Windows, and Konqueror
3.5.8 on Linux.
- I tried some encodings I found in the history of this group, like
alert('\u23455'); or alert('&#23455;'); without any success.

Neither one is supposed to work. `\u' must be followed by the *hexadecimal*
notation of the Unicode code point for a proper Unicode literal. `&#'
starts a character reference in SGML-based *markup* languages; it would only
work if the script code was pre-parsed by an XML parser as it was part of
the content of an XHTML `script' element served with an XML MIME media type.
And it would have to be followed by the decimal notation of the codepoint
or `x' and the hexadecimal notation then, whereas the former is more compatible.

[en] http://jibbering.com/faq/
[de] http://www.dodabo.de/charset/index.html

OK, I was just desperate and tried out every snippet of code I
found on the Internet to see whether any of them would work.
No surprise here. The specified and implemented syntax is \uhhhh, with `h'
designating a hexadecimal digit, and you have used five digits. So you have
tried to display the equivalent of '\u2345' + '5'. U+2345 is named "APL
FUNCTIONAL SYMBOL LEFTWARDS VANE" (Miscellaneous Technical). You need a
special Unicode font for that character range in order to display the
character anywhere. As a placeholder, user agents are known to show either
a rectangle glyph (MSHTML-based) or a question mark (Gecko-based).

OK, thanks a lot for explaining. I should have known that \u is followed
by hex, and not decimal. It seems that I was not able to think clearly
yesterday evening ;)

BAD. Broken as designed.

Same here, I was desperate and just tried every snippet I found on the net.
Something like alert(String.fromCharCode(55038)); shows me some
(chinese?) character in the alert box.

U+D6FE (훾; decimal codepoint 55038) has no name but belongs to the "Hangul
Syllables" character range. "Hangul [...] is the native alphabet of the
Korean language. [...] It is the official script of North Korea, South
Korea and the Yanbian Korean Autonomous Prefecture of China." (Wikipedia)

Again thanks for the info :)
What I try to reach now:
              ^^^^^
Being a fellow German, I think you don't mind if I tell you that IMHO the
proper word in this context is "(to) realise/realize" or simply "(to) do" :)

Of course I don't mind :)
BTW, did you know that there is de.comp.lang.javascript?

Not yet, but I will subscribe there.
I don't think you need to.

After solving this, I think you are right ;)
IIRC, Excel (2003) saves to a text file encoded with UTF-16LE. Maybe you
need to convert this to UTF-8 if your HTML document or script resource is
UTF-8 encoded; you can use Notepad for that.

I just copied this character from OpenOffice Calc (I don't have a
Microsoft Office license), pasted it into a new text file in Kate,
and saved it, just because I wanted to know how it would be saved at the binary
level.
According to <http://people.w3.org/rishida/scripts/uniview/conversion>,
those are the UTF-8 code units for U+38D6, the "Han ideograph" from the
character range "CJK Unified Ideographs Extension A"; so probably Chinese.

This link is exactly what I was searching for. With it I can
convert the contents of this Excel file to JavaScript, and this works
like a charm on every user agent I have tried so far. For example, there is a
text "expand all", which in Japanese is "すべて開く" and in Chinese would
be "全部展开".

This tool from the W3C converts these strings to JavaScript:
Japanese: \u3059\u3079\u3066\u958B\u304F
Chinese: \u5168\u90E8\u5C55\u5F00

When I put these into my alert() box, I can see exactly the same characters
that appear in the Excel file.
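
The same conversion can be sketched directly in JavaScript. This is not the code of the W3C tool, just a minimal illustration: toUnicodeEscapes is a hypothetical helper that keeps ASCII characters as they are and writes everything else as \uhhhh, one escape per UTF-16 code unit. Quotes and backslashes in the input are not escaped here and would still need manual handling.

```javascript
// Hypothetical helper: turn non-ASCII characters into \uhhhh escapes.
function toUnicodeEscapes(text) {
  var out = '';
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);           // UTF-16 code unit at position i
    if (code < 0x80) {
      out += text.charAt(i);                 // ASCII is kept verbatim
    } else {
      var hex = code.toString(16).toUpperCase();
      while (hex.length < 4) { hex = '0' + hex; }
      out += '\\u' + hex;                    // literal backslash, 'u', four hex digits
    }
  }
  return out;
}

// The inputs are written as escapes so the example does not depend on the file encoding.
var japanese = toUnicodeEscapes('\u3059\u3079\u3066\u958B\u304F');
var chinese = toUnicodeEscapes('\u5168\u90E8\u5C55\u5F00');

console.log(japanese);  // \u3059\u3079\u3066\u958B\u304F
console.log(chinese);   // \u5168\u90E8\u5C55\u5F00
```

Because the output contains only ASCII, a script built this way should display the same characters regardless of how the HTML file or its HTTP header declares the encoding.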

You made my day! Thanks a lot.

Thomas
 
