Prototype, Safari and Japanese problems?

Doug Lerner

I'm working on a client/server app that seems to work fine in OS X Firefox and
Windows IE and Firefox.

However, in OS X Safari, although the UI/communications themselves work
fine, if the characters getting sent back and forth are in Japanese they
come back from the server "moji bake" (corrupted).

Anybody have any ideas why this might work differently in Safari than in
Firefox or IE?

Thanks!

doug
 
Martin Honnen

Doug Lerner wrote:

However, in OS X Safari, although the UI/communications themselves work
fine, if the characters getting sent back and forth are in Japanese they
come back from the server "moji bake" (corrupted).

Consider using UTF-8 encoded Unicode characters and make sure the
server sends an HTTP header declaring e.g.
Content-Type: application/xml; charset=UTF-8
or if you send plain text then e.g.
Content-Type: text/plain; charset=UTF-8
I assume you use XMLHttpRequest to send and receive data; with the above,
both responseXML and responseText should hopefully work correctly
(although I am not familiar with details of the Safari implementation).
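
To make the header advice concrete, here is a minimal sketch of a server-side handler. It assumes a Node-style response object offering `setHeader()` and `end()`; the function name `sendSnippet` is made up for illustration, and nothing in this thread says what the actual server is written in.

```javascript
// Hypothetical handler: `res` is assumed to behave like Node's http.ServerResponse.
// The point is that the header names the same encoding the body bytes are written in.
function sendSnippet(res, text) {
  // Encode the string as UTF-8 bytes explicitly instead of relying on a default.
  const body = Buffer.from(text, 'utf8');
  res.setHeader('Content-Type', 'text/plain; charset=UTF-8');
  res.setHeader('Content-Length', String(body.length));
  res.end(body);
}
```

The same applies whether the body is a complete document or a fragment that the client will later assign to innerHTML: the header describes the bytes of that one response.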
 
Doug Lerner

Martin Honnen wrote:

Consider using UTF-8 encoded Unicode characters and make sure the
server sends an HTTP header declaring e.g.
Content-Type: application/xml; charset=UTF-8
or if you send plain text then e.g.
Content-Type: text/plain; charset=UTF-8
I assume you use XMLHttpRequest to send and receive data; with the above,
both responseXML and responseText should hopefully work correctly
(although I am not familiar with details of the Safari implementation).

Thanks for your response.

In my case, the return data is not being used to replace an entire HTML
page, it is being used to set the innerHTML for a <div> tag. So sending the
HTTP header wouldn't be appropriate in this case, would it?

The charset for the entire page itself is correct already.

Any thoughts about this?

Thanks!

doug
 
VK

Doug said:
Thanks for your response.

In my case, the return data is not being used to replace an entire HTML
page, it is being used to set the innerHTML for a <div> tag. So sending the
HTTP header wouldn't be appropriate in this case, would it?

The charset for the entire page itself is correct already.

Any thoughts about this?

AFAIK browser page is a "single encoding" unit therefore you cannot
display one paragraph in iso-8859-1 and another in say Shift_JIS. This
is why Unicode became originally needed. Did you try to have the entire
page served in UTF-8 including any further server interchange?
 
Martin Honnen

Doug Lerner wrote:

In my case, the return data is not being used to replace an entire HTML
page, it is being used to set the innerHTML for a <div> tag. So sending the
HTTP header wouldn't be appropriate in this case, would it?

The charset for the entire page itself is correct already.

If you use XMLHttpRequest and the responseText property then the client
receiving the response needs to know the charset/encoding of the
response to build responseText correctly. At least in theory, MSXML used
by IE does not care about HTTP headers when building responseText and
assumes UTF-8 so that is why I suggested to use UTF-8 and to make that
known with the HTTP header as that way MSXML and Mozilla and Opera and
hopefully Safari should then give you responseText with any characters
Unicode can encode, be that Japanese or whatever else.
 
Doug Lerner

AFAIK browser page is a "single encoding" unit therefore you cannot
display one paragraph in iso-8859-1 and another in say Shift_JIS. This
is why Unicode became originally needed. Did you try to have the entire
page served in UTF-8 including any further server interchange?

Yes, I tried that. Something just seems "different" about the way it is
working in Safari. With Firefox or IE, it seems to work fine, even with the
Shift_JIS charset.

But with Safari the Japanese seems to get corrupted. I think it *is* sending
it to the server in UTF-8. That is, if I log the received data on the server
side and examine it, it appears to be UTF-8 data. But when I send it back to
the browser, even if the browser's charset is UTF-8, it shows up looking
corrupted.

doug
 
Doug Lerner

Martin Honnen wrote:

If you use XMLHttpRequest and the responseText property then the client
receiving the response needs to know the charset/encoding of the
response to build responseText correctly. At least in theory, MSXML used
by IE does not care about HTTP headers when building responseText and
assumes UTF-8 so that is why I suggested to use UTF-8 and to make that
known with the HTTP header as that way MSXML and Mozilla and Opera and
hopefully Safari should then give you responseText with any characters
Unicode can encode, be that Japanese or whatever else.

Thanks for your note. As mentioned in my note, the charset for the entire
page being served was correct. And I tried UTF-8 instead, but that didn't
seem to help.

It displays fine with all browsers except for Safari. I'm sort of stumped.

doug
 
Martin Honnen

Doug Lerner wrote:

Thanks for your note. As mentioned in my note, the charset for the entire
page being served was correct.

I am not sure I understand what an entire page is or why that matters or
what you are doing exactly, no wonder as you have not posted any code so
far.
If you serve one HTML document with script and that script makes further
requests to the server using XMLHttpRequest then the server needs to
make sure it sends proper HTTP response headers (with Content-Type and
charset parameter) in its response to any request the script makes if
the browser's XMLHttpRequest implementation should have a chance to
build responseText properly. It does not matter whether you consider such
a response a complete HTML document, or whether it is one: whenever an HTTP
response is processed, building the responseText string requires knowing
the charset, so that the bytes in the response body can be decoded properly
into that string.
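
The decoding step described above can be illustrated outside the browser. This is only a sketch using Node's `Buffer` as a stand-in for what an XMLHttpRequest implementation does internally; the sample string and variable names are assumptions, and it does not reproduce Safari's actual behavior.

```javascript
// The bytes a server might put in the response body: "日本語" encoded as UTF-8.
const bodyBytes = Buffer.from('日本語', 'utf8');

// Decoded with the charset the bytes were written in: the original text.
const decodedRight = bodyBytes.toString('utf8');

// Decoded with a wrong single-byte charset: every byte becomes its own character.
const decodedWrong = bodyBytes.toString('latin1');

console.log(decodedRight.length); // 3
console.log(decodedWrong.length); // 9
```

The bytes are the same in both cases; only the charset assumed by the decoder differs, which is why the response needs to declare it.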
 
Thomas 'PointedEars' Lahn

VK said:
AFAIK browser page is a "single encoding" unit
True.

therefore you cannot display one paragraph in iso-8859-1 and another in
say Shift_JIS.

False. Character references and character entity references have long
existed to work around this issue. For example, it is perfectly reasonable,
possible and Valid to use the hexadecimal byte sequence 26 23 38 32 31 31
3B 0A, which represents the decimal character reference `&#8211;', for
displaying the Unicode character named "EN DASH" (U+2013) in an HTML
resource encoded with ISO-8859-1 and served as such. In fact, the same
byte sequence could _not_ result in that character reference if the
resource was encoded with UTF-16 or UTF-32 and served as such (in UTF-8
the bytes would be identical, since all of these characters are ASCII).
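
As a rough illustration of the character-reference approach, the following sketch turns every non-ASCII character into a decimal character reference, so the resulting markup contains only ASCII bytes. The function name `toCharRefs` is hypothetical, and the sketch deliberately does not escape `&`, `<` or `>`.

```javascript
// Replace each non-ASCII character with a decimal character reference.
function toCharRefs(text) {
  let out = '';
  // for...of iterates by code point, so characters outside the BMP stay whole.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    out += codePoint > 0x7f ? '&#' + codePoint + ';' : ch;
  }
  return out;
}

console.log(toCharRefs('–'));    // &#8211;
console.log(toCharRefs('日本')); // &#26085;&#26412;
```
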
This is why Unicode became originally needed.

Not quite. The reason for creating the Unicode standard and subsequently
the Unicode character set was that _one_ standard and _one_ character set
for all characters was needed, so that one encoding would suffice for all
characters and all textual resources, not only SGML-conforming ones such
as HTML documents, and that the latter then could contain the characters
as they are, without any character reference or character entity reference
which would save bandwidth and disk space.

Did you try to have the entire page served in UTF-8 including any further
server interchange?

The encoding of an SGML-conforming markup resource, that is, how the
character data of a resource is encoded, does not have any impact on
the characters that can be displayed with it, that is, what character
sets can be used to display that data.

Despite its name, the "charset" label of the HTTP Content-Type header
specifies the _encoding_ of the served resource, not necessarily the
character set(s) that is/are to be used to display it; the confusing
name of the label is because of the history of MIME to which HTTP had
to adhere.


HTH

PointedEars
 
Thomas 'PointedEars' Lahn

Doug said:
I'm working on a client/server app that seems to work fine in OS X Firefox
and Windows IE and Firefox.

However, in OS X Safari, although the UI/communications themselves work
fine, if the characters getting sent back and forth are in Japanese they
come back from the server "moji bake" (corrupted).

Anybody have any ideas why this might work differently in Safari than in
Firefox or IE?

Do not follow the suggestions to serve a resource as UTF-8-encoded
if it is not UTF-8-encoded; that would be harmful.

Check the Content-Type header of any related response for the "charset"
label; make sure it is present and specifies the encoding the served
resource is actually encoded with. Check the response body of any
related (X)HTML resource for

<meta http-equiv="Content-Type" content="text/html; charset=..." ...>

and if found, correct it so that it conforms with the label in the
Content-Type header it is served with.
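
The consistency check described above can be sketched as a small helper that extracts the "charset" label from a Content-Type value and compares the HTTP header against the meta element's `content` attribute. Both function names are hypothetical, and the parsing is deliberately simple rather than a full media-type parser.

```javascript
// Return the lowercased charset label of a Content-Type value, or null if absent.
function charsetOf(contentType) {
  const match = /;\s*charset\s*=\s*"?([^\s;"]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}

// True only if both values carry a charset label and the labels are the same.
function declarationsAgree(httpHeaderValue, metaContentValue) {
  const fromHeader = charsetOf(httpHeaderValue);
  const fromMeta = charsetOf(metaContentValue);
  return fromHeader !== null && fromHeader === fromMeta;
}

console.log(declarationsAgree('text/html; charset=Shift_JIS', 'text/html; charset=shift_jis')); // true
console.log(declarationsAgree('text/html; charset=UTF-8', 'text/html; charset=Shift_JIS'));     // false
```

Agreement between the two declarations says nothing about whether the bytes really are in that encoding; that still has to be checked against the resource itself.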

Do not serve XHTML with Content-Type: text/html.

If you send data, make sure that it is sent as
application/x-www-form-urlencoded; if you do not encode it yourself,
make sure that Safari is able to percent-encode those Japanese characters
properly according to that format when submitted.
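
For encoding the data yourself, a minimal sketch follows. `encodeURIComponent` always percent-encodes the UTF-8 bytes of each character, regardless of the encoding of the page; the helper name `formEncode` and the field name `comment` are made up for illustration.

```javascript
// Build a form-encoded request body from a plain object of string fields.
// Note: spaces come out as %20 rather than "+", which servers generally accept too.
function formEncode(fields) {
  return Object.keys(fields)
    .map((name) => encodeURIComponent(name) + '=' + encodeURIComponent(fields[name]))
    .join('&');
}

const requestBody = formEncode({ comment: '日本語' });
console.log(requestBody); // comment=%E6%97%A5%E6%9C%AC%E8%AA%9E
```

Because the escapes are UTF-8 based, the server has to decode them as UTF-8, whatever encoding the page itself was served in.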

If that does not help and, as the Subject and your first paragraph
suggests (however, you should have mentioned "Prototype" once in
the message body -- not all people [can] read the Subject), you
use prototype.js to submit the data via XMLHTTP, post the URI of a
test case so that one can determine what you might be doing wrong.

Ceterum censeo prototype.js esse deletam.


PointedEars
 
Doug Lerner

Martin Honnen wrote:

I am not sure I understand what an entire page is or why that matters or
what you are doing exactly, no wonder as you have not posted any code so
far.
If you serve one HTML document with script and that script makes further
requests to the server using XMLHttpRequest then the server needs to
make sure it sends proper HTTP response headers (with Content-Type and
charset parameter) in its response to any request the script makes if
the browser's XMLHttpRequest implementation should have a chance to
build responseText properly. It does not matter whether you consider such
a response a complete HTML document, or whether it is one: whenever an HTTP
response is processed, building the responseText string requires knowing
the charset, so that the bytes in the response body can be decoded properly
into that string.

Let me ask you more about this, because I am obviously confused about this
point.

I am serving one page, that starts with <html> and has the correct charset
header in it.

The XMLHttpRequest is returning text which I am using to set the .innerHTML
of a div section with.

Where in this text would it make sense to store the charset of just the
snippet I am returning from the server?

Thanks!

doug
 
Doug Lerner

Corrupted in what way?
1) Latin ASCII chars ?
2) Unicode "missing glyph" chars ? (empty squares)
3) Japanese glyphs but not the needed ones ?

Hard to describe in words, so here are some screenshots:

Correct (with Firefox): http://lerner.net/doug/jptext1.jpg

Corrupted (with Safari): http://lerner.net/doug/jptext2.jpg

Another note, if I look at the data logged at the server side, I can see it
correctly there even in Safari if I force the browser to UTF-8 encoding. So
it does seem that the data that gets sent through is somehow being converted
to UTF-8.

But forcing the client app I am working on to treat the data coming back as
UTF-8 doesn't help.

Thanks,

doug
 
Thomas 'PointedEars' Lahn

Doug said:
I am serving one page, that starts with <html> and has the correct
charset header in it.
^^^^^
What you call a "correct charset header" here is (due to the description
you provide of it) _not_ what is called a header in Internet messages at
all, but merely a HTML meta[http-equiv] element. This element, especially
with the `http-equiv' attribute value `Content-Type' (case-insensitive),
MUST be _ignored_ by a compliant HTTP/1.1 client if the respective HTTP
header was already sent by the HTTP server. (RFC2616, 3.4.1.)

Furthermore, a "page", which should be in fact a Valid HTML document, MUST
NOT start with `<html>' (RFC2854 "The 'text/html' Media Type", section 5,
explains that 'Almost all HTML files have the string "<html" or "<HTML"
_near the beginning_ of the file'.) HTML is an SGML application, therefore
a DOCTYPE declaration is required for a Valid HTML document prior to the
root element (here: html). This is explained in RFC2854, and specified in
the HTML 3.2, HTML 4.01 and ISO HTML specifications (where the latter is a
standardized version of HTML 4.01 Strict; HTML versions prior to 3.2, such
as HTML 2.0, are obsoleted by RFC2854.)

The XMLHttpRequest is returning text which I am using to set the
.innerHTML of a div section with.

It matters how the Japanese glyphs in the retrieved hypertext snippet are
referenced or encoded. If only character references or character entity
references are used therein, then there should not be a problem even if the
target document, that is, the document containing the `div' element, has a
different encoding. However, if the encoding of the target document and
the retrieved data differ and characters are not referred to as described,
the hypertext data, when included into the target document, is very likely
to be displayed garbled.

Where in this text would it make sense to store the charset of just the
snippet I am returning from the server.

Nowhere _in_ the text, that is, the message body. You should serve either
resource with the appropriate Content-Type _HTTP (message) header_ and
"charset" label as I suggested in my other followup. Due to the confusion
you display here about headers, it is probably a good idea if you learned
more about how HTTP works before; reading RFC2616 would be a good start.


Regards,
PointedEars
 
Doug Lerner

Nowhere _in_ the text, that is, the message body. You should serve either
resource with the appropriate Content-Type _HTTP (message) header_ and
"charset" label as I suggested in my other followup. Due to the confusion
you display here about headers, it is probably a good idea if you learned
more about how HTTP works before; reading RFC2616 would be a good start.

I do know the difference between HTTP headers and the metatag charset
header. What I am not getting is what you are saying about how setting the
HTTP header for the server response will help in this case.

doug
 
Thomas 'PointedEars' Lahn

Doug said:
I do know the difference between HTTP headers and the metatag charset
header.

Again, the latter is _not_ a header at all, and certainly not a "metatag
charset header".

What I am not getting is what you are saying about how setting the
HTTP header for the server response will help in this case.

Why, data should always be served as it is actually encoded. Until you
provide a test case (and not just screenshots), there is nothing more that
can be said about this.


PointedEars
 
Doug Lerner

Why, data should always be served as it is actually encoded. Until you
provide a test case (and not just screenshots), there is nothing more that
can be said about this.

I can't provide a test case that you can actually use for testing unless you
had the same client/server setup I have, which is unlikely...

I understand what you are saying about data always needing to be
served as it is encoded. And I also assume that there is an encoding/serving
mismatch here that is causing the problem. I am just not gleaning from what
you are writing how I might resolve the issue.

I do appreciate your taking the time and effort to respond, I am just not
following the suggestion.

In summary, once again, after I send the Japanese data to the server I can
log it there and if I examine the log via Safari I can see the data
correctly if I set the browser to force the character encoding to UTF-8.

So I am assuming that for some reason when sent via Safari the data is
turning into UTF-8. Or maybe that is happening when sent via all the
browsers, but the other browsers are just clever enough to compensate
regardless of the charset heading in the browser.

However, when the data comes back from the server I can't do the same trick
in Safari and force the page to reload with a UTF-8 character encoding to
see the characters correctly.

So... maybe something is happening to the characters on the way back to
Safari. Again, Firefox and IE are able to compensate for whatever is
happening.

But I don't know exactly what is happening, nor how to compensate for it in
Safari.

I do recognize that this is an encoding issue. I was just wondering if
somebody had some advice about how I might attack the problem.

Thanks!

doug
 
Thomas 'PointedEars' Lahn

Doug said:
I can't provide a test case that you can actually use for testing unless
you had the same client/server setup I have, which is unlikely...

You could at least provide a test case, so that a KHTML user (like me)
could check if the encoding you use is the correct one and what might cause
the data to be transmitted garbled from your resource.

(I am sorry if this reads impatient, but I am quite tired and you provided
not much helpful information that could enable me to help you.)

In summary, once again, after I send the Japanese data

How were they input? How were they submitted?

to the server I can log it there

How is the log created server-side?

and if I examine the log via Safari

Is it a plain text file or does it contain code of a markup language? Do
you access the resource directly or do you access another resource that
retrieves the content from the log file indirectly?

I can see the data correctly if I set the browser to force the character
encoding to UTF-8.

The question is how you serve this resource.

So I am assuming that for some reason when sent via Safari the data is
turning into UTF-8.

Maybe. Maybe not. Unfortunately, my crystal ball is on vacation now.

Or maybe that is happening when sent via all the browsers, but the other
browsers are just clever enough to compensate regardless of the charset
heading in the browser.

Yes, it is possible that Safari does not follow HTTP/1.1 in that regard.
Which is why I suggested you checked the encoding of all resources and
made the meta[http-equiv="Content-Type"] `content' attribute value the
same as the Content-Type HTTP header (provided that it is correct) or
just omit the former.

However, when the data comes back from the server

What do you mean with "comes back from the server"?

I can't do the same trick in Safari and force the page to reload with a
UTF-8 character encoding to see the characters correctly.

Use <URL:http://livehttpheaders.mozdev.org/> to find out if you submitted
and served the correct headers, provided that you have no browser switch
server-side.

So... maybe something is happening to the characters on the way back to
Safari. Again, Firefox and IE are able to compensate for whatever is
happening.

Have you tried with Konqueror which is using KHTML and KJS, too?


Goodnight,
PointedEars
 
Doug Lerner

This is the standard (if one can use such a term in such a situation) Latin
characadabra - where there are more Latin chars than original Japanese
ones. It means that the browser refuses to interpret combo-chars (two or
more bytes) as one char and reads them separately. I *can be deeply
wrong* but this seems unrelated to JavaScript. Are you sure that your Safari
supports Japanese?
<http://redcocoon.org/cab/mysoft.html#sysanchor>
Can you view any Japanese sites? Try say <http://www.asahi.co.jp/>

It absolutely supports Japanese. I use it every day at Japanese sites -
including my own! :)

doug
 