UTF-8 Japanese and IE 8

Andrew Poulos

It seems that if you directly add Japanese ideograms to a web page, set
the charset to utf-8 then "all" browsers, except for IE 8, display the
ideograms correctly.

Adding this line to the page's head will allow the ideograms to display:
<meta http-equiv="X-UA-Compatible" content="IE=EmulateIE7">
but I'd rather not have to run pages in compatibility mode.

Is there a way to get IE 8 to display the ideograms correctly without
having to dynamically change each ideogram to its unicode equivalent?
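
For illustration, the conversion mentioned above (replacing each ideogram with its numeric character reference) could look roughly like the sketch below. The name `toNumericReferences` is hypothetical and not taken from the page in question; the function simply rewrites every non-ASCII character as a decimal reference.

```javascript
// Hypothetical helper (not from the original page): replace every
// non-ASCII character with a decimal numeric character reference.
function toNumericReferences(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    // Combine a UTF-16 surrogate pair into a single code point.
    if (code >= 0xD800 && code <= 0xDBFF && i + 1 < text.length) {
      var low = text.charCodeAt(i + 1);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        code = (code - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
        i++; // skip the low surrogate, it is already consumed
      }
    }
    // ASCII passes through unchanged; everything else becomes &#NNNN;
    out += code > 0x7F ? "&#" + code + ";" : String.fromCharCode(code);
  }
  return out;
}
```

For example, `toNumericReferences("Kimi \u8239\u5742")` returns `"Kimi &#33337;&#22338;"`. This only sidesteps the encoding question; it does not explain why the UTF-8 bytes themselves are not rendered.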

I've also noted that Google search result pages don't display ideograms
in IE 8.

Andrew Poulos
 
David Mark

It seems that if you directly add Japanese ideograms to a web page, set
the charset to utf-8 then "all" browsers, except for IE 8, display the
ideograms correctly.

Adding this line to the page's head will allow the ideograms to display:
   <meta http-equiv="X-UA-Compatible" content="IE=EmulateIE7">
but I'd rather not have to run pages in compatibility mode.

Is there a way to get IE 8 to display the ideograms correctly without
having to dynamically change each ideogram to its unicode equivalent?

I've also noted that Google search result pages don't display ideograms
in IE 8.

Could be a bug in IE8. Hard to say without seeing your markup and
headers.
 
Andrew Poulos

David said:
Could be a bug in IE8. Hard to say without seeing your markup and
headers.

Ok (I'm not sure how this will display in a newsgroup posting)
- the Japanese part of the page's title displays as squares
- the manager's name appears as a grouping of 4 little horizontal lines.

In non IE browsers all displays as expected

In IE under 8 all displays as expected.

In IE 8 set to 'compatibility view' all displays as expected.


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>セーラ ブランム- Contact</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<meta http-equiv="Content-Style" content="text/css">
<meta name="description" content="セーラ ブランチ　オフィシ">
<meta name="keywords" content="オーストラリ">
</head>
<body>
<p>Manager: Kimi Funasaka<br>船坂 公紀</p
</body>
</html>


Andrew Poulos
 
Thomas 'PointedEars' Lahn

Andrew said:
David said:
It seems that if you directly add Japanese ideograms to a web page, set
the charset to utf-8 then "all" browsers, except for IE 8, display the
ideograms correctly.
[...]
Could be a bug in IE8. Hard to say without seeing your markup and
headers.

Ok (I'm not sure how this will display in a newsgroup posting)

As you no doubt have discovered shortly after, (y)our Thunderbird/Icedove
suggests UTF-8 if the typed characters cannot be displayed in your preferred
encoding :)
- the Japanese part of the page's title displays as squares
- the manager's name appears as a grouping of 4 little horizontal lines.

In non IE browsers all displays as expected

In IE under 8 all displays as expected.

In IE 8 set to 'compatibility view' all displays as expected.


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>セーラ ブランム- Contact</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

May I presume the HTTP Content-Type header looks the same?
<meta http-equiv="Content-Style" content="text/css">

Should be

<meta http-equiv="Content-Style-Type" content="text/css">

<meta name="description" content="セーラ ブランチ　オフィシ">
<meta name="keywords" content="オーストラリ">
</head>
<body>

Looks reasonably Valid otherwise.
<p>Manager: Kimi Funasaka<br>船坂 公紀</p

Is this a copy-paste error or have you actually forgotten to end the P
element? If the latter, your markup is not Valid.
</body>
</html>


PointedEars
 
Andrew Poulos

Thomas said:
Andrew said:
David said:
It seems that if you directly add Japanese ideograms to a web page, set
the charset to utf-8 then "all" browsers, except for IE 8, display the
ideograms correctly.
[...]
Could be a bug in IE8. Hard to say without seeing your markup and
headers.
Ok (I'm not sure how this will display in a newsgroup posting)

As you no doubt have discovered shortly after, (y)our Thunderbird/Icedove
suggests UTF-8 if the typed characters cannot be displayed in your preferred
encoding :)
- the Japanese part of the page's title displays as squares
- the manager's name appears as a grouping of 4 little horizontal lines.

In non IE browsers all displays as expected

In IE under 8 all displays as expected.

In IE 8 set to 'compatibility view' all displays as expected.


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>セーラ ブランム- Contact</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

May I presume the HTTP Content-Type header looks the same?

I don't know how to tell.
Should be

<meta http-equiv="Content-Style-Type" content="text/css">

Dang, I've been writing it wrong for a year or so.
Looks reasonably Valid otherwise.


Is this a copy-paste error or have you actually forgotten to end the P
element? If the latter, your markup is not Valid.

Yes, it's a copy-paste error.

I checked www.sony.jp in IE 8 and many of the ideograms there also don't
display correctly. I'm guessing MS may have botched it.

Andrew Poulos
 
David Mark

Thomas said:
Andrew said:
David Mark wrote:
It seems that if you directly add Japanese ideograms to a web page, set
the charset to utf-8 then "all" browsers, except for IE 8, display the
ideograms correctly.
[...]
Could be a bug in IE8.  Hard to say without seeing your markup and
headers.
Ok (I'm not sure how this will display in a newsgroup posting)
As you no doubt have discovered shortly after, (y)our Thunderbird/Icedove
suggests UTF-8 if the typed characters cannot be displayed in your preferred
encoding :)
May I presume the HTTP Content-Type header looks the same?

I don't know how to tell.
Should be
  <meta http-equiv="Content-Style-Type" content="text/css">

Dang, I've been writing it wrong for a year or so.

That is unfortunate as you didn't need to write it at all (doesn't do
anything.)
 
Thomas 'PointedEars' Lahn

David said:
That is unfortunate as you didn't need to write it at all (doesn't do
anything.)

It declares the default stylesheet language as used in `style' attribute
values. RTFM.


PointedEars
 
David Mark

It declares the default stylesheet language as used in `style' attribute
values.  RTFM.

I know what it is supposed to do. It is just that there are no other
style sheet languages, so browsers obviously ignore it. Just a waste
of characters for now and the foreseeable future.
 
Thomas 'PointedEars' Lahn

Andrew said:
I don't know how to tell.

Search for "HTTP sniffer", or get an OS (even Cygwin will suffice) and run
`HEAD http://my.server.example/path' (should be in the `libwww-perl'
package). Or use `telnet my.server.example 80', wait for the welcome
message and type `HEAD /path HTTP/1.0<CR><LF><CR><LF>'. Or, if you need to
do it "the JavaScript way":

// for cross-browser tests, use a wrapper method here instead
var x = new XMLHttpRequest();

x.open("GET", document.URL, true);

x.onreadystatechange = function() {
  if (x.readyState == 4 && x.status == 200)
  {
    window.alert(x.getResponseHeader("Content-Type"));
  }
};

x.send(null);

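The value reported by `getResponseHeader("Content-Type")` may or may not carry a `charset` parameter, and that is the part of interest here. A minimal sketch for pulling it out of the header value follows; `parseCharset` is a hypothetical helper name, and the regular expression is a simplification that ignores the finer points of HTTP parameter syntax.

```javascript
// Hypothetical helper: extract the charset parameter from a
// Content-Type header value, or return null if there is none.
function parseCharset(contentType) {
  if (!contentType) {
    return null; // header missing or empty
  }
  // Simplified match for ;charset=value with optional quotes.
  var match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}
```

With this, `parseCharset("text/html;charset=utf-8")` returns `"utf-8"`, while `parseCharset("text/html")` returns `null`, meaning the server declared no encoding in the header.
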

HTH

PointedEars
 
Thomas 'PointedEars' Lahn

David said:
I know what it is supposed to do. It is just that there are no other
style sheet languages,

What about DSSSL and XSL?

so browsers obviously ignore it.

That's an unprovable statement and a non sequitur, though.

Just a waste of characters for now and the foreseeable future.

No.


PointedEars
 
Thomas 'PointedEars' Lahn

kangax said:
Thomas said:
Andrew said:
Thomas 'PointedEars' Lahn wrote:
Andrew Poulos wrote:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
May I presume the HTTP Content-Type header looks the same?
I don't know how to tell.
Search for "HTTP sniffer", or get an OS (even Cygwin will suffice) and run
`HEAD http://my.server.example/path' (should be in the `libwww-perl'
package). Or use `telnet my.server.example 80', wait for the welcome
message and type `HEAD /path HTTP/1.0<CR><LF><CR><LF>'. Or, if you need to
do it "the JavaScript way":
[XHR]
IIRC, Firebug (at least, recent versions) also displays response headers
in a "Net" tab under "Headers" tab of a single request (after collapsing
it). It shows both request and response ones.

But that would be so boring ... ;-)


PointedEars
 
Eric Bednarz

David Mark said:
[Content-Style-Type]

Yes.

So what you are saying is that despite your vague knowledge of the most
applicable spec that requires it, you preferred to test the behaviour in
a handful of user agents in their default configuration and decided that
it would be better to do more and write less.

I have this terrible feeling of déjà vu. :)
 
David Mark

David Mark said:
Thomas 'PointedEars' Lahn wrote:
[Content-Style-Type]
It declares the default stylesheet language as used in `style' attribute
values.  RTFM.
Just a waste of characters for now and the foreseeable future.
No.

So what you are saying is that despite your vague knowledge of the most
applicable spec that requires it, you preferred to test the behaviour in
a handful of user agents in their default configuration and decided that
it would be better to do more and write less.

LOL. I don't "write" HTML, so the number of characters was only a
reference to wasting bandwidth. As for comparing it to using jQuery
or the like. I think you are reaching. I'm not saying that that META
tag is harmful or not standard or even a bad idea. Just a waste at
the moment IMO.
I have this terrible feeling of déjà vu. :)

It's not that terrible.
 
Eric Bednarz

David Mark said:
[…] I'm not saying that that META
tag is harmful or not standard or even a bad idea.

Au contraire, you were saying that *omitting* it[0] would *not* be a bad
idea. If that’s not based on random empirical evidence, please share
(I’m not arguing that statement itself, really; you just cannot seem to
find a balance between being reasonably pragmatic about some subjects
and unreasonably unpragmatic about others).

[0] just two remarks:
a) there are only two *tags* in HTML (three in XHTML, that’s why it is
clearly superior :)
b) this could as well if not better be done with HTTP headers instead of
*an* META element (HTML 4.01 literally babbles about “the META
element” where it really should say ‘the META element type’, so
anybody is excused)
 
David Mark

David Mark said:
[…] I'm not saying that that META
tag is harmful or not standard or even a bad idea.

Au contraire, you were saying that *omitting* it[0] would *not* be a bad
idea. If that’s not based on random empirical evidence, please share
(I’m not arguing that statement itself, really; you just cannot seem to
find a balance between being reasonably pragmatic about some subjects
and unreasonably unpragmatic about others).

[0] just two remarks:
a) there are only two *tags* in HTML (three in XHTML, that’s why it is

"Meta tag" is a euphemism for META element and looks like I combined
the two in haste.
   clearly superior :)
b) this could as well if not better be done with HTTP headers instead of
   *an* META element (HTML 4.01 literally babbles about “the META
   element” where it really should say ‘the META element type’, so
   anybody is excused)

Yes, as with any http-equiv type. Another reason why the element
might be ignored. If you are saying that it is unrealistic to exclude
this element, I disagree. If you think it is unrealistic to exclude
the header, I still disagree, but less strongly. I don't see it.

<meta http-equiv="Content-Style-Type" content="text/css">
 
Andrew Poulos

Thomas said:
Search for "HTTP sniffer", or get an OS (even Cygwin will suffice) and run
`HEAD http://my.server.example/path' (should be in the `libwww-perl'
package). Or use `telnet my.server.example 80', wait for the welcome
message and type `HEAD /path HTTP/1.0<CR><LF><CR><LF>'. Or, if you need to
do it "the JavaScript way":

// for cross-browser tests, use a wrapper method here instead
var x = new XMLHttpRequest();

x.open("GET", document.URL, true);

x.onreadystatechange = function() {
  if (x.readyState == 4 && x.status == 200)
  {
    window.alert(x.getResponseHeader("Content-Type"));
  }
};

x.send(null);


HTH

Yes, thanks.

The alert reads "text/html".


IE 8 properties dialog (File > Properties) shows the ideograms correctly
in its title.

IE 8 says the encoding is Unicode (utf-8).


Response headers via the Web Developer add-on in FF 3 reads:

Content-Length: 816
Content-Type: text/html
Last-Modified: Thu, 09 Apr 2009 03:51:23 GMT
Accept-Ranges: bytes
Etag: "c69d3773c6b8c91:33c001"
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Thu, 09 Apr 2009 04:00:37 GMT
Set-Cookie: server=1

200 OK

The meta tag info is:

Name                Content
Content-Type        text/html;charset=utf-8
Content-Style-Type  text/css
description         セーラ ブランチ　オフィシ
keywords            オーストラリ

The page info says the render mode is: standards compliance mode.


Surely MS haven't broken every web page that uses unicode???

Andrew Poulos
 
Thomas 'PointedEars' Lahn

Andrew said:
IE 8 properties dialog (File > Properties) shows the ideograms correctly
in its title.

IE 8 says the encoding is Unicode (utf-8).

Response headers via the Web Developer add-on in FF 3 reads:

[...]
Content-Type: text/html
[...]
Server: Microsoft-IIS/6.0
[...]

The meta tag info is:

Name Content
Content-Type text/html;charset=utf-8
[...]

The page info says the render mode is: standards compliance mode.

Surely MS haven't broken every web page that uses unicode???

ISTM that MS(HTML 8.0) is correct (to a certain extent). The HTML 4.01
Specification mandates that if the HTTP Content-Type header is present, it
takes precedence over the `meta' element information.[1] HTTP 1.0 (RFC
1945) and HTTP 1.1 (RFC 2616) define that the default "charset" value for
resources of the "text" type is ISO-8859-1 when received via HTTP (as
pointed out recently).[2][3] However, newer versions of MSHTML are known to
use UTF-7 as the default encoding.[4]

[1] <http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2>
[2] <http://tools.ietf.org/html/rfc1945#section-3.6.1>
[3] <http://tools.ietf.org/html/rfc2616#section-3.7.1>
[4] [de] <http://schneegans.de/web/ie-utf-7/>

So if you declared the character encoding in the HTTP Content-Type header,
it should work. I don't know how to do it on IIS (since I'm preferring
Apache), but RTFM should help.
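
The precedence described above can be written down as a small sketch. It follows the priority order listed in HTML 4.01, section 5.2.2 (HTTP `charset` parameter first, then the `meta` declaration) and falls back to the ISO-8859-1 default from the RFCs cited. The function name `resolveCharset` and its return shape are hypothetical, and treating a header without a `charset` parameter as "no HTTP declaration" is an assumption of this sketch; real browsers apply further heuristics.

```javascript
// Hypothetical sketch of the declared-encoding precedence:
// 1. charset parameter of the HTTP Content-Type header
// 2. charset from the meta http-equiv="Content-Type" element
// 3. ISO-8859-1, the HTTP default for "text" types (RFC 2616, 3.7.1)
// Pass null where no charset was declared.
function resolveCharset(httpCharset, metaCharset) {
  if (httpCharset) {
    return { charset: httpCharset, source: "http" };
  }
  if (metaCharset) {
    return { charset: metaCharset, source: "meta" };
  }
  return { charset: "iso-8859-1", source: "default" };
}
```

For the headers posted in this thread (`Content-Type: text/html`, no charset) together with the `meta` element, `resolveCharset(null, "utf-8")` yields `utf-8` from the `meta` source. Once the server sends `charset=utf-8` itself, the `meta` value no longer matters.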


PointedEars
 
Andrew Poulos

Thomas said:
Andrew said:
IE 8 properties dialog (File > Properties) shows the ideograms correctly
in its title.

IE 8 says the encoding is Unicode (utf-8).

Response headers via the Web Developer add-on in FF 3 reads:

[...]
Content-Type: text/html
[...]
Server: Microsoft-IIS/6.0
[...]

The meta tag info is:

Name Content
Content-Type text/html;charset=utf-8
[...]

The page info says the render mode is: standards compliance mode.

I found a Vista box with IE 8 and the ideograms show correctly. Neither
my XP SP3 box nor the Vista box have East Asian Languages installed.
I've installed the latest Windows updates on the XP box.

I'll need to test more.

Andrew Poulos
 
Thomas 'PointedEars' Lahn

Andrew said:
I found a Vista box with IE 8 and the ideograms show correctly. Neither
my XP SP3 box nor the Vista box have East Asian Languages installed.

AFAIK, you only need Support for East Asian Languages if you want to *type*
Han characters with an IME (Input Method Editor); you don't need it for
displaying those characters, especially not on Windows XP, where all vector
fonts should be OpenType fonts.
I've installed the latest Windows updates on the XP box.
Hm.

I'll need to test more.

Good idea.


PointedEars
 
