Japanese characters in TITLE element

Erwin Moller

Hello,

I am currently creating a multi language website, including the Japanese
language (which I do not understand at all).
A little background:

- Server sends headers for content-type UTF-8:
Content-Type: text/html; charset=UTF-8

- Doctype html4 strict.
The beginning of the document looks like:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
....etc

- Database (Postgres) stores all texts in UTF8.
- Scripting language: PHP 5.2
- Apache 2 webserver

The problem I have is with using Japanese characters in the title of the
document.
Everywhere else on the page all Japanese characters appear just fine,
except in the title. They ALL show up as squares in the title of the
browser, meaning the browser cannot display them.

I checked with w3.org on the title element, here:
http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#h-7.4.2

where it is written:
===================
Titles may contain character entities (for accented characters, special
characters, etc.), but may not contain other markup (including comments).
===================

Does this mean I cannot use Japanese characters in the title element?
I am not sure about character entities in Japanese. I thought they were
just there for character sets that lack certain characters, and Unicode
SHOULD include them all. But maybe I am wrong. Unicode can be very
confusing for simple PHP programmers. ;-)

Can anybody help me with this?

Regards,
Erwin Moller
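
On the character-entity question: once a page is parsed, a numeric character reference and the raw character denote the same code point, so a page served as UTF-8 should not need references for Japanese text in the title. A minimal Python sketch of that equivalence follows. The sample string is only an illustration and is not taken from the site in question.

```python
import html

# Hypothetical sample title; any Japanese text would do.
raw_title = "日本語"

# The same title written as decimal numeric character references.
ncr_title = "".join(f"&#{ord(ch)};" for ch in raw_title)

# An HTML parser decodes the references back to the same characters.
decoded = html.unescape(ncr_title)

# The raw title travels as UTF-8 bytes when the page is served as UTF-8.
utf8_bytes = raw_title.encode("utf-8")

print(ncr_title)
print(decoded == raw_title)
print(len(utf8_bytes))
```

Either spelling ends up as the same characters in the browser, so switching to references would not be expected to change how the title bar draws them.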
 
pacal

On 4-4-2011 13:27, Erwin Moller wrote:
[full quote of the original post snipped]
erwin

I just went to sony.jp and china.com and got the same result.
Just a thought: do you have to use a Japanese or Chinese version of a
browser to get the expected result?

pacal
 
Jukka K. Korpela

pacal said:
I just went to sony.jp and china.com and got the same result.

What result? You seem to have fullquoted a rather clueless post which lacked
all the essential information, such as a URL and browser names and versions
and what actually happened.
 
Neil Gould

Jukka said:
What result? You seem to have fullquoted a rather clueless post which
lacked all the essential information, such as a URL and browser names
and versions and what actually happened.

Having a bad day, Jukka? ;-)

The OP wrote about this result:
"Everywhere else on the page all Japanese characters appear just fine,
except in the title. They ALL show up as squares in the title of the
browser, meaning the browser cannot display them."

This implies to me that the issue is how his browser is presenting the
information in the title area, so a URL would not be very useful to anyone
else. Pacal's approach to go to a site that is likely to have Japanese
characters seems reasonable to me, and confirmation that there are squares
in the title reinforces the notion that this is the behavior of the
individual's system/browser.

I suspect that the display in the title area of a browser is determined by
the language character set in use on the person's computer, and therefore
extended characters would not display as expected.
 
Stanimir Stamenkov

Mon, 04 Apr 2011 13:27:51 +0200, /Erwin Moller/:
The problem I have is with using Japanese characters in the title of
the document.
Everywhere else on the page all Japanese characters appear just
fine, except in the title. They ALL show up as squares in the title
of the browser, meaning the browser cannot display them.

No, it doesn't mean the browser cannot display them (as you've
already pointed out you see them just fine in the page content). It
means the OS GUI can't display them in the window title given your
current setup. Various factors such as font availability (fonts
containing the necessary characters) and font substitution may
affect the observed results. If you're on Windows XP, I guess you
need to install the East Asian language support files:

http://www.microsoft.com/resources/.../proddocs/en-us/int_pr_install_languages.mspx
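
One way to separate an authoring problem from a font problem is to check which code points actually arrive in the title. The Python sketch below assumes a hypothetical page body (the `sample` bytes stand in for the real response) and lists each title character with its Unicode name. If the names come out as CJK ideographs, hiragana or katakana, the markup and encoding are fine, and squares in the window title point to the OS GUI fonts instead.

```python
import unicodedata
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def describe_title(page_bytes, encoding="utf-8"):
    """Return (character, code point, Unicode name) for each title character."""
    parser = TitleExtractor()
    parser.feed(page_bytes.decode(encoding))
    parser.close()
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
            for ch in parser.title]


# Hypothetical page, standing in for the real response body.
sample = "<html><head><title>日本</title></head><body></body></html>".encode("utf-8")
for row in describe_title(sample):
    print(row)
```

If decoding fails, or the names are unexpected Latin-1 letters, the problem is on the server or authoring side rather than in the fonts.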
 
Jukka K. Korpela

Neil said:
Having a bad day, Jukka? ;-)

Huh? That was written while in _good_ mood.

Neil said:
The OP wrote about this result:
"Everywhere else on the page all Japanese characters appear just fine,
except in the title. They ALL show up as squares in the title of the
browser, meaning the browser cannot display them."

Thereby not giving any URL, browser name or version, and giving a wrong
account of what actually happened. I bet the squares weren't squares. And
the correct word "rectangles" would still be an insufficient description.

Neil said:
This implies to me that the issue is how his browser is presenting the
information in the title area, so a URL would not be very useful to
anyone else.

We have no information on whether the problem occurred for the OP's own
page(s) only.

Neil said:
Pacal's approach to go to a site that is likely to have
Japanese characters seems reasonable to me, and confirmation that
there are squares in the title reinforces the notion that this is the
behavior of the individual's system/browser.

Are you saying that Pacal and the OP are using the same system/browser?

Neil said:
I suspect that the display in the title area of a browser is
determined by the language character set in use on the person's
computer, and therefore extended characters would not display as
expected.

What "extended characters"? What "language character set"?

The rendering of the title element's content is different from the rendering
of body content. It is affected by many things, including the browser, the
system, and the system settings. This, however, is rather irrelevant to
_authoring_ issues. The URL would be most relevant to knowing what the
_author_ has done. Only a snippet of code was posted, and it didn't even
contain a title element.

If someone wants help with authoring HTML pages, he should post a URL. If
someone needs help with configuring one's system or browser (like changing
the font Windows uses for IE topbar text), well, then the system and browser
need to be identified, and the question probably belongs to some other
group.
 
dorayme

Neil Gould said:
I suspect that the display in the title area of a browser is determined by
the language character set in use on the person's computer, and therefore
extended characters would not display as expected.

My computer's language character set presumably stays the same
from moment to moment but my modern browsers show the correct
title at

http://www.sony.jp/

while some older ones do not.
 
Neil Gould

dorayme said:
My computer's language character set presumably stays the same
from moment to moment but my modern browsers show the correct
title at

http://www.sony.jp/

while some older ones do not.

On my screen, FF shows square boxes in the title area at the _top_ of the
browser (and on the system control bar), and shows Kanji characters on the
page tabs. This makes sense to me.
 
Neil Gould

Jukka said:
Huh? That was written while in _good_ mood.

Thereby not giving any URL, browser name or version, and giving a
wrong account of what actually happened. I bet the squares weren't
squares. And the correct word "rectangles" would still be
insufficient description.

Well, as I replied to dorayme, they're squares on the top bar of FF on my
system, too. The Kanji characters show on the tabs.

Jukka said:
We have no information on whether the problem occurred for the OP's
own page(s) only.

That was settled by Pacal's check on the pages whose URLs were in the reply.

Jukka said:
Are you saying that Pacal and the OP are using the same
system/browser?

No, not that it would make a difference if they were using the same browser,
but if they were using the same character set on their system (i.e.
something that doesn't include Kanji), the results should be similar.

Jukka said:
What "extended characters"? What "language character set"?

Really???

Jukka said:
The rendering of the title element's content is different from the
rendering of body content. It is affected by many things, including
the browser, the system, and the system settings. This, however, is
rather irrelevant to _authoring_ issues. The URL would be most
relevant to knowing what the _author_ has done. Only a snippet of
code was posted, and it didn't even contain a title element.

If someone wants help with authoring HTML pages, he should post a
URL. If someone needs help with configuring one's system or browser
(like changing the font Windows uses for IE topbar text), well, then
the system and browser need to be identified, and the question
probably belongs to some other group.

As I read it, the question was a general one, and the answers provided may
suffice. I notice that the OP hasn't replied yet.
 
dorayme

Neil Gould said:
On my screen, FF shows square boxes in the title area at the _top_ of the
browser (and on the system control bar), and shows Kanji characters on the
page tabs. This makes sense to me.

It makes even more sense when it displays right in both. I just
fired up my Windows FF and yes, I get the same as you. But I was
talking about my Mac browsers. And in these, except for old ones like Mac
IE5, the title is showing correctly, as in the tabs. Mac, you see,
all quality. <g>
 
cwdjrxyz

dorayme said:
It makes even more sense when it displays right in both. I just
fired up my Windows FF and yes, I get the same as you. But I was
talking about my Mac browsers. And in these, except for old ones like Mac
IE5, the title is showing correctly, as in the tabs. Mac, you see,
all quality. <g>

At the moment, I have 8 browsers installed. At www.sony.com, all show
proper Japanese characters and a few English words. The OS is Windows
Vista 64-bit, and nearly all updates are installed. The recent
browsers are Firefox 3.6.16, IE9, Safari for Windows 5.0.4, Opera
11.01, Google Chrome 10.0.648.204, SeaMonkey 2.0.9, Flock 3.5.3.4641,
and K-meleon 1.5.4. I do not recall installing any extras on the OS or
the browsers that would modify or add support for Japanese, but it is
always possible that I missed reading something for an update that I
made. The mentioned Sony page looks nearly the same on all browsers
used - as close as you can expect for a variety of browsers.
 
Jonathan N. Little

dorayme said:
My computer's language character set presumably stays the same
from moment to moment but my modern browsers show the correct
title at

http://www.sony.jp/

while some older ones do not.

It is not the browser, but the OS. It fails as described in all browsers
on my Windows XP machine including FF4. But on Ubuntu, where FF is still
at 3.6.16, the window title shows the correct Kanji. I think the window
title is part of the chrome and therefore under the "auspices" of the OS
and not the browser.
 
dorayme

Jonathan N. Little said:
It is not the browser, but the OS.

Ignoring the seemingly knock-down argument I give (last part of
which was "while some older ones do not.")?

Jonathan N. Little said:
It fails as described in all browsers
on my Windows XP machine including FF4. But on Ubuntu, where FF is still
at 3.6.16, the window title shows the correct Kanji. I think the window
title is part of the chrome and therefore under the "auspices" of the OS
and not the browser.

Perhaps the truth of it is that the browser has to take advantage
of the OS, and in my evidence, the old MacIE5 did not?

On the other hand, I am running XP in VirtualBox on a Mac and in
IE8 - where it *is* at a loss in the title at the top and in the
bottom minimisation - it is just the squa... I mean rectangl ...
I mean Bengali Rupee Marks in both the title bar and tab bar. But
in Win FF in the tab bar, it gets it right!

So, taking the last bit of evidence about my Win FF, it is
getting it right in the tab, how did it do that? Never mind, it
got it right, right? So it should have done same for main title
bar and minimisation strip and this starts to look more like a
browser failing. WinIE fails completely, WinGoogleChrome too, FF
partly.

Looks as if FF has a slight consciousness, an inkling, the
beginning of a clue.

Anyway, that is a quick assessment and perhaps the issue is more
complicated.
 
Jonathan N. Little

dorayme said:
So, taking the last bit of evidence about my Win FF, it is
getting it right in the tab, how did it do that? Never mind, it
got it right, right? So it should have done same for main title
bar and minimisation strip and this starts to look more like a
browser failing. WinIE fails completely, WinGoogleChrome too, FF
partly.

I would say the tab is part of the browser UI: since FF "gets it right",
the tab has the kanji displayed, but the window title is part of
the OS's UI, and Windows "gets it wrong". That is why the title does
work in Linux. The clue that the window title is part of the OS's UI is
in Linux, where the UI is much easier to customize than in Windows: after a
change of window manager, the change is evident in the window title (and
menu), but not in the tabs...
 
Neil Gould

dorayme said:
Ignoring the seemingly knock-down argument I give (last part of
which was "while some older ones do not.")?


Perhaps the truth of it is that the browser has to take advantage
of the OS, and in my evidence, the old MacIE5 did not?

On the other hand, I am running XP in VirtualBox on a Mac and in
IE8 - where it *is* at a loss in the title at the top and in the
bottom minimisation - it is just the squa... I mean rectangl ...
I mean Bengali Rupee Marks in both the title bar and tab bar. But
in Win FF in the tab bar, it gets it right!

So, taking the last bit of evidence about my Win FF, it is
getting it right in the tab, how did it do that?

Often, when programming with a Windows API, the window uses installed fonts
in the title area and in the minimization area. If a font is not available
on the system, those characters are not displayed. However, all content
_within_ the window is determined by the application, so, tabs and the other
content will display whatever is served to it, as long as the fonts are
available from the source.

dorayme said:
Looks as if FF has a slight consciousness, an inkling, the
beginning of a clue.

So... here's food for thought: instead of rectangles (squares are rectangles
with equal-length sides), IE6 displays extended characters in the default
font in both locations if the Japanese font is not installed on the system,
and displays the proper characters if it is. So much for the "older"
browsers bit.
 
dorayme

Jonathan N. Little said:
I would say the tab is part of the browser UI: since FF "gets it right",
the tab has the kanji displayed, but the window title is part of
the OS's UI, and Windows "gets it wrong". That is why the title does
work in Linux. The clue that the window title is part of the OS's UI is
in Linux, where the UI is much easier to customize than in Windows: after a
change of window manager, the change is evident in the window title (and
menu), but not in the tabs...

Come to think of it, you probably are right. But I liked butting
in on a thread about Japanese characters (never thought I could!
<g>).
 
Jukka K. Korpela

Neil said:
Often, when programming with a Windows API, the window uses installed
fonts in the title area and in the minimization area.

Well, it more or less has to use installed fonts. You probably mean it uses
fonts specified in some system-wide settings, typically the factory
settings, as few people know how to change such things (via the Control
panel) and feel a need to do that.

This may or may not apply to a web browser. Whether a browser shows the
title element content in the browser window's top bar ("the title area")
depends on the browser - some modern browsers are minimalistic in this
respect, too.

Neil said:
If a font is
not available on the system, those characters are not displayed.

That may well happen, as the rendering mechanism is simplistic, with no
fallback fonts.

Neil said:
However, all content _within_ the window is determined by the
application, so, tabs and the other content will display whatever is
served to it, as long as the fonts are available from the source.

"From the source"?? No. You should say "as long as a glyph for a character
is available in some of the fonts being used, according to the rendering
mechanism of the browser". Older versions of IE are notorious for their
frequent inability to pick up a glyph from fallback fonts.

Neil said:
So... here's food for thought: instead of rectangles (squares are
rectangles with equal-length sides), IE6 displays extended characters
in the default font in both locations if the Japanese font is not
installed on the system, and displays the proper characters if it is.

Umm... whenever there's the word "font" in some discussion on web page
rendering, there's a fundamental confusion of concepts and phenomena. IE6,
the old monster, probably uses the Windows settings for the rendering of
various widgets, unless a theme has been selected. I do not think installing
a Japanese font, whatever that might mean, changes those settings.
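
The difference between a renderer with a fallback chain and one without can be illustrated with a toy model. This Python sketch is not how Windows or any browser actually selects glyphs. Each "font" is reduced to a set of covered characters, and the coverage data is hypothetical. Tahoma is the title-bar font mentioned in this thread; MS Gothic is assumed here as an example of a Japanese-capable font.

```python
# Toy model: each "font" is just the set of characters it has glyphs for.
# The coverage below is hypothetical and heavily simplified.
FONT_COVERAGE = {
    "Tahoma": set("Sony abcdefghijklmnopqrstuvwxyz"),
    "MS Gothic": set("ソニー日本語"),
}

MISSING_GLYPH = "\u25A1"  # WHITE SQUARE, standing in for the "box" glyph


def render(text, font_chain):
    """Keep each character that some font in font_chain covers; else emit a box."""
    out = []
    for ch in text:
        if any(ch in FONT_COVERAGE[name] for name in font_chain):
            out.append(ch)
        else:
            out.append(MISSING_GLYPH)
    return "".join(out)


title = "Sony ソニー"

# Title bar modelled as a single system font with no fallback.
print(render(title, ["Tahoma"]))
# Page content or a tab modelled as a chain with a Japanese-capable fallback.
print(render(title, ["Tahoma", "MS Gothic"]))
```

Under these assumptions the same string comes out with boxes in one place and proper characters in the other, which matches the title-versus-tab difference people report above.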
 
Neil Gould

Jukka said:
Well, it more or less has to use installed fonts. You probably mean
it uses fonts specified in some system-wide settings, typically the
factory settings, as few people know how to change such things (via
the Control panel) and feel a need to do that.

I started out with that, then tested it with IE6, so I changed it to the
above because when the Kanji font is *installed* it displays in the title
area without any other changes to the OS.

Jukka said:
This may or may not apply to a web browser. Whether a browser shows
the title element content in the browser window's top bar ("the title
area") depends on the browser - some modern browsers are minimalistic
in this respect, too.

Yes, and that is why I stated "_Often_", above. There is no requirement that
the windows be drawn with the default API, and apparently some browsers (e.g.
IE6) take "the longer route" and write their entire UI as a custom program.

Jukka said:
That may well happen, as the rendering mechanism is simplistic, with
no fallback fonts.


"From the source"?? No. You should say "as long as a glyph for a
character is available in some of the fonts being used, according to
rendering mechanism of the browser". Older versions of IE are
notorious for their frequent inability of picking up a glyph from
fallback fonts.


Umm... whenever there's the word "font" in some discussion on web page
rendering, there's a fundamental confusion of concepts and phenomena.
IE6, the old monster, probably uses the Windows settings for the
rendering of various widgets, unless a theme has been selected. I do
not think installing a Japanese font, whatever that might mean,
changes those settings.

Well, again, I chose my wording based on the results of tests. Before the
Kanji font was installed, the content _within_ the IE6 window displayed the
expected characters as far as I can tell (it displayed the same characters
as FF), but displayed extended characters of the Tahoma font (my choice) in
the title and minimization areas. To render the page, the Kanji character set
had to be available from some other source than my system. How would you
have stated these results?
 
Neil Gould

Neil said:
Well, again, I chose my wording based on the results of tests. Before
the Kanji font was installed, the content _within_ the IE6 window
displayed the expected characters as far as I can tell (it displayed
the same characters as FF), but displayed extended characters of the
Tahoma font (my choice) in the title and minimization areas. To render
the page, the Kanji character set had to be available from some other
source than my system. How would you have stated these results?

After giving your challenge some thought, another explanation for what I saw
occurred to me, so I took a closer look at the results in both FF and IE6.
The content of the page was a mix of Kanji and squares/extended characters
in both browsers. This implies that the Kanji characters shown are graphics
and the squares/extended characters are fonts. So, the fonts are not being
supplied externally as I originally thought.

Still, "that old monster" IE6 got it right, and "that new monster" FF did
not.
 
Jonathan N. Little

Neil said:
Still, "that old monster" IE6 got it right, and "that new monster" FF did
not.

Really? Are you sure it is the browser and not the OS? Kanji in both window
title AND page content with no problem in Firefox 3.6.16 in Ubuntu.
 
