Changing or at least detecting character encoding via JavaScript?

David Komanek

Hi all,

Is it possible to manipulate the character encoding settings in
MS Internet Explorer 5.0, 5.5 and 6.0? The problem is that the
default installation of MS IE seems to have the default encoding
hard-set to "Western European (ISO)", which means iso-8859-1. When
browsing pages with Central/Eastern European characters, these are
converted to iso-8859-1 and so displayed wrongly.

I would have supposed that "auto-select" would be the default, so that
the browser could select the right encoding according to the meta tags
in the head of the web page. But apparently this is not the case.

Is it possible to use JavaScript or a Java applet to read the
current client character encoding setting and/or change it to the
"auto-select" value? If so, how?

Thanks in advance,

David Komanek
 
Martin Honnen

What about using the HTML <meta> tag:
<meta http-equiv="Content-Type" content="text/html; charset=yourCharsetHere">
 
VK

All browsers respect and properly handle the encoding set via the meta tag.
The "auto-select" option in IE is used only if the page has no encoding set
either by meta or by the server. Then the browser tries to guess the encoding
from the byte values of the characters (and its guess is usually wrong, so you
have to change it manually).

If your browser doesn't display a page, or part of a page, in the proper
encoding, it means one of the following:
1) The page has no encoding set either by meta or by the server. The browser
has "auto-select" enabled, it tried to guess the encoding, and it missed.
2) The encoding is set properly, but your system doesn't have a
corresponding font to display it.
3) Several character sets are used on the same page (for example, a Latin
one and a Cyrillic one), and the page encoding is not "utf-8". Only UTF-8
(Unicode) allows this.
4) Something's broken in your browser. Reinstall it.

P.S. If you really want to go some very special way, see the document.charset
property (read/write, but buggy).
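
Cases 1 and 3 in the list above can be reproduced outside a browser. The following is only a rough sketch in Python: the sample words are assumptions picked for illustration, and `decode_as` is a hypothetical helper, not anything a browser exposes. It shows the same bytes read under a matching and a non-matching charset label, and that a mixed Latin/Cyrillic string does not fit into iso-8859-2 at all.

```python
def decode_as(raw, label):
    """Decode raw bytes using the charset the reader believes applies."""
    return raw.decode(label)

# Sample word chosen for illustration only (an assumption, not from any page).
original = "děkuji"
raw = original.encode("iso-8859-2")    # bytes as a server would store them

right = decode_as(raw, "iso-8859-2")   # label matches the bytes
wrong = decode_as(raw, "iso-8859-1")   # label does not match the bytes

print(right)
print(wrong)

# Case 3: iso-8859-2 has no Cyrillic letters, so a mixed page needs UTF-8.
mixed = "děkuji и спасибо"
try:
    mixed.encode("iso-8859-2")
    fits_latin2 = True
except UnicodeEncodeError:
    fits_latin2 = False

print(fits_latin2)
print(mixed.encode("utf-8").decode("utf-8") == mixed)
```

Note that the wrong label does not raise an error: every byte still maps to some character, which is why the page looks garbled rather than broken.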
 
Paul Gorodyansky

Hello!

David said:
I have a question if it is possible to manipulate the settings of
character encoding in MS Internet Explorer 5.0, 5.5 and 6.0. The
problem is that the default installation of MS IE seems to have hard
selected default encoding to "Western European (ISO)", which means
iso-8859-1.

No, there is no such thing as a 'default encoding' in Internet Explorer
(Netscape/Mozilla do have such a thing).

David said:
When browsing pages with some Central/Eastern European
characters these are converted to iso-8859-1 so displayed wrong.

Martin and VK have already answered that: it is a _site_ problem.
The site probably does not specify its encoding, so you need to choose
it manually in IE's menu, unless the page you visited right
before was Central European, in which case IE will show your new page
OK. If a new page does not specify its encoding, IE uses
the *last used encoding* to show it.

--
Regards,
Paul Gorodyansky
"Russian On-screen Keyboard"
(based on the JavaScript code by Martin Honnen et al):
http://ourworld.compuserve.com/homepages/PaulGor/onscreen.htm
 
David Komanek

Hi all,

thank you for the responses. Unfortunately my colleague is abroad, in the
Netherlands, and I have no way to experiment with his computer (or with
any of the computers in his department :). But what I can say for sure is
that I have the appropriate meta tag in the page: iso-8859-2. He says
that iso-8859-1 is what he sees selected in the "View|Encoding" menu,
and that all the Czech characters appear converted to their
English equivalents. For example, he sees &Aacute; as a plain "A" if I
use the normal character. The only two ways I have found to get the right
character on his display are to use the &Aacute; entity itself or to
recode the page to utf-8. But if I use the "normal character"
(not the corresponding entity) in the HTML source and my colleague
manually switches the encoding to "Central European (ISO)", which
means iso-8859-2, voila, he sees the character correctly ... but try
telling all the people abroad to do this ... :)

I am pretty sure the meta tag is OK, because I see the characters
exactly as I should on my Windows machine (and on many others close to
me), even though the default code page in Czech editions of Windows is
cp-1250, which is a different one. Yes, it differs in only a few
characters, but I tried those too, with no problems.

I would agree that if my colleague did not have the fonts properly
installed, he would see strange characters. But why are the characters
implicitly converted on his side? And why on many computers? Is it
possible that his proxy does it?

Thanks,

David
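
David's remark that cp-1250 and iso-8859-2 differ in only a few characters can be checked with a few lines of Python. This is a minimal sketch, not part of the original discussion: the letter list and the helper name `differing_letters` are assumptions made for the example, and it only compares accented Czech letters, not the whole code pages.

```python
# Accented letters of the Czech alphabet, lower case then upper case.
CZECH_LETTERS = "áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ"

def differing_letters(letters, enc_a="iso-8859-2", enc_b="cp1250"):
    """Return the letters whose byte value differs between the two encodings."""
    return [ch for ch in letters if ch.encode(enc_a) != ch.encode(enc_b)]

different = differing_letters(CZECH_LETTERS)
print(different)

# Show the byte each encoding uses for the letters that differ.
for ch in different:
    print(ch, ch.encode("iso-8859-2").hex(), ch.encode("cp1250").hex())
```

With Python's built-in codecs, only š, ť, ž and their capitals come out as different, so a page labelled with the wrong one of these two encodings would still show most Czech text correctly.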
 
Paul Gorodyansky

David,

If you would let us know the URL, it would be easier for us to
help you.
Anyway, the above often happens with Russian too, for the following reason:
- the author created a good page with a correct META...charset=
- he placed the .html file on the Web server of his Internet provider
- the provider's Web server is configured in such a way that
it places charset=iso-8859-1 ("Western European") into the
HTTP header that is sent along with the page itself to the reader
( http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html )
- the HTTP header, by the standards, has higher priority than META...charset=,
so the browser gets it as an iso-8859-1 page!

So your friend needs to ask the Web server people if they do the above.
For example, my Internet provider, CompuServe, does NOT fill out the
charset field of the HTTP header, so in my files META...charset=
works OK.

There is a test page that shows the HTTP header, so:
- create a Web page *without* META...charset= in it
- place it on the Web server
- go to this page, get the screen with the HTTP header and see
what the value of the "Charset" field is:
http://www.delorie.com/web/headers.html

If you do the above for _my_ page, which has no META...charset=,
http://ourworld.compuserve.com/homepages/PaulGor/test1251.htm
you will see that CompuServe leaves the Charset field empty...
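
The precedence Paul describes (charset in the HTTP header first, META...charset= only if the header has none) can be sketched roughly as follows. This is an illustration under stated assumptions, not how any particular browser is implemented: the function names are hypothetical, the regular expressions are deliberately simple, and the `fallback` value is a placeholder for whatever the browser does when neither source names a charset.

```python
import re

def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type header value, or None."""
    if not value:
        return None
    match = re.search(r'charset\s*=\s*"?([\w.:-]+)"?', value, re.IGNORECASE)
    return match.group(1).lower() if match else None

def charset_from_meta(html_bytes):
    """Return the charset named in a <meta> tag near the top of the page, or None."""
    # The markup itself is ASCII in all encodings discussed here,
    # so a lossy ASCII decode of the first bytes is enough to find the tag.
    head = html_bytes[:1024].decode("ascii", errors="replace")
    match = re.search(r'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', head, re.IGNORECASE)
    return match.group(1).lower() if match else None

def effective_charset(content_type_header, html_bytes, fallback="iso-8859-1"):
    """HTTP header wins over META; the fallback is an assumption for this sketch."""
    return (charset_from_content_type(content_type_header)
            or charset_from_meta(html_bytes)
            or fallback)

page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2"></head>'
print(effective_charset("text/html; charset=ISO-8859-1", page))  # header overrides META
print(effective_charset("text/html", page))                      # no header charset: META is used
```

The first call mirrors the situation described above: the page carries a correct META tag, but the server's header names a different charset, and the header is what counts.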
 
Paul Gorodyansky

I have an ISO-8859-2 test page (because I work as a software I18n engineer),
so you can ask your friend to check how it is shown using *my*
provider, who does not fill out the charset field of the HTTP header:
http://ourworld.compuserve.com/homepages/paulgor/8859-2.htm
 
Stephen

Paul said:
If you would let us know the URL it would be easier for us to
help you.
Anyway, the above happens often with Russian too for the following reason:
- author created good page with correct META...charset=
- he placed .html to the Web Server of his Internet Provider
- The Web Server of the Provider is configured in such a way that
it places Charset=iso-8859-1 ("Western European") into
HTTP Header that is sent along with the page itself to a reader
( http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html )

- HTTP Header, by the standards, has higher priority than META...charset=
so browser gets it as a iso-8859-1 page!

So your friend needs to ask Web Server people if they do the above.
For example, my Internet Provider, CompuServe, does NOT fill out
Charset field of HTTP Header, so in my files META...charset=
works OK.

Is it possible that you might also get 8859-1 because the client sends
this in the Accept-Charset request header, without providing for
alternatives, and regardless of server configuration?

Paul said:
There is a test page that shows HTTP Header, so:
- create a Web page *without* META...charset= in it
- place it to the Web Server
- go to this page, get the screen with HTTP Header and see
what is the value of "Charset" field:
http://www.delorie.com/web/headers.html

OT: The above URL is an example of an application that is broken by
Verisign's implementing "sitefinder".

Regards
Stephen
 
Paul Gorodyansky

Hi,
Stephen said:
Is it possible that you might also get 8859-1 because the client sends
this in the Accept-charset request header? Without providing for
alternatives, and regardless of server configuration?

No, not really. First, and it's easy to verify, many browsers (MS
Internet Explorer is one of them) do *not* fill out the Accept-Charset
field. You can check this, for example, using the "CGI Test Script" link
here: http://koi8.pp.ru/frame.html?htmlreq.html

Second, Accept-Charset serves a different purpose. When a server has
*several* variants of the same page, say one containing the Russian
text in KOI8-R encoding and another in Windows-1251 encoding, then
a browser tells the server via Accept-Charset: koi8-r what it
can take. The server cannot *make* a document KOI8-R if it does not
have such a variant. The same holds in our case: if the server contains an
ISO-8859-2 document and the browser (e.g. Mozilla) requests ISO-8859-1,
that does not mean at all that the server will send the existing -2
document as -1.
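
The point about Accept-Charset can be illustrated with a small sketch of variant selection. This is a simplified model written for this thread, not the algorithm of any real server: `parse_accept_charset` and `choose_variant` are hypothetical names, the header parsing only handles the `q=` parameter, and the rule "fall back to the first stored variant" is an assumption. The point it demonstrates is that the function can only return a charset from the list of stored variants.

```python
def parse_accept_charset(header):
    """Parse e.g. 'koi8-r, windows-1251;q=0.5' into {'koi8-r': 1.0, 'windows-1251': 0.5}."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.lower().startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs[name.strip().lower()] = quality
    return prefs

def choose_variant(accept_charset, available):
    """Pick one of the stored variants; never invent or transcode one."""
    prefs = parse_accept_charset(accept_charset or "")

    def quality(charset):
        return prefs.get(charset.lower(), prefs.get("*", 0.0))

    # max() returns the first best match, so with no preference
    # the first stored variant is served unchanged.
    return max(available, key=quality)

print(choose_variant("koi8-r", ["windows-1251", "koi8-r"]))
print(choose_variant("iso-8859-1", ["iso-8859-2"]))
```

In the second call the client asks for iso-8859-1, but the only stored document is iso-8859-2, so that is what is selected; nothing in the selection step turns a -2 document into a -1 document.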
 
Stephen

Of course. Thanks for the commentary. I did notice that Gecko-based
browsers (Netscape 7.0, Moz 1.4, Firebird) do send Accept-charset. And
contrary to what I was remembering, you are right about IE: it does not.
Thanks again,
Stephen
 
David Komanek

Thank you all for your help.

In the meantime I found a workaround for my problem by recoding the
pages to utf-8, as was suggested here. Because the encoding is done by
a module in Apache on the server, where the implicit code page served
to clients is iso-8859-2, I just prefixed the page paths with /utf8, which
tells the server to use the explicit encoding "utf-8". So, for
example, one of the recoded pages that shows the problem is

http://www.natur.cuni.cz/utf8/fem_modflow/index.php?id=4

and the original one is now at

http://www.natur.cuni.cz/fem_modflow/index_test.php?id=4

Please, could somebody from outside the Central/Eastern European region
tell me what (s)he sees on the page between the lines

"Organizing Committee"

and

"Institute of Hydrogeology, Engineering Geology and Applied
Geophysics"?

There should be the name "Zbynek Hrkal", where the "e" has a special
decoration (something like a tilde, but not exactly; I have no idea what
this letter is called in English, sorry (does anybody know?)). I
see it correctly in both encodings, with MS IE 6, Netscape 7.1, Mozilla
... but my colleague in the Netherlands sees it correctly only in utf-8,
not in the original iso-8859-2. In the latter case he sees a "regular e"
instead.

I do not know how to get the HTTP header from the server. When I
connect to port 80 of the web server via Unix telnet and type

GET /fem_modflow/index_test.php?id=4

I get just the source of the web page, with no HTTP header lines:

# telnet www.natur.cuni.cz 80
Trying 195.113.56.1...
Connected to tao.natur.cuni.cz.
Escape character is '^]'.
GET /fem_modflow/index_test.php?id=4
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
.....
.....
.....

Thanks again for your comments.

With best regards,

David Komanek
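
On the telnet attempt above: a request line with no HTTP version is treated as an HTTP/0.9 simple request, and such a response carries no status line or headers, only the body. Adding a version (and a Host line) makes the server answer with headers. The sketch below shows one possible way to do this from Python; the helper names are made up for this example, and the host and path are simply the ones quoted in the post.

```python
import socket

def build_head_request(host, path):
    """Request text with an explicit HTTP version, so the reply includes headers."""
    return (f"HEAD {path} HTTP/1.0\r\n"
            f"Host: {host}\r\n"
            "\r\n")

def fetch_headers(host, path, port=80, timeout=10):
    """Send the request over a plain socket and return the raw response text."""
    request = build_head_request(host, path).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Header bytes are decoded as iso-8859-1 so that no byte can fail to decode.
    return b"".join(chunks).decode("iso-8859-1")

# The same text can be typed into a telnet session line by line,
# ending with an empty line.
print(build_head_request("www.natur.cuni.cz", "/fem_modflow/index_test.php?id=4"))
```

Calling `fetch_headers("www.natur.cuni.cz", "/fem_modflow/index_test.php?id=4")` needs network access; the line to look for in its result is Content-Type and any charset= parameter on it.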
 
David Komanek

Paul said:
- HTTP Header, by the standards, has higher priority than META...charset=
so browser gets it as a iso-8859-1 page!

So your friend needs to ask Web Server people if they do the above.
For example, my Internet Provider, CompuServe, does NOT fill out
Charset field of HTTP Header, so in my files META...charset=
works OK.

Well, this seems to be the problem. Thank you. The header displayed by
http://www.delorie.com/web/headers.html says the charset is
"us-ascii", regardless of the AddDefaultCharset setting in Apache's
httpd.conf, the php.ini setting, and a call to the header() function on
the first line of the PHP source itself. Very strange. Even stranger is
that on some computers the meta-tag information about the encoding takes
precedence and on some it does not ...

David Komanek
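
The symptom reported in this thread (accented letters arriving as their plain base letters, together with a "us-ascii" charset in the header) is what one would expect if something between the stored page and the reader recodes the text to US-ASCII. The thread does not establish that this is what happens here, so the following is only an illustration of what such a recoding does to the text; `to_ascii` is a hypothetical helper written for this example.

```python
import unicodedata

def to_ascii(text):
    """Strip diacritics: split each letter into base + combining mark, drop the marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

# The name from the page, spelled with the accented letter described above.
print(to_ascii("Zbyněk Hrkal"))
print(to_ascii("Á"))
```

This differs from a plain charset mismatch: reading iso-8859-2 bytes as iso-8859-1 would show a different accented letter in place of the Czech one, not the unaccented base letter.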
 
