pulvens said:
I have a javascript running at the server side, receiving both POST
and GET requests.
You should have said that you are using JScript in ASP.
The POST request works fine, non-Latin-1 letters are displayed
correctly.
An HTTP POST request with a message body should include a Content-Type
header which declares the format and encoding of that body (if the encoding
is omitted, the default is ISO-8859-1), so the server, regardless of its
default character set, should be able to decode it properly.
GET requests are different because there is no message body to which a
Content-Type header would apply (which is why RFC 3986 defines
percent-encoding for URIs instead).
Also keep in mind that the supported URI length is limited in IE/MSHTML to
2083 characters, so you would probably want to use POST requests anyway.
But if I type the request directly into the address bar, I get the
replacement (65533) char instead.
That is probably because the query part cannot be decoded by the server. A
Unicode-supporting application is required to use a replacement sequence if
decoding of a byte sequence is not possible; U+FFFD is the primary
possibility for doing that. (There are at least four others, see below.)
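To illustrate where the 65533 comes from, here is a minimal sketch. It assumes a runtime that provides the WHATWG `TextDecoder' as a global (current browsers, Node.js), which JScript in ASP does not; the variable names are mine. It decodes the single octet F8 as UTF-8:

```javascript
// Assumption: TextDecoder is a global (current browsers, Node.js);
// it is not available in JScript/ASP.
var bytes = new Uint8Array([0xF8]);   // the octet sent as %F8
var decoded = new TextDecoder("utf-8").decode(bytes);

// The undecodable octet is replaced by U+FFFD REPLACEMENT CHARACTER.
var code = decoded.charCodeAt(0);
console.log(decoded.length, code);
```

The logged code is 65533 (0xFFFD), the same value your test script reports.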
If I type ?text=ø the browser sends ?text=%F8, and the receiving
script fails to read it correctly.
Understandable, see below.
If I type ?text=%C3%B8 the browser sends it correctly and the receiving
script succeeds.
Understandable, too.
What is the difference? Is %F8 UTF-16 or UTF-8?
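`%F8' can be neither: the octet F8 never occurs in UTF-8, and a single octet
is not a complete UTF-16 code unit. It is most likely the ISO-8859-1 (or
Windows-1252) code for `ø'. BTW, UTF-16 and UTF-8 are only different
character encodings for the same Unicode character set (see below).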
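The difference between the two forms can be checked with the built-in decoding functions. This is only a sketch with variable names of my choosing; it says nothing about what IIS does internally:

```javascript
// U+00F8 (ø) percent-encoded as its UTF-8 octets.
var utf8Form = encodeURIComponent("\u00F8");

// decodeURIComponent() expects UTF-8 and accepts that form ...
var fromUtf8 = decodeURIComponent("%C3%B8");

// ... but throws a URIError for the lone octet F8.
var f8IsValidUtf8 = true;
try
{
  decodeURIComponent("%F8");
}
catch (e)
{
  f8IsValidUtf8 = false;
}

// unescape() maps %XX directly to the code unit U+00XX,
// which coincides with ISO-8859-1 for this octet.
var fromLatin1 = unescape("%F8");
```

Here `utf8Form' is "%C3%B8", `fromUtf8' and `fromLatin1' are both "ø", and `f8IsValidUtf8' ends up false.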
And is there any way to influence it?
The problem is the same for both IE8 and Firefox 3.5.
The page is in utf-8, both the html header and the page meta tag
specify that.
There is no HTML header. There is an HTTP (response) header, and if that
header begins with `Content-Type:' then its value takes precedence over the
META _element_ (<meta http-equiv="Content-Type" content="...">). (You only
need the META element when the resource should be displayed without an HTTP
server; unfortunately few browsers manage to duplicate the Content-Type
header as a META element when saving the document on the local filesystem.)
The script runs on an IIS server.
Here is the test script:
function main() {
var t = String(Request('text'));
for(var i =0; i < t.length;i++){
for (var i = 0, len = t.length; i < len; i++) {
is more efficient and more readable. (Allman style¹, which I use and
recommend, even requires the brace to be placed below the `f', but YMMV.)
Content.add(i+'='+t.charCodeAt(i)+'<br />');
}
Content.add(t+'<br />');
}
I guess my question is:
Why does the javascript String(obj) function not understand the %F8
encoding?
First of all, AFAIK String() does nothing here but return the passed
value, as that is a string value already. AFAIK, it is never going to decode
anything. Second, how non-ASCII characters are encoded depends primarily on
the client.
If the client uses a percent-encoding not defined in RFC 3986, you have to
deal with that, for example by guessing the encoding used and applying
unescape(). That is relatively easy to do for some codes for characters of
8-bit character sets because a UTF-8 code unit is never going to be one of
C0, C1, F5, F6, F7, F8, F9, FA, FB, FC, FD, FE, and FF.²
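Such guessing could look like the following sketch. `decodeQueryValue' is a hypothetical helper of mine, not part of ASP or JScript. It assumes that you work on the still percent-encoded value (for example taken from Request.ServerVariables("QUERY_STRING")) and that anything which is not valid UTF-8 is ISO-8859-1:

```javascript
// Hypothetical helper; the name and the ISO-8859-1 fallback are assumptions.
function decodeQueryValue(raw)
{
  try
  {
    // Succeeds only if all percent-encoded octets form valid UTF-8.
    return decodeURIComponent(raw);
  }
  catch (e)
  {
    // Otherwise map each %XX to U+00XX, i.e. treat it as ISO-8859-1.
    return unescape(raw);
  }
}

var a = decodeQueryValue("%F8");      // not valid UTF-8, fallback is used
var b = decodeQueryValue("%C3%B8");   // valid UTF-8
```

Both calls yield "ø". Mind that the guess can be wrong: some ISO-8859-1 octet sequences are also valid UTF-8, and `+' for a space is not handled here.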
It would be better, though, if the client used UTF-8 percent-encoding as
defined by RFC 3986 to begin with. You can encourage the client to do so if
you declare and use UTF-8 (instead of an encoding for an 8-bit character
set, like ISO-8859-1) for serving your content, the former with the
following HTTP header:
Content-Type: ...; charset=utf-8
or any case variation thereof (see also
<http://www.iana.org/assignments/character-sets>). (Observe that I am using
UTF-8 for encoding this posting [to pass the footnote 1 character], Google
Groups that you are using should be able to decode it, and your browser
should be able to display it.)
However, your posting suggests that the client might not behave anyway; in
that case the problem must be the client or the form with which the data is
submitted, because it works on a great number of other Web sites, including
those which I have worked on.
But better check the *received* response headers first, for example in
Firefox with Firebug or LiveHTTPHeaders.
Do you have any suggestions how to solve this problem?
HTH
PointedEars
___________
¹ <http://en.wikipedia.org/wiki/Indent_style>
² <http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences>