Mac Safari and urlencoding non-latin characters


Riku Kangas

Hi,

I have a web service with a javascript line like this to send
user input to another frame:

<code>
parent.isk_artlist_top.location.href = 'artlist_top.asp?ss=' + escape(document.haku.sana.value) +
    '&base=' + document.haku.base.value + '#activeword';
</code>

the problematic part being this ----> escape(document.haku.sana.value)

It works well on all other browsers, but Mac's Safari doesn't encode
the non-latin characters. Changing the charset to UTF-8 solves the
problem, but is there any way to get it working with iso-8859-1?



R.Kangas
 

Thomas 'PointedEars' Lahn

Riku said:
> <code>
> parent.isk_artlist_top.location.href = 'artlist_top.asp?ss=' + escape(document.haku.sana.value) +
>     '&base=' + document.haku.base.value + '#activeword';
> </code>
>
> the problematic part being this ----> escape(document.haku.sana.value)

> It works well on all other browsers,

You have almost certainly not tested with *all* other browsers, and even
if you had, that would not mean anything.

> but Mac's Safari doesn't encode the non-latin characters.

As it should not. If escape() is used for URIs, it should comply with
RFC 1738, section 2.2, and/or RFC 2396, section 2.1, which specify that
only US-ASCII characters should be escaped (if they are reserved and not
used in their reserved meaning) unless the URI encoding is specified.
However, neither RFC 1738 nor RFC 2396 defines a way to specify the
encoding of URLs/URIs in general, and RFC 1945 (HTTP/1.0) and RFC 2616
(HTTP/1.1) do not define one for http URLs either.
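
A small JavaScript sketch of why this matters: URI escaping works on octets (a "%" followed by two hex digits per octet), so the same character gets a different escape sequence depending on which character encoding turned it into octets, and nothing in the resulting URI records that choice. `percentEncode` and its `charset` argument are hypothetical; only the two encodings discussed in this thread are handled, and the set of characters left unescaped is assumed to be the RFC 2396 "unreserved" set.

```javascript
// Hypothetical helper: percent-encode a string as the octets of a given charset.
// Only "iso-8859-1" and "utf-8" are handled in this sketch.
function percentEncode(str, charset) {
  // RFC 2396 "unreserved" characters (alphanumerics and marks) pass through.
  const unreserved = /^[A-Za-z0-9\-_.!~*'()]$/;
  const hex = (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0");
  let out = "";
  for (const ch of str) {              // iterates by code point, not code unit
    if (unreserved.test(ch)) {
      out += ch;
      continue;
    }
    let bytes;
    if (charset === "iso-8859-1") {
      const cp = ch.codePointAt(0);
      if (cp > 0xff) {
        throw new RangeError(ch + " cannot be represented in ISO-8859-1");
      }
      bytes = [cp];                    // one octet, equal to the code point
    } else if (charset === "utf-8") {
      bytes = Array.from(new TextEncoder().encode(ch)); // one to four octets
    } else {
      throw new Error("charset not handled in this sketch: " + charset);
    }
    out += bytes.map(hex).join("");
  }
  return out;
}

console.log(percentEncode("äiti", "iso-8859-1")); // "%E4iti"
console.log(percentEncode("äiti", "utf-8"));      // "%C3%A4iti"
```

A server receiving `%E4iti` cannot tell from the URI alone that ISO-8859-1 was meant, and `%C3%A4iti` only means "äiti" if it is read as UTF-8.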

> Changing the charset to UTF-8

How, exactly?

> solves the problem, but is there any way to get it working with
> iso-8859-1?

I don't think so. There is a lot of proprietary behavior involved here.

escape() is a proprietary method; its implementation is only *suggested*
in ECMAScript 3, section B.2.1 (and, as specified there, it explicitly
does not adhere to RFC 1738, as updated by RFC 2396). Using Unicode
escape sequences in URIs is not backed by the RFCs either, so it is
proprietary behavior if an HTTP server decides to interpret all escape
sequences in URIs as UTF-8 escape sequences (in order to map two-byte,
three-byte and four-byte sequences to Unicode [4.0] characters
accordingly). A UA that converts Unicode characters in a URI to Unicode
escape sequences before sending the HTTP request is exhibiting
proprietary behavior as well (see above).
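
For reference, the escape() algorithm suggested in ECMAScript 3, section B.2.1 can be sketched in a few lines. It shows where the non-standard `%uXXXX` form comes from: code units below 256 become `%XX`, all others become `%u` followed by four hex digits. `escapeB21` is a made-up name; in engines that still provide the global escape(), such as Node.js, the two are expected to return the same result.

```javascript
// Sketch of the escape() algorithm suggested in ECMAScript 3, section B.2.1.
// escapeB21 is a hypothetical name; like the spec, it works on UTF-16 code units.
function escapeB21(str) {
  // The 69 characters the suggested algorithm leaves unescaped.
  const safe =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./";
  let result = "";
  for (let i = 0; i < str.length; i++) {
    const ch = str.charAt(i);
    const code = str.charCodeAt(i);
    if (safe.indexOf(ch) !== -1) {
      result += ch;
    } else if (code < 256) {
      // Latin-1 range: two hex digits, i.e. the ISO-8859-1 octet.
      result += "%" + code.toString(16).toUpperCase().padStart(2, "0");
    } else {
      // Outside Latin-1: the %uXXXX form, which no URI RFC defines.
      result += "%u" + code.toString(16).toUpperCase().padStart(4, "0");
    }
  }
  return result;
}

console.log(escapeB21("äiti")); // "%E4iti"
console.log(escapeB21("жук"));  // "%u0436%u0443%u043A"
```

So a character such as "ä" is escaped as its ISO-8859-1 octet, while anything beyond U+00FF gets the `%u` form that a server has to special-case.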

There is no RFC-backed means of specifying the URI encoding, so there is
no standards-compliant way for the UA to tell the HTTP server which
encoding was used to escape the URI characters. The server could possibly
be configured to treat escape sequences specifying a code point in the
range 0x80..0xFF as escape sequences for ISO-8859-1 characters. That
would be proprietary behavior, though.
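
A minimal sketch of that server-side interpretation, written in JavaScript only for consistency with the rest of the thread and not tied to any particular server. `decodeLatin1` is a hypothetical helper: it maps each `%XX` sequence to the character with that code point, which for 0x80..0xFF is exactly the ISO-8859-1 character. The standard decodeURIComponent(), by contrast, assumes UTF-8 and rejects the same input.

```javascript
// Hypothetical server-side helper: read each %XX escape as one ISO-8859-1 octet.
// In ISO-8859-1 the octet value equals the Unicode code point, so no table is needed.
function decodeLatin1(s) {
  return s.replace(/%([0-9A-Fa-f]{2})/g, function (_, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

console.log(decodeLatin1("%E4iti")); // "äiti"

// The same input read as UTF-8: a lone 0xE4 octet is not valid UTF-8.
try {
  decodeURIComponent("%E4iti");
} catch (e) {
  console.log(e instanceof URIError); // true
}
```

The reverse mismatch is just as real: `decodeLatin1("%C3%A4")` yields the two characters "Ã¤" instead of "ä".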

The standards-compliant encodeURIComponent() method of ECMAScript 3 is
not available everywhere, and AIUI it is not fully compliant with the
RFCs mentioned either.
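
Applied to the line from the original question, a feature-tested version might look like this. `buildArtlistHref` is a hypothetical helper; the frame, form and field names come from the question. It is written in ES3-era syntax because the fallback only matters in old engines, and unlike the original line it also escapes `base`. Since encodeURIComponent() always produces UTF-8 escape sequences, artlist_top.asp would have to decode `ss` as UTF-8.

```javascript
// Hypothetical helper for the URL built in the question.
// Prefers encodeURIComponent() (UTF-8 escapes); falls back to escape()
// in engines that lack it, which keeps the old, encoding-dependent behavior.
function buildArtlistHref(word, base) {
  var enc = (typeof encodeURIComponent === "function")
    ? encodeURIComponent
    : escape;
  return "artlist_top.asp?ss=" + enc(word) +
    "&base=" + enc(base) + "#activeword";
}

// In the browser, following the question:
// parent.isk_artlist_top.location.href =
//   buildArtlistHref(document.haku.sana.value, document.haku.base.value);

console.log(buildArtlistHref("äiti", "fi"));
// "artlist_top.asp?ss=%C3%A4iti&base=fi#activeword"
```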

It could help to make POST requests instead of GET requests, since the
encoding of the POST *data* can be properly specified. That would
require either an HTML form to be submitted or an HTTP API with an
ECMAScript language binding (such as XMLHttpRequest), as well as a
server-side application that processes the POST data (using the
Request.Form collection in ASP).
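
One way to sketch the XMLHttpRequest variant, under explicit assumptions: all helper names are hypothetical, the body is percent-encoded as ISO-8859-1 octets by hand (so the request body itself is plain ASCII), and the receiving ASP page is assumed to decode the form data as ISO-8859-1. Characters outside ISO-8859-1 raise an error instead of being silently mangled.

```javascript
// Hypothetical helper: percent-encode one string as ISO-8859-1 octets in the
// application/x-www-form-urlencoded style (spaces become "+").
function encodeLatin1(s) {
  var out = "";
  for (var i = 0; i < s.length; i++) {
    var c = s.charCodeAt(i);
    if (c > 0xFF) {
      throw new RangeError("Not representable in ISO-8859-1: " + s.charAt(i));
    }
    var ch = s.charAt(i);
    if (/[A-Za-z0-9*\-._]/.test(ch)) {
      out += ch;
    } else if (ch === " ") {
      out += "+";
    } else {
      out += "%" + (c < 16 ? "0" : "") + c.toString(16).toUpperCase();
    }
  }
  return out;
}

// Hypothetical helper: join name/value pairs into a form-urlencoded body.
function formEncodeLatin1(fields) {
  var parts = [];
  for (var name in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, name)) {
      parts.push(encodeLatin1(name) + "=" + encodeLatin1(String(fields[name])));
    }
  }
  return parts.join("&");
}

// Browser-only: POST the fields to a (hypothetical) URL.
function postLatin1(url, fields, onDone) {
  var xhr = new XMLHttpRequest();
  xhr.open("POST", url, true);
  xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4) {
      onDone(xhr.status, xhr.responseText);
    }
  };
  xhr.send(formEncodeLatin1(fields));
}
```

For example, `formEncodeLatin1({ ss: "äiti koe", base: "fi" })` returns `ss=%E4iti+koe&base=fi`. Sending pre-encoded ASCII sidesteps the fact that XMLHttpRequest transmits string bodies as UTF-8.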

The best solution, though, is plain and simple: do not use non-ASCII
characters in URIs. In particular, do not use them in the names of files
that are to become Internet resources unless you also do server-side
redirection/URL rewriting.


PointedEars
 
