Determining which encoding the browser used for a url

Jon Maz

Hi,

I am working on a dotnet url rewriting mechanism that has to be able to deal
with urls containing non-standard characters, eg
http://www.mysite.com/Télécharger.

The problem is that some browsers will encode this url using utf8 & some
using ISO 8859 (I *think* those are the only two possibilities). For ISO
8859 I can use the built-in UrlDecode function, for utf8 I am using a
function I found on Google groups:

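// Requires: using System.Text;
public static string Utf8ToString(string inputString)
{
    // Assumes each char of inputString holds one raw byte (0-255),
    // i.e. the string was already unescaped as ISO 8859-1.
    byte[] utf8Bytes = new byte[inputString.Length];
    for (int i = 0; i < utf8Bytes.Length; i++)
    {
        utf8Bytes[i] = (byte)inputString[i];
    }
    // Reinterpret those bytes as UTF-8.
    return Encoding.UTF8.GetString(utf8Bytes);
}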

The problem is deciding *which* encoding the browser has used, and therefore
which decoding function I need to use. It seems that Mozilla-based browsers
use ISO 8859, whereas IE can use either, depending on a user-setting, and I
haven't looked at any other browsers yet.

As far as I know, the browser does NOT send anything in the headers that
tells you what url-encoding it is using, so I guess I need some way of
looking at the raw url and working out which encoding it's using.

Can anyone help me with writing a function to do this? The ideal would be a
GetEncoding(string testString) function, but I'd settle for a function
IsUtf8Encoded(string testString), on the grounds that if it *isn't* utf8, it
must be ISO 8859.

TIA,

JON
Joerg Jooss

Jon said:
Hi,

I am working on a dotnet url rewriting mechanism that has to be able
to deal with urls containing non-standard characters, eg
http://www.mysite.com/Télécharger.

Doing this without direct control over your clients' configuration is a
daunting task, as you've just found out ;-)

The problem is that some browsers will encode this url using utf8 &
some using ISO 8859 (I think those are the only two possibilities).

Depends on your audience. Don't expect Chinese users to send ISO-8859-x.

For ISO 8859 I can use the built-in UrlDecode function, for utf8 I am
using a function I found on Google groups:

public static string Utf8ToString(string inputString)
{
    byte[] utf8Bytes = new byte[inputString.Length];
    for (int i = 0; i < utf8Bytes.Length; i++)
    {
        utf8Bytes[i] = (byte)inputString[i];
    }
    return Encoding.UTF8.GetString(utf8Bytes);
}


Um... why? System.Web.HttpUtility has tons of methods for this,
including
public static string UrlDecode(string, Encoding);
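
To illustrate that overload, here is a minimal sketch. The class and method names (UrlDecodeDemo, DecodeAs) are made up for the example; only HttpUtility.UrlDecode(string, Encoding) and Encoding.GetEncoding come from the framework. It assumes you already know, or have agreed on, the encoding name.

```csharp
using System;
using System.Text;
using System.Web;

public static class UrlDecodeDemo
{
    // Hypothetical helper: unescape a %XX-encoded path segment using
    // the named character encoding ("utf-8", "iso-8859-1", ...).
    public static string DecodeAs(string escaped, string encodingName)
    {
        Encoding encoding = Encoding.GetEncoding(encodingName);
        return HttpUtility.UrlDecode(escaped, encoding);
    }
}
```

With this, DecodeAs("T%C3%A9l%C3%A9charger", "utf-8") and DecodeAs("T%E9l%E9charger", "iso-8859-1") both give "Télécharger". Decoding the UTF-8 form as ISO 8859-1 gives "TÃ©lÃ©charger" instead, which is why the choice of encoding matters.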

The problem is deciding which encoding the browser has used, and
therefore which decoding function I need to use. It seems that
Mozilla-based browsers use ISO 8859, whereas IE can use either,
depending on a user-setting, and I haven't looked at any other
browsers yet.

You can't solve this. It's like trying to open an arbitrary file and
guess a correct character encoding.

As far as I know, the browser does NOT send anything in the headers
that tell you what url-encoding it is using, so I guess I need some
way of looking at the raw url and working out which encoding it's
using.

You're right, it's not defined which encoding to use. Sender and receiver
need to agree on this.

Can anyone help me with writing a function to do this? The ideal
would be a GetEncoding(string testString) function, but I'd settle
for a function IsUtf8Encoded(string testString), on the grounds that
if it *isn't* utf8, it must be ISO 8859.

I'd rather drop the requirement of transparently supporting non ASCII
URL paths.
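
If you do want to try guessing, the usual heuristic is sketched below. This is only a best-effort sketch, not a reliable detector: it unescapes the %XX sequences to raw bytes and checks whether they form strictly valid UTF-8. All names here (UrlEncodingGuess, UnescapeToBytes, GuessAndDecode) are invented for the example, and it assumes the raw URL is plain ASCII apart from the escapes and that the only alternative to UTF-8 is ISO 8859-1.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class UrlEncodingGuess
{
    // Turn %XX escapes into raw bytes; every other character is
    // assumed to be ASCII and is copied through as a single byte.
    static byte[] UnescapeToBytes(string escaped)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < escaped.Length; i++)
        {
            char c = escaped[i];
            if (c == '%' && i + 2 < escaped.Length
                && Uri.IsHexDigit(escaped[i + 1]) && Uri.IsHexDigit(escaped[i + 2]))
            {
                bytes.Add((byte)(Uri.FromHex(escaped[i + 1]) * 16 + Uri.FromHex(escaped[i + 2])));
                i += 2; // skip the two hex digits just consumed
            }
            else
            {
                bytes.Add((byte)c);
            }
        }
        return bytes.ToArray();
    }

    // True if the unescaped bytes are well-formed UTF-8.
    // Pure ASCII input also returns true (both decodings agree there).
    public static bool IsUtf8Encoded(string escaped)
    {
        var strictUtf8 = new UTF8Encoding(false, true); // throw on invalid bytes
        try
        {
            strictUtf8.GetString(UnescapeToBytes(escaped));
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Decode as UTF-8 when the bytes validate, else fall back to ISO 8859-1.
    public static string GuessAndDecode(string escaped)
    {
        byte[] bytes = UnescapeToBytes(escaped);
        Encoding encoding = IsUtf8Encoded(escaped)
            ? Encoding.UTF8
            : Encoding.GetEncoding("iso-8859-1");
        return encoding.GetString(bytes);
    }
}
```

This works for typical cases because accented ISO 8859-1 text rarely happens to be valid UTF-8, but it is still a guess: an ISO 8859-1 path that really contains "Ã©" (%C3%A9) is indistinguishable from a UTF-8 "é", and other legacy encodings are not covered at all.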

Cheers,
 
