MIME charset problem

Martin Nadoll

Hello,

I have a German website and need to use German characters like äöü.

So first I have:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>
<head>

<title>MyTitle</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta http-equiv="pragma" content="no-cache">
<meta http-equiv="expires" content="0">
</head>

<body>
äöü
</body>
</html>

But these characters are not shown correctly.
Why?

Thanks for any help,
Martin Nadoll
 
Jukka K. Korpela

Scripsit Martin Nadoll:
I have a German website
URL?

and need to use German characters like äöü.
Yes...

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

You are thereby asking IE to work in a very broken mode, called "Quirks
Mode". However, the brokenness does not affect the encoding issue.
<title>MyTitle</title>

That's a poor title. Or are you saying that you didn't copy the _actual_
page? (There are thousands of pages that actually have such a nonsense
title, by the way, probably since the author originally wrote it with
the _intention_ of fixing it "later".)
<meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1">

This will be trumped by HTTP headers, if they specify the encoding.

Do you now realize why the URL is all-important? No? Then you _surely_
need to post the URL.
<meta http-equiv="pragma" content="no-cache">

That's clueless. You most probably have no idea of how caches work and
how they help, so you have no way of knowing whether you have the one in
a thousand case where they hurt.
<meta http-equiv="expires" content="0">

That's both incorrect and clueless.

Just don't mess around with caches before you understand them, mm'kay?
<body>
äöü
</body>
</html>

But these characters are not shown correctly.

My crystal ball says that in the actual HTML file, they appear as UTF-8
encoded. Or maybe they are ISO-8859-1 encoded but the server announces
UTF-8.
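
The two possibilities above can be reproduced in a few lines. This is only a minimal sketch in Python (not something from the original page), and the variable names are made up for illustration. It shows what a reader would see when the bytes in the file and the announced encoding disagree:

```python
# Hypothetical illustration: the same three letters, mislabeled both ways.
text = "äöü"

# Case 1: the file is saved as UTF-8, but read as ISO-8859-1.
utf8_bytes = text.encode("utf-8")
seen_as_latin1 = utf8_bytes.decode("iso-8859-1")
print(seen_as_latin1)  # each letter turns into two characters: Ã¤Ã¶Ã¼

# Case 2: the file is saved as ISO-8859-1, but announced as UTF-8.
latin1_bytes = text.encode("iso-8859-1")
# The bytes are not valid UTF-8, so a lenient decoder substitutes
# U+FFFD replacement characters instead of the intended letters.
seen_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
print(seen_as_utf8)
```

Either way the fix is the same: make the announced encoding and the actual bytes agree.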

P.S. You also need to fix your newsreader. It now sends non-ASCII data
without specifying the encoding.
 
Martin Nadoll

Thanks for your help.
http://www.convince-ag.de/andex.htm
The content is äöüß, though, not ???? as you see in the source.
That's a poor title. Or are you saying that you didn't copy the _actual_
page? (There are thousands of pages that actually have such a nonsense
title, by the way, probably since the author originally wrote it with the
_intention_ of fixing it "later".)


I actually have another title, but I thought that didn't matter, sorry.
This will be trumped by HTTP headers, if they specify the encoding.
Sorry, I don't understand that...
Do you now realize why the URL is all-important? No? Then you _surely_
need to post the URL.
Yes, I fixed that mistake here.
My crystal ball says that in the actual HTML file, they appear as UTF-8
encoded. Or maybe they are ISO-8859-1 encoded but the server announces
UTF-8.
Why that?

It looks like you know what you are talking about; maybe you can help me
once more?

Thanks,
Martin Nadoll
 
Jukka K. Korpela

Scripsit Martin Nadoll:

OK, now we can check that the _server_ announces the page as UTF-8
encoded, even though it is in fact ISO-8859-1 encoded. You need to
change either of these: the server's announcement (in HTTP headers), or
the actual encoding, so that they match.

The server runs Apache, so the problem _might_ be fixable simply by
adding a file with the exact name ".htaccess" (without the quotes but
with the leading dot) into the main directory of your pages, with the
following line as its only content:
AddType text/html;charset=iso-8859-1 htm
(If .htaccess already exists, just add that line there.)

This simply instructs the server to announce the ISO-8859-1 encoding for
any page in a file with a name ending with ".htm".

However, depending on local policy by the server admin, your ".htaccess"
might have no effect. In that case, contact the server admin and read
their instructions. If the policy is to use UTF-8 for everything, just
live with it. (They're wrong, but probably stubbornly wrong.) Try
finding an authoring tool that can save data in UTF-8 encoding. This
should be easy, though not _all_ authoring tools can do that.

It might be better to switch to UTF-8 anyway, though ISO-8859-1 still
works a little more reliably. In UTF-8, you can enter _any_ character as
such, provided of course that your authoring tool offers some way of
typing it. This includes things like German punctuation marks as well as
en dash and em dash, and about 100,000 other characters that don't
belong to ISO-8859-1.

There are plenty of pages on these issues these days, but I'm afraid
they present the issue as more complex than it actually is. You might
however check e.g.
http://www.w3.org/International/tutorials/tutorial-char-enc/#Slide0240
Sorry, I don't understand that...

The <meta> tag is just an Ersatz trick and effectively tells the browser
to behave _as if_ it had got a Content-Type HTTP header. By the specs,
and by actual practice, browsers will ignore the Ersatz when they get a
real HTTP header from a server, with conflicting content.
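
The precedence described above can be written down as a small function. This is a simplified sketch in Python, not how any particular browser is implemented; the function names are hypothetical, and real browsers apply further rules (byte order marks, user overrides, guessing) that are left out here:

```python
import re

def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', value, re.IGNORECASE)
    return match.group(1).lower() if match else None

def effective_charset(http_content_type, meta_content):
    """Model of the rule: a real HTTP header beats the <meta> substitute."""
    if http_content_type:
        from_header = charset_from_content_type(http_content_type)
        if from_header:
            return from_header
    # Only when the server names no charset does the <meta> tag matter.
    return charset_from_content_type(meta_content or "")

# The situation in this thread: the server says UTF-8, the page says ISO-8859-1.
print(effective_charset("text/html; charset=UTF-8",
                        "text/html; charset=iso-8859-1"))  # utf-8
```

With a header of plain `text/html` (no charset parameter), the same function falls back to the value from the `<meta>` tag.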
Why that?

The crystal ball is pure magic, and it was correct once again - it was
the latter alternative.
 
Martin Nadoll

I will check all of this advice.
Thanks a lot for spending so much time on my problem.
It looks like you are very capable of handling all these issues.
A lot of my content comes from a database, so I have many of those
German characters there.
That's why I thought it was not a good idea to switch to UTF-8.

Thank you,
Martin Nadoll
 
Brandy Red

I will check all of this advice.
Thanks a lot for spending so much time on my problem.
It looks like you are very capable of handling all these issues.
A lot of my content comes from a database, so I have many of those
German characters there.
That's why I thought it was not a good idea to switch to UTF-8.

I'm from Norway. In Norway we also have some characters that don't
match with UTF-8. In Norway it's primarily æ, ø and å. Should I stick
to ISO-8859-1 or move to UTF-8? That might seem like a hard question,
but it is really simple. Every US letter in UTF-8 is encoded with
one "letter"; most European letters, like those from German and
Norwegian, are encoded with two "letters". Your letters (äöü), the
double s (ß), and so on are encoded with two "letters". Just one
little image will make the difference unimportant.
 
Jukka K. Korpela

Scripsit Brandy Red:
I'm from Norway.

A small world, isn't it? This little peninsula of Asia that we call
"Europe" is inhabited by interesting people, even though it is
linguistically relatively uniform. A collection of only about 1,000
characters (the so-called Minimum European Subset 2, MES-2) covers
virtually all letters and punctuation used in European languages. But I
digress.

First I'd like to mention that data coming from a database tends to be
in an encoding of the database, but quite often you can easily convert
it to a different encoding. Even PHP, which is primitive in many ways
and doesn't really support Unicode, has tools for converting from
ISO-8859-1 to UTF-8 (which is a fairly trivial conversion anyway).
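
To show how trivial the conversion is, here is a minimal sketch in Python rather than PHP. The helper name and the sample value are invented for illustration, and it assumes the database really hands out ISO-8859-1 bytes:

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (hypothetical helper)."""
    # Every byte value is a valid ISO-8859-1 character, so decoding
    # cannot fail; the risk is only that the input was mislabeled.
    return data.decode("iso-8859-1").encode("utf-8")

# Sample value as it might come out of an ISO-8859-1 database column.
row = b"Gr\xfc\xdfe"  # "Grüße" in ISO-8859-1
converted = latin1_to_utf8(row)
print(converted.decode("utf-8"))  # Grüße
```

The conversion is safe to apply only once: running it on data that is already UTF-8 produces exactly the doubled-character garbage discussed earlier in the thread.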
In Norway we also have some characters that don't
match with UTF-8.

Pardon? I was very confused... but I think you mean characters that have
a different representation in UTF-8 than in ISO-8859-1.
In Norway it's primarily æ, ø and å.

Right. And perhaps some punctuation marks, though partly they don't
exist in ISO-8859-1 at all.
Should I stick to
ISO-8859-1 or move to UTF-8? That might seem like a hard question,
but it is really simple.

It depends.
Every US letter in UTF-8 is encoded with
one "letter"; most European letters, like those from German and
Norwegian, are encoded with two "letters". Your letters (äöü), the
double s (ß), and so on are encoded with two "letters". Just one
little image will make the difference unimportant.

That sounds very confusing, but there is a point behind it. What you
really mean is that letters like æ, ø, å, ä, ö, ü, ß etc. each occupy
one octet (one 8-bit byte) in ISO-8859-1, two octets in UTF-8, and that
this difference is not very important in terms of efficiency.
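
The octet counts are easy to verify. A short sketch in Python, with variable names chosen only for this example, counts the octets per character in both encodings:

```python
# Octets per character in ISO-8859-1 versus UTF-8.
letters = "aæøåäöüß"

sizes = {
    ch: (len(ch.encode("iso-8859-1")), len(ch.encode("utf-8")))
    for ch in letters
}

for ch, (latin1_len, utf8_len) in sizes.items():
    print(ch, latin1_len, utf8_len)

# An ASCII letter such as "a" takes one octet in both encodings;
# each of the other letters takes one octet in ISO-8859-1 and two in UTF-8.
```

For ordinary German or Norwegian text, where most characters are plain ASCII letters and spaces, the total size therefore grows by only a small fraction.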

The real issues are elsewhere. Can you work with UTF-8 in your authoring
software? Can other people who edit the pages later do the same? Can you
change the Content-Type header sent by the server? And so on. The
efficiency impact is mostly negligible.
 
