page encoding question

Tony Vella

I am preparing a series of philatelic html pages (lots of text and a few
scans of stamps) which will include alpha-characters (accents) in Italian,
French, Spanish, Portuguese and Danish. The pages I have finished in draft
form so far I have encoded UTF-8 but I have just been told that 99% of the
world will not be able to read them and that I should go through all the
pages and re-encode them "western european - windows (1252)". I guess what
I would like to know is what encoding would be most effective for these
particular languages. Any advice and pointers will be appreciated.
 
David Dorward

Tony said:
I am preparing a series of philatelic html pages (lots of text and a few
scans of stamps) which will include alpha-characters (accents) in Italian,
French, Spanish, Portuguese and Danish. The pages I have finished in draft
form so far I have encoded UTF-8 but I have just been told that 99% of the
world will not be able to read them

That is rubbish. UTF-8 is very well supported (so much so, that I can't
remember the last time I came across a system that couldn't handle it).
 
Dan

Tony said:
I am preparing a series of philatelic html pages (lots of text and a few
scans of stamps) which will include alpha-characters (accents) in Italian,
French, Spanish, Portuguese and Danish. The pages I have finished in draft
form so far I have encoded UTF-8 but I have just been told that 99% of the
world will not be able to read them and that I should go through all the
pages and re-encode them "western european - windows (1252)". I guess what
I would like to know is what encoding would be most effective for these
particular languages. Any advice and pointers will be appreciated.

While UTF-8 is actually very widely supported, and thus there's no
reason to change your encoding from this (if your server sends a proper
content-type header indicating the encoding), the Western European
languages you are using should work all right in the standard Western
encoding iso-8859-1 as well. Avoid windows-1252; it's a proprietary
Microsoft set.
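
To illustrate the point about the content-type header, here is a minimal Python sketch (not something posted in the thread). The helper names `content_type_header` and `declared_charset` and the sample text are invented for the example. It only shows that a page decodes correctly when the declared charset matches the encoding actually used, and turns into garbage when it does not.

```python
from email.message import Message
from typing import Optional


def content_type_header(charset: str) -> str:
    """Build the Content-Type value a server might send for an HTML page."""
    return f"text/html; charset={charset}"


def declared_charset(header_value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


# Hypothetical page fragment with accented letters.
page = "<p>Posta aérea, São Tomé, Færøerne</p>"

header = content_type_header("utf-8")
body = page.encode("utf-8")

# Declared charset matches the real encoding: the text round-trips.
decoded = body.decode(declared_charset(header))

# Same bytes read as iso-8859-1 (a mismatched declaration): mojibake.
misread = body.decode("iso-8859-1")

print(header)
print(decoded)
print(misread)
```

The mismatch case is the usual cause of "unreadable" UTF-8 pages: the bytes are fine, but the browser was told (or guessed) the wrong encoding.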
 
Luigi Donatello Asero

Dan said:
While UTF-8 is actually very widely supported, and thus there's no
reason to change your encoding from this (if your server sends a proper
content-type header indicating the encoding), the Western European
languages you are using should work all right in the standard Western
encoding iso-8859-1 as well. Avoid windows-1252; it's a proprietary
Microsoft set.

I am using iso-8859-1 at the moment, but I am going to switch to UTF-8 so
that I can add pages in Russian and Chinese (right now I have only a little
content in these languages).
 
Jukka K. Korpela

Tony Vella said:
I am preparing a series of philatelic html pages (lots of text and a few
scans of stamps) which will include alpha-characters (accents) in Italian,
French, Spanish, Portuguese and Danish.

They are all covered by the ISO-8859-1 encoding, except for some punctuation
marks and letters like the oe ligature. If you use windows-1252, you get the
punctuation marks and the ligature, too.
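
A small Python sketch of that difference, as an illustration only: the helper `can_encode` and the two sample strings are invented here, and Python's codec names stand in for the encodings discussed.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Accented letters from Italian, French, Spanish, Portuguese and Danish.
accented = "àèéìòù çñã æøå"

# The "extras": oe ligature, curly quotes, en dash, euro sign.
extras = "œ “quoted” – €"

for encoding in ("iso-8859-1", "windows-1252", "utf-8"):
    print(encoding, can_encode(accented, encoding), can_encode(extras, encoding))
```

Under these assumptions the accented letters fit in all three encodings, while the extras fail only in iso-8859-1.
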

Tony Vella said:
The pages I have finished in draft
form so far I have encoded UTF-8 but I have just been told that 99% of the
world will not be able to read them

Nonsense. More probably, 99 % of the WWW users _are_ able to read them. Well,
let's say 97.6 %. After all, 96,3 % of all percentages have just been made
up, and the remaining 4,7 % have been miscalculated.

Tony Vella said:
and that I should go through all the
pages and re-encode them "western european - windows (1252)".

I wouldn't do that at this point, unless you have good tools that do such
things for you with minimal effort.

Tony Vella said:
I guess what
I would like to know is what encoding would be most effective for these
particular languages.

If you were just about to start the project, I would recommend ISO-8859-1 (or
windows-1252 if you need those extras) - not because of wider browser
coverage (though there is a _small_ improvement to be gained there) but
because those encodings are somewhat more efficient (one byte per character,
whereas UTF-8 uses two bytes for some of the characters you'd use).
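
The size difference can be checked with a few lines of Python. This is only a sketch: `encoded_size` and the sample phrase are made up for the illustration.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


# Hypothetical sample phrase containing two accented letters.
sample = "Emissão aérea"

# Characters outside ASCII; in UTF-8 each of these accented letters takes two bytes.
non_ascii = sum(1 for ch in sample if ord(ch) > 127)

for encoding in ("iso-8859-1", "windows-1252", "utf-8"):
    print(encoding, encoded_size(sample, encoding), "bytes for", len(sample), "characters")
```

For mostly-ASCII Western European text the overhead is one extra byte per accented letter, so the saving from a single-byte encoding is real but small.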

UTF-8 is certainly simpler in the future if you'll ever need to add
characters in other languages.
 
Alan J. Flavell

David Dorward said:
That is rubbish.

Agreed.

David Dorward said:
UTF-8 is very well supported (so much so, that I can't remember the
last time I came across a system that couldn't handle it).

Broad agreement with that, but there are exceptions...

Well, Netscape 4.* versions do a pretty good job of *rendering* utf-8,
but do keep in mind that, if any form submission is required, then NN4
badly mangles utf-8. Whether it's worth understanding how to
implement a workaround for that old zombie is debatable, of course:
I'm just mentioning that it's not without a problem.

cheers

(The original WebTV is also hopeless at rendering anything other than
a subset of Windows-1252, but ho hum.)
 
