1. lang="utf-8" - yes, alright, I set it wrong ...
Honestly, I think a period would be the right punctuation here, not an
ellipsis (three dots).
2. Of course I have to define the language set even if just for one
sentence. If I don't define the character set, even that one
sentence won't be visible. Then... what's the point?
Presumably "language" means "character" here. Otherwise the statement
does not make sense. And you should _always_ make sure your server
sends character encoding information (charset parameter), though the
need becomes really apparent only if you use an encoding other than
ISO-8859-1 or its relatives.
3. My problem is because I have upgraded to Apache2 from Apache
1.3.x - and I notice that Apache now explicitly sends off language
information to the browser.
Which language information? I think you are confusing language with
character encoding, again. This is actually _very_ common, but that
doesn't make the confusion any less problematic.
I don't see any _language_ headers (Content-Language) if I access e.g.
http://parker.com.hk (which resides on an Apache 2 server). Just quite
normal and common HTTP headers.
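To make the distinction concrete, here is a minimal sketch of how one might pull the two pieces of information apart once the response headers are in hand. The header values in `sample` are hypothetical and were not captured from any real server, and `charset_and_language` is just an illustrative helper name; only the Python standard library is used.

```python
from email.message import Message

def charset_and_language(headers):
    """Return (charset, content_language) from (name, value) header pairs."""
    msg = Message()
    for name, value in headers:
        msg[name] = value
    # get_content_charset() reads the charset parameter of Content-Type
    # (lowercased) and returns None when the parameter is absent.
    return msg.get_content_charset(), msg.get("Content-Language")

# Hypothetical response headers, for illustration only.
sample = [
    ("Content-Type", "text/html; charset=Big5"),
    ("Server", "Apache"),
]
print(charset_and_language(sample))  # ('big5', None)
```

The first element is the character encoding; the second is the language, which is simply absent unless the server sends a Content-Language header.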
I actually find this annoying. Why?
A good question. You shouldn't be annoyed, if it's really the charset
you mean. It should always be included. If your problem is that the
server does not send the _correct_ parameter value, then this needs to
be fixed, in a server-dependent manner, which is probably rather easy
as soon as you have the correct documentation and have a picture
(figuratively speaking) of your web site structure. You cannot override
the HTTP charset parameter in any HTML tag, since the former by
definition takes precedence.
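The precedence rule can be sketched in a few lines. This is only an illustration of the ordering, not of what any particular browser does; the function name and the `fallback` default are my assumptions, and real browsers may guess differently when neither source declares an encoding.

```python
def effective_charset(http_charset, meta_charset, fallback="iso-8859-1"):
    """Pick the encoding a client would use, given both declarations."""
    # The charset parameter in the HTTP Content-Type header wins
    # whenever the server sends one.
    if http_charset:
        return http_charset.lower()
    # A <meta> declaration is consulted only when the header has no charset.
    if meta_charset:
        return meta_charset.lower()
    # Assumed fallback; actual client behavior varies.
    return fallback

print(effective_charset("utf-8", "big5"))  # utf-8
print(effective_charset(None, "Big5"))     # big5
```

So a page that declares Big5 in a <meta> tag but is served with charset=utf-8 will be read as UTF-8.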
Because for some sites I host there are multiple languages between
the pages
Again, languages are not the issue; character encodings are, though
naturally the language has an impact on the repertoire of feasible
encodings. If you have pages with different encodings, then the
simplest way, on Apache, is to put files in one encoding into one
directory and create a .htaccess file in that directory, with a
suitable Apache directive in it, e.g.
AddType text/html;charset=utf-8 HTML
Ideally I want this shut off completely and for my
HTML pages to resume the job of defining the charset.
Whether you can do that depends on Apache 2. Have you checked its
documentation? I would guess that using an AddType without a charset
parameter would do it. But that's really _not_ the WWW way. The WWW way
is to specify the encoding in actual HTTP headers, and <meta> tags are
just surrogates that some people need to resort to (and that _might_ be
included for certain reasons even when you have made the server send
adequate headers).
And why don't I use UTF-8 for everything? Because, while that is
the ideal for compatibility between languages, fact of the matter
is UTF-8 has entered the world too late.
Or too early. But it is true that UTF-8 is _inefficient_ for most East
Asian languages.
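The size difference is easy to check. A small sketch, using an arbitrary two-character Chinese sample string of my own choosing and Python's built-in codecs (the helper name `encoded_sizes` is hypothetical):

```python
text = "中文"  # two Chinese characters, chosen only for illustration

def encoded_sizes(s, encodings=("utf-8", "big5", "gb2312")):
    """Return the number of bytes s occupies in each named encoding."""
    return {name: len(s.encode(name)) for name in encodings}

print(encoded_sizes(text))  # {'utf-8': 6, 'big5': 4, 'gb2312': 4}
```

Each of these characters takes three bytes in UTF-8 and two in BIG5 or GB, so plain Chinese text is roughly half again as large in UTF-8.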
Languages such as BIG5 /
GB have become so dominant in Asia that these are native to most
software, NOT UTF.
Again, encodings, not languages. And the software needs to grow up.
UTF-8 is the way the WWW and the Internet are going, in the sense that
support for UTF-8 is the primary goal (according to official IETF
policy) - any new protocols and software _should_ support it and
_may_ support other encodings.
Support for BIG5 and GB is probably so widespread wherever Chinese can
be read in the first place that it is practical to encode your Chinese
documents in either of them, so I'm not arguing against the point that
there are good reasons to use different encodings for pages on a
server.