Setting language to UTF-8

Discussion in 'HTML' started by Terence Parker, Nov 2, 2003.

  1. I currently have at the beginning of my sites:

    <html lang="utf-8">
    <head>
    <title>Some imaginative title....</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    </head>

    Scouring the web for several other websites that set the character set, this
    is the exact tag used. However, it doesn't work for me. And not just me -
    when my code is viewed on any browser by any person the system doesn't use
    UTF-8 to render the page... it uses whatever default is on that system.

    How do I force a browser to use the correct character set? This seems to
    work with other languages... just not Unicode.

    Any ideas anyone?

    Thanks,
    Terence
     
    Terence Parker, Nov 2, 2003
    #1

  2. Terence Parker wrote:

    > I currently have at the beginning of my sites:
    >
    > <html lang="utf-8">
    > <head>
    > <title>Some imaginative title....</title>
    > <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    > </head>


    The lang attribute is for human languages (English, French, etc.), not
    character sets. Set the character set in the HTTP Content-Type header or
    your XML declaration.

    > How do I force a browser to use the correct character set? This seems to
    > work with other languages... just not Unicode.


    You can't force a browser to do anything, period.
     
    Leif K-Brooks, Nov 2, 2003
    #2

  3. The <html lang="utf-8"> tag was something I added in more recently in
    desperation, seeing as the Content-Type tag didn't work. I do, separately,
    have a tag that reads:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    - but that doesn't do anything. Yes, I can't 'force' a browser to do
    anything, but assuming that one has their browser configured to
    automatically detect the character set it should change to UTF-8 upon seeing
    the above tag. But it doesn't.

    The text (Chinese) on the page *does* work... i.e. if you manually set the
    encoding to UTF-8 on the browser - but it's just not selected automatically,
    which the browser does seem to do for other sets (like big5 for example).

    What I don't understand is why no browser would set the character set to
    UTF-8 when viewing my pages.

    Terence



    "Leif K-Brooks" <> wrote in message
    news:6j%ob.194$...
    > > I currently have at the beginning of my sites:
    > >
    > > <html lang="utf-8">
    > > <head>
    > > <title>Some imaginative title....</title>
    > > <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    > > </head>

    >
    > The lang attribute is for human languages (english, french, etc.), not
    > character sets. Set the character set in the HTTP Content-Type header or
    > your XML declaration.
    >
    > > How do I force a browser to use the correct character set? This seems to
    > > work with other languages... just not Unicode.

    >
    > You can't force a browser to do anything, period.
     
    Terence Parker, Nov 2, 2003
    #3
  4. Terence Parker wrote:

    > The <html lang="utf-8"> tag was something I added in more recently in
    > desperation,


    Well, get rid of it quick! As Leif said, lang is for human languages, e.g.
    "en-GB" (English), "fr" (French) or "de" (German).

    > seeing as the Content-Type tag didn't work.


    Then set the Content-Type HTTP header.
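
    As a rough illustration of what "setting the header" means on the server
    side, here is a minimal Python sketch of a WSGI application that sends the
    charset in the real HTTP response header. The page text and the names
    `PAGE` and `app` are assumptions made up for this example; they are not
    taken from the site under discussion.

```python
# Minimal WSGI application that declares UTF-8 in the real HTTP header.
# PAGE and app are illustrative names, not part of the original site.
PAGE = '<html lang="zh"><head><title>沙田學院</title></head><body>沙田學院</body></html>'


def app(environ, start_response):
    body = PAGE.encode("utf-8")  # the bytes on the wire are UTF-8 ...
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),  # ... and the header says so
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

    For a local try-out, the standard library's
    wsgiref.simple_server.make_server can serve `app`; on a production server
    the same header would normally come from the web server configuration.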

    --
    Toby A Inkster BSc (Hons) ARCS
    Contact Me - http://www.goddamn.co.uk/tobyink/?id=132
     
    Toby A Inkster, Nov 2, 2003
    #4
  5. "Terence Parker" <> wrote:

    > The <html lang="utf-8"> tag was something I added in more recently in
    > desperation, seeing as the Content-Type tag didn't work.


    It's generally not productive to throw tag salad around just because you
    don't know what's going on. You will just make things worse. And why did you
    write the Subject line the way you did? It does not describe the problem at
    all, just a misguided attempt to solve an unspecified problem. More hints on
    how to post constructively:
    http://www.cs.tut.fi/~jkorpela/usenet/dont.html

    > I do,
    > separately, have a tag that reads:
    >
    > <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    >
    > - but that doesn't do anything.


    Why don't you post the URL instead of code snippets?

    Believe me, the URL _is_ relevant. You may not realize this yet, but that
    would indicate that you don't understand the basics of this "charset" thing.

    > Yes, I can't 'force' a browser to do
    > anything, but assuming that one has their browser configured to
    > automatically detect the character set it should change to UTF-8 upon
    > seeing the above tag.


    No, a browser _must not_ do that _except_ when the HTTP headers do not
    indicate the encoding. And the headers _should_ indicate the encoding.

    > The text (Chinese) on the page *does* work... i.e. if you manually set
    > the encoding to UTF-8 on the browser - but it's just not selected
    > automatically, which the browser does seem to do for other sets (like
    > big5 for example).


    Well, _if_ the HTTP headers fail to indicate the encoding, then browsers
    _should_ use the plastic imitation, the <meta> tag. Now there's the
    possibility that your actual document contains a typo in that tag (sorry,
    the crystal ball is dim). Or there might be something odd in the situation, but
    we really need the URL for a starter. In future, please save a few rounds of
    iteration and post the URL in the original question.
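
    The precedence rule described above can be sketched in a few lines of
    Python. This is a simplified model for illustration only, not the
    algorithm any particular browser implements (it ignores byte order marks,
    user overrides and heuristics); the function names are invented for this
    example.

```python
import re

# Matches <meta http-equiv="Content-Type" content="..."> and captures the content value.
META_RE = re.compile(
    r"<meta[^>]+http-equiv\s*=\s*[\"']?content-type[\"']?[^>]*"
    r"content\s*=\s*[\"']([^\"']+)[\"']",
    re.IGNORECASE,
)


def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    if not value:
        return None
    match = re.search(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", value, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charset_from_meta(html_bytes):
    """Look for a Content-Type <meta> tag near the top of the page."""
    head = html_bytes[:1024].decode("ascii", errors="replace")
    match = META_RE.search(head)
    return charset_from_content_type(match.group(1)) if match else None


def effective_charset(http_content_type, html_bytes):
    """The HTTP header wins; the <meta> tag is consulted only if the header is silent."""
    return charset_from_content_type(http_content_type) or charset_from_meta(html_bytes)
```

    With a header of "text/html; charset=ISO-8859-1" the function returns
    "iso-8859-1" no matter what the <meta> tag says; only when the header
    carries no charset parameter does the <meta> value get used.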

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html
     
    Jukka K. Korpela, Nov 2, 2003
    #5
  6. Not to be ungrateful, but I think the productivity of this original post
    hasn't been what I originally planned.

    > It's generally not productive to throw tag salad around just because you
    > don't know what's going on. You will just make things worse. And why did
    > you write the Subject line the way you did? It does not describe the
    > problem at all, just a misguided attempt to solve an unspecified problem.
    > More hints on how to post constructively:
    > http://www.cs.tut.fi/~jkorpela/usenet/dont.html


    When people ask questions on forums before they are willing to try out
    various things and try the obvious, they are flamed for not taking the time
    to attempt to solve it themselves. And when I experiment by trying out
    another tag which I feel might solve the problem? The result is that people
    still complain. It seems that nobody can be pleased these days.

    My use of the lang clause in the HTML tag did not cause the problem nor did
    it interfere with the problem - and since it did not fix the problem either,
    it has since been removed from my code. I really don't see what a big deal
    it is... it's not going to cause the self-destruction of my website or
    anything.

    And regarding the 'unspecified problem' - I feel it was quite well specified
    to begin with. The offending tag was included in the code, with the exact
    problem described in my original posting : browsers aren't using UTF-8 to
    display the page despite the Content-Type setting telling it to do so. What
    more does one need to know? While I'm at it there are several PHP-related
    questions I can ask, but they would probably be of no interest to anyone in
    this forum.

    And no doubt I would only get bashed for asking something off-topic as a
    result then anyway.

    > Why don't you post the URL instead of code snippets?


    > Believe me, the URL _is_ relevant. You may not realize this yet, but that
    > would indicate that you don't understand the basics of this "charset"
    > thing.

    The reason I don't post a URL is because I'm working on something that
    interfaces with a database and as the code was in its early stages at the time
    the site was not something I wanted to be publicly accessible. Even so, I do
    not see how any of the other tags on the page would be of any use /
    relevance to anyone. I have, actually, gone through the trouble of looking
    at other sites first before posting my original message - just in case
    anyone thinks I posted it straight away and was so lazy I couldn't even be
    bothered to do research. All websites looked at (including Google Groups -
    which uses utf-8) use the exact same tag I used, and nothing else anywhere
    in the page. Do you mean to tell me that by some remarkably strange
    coincidence all my pages happen to require an extra tag which nobody else's
    does? I highly doubt it.

    So, at the moment, I really don't see how the URL is relevant, beyond being
    able to view the source and see the same tag which I have included in my
    original posting anyway. However, for the benefit of everyone's curiosity,
    the pages can be found at http://intranet.shatincollege.edu.hk/prm/index.php

    >> Yes, I can't 'force' a browser to do
    >> anything, but assuming that one has their browser configured to
    >> automatically detect the character set it should change to UTF-8 upon
    >> seeing the above tag.


    > No, a browser _must not_ do that _except_ when the HTTP headers do not
    > indicate the encoding. And the headers _should_ indicate the encoding.


    Thank you... well... at least I learned something in that paragraph.

    Though okay, I'm still puzzled as to why the headers I'm using failed to
    indicate the encoding.

    > Well, _if_ the HTTP headers fail to indicate the encoding, then browsers
    > _should_ use the plastic imitation, the <meta> tag. Now there's the
    > possibility that your actual document contains a typo in that tag (sorry,
    > the crystal ball is dim). Or there might be something odd in the
    > situation, but we really need the URL for a starter. In future, please
    > save a few rounds of iteration and post the URL in the original question.


    I believe I went through the reasons why I didn't post the URL already but,
    again, I have included it above if it really is going to be of help. Unless
    I am totally blind (which I wouldn't entirely rule out) I cannot see a typo
    in the META tag used as compared to META tags used in most other websites.

    Terence
     
    Terence Parker, Nov 5, 2003
    #6
  7. .... and please excuse the typos. It's past midnight and way past my bedtime!

    Terence
     
    Terence Parker, Nov 5, 2003
    #7
  8. "Terence Parker" <> wrote:

    >So, at the moment, I really don't see how the URL is relevant, beyond being
    >able to view the source and see the same tag which I have included in my
    >original posting anyway. However, for the benefit of everyone's curiosity,
    >the pages can be found at http://intranet.shatincollege.edu.hk/prm/index.php


    Good, now we can see what the HTTP headers are, and so can you.

    http://www.delorie.com/web/headers.cgi?url=http://intranet.shatincollege.edu.hk/prm/index.php

    >>> Yes, I can't 'force' a browser to do
    >>> anything, but assuming that one has their browser configured to
    >>> automatically detect the character set it should change to UTF-8 upon
    >>> seeing the above tag.

    >
    >> No, a browser _must not_ do that _except_ when the HTTP headers do not
    >> indicate the encoding. And the headers _should_ indicate the encoding.

    >
    >Thank you... well... at least I learned something in that paragraph.


    Now all you need to do is apply it.

    >Though okay, I'm still puzzled as to why the headers I'm using failed to
    >indicate the encoding.


    The headers you use say that the page is ISO-8859-1. So that's the
    encoding that browsers use.

    >Unless I am totally blind (which I wouldn't entirely rule out) I cannot see a typo
    >in the META tag used as compared to META tags used in most other websites.


    Remember that paragraph that you learned something from?
    Your meta tag is 100% irrelevant.
    Browsers MUST ignore it as there is a character set specified in the
    HTTP header.

    The solution to your problem is to change your HTTP header.
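
    To see why the mismatch garbles the page, here is a short Python sketch
    of what happens when UTF-8 bytes are decoded according to an ISO-8859-1
    header. The sample text is only an illustration.

```python
text = "沙田學院"
wire_bytes = text.encode("utf-8")  # what the server actually sends

# A browser that trusts a "charset=ISO-8859-1" header decodes the same bytes
# like this: every single byte becomes one Latin-1 character.
misread = wire_bytes.decode("iso-8859-1")

# Decoding with the encoding the bytes were really written in recovers the text.
correct = wire_bytes.decode("utf-8")

print(len(text), len(misread))  # 4 characters turn into 12
print(correct == text)
```

    Nothing is wrong with the bytes themselves, which is why manually
    switching the browser to UTF-8 makes the text appear correctly.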

    Steve

    --
    "My theories appal you, my heresies outrage you,
    I never answer letters and you don't like my tie." - The Doctor

    Steve Pugh <> <http://steve.pugh.net/>
     
    Steve Pugh, Nov 5, 2003
    #8
  9. Quoth the raven named Terence Parker:
    > the pages can be found at http://intranet.shatincollege.edu.hk/prm/index.php


    Perhaps if you fixed the <html lang="utf-8"> your page would work?

    "utf-8" is not the LANGuage of a country or a people.

    <html lang="en"> is for English.

    --
    -bts
    -This space intentionally left blank.
     
    Beauregard T. Shagnasty, Nov 5, 2003
    #9
  10. Toby A Inkster, Nov 6, 2003
    #10
  11. Terence Parker wrote:

    http://intranet.shatincollege.edu.hk/prm/index.php

    Terence, I don't understand why you are not setting the charset via HTTP
    headers to "en" instead of utf-8 and then setting the lang attribute of the
    single line of Chinese text to BIG5. Many so far have told you that the lang
    attribute only takes human languages as defined by the ISO 639 standard; yet,
    you keep using utf-8.

    http://lcweb.loc.gov/standards/iso639-2/langcodes.html

    Only 1 sentence is in Chinese in your file. So there is really no need
    to set the whole document charset to utf-8 in the first place.
    Finally, for the sake of web interoperability across multiple charsets
    and languages, I really think you should write an entirely validated HTML
    file. As written, your file is not valid and resorts to a bad design
    technique (tables) and several deprecated HTML elements (center, font).

    My 2 cents

    DU
     
    DU, Nov 13, 2003
    #11
  12. DU wrote:

    > Terence Parker wrote:
    >
    > http://intranet.shatincollege.edu.hk/prm/index.php
    >
    > Terence, I don't understand why you are not setting the charset via http
    > headers to "en" instead of utf-8 and then setting lang attribute of the
    > single line of chinese text to BIG5.


    DOH! Argh... meant to say lang="zh" instead :)

    > Many so far told you that lang
    > attribute only takes human languages as defined by iso-639 norm; yet,
    > you keep using utf-8.
    >
    > http://lcweb.loc.gov/standards/iso639-2/langcodes.html
    >
    > Only 1 sentence is in Chinese in your file. So there is really no need
    > to set the whole document charset to utf-8 in the first place.
    > Finally, for the sake of web interoperability across multiple charset
    > and language, I really think you should write an entirely validated html
    > file. As written, your file is not valid and resort to a bad design
    > technique (tables) and several deprecated html elements (center, font).
    >
    > My 2 cents
    >
    > DU
    >
     
    DU, Nov 13, 2003
    #12
  13. DU <> wrote:

    > Terence Parker wrote:
    >
    > http://intranet.shatincollege.edu.hk/prm/index.php
    >
    > Terence, I don't understand why you are not setting the charset via http
    > headers to "en" instead of utf-8 and then setting lang attribute of the
    > single line of chinese text to BIG5.

    [ corrected to lang="zh" in a later posting - I wonder why you didn't
    supersede ]

    Sorry, but this is astonishingly strange. We've discussed the issue at
    length, and the page _still_ contains the nonsensical lang="utf-8", and now
    you are adding to the confusion by suggesting that charset be set to "en",
    which would be just as nonsensical but with more serious consequences.

    > Many so far told you that lang
    > attribute only takes human languages as defined by iso-639 norm;


    Well, basically so, although in principle you can use x-whatever-you-like
    too, it just won't (normally) be useful at all.

    > yet, you keep using utf-8.


    That's strange indeed, but hardly causes much damage. Setting charset to
    "en" in HTTP would mean setting it to an undefined value and letting the
    browser play its guessing game.

    What the page (the server) does _right_ is the HTTP header that specifies
    UTF-8. This is absolutely the right thing, when the data is UTF-8 encoded,
    no matter what language (if any) the content is.

    And setting lang="zh" would have nothing to do with the character encoding
    issue. It would be adequate in principle, even a priority 1 requirement in
    WAI guidelines, to declare the language of a fragment that way. But that
    does _not_ affect the encoding issues.

    > Only 1 sentence is in Chinese in your file. So there is really no need
    > to set the whole document charset to utf-8 in the first place.


    Yes there is. The encoding is a property of a document, and it's always the
    same for the entire document. You cannot switch the encoding. (It is true
    that there is an encoding that permits certain switching _inside_ it, but
    that's an encoding-level issue, not very useful, not much used, and has nothing
    to do with HTML markup.)

    What _could_ be done is writing the document in, say, US-ASCII encoding,
    using character references (&#number;) for anything outside US-ASCII.
    But there's no special reason to do that, especially after the page has been
    written in UTF-8.
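
    For anyone curious what the character-reference route looks like, a small
    Python sketch follows. It uses the standard "xmlcharrefreplace" error
    handler to turn every non-ASCII character into a numeric reference; the
    sample text is just an illustration.

```python
import html

text = "沙田學院 :: 家長通訊系統"

# Encode to ASCII, replacing every non-ASCII character with a numeric
# character reference of the form &#N; (N is the decimal code point).
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes.decode("ascii"))

# References are resolved independently of the declared charset;
# html.unescape performs the same resolution here.
restored = html.unescape(ascii_bytes.decode("ascii"))
print(restored == text)
```

    The resulting markup is pure ASCII, so it survives being labelled as
    ISO-8859-1 or almost any other ASCII-compatible encoding.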

    And since the page contains a form, there can be a special reason to use
    UTF-8. Browsers normally send form data in the character encoding of the
    page containing the form. In fact, this is the only way in practice to
    set the character encoding of the form data. In this case, the data is
    probably all ASCII, so this doesn't matter, but it's important on pages
    containing e.g. search forms.
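
    The effect on form data can be approximated in Python with
    urllib.parse.urlencode, which takes an encoding argument. This only
    mimics what a browser does when it submits a form; the field name "q" is
    a hypothetical search field invented for the example.

```python
from urllib.parse import urlencode

query = {"q": "沙田"}  # hypothetical search field named "q"

# The same two characters produce different bytes on the wire depending on
# the encoding of the page that contains the form.
as_utf8 = urlencode(query, encoding="utf-8")
as_big5 = urlencode(query, encoding="big5")

print(as_utf8)
print(as_big5)
```

    A script receiving the data has to know which of the two it is getting,
    which is one practical argument for keeping the page encoding predictable.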

    > Finally, for the sake of web interoperability across multiple charset
    > and language, I really think you should write an entirely validated html
    > file. As written, your file is not valid and resort to a bad design
    > technique (tables) and several deprecated html elements (center, font).


    It's tag soup, alright. But this doesn't really affect the encoding issues.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html
     
    Jukka K. Korpela, Nov 13, 2003
    #13
  14. Jukka K. Korpela wrote:

    > DU <> wrote:
    >
    >
    >>Terence Parker wrote:
    >>
    >>http://intranet.shatincollege.edu.hk/prm/index.php
    >>
    >>Terence, I don't understand why you are not setting the charset via http
    >>headers to "en" instead of utf-8 and then setting lang attribute of the
    >>single line of chinese text to BIG5.

    >
    > [ corrected to lang="zh" in a later posting - I wonder why you didn't
    > supersede ]
    >
    > Sorry, but this is astonishingly strange. We've discussed the issue at
    > length, and the page _still_ contains the nonsensical lang="utf-8", and now
    > you are adding to the confusion that charset be set to "en", which would be
    > just as nonsensical but with more serious consequences.
    >


    DOH!! I got confused, mixed up myself!

    >
    >>Many so far told you that lang
    >>attribute only takes human languages as defined by iso-639 norm;

    >
    >
    > Well, basically so, although in principle you can use x-whatever-you-like
    > too, it just won't (normally) be useful at all.
    >
    >
    >>yet, you keep using utf-8.

    >
    >
    > That's strange indeed, but hardly causes much damage. Setting charset to
    > "en" in HTTP would mean setting it to undefined value and letting browser
    > play its guessing game.
    >


    Sorry. Meant to say charset=iso-8859-1
    and then only use character entities for the single line of Chinese.
    That is what I would have tried.

    > What the page (the server) does _right_ is the HTTP header that specifies
    > UTF-8. This is absolutely the right thing, when the data is UTF-8 encoded,
    > no matter what language (if any) the content is.
    >
    > And setting lang="zh" would have nothing to do with the character encoding
    > issue. It would be adequate in principle, even a priority 1 requirement in
    > WAI guidelines, to declare the language of a fragment that way. But that
    > does _not_ affect the encoding issues.
    >
    >
    >>Only 1 sentence is in Chinese in your file. So there is really no need
    >>to set the whole document charset to utf-8 in the first place.

    >
    >
    > Yes there is. The encoding is a property of a document, and it's always the
    > same for the entire document. You cannot switch the encoding. (It is true
    > that there is an encoding that permits certain switching _inside_ it, but
    > that's encoding-level issue, not very useful, not much used, and has nothing
    > to do with HTML markup.)
    >
    > What _could_ be done is writing the document in, say, US-Ascii encoding,
    > using character references (&#number;) for anything outside US-Ascii.
    > But there's no special reason to do that, especially after the page has been
    > written in UTF-8.
    >
    > And since the page contains a form, there can be a special reason to use
    > UTF-8. Browsers normally send form data in the character encoding of the
    > page containing the form. In fact, this is the only way in practice to
    > set the character encoding of the form data. In this case, the data is
    > probably all Ascii, so this doesn't matter, but it's important on pages
    > containing e.g. search forms.
    >
    >
    >>Finally, for the sake of web interoperability across multiple charset
    >>and language, I really think you should write an entirely validated html
    >>file. As written, your file is not valid and resort to a bad design
    >>technique (tables) and several deprecated html elements (center, font).

    >
    >
    > It's tag soup, alright. But this doesn't really affect the encoding issues.
    >



    Sorry I got mixed up! What I meant to say is this:


    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">

    <html lang="en">

    <head>

    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    <meta http-equiv="Content-Language" content="en">
    <meta http-equiv="Content-Style-Type" content="text/css">
    <meta http-equiv="Content-Script-Type" content="text/javascript">

    <title>Sha Tin College PRM:: Main Menu</title>
    </head>

    <body>

    (...)

    <b><span lang="zh">沙田學院 :: 家長通訊系統</span></b><br>

    and sorry I couldn't reply to your post earlier. My apologies!

    DU
     
    DU, Nov 24, 2003
    #14
  15. > <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    > "http://www.w3.org/TR/html4/strict.dtd">
    >
    > <html lang="en">
    >
    > <head>
    >
    > <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    > <meta http-equiv="Content-Language" content="en">
    > <meta http-equiv="Content-Style-Type" content="text/css">
    > <meta http-equiv="Content-Script-Type" content="text/javascript">
    >
    > <title>Sha Tin College PRM:: Main Menu</title>
    > </head>
    >
    > <body>
    >
    > (...)
    >
    > <b><span lang="zh">沙田學院 :: 家長通訊系統</span></b><br>


    Argh... I fumbled again !! Gulp!

    <b><span lang="zh">沙田學院 ::
    家長通訊系統</span></b><br>

    :)

    DU
     
    DU, Nov 24, 2003
    #15
  16. In article <bpsojg$8ud$>, DU says...

    > > <title>Sha Tin College ...


    Heh, heh.

    - Dan
     
    Daniel Ruscoe, Nov 24, 2003
    #16
  17. DU wrote:

    > <b><span lang="zh">沙田學院 :: 家長通訊系統</span></b><br>


    I am very pleased with my newsreader being able to display this correctly!
    And my browser too.

    --
    Toby A Inkster BSc (Hons) ARCS
    Contact Me - http://www.goddamn.co.uk/tobyink/?page=132
     
    Toby A Inkster, Nov 24, 2003
    #17
  18. This thread is quite old now but I forgot to check back on it -
    actually I went on a trip abroad.

    I admit defeat in that the header is defined by the webserver - when
    it was first mentioned, the word 'webserver' wasn't used and 'header'
    to me simply meant the header tags of the HTML - not the HTTP header
    sent out by Apache. I have now fixed the problem by, as someone
    suggested, setting the header explicitly in my Apache configuration.
    However - a few responses to the replies I received.

    1. lang="utf-8" - yes, alright, I set it wrong ... but as someone did
    mention that shouldn't cause any problems. And indeed it didn't. It
    may confuse the browser, but it won't screw anything up. Anyways that
    has now been corrected.

    2. Of course I have to define the language set even if just for one
    sentence. If I don't define the character set, even that one sentence
    won't be visible. Then... what's the point?

    3. My problem is because I have upgraded to Apache2 from Apache 1.3.x
    - and I notice that Apache now explicitly sends off language
    information to the browser. I actually find this annoying. Why?
    Because for some sites I host there are multiple languages between the
    pages - and if Apache were to force the browser to be one language,
    then it effectively can't serve different pages of differing languages
    under the same virtual host (no, I'm not talking about multiple
    languages within one page here - I know you can't do that). Ideally I
    want this shut off completely and for my HTML pages to resume the job
    of defining the charset. I don't want Apache doing it for me.

    And why don't I use UTF-8 for everything? Because, while that is the
    ideal for compatibility between languages, fact of the matter is UTF-8
    has entered the world too late. Languages such as BIG5 / GB have
    become so dominant in Asia that these are native to most software, NOT
    UTF. And that goes for websites in this part of the world too.

    Anyways - thanks to all that replied. At least my problem is partially
    solved now.

    Terence
     
    Terence Parker, Dec 6, 2003
    #18
  19. (Terence Parker) wrote:

    > 1. lang="utf-8" - yes, alright, I set it wrong ...


    Honestly, I think a period would be the right punctuation here, not
    ellipsis (three dots).

    > 2. Of course I have to define the language set even if just for one
    > sentence. If I don't define the character set, even that one
    > sentence won't be visible. Then... what's the point?


    Presumably "language" means "character" here. Otherwise the statement
    does not make sense. And you should _always_ make sure your server
    sends character encoding information (charset parameter), though the
    need becomes really apparent if you use an encoding other than
    iso-8859-1 or relatives.

    > 3. My problem is because I have upgraded to Apache2 from Apache
    > 1.3.x - and I notice that Apache now explicitly sends off language
    > information to the browser.


    Which language information? I think you are confusing language with
    character encoding, again. This is actually _very_ common, but that
    doesn't make the confusion any less problematic.

    I don't see any _language_ headers (Content-Language) if I access e.g.
    http://parker.com.hk (which resides on an Apache 2 server). Just quite
    normal and common HTTP headers.

    > I actually find this annoying. Why?


    A good question. You shouldn't be annoyed, if it's really the charset
    you mean. It should always be included. If your problem is that the
    server does not send the _correct_ parameter value, then this needs to
    be fixed, in a server-dependent manner, which is probably rather easy
    as soon as you have the correct documentation and have a picture
    (figuratively speaking) of your web site structure. You cannot override
    the HTTP charset parameter in any HTML tag, since the former by
    definition has preference.

    > Because for some sites I host there are multiple languages between
    > the pages


    Again, languages are not the issue; character encodings are, though
    naturally the language has an impact on the repertoire of feasible
    encodings. If you have pages with different encodings, then the
    simplest way, on Apache, is to put files in one encoding into one
    directory and create a .htaccess file into that directory, with a
    suitable directive to Apache in it, e.g.
    AddType text/html;charset=utf-8 HTML

    > Ideally I want this shut off completely and for my
    > HTML pages to resume the job of defining the charset.


    Whether you can do that depends on Apache 2. Have you checked its
    documentation? I would guess that using an AddType without a charset
    parameter would do it. But that's really _not_ the WWW way. The WWW way
    is to specify the encoding in actual HTTP headers, and <meta> tags are
    just surrogates that some people need to resort to (and that _might_ be
    included for certain reasons even when you have made the server send
    adequate headers).

    > And why don't I use UTF-8 for everything? Because, while that is
    > the ideal for compatibility between languages, fact of the matter
    > is UTF-8 has entered the world too late.


    Or too early. But it is true that UTF-8 is _inefficient_ for most East
    Asian languages.

    > Languages such as BIG5 /
    > GB have become so dominant in Asia that these are native to most
    > software, NOT UTF.


    Again, encodings, not languages. And the software needs to grow up.
    UTF-8 is the way the WWW and the Internet are going, in the sense that
    support for UTF-8 is the primary goal (according to official IETF
    policy) - any new protocols and software _should_ support it and
    _may_ support other encodings.

    Support for BIG5 and GB is probably so widespread in situations where
    Chinese can be read in the first place that it's probably practical to
    encode your documents in Chinese using either of them, so I'm not
    arguing against the point that there are good reasons to use different
    encodings for pages on a server.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html
     
    Jukka K. Korpela, Dec 6, 2003
    #19
  20. "Jukka K. Korpela" <> wrote:

    > But it is true that UTF-8 is _inefficient_ for most East
    > Asian languages.


    No, why? Because you need three bytes instead of two bytes for one
    character? Nested tables, e.g., are a heavier crime than big files.
    And a simple image will usually be bigger than your HTML text.
    Let's not forget that some editors blow up your source by inserting
    90 % space characters.
    All this is, IMHO, more severe than using three bytes for one
    Chinese character.
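
    The size difference is easy to measure. The Python sketch below compares
    the byte counts of a short Chinese string, and of the same string wrapped
    in a little markup, in UTF-8 and Big5. The sample markup is only an
    illustration.

```python
text = "沙田學院"

utf8_size = len(text.encode("utf-8"))  # 3 bytes per character here
big5_size = len(text.encode("big5"))   # 2 bytes per character here

# Markup is ASCII and costs the same in both encodings, so it dilutes the gap.
markup = '<b><span lang="zh">' + text + "</span></b><br>"
markup_utf8 = len(markup.encode("utf-8"))
markup_big5 = len(markup.encode("big5"))

print(utf8_size, big5_size)
print(markup_utf8, markup_big5)
```

    The overhead applies only to the Chinese characters themselves; tags,
    attributes and whitespace take one byte each either way.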

    --
    But thats what FP puts in to the page, so i asume thats correct
    Harry H. Arends in microsoft.public.frontpage.client
     
    Andreas Prilop, Dec 6, 2003
    #20
