!doctype & foreign languages

Discussion in 'HTML' started by yes=no, Nov 28, 2003.

  1. yes=no

    yes=no Guest

    Hi,


    I'm translating several of my site's pages into French.

    I have so far added this line to the meta tags:

    <META HTTP-EQUIV="Content-Language" Content="fr">

    But I'm not sure whether any other additions are necessary. My main aim here
    is to make the pages accessible to search engines that index French-language
    content.

    I notice that the !DOCTYPE declaration generated by Dreamweaver is this:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

    Now, does the "EN" at the end signify the English language, and should I
    change it to "FR" for my French pages?

    Also (if there are Canucks listening...) does Québécois French demand
    any different kind of tagging, or does "fr" indicate a universal French,
    irrespective of the different "dialects" of French?

    Thanks for any comments.

    Y?N
    yes=no, Nov 28, 2003
    #1

  2. yes=no

    Eric Bohlman Guest

    "yes=no" <> wrote in
    news:pfyxb.57019$oN2.322@edtnps84:

    > I notice that the !DOCTYPE declaration generated by Dreamweaver is
    > this:
    >
    > <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    >
    > now, does the "EN" at the end signify the "english" language, and
    > should I change this to "FR" for my french pages?


    No. The "EN" indicates that the human-readable comments in the actual DTD
    document are written in English.
    Eric Bohlman, Nov 28, 2003
    #2
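
The public identifier in the DOCTYPE is an SGML formal public identifier whose fields are separated by "//"; the final field names the language of the DTD's own text, which is why it stays "EN" on a French page. The minimal Python function below assumes the simple four-field layout of the W3C HTML 4.01 identifiers; `parse_fpi` is a hypothetical helper, not a general SGML parser.

```python
def parse_fpi(fpi):
    """Split a W3C-style formal public identifier into its fields.

    Assumes the layout: registration//owner//class description//language
    """
    parts = fpi.split("//")
    if len(parts) != 4:
        raise ValueError("unexpected public identifier layout: %r" % fpi)
    registration, owner, text, language = parts
    # The public text class ("DTD") comes first, then the description.
    text_class, _, description = text.partition(" ")
    return {
        "registered": registration == "+",  # "-" means unregistered owner
        "owner": owner,
        "text_class": text_class,
        "description": description,
        "language": language,  # language of the DTD text, not of the page
    }


print(parse_fpi("-//W3C//DTD HTML 4.01 Transitional//EN"))
```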

  3. yes=no

    Marc Nadeau Guest

    yes=no wrote:

    > Also (if there are Canucks listening...) does the Quebecois french demand
    > any different kind of tagging or does "fr" indicate a universal french,
    > irrespective of different "dialects" of french?


    I am a 'québécois' and use fr.

    We do not use a dialect; the rest of the world does. ;-)

    --
    Your mother is so fat that when she goes out, there's an eclipse.
    Marc Nadeau, Nov 28, 2003
    #3
  4. yes=no

    yes=no Guest


    > I am a 'québécois' and use fr.
    >
    > We do not use a dialect; the rest of the world does. ;-)


    Damn! It's that "attitude" thing again! What happens when the Chinese take
    over and put up signs in Chinese bigger than the French ones? Whose "distinct
    society" will you belong to in prison reciting "ho chi minh" mantras? :)
    yes=no, Nov 28, 2003
    #4
  5. yes=no wrote:

    > i have so far added this line to the metatags:
    > <META HTTP-EQUIV="Content-Language" Content="fr">
    > but I'm not sure if any other additions are necessary.


    Also:

    <html lang="fr">

    And if you have any control over your HTTP Headers:

    Content-Language: fr

    As Eric indicated, this should remain unchanged:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

    --
    Toby A Inkster BSc (Hons) ARCS
    Contact Me - http://www.goddamn.co.uk/tobyink/?page=132
    Toby A Inkster, Nov 28, 2003
    #5
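
A minimal sketch of the two suggestions above, assuming a Python WSGI setup (the page content and the `app` name are hypothetical): the `lang` attribute goes on `<html>`, and the server additionally sends a `Content-Language` header.

```python
# Hypothetical French page; lang="fr" on <html> declares the content language
# in the markup itself. The DOCTYPE keeps its "EN" unchanged.
PAGE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n'
    '<html lang="fr">\n'
    "<head><title>Bonjour</title></head>\n"
    "<body><p>Bonjour tout le monde.</p></body>\n"
    "</html>\n"
)


def app(environ, start_response):
    """WSGI application serving the page with a Content-Language header."""
    body = PAGE.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Language", "fr"),  # HTTP-level hint mirroring lang="fr"
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would serve it; how you set the header on a real server depends on that server's configuration.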
  6. "Toby A Inkster" wrote:
    > > <META HTTP-EQUIV="Content-Language" Content="fr">

    > <html lang="fr">
    > Content-Language: fr
    > <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">


    Google seems to recognise what language a page is in, even without language
    (meta)tags, and this regardless of the URL's last letters (after the dot).
    I'm aware of a .org organisation with pages in Dutch and yep, Google puts
    them under the Dutch search (as language) and under a Belgian search (as
    country).

    How do they do that? Check where the server it is on is located? Just
    wondering... (I should also say that Google also makes a lot of mistakes by
    putting 'English' pages on a Dutch search).
    Felix Atagong, Nov 28, 2003
    #6
  7. yes=no

    Safalra Guest

    "yes=no" <> wrote in message news:<pfyxb.57019$oN2.322@edtnps84>...
    > Also (if there are Canucks listening...) does the Quebecois french demand
    > any different kind of tagging or does "fr" indicate a universal french,
    > irrespective of different "dialects" of french?


    You can use either 'fr' or 'fr-ca'. Also, change your <html> tag to
    <html lang="fr">. And technically, if you link to pages in a language
    different from the document's, you should indicate this with the
    hreflang attribute, for example:

    <a href="http://www.safalra.com/" hreflang="en-gb">Safalra's
    Website</a>

    --- Stephen Morley ---
    http://www.safalra.com
    Safalra, Nov 28, 2003
    #7
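
A small sketch of the `hreflang` advice, assuming pages are generated by a Python script; the `link` helper and its `doc_lang="fr"` default are hypothetical. It adds `hreflang` only when the target's language differs from the document's.

```python
from html import escape


def link(href, text, target_lang=None, doc_lang="fr"):
    """Build an <a> element; add hreflang only for a different language."""
    attrs = 'href="%s"' % escape(href)
    # Language tags compare case-insensitively ("en-GB" == "en-gb").
    if target_lang and target_lang.lower() != doc_lang.lower():
        attrs += ' hreflang="%s"' % escape(target_lang)
    return "<a %s>%s</a>" % (attrs, escape(text, quote=False))


print(link("http://www.safalra.com/", "Safalra's Website", "en-gb"))
```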
  8. Felix Atagong wrote:

    > Google seems to recognise what language a page is in, even without
    > language (meta)tags...
    >
    > How do they do that?


    A well-configured web server sends a Content-Language: header with each
    page to indicate what human-readable language the page is in. This is
    the appropriate place for this information.
    Owen Jacobson, Nov 28, 2003
    #8
  9. yes=no

    Safalra Guest

    "Felix Atagong" <> wrote in message news:<3fc70ea5$0$3236$>...
    > Google seems to recognise what language a page is in, even without language
    > (meta)tags, and this regardless of the URL's last letters (after the dot).
    > I'm aware of a .org organisation with pages in Dutch and yep, Google puts
    > them under the Dutch search (as language) and under a Belgian search (as
    > country).
    >
    > How do they do that? Check where the server it is on is located? Just
    > wondering... (I should also say that Google also makes a lot of mistakes by
    > putting 'English' pages on a Dutch search).


    It just looks the words on the pages up in dictionaries for various
    languages, and judges the page to be in the language where the most
    words existed. (Sorry about the awful grammar in that last
    sentence...)

    --- Stephen Morley ---
    http://www.safalra.com
    Safalra, Nov 28, 2003
    #9
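
A toy version of the dictionary-lookup approach described above, in Python. The word lists are tiny hypothetical samples, and whether Google actually works this way is the poster's guess, not something this sketch confirms; real systems use far larger vocabularies or statistical models.

```python
import re

# Tiny hypothetical word lists standing in for real dictionaries.
WORDLISTS = {
    "en": {"the", "and", "is", "of", "page", "this", "in"},
    "fr": {"le", "la", "et", "est", "de", "cette", "page", "dans"},
    "nl": {"de", "het", "en", "is", "van", "deze", "pagina", "in"},
}


def guess_language(text):
    """Return the language whose word list matches the most words, or None."""
    words = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in WORDLISTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```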
  10. "yes=no" <> wrote:

    > Subject: !doctype & foreign languages


    English is a foreign language.

    > <META HTTP-EQUIV="Content-Language" Content="fr">


    Rather write <html lang="fr"> .
    <http://uk.htmlhelp.com/reference/html40/attrs.html#i18n>

    > does the Quebecois french demand any different kind of tagging


    You could write <html lang="fr-CA"> .

    > or does "fr" indicate a universal french,
    > irrespective of different "dialects" of french?


    Yes.

    --
    Top posting.
    What's the most irritating thing on Usenet?
    Andreas Prilop, Nov 28, 2003
    #10
  11. "Owen Jacobson" <> wrote:

    >> Google seems to recognise what language a page is in, even without
    >> language (meta)tags...
    >>
    >> How do they do that?

    >
    > A well-configured web server sends a Content-Language: header with
    > each page to indicate what human-readable language the page is in.
    > This is the appropriate place for this information.


    What makes you think so? I don't know any recommendation about using
    Content-Language. And how would a server know the language? What the
    HTTP protocol says about the use of this header is this:
    "The primary purpose of Content-Language is to allow a user to identify
    and differentiate entities according to the user's own preferred
    language."
    http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.12

    It's basically useful for negotiated content, and in fact negotiation
    usually takes place on the server, so that the Content-Language header, if
    present, has no impact on the client.

    The HTML specifications define the lang attribute (and xml:lang in
    XHTML) for indicating natural language of content. A browser might use
    a Content-Language header as the default, when the root element
    (<html>) lacks such an attribute. But it is _not_ common for browsers
    to do so (I don't think any browser does such things), it is _not_
    common to configure servers to send such headers and there is hardly any
    reason to do so. And I would be very surprised if Google would do
    anything with them.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html
    Jukka K. Korpela, Nov 28, 2003
    #11
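
A sketch of the precedence described in this post, assuming a client that honours the `lang` (or `xml:lang`) attribute on the root element and only falls back to the `Content-Language` header when it is missing, roughly the inheritance order HTML 4.01 describes; as the post says, browsers rarely do even this. `effective_language` is hypothetical, and the header lookup assumes exact-case keys.

```python
from html.parser import HTMLParser


class RootLangFinder(HTMLParser):
    """Record the lang attribute of the first <html> start tag."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.seen_root = False

    def handle_starttag(self, tag, attrs):
        if tag == "html" and not self.seen_root:
            self.seen_root = True
            attrs = dict(attrs)
            self.lang = attrs.get("lang") or attrs.get("xml:lang")


def effective_language(html_text, headers):
    """Prefer the root lang attribute; fall back to Content-Language."""
    finder = RootLangFinder()
    finder.feed(html_text)
    finder.close()
    if finder.lang:
        return finder.lang
    value = headers.get("Content-Language")
    if value:
        # The header may list several languages; use the first as a default.
        return value.split(",")[0].strip()
    return None
```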
  12. yes=no

    Marc Nadeau Guest

    yes=no wrote:

    >
    >> I am a 'québécois' and use fr.
    >>
    >> We do not use a dialect; the rest of the world does. ;-)

    >
    > Damn! It's that "attitude" thing again! What happens when the Chinese
    > take over and put up signs in Chinese bigger than the French ones? Whose
    > "distinct society" will you belong to in prison reciting "ho chi minh"
    > mantras? :)


    This Chinese 24-bit glyph: ;-)
    is pronounced 'wink' in many dialects and means that the author is making a
    joke.


    --
    Humour is an attempt to strip grand sentiments of their stupidity.
    Raymond Queneau
    Marc Nadeau, Nov 28, 2003
    #12
  13. Jukka K. Korpela wrote:

    > The HTML specifications define the lang attribute (and xml:lang in
    > XHTML) for indicating natural language of content. A browser might
    > use a Content-Language header as the default, when the root element
    > (<html>) lacks such an attribute. But it is not common for browsers
    > to do so (I don't think any browser does such things), it is not
    > common to configure servers to send such headers and there is hardly
    > any reason to do so. And I would be very surprised if Google would do
    > anything with them.


    Interesting point, I hadn't considered that. Assuming you're right and
    I'm wrong (a fairly safe assumption here) then I ask, why *not* use the
    Content-Language header? Most useful metadata about the document is
    sent with the headers, not the response body; further, the
    Content-Language header may allow caches to do language negotiation
    locally rather than passing the request on to the actual server.
    Owen Jacobson, Nov 29, 2003
    #13
  14. "Owen Jacobson" <> wrote:

    > - - why *not* use the Content-Language header?


    If the information in the header is correct and no client gets the
    meaning of the header wrong, then there is no harm in including it. But
    I'm not so sure about those ifs, especially the former, and even the
    latter is uncertain.

    How would you make a server send those headers? Suppose all your
    HTML documents are in English and you make the server send
    Content-Language: en
    for them. Fine. But will you remember to do something if you add a
    document in another language? And how would you do that? You would
    probably need some special mechanism, perhaps changing the filename
    extension, potentially causing problems. (I still remember the .htm8
    incident: Google didn't index URLs ending with .htm8 at all, and
    although this was fixed, I'm a bit suspicious about the effect of
    creative suffixes.)

    Besides, if your documents are in British English, you could specify
    Content-Language: en-UK
    which is more informative. But I'm afraid that _if_ some software uses
    Content-Language headers for something, it could play with simplistic
    and incorrect rules and accept a primary language code only, perhaps
    treating en-UK as unrecognized language. Besides, the semantics of the
    header is not clear. In fact I think you should not use a subcode in
    Content-Language unless you really think that the document is
    unintelligible to people who do not understand that particular form of
    the language. As you see, it's easy to get confused with the language
    codes.

    > Most useful metadata about the document is
    > sent with the headers, not the response body;


    It is true that headers could be useful e.g. in avoiding useless
    fetching of data. In principle, a browser could inform the user that
    the user is about to follow a link to a resource that is 42 megabytes of
    text in a dialect of Finnish - after the browser has sent a HEAD
    request and analyzed the response, and before sending a GET.
    But I don't think any browser even tries to do such things.

    > further, the
    > Content-Language header may allow caches to do language negotiation
    > locally rather than passing the request on to the actual server.


    No, I don't think so. I'm having a hard time trying to imagine how
    that could work even in principle.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html
    Jukka K. Korpela, Nov 29, 2003
    #14
  15. Jukka K. Korpela wrote:

    > "Owen Jacobson" <> wrote:
    >
    >> - - why *not* use the Content-Language header?

    >
    > How would you make a server send those headers? Suppose all your
    > HTML documents are in English and you make the server send
    > Content-Language: en
    > for them. Fine. But will you remember to do something if you add a
    > document in another language? And how would you do that?


    The conventional way would be to use:

    about_us.html.en (in English)
    about_us.html.de (in German)

    -or-

    about_us.en.html (in English)
    about_us.de.html (in German)

    Apache is configured to recognise these forms out of the box.

    > You would
    > probably need some special mechanism, perhaps changing the filename
    > extension, potentially causing problems. (I still remember the .htm8
    > incident: Google didn't index URLs ending with .htm8 at all, and
    > although this was fixed, I'm a bit suspicious about the effect of
    > creative suffixes.)


    With Apache MultiViews, suffixes aren't needed at all.

    > Besides, if your documents are in British English, you could specify
    > Content-Language: en-UK
    > which is more informative.


    Which would be wrong. Try en-GB.

    > But I'm afraid that _if_ some software uses
    > Content-Language headers for something, it could play with simplistic
    > and incorrect rules and accept a primary language code only, perhaps
    > treating en-UK as unrecognized language.


    Well, it would be an unrecognised language. The variety of English common
    in the Ukraine?

    Besides which, Apache's content negotiation is smart enough to deal with
    this. It will happily serve en-GB documents to clients that request
    documents with an HTTP Accept-Language of just "en".

    Other servers may be less smart.

    >> further, the
    >> Content-Language header may allow caches to do language negotiation
    >> locally rather than passing the request on to the actual server.

    >
    > No, I don't think so. I'm having a hard time trying to imagine how
    > that could work even in principle.


    I think language-based content negotiation is a disaster right now, until
    more people learn to set their Accept-Language header.

    But there is no harm in specifying the Content-Language of a non-negotiated
    document.

    --
    Toby A Inkster BSc (Hons) ARCS
    Contact Me - http://www.goddamn.co.uk/tobyink/?page=132
    Toby A Inkster, Nov 30, 2003
    #15
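
A sketch of how suffix-based file naming can drive the response headers, loosely modelled on Apache's `AddLanguage`/`AddType` behaviour, where each extension is looked up independently so `.html.en` and `.en.html` give the same result. The extension tables and `headers_for` are hypothetical simplifications, not Apache's actual implementation.

```python
# Hypothetical extension tables, loosely modelled on AddLanguage / AddType.
LANGUAGE_BY_EXT = {"en": "en", "de": "de", "fr": "fr"}
TYPE_BY_EXT = {"html": "text/html", "htm": "text/html", "txt": "text/plain"}


def headers_for(filename):
    """Derive Content-Type / Content-Language from a file's extensions."""
    _base, *extensions = filename.split(".")
    headers = {}
    for ext in extensions:  # order does not matter
        ext = ext.lower()
        if ext in TYPE_BY_EXT:
            headers["Content-Type"] = TYPE_BY_EXT[ext]
        elif ext in LANGUAGE_BY_EXT:
            headers["Content-Language"] = LANGUAGE_BY_EXT[ext]
    return headers
```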
  16. Toby A Inkster <> wrote:

    > The conventional way would be to use:
    >
    > about_us.html.en (in English)
    > about_us.html.de (auf Deutsch)
    >
    > -or-
    >
    > about_us.en.html (in English)
    > about_us.de.html (in German)


    That's a conventional way, and it involves a complication of URLs (and
    filenames). Mostly this won't cause harm, but it might make people
    wonder, especially since such URLs aren't _that_ common.

    >> Besides, if your documents are in British English, you could
    >> specify Content-Language: en-UK which is more informative.

    >
    > Which would be wrong. Try en-GB.


    Indeed. As you see, it's very easy to get confused with language codes.
    The code en-UK appears in quite many documents, including Dublin Core
    specifications (where it appears as an example). I knew it was wrong,
    but to be honest, this time I forgot (so I won't pretend this was an
    intentional demonstration of the confusion).

    > Well, it would be an unrecognised language. The variety of English
    > common in the Ukraine?


    No, Ukraine is UA.

    > Besides which, Apache's content negotiation is smart enough to deal
    > with this. It will happily serve en-GB documents to clients that
    > request documents with an HTTP Accept-Language of just "en".


    That's good behaviour, but problems arise in the common situation where
    the client specifies en-UK only (and, as you know, this might be caused
    just by the browser's factory defaults, which do _not_ reflect the
    user's language abilities).

    > I think language-based content negotiation is a disaster right
    > now, until more people learn to set their Accept-Language header.


    Not a disaster, just a bit frustrating. You need to make sure that
    whatever the Accept-Language says, the user gets a page where he can
    find a version he prefers.

    > But there is no harm in specifying the Content-Language of a
    > non-negotiated document.


    That is correct, if the information there is correct and if no software
    makes a mistake. :) But what are the possible _benefits_? Rather
    theoretical, I would say.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html
    Jukka K. Korpela, Nov 30, 2003
    #16
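
A sketch of the prefix matching that lets a plain `en` request match an `en-GB` document, following the language-range rule of RFC 2616 section 14.4 (a range matches a tag that equals it, or that starts with it followed by "-"). It also reproduces the problem case above: a request for `en-UK` only matches nothing. The function names are hypothetical, q-value handling is simplified, and real servers such as Apache add fallbacks (for example `LanguagePriority`) that are not modelled here.

```python
def parse_accept_language(header):
    """Return (language-range, q) pairs, highest q first (simplified)."""
    ranges = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        ranges.append((tag.strip().lower(), q))
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def range_matches(lang_range, tag):
    """RFC 2616-style match: exact, or prefix followed by '-'."""
    tag = tag.lower()
    return lang_range == "*" or tag == lang_range or tag.startswith(lang_range + "-")


def choose_variant(accept_language, available):
    """Pick the first available variant matching the best-ranked range."""
    for lang_range, q in parse_accept_language(accept_language):
        if q <= 0:  # q=0 means "not acceptable"
            continue
        for tag in available:
            if range_matches(lang_range, tag):
                return tag
    return None
```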
  17. Jukka K. Korpela wrote:

    > Toby A Inkster <> wrote:
    >
    >> The conventional way would be to use:
    >>
    >> about_us.html.en (in English)
    >> about_us.html.de (in German)
    >>
    >> -or-
    >>
    >> about_us.en.html (in English)
    >> about_us.de.html (in German)

    >
    > That's a conventional way, and it involves a complication of URLs (and
    > filenames). Mostly this won't cause harm, but it might make people
    > wonder, especially since such URLs aren't _that_ common.


    I didn't say that you should necessarily use those as URLs. Just file
    names.

    A practical solution might be to have the files:

    /var/www/html/today.html.en
    /var/www/html/heute.html.de

    and then link to them via:

    <a href="/today" lang="en">
    <a href="/heute" lang="de">

    So that the ".html" and ".en"/".de" suffixes aren't used as part of the
    URLs, and aren't used in content negotiation -- just to tell the server
    which HTTP headers to send.

    >> Well, it would be an unrecognised language. The variety of English
    >> common in the Ukraine?

    >
    > No, Ukraine is UA.


    ".ua" is the Ukrainian top-level domain name, but "uk" is the ISO639-2
    code.

    --
    Toby A Inkster BSc (Hons) ARCS
    Contact Me - http://www.goddamn.co.uk/tobyink/?page=132
    Toby A Inkster, Nov 30, 2003
    #17
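
A simplified sketch of the MultiViews-style lookup described above: an extensionless request path such as `/today` is mapped to on-disk files named `today.<extensions>`. `resolve_multiviews` is hypothetical; Apache's real MultiViews then negotiates among the candidates, which this sketch leaves out.

```python
import posixpath


def resolve_multiviews(url_path, filenames):
    """Return candidate files '<stem>.<extensions...>' for a request path."""
    stem = posixpath.basename(url_path.rstrip("/"))
    # The trailing "." stops "/today" from matching "todayx.html".
    return sorted(name for name in filenames if name.startswith(stem + "."))
```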
  18. Toby A Inkster <> wrote:

    > ".ua" is the Ukrainian top-level domain name, but "uk" is the ISO639-2
    > code.


    This might get too off-topic even for alt.html, but:

    ISO 639-2 defines language codes only. The Ukrainian _language_ has
    code "uk" there (and the three-letter code "ukr"). But you cannot use a
    language code as a subcode; "en-UK" is undefined (and therefore
    incorrect).

    The authority on _country codes_ says that the country code for Ukraine
    is "UA":
    <http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html>

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html
    Jukka K. Korpela, Nov 30, 2003
    #18
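
A sketch of the distinction drawn in this post: the primary subtag is a language code (ISO 639) and a two-letter second subtag is a country code (ISO 3166-1 alpha-2), so `en-GB` passes and `en-UK` fails. The code sets below are small hypothetical subsets, and the pattern ignores the longer tag forms the standards allow.

```python
import re

# Small hypothetical subsets of the real code lists.
LANGUAGES = {"en", "fr", "de", "nl", "uk"}            # "uk" = Ukrainian
COUNTRIES = {"GB", "US", "CA", "FR", "DE", "BE", "NL", "UA"}  # "UA" = Ukraine


def check_tag(tag):
    """Accept 'll' or 'll-CC' where ll is a language and CC a country."""
    match = re.fullmatch(r"([A-Za-z]{2})(?:-([A-Za-z]{2}))?", tag)
    if not match:
        return False
    language, region = match.group(1).lower(), match.group(2)
    if language not in LANGUAGES:
        return False
    return region is None or region.upper() in COUNTRIES
```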