lang parameter in anchor tag

Tristan Miller

Greetings.

What is the correct semantics of a lang parameter in an anchor tag? For
example,

<a href="foo" lang="ru">bar</a>

Does this mean

(a) The word "bar" is in Russian (and, for example, should be pronounced as
such by a voice browser), or

(b) The document "foo" is in Russian, but "bar" is still in whatever
language its container element is?

If the correct interpretation is (b), then I take it that to get the
semantics of (a), I need to write something like this:

<span lang="ru"><a href="foo">bar</a></span>

Correct?
 
brucie

What is the correct semantics of a lang parameter in an anchor tag? For
example,

<a href="foo" lang="ru">bar</a>

lang = language-code [CI]
This attribute specifies the base language of an element's attribute
values and text content. The default value of this attribute is
unknown. http://www.w3.org/TR/html401/struct/dirlang.html#adef-lang
Does this mean

(a) The word "bar" is in Russian (and, for example, should be pronounced as
such by a voice browser), or

Language information specified via the lang attribute may be used by a
user agent to control rendering in a variety of ways. Some situations
where author-supplied language information may be helpful include:
Assisting search engines
Assisting speech synthesizers
Helping a user agent select glyph variants for high quality typography
Helping a user agent choose a set of quotation marks
Helping a user agent make decisions about hyphenation, ligatures, and spacing
Assisting spell checkers and grammar checkers
http://www.w3.org/TR/html401/struct/dirlang.html#adef-lang
(b) The document "foo" is in Russian, but "bar" is still in whatever
language its container element is?

Both "bar" and, if it is read out, the URL text "foo" would be in Russian.
If the correct interpretation is (b), then I take it that to get the
semantics of (a), I need to write something like this:

<span lang="ru"><a href="foo">bar</a></span>

What you had first is right, as lang applies to an "element's attribute
values and text content." One day browsers may even support it.

Also have a look at sending Content-Language headers:

The "Content-Language" header is intended for use in the case where
one desires to indicate the language(s) of something that has RFC
822-like headers, such as MIME body parts or Web documents.
http://www.ietf.org/rfc/rfc3282.txt
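
To make that concrete, here is a rough sketch of where the language information can come from. It assumes a page mostly in English with one Russian link; the response header appears only as a comment, and the page content is invented for illustration.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<!-- Assumed HTTP response header, sent by the server (not part of the markup):
     Content-Language: en -->
<html lang="en">
  <head>
    <title>Links</title>
  </head>
  <body>
    <!-- No lang of its own, so this inherits lang="en" from the html element. -->
    <p>An English paragraph.</p>
    <!-- The lang attribute on the element itself takes precedence
         for this link's text content and attribute values. -->
    <p><a href="foo" lang="ru">bar</a></p>
  </body>
</html>
```

As far as HTML 4.01 goes, the nearest lang attribute wins. The Content-Language header only comes into play for elements that have no lang set on themselves or on any ancestor, so in this sketch it is never actually consulted.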
 
Toby A Inkster

Tristan said:
<a href="foo" lang="ru">bar</a>

Does this mean

(a) The word "bar" is in Russian (and, for example, should be pronounced as
such by a voice browser), or

(b) The document "foo" is in Russian, but "bar" is still in whatever
language its container element is?

Why not just *read* the specs? They say:

(c) The word "bar" is in Russian. The word "foo" is also in Russian. The
document that "foo" points to could be in any language.

If you want to say that the document that foo points to is in Russian, use
hreflang="ru".
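
A minimal side-by-side sketch of the difference, reusing the placeholder names "foo" and "bar" from the question (the comments reflect my reading of the HTML 4.01 definitions, not any particular browser's behaviour):

```html
<!-- Link text and attribute values are declared Russian;
     nothing is said about the language of the target document. -->
<a href="foo" lang="ru">bar</a>

<!-- Link text stays in the language of its container;
     the target document is declared to be in Russian. -->
<a href="foo" hreflang="ru">bar</a>

<!-- Both: Russian link text pointing at a Russian document. -->
<a href="foo" lang="ru" hreflang="ru">bar</a>
```
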
If the correct interpretation is (b), then I take it that to get the
semantics of (a), I need to write something like this:

<span lang="ru"><a href="foo">bar</a></span>

Well, (b) isn't correct. Technically neither is (a), but for all intents
and purposes, as it doesn't really matter which language a URL is in, (c)
and (a) are close enough.
 
Tristan Miller

Greetings.

Toby A Inkster said:
Why not just *read* the specs? They say:

Because my web proxy server was down and I didn't have the foresight to
download a copy for viewing offline? Or is that not a good enough reason?

I'm sorry if answering this question took up too much of your time, but
then, you were never obligated to respond.
 
DU

Tristan said:
Greetings.

What is the correct semantics of a lang parameter in an anchor tag? For
example,

<a href="foo" lang="ru">bar</a>

Does this mean

(a) The word "bar" is in Russian (and, for example, should be pronounced as
such by a voice browser), or

(b) The document "foo" is in Russian, but "bar" is still in whatever
language its container element is?

Nothing in that markup associates the document referenced by "foo" with
Russian; it could be in any language. If you had added hreflang="ru", it
would have meant that the referenced "foo" resource is written in
Russian. Likewise, charset="koi8-r" would have identified the character
encoding of foo.
If the correct interpretation is (b), then I take it that to get the
semantics of (a), I need to write something like this:

<span lang="ru"><a href="foo">bar</a></span>

Just a word of caution. If no available font can render the text in
the requested language, browsers usually trigger a modal font download
dialog. Since Russian is widely supported in Unicode fonts, that is no
problem here, but it is not the case for some Asian languages.

The handling of language related attributes is not obvious IMO. As
others mentioned (I agree with them on this), you should check the specs.

DU
--
Javascript and Browser bugs:
http://www10.brinkster.com/doctorunclear/
- Resources, help and tips for Netscape 7.x users and Composer
- Interactive demos on Popup windows, music (audio/midi) in Netscape 7.x
http://www10.brinkster.com/doctorunclear/Netscape7/Netscape7Section.html
 
Jukka K. Korpela

Tristan Miller said:
What is the correct semantics of a lang parameter in an anchor tag?

The semantics of the lang attribute is very complex. I don't mean what
the specifications say (which has been cited and summarized here). They
don't say very much, and that exactly is the problem. As soon as you
really start using language markup, you start encountering all kinds of
problems. And there's really no good summary of even the problems. (Or,
rather, there is, but it's available in Finnish only, and I don't think
I have time and energy to translate it, especially due to the minuscule
practical effect that language markup has at present, or in the near
future.)
<a href="foo" lang="ru">bar</a>

To summarize the situation: the word "bar" is declared as being Russian,
whereas nothing is said about the linked document's language. In
principle, "foo" is declared Russian too, and this might be relevant to
a speech browser that is asked to tell information about a link,
including its URL. URLs _can_ be spoken, and sometimes need to.

But if you write Russian words in a transliteration, using Latin
letters, such as "bar" literally, I would advise against using the lang
attribute at all. Beware that I am now advising you to break a WCAG 1.0
priority 1 requirement (which is, in fact, broken by the WCAG 1.0
document itself, too, and by virtually all W3C documents) - the
requirement that language changes be indicated in markup.

I have two reasons for my advice:

1. Browsers, such as IE 6, are known to let the lang attribute affect
fonts too. They may even get wild and make the user frustrated when
they look for a font containing Cyrillic letters, despite the fact
that the text contains Latin letters only. And if they find such
a font, they may use it for the transliterated Russian text, making
it look different from the rest of the text. So
<p>My favorite author is <span lang="ru">Pushkin</span></p>
may result in "Pushkin" displayed in a considerably different font.
So although browsers don't do much _useful_ with lang attributes,
they surely know how to mess things up.

2. There is no way to indicate the transliteration method. What would
<span lang="ru">chas</span> mean? Should it be spoken (and spelling
checked, and indexed, etc.) according to the transliteration that
is common in English language context, or by French rules, or by
German rules, or by standard (ISO 9) rules? It would be just a wild
guess that the language of the enclosing text dictates this.

If you actually write Russian in Cyrillic letters, then different cans of
worms are opened.
 
Tristan Miller

Greetings.

Jukka K. Korpela said:
If you actually write Russian in Cyrillic letters, then different cans of
worms are opened.

Well, yes, but since it's a short amount of text in an overwhelmingly
iso-8859-1 document, I was planning on using HTML entities rather than
using Unicode or mixing character encodings. That is, I had originally
intended to write something like this:

<a lang="ru" href="http://www.cs.toronto.edu/~kol/" title="Antonina
Kolokolova's web
page">Антонина
Колоколова</a>

And have it show up as follows, with a mouseover tooltip (or whatever other
mechanism the browser provides, if any) displaying the English title: (Note
that this message is in UTF-8.)

Антонина Колоколова

In light of what I have learned from this thread, though, I suppose that the
browser will consider the anchor's title, "Antonina Kolokolova's web page",
to be in Russian rather than English. I don't suppose there's any way
around this; I seem to recall reading that one isn't allowed to put markup
inside anchor tags. That is, I would not be allowed to write the
following, correct?

<a href="http://www.cs.toronto.edu/~kol/" title="Antonina
Kolokolova's web page"><span
lang="ru">Антонина
Колоколова</span></a>

So assuming I drop the anchor title and end up with just

<a lang="ru"
href="http://www.cs.toronto.edu/~kol/">Антонина
Колоколова</a>

am I still exposing myself to potential cans of worms?
 
Jukka K. Korpela

Tristan Miller said:
Well, yes, but since it's a short amount of text in an overwhelmingly
iso-8859-1 document, I was planning on using HTML entities rather than
using Unicode or mixing character encodings.

I see. Then the page will depend on browser support for Cyrillic letters.
The "HTML entities", or actually character references to be exact, are
relatively well supported - but people often use systems with insufficient
fonts. This is one reason why transliterations are often used. Another
reason is that people who don't know Russian still get some idea of a text
when it's written as transliterated. (I remember how I started three times
studying elementary Russian at the university, and I always quit after
a few lessons, since those odd characters were all Greek to me.)
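<a lang="ru" href="http://www.cs.toronto.edu/~kol/" title="Antonina
Kolokolova's web
page">&#1040;&#1085;&#1090;&#1086;&#1085;&#1080;&#1085;&#1072;
&#1050;&#1086;&#1083;&#1086;&#1082;&#1086;&#1083;&#1086;&#1074;&#1072;</a>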

Technically, this is incorrect information since the lang attribute
specifies the language of the element content and all attributes.
But why not write the title attribute value in Russian, since the page is
in Russian?
In light of what I have learned from this thread, though, I suppose
that the browser will consider the anchor's title, "Antonina
Kolokolova's web page", to be in Russian rather than English.

Well, it should. But useful support for lang attributes is very limited.
That is, I would not
be allowed to write the following, correct?

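<a href="http://www.cs.toronto.edu/~kol/" title="Antonina
Kolokolova's web page"><span
lang="ru">&#1040;&#1085;&#1090;&#1086;&#1085;&#1080;&#1085;&#1072;
&#1050;&#1086;&#1083;&#1086;&#1082;&#1086;&#1083;&#1086;&#1074;&#1072;</span></a>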

That would be allowed, since <span> elements (and other text-level markup)
are allowed inside <a> elements.

But it would say that "Antonina Kolokolova's web page" is in English,
although we know that the name is Russian, and in principle it should be
pronounced with this in mind. There's no way around this, since attribute
values are plain text by definition. Oh well, in the deepest theory,
Unicode has some fancy tools for indicating language with special
characters, but that gets far too theoretical even for me (and I will eat
a worm if there is a browser that supports such things).
So assuming I drop the anchor title and end up with just

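<a lang="ru"
href="http://www.cs.toronto.edu/~kol/">&#1040;&#1085;&#1090;&#1086;&#1085;&#1080;&#1085;&#1072;
&#1050;&#1086;&#1083;&#1086;&#1082;&#1086;&#1083;&#1086;&#1074;&#1072;</a>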

am I still exposing myself to potential cans of worms?

The only problem I can see with that is the potential lack of Cyrillic
fonts on people's browsers, and naturally the fact that most of the
world's population doesn't understand Russian. But if you know a _useful_
value for it, go ahead and use it without worrying too much about lang
attributes.

For a general audience, which may or may not understand Russian, I would
use a link like the above, with Russian link text, followed by a simple
textual explanation like "(Antonina Kolokolova's web page, in Russian)".
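
Roughly like this, as a sketch only: the wrapping <p> element is an assumption about the surrounding page, and the explanation is plain text in the page's main language, so no title attribute is needed.

```html
<!-- Russian link text marked with lang="ru"; the explanation after the link
     is outside the <a> element and so keeps the language of its container. -->
<p><a lang="ru"
href="http://www.cs.toronto.edu/~kol/">&#1040;&#1085;&#1090;&#1086;&#1085;&#1080;&#1085;&#1072;
&#1050;&#1086;&#1083;&#1086;&#1082;&#1086;&#1083;&#1086;&#1074;&#1072;</a>
(Antonina Kolokolova's web page, in Russian)</p>
```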
 
DU

Tristan said:
Greetings.

Well, yes, but since it's a short amount of text in an overwhelmingly
iso-8859-1 document, I was planning on using HTML entities rather than
using Unicode or mixing character encodings. That is, I had originally
intended to write something like this:

<a lang="ru" href="http://www.cs.toronto.edu/~kol/" title="Antonina
Kolokolova's web
page">Антонина
Колоколова</a>

[snipped]

The above does not make sense. lang defines the language of the content
as well as of the advisory title attribute value; here, you have two
languages in use.
The referenced resource uses koi8-r but is written entirely in English:
not consistent, but this might be doable (I doubt it).

I think it would be a lot more sensible to offer 2 links here. How about:

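<a lang="ru" href="http://www.cs.toronto.edu/~kol/" charset="koi8-r"
hreflang="ru" title="Страничка Антонина Колоколова">
&#1040;&#1085;&#1090;&#1086;&#1085;&#1080;&#1085;&#1072;
&#1050;&#1086;&#1083;&#1086;&#1082;&#1086;&#1083;&#1086;&#1074;&#1072;</a>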
which would lead to a document written in Russian in koi8-r

and

<a lang="en" href="http://www.cs.toronto.edu/otherFile..."
charset="iso-8859-1" hreflang="en" title="Antonina Kolokolova's web
page"> Antonina Kolokolova</a>
which would lead to a document written in English in iso-latin

Links, title and attributes would be clear and consistent.



Note that the document at
http://www.cs.toronto.edu/~kol/
uses an incorrect doctype declaration.

{ Note that the public identifier section of the DOCTYPE declaration is
case sensitive. Some versions of Netscape Composer are known to insert
the lower-case "-//w3c//dtd html 4.0 transitional//en", rather than the
correct mixed-case "-//W3C//DTD HTML 4.0 Transitional//EN".
}
http://www.htmlhelp.org/faq/html/basics.html#doctype

http://www.w3.org/QA/2002/04/valid-dtd-list.html

DU
 
Tristan Miller

Greetings.

The referenced resource used koi8-r but is entirely written in English:
not consistent but this might be doable (I doubt this).

I think it would be a lot more sensible to offer 2 links here. How about:

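<a lang="ru" href="http://www.cs.toronto.edu/~kol/" charset="koi8-r"
hreflang="ru" title="Страничка Антонина Колоколова">
&#1040;&#1085;&#1090;&#1086;&#1085;&#1080;&#1085;&#1072;
&#1050;&#1086;&#1083;&#1086;&#1082;&#1086;&#1083;&#1086;&#1074;&#1072;</a>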
which would lead to a document written in Russian in koi8-r

Well, this would not be appropriate given, as you said, that Antonina's
index page is in English, not Russian. (Other parts of her site are in
Russian, but not the specific file I'm linking to.)
Note that the document at
http://www.cs.toronto.edu/~kol/
uses an incorrect doctype declaration.

Unfortunately, I don't have any control over other people's HTML coding
skills (or lack thereof). If this error really irks you, you can e-mail her
yourself. :)

Regards,
Tristan
 
