XHTML and how to do it right

Andreas Prilop

Thomas Mlynarczyk said:
So the browser does not "rewrite" the received header "on the fly" upon
parsing the meta tags but rather *might* choose to use the meta info if it
can't find anything useful in the header?

The browser does not rewrite anything. The original idea was that the

It usually does.
 
Thomas Mlynarczyk

Thus spoke Andreas Prilop:
The browser does not rewrite anything. The original idea was that the

It usually does.

So the best approach is to have the charset parameter both in the meta tag
and the http header?
 
Andreas Prilop

Thomas Mlynarczyk said:
So the best approach is to have the charset parameter both in the meta tag
and the http header?

Grmpf, no.
As explained here many times: Give to the HTTP header what belongs to the HTTP header.
 
Mark Parnell

Thus spoke Andreas Prilop:

So the best approach is to have the charset parameter both in the meta tag
and the http header?

No, the browser will only use what is in the meta element if the server
doesn't send anything. If the server sends a charset then the meta
should be ignored. But it could cause problems if the meta charset
happened to be different to the http charset.

Having said that, there *are* arguments for specifying it in both - e.g.
for local testing (if you don't have a server running locally) or for
validation via upload.
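
A minimal Python sketch of the fallback order described above, assuming the only two inputs are the HTTP Content-Type value and the content attribute of a meta http-equiv element. The function names are hypothetical, and real browsers do more than this (byte-order marks, heuristics, user overrides).

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None if absent.
    return msg.get_content_charset()


def effective_charset(http_content_type, meta_content):
    """The HTTP header wins; the meta value is only a fallback."""
    return (charset_from_content_type(http_content_type)
            or charset_from_content_type(meta_content))


# Server sends a charset: the (conflicting) meta value is ignored.
print(effective_charset("text/html; charset=ISO-8859-1",
                        "text/html; charset=utf-8"))
# Server sends none (or the file is opened locally): the meta value is used.
print(effective_charset("text/html", "text/html; charset=utf-8"))
```

The second call is the local-testing and upload case: with no charset from a server, the meta value is all there is to go on.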
 
Mark Parnell

The doctype declaration used
to serve only a validation purpose a few years ago; now, many browsers
(IE6) use it to define a parsing mode and a rendering mode for the document.

I hope you're not saying this is a good thing. Doctype sniffing is an
Evil Thing[TM], and should be condemned to the depths it came from.
 
Jukka K. Korpela

Thomas Mlynarczyk said:
The more I think about it, this may be indeed the best solution.

Much time has been wasted by thinking otherwise. If you need to ask how
to use XHTML 1.0, it's surely not time for you to move to it. It's
especially pointless to convert old documents into XHTML just for the
sake of conversion.

Still, isn't XHTML supposed to be "the future" of the WWW?

HTML is the past, present and future of the WWW, as the simple solid
basis of hypertext markup. Putting the "X" in front of it, and
transmogrifying the irrelevant details of syntax, is just XML
marketing. (Have they already confessed it was a joke? Right, I think
so too. They can't afford to admit it now.)

So wouldn't it be a good idea to make use of XML, while somehow
providing a "graceful degradation" for older browsers (similar to
using CSS which can be ignored by old browsers)?

No. If you use XHTML 1.0, you mostly gain nothing but lose some time
and effort that could be used better. If you "use XHTML 1.0" in the
sense of writing an XML document containing tags from the XHTML
"namespace", then a) you are _not_ writing XHTML 1.0 documents and
b) you are writing documents that fail to work on leading present-day
browsers when served correctly.

So it would be a quite high price for the comfort of writing stuff like
<person>...</person> instead of <span class="person">...</span>.
 
Jukka K. Korpela

DU said:
Apparently (and I could be quite wrong here),
the charset in the meta element can be ignored or can be honored
depending on how the web server was configured.

It depends on the _browser_, not the server. Initially it was designed
to be read and processed by servers, but this never happened (except in
rare cases); instead, browsers started reading and applying it.

But by XHTML rules, the meta element must be overridden, if it
conflicts with the XML declaration ("prolog"). It's confusing:
http://www.w3.org/TR/html/#C_9
The logical interpretation is that if there is no XML declaration,
then the XML default encoding="utf-8" applies, no matter what any meta
tags might try to say. On the other hand, this is an area where we can
see confused browsers too. If I serve an XHTML 1.0 document to Opera,
for example, as text/html, it ignores the XML declaration. Apparently
it thinks that serving XHTML as text/html is not kosher and mishandles
it as "tag soup".
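
The "logical interpretation" above can be sketched in Python as follows. This is only an illustration under stated assumptions: the document is available as bytes in an ASCII-compatible encoding, byte-order marks and UTF-16 are ignored, and the function name is hypothetical. It models the XML rule as described in this post, not what any particular browser does with text/html.

```python
import re

# Matches an XML declaration at the very start of the document and
# captures the value of its encoding pseudo-attribute, if there is one.
XML_DECL_ENCODING = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def xml_declared_encoding(document_bytes):
    """Return the declared encoding, or "utf-8" when none is declared."""
    match = XML_DECL_ENCODING.match(document_bytes)
    if match:
        return match.group(1).decode("ascii").lower()
    # No declaration, or a declaration without an encoding: the default
    # applies, whatever a meta element further down may say.
    return "utf-8"


print(xml_declared_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><html/>'))
print(xml_declared_encoding(b'<html><meta content="text/html; charset=koi8-r"/></html>'))
```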

The practical conclusion is that especially when you play with multiple
encodings, you shouldn't play the XHTML game at the same time.

Specifying the encoding in a <meta> tag works tolerably well _as long
as_ it does not conflict with information sent otherwise, in HTTP
headers or in XML declaration.

At least, the webserver could be (or should be?) configured in a
way that it could serve the proper character encoding when meeting
a link coded in this example/manner in, say, an English document:

<a href="path/RussianFilename.html" charset="koi8-r"
hreflang="ru">Russian exposition in Moscow</a>

No, the charset attribute is just informative, intended to help the
browser decide what to do (e.g., to inform the user that the link he
wants to follow is _probably_ in an encoding that the browser has some
difficulties with). If the document is available, under the same URL,
in different encodings via content negotiation mechanism, then this
takes place between the browser and the server - I don't see how a
charset attribute in HTML could help here.

browser should notify in advance the webserver to serve that
document in iso-8859-2 character encoding.

No, that's not the idea. What the browser prefers depends on the
browser, not the link markup.
 
Thomas Mlynarczyk

Thus spoke Jukka K. Korpela:
But by XHTML rules, the meta element must be overridden, if it
conflicts with the XML declaration ("prolog"). It's confusing:
http://www.w3.org/TR/html/#C_9
The logical interpretation is that if there is no XML declaration,

...then it could neither override anything nor take precedence over
anything, because something which is not there cannot do anything. *That*
would seem logical to me. But...

then the XML default encoding="utf-8" applies, no matter what any meta
tags might try to say.

In other words: If you want to change the charset to something other than
UTF-8, you can do so only via the HTTP header or the XML declaration, and the
meta http-equiv can thus be considered deprecated and useless?

Specifying the encoding in a <meta> tag works tolerably well _as long
as_ it does not conflict with information sent otherwise, in HTTP
headers or in XML declaration.

And "no XML declaration" === "XML declaration specifying UTF-8"?
 
Thomas Mlynarczyk

Thus spoke Jukka K. Korpela:
Much time has been wasted by thinking otherwise. If you need to ask
how to use XHTML 1.0, it's surely not time for you to move to it. It's
especially pointless to convert old documents into XHTML just for the
sake of conversion.

Alright, then be it HTML4.01 Strict and bye-bye to all my XML ambitions...
:-(
 
Thomas Mlynarczyk

Thus spoke Mark Parnell:
No, the browser will only use what is in the meta element if the
server doesn't send anything.

In which case the meta element would be necessary.

If the server sends a charset then the
meta should be ignored.

So it would do no harm if it was present?

But it could cause problems if the meta
charset happened to be different to the http charset.

I promise that will not happen.

Having said that, there *are* arguments for specifying it in both -
e.g. for local testing (if you don't have a server running locally)
or for validation via upload.

Or if someone saves my page locally?
 
Thomas Mlynarczyk

Thus spoke Andreas Prilop:
As explained here many times: Give to the HTTP header what belongs to the HTTP header.

But that's what I would be doing: Charset specified in HTTP header. And
simply repeated in the meta element.
 
Thomas Mlynarczyk

Thus spoke Mark Parnell:
I hope you're not saying this is a good thing. Doctype sniffing is an
Evil Thing[TM], and should be condemned to the depths it came from.

But as long as it is there, the best approach would be to force "all"
browsers into standards-compliant mode. By the way, why do browsers like Mozilla
have a "quirks mode"? In what way does it differ from their standards-compliant
mode?
 
Jukka K. Korpela

DU said:
I do not include lang="de" here; if one day I want to convert the
file to XHTML 1.1, then I'm closer/readier that way.

Why would anyone convert anything to XHTML 1.1, which is just an
exercise in futility, except that it confuses some people, so that it's
worse than futile? And I'm afraid you are farther from actual use of
language information - those few programs that recognize language
markup may well know lang but not xml:lang.

I include a few other <meta>'s myself:

There's such a thing as too much markup.

<meta http-equiv="Content-Language" content="de" />

Pointless. There's a well-defined mechanism for declaring the document
language in HTML itself. Any software that does not use it will not be
a success in language support.

<meta http-equiv="Content-Style-Type" content="text/css" />

That hackery is what the specification tells you to use if you have
style attributes. But it's completely useless, since browsers default
to text/css anyway. Does any browser actually use such a tag for
anything?

<meta http-equiv="Content-Script-Type" content="text/javascript" />

Does any browser actually use such a tag for anything? Besides, if you
think you are doing the Right Thing, think again. There is no
registered Internet media type text/javascript. And if you use some
x-type, browsers may think you're using some fancy scripting language
and ignore your scripts.

<meta http-equiv="date" content="2004-01-28T09:54:03+08:00" />

Would that really make sense? Assuming you could really make your
documents contain the _correct_ timestamp, namely the one corresponding
to the moment of time of sending the document, what good would it do?
If the server sends an actual Date header, yours will be ignored. If
not, the browser will probably not use the Date for anything. (And you
may have some difficulties, since your server misbehaves. Except for a
few specific exceptions, the server MUST include a Date header. If a
server violates this, can you trust it to send your HTML document,
including your meta tag, correctly?)

But if it does, things get really weird, since you have specified an
"HTTP equivalent" for a header name that has actually been defined in
the HTTP protocol, and you've done it _wrong_. Your time stamp violates
a MUST requirement in the protocol.

<meta http-equiv="imagetoolbar" content="no" />
This one is to avoid the annoyance of the MSIE image toolbar.

I have difficulties in deciding which is worse, the image toolbar or
authors' attempts to fight against it. If it disturbs me, when looking
at an image, that a toolbar appears, I will move the mouse. But if I
really want to use it, I have no simple cure if an author has removed
the toolbar.
 
Toby A Inkster

Jukka said:
Would that really make sense?

DU would probably do better with:

<meta name="DC.date.created"
content="2004-01-28T09:54:03+08:00"
scheme="W3CDTF"
/>

There are other DC.date refinements too, such as DC.date.modified.
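
To make the difference between the two date notations concrete: the W3CDTF timestamp suits a DC.date element, while the HTTP Date header uses a fixed GMT format of its own. The Python sketch below converts one to the other. The function name is hypothetical, it assumes Python 3.7 or later, and it assumes the input carries an explicit UTC offset, as the example in this thread does.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def w3cdtf_to_http_date(stamp):
    """Convert an offset-bearing W3CDTF timestamp to HTTP-date form."""
    moment = datetime.fromisoformat(stamp)
    # HTTP dates are always expressed in GMT, so normalise to UTC first.
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


print(w3cdtf_to_http_date("2004-01-28T09:54:03+08:00"))
# prints: Wed, 28 Jan 2004 01:54:03 GMT
```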
 
Andy Dingley

So it would be a quite high price for the comfort of writing stuff like
<person>...</person> instead of <span class="person">...</span>.

Why would one do that anyway?

<person> markup just isn't much use. We've had the "tag soup web"
(plain text with colours), we've had the "HTML web" (compliant
rendering standards, but no semantics).

Now we're looking at semantic webs and how to mark-up the concept of
"person" in a way that can also co-exist with the rendering of pages
by browsers uninterested in semantics. There are two schools of thought on
this: XML and non-XML. The "XML web" would use <person>.

However it's abundantly clear (to those who've tried it, at least)
that XML has limitations. It's an admittedly powerful channel, but
it's a narrow channel between well-defined endpoints. You can't expect
to build a semantically complex "web" in XML and then have it
understood by any passing spider. XML conflates structure and meaning,
which imposes such restriction on the available expression of meaning
and the means to convey its interpretation that it limits XML's
overall capabilities for building a true "semantic web".

To build anything useful and global in scope, we're going to need more
than XML alone. It would be excessive to claim that this _must_ be
RDF, but it's certainly something with a comparable data model. We
must work with it at this level, not at the serialisation. Seen as
triples, <person> and <span class="person"> are equivalent. With
half-decent selectors, they're even equivalent at the CSS level.
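
As a rough illustration of "equivalent when seen as triples", the Python sketch below reduces both spellings to the same set of (text, predicate, kind) tuples. Everything in it is an assumption made for the example: the "is-a" predicate, the function name, and the rule that span and div take their kind from the class attribute. It is not RDF and not any existing vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule: these elements carry no meaning of their own,
# so their class attribute is read instead of the element name.
GENERIC_ELEMENTS = {"span", "div"}


def as_triples(fragment):
    """Return a set of (text, "is-a", kind) triples for a small fragment."""
    triples = set()
    for element in ET.fromstring(fragment).iter():
        text = (element.text or "").strip()
        if element.tag in GENERIC_ELEMENTS:
            kinds = element.get("class", "").split()
        else:
            kinds = [element.tag]
        for kind in kinds:
            triples.add((text, "is-a", kind))
    return triples


xml_style = as_triples("<person>Ada</person>")
html_style = as_triples('<span class="person">Ada</span>')
print(xml_style == html_style)
```

Once both forms are reduced to the same data model, which serialisation carried them stops mattering.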

The differences though are that a pure-XML implementation via <person>
has already been demonstrated to not have the wide-scale workability
that solutions (such as RDF) taking the form of <span class="person">
have done. We _need_ more than simple element names can deliver.

At the level of the single element of course, this is hard to
demonstrate, but I think the "XML vs post-XML" techniques are now well
enough known to not require a long post at this time.
 
DU

Jukka said:
It depends on the _browser_, not the server. Initially it was designed
to be read and processed by servers, but this never happened (except in
rare cases); instead, browsers started reading and applying it.

But by XHTML rules, the meta element must be overridden, if it
conflicts with the XML declaration ("prolog"). It's confusing:
http://www.w3.org/TR/html/#C_9
The logical interpretation is that if there is no XML declaration,
then the XML default encoding="utf-8" applies, no matter what any meta
tags might try to say. On the other hand, this is an area where we can
see confused browsers too. If I serve an XHTML 1.0 document to Opera,
for example, as text/html, it ignores the XML declaration. Apparently
it thinks that serving XHTML as text/html is not kosher and mishandles
it as "tag soup".

The practical conclusion is that especially when you play with multiple
encodings, you shouldn't play the XHTML game at the same time.

Specifying the encoding in a <meta> tag works tolerably well _as long
as_ it does not conflict with information sent otherwise, in HTTP
headers or in XML declaration.

No, the charset attribute is just informative, intended to help the
browser decide what to do (e.g., to inform the user that the link he
wants to follow is _probably_ in an encoding that the browser has some
difficulties with). If the document is available, under the same URL,
in different encodings via content negotiation mechanism, then this
takes place between the browser and the server - I don't see how a
charset attribute in HTML could help here.

No, that's not the idea. What the browser prefers depends on the
browser, not the link markup.


Jukka, could you do me a favor? Just go over a few pages at/from

http://www.geocities.com/Area51/Realm/8655/ENGLISH/Main_ENG.htm

and tell me if there is anything I should change or modify regarding
language/charset/meta issues (only the language/internationalization
stuff; I'll handle the rest, as I already have a long to-do list for the
site). I'm pretty sure (I want to believe this) I have done all I could
do to internationalize that site to the best of my abilities. I would
appreciate and definitely respect your opinions on this.
I'm in the process of updating that whole site and moving it elsewhere
(without ad banners).

DU
 
Steve Pugh

DU said:
Jukka, could you do me a favor?

Hope you don't mind other people butting in? ;-)

Just go over a few pages at/from

http://www.geocities.com/Area51/Realm/8655/ENGLISH/Main_ENG.htm

and tell me if there is anything I should change, modify regarding
language/charset/meta issues

You've made a classic error of associating flags with languages. A
flag represents a country. Many countries speak more than one
language, and most languages are spoken in more than one country.

As you're on Geocities at the moment I guess that you have no option
to use Content Negotiation to automatically take the user to their
preferred language option. Something to look into when you move to a
new server.
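
For a rough idea of what such negotiation involves on the server side, here is a simplified Python sketch that picks a language from the browser's Accept-Language header. The function names and the fallback to the primary subtag are assumptions for illustration, the set of available languages is assumed to be lowercase, and a real server's negotiation (Apache's, for example) follows more detailed rules.

```python
def parse_accept_language(header):
    """Return (language-range, q) pairs, highest preference first."""
    prefs = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0  # a range without an explicit q-value counts as q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((tag, q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_language(header, available, default):
    """Pick the first acceptable language that the site actually offers."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag == "*":
            return default
        # Try the full range first ("de-ch"), then its primary subtag ("de").
        for candidate in (tag, tag.split("-")[0]):
            if candidate in available:
                return candidate
    return default


print(choose_language("de-CH, fr;q=0.8, en;q=0.5", {"en", "fr", "de"}, "en"))
# prints: de
```

Note that the choice is driven by languages, not countries: a reader sending de-CH gets the German version without any flag being involved.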

I'll now let the experts have their say.

Steve
 
DU

Steve said:
Hope you don't mind other people butting in? ;-)

You've made a classic error of associating flags with languages. A
flag represents a country. Many countries speak more than one
language, and most languages are spoken in more than one country.

As you're on Geocities at the moment I guess that you have no option
to use Content Negotiation to automatically take the user to their
preferred language option. Something to look into when you move to a
new server.

I'll now let the experts have their say.

Steve

True. I need and *want* to change that. But, as a modest defense of
how things are right now, I do edit the title attribute of those flags. The
sitemap also "translates" the parts' titles on a mouseover of the flag.
Not ideal but not entirely stupid. Anyway, I want to remove all flags
and replace them with 3-letter language codes; I also want to remove the
DHTML sitemap entirely.
The top headline of this page

http://www.texturizer.net/firebird/index.html

is an excellent model on which I'm trying to base the switch to
translated sub-sites for my site.

DU
 
