very basic question about xhtml

John Salerno

Ok, I understand that XHTML and HTML are basically the same, but that
XHTML requires a stricter structure, which it inherits from XML. But
what is meant when someone says that XHTML *is* XML? I know that HTML is
an SGML language, and XHTML is an XML language, but what does that
really mean? It seems to suggest that I can create my own tags in XHTML
(since you can in XML), but I doubt that's correct. Is it safe to say
that XHTML is a completely new language based on XML that just happens
to have all the same-named tags as HTML?

Thanks.
 
Jukka K. Korpela

John Salerno said:
Ok, I understand that XHTML and HTML are basically the same,

They aren't. The first thing to know about XHTML is that it is futile or
worse at present as a document delivery format on the WWW.

but that XHTML requires a stricter structure,

It doesn't.

which it inherits from XML.

It doesn't. XML is a more restricted metalanguage than SGML, so many
constructs that are valid XHTML are invalid HTML. For example,

But what is meant when someone says that XHTML *is* XML?

Something pointless.

I know that HTML is an SGML language,

Formally only.

It seems to suggest that I can create my own tags in XHTML
(since you can in XML),

You can create your own tags in SGML just as well as in XML. Actually better.
Of course, HTML specifications are closed, so inventing your own tags takes
you outside HTML, no matter what the metalanguage is.

Is it safe to say
that XHTML is a completely new language based on XML that just happens
to have all the same-named tags as HTML?

No. There's nothing safe in XHTML. People just hurt themselves when they
start playing with XHTML.
 
David Dorward

John said:
Ok, I understand that XHTML and HTML are basically the same, but that
XHTML requires a stricter structure, which it inherits from XML.

Stricter is a bit debatable. XML doesn't have as many places that you are
allowed to ignore a rule (e.g. In HTML the end tag for a p element is
optional), but nor is the DTD as expressive (so a validator can't spot the

But what is meant when someone says that XHTML *is* XML? I know that HTML
is an SGML language, and XHTML is an XML language, but what does that
really mean?

XML defines some basic rules. Add some more rules and you have XHTML. Add
some different rules and you have RSS. Different rules again and you get
SVG.

It seems to suggest that I can create my own tags in XHTML

You can't.

(since you can in XML)

You can create your own XML based markup language with tags of your choice.
(Although it isn't of much practical value on the WWW).

Is it safe to say that XHTML is a completely new language based on XML
that just happens to have all the same-named tags as HTML?

Pretty much.
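
The point that XHTML, RSS and SVG are each "XML plus some more rules" can be sketched in a few lines. This is only an illustration, assuming Python's standard xml.etree.ElementTree parser; the three sample documents are made up and deliberately tiny. One generic XML parser reads all three, because the shared syntax comes from XML and only the vocabulary differs.

```python
import xml.etree.ElementTree as ET

# Three made-up documents in three different XML vocabularies.
DOCUMENTS = {
    "xhtml": (
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        "<head><title>Demo</title></head><body><p>Hi</p></body></html>"
    ),
    "rss": '<rss version="2.0"><channel><title>Demo</title></channel></rss>',
    "svg": '<svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg>',
}


def root_tag(source):
    """Parse with the generic XML parser and return the root element's tag."""
    return ET.fromstring(source).tag


# ElementTree reports namespaced names as "{namespace-uri}localname".
ROOT_TAGS = {name: root_tag(source) for name, source in DOCUMENTS.items()}

for name, tag in ROOT_TAGS.items():
    print(name, "->", tag)
```

The parser knows nothing about paragraphs, channels or circles. Deciding which elements are allowed, and what they mean, is the job of the XHTML, RSS and SVG specifications.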
 
cwdjrxyz

John said:
Ok, I understand that XHTML and HTML are basically the same, but that
XHTML requires a stricter structure, which it inherits from XML. But
what is meant when someone says that XHTML *is* XML? I know that HTML is
an SGML language, and XHTML is an XML language, but what does that
really mean? It seems to suggest that I can create my own tags in XHTML
(since you can in XML), but I doubt that's correct. Is it safe to say
that XHTML is a completely new language based on XML that just happens
to have all the same-named tags as HTML?

XHTML is a language in transition. The W3C is working on newer versions
to help make XHTML be XML pure, but this will require a new generation
of browsers. The main reason for XHTML is an attempt to establish XML
purity for html, since there are now many other computing devices than
PCs, and XML is the desired common language for exchange of
information. However this transition is extremely difficult because
HTML conventions used on PCs have been around for so long. It is much
like forcing everyone to start driving on the right side of the road
when they have been driving on the left side of the road all of their
life.

There are many XML languages, and not all of these support the same
things. For example SMIL2 is a special XML language for media
presentations. I doubt if it would allow you to create your own tags.
Likewise, the XML you may now use as an island on a web page likely
will not support many SMIL tags, and forcing it to do so would be very
difficult, if not impossible.

You must keep in mind that despite perfect XHTML 1.1 code that you
might write, it will only be served as HTML unless you associate the
mime type application/xhtml+xml with it on your server. In that case
IE6 cannot view the page since it does not support the mentioned mime
type. You then must write separate pages for IE6 and most other recent
browsers, or you must use a PHP include, or something else of the sort,
at the very top of the page at the header level. If the header
browser/server exchange says application/xhtml+xml support is possible,
then the code above the head tag of the page is written as XHTML 1.1,
for example. Else the information is written as HTML 4.01 strict for
example. You must also use a regular expression in the PHP code to
convert <br /> to <br>, etc., for HTML 4.01 strict. You must then view the
page in IE6, for example, copy its source, paste it into the text box of
the W3C validator, and validate it to make certain that the automatic PHP
conversion from XHTML 1.1 to HTML 4.01 strict has worked properly. This is
not as difficult as it might appear after you
have done it once as all of the PHP can just be in an include file
which you can call as an external file. If you write pages with a lot
of script, there are many special considerations for true XHTML, and in
some cases you are forced to replace the browser side script with
server side PHP.
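
The negotiation described above can be sketched roughly as follows. The post uses a PHP include; this sketch uses Python only to show the logic, and the names accepts_xhtml and build_response, the sample content, and the lang="en" attributes are assumptions made for illustration. The regular expression is deliberately naive.

```python
import re

XHTML_MIME = "application/xhtml+xml"

XHTML_PROLOGUE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
    '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">\n'
)
HTML_PROLOGUE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    '<html lang="en">\n'
)

# Naive pattern for the " />" that ends an empty-element tag such as <br />.
# It would also rewrite "/>" appearing inside text or attribute values.
EMPTY_TAG_END = re.compile(r"\s*/>")


def accepts_xhtml(accept_header):
    """True only if the Accept header names application/xhtml+xml with q > 0.

    A bare "*/*" is not treated as acceptance, which errs on the safe side.
    """
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        if fields[0].lower() != XHTML_MIME:
            continue
        quality = 1.0
        for field in fields[1:]:
            if field.lower().startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0
        return quality > 0
    return False


def build_response(accept_header, content):
    """Return (content_type, page) for a fragment holding <head> and <body>."""
    if accepts_xhtml(accept_header):
        return XHTML_MIME, XHTML_PROLOGUE + content + "\n</html>"
    html_content = EMPTY_TAG_END.sub(">", content)
    return "text/html", HTML_PROLOGUE + html_content + "\n</html>"


CONTENT = "<head><title>Demo</title></head>\n<body><p>one<br />two</p></body>"

for header in ("text/html,application/xhtml+xml,*/*;q=0.8", "text/html, */*"):
    content_type, page = build_response(header, CONTENT)
    print(content_type)
```

The generated HTML 4.01 output still needs to be checked with a validator, as described above, because a regular expression cannot guarantee a correct conversion.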

The moral of the story is that to write in XHTML and then just serve it
as HTML, as many do, is pointless, and you are better off just using
HTML 4.01 strict.
 
Toby Inkster

John said:
Is it safe to say that XHTML is a completely new language based on XML
that just happens to have all the same-named tags as HTML?

More or less -- though the way you say it sounds like this was an
accident. It was deliberately designed to be as close to HTML 4.01
as possible without violating any of the rules of XML.

I know that HTML is an SGML language, and XHTML is an XML language

If you want to really blow your mind, XML is a subset of SGML.
 
Andy Dingley

Ok, I understand that XHTML and HTML are basically the same, but that
XHTML requires a stricter structure, which it inherits from XML. But
what is meant when someone says that XHTML *is* XML?

XML is a lower-level protocol for defining things than XHTML is.

XML defines syntactic aspects of the language, such as there being tags,
tags being delimited by "<" and ">" characters, tag names being
case-sensitive, elements being represented by paired or empty tags etc.

What XML doesn't define is the set of tags that can be used, and how
they may be combined. This is defined by XHTML. The overall language
needs an understanding of both XML and XHTML to specify it, and an
understanding of SGML and HTML _as_well_ to really use it wisely.
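
The split between the two layers can be illustrated with a small sketch, assuming Python's standard xml.etree.ElementTree parser. The KNOWN_XHTML set is a hypothetical, deliberately tiny subset of element names, not the real XHTML DTD, and the sample fragments omit namespaces for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny subset of XHTML element names.
KNOWN_XHTML = {"html", "head", "title", "body", "p", "em", "br"}


def is_well_formed(source):
    """XML layer: does the text obey XML's syntax rules at all?"""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


def unknown_elements(source):
    """Vocabulary layer: sorted element names outside the known subset."""
    root = ET.fromstring(source)
    return sorted({element.tag for element in root.iter()} - KNOWN_XHTML)


GOOD = "<body><p>Hi <em>there</em></p></body>"
INVENTED_TAG = "<body><p>Hi <sparkle>there</sparkle></p></body>"
CASE_MISMATCH = "<body><P>Hi</p></body>"

print(is_well_formed(GOOD), unknown_elements(GOOD))
print(is_well_formed(INVENTED_TAG), unknown_elements(INVENTED_TAG))
print(is_well_formed(CASE_MISMATCH))
```

The invented tag passes the XML check and fails only the vocabulary check, while the mismatched case in <P>...</p> fails at the XML level because XML tag names are case-sensitive.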

HTML pre-dates XML and does not use it. HTML is based on SGML in much
the same way that XHTML is built from XML, but HTML actually uses a
subtly simplified SGML that is no longer strictly SGML compliant. For
this reason the HTML specification blurs the distinction a little
between them and must describe things that are part of HTML, even though
a purer form might just have been able to leave those as covered already
by the SGML spec. In general though, ignore this - for most useful
purposes, HTML is built on SGML in just the same way that XHTML is built
on XML.

The current web is not ready for XHTML as an XML language. There is a
kludge by which XHTML may be treated as a HTML language and used by
currently existing browsers (Appendix C of the XHTML 1.0 spec). This
works and is useful, although many hereabouts will claim that it
doesn't.

XHTML is also extremely useful as a pure XML language within your own
tools, such as content-management systems (CMS). It is _much_ simpler to
build these using XML than to use SGML. It is also easy to turn their
output into HTML for use on the public web.

It is not possible to extend XHTML. XHTML is already defined by its
specification and DTD. It is possible to extend "XML documents on the
web that are based on XHTML", which is in all practical terms the same
thing you are asking for. However (a terminology distinction) you are
then going to make a document _composed_of_ XHTML and some other XML
schema, you are not extending XHTML itself. An easy technique for this
is XML namespacing.
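
A minimal sketch of such a composed document, assuming Python's standard xml.etree.ElementTree parser. The recipe namespace URI and its ingredient element are hypothetical, invented purely for illustration; only the XHTML namespace URI is real.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
RECIPE_NS = "http://example.org/ns/recipe"  # hypothetical vocabulary

# One document composed of XHTML elements plus elements from another schema.
DOCUMENT = f"""<html xmlns="{XHTML_NS}" xmlns:r="{RECIPE_NS}">
  <head><title>Pancakes</title></head>
  <body>
    <p>Ingredients:</p>
    <r:ingredient amount="200g">flour</r:ingredient>
    <r:ingredient amount="2">eggs</r:ingredient>
  </body>
</html>"""

root = ET.fromstring(DOCUMENT)
prefixes = {"x": XHTML_NS, "r": RECIPE_NS}

# The namespace tells the two vocabularies apart inside one tree.
paragraphs = [p.text for p in root.findall(".//x:p", prefixes)]
ingredients = [
    (item.get("amount"), item.text)
    for item in root.findall(".//r:ingredient", prefixes)
]

print(paragraphs)
print(ingredients)
```

XHTML itself is unchanged here: the extra elements live in their own namespace, so an XML tool can pick out either vocabulary without confusing the two.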

However (the bad news) this is an XML technique and so only works with
XHTML documents that are XML documents, not the Appendix C XHTML non-XML
documents we've already mentioned as being the only ones that are yet
ready for use on the web. You can still use these techniques, but it's
not simple, much of your audience may have problems with them, and
compatibility issues are significant.
 
Andy Dingley

The W3C is working on newer versions
to help make XHTML be XML pure,

In what way was the first XHTML 1.0 version (trans, strict or frameset)
not already "XML pure"? Your comment is meaningless.
 
John Salerno

cwdjrxyz said:
The moral of the story is that to write in XHTML and then just serve it
as HTML, as many do is pointless, and you are better off just using
HTML 4.01 strict.

Wow, I didn't realize that so many people were actually against using
XHTML in place of HTML. I would think that since it is the newest
technology, that it would be desired over HTML. I know it's been around
for 5 years, which is why it surprises me that browsers still don't
support it well. Now I'm confused all over again. I thought the decision
to use XHTML was a no-brainer, but now I'm reading a lot of posts
(including in this thread!) that seem to suggest XHTML isn't a good
thing right now...
 
cwdjrxyz

John said:
Wow, I didn't realize that so many people were actually against using
XHTML in place of HTML. I would think that since it is the newest
technology, that it would be desired over HTML. I know it's been around
for 5 years, which is why it surprises me that browsers still don't
support it well. Now I'm confused all over again. I thought the decision
to use XHTML was a no-brainer, but now I'm reading a lot of posts
(including in this thread!) that seem to suggest XHTML isn't a good
thing right now...

You are quite right that many are against using XHTML at all. I am not
one of these and now write and serve most new pages as true xhtml 1.1
to browsers that will accept it and as html 4.01 strict to IE6 and
other browsers that do not say they will accept the mime type
application/xhtml+xml in the header exchange. There are a few lesser
used browsers that will not tell you what they can use in the header
exchange. For that case I err on the safe side and serve html 4.01
strict to avoid possible lock out of the browser. Recent Safaris appear
to fall into this group, although they really will accept proper XHTML
- they just do not tell you they will and browser name and version
detection is not very safe in these days when browsers often spoof one
another. I am very much against serving XHTML as text/html, as
many are doing, often without being aware of it. This often does no
practical harm, but to do so is just a waste of time because you are
really just using HTML. Also, true XHTML is not as forgiving as HTML. If
you view a page on a true XHTML-aware browser such as Opera or one of
the Mozilla family, the page is parsed as XML. Even the smallest XML
error, such as an unclosed tag, often causes the page not to show and
gives you an XML parse error message instead. We have all seen HTML
pages full of such small errors that still display fairly well, at
least on some browsers.
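
The difference in forgiveness can be sketched with Python's standard library, assuming xml.etree.ElementTree as a stand-in for a browser's XML parser and html.parser as a stand-in for lenient HTML handling. Neither is a browser, and html.parser is only a tokenizer, so this shows the principle rather than real browser behaviour. The sample page is made up.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Two <p> elements are never closed: common in HTML, fatal in XML.
PAGE = "<html><body><p>First paragraph<p>Second paragraph</body></html>"


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def parse_as_xml(source):
    """Return "ok", or the parser's message if the text is not well-formed."""
    try:
        ET.fromstring(source)
    except ET.ParseError as error:
        return f"parse error: {error}"
    return "ok"


def parse_as_html(source):
    """Return the start tags seen; unclosed elements do not stop parsing."""
    collector = TagCollector()
    collector.feed(source)
    collector.close()
    return collector.start_tags


print(parse_as_xml(PAGE))
print(parse_as_html(PAGE))
```

The XML parser stops at the first mismatch and reports an error, while the HTML tokenizer carries on through the whole page.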
 
John Salerno

cwdjrxyz said:
You are quite right that many are against using XHTML at all. I am not
one of these and now write and serve most new pages as true xhtml 1.1
to browsers that will accept it and as html 4.01 strict to IE6 and
other browsers that do not say they will accept the mime type
application/xhtml+xml in the header exchange.

But is this an issue if you use XHTML 1.0 instead?
 
John Salerno

cwdjrxyz said:
You are quite right that many are against using XHTML at all.

But is it much of an issue if it's for simple sites? It seems like the
only real difference between HTML and XHTML is the " />" closing tag
(assuming you write tags in lowercase and properly nest everything). Is
the big debate more about doing a lot of advanced things in HTML vs. XHTML?
 
cwdjrxyz

John said:
But is this an issue if you use XHTML 1.0 instead?

So far as I know you are not using true XHTML unless you serve as
application/xhtml+xml, even for the most loose form which is XHTML 1.0
transitional. However, as I said in a previous post, many XHTML pages
served incorrectly as text/html often will work. The only thing
I can see you might gain by serving XHTML as HTML would be if you
intended to serve the pages correctly soon when you get a new host or
something of the sort. But this need be no great problem. There are
programs out there that will automatically convert a proper HTML 4.01
strict page to XHTML. Sometimes you have to touch the conversion up a
bit. But for not-so-simple pages loaded with script, etc you may have
much more work to do if you use any version of XHTML.

There are several who sometimes post in this group who are very anti
XHTML and who have long pages describing why. If any happen to read
this page, they likely will be more than happy to give you the URL of
one of their pages.
 
John Salerno

cwdjrxyz said:
So far as I know you are not using true XHTML unless you serve as
application/xhtml+xml, even for the most loose form which is XHTML 1.0
transitional.

Hmm, is there more to XHTML than just the small syntactic changes then?
The only differences I've read about so far are things like lowercase
tags, closing empty elements, etc. Is there anything wrong with writing
a website this way and calling it "XHTML"?
 
Toby Inkster

John said:
Wow, I didn't realize that so many people were actually against using
XHTML in place of HTML. I would think that since it is the newest
technology, that it would be desired over HTML. I know it's been around
for 5 years, which is why it surprises me that browsers still don't
support it well.

*Browsers* do. A certain component of the Microsoft Windows operating
system, which some people believe is a browser, doesn't. :)
 
David Dorward

John said:
But is it much of an issue if it's for simple sites? It seems like the
only real difference between HTML and XHTML is the " />" closing tag

There aren't many browsers that get HTML correct, but those that do will
display a ">" character after every self-closing tag in an XHTML document
served as text/html. (since in HTML <foo /> means the same as <foo>&gt;)
 
David Dorward

But is this an issue if you use XHTML 1.0 instead?

XHTML 1.0 has Appendix C which includes some rules to make it "compatible"
with HTML, and if you follow those rules then you are allowed to serve it as
text/html.

The main problems are that there is a lack of tools on the market for
testing Appendix C conformance, that it's too easy to do silly things like
comment out a style sheet (thanks to differences between XHTML and HTML),
and that Appendix C doesn't make the document HTML compatible, it makes it
compatible with HTML browsers which share certain common (but *not*
universal) bugs.
 
Jukka K. Korpela

John Salerno said:
Hmm, is there more to XHTML than just the small syntactic changes then?

Yes, there are several undocumented changes as well. They have been hidden
in the DTD, so you would need to be fluent in SGML and XML to know what
really happened.

On the other hand, it would not matter much. Browsers still mostly eat HTML
tag soup and don't care about syntax specifications.
 
John Salerno

David said:
There aren't many browsers that get HTML correct, but those that do will
display a ">" character after every self-closing tag in an XHTML document
served as text/html. (since in HTML <foo /> means the same as <foo>&gt;)

You mean if I use this new syntax to close an img or br tag, it will
show a > symbol?
 
Mark Parnell

John Salerno said:
You mean if I use this new syntax to close an img or br tag, it will
show a > symbol?

If it's HTML rather than XHTML, and the browser is behaving according to
the specs, yes.
 
