obscure problem using elementtree to make xhtml website


Lee

ElementTree (a Python XML library) will transform markup like

<tag boo="baa"></tag>

into

<tag boo="baa" />

which is a reasonable thing to do for xml (called minimization, I
think).
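A minimal sketch of the behaviour described above, assuming Python 3.4 or later. The `short_empty_elements` keyword did not exist in the ElementTree of the time, and it turns minimization off for every element, including ones like <br> that should stay minimized, so on its own it is a blunt fix.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("tag", boo="baa")

# Default XML serialization minimizes an element with no text and no children.
print(ET.tostring(elem, encoding="unicode"))
# <tag boo="baa" />

# Python 3.4+: short_empty_elements=False writes an explicit end tag instead.
print(ET.tostring(elem, encoding="unicode", short_empty_elements=False))
# <tag boo="baa"></tag>
```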

But this caused an obscure problem when I used it to create the xhtml
parts of my website, causing Internet Explorer to display nearly blank
pages. I explain the details at

http://lee-phillips.org/scripttag/

and am writing here as a heads-up to anyone who might be using a
workflow similar to mine: writing documents in xml and using python
and elementtree to transform those into xhtml webpages, and using the
standard kludge of serving them as text/html to IE, to get around the
latter's inability to handle xml. I can't be the only one (and I doubt
this problem is confined to elementtree).


Lee Phillips
 

David Smith

Lee said:
Elementtree (python xml parser) will transform markup like

<tag boo="baa"></tag>

into

<tag boo="baa" />

which is a reasonable thing to do for xml (called minimization, I
think).

But this caused an obscure problem when I used it to create the xhtml
parts of my website,
causing Internet Explorer to display nearly blank pages. I explain the
details at

http://lee-phillips.org/scripttag/

and am writing here as a heads-up to anyone who might be using a
workflow similar to mine: writing documents in xml and using python
and elementtree to transform those into xhtml webpages, and using the
standard kludge of serving them as text/html to IE, to get around the
latter's inability to handle xml. I can't be the only one (and I doubt
this problem is confined to elementtree).


Lee Phillips

It's not just ElementTree that does this. I've seen other libraries
(admittedly in other languages I won't mention here) transform empty
tags to the self-terminating form. A whitespace text node or comment
node in between *should* prevent that from happening. AFAIK, the only
tag in IE xhtml that really doesn't like to be reduced like that is the
<script> tag. Firefox seems to be fine with self-terminating <script />
tags. At any rate, I tend to put a comment node in between the begin
and end tags to prevent the reduction:

<script src=" ... " type="text/javascript"><!-- --></script>
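Building that padding in from the ElementTree side might look like the sketch below, assuming Python 3. A comment child (or the whitespace text node mentioned above) gives the element content, so the serializer keeps the end tag. `script_tag` and its `padding` switch are hypothetical names for illustration only.

```python
import xml.etree.ElementTree as ET


def script_tag(src, padding="comment"):
    """Build a <script> element that ElementTree will not minimize."""
    script = ET.Element("script", src=src, type="text/javascript")
    if padding == "comment":
        # A comment child containing one space serializes as <!-- -->.
        script.append(ET.Comment(" "))
    else:
        # Alternative: a single space of text content.
        script.text = " "
    return script


print(ET.tostring(script_tag("site.js"), encoding="unicode"))
# <script src="site.js" type="text/javascript"><!-- --></script>
print(ET.tostring(script_tag("site.js", padding="space"), encoding="unicode"))
# <script src="site.js" type="text/javascript"> </script>
```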

--David
 

Lee

I went with a space, but a comment is a better idea.

I only mention the <script> tag in my article, for brevity, but I had
the same problem with the <object> tag: basically any tag that can
have content in html you had better close the html way (<tag></tag>),
or IE will see it as unclosed and will not display the rest of the
page after the tag (or do something else unexpected). Not a bug in IE
(this time), which is correctly parsing the file as html.
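One way to catch this before a page goes live is to scan the generated markup for minimized tags whose element is not declared EMPTY in XHTML 1.0. This is a hypothetical checker (`unsafe_minimized_tags` is not part of any library). It assumes the markup came from ElementTree, which escapes `>` inside attribute values, so a simple regular expression is enough.

```python
import re

# Elements declared EMPTY in XHTML 1.0. Every other element can have content,
# so a minimized form such as <script ... /> is unsafe when served as text/html.
XHTML_EMPTY = {"area", "base", "basefont", "br", "col", "frame", "hr",
               "img", "input", "isindex", "link", "meta", "param"}

# A minimized tag "<name ... />". Assumes no literal ">" inside attribute values.
_MINIMIZED = re.compile(r"<([A-Za-z][\w:.-]*)\b[^>]*/>")


def unsafe_minimized_tags(markup):
    """Return names of minimized tags that an HTML parser will treat as unclosed."""
    return [m.group(1) for m in _MINIMIZED.finditer(markup)
            if m.group(1).split(":")[-1].lower() not in XHTML_EMPTY]


page = '<p><object data="clip.swf" /><br /><script src="site.js" /></p>'
print(unsafe_minimized_tags(page))
# ['object', 'script']
```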


Lee
 

Stefan Behnel

Lee said:
basically any tag that can
have content in html you had better close the html way (<tag></tag>),
or IE will see it as unclosed and will not display the rest of the
page after the tag (or do something else unexpected). Not a bug in IE
(this time), which is correctly parsing the file as html.

... which is obviously not the correct thing to do when it's XHTML.

Stefan
 

Rami Chowdhury

basically any tag that can
... which is obviously not the correct thing to do when it's XHTML.

Not correct, of course, but AFAIK it's a very common hack indeed.

If the goal is to produce XHTML that will work as text/html, have you
considered using one of the myriad templating libraries? IIRC a lot (if
not most) of them support "HTMLish" output for precisely that reason.
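ElementTree itself has a comparable HTML-style output mode. With `method="html"`, `tostring` closes every non-void element with an end tag and writes void elements such as <br> without a slash. A sketch assuming Python 3; note the result is no longer well-formed XML, so it only suits the copy served as text/html.

```python
import xml.etree.ElementTree as ET

body = ET.Element("body")
ET.SubElement(body, "script", src="site.js", type="text/javascript")
ET.SubElement(body, "br")

# "html" output: non-void elements always get an end tag, void ones get none.
print(ET.tostring(body, encoding="unicode", method="html"))
# <body><script src="site.js" type="text/javascript"></script><br></body>
```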
 

Stefan Behnel

Richard said:
It isn't though; it's HTML with a XHTML DOCTYPE

Not the page I looked at (i.e. the link provided by the OP). It clearly has
an XHTML namespace, so it's X(HT)ML, not HTML.

Stefan
 

Nobody

Not the page I look at (i.e. the link provided by the OP). It clearly has
an XHTML namespace, so it's X(HT)ML, not HTML.

It depends upon your User-Agent header.

By default, it returns a Content-Type of application/xhtml+xml, so it
should be parsed as XML, i.e. <script /> should be treated as
<script></script>.

But if the User-Agent header indicates MSIE, it returns a Content-Type of
text/html, which should be parsed as HTML, where <script /> won't work.
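The server-side switch described above might look like this minimal sketch. `choose_content_type` is a hypothetical helper, and matching on the substring "MSIE" is an assumption about how the site sniffs the header; checking whether the Accept header lists application/xhtml+xml is a common alternative.

```python
def choose_content_type(user_agent):
    """Return the Content-Type to serve the same XHTML file with.

    Hypothetical helper: Internet Explorer (identified by "MSIE" in its
    User-Agent) gets text/html and parses the page as HTML; every other
    client gets the XML media type and parses it as XML.
    """
    if user_agent and "MSIE" in user_agent:
        return "text/html"
    return "application/xhtml+xml"


ie = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
firefox = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5"
print(choose_content_type(ie))       # text/html
print(choose_content_type(firefox))  # application/xhtml+xml
```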

XHTML can be either HTML or XML, and it makes a difference as to whether
you parse it as HTML or XML. If you want to create a document which parses
the same way in either case, you must adhere to the compatibility
rules in Appendix C ("HTML Compatibility Guidelines") of the XHTML 1.0 standard, which means (amongst other
things) not minimising tags which can have content (i.e. not EMPTY),
regardless of whether or not they do have content.
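Applied to ElementTree output, that rule can be approximated in two steps: serialize with minimization switched off, then re-minimize only the elements declared EMPTY, using the "<br />" form. A sketch assuming Python 3.4+ and un-namespaced element names; `to_compatible_xhtml` is a hypothetical helper, and the regex pass also assumes no comment in the tree contains literal markup such as <br></br>.

```python
import re
import xml.etree.ElementTree as ET

# Elements declared EMPTY in the XHTML 1.0 DTDs; only these may be minimized.
XHTML_EMPTY = ("area", "base", "basefont", "br", "col", "frame", "hr",
               "img", "input", "isindex", "link", "meta", "param")

# Matches an expanded empty element such as <br></br> or <img src="a"></img>.
_EXPANDED_EMPTY = re.compile(
    r"<(%s)((?:\s[^>]*)?)></\1>" % "|".join(XHTML_EMPTY))


def to_compatible_xhtml(root):
    """Serialize root so it parses the same way as XML and as HTML."""
    # Step 1: never minimize, so <script ...></script> keeps its end tag.
    text = ET.tostring(root, encoding="unicode", short_empty_elements=False)
    # Step 2: re-minimize the EMPTY elements, with a space before the slash.
    return _EXPANDED_EMPTY.sub(r"<\1\2 />", text)


body = ET.Element("body")
ET.SubElement(body, "script", src="site.js", type="text/javascript")
ET.SubElement(body, "br")
out = to_compatible_xhtml(body)
print(out)
# <body><script src="site.js" type="text/javascript"></script><br /></body>
```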
 