Controlling whitespace in XSL output - tutorial anywhere?

Simon Brooke

I've been aware for a long time that I have been missing tricks on
control of whitespace in XSL output. I strongly dislike situations
where whitespace is significant, but I'm increasingly hitting them. In
a problem I'm working on at present I have an XSL document
(amend-dbdef.xslt) which transforms one XML document (amendments.xml)
into an XSL document (amendments.auto.xslt) which then transforms an
automatically
generated database schema into a hibernate mapping:

<target name="amend-dbd" depends="parsedbd"
        description="update the automatically generated database definition with amendments">
  <style style="${transforms}/amend-dbdef.xslt" destdir="${tmpdir}"
         extension=".auto.xslt" in="BusinessModel/amendments.xml"/>
  <style style="${tmpdir}/amendments.auto.xslt" destdir="${tmpdir}"
         extension=".amended.xml" in="${mapping-generator}/database-definition.xml"/>
</target>

<target name="nhibernate" depends="amend-dbd"
        description="compiles hibernate mapping from the compiled database description">
  <style style="${mapping-generator}/dbdef2hibernate-mapping.xsl"
         destdir="${tmpdir}"
         extension=".hbm.auto.xml">
    <infiles basedir="${tmpdir}">
      <include name="database-definition.amended.xml"/>
    </infiles>
  </style>
  <copy file="${tmpdir}/database-definition.amended.hbm.auto.xml"
        tofile="${entities}/hibernate-mapping.auto.hbm.xml"/>
</target>

The problem with this is that NHibernate chokes on whitespace inside
certain elements. In consequence I'm currently using 'indent="no"' in
my xsl-output directive on the dbdef2hibernate-mapping.xsl transform,
even though it makes the output code much harder to read. I'd much
rather be able to direct the XSL processor to tag-minimise empty tags
on output, but I don't know how to do this.

Another problem that I've been aware of for some time but which has
become a more critical issue in this process is white space inside
attribute values. For example,

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xslo="http://www.w3.org/1999/XSL/TransformAlias">
  [...]
  <xsl:template match="entity[@ignore='true']">
    <xslo:template>
      <xsl:attribute name="match">entity[@name='<xsl:value-of select="@name"/>']</xsl:attribute>
      <xslo:comment>
        Ignore table '<xsl:value-of select="@name"/>'
      </xslo:comment>
    </xslo:template>
  </xsl:template>

generates code which is semantically different from

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xslo="http://www.w3.org/1999/XSL/TransformAlias">
  [...]
  <xsl:template match="entity[@ignore='true']">
    <xslo:template>
      <xsl:attribute name="match">
        entity[@name='<xsl:value-of select="@name"/>']
      </xsl:attribute>
      <xslo:comment>
        Ignore table '<xsl:value-of select="@name"/>'
      </xslo:comment>
    </xslo:template>
  </xsl:template>

I'd like to be able to tell the XSL processor to normalise space in
all attribute values - that is, trim out leading and trailing white
space - but again I don't know how to.

Can anyone give me pointers? Is there a good tutorial on control of
whitespace on the Web? I have searched, but I haven't found one.

Cheers

Simon
 
ssamuel

Simon,

You may want to consider writing a second template that matches all
attributes. Create copies, but normalize space. The only issue you
face is that you may hog nodes that would normally have been handled
by other templates.
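
A minimal sketch of what such a second pass might look like, assuming XSLT 1.0 and that it runs as a separate stylesheet over the already generated document. It is an identity transform with one extra template for attributes. Bear in mind that normalize-space() also collapses internal runs of whitespace to a single space, which goes further than trimming the ends, so it would alter any attribute value where internal spacing matters (a quoted string inside a match pattern, for instance).

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="no"/>

  <!-- Identity rule: copy elements, text, comments and processing
       instructions unchanged, then recurse into attributes and children -->
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rebuild every attribute with the same name and namespace,
       but with its value passed through normalize-space() -->
  <xsl:template match="@*">
    <xsl:attribute name="{name()}" namespace="{namespace-uri()}">
      <xsl:value-of select="normalize-space(.)"/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```

The match="@*" pattern has a default priority of -0.5, so if this rule is merged into a larger stylesheet, a template that matches a specific attribute by name (default priority 0) still wins over it.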

Ask if you need more help.


s}


 
Joe Kesselman

Mike Kay's book has a good section on how XSLT treats whitespace, and
how to exercise some control over it. (I still think that book is the
best hardcopy XSLT reference I've seen -- it was good enough to keep me
from writing one.)

>NHibernate chokes on whitespace inside certain elements. In consequence
>I'm currently using 'indent="no"' in my xsl-output directive

That's always the right choice unless you are formatting the document
more for human-readability than machine-readability. Indentation, by
definition, changes the content of the document. If you're generating
HTML or something else where the whitespace doesn't matter, great;
otherwise this is a Bad Thing and you Shouldn't Do It.

There are things you can do to hint whether spaces are intended to be
meaningful or not -- the xml:space attribute, and/or the
xsl:preserve-space and xsl:strip-space directives in the stylesheet. You
may be able to use these to explicitly retain whitespace in those places
where it actually matters and discard it where it doesn't.
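
As a rough illustration of those stylesheet directives, assuming XSLT 1.0; the element name description is hypothetical. One thing to keep in mind is that xsl:strip-space and xsl:preserve-space act on whitespace-only text nodes in the source document, so they discard indentation between elements in the input but do not trim text that contains other characters.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop whitespace-only text nodes from every element in the input... -->
  <xsl:strip-space elements="*"/>
  <!-- ...except inside this (hypothetical) element, where they are kept -->
  <xsl:preserve-space elements="description"/>

  <!-- Do not let the serializer add indentation of its own -->
  <xsl:output method="xml" indent="no"/>

  <!-- Identity rule, so the effect of the directives above is visible -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The more specific name in xsl:preserve-space takes precedence over the "*" in xsl:strip-space, which is what allows a blanket rule with named exceptions.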

>white space inside attribute values.

If the attribute type is not CDATA, XML will automatically normalize
spaces. Otherwise, it's your stylesheet's responsibility to extract only
the portion of the data you want to preserve.
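
One way the stylesheet can take that responsibility, sketched under assumptions: XSLT 1.0, an xsl:namespace-alias for the xslo prefix (presumably what the elided part of the original stylesheet contains), and a hypothetical root template that simply visits every ignored entity. Whitespace-only text nodes in a stylesheet are stripped unless they sit inside xsl:text, so wrapping the literal fragments in xsl:text lets the source stay indented without any of that layout reaching the generated attribute or comment.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xslo="http://www.w3.org/1999/XSL/TransformAlias">

  <!-- Elements written as xslo:* come out in the XSLT namespace -->
  <xsl:namespace-alias stylesheet-prefix="xslo" result-prefix="xsl"/>
  <xsl:output method="xml" indent="no"/>

  <!-- Hypothetical driver: the real input structure is not shown in the thread -->
  <xsl:template match="/">
    <xslo:stylesheet version="1.0">
      <xsl:apply-templates select="//entity[@ignore='true']"/>
    </xslo:stylesheet>
  </xsl:template>

  <xsl:template match="entity[@ignore='true']">
    <xslo:template>
      <!-- Only the xsl:text content and the selected value reach the output;
           the line breaks and indentation between instructions are dropped -->
      <xsl:attribute name="match">
        <xsl:text>entity[@name='</xsl:text>
        <xsl:value-of select="@name"/>
        <xsl:text>']</xsl:text>
      </xsl:attribute>
      <xslo:comment>
        <xsl:text>Ignore table '</xsl:text>
        <xsl:value-of select="@name"/>
        <xsl:text>'</xsl:text>
      </xslo:comment>
    </xslo:template>
  </xsl:template>

</xsl:stylesheet>
```

An attribute value template on the literal result element, i.e. writing match="entity[@name='{@name}']" directly on xslo:template, should give the same attribute with less markup.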
 
Pavel Lepin

Joe Kesselman said:
>If the attribute type is not CDATA, XML will automatically
>normalize spaces. Otherwise, it's your stylesheet's
>responsibility to extract only the portion of the data you
>want to preserve.

However, non-validating parsers are recommended, but not
required, to treat all attributes as CDATA. Surprised me to
no end. Those damned lawyers!

To the OP: you can always set up a two stage transformation
(especially if you're using an XSLT2 processor) to deal
with your whitespace problems, or perhaps solve some of
them by validation. But the right thing to do would be
generating the document with the correct semantics straight
away.
 
Richard Tobin

>However, non-validating parsers are recommended, but not
>required, to treat all attributes as CDATA.

Sort of.

All XML parsers are required to normalise attributes according to the
declaration (if any) that they have read for that attribute.
Non-validating parsers are not required to read the external subset
(though many do), and so they may not have read declarations for all
attributes. And validating parsers will be in the same position when
given invalid documents that lack declarations.

In this situation the parser SHOULD treat the attribute as being of
type CDATA. Why SHOULD rather than MUST? I think it's so that
applications that "know" the type (because they only handle one kind
of document) can do the right thing even without a declaration.

Note that attribute declarations in the internal subset must be read and
acted upon even by non-validating parsers.
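
A small illustration of that last point; the element and attribute names are hypothetical. Both attributes are declared in the internal subset, so a conforming parser, validating or not, should apply the declared types when normalising.

```xml
<?xml version="1.0"?>
<!DOCTYPE mapping [
  <!ELEMENT mapping EMPTY>
  <!ATTLIST mapping
      name  NMTOKEN #REQUIRED
      label CDATA   #IMPLIED>
]>
<!-- name is NMTOKEN: leading and trailing spaces are discarded and runs
     of spaces collapsed, so the application sees "Customer".
     label is CDATA: the surrounding spaces are kept as written. -->
<mapping name="  Customer  " label="  Customer  "/>
```

If the same ATTLIST lived only in an external subset that a non-validating parser chose not to read, name would be treated as CDATA and keep its spaces.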

-- Richard
 
Simon Brooke

>Mike Kay's book has a good section on how XSLT treats whitespace, and
>how to exercise some control over it. (I still think that book is the
>best hardcopy XSLT reference I've seen -- it was good enough to keep me
>from writing one.)

Thanks. I'll have a look for that.

>That's always the right choice unless you are formatting the document
>more for human-readability than machine-readability. Indentation, by
>definition, changes the content of the document. If you're generating
>HTML or something else where the whitespace doesn't matter, great;
>otherwise this is a Bad Thing and you Shouldn't Do It.

That's all very well, but in my experience it's a great benefit to the
developer (and an even greater benefit to subsequent maintainers) if
intermediate results are human readable and self-documenting.

I'd really greatly prefer it if things (i.e. not XML parsers themselves,
but the things which consume their output) which use XML formats for
configuration and so on would treat leading and trailing whitespace as
insignificant, and treat a node which contains nothing but whitespace as
an empty node. However, Hibernate (and many other things, to be fair)
does not do this. I agree that's a Hibernate issue and not an XML one.

>There are things you can do to hint whether spaces are intended to be
>meaningful or not -- the xml:space attribute, and/or the
>xsl:preserve-space and xsl:strip-space directives in the stylesheet. You
>may be able to use these to explicitly retain whitespace in those places
>where it actually matters and discard it where it doesn't.

>If the attribute type is not CDATA, XML will automatically normalize
>spaces. Otherwise, it's your stylesheet's responsibility to extract only
>the portion of the data you want to preserve.

Aye. However, in this circumstance I don't own the schema, so I can't
fix it.
 
Joseph Kesselman

Simon said:
>I'd really greatly prefer it if things (i.e. not XML parsers themselves,
>but the things which consume their output) which use XML formats for
>configuration and so on would treat leading and trailing whitespace as
>insignificant, and treat a node which contains nothing but whitespace as
>an empty node. However, Hibernate (and many other things, to be fair)
>does not do this. I agree that's a Hibernate issue and not an XML one.

Yep. You really can't expect most data-oriented (as opposed to
document-markup-oriented) XML tools to be optimized for
human-readability. The usual solution is to put the document through a
stylesheet before presenting it to humans, and/or to tell them to use an
editing tool which presents the data in a more human-friendly form,
rather than to try to drive that requirement back into the XML markup.
 
