XSL: I'm doing something wrong, and I can't see it!

Simon Brooke

This is supposed to be a very simple XSL stylesheet to strip styling
information out of HTML documents - it could not be more basic. And yet,
it doesn't work. I'm obviously getting something very basic wrong and for
the life of me I can't see it. Please, somebody, cast your eyes over this
and tell me what's wrong!

First, the XSL stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- strip out styling -->

<xsl:output indent="yes" method="xml"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>

<xsl:template match="style">
<xsl:comment>
Don't carry style through
</xsl:comment>
</xsl:template>

<xsl:template match="font">
<span>
<xsl:apply-templates/>
</span>
</xsl:template>

<xsl:template match="*">
<xsl:element name="{name()}">
<xsl:for-each select="@*">
<xsl:choose>
<xsl:when test="name()='style'">
<!-- nothing -->
</xsl:when>
<xsl:when test="name()='STYLE'">
<!-- nothing -->
</xsl:when>
<xsl:otherwise>
<xsl:attribute name="{name()}"><xsl:value-of
select="."/></xsl:attribute>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>

</xsl:stylesheet>

Now, an example HTML document to test it:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head lang="en" dir="ltr">
<title>Test document for destyler</title>
<style type="text/css">
BODY
{
font-family: cursive;
}
</style>
</head>

<body>
<h1>Test document for destyler</h1>

<p style="font-family: serif">
Test with 'style='
</p>

<p STYLE="font-family: serif">
Test with 'STYLE='
</p>
</body>
</html>

And the output after processing:

-[simon]-> xsltproc destyle.xsl test.html
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head lang="en" dir="ltr" xml:lang="en"><meta http-equiv="Content-Type"
content="text/html; charset=UTF-8" />
<title>Test document for destyler</title>
<style type="text/css" xml:space="preserve">
BODY
{
font-family: cursive;
}
</style>
</head>

<body>
<h1>Test document for destyler</h1>

<p>
Test with 'style='
</p>

<p>
Test with 'STYLE='
</p>
</body>
</html>

As you can see, the 'style' attributes are getting successfully stripped.
But the 'style' element is not stripped. The template

<xsl:template match="style">
<xsl:comment>
Don't carry style through
</xsl:comment>
</xsl:template>

simply never matches. I cannot see why not; my understanding of the
conflict resolution rule was that the most specific matching template
should be applied. I've tried this with two completely different XSL-T
implementations, the Gnome libxslt and Apache Xalan, and they behave
consistently with one another (as one would hope).

So what have I missed? What am I doing wrong?
 
Martin Honnen

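XHTML elements are in the namespace http://www.w3.org/1999/xhtml, which
the document declares correctly. However, your stylesheet does not take
that into account: in XSLT 1.0 an unprefixed name in a match pattern, such
as style, only matches elements in no namespace. Bind a prefix to the XHTML
namespace in the stylesheet, e.g.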
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xhtml">
then use that prefix in your XPath expressions and in your XSLT match
patterns e.g.

<xsl:template match="xhtml:style">
<xsl:comment>
Don't carry style through
</xsl:comment>
</xsl:template>

<xsl:template match="xhtml:font">
<span>
<xsl:apply-templates/>
</span>
</xsl:template>

<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>

<xsl:template match="@style"/>
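
For reference, the pieces above might be assembled into a complete stylesheet along the following lines. This is a sketch rather than a tested drop-in, and two things in it are additions that are not in the fragments above. The default namespace declaration is there so that the literal span element is created in the XHTML namespace instead of in no namespace. The extra @STYLE pattern is there because attribute names are case-sensitive in XML, so match="@style" on its own would leave the upper-case attribute from the test document in place.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">

  <xsl:output method="xml" indent="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>

  <!-- Identity transform: copies any node or attribute that no
       higher-priority template below claims. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace the XHTML style element with a comment. -->
  <xsl:template match="xhtml:style">
    <xsl:comment> Don't carry style through </xsl:comment>
  </xsl:template>

  <!-- Turn font into span, keeping the content but none of the
       font attributes. The default namespace declared on the
       stylesheet (an assumption of this sketch) puts span in the
       XHTML namespace. -->
  <xsl:template match="xhtml:font">
    <span>
      <xsl:apply-templates/>
    </span>
  </xsl:template>

  <!-- Drop style attributes in either spelling by matching them
       and emitting nothing. -->
  <xsl:template match="@style | @STYLE"/>

</xsl:stylesheet>
```

The specific templates win over the identity template because a pattern that names an element or attribute has default priority 0, while @* and node() have priority -0.5.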
 
Simon Brooke

Martin said:
XHTML elements are in the namespace http://www.w3.org/1999/xhtml, which
the document declares correctly. However your stylesheet does not take
that into account, change it e.g.

D'oh! I knew it had to be something as simple as that! Many thanks...

And particular thanks for this:
<xsl:template match="@style"/>

which is an elegant trick I had not seen before.

You wouldn't have a neat solution for trimming out <p> and <div> tags which
contain only dodgy formatting stuff (<br>s and &nbsp;s), would you?
 
p.lepin

Simon said:

And particular thanks for this:
<xsl:template match="@style"/>

which is an elegant trick I had not seen before.

You wouldn't have a neat solution for trimming out <p>
and <div> tags which contain only dodgy formatting stuff
(<br>s and &nbsp;s), would you?

I wouldn't call this neat, and it probably doesn't even
DWYM, but consider:

<xsl:template
match=
"
xhtml:*
[
self::xhtml:p or self::xhtml:div
]
[
not(text()[normalize-space()!='']) and
not(xhtml:*[not(self::xhtml:br)])
]
"/>

Worse yet, while Saxon-8B and Xalan grok it just fine,
xsltproc chokes on it with:

pavel@debian:~/dev/xslt$ xsltproc dodgy.xsl dodgy.xml
XPath error : Undefined namespace prefix
xmlXPathEval: evaluation failed

....for some reason I fail to grasp.
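
One thing the pattern above does not cover is the &nbsp; case: normalize-space() only strips space, tab, CR and LF, so a text node holding U+00A0 still counts as non-empty. A possible variant, offered as an untested sketch, maps U+00A0 to an ordinary space with translate() before normalizing. It assumes the xhtml prefix is bound to http://www.w3.org/1999/xhtml, and it does nothing about the xsltproc error.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">

  <!-- Identity transform: copy everything that is not suppressed. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Suppress p and div elements whose text children are only
       whitespace or no-break spaces and whose XHTML child elements
       are all br. The character reference &#160; is U+00A0;
       translate() turns it into a plain space so that
       normalize-space() can strip it. -->
  <xsl:template
      match="xhtml:*
               [self::xhtml:p or self::xhtml:div]
               [not(text()[normalize-space(translate(., '&#160;', ' ')) != ''])
                and not(xhtml:*[not(self::xhtml:br)])]"/>

</xsl:stylesheet>
```

As in the original pattern, only XHTML child elements are inspected, so a p holding just an element from another namespace would also be dropped; replacing the inner xhtml:* with * would tighten that.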
 
Bjoern Hoehrmann

* (e-mail address removed) wrote in comp.text.xml:
<xsl:template
match=
"
xhtml:*
[
self::xhtml:p or self::xhtml:div
]
[
not(text()[normalize-space()!='']) and
not(xhtml:*[not(self::xhtml:br)])
]
"/>
pavel@debian:~/dev/xslt$ xsltproc dodgy.xsl dodgy.xml
XPath error : Undefined namespace prefix
xmlXPathEval: evaluation failed

I am unable to reproduce this with the latest release version. If you
could check this with the latest version and it still fails, report it
at <http://bugzilla.gnome.org/enter_bug.cgi?product=libxslt>.
 
p.lepin

This post contains a console session transcript; some lines
may be longer than 78 characters.

* (e-mail address removed) wrote in comp.text.xml:
<xsl:template
match=
"
xhtml:*
[
self::xhtml:p or self::xhtml:div
]
[
not(text()[normalize-space()!='']) and
not(xhtml:*[not(self::xhtml:br)])
]
"/>
pavel@debian:~/dev/xslt$ xsltproc dodgy.xsl dodgy.xml
XPath error : Undefined namespace prefix
xmlXPathEval: evaluation failed

I am unable to reproduce this with the latest release
version. If you could check this with the latest version
and it still fails, report it at
<http://bugzilla.gnome.org/enter_bug.cgi?product=libxslt>.

I'm sorry, but I'm quite reluctant to manually upgrade to
1.1.20; I rely on my package manager too much. The error is
reproducible with 1.1.17 and 1.1.19:

pavel@debian:~/dev/xslt$ cat dodgy.xml
<xhtml:root xmlns:xhtml="http://example.org/xhtml">
<xhtml:p>Dodgy formatting stuff</xhtml:p>
<xhtml:p> <xhtml:br/> </xhtml:p>
<xhtml:p> <xhtml:br/><xhtml:abbrev>aaa</xhtml:abbrev> </xhtml:p>
<xhtml:div>Dodgy formatting stuff</xhtml:div>
<xhtml:div> <xhtml:br/> </xhtml:div>
<xhtml:div> <xhtml:br/><xhtml:abbrev>aaa</xhtml:abbrev> </xhtml:div>
</xhtml:root>
pavel@debian:~/dev/xslt$ cat dodgy.xsl
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://example.org/xhtml">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template
match=
"
xhtml:*
[
self::xhtml:p or self::xhtml:div
]
[
not(text()[normalize-space()!='']) and
not(xhtml:*[not(self::xhtml:br)])
]
"/>
</xsl:stylesheet>
pavel@debian:~/dev/xslt$ saxon -t dodgy.xml dodgy.xsl
Saxon 8.8J from Saxonica
Java version 1.5.0_08
Warning: at xsl:stylesheet on line 3 of file:/var/www/dev/xslt/dodgy.xsl:
Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
Stylesheet compilation time: 449 milliseconds
Processing file:/var/www/dev/xslt/dodgy.xml
Building tree for file:/var/www/dev/xslt/dodgy.xml using class net.sf.saxon.tinytree.TinyBuilder
Tree built in 4 milliseconds
Tree size: 35 nodes, 50 characters, 0 attributes
<?xml version="1.0" encoding="UTF-8"?><xhtml:root xmlns:xhtml="http://example.org/xhtml">
<xhtml:p>Dodgy formatting stuff</xhtml:p>

<xhtml:p> <xhtml:br/><xhtml:abbrev>aaa</xhtml:abbrev> </xhtml:p>
<xhtml:div>Dodgy formatting stuff</xhtml:div>

<xhtml:div> <xhtml:br/><xhtml:abbrev>aaa</xhtml:abbrev> </xhtml:div>
</xhtml:root>Execution time: 92 milliseconds
Memory used: 944320
NamePool contents: 17 entries in 17 chains. 8 prefixes, 9 URIs
pavel@debian:~/dev/xslt$ xalan -v

Xalan version 1.10.0
Xerces version 2.7.0
pavel@debian:~/dev/xslt$ xalan -in dodgy.xml -xsl dodgy.xsl
<?xml version="1.0" encoding="UTF-8"?><xhtml:root xmlns:xhtml="http://example.org/xhtml">
<xhtml:p>Dodgy formatting stuff</xhtml:p>

<xhtml:p> <xhtml:br/><xhtml:abbrev>aaa</xhtml:abbrev> </xhtml:p>
<xhtml:div>Dodgy formatting stuff</xhtml:div>

<xhtml:div> <xhtml:br/><xhtml:abbrev>aaa</xhtml:abbrev> </xhtml:div>
pavel@debian:~/dev/xslt$ xsltproc -V dodgy.xsl dodgy.xml
Using libxml 20626, libxslt 10117 and libexslt 813
xsltproc was compiled against libxml 20626, libxslt 10117 and libexslt 813
libxslt 10117 was compiled against libxml 20626
libexslt 813 was compiled against libxml 20626
XPath error : Undefined namespace prefix
xmlXPathEval: evaluation failed
pavel@debian:~/dev/xslt$ xsltproc -V dodgy.xsl dodgy.xml
Using libxml 20627, libxslt 10119 and libexslt 813
xsltproc was compiled against libxml 20627, libxslt 10119 and libexslt 813
libxslt 10119 was compiled against libxml 20627
libexslt 813 was compiled against libxml 20627
XPath error : Undefined namespace prefix
xmlXPathEval: evaluation failed
 
