XSL: I'm doing something wrong, and I can't see it!

Discussion in 'XML' started by Simon Brooke, Feb 26, 2007.

  1. Simon Brooke

    Simon Brooke Guest

    This is supposed to be a very simple XSL stylesheet to strip styling
    information out of HTML documents - it could not be more basic. And yet,
    it doesn't work. I'm obviously getting something very basic wrong and for
    the life of me I can't see it. Please, somebody, cast your eyes over this
    and tell me what's wrong!

    First, the XSL stylesheet:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- strip out styling -->

    <xsl:output indent="yes" method="xml"
    doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
    doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>

    <xsl:template match="style">
    <xsl:comment>
    Don't carry style through
    </xsl:comment>
    </xsl:template>

    <xsl:template match="font">
    <span>
    <xsl:apply-templates/>
    </span>
    </xsl:template>

    <xsl:template match="*">
    <xsl:element name="{name()}">
    <xsl:for-each select="@*">
    <xsl:choose>
    <xsl:when test="name()='style'">
    <!-- nothing -->
    </xsl:when>
    <xsl:when test="name()='STYLE'">
    <!-- nothing -->
    </xsl:when>
    <xsl:otherwise>
    <xsl:attribute name="{name()}"><xsl:value-of
    select="."/></xsl:attribute>
    </xsl:otherwise>
    </xsl:choose>
    </xsl:for-each>
    <xsl:apply-templates/>
    </xsl:element>
    </xsl:template>

    </xsl:stylesheet>

    Now, an example HTML document to test it:

    <?xml version="1.0"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

    <html xmlns="http://www.w3.org/1999/xhtml">
    <head lang="en" dir="ltr">
    <title>Test document for destyler</title>
    <style type="text/css">
    BODY
    {
    font-family: cursive;
    }
    </style>
    </head>

    <body>
    <h1>Test document for destyler</h1>

    <p style="font-family: serif">
    Test with 'style='
    </p>

    <p STYLE="font-family: serif">
    Test with 'STYLE='
    </p>
    </body>
    </html>

    And the output after processing:

    -[simon]-> xsltproc destyle.xsl test.html
    <?xml version="1.0"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
    Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head lang="en" dir="ltr" xml:lang="en"><meta http-equiv="Content-Type"
    content="text/html; charset=UTF-8" />
    <title>Test document for destyler</title>
    <style type="text/css" xml:space="preserve">
    BODY
    {
    font-family: cursive;
    }
    </style>
    </head>

    <body>
    <h1>Test document for destyler</h1>

    <p>
    Test with 'style='
    </p>

    <p>
    Test with 'STYLE='
    </p>
    </body>
    </html>

    As you can see, the 'style' attributes are getting successfully stripped.
    But the 'style' element is not stripped. The template

    <xsl:template match="style">
    <xsl:comment>
    Don't carry style through
    </xsl:comment>
    </xsl:template>

    simply never matches. I cannot see why not; my understanding of the
    conflict resolution rule was that the most specific matching template
    should be applied. I've tried this with two completely different XSL-T
    implementations, the Gnome libxslt and Apache Xalan, and they behave
    consistently with one another (as one would hope).

    So what have I missed? What am I doing wrong?

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/

    Morning had broken. I found a rather battered tube of Araldite
    resin in the bottom of the toolbag.
    Simon Brooke, Feb 26, 2007
    #1

  2. Simon Brooke wrote:

    > <html xmlns="http://www.w3.org/1999/xhtml">


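    XHTML elements are in the namespace http://www.w3.org/1999/xhtml, which
    the document declares correctly. However, your stylesheet does not take
    that into account, so the unprefixed pattern "style" only matches elements
    in no namespace. Declare the namespace in the stylesheet, e.g.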
    <xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">
    Then use that prefix in your XPath expressions and in your XSLT match
    patterns, e.g.

    <xsl:template match="xhtml:style">
    <xsl:comment>
    Don't carry style through
    </xsl:comment>
    </xsl:template>

    <xsl:template match="xhtml:font">
    <span>
    <xsl:apply-templates/>
    </span>
    </xsl:template>

    <xsl:template match="@* | node()">
    <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
    </xsl:template>

    <xsl:template match="@style"/>


    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, Feb 26, 2007
    #2
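
The namespace point in the reply above can be checked outside XSLT. What follows is a minimal sketch in Python, using only the standard library's xml.etree.ElementTree; it is not part of the original thread, and the names DOC, XHTML_NS, unqualified and qualified are illustrative. It shows that an unqualified name finds nothing in a document whose default namespace is XHTML, while a name bound to that namespace does. It also shows that the unprefixed style attribute is in no namespace, which is consistent with the attributes being stripped while the element was not.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A cut-down version of the test document from the first post.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Test document for destyler</title>
    <style type="text/css">BODY { font-family: cursive; }</style>
  </head>
  <body><p style="font-family: serif">Test with 'style='</p></body>
</html>"""

root = ET.fromstring(DOC)

# Elements are stored under their expanded name: {namespace-uri}local-name.
print(root.tag)

# An unqualified name only matches elements in no namespace.
unqualified = root.findall(".//style")

# Binding a prefix to the XHTML namespace plays the role of xmlns:xhtml
# in the stylesheet.
prefixes = {"xhtml": XHTML_NS}
qualified = root.findall(".//xhtml:style", prefixes)
print(len(unqualified), len(qualified))

# Unprefixed attributes are in no namespace, so the plain name works here.
paragraph = root.find(".//xhtml:p", prefixes)
print(paragraph.get("style"))
```

The prefix itself is arbitrary; only the namespace URI it is bound to has to agree with the one in the source document.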

  3. Simon Brooke

    Simon Brooke Guest

    in message <45e2d23c$0$23148$-online.net>, Martin Honnen wrote:

    > Simon Brooke wrote:
    >
    >> <html xmlns="http://www.w3.org/1999/xhtml">

    >
    > XHTML elements are in the namespace http://www.w3.org/1999/xhtml, which
    > the document declares correctly. However your stylesheet does not take
    > that into account, change it e.g.


    D'oh! I knew it had to be something as simple as that! Many thanks...

    And particular thanks for this:

    > <xsl:template match="@style"/>


    which is an elegant trick I had not seen before.

    You wouldn't have a neat solution for trimming out <p> and <div> tags which
    contain only dodgy formatting stuff (<br>s and &nbsp;s), would you?

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/
    ;; We don't just borrow words; on occasion, English has pursued other
    ;; languages down alleyways to beat them unconscious and riffle their
    ;; pockets for new vocabulary -- James D. Nicoll
    Simon Brooke, Feb 26, 2007
    #3
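
The identity template combined with an empty template for @style is a "copy everything, then override the exception" pattern. Below is a rough Python analogue using only the standard library, offered as a sketch and not as anything posted in the thread. The function name strip_style_attributes is hypothetical. Lower-casing the attribute name is an assumption carried over from the original stylesheet, which tested for both 'style' and 'STYLE'; XML names are case-sensitive, so the pattern @style on its own matches only the lower-case spelling.

```python
import copy
import xml.etree.ElementTree as ET


def strip_style_attributes(element):
    """Return a deep copy of element without style/STYLE attributes."""
    result = copy.deepcopy(element)      # the "identity" part: copy everything
    for node in result.iter():           # visits result and all descendants
        for name in list(node.attrib):   # list() so we can delete while looping
            if name.lower() == "style":  # the "override": drop this attribute
                del node.attrib[name]
    return result


source = ET.fromstring(
    '<body><p style="font-family: serif" class="x">a</p>'
    '<p STYLE="font-family: serif">b</p></body>'
)
cleaned = strip_style_attributes(source)
print(ET.tostring(cleaned, encoding="unicode"))
```

The input tree is left untouched, which mirrors the way an XSLT transformation builds a new result tree instead of editing the source.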
  4. Simon Brooke

    Guest

    On Feb 26, 4:11 pm, Simon Brooke wrote:
    > in message
    > <45e2d23c$0$23148$-online.net>,
    > Martin Honnen wrote:
    > > Simon Brooke wrote:

    >
    > >> <html xmlns="http://www.w3.org/1999/xhtml">

    >
    > And particular thanks for this:
    >
    > > <xsl:template match="@style"/>

    >
    > which is an elegant trick I had not seen before.
    >
    > You wouldn't have a neat solution for trimming out <p>
    > and <div> tags which contain only dodgy formatting stuff
    > (<br>s and &nbsp;s), would you?


    I wouldn't call this neat, and it probably doesn't even
    DWYM, but consider:

    <xsl:template
    match=
    "
    xhtml:*
    [
    self::xhtml:p or self::xhtml:div
    ]
    [
    not(text()[normalize-space()!='']) and
    not(xhtml:*[not(self::xhtml:br)])
    ]
    "/>

    Worse yet, while Saxon-8B and Xalan grok it just fine,
    xsltproc chokes on it with:

    pavel@debian:~/dev/xslt$ xsltproc dodgy.xsl dodgy.xml
    XPath error : Undefined namespace prefix
    xmlXPathEval: evaluation failed

    ...for some reason I fail to grasp.

    --
    Pavel Lepin
    , Feb 26, 2007
    #4
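
The two predicates in the match pattern above can be read as "no non-blank text child, and no XHTML child element other than br". The sketch below restates that test in Python with the standard library only; it is an illustration, not code from the thread, and the names is_dodgy, is_blank and XHTML are hypothetical. One deliberate deviation, labelled in the code: XPath 1.0 normalize-space() treats only space, tab, CR and LF as whitespace, so a paragraph holding a lone &nbsp; would not be matched by the pattern as posted, whereas this sketch also counts U+00A0 as blank because the question asked about &nbsp;.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def is_blank(text):
    # ASSUMPTION: U+00A0 is treated as blank; normalize-space() would not do so.
    return text is None or text.strip(" \t\r\n\u00a0") == ""


def is_dodgy(element):
    """True for an XHTML p or div holding only blank text and br elements."""
    if element.tag not in (XHTML + "p", XHTML + "div"):
        return False
    # The text() children are element.text plus the tail of each child.
    texts = [element.text] + [child.tail for child in element]
    if not all(is_blank(t) for t in texts):
        return False
    # Mirrors not(xhtml:*[not(self::xhtml:br)]).
    return not any(
        child.tag.startswith(XHTML) and child.tag != XHTML + "br"
        for child in element
    )


DOC = """<body xmlns="http://www.w3.org/1999/xhtml">
<p>Real text</p>
<p> <br/> </p>
<p>&#160;<br/></p>
<div> <br/><em>aaa</em> </div>
</body>"""

root = ET.fromstring(DOC)
flags = [is_dodgy(child) for child in root]
print(flags)
```

Text inside a nested element (the "aaa" in the last div) is not a text child of the div itself; that div is kept only because of the second condition.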
  5. * wrote in comp.text.xml:
    > <xsl:template
    > match=
    > "
    > xhtml:*
    > [
    > self::xhtml:p or self::xhtml:div
    > ]
    > [
    > not(text()[normalize-space()!='']) and
    > not(xhtml:*[not(self::xhtml:br)])
    > ]
    > "/>


    >pavel@debian:~/dev/xslt$ xsltproc dodgy.xsl dodgy.xml
    >XPath error : Undefined namespace prefix
    >xmlXPathEval: evaluation failed


    I am unable to reproduce this with the latest release version. If you
    can check this with the latest version and it still fails, please report
    it at <http://bugzilla.gnome.org/enter_bug.cgi?product=libxslt>.
    --
    Björn Höhrmann · mailto: · http://bjoern.hoehrmann.de
    Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
    68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
    Bjoern Hoehrmann, Feb 26, 2007
    #5
  6. Simon Brooke

    Guest

    Possible bug in libxslt1.1.17-19? WAS: Re: XSL: I'm doing something wrong, and I can't see it!

    This post contains console session transcript, some lines
    may be longer than 78 characters.

    On Feb 26, 6:12 pm, Bjoern Hoehrmann wrote:
    > * wrote in comp.text.xml:
    >
    > > <xsl:template
    > > match=
    > > "
    > > xhtml:*
    > > [
    > > self::xhtml:p or self::xhtml:div
    > > ]
    > > [
    > > not(text()[normalize-space()!='']) and
    > > not(xhtml:*[not(self::xhtml:br)])
    > > ]
    > > "/>
    > >pavel@debian:~/dev/xslt$ xsltproc dodgy.xsl dodgy.xml
    > >XPath error : Undefined namespace prefix
    > >xmlXPathEval: evaluation failed

    >
    > I am unable to reproduce this with the latest release
    > version. If you could check this with the latest version
    > and it still fails, report it at
    > <http://bugzilla.gnome.org/enter_bug.cgi?product=libxslt>.


    I'm sorry, but I'm quite reluctant to upgrade to 1.1.20 by hand; I rely
    on my package manager too much. The error is reproducible with both
    1.1.17 and 1.1.19:

    pavel@debian:~/dev/xslt$ cat dodgy.xml
    <xhtml:root xmlns:xhtml="http://example.org/xhtml">
    <xhtml:p>Dodgy formatting stuff</xhtml:p>
    <xhtml:p> <xhtml:br/> </xhtml:p>
    <xhtml:p> <xhtml:br/><xhtml:abbrev>aaa</xhtml:abbrev> </xhtml:p>
    <xhtml:div>Dodgy formatting stuff</xhtml:div>
    <xhtml:div> <xhtml:br/> </xhtml:div>
    <xhtml:div> <xhtml:br/><xhtml:abbrev>aaa</xhtml:abbrev> </xhtml:div>
    </xhtml:root>
    pavel@debian:~/dev/xslt$ cat dodgy.xsl
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://example.org/xhtml">
    <xsl:template match="@*|node()">
    <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    </xsl:template>
    <xsl:template
    match=
    "
    xhtml:*
    [
    self::xhtml:p or self::xhtml:div
    ]
    [
    not(text()[normalize-space()!='']) and
    not(xhtml:*[not(self::xhtml:br)])
    ]
    "/>
    </xsl:stylesheet>
    pavel@debian:~/dev/xslt$ saxon -t dodgy.xml dodgy.xsl
    Saxon 8.8J from Saxonica
    Java version 1.5.0_08
    Warning: at xsl:stylesheet on line 3 of file:/var/www/dev/xslt/dodgy.xsl:
    Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
    Stylesheet compilation time: 449 milliseconds
    Processing file:/var/www/dev/xslt/dodgy.xml
    Building tree for file:/var/www/dev/xslt/dodgy.xml using class net.sf.saxon.tinytree.TinyBuilder
    Tree built in 4 milliseconds
    Tree size: 35 nodes, 50 characters, 0 attributes
    <?xml version="1.0" encoding="UTF-8"?><xhtml:root xmlns:xhtml="http://example.org/xhtml">
    <xhtml:p>Dodgy formatting stuff</xhtml:p>

    <xhtml:p> <xhtml:br/><xhtml:abbrev>aaa</xhtml:abbrev> </xhtml:p>
    <xhtml:div>Dodgy formatting stuff</xhtml:div>

    <xhtml:div> <xhtml:br/><xhtml:abbrev>aaa</xhtml:abbrev> </xhtml:div>
    </xhtml:root>Execution time: 92 milliseconds
    Memory used: 944320
    NamePool contents: 17 entries in 17 chains. 8 prefixes, 9 URIs
    pavel@debian:~/dev/xslt$ xalan -v

    Xalan version 1.10.0
    Xerces version 2.7.0
    pavel@debian:~/dev/xslt$ xalan -in dodgy.xml -xsl dodgy.xsl
    <?xml version="1.0" encoding="UTF-8"?><xhtml:root xmlns:xhtml="http://example.org/xhtml">
    <xhtml:p>Dodgy formatting stuff</xhtml:p>

    <xhtml:p> <xhtml:br/><xhtml:abbrev>aaa</xhtml:abbrev> </xhtml:p>
    <xhtml:div>Dodgy formatting stuff</xhtml:div>

    <xhtml:div> <xhtml:br/><xhtml:abbrev>aaa</xhtml:abbrev> </xhtml:div>
    pavel@debian:~/dev/xslt$ xsltproc -V dodgy.xsl dodgy.xml
    Using libxml 20626, libxslt 10117 and libexslt 813
    xsltproc was compiled against libxml 20626, libxslt 10117 and libexslt 813
    libxslt 10117 was compiled against libxml 20626
    libexslt 813 was compiled against libxml 20626
    XPath error : Undefined namespace prefix
    xmlXPathEval: evaluation failed
    pavel@debian:~/dev/xslt$ xsltproc -V dodgy.xsl dodgy.xml
    Using libxml 20627, libxslt 10119 and libexslt 813
    xsltproc was compiled against libxml 20627, libxslt 10119 and libexslt 813
    libxslt 10119 was compiled against libxml 20627
    libexslt 813 was compiled against libxml 20627
    XPath error : Undefined namespace prefix
    xmlXPathEval: evaluation failed

    --
    Pavel Lepin
    , Feb 27, 2007
    #6