XSLT problem with single tags

Discussion in 'XML' started by dwergkees, May 7, 2006.

  1. dwergkees

    dwergkees Guest

    Hi,

    Got a little problem here. I'm trying to create an XSLT file that will do
    a transformation from WordML format (MS Word XML format, see
    http://rep.oio.dk/Microsoft.com/officeschemas/welcome.htm) to a
    reasonably clean (X)HTML format.

    (The reason being that, combined with some PHP scripting it should be
    possible to store the embedded images, which is pretty neat).

    I am, however, running into an XSLT problem. A piece of an old version
    works like this:

    <xsl:template match="w:r">
    <xsl:choose>
    <xsl:when test=".//w:i">
    <i><xsl:apply-templates /></i>
    </xsl:when>
    <xsl:when test=".//w:b">
    <b><xsl:apply-templates /></b>
    </xsl:when>
    <xsl:otherwise>
    <xsl:apply-templates />
    </xsl:otherwise>
    </xsl:choose>
    </xsl:template>

    This matches the r element (Run element, kind of a default container
    thingy). It tests whether the r element contains an i or b element
    (meaning of course that the content of that r element is in italic or
    bold.) When this is the case, nice html style tags are placed. This
    doesn't function properly in the case where an r element contains both
    an i and a b element, i.e. when the text is both italic and bold.
    Therefore, I changed the code to:

    <xsl:template match="w:r">
    <xsl:if test=".//w:i">
    <i>
    </xsl:if>

    <xsl:if test=".//w:b">
    <b>
    </xsl:if>

    <xsl:apply-templates />

    <xsl:if test=".//w:i">
    </i>
    </xsl:if>

    <xsl:if test=".//w:b">
    </b>
    </xsl:if>
    </xsl:template>

    It now tests twice for each style, for the opening tag and for the
    closing tag. In principle this works fine, but in practice the XSLT
    sheet is not well-formed and will not be applied, as it contains
    unclosed tags (the <i> and <b> tags). I've tried to:
    - replace the < and > with &lt; and &gt;
    - put the tags inside CDATA sections, for example <![CDATA[<i>]]>


    However, in both cases the tags appear as literal text instead of
    HTML code.

    Any ideas on how to insert single opening or closing tags in my
    HTML code, or another solution to properly nest the <i> and <b>
    elements?

    TIA

    Wilco - Dwergkees - Menge
    dwergkees, May 7, 2006
    #1

  2. Martin Honnen

    dwergkees wrote:


    > <xsl:template match="w:r">
    > <xsl:choose>
    > <xsl:when test=".//w:i">
    > <i><xsl:apply-templates /></i>
    > </xsl:when>
    > <xsl:when test=".//w:b">
    > <b><xsl:apply-templates /></b>
    > </xsl:when>
    > <xsl:otherwise>
    > <xsl:apply-templates />
    > </xsl:otherwise>
    > </xsl:choose>
    > </xsl:template>


    Why don't you simply do
    <xsl:template match="w:r"><xsl:apply-templates /></xsl:template>

    <xsl:template match="w:i">
    <i><xsl:apply-templates /></i>
    </xsl:template>

    <xsl:template match="w:b">
    <b><xsl:apply-templates /></b>
    </xsl:template>

    I am not familiar with WordML however, but based on what you have posted
    and on how XSLT works it seems more natural to simply let
    xsl:apply-templates do its work combined with templates for the
    different elements you need to process.


    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, May 7, 2006
    #2

  3. dwergkees

    dwergkees Guest

    >Why don't you simply do
    > <xsl:template match="w:r"><xsl:apply-templates /></xsl:template>
    >
    > <xsl:template match="w:i">
    > <i><xsl:apply-templates /></i>
    > </xsl:template>
    >
    > <xsl:template match="w:b">
    > <b><xsl:apply-templates /></b>
    > </xsl:template>



    I've tried fussin' about with your solution, but I can't get it to fit
    just right. I'll show a small WordML example to demonstrate the problem
    more clearly:

    WordML:

    <w:p>
    <w:r>
    <w:t>Plain text</w:t>
    </w:r>
    <w:r>
    <w:rPr>
    <w:b/>
    </w:rPr>
    <w:t>Bold text</w:t>
    </w:r>
    <w:r>
    <w:t> </w:t>
    </w:r>
    <w:r>
    <w:rPr>
    <w:i/>
    </w:rPr>
    <w:t>Italic text</w:t>
    </w:r>
    <w:r>
    <w:rPr>
    <w:b/>
    <w:i/>
    </w:rPr>
    <w:t>Bold and italic</w:t>
    </w:r>
    </w:p>

    Should transform to:

    <p>Plain text <b>Bold text</b> <i>Italic text</i> <i><b>Bold and
    italic</b></i></p>

    As you can see, the <w:i/> and <w:b/> tags are grandchildren of the r
    element, the text itself is a child of the r element. So at each r
    element I want to check the existence of w:i and w:b and surround the t
    element with the corresponding HTML. Your solution matches the
    existence, but then the processor is at the wrong current Node. (As far
    as I understand the complexities of xslt).

    Any thoughts?

    TIA

    Wilco.
    dwergkees, May 7, 2006
    #3
  4. Peter Flynn

    Peter Flynn Guest

    dwergkees wrote:
    >> Why don't you simply do
    >> <xsl:template match="w:r"><xsl:apply-templates /></xsl:template>
    >>
    >> <xsl:template match="w:i">
    >> <i><xsl:apply-templates /></i>
    >> </xsl:template>
    >>
    >> <xsl:template match="w:b">
    >> <b><xsl:apply-templates /></b>
    >> </xsl:template>

    >
    >
    > I've tried fussin' about with your solution, but I can't get it to fit
    > just right. I'll show a small WordML example to demonstrate the problem
    > more clearly:
    >
    > WordML:
    >
    > <w:p>
    > <w:r>
    > <w:t>Plain text</w:t>
    > </w:r>
    > <w:r>
    > <w:rPr>
    > <w:b/>
    > </w:rPr>
    > <w:t>Bold text</w:t>
    > </w:r>
    > <w:r>
    > <w:t> </w:t>
    > </w:r>
    > <w:r>
    > <w:rPr>
    > <w:i/>
    > </w:rPr>
    > <w:t>Italic text</w:t>
    > </w:r>
    > <w:r>
    > <w:rPr>
    > <w:b/>
    > <w:i/>
    > </w:rPr>
    > <w:t>Bold and italic</w:t>
    > </w:r>
    > </w:p>


    This is an interesting relic of (a) the fact that Word uses out-of-line
    markup and (b) the sedulous avoidance of Mixed Content common to those
    who think pointers are more fun to program than trees. It's also about
    the only way you can model the behaviour of unschooled authors in Word.

    Just add another condition to your original:

    <xsl:template match="w:r">
    <xsl:choose>
    <xsl:when test=".//w:i and .//w:b">
    <i><b><xsl:apply-templates/></b></i>
    </xsl:when>
    <xsl:when test=".//w:i and not(.//w:b)">
    <i><xsl:apply-templates/></i>
    </xsl:when>
    <xsl:when test=".//w:b and not(.//w:i)">
    <b><xsl:apply-templates/></b>
    </xsl:when>
    <xsl:otherwise>
    <xsl:apply-templates/>
    </xsl:otherwise>
    </xsl:choose>
    </xsl:template>
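
    With more styles, the number of xsl:when branches doubles for each
    one added. A minimal sketch of one alternative, assuming the w:i and
    w:b flags always sit in w:rPr as in the sample quoted above: handle one
    style per template and pass the same w:r on to the next stage through
    a mode. The mode name "bold" is made up for illustration, and the
    namespace URI is an assumption (it is the one Word 2003 documents
    commonly declare; check what your own document uses).

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
        exclude-result-prefixes="w">

      <!-- Stage 1: decide whether this run needs an <i> wrapper,
           then re-process the same w:r in the (hypothetical) "bold" mode. -->
      <xsl:template match="w:r">
        <xsl:choose>
          <xsl:when test="w:rPr/w:i">
            <i><xsl:apply-templates select="." mode="bold"/></i>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="." mode="bold"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>

      <!-- Stage 2: same run, now decide about <b> and emit the text. -->
      <xsl:template match="w:r" mode="bold">
        <xsl:choose>
          <xsl:when test="w:rPr/w:b">
            <b><xsl:apply-templates select="w:t"/></b>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="w:t"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Each further style would be one more stage of the same shape, so the
    stylesheet grows linearly rather than with every combination, and every
    literal result element stays properly opened and closed.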

    > Should transform to:
    >
    > <p>Plain text <b>Bold text</b> <i>Italic text</i> <i><b>Bold and
    > italic</b></i></p>


    No, there is no white-space after "Plain text" nor after "Italic text"
    in your quoted XML document. If you need to introduce extra white-space
    you need to specify the rules for doing so.

    > As you can see, the <w:i/> and <w:b/> tags are grandchildren of the r
    > element, the text itself is a child of the r element. So at each r
    > element I want to check the existence of w:i and w:b and surround the t
    > element with the corresponding HTML. Your solution matches the
    > existence, but then the processor is at the wrong current Node. (As far
    > as I understand the complexities of xslt).
    >
    > Any thoughts?


    Here's another way to do it, based on Martin's suggestion of using the
    normal "apply-templates" way of proceeding down a document. You'll have
    to jiggle the declared namespace for w: as I don't know what your
    document declares it as. This method will handle anything occurring in
    w:rPr, not just bold and italics.

    <?xml version="1.0" encoding="UTF-8" ?>
    <xsl:stylesheet version="1.0"
    xmlns:w="http://foo.bar.org"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="html"/>
    <xsl:strip-space elements="*"/>
    <xsl:preserve-space elements="w:t"/>

    <xsl:template match="w:p">
    <p>
    <xsl:apply-templates/>
    </p>
    </xsl:template>

    <xsl:template match="w:r">
    <xsl:choose>
    <xsl:when test="w:rPr">
    <xsl:call-template name="nest">
    <xsl:with-param name="styles" select="w:rPr/*"/>
    </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
    <xsl:apply-templates/>
    </xsl:otherwise>
    </xsl:choose>
    </xsl:template>

    <xsl:template name="nest">
    <xsl:param name="styles"/>
    <!-- default is the number 1, so $styles[$counter] is a positional test -->
    <xsl:param name="counter" select="1"/>
    <xsl:choose>
    <xsl:when test="$counter>count($styles)">
    <xsl:value-of select="w:t"/>
    </xsl:when>
    <xsl:otherwise>
    <xsl:element name="{local-name($styles[$counter])}">
    <xsl:call-template name="nest">
    <xsl:with-param name="styles" select="$styles"/>
    <xsl:with-param name="counter" select="$counter+1"/>
    </xsl:call-template>
    </xsl:element>
    </xsl:otherwise>
    </xsl:choose>
    </xsl:template>

    </xsl:stylesheet>

    In this, I am stripping all space except in w:t. This will preserve the
    otherwise vulnerable white-space-only node.
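
    One caveat when every child of w:rPr is passed on: each one becomes an
    output element named after it, so a run property with no HTML
    counterpart would show up as an unknown element. A hedged variant,
    assuming only bold and italic are wanted: select just w:b and w:i, and
    recurse on the rest of the node-set instead of keeping a counter. The
    namespace URI is again an assumption (the one Word 2003 documents
    commonly declare), to be replaced by whatever your document uses.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
        exclude-result-prefixes="w">

      <xsl:output method="html"/>
      <xsl:strip-space elements="*"/>
      <xsl:preserve-space elements="w:t"/>

      <xsl:template match="w:p">
        <p><xsl:apply-templates/></p>
      </xsl:template>

      <!-- Only w:b and w:i are handed to "nest"; other run properties
           are ignored. A run without them gets an empty node-set. -->
      <xsl:template match="w:r">
        <xsl:call-template name="nest">
          <xsl:with-param name="styles" select="w:rPr/w:b | w:rPr/w:i"/>
        </xsl:call-template>
      </xsl:template>

      <xsl:template name="nest">
        <xsl:param name="styles"/>
        <xsl:choose>
          <!-- Styles left: wrap in the first one, recurse on the rest. -->
          <xsl:when test="$styles">
            <xsl:element name="{local-name($styles[1])}">
              <xsl:call-template name="nest">
                <xsl:with-param name="styles"
                                select="$styles[position() &gt; 1]"/>
              </xsl:call-template>
            </xsl:element>
          </xsl:when>
          <!-- No styles left: emit the text of the current w:r. -->
          <xsl:otherwise>
            <xsl:apply-templates select="w:t"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>

    </xsl:stylesheet>
    ```

    The wrappers nest in the order the properties appear in w:rPr, so the
    last run of the sample would come out as <b><i>...</i></b> rather than
    <i><b>...</b></i>; both render the same.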

    ///Peter
    --
    XML FAQ: http://xml.silmaril.ie/
    Peter Flynn, May 7, 2006
    #4
  5. dwergkees

    dwergkees Guest

    Thanks!

    This is the kind of solution I was looking for!!! I have to tweak it
    here and there, but this is just the kind of nesting principle I was
    interested in, as it allows for extensions (I plan to use the same
    nesting at higher level, so I can alternate between <p>, <h1>, <h2> to
    <hn>. That should be possible right?)

    >> Should transform to:


    >> <p>Plain text <b>Bold text</b> <i>Italic text</i> <i><b>Bold and
    >> italic</b></i></p>

    >
    >No, there is no white-space after "Plain text" nor after "Italic text"
    >in your quoted XML document. If you need to introduce


    As for the extra whitespace in the desired output, those are just
    random typos on my part! I'm just as happy with all the original
    whitespace minus the unnecessary whitespace.
    Again, thanks a lot to both Martin and Peter for helping out!!

    Wilco Menge
    dwergkees, May 8, 2006
    #5