Has anyone solved the problem of lists in WordML (Word 2003)?

Clifford W. Racz

Has anyone solved the issue of translating lists in Word 2003 (WordML)
into xHTML? I have been trying to get the nested table code for my XSLT
to work for a while now, with no way to get the collection that I need.

To begin, I am using the xsltproc that comes with Cygwin as my processor.
I have no particular affinity to this processor except that it is open
source and standards compliant. I don't like M$, but if using a M$
processing program will fix this transformation, then I will use it.
xsltproc can be gotten here (for Windows platform):
http://www.zlatkovic.com/libxml.en.html
ftp://ftp.zlatkovic.com/pub/libxml/
(This is a windows port of libxslt, that comes with GNOME).

The problem is this:

As those of you who have worked with this type of problem know, the WordML
structure is a flat structure where the focus is on visual formatting.
So, instead of a nicely nested list structure like HTML has, WordML has
a linear collection of w:p elements that contain a child <w:listPr>
element, containing the list information. A typical Word paragraph that
represents a list item is shown here:

<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/>
</w:listPr>
</w:pPr>
<w:r>
<w:t>Bulleted item 1</w:t>
</w:r>
</w:p>

The item <w:ilvl w:val="0"/> tells me that the level of nesting for this
item is "0", i.e. the first level (zero based counting).
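As an illustration only (in Python rather than XSLT, so the logic can be run on its own), here is a minimal sketch of reading that nesting level from a paragraph. The helper name `list_level` is hypothetical, and the sample paragraph is a trimmed copy of the one shown above.

```python
import xml.etree.ElementTree as ET

# WordML 2003 namespace, as declared in the stylesheet and sample document.
W = "http://schemas.microsoft.com/office/word/2003/wordml"

PARAGRAPH = f"""
<w:p xmlns:w="{W}">
  <w:pPr>
    <w:listPr>
      <w:ilvl w:val="0"/>
      <w:ilfo w:val="2"/>
    </w:listPr>
  </w:pPr>
  <w:r><w:t>Bulleted item 1</w:t></w:r>
</w:p>
"""


def list_level(paragraph):
    """Return the zero-based nesting level of a w:p, or None if it is not a list item."""
    ilvl = paragraph.find(f"{{{W}}}pPr/{{{W}}}listPr/{{{W}}}ilvl")
    if ilvl is None:
        return None
    return int(ilvl.get(f"{{{W}}}val"))


print(list_level(ET.fromstring(PARAGRAPH)))  # prints 0
```

A paragraph without a w:listPr child yields None, which is one way to tell ordinary paragraphs apart from list items.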

My model for processing this list was this: As I encounter the first
<w:p> that is a list item, represented by the xPath
match="w:p[descendant-or-self::w:pPr/w:listPr][1]", then I grab the
entire collection of following-sibling elements that are paragraphs with
listPr children. This is "grabbing the list". I call a template and
pass this list to the template.

The template itself is a recursive template. Whenever I encounter a
"transitional list item" (one that is at a level greater than the
current level being processed by the template), I want to grab the
sub-collection of list elements above my current level, enclose them in
<ol></ol> and then call the template again with the new collection.
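The recursive model described above can be sketched outside XSLT. This is a minimal Python illustration, not the stylesheet itself: it assumes each list paragraph has already been reduced to a (level, text) pair, and the function name `nest` is hypothetical. The key point is that the sub-collection handed to the recursive call is only the contiguous run of deeper items that directly follows the current item.

```python
def nest(items, level=0):
    """Group a flat list of (level, text) pairs into nested lists.

    Returns a list of [text, children] entries for the items at `level`.
    `children` is the nested result for the contiguous run of deeper
    items that directly follows that entry.
    Assumption: the first item of each run sits at the expected level;
    an item that jumps more than one level is treated as one level deeper.
    """
    result = []
    i = 0
    while i < len(items):
        _, text = items[i]
        # Find the end of the run of deeper items following this one.
        j = i + 1
        while j < len(items) and items[j][0] > level:
            j += 1
        # Recurse only on that run, never on later sublists.
        result.append([text, nest(items[i + 1:j], level + 1)])
        i = j
    return result


flat = [(0, "item 1"), (0, "item 2"), (1, "item 2-1"),
        (1, "item 2-2"), (0, "item 3")]
print(nest(flat))
```

Each [text, children] entry maps naturally onto an li element whose nested list, if any, sits inside it.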

So... what is my problem? Let us pretend that my list looks like this:

* Bulleted item 1
* Bulleted item 2
o First level nesting, bulleted item 2-1
o First level nesting, bulleted item 2-2
* Bulleted item 3
* Bulleted item 4
o First level nesting, bulleted item 4-1
o First level nesting, bulleted item 4-2
o First level nesting, bulleted item 4-3
o First level nesting, bulleted item 4-4
* Bulleted item 5
* Bulleted item 6


When I am processing level 0, I don't have any issues until I grab the
items on level 1. When I do, I not only get the items 2-1 and 2-2, but
also 4-1, 4-2, 4-3, and 4-4. I have tried tweaking the xPath for this
list, but to no avail. My output looks like this with my method:

* Bulleted item 1
* Bulleted item 2
o First level nesting, bulleted item 2-1
o First level nesting, bulleted item 2-2
o First level nesting, bulleted item 4-1
o First level nesting, bulleted item 4-2
o First level nesting, bulleted item 4-3
o First level nesting, bulleted item 4-4
* Bulleted item 3
* Bulleted item 4
o First level nesting, bulleted item 4-1
o First level nesting, bulleted item 4-2
o First level nesting, bulleted item 4-3
o First level nesting, bulleted item 4-4
* Bulleted item 5
* Bulleted item 6
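The over-selection can be reproduced with the list levels alone. The following Python sketch (an illustration with hypothetical function names, not part of the stylesheet) compares "every later item deeper than the current level", which is what a following-sibling test on the level selects, with "the run of deeper items up to the first item back at or below the current level".

```python
# ilvl of each of the twelve list paragraphs above, in document order.
LEVELS = [0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]


def all_deeper_following(levels, start, current_level):
    """Indexes of `start` plus every later item deeper than current_level."""
    return [start] + [i for i in range(start + 1, len(levels))
                      if levels[i] > current_level]


def contiguous_deeper_run(levels, start, current_level):
    """Indexes of `start` plus the deeper items that directly follow it."""
    run = [start]
    for i in range(start + 1, len(levels)):
        if levels[i] <= current_level:
            break  # back at the parent level: the sublist has ended
        run.append(i)
    return run


# Index 2 is item 2-1; the level being processed is 0.
print(all_deeper_following(LEVELS, 2, 0))   # prints [2, 3, 6, 7, 8, 9]
print(contiguous_deeper_run(LEVELS, 2, 0))  # prints [2, 3]
```

The first result includes indexes 6 to 9, which are items 4-1 to 4-4; the second stops at item 3.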

The following 2 items are the stripped down WordML and stripped down
XSLT for this transformation, to make this posting not insanely long.
If anyone can contribute to this problem or has already solved it, I
would be most grateful for feedback.

Cliff

********************************************************************************
XSLT for processing the WordML
********************************************************************************
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY tab "&#x9;">
<!ENTITY sp "&#x20;">
<!ENTITY crlf "&#xD;&#xA;">
<!ENTITY nbsp "&#xA0;">
<!ENTITY bullet "&#x2022;">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core"
xmlns:aml="http://schemas.microsoft.com/aml/2001/core"
xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
xmlns:st1="urn:schemas-microsoft-com:office:smarttags" version="1.0"
exclude-result-prefixes="w v w10 sl aml wx o dt st1">
<!-- START stylesheet commands -->
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"
doctype-system="fubar.dtd" />
<xsl:strip-space elements="*" />
<xsl:preserve-space elements="w:binData w:tab" />
<!-- End stylesheet commands -->
<!-- START variable declarations -->
<!-- null value for text comparisons -->
<xsl:variable name="null"></xsl:variable>
<!-- single space for text comparisons -->
<xsl:variable name="space">&sp;</xsl:variable>
<!-- bullet character for text comparisons -->
<xsl:variable name="bullet">·</xsl:variable>
<!-- END variable declarations -->
<!-- START template declarations -->

<xsl:template match="/w:wordDocument">
<html>
<!-- Process the head information -->
<xsl:apply-templates select="//o:DocumentProperties" mode="head" />
<!-- Process the body information -->
<xsl:apply-templates select="//w:body" mode="body" />
</html>
</xsl:template>

<xsl:template match="w:body" mode="body">
<body>
<xsl:apply-templates select="*" mode="body" />
</body>
</xsl:template>

<xsl:template match="wx:sect" mode="body">
<xsl:apply-templates mode="body" />
</xsl:template>

<xsl:template match="wx:sub-section" mode="body">
<xsl:apply-templates mode="body" />
</xsl:template>

<xsl:template match="w:p[descendant-or-self::w:pPr/w:listPr][1]"
mode="body">
<!-- <xsl:comment> w:p[1] template match found... </xsl:comment> -->
<xsl:call-template name="listProcessor" mode="list">
<xsl:with-param name="myCollectionOfSiblingListItems"
select=".|following-sibling::w:p[descendant-or-self::w:pPr/w:listPr]" />
</xsl:call-template>
</xsl:template>

<xsl:template name="listProcessor" mode="list">
<xsl:param name="myCollectionOfSiblingListItems" />

<xsl:variable name="myCurrentListLevel"
select="$myCollectionOfSiblingListItems[1]/w:pPr/w:listPr/w:ilvl/@w:val" />
<ul>
<xsl:for-each select="$myCollectionOfSiblingListItems">

<xsl:variable name="previousSiblingListLevel"
select="preceding-sibling::w:p[position() =
1]/w:pPr/w:listPr/w:ilvl/@w:val" />
<xsl:variable name="myOwnCurrentListLevel"
select="descendant-or-self::w:p/w:pPr/w:listPr/w:ilvl/@w:val" />
<xsl:variable name="nextSiblingListLevel"
select="following-sibling::w:p[position() =
1]/w:pPr/w:listPr/w:ilvl/@w:val" />

<xsl:variable name="attempToGetTheRightSetIntoAVariable"
select="following-sibling::w:p[child::w:pPr/w:listPr/w:ilvl/@w:val][generate-id(preceding-sibling::w:p[child::w:pPr/w:listPr/w:ilvl/@w:val
= 0]) = generate-id(current())]" />

<!-- <xsl:comment> current contents: <xsl:value-of
select="current()" /><xsl:text> </xsl:text></xsl:comment> -->
<!-- <xsl:comment><xsl:text> *****Found a collection of this many
items: </xsl:text><xsl:value-of
select="count($attempToGetTheRightSetIntoAVariable)" /><xsl:text>
</xsl:text></xsl:comment> -->

<xsl:choose>
<xsl:when
test="number(descendant-or-self::w:pPr/w:listPr/w:ilvl/@w:val) =
number($myCurrentListLevel)">
<li>
<xsl:call-template name="processParagraphAsListItemContents"
mode="list" />
</li>
</xsl:when>
<xsl:when test="(
number(descendant-or-self::w:pPr/w:listPr/w:ilvl/@w:val) &gt;
number($myCurrentListLevel) ) and (
number(descendant-or-self::w:pPr/w:listPr/w:ilvl/@w:val) &gt;
number($previousSiblingListLevel))">
<xsl:variable name="nextListItemIndexOnOrBelowMyLevel"
select="following-sibling::w:p[w:pPr/w:listPr/w:ilvl/@w:val &lt;=
number($myCurrentListLevel)]" />
<xsl:variable name="subCollection"
select=".|following-sibling::w:p[descendant-or-self::w:pPr/w:listPr/w:ilvl/@w:val
&gt; number($myCurrentListLevel)]"></xsl:variable>

<!-- <xsl:comment> My current list level for recursive call:
<xsl:value-of select="number($myCurrentListLevel)" /> , with current
contents: <xsl:value-of select="." /><xsl:text>
</xsl:text></xsl:comment> -->
<li>
<xsl:call-template name="listProcessor" mode="list" >
<xsl:with-param name="myCollectionOfSiblingListItems"
select="$subCollection" />
</xsl:call-template>
</li>

</xsl:when>
<xsl:otherwise>
<!-- Do nothing! -->
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</ul>
</xsl:template>

<xsl:template name="processParagraphAsListItemContents" mode="list">
<xsl:if test="descendant-or-self::text()">
<xsl:apply-templates mode="body" />
</xsl:if>
</xsl:template>

<xsl:template match="w:t" mode="body">
<xsl:value-of select="." />
</xsl:template>

<xsl:template match="w:r|w:b|w:u|w:i" mode="body">
<xsl:apply-templates mode="body" />
</xsl:template>

<xsl:template match="*" mode="body">
<!-- Do nothing... drop content here... -->
</xsl:template>
<!-- END template declarations -->
</xsl:stylesheet>


********************************************************************************
Sample stripped down WordML
********************************************************************************
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core"
xmlns:aml="http://schemas.microsoft.com/aml/2001/core"
xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no"
xml:space="preserve">
<w:body>
<wx:sect>
<wx:sub-section>
<w:p>
<w:pPr>
<w:pStyle w:val="Heading1"/></w:pPr>
<w:r>
<w:t>Test #9</w:t></w:r></w:p>
<w:p>
<w:r>
<w:t>Here is a bulleted test list with 2 levels deep
nesting:</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 1</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 2</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 2-1</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 2-2</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 3</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 4</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 4-1</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 4-2</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 4-3</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 4-4</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 5</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 6</w:t></w:r></w:p>
<w:p>
<w:r>
<w:t>Here is some following text...</w:t></w:r></w:p></wx:sub-section>
<wx:sub-section>
<w:p>
<w:pPr>
<w:pStyle w:val="Heading1"/></w:pPr>
<w:r>
<w:t>Test #10</w:t></w:r></w:p>
<w:p>
<w:r>
<w:t>Here is another bulleted test list for testing
purposes:</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Another list entirely, Bulleted item 1</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Another list entirely, Bulleted item 2</w:t></w:r></w:p>
<w:p/>
<w:sectPr>
<w:pgSz w:w="12240" w:h="15840"/>
<w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800"
w:header="720" w:footer="720" w:gutter="0"/>
<w:cols w:space="720"/>
<w:docGrid
w:line-pitch="360"/></w:sectPr></wx:sub-section></wx:sect></w:body></w:wordDocument>
 
Ben Edgington

Clifford W. Racz said:
The following 2 items are the stripped down WordML and stripped down
XSLT for this transformation, to make this posting not insanely
long. If anyone can contribute to this problem or has already solved
it, I would be most grateful for feedback.

I don't think you are going to be able to solve this by tinkering with
the XPath - the XML just doesn't have enough structure. The problem
is that you need to track transitions between list levels, and they
are not accessible with XPath in this flat XML structure.

Here's a radically simplified version that simulates your problem that
you should be able to adapt to your code easily enough.

It again uses a recursive template to keep track of the list level,
but now the list items are considered sequentially rather than in
groups of the same level. Using the recursion we can detect when the
list level changes and insert some markup accordingly (this has to be
done "by hand" using disable-output-escaping, which is ugly. But it
works).

[Note by the way: you can't use the mode attribute on
xsl:call-template, but that's not the problem here]

This transformation

- - -
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>

<xsl:template match="list">
<ul>
<xsl:call-template name="process-list-items">
<xsl:with-param name="item-number" select="1"/>
<xsl:with-param name="level" select="0"/>
</xsl:call-template>
</ul>
</xsl:template>

<xsl:template name="process-list-items">
<xsl:param name="item-number"/>
<xsl:param name="level"/>

<xsl:variable name="current-item" select="./item[$item-number]"/>

<!-- If the list level has increased we start a sublist -->
<xsl:if test="$level &lt; $current-item/level/@val">
<xsl:text disable-output-escaping="yes">&lt;ul></xsl:text>
</xsl:if>

<!-- If the list level has decreased we end the sublist -->
<xsl:if test="$level &gt; $current-item/level/@val">
<xsl:text disable-output-escaping="yes">&lt;/ul></xsl:text>
</xsl:if>

<!-- Output the list item -->
<li><xsl:value-of select="$current-item/text"/></li>

<!-- Process the next list item -->
<xsl:if test="./item[$item-number+1]">
<xsl:call-template name="process-list-items">
<xsl:with-param name="item-number" select="$item-number+1"/>
<xsl:with-param name="level" select="$current-item/level/@val"/>
</xsl:call-template>
</xsl:if>
</xsl:call-template>
</xsl:if>

</xsl:template>

</xsl:stylesheet>
- - -

with this XML

- - -
<list>
<item>
<level val="0"/>
<text>Item 1</text>
</item>
<item>
<level val="0"/>
<text>Item 2</text>
</item>
<item>
<level val="1"/>
<text>Item 2-1</text>
</item>
<item>
<level val="1"/>
<text>Item 2-2</text>
</item>
<item>
<level val="0"/>
<text>Item 3</text>
</item>
<item>
<level val="0"/>
<text>Item 4</text>
</item>
<item>
<level val="1"/>
<text>Item 4-1</text>
</item>
<item>
<level val="1"/>
<text>Item 4-2</text>
</item>
<item>
<level val="0"/>
<text>Item 5</text>
</item>
</list>
- - -

gives this output (after reformatting)

- - -
<?xml version="1.0"?>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<ul>
<li>Item 2-1</li>
<li>Item 2-2</li>
</ul>
<li>Item 3</li>
<li>Item 4</li>
<ul>
<li>Item 4-1</li>
<li>Item 4-2</li>
</ul>
<li>Item 5</li>
</ul>
- - -
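The same sequential idea can be sketched in Python, purely as an illustration of the level-tracking logic and not as a replacement for the stylesheet. This variant differs from the transformation above in three assumed ways: it places each sublist inside the preceding li, it closes any lists still open when the input ends at a nested level, and it clamps a jump of more than one level to a single level. The function name `render` is hypothetical.

```python
from html import escape


def render(items):
    """Render (level, text) pairs as nested <ul> markup in one pass."""
    out = []
    depth = -1  # level of the innermost open <ul>; -1 means none is open
    for level, text in items:
        # Assumption: a jump of more than one level opens only one sublist.
        level = min(level, depth + 1)
        if level > depth:
            # The level has increased: start a sublist inside the open <li>.
            out.append("<ul>")
            depth = level
        else:
            # Same or lower level: close the previous item first,
            # then close every sublist deeper than this item.
            out.append("</li>")
            while depth > level:
                out.append("</ul></li>")
                depth -= 1
        out.append("<li>" + escape(text))
    # Close whatever is still open at the end of the input.
    while depth >= 0:
        out.append("</li></ul>")
        depth -= 1
    return "".join(out)


print(render([(0, "Item 1"), (0, "Item 2"), (1, "Item 2-1"), (0, "Item 3")]))
```

Because opening and closing tags are emitted as plain strings here, the balance of the markup depends entirely on the depth counter, which is the same bookkeeping the recursive template parameter carries in the XSLT version.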
 
Oleg Tkachenko [MVP]

Clifford said:
Has anyone solved the issue of translating lists in Word 2003 (WordML)
into xHTML? I have been trying to get the nested table code for my XSLT
to work for a while now, with no way to get the collection that I need.

You may want to download Microsoft's WordML viewer and take a look at
their XSLT stylesheet.
 
Clifford W. Racz

I have looked at the M$ WordML viewer... of course, that was one of the first things I did.

What that thing spits out is a paragraph that is styled to sort-of look like a list item, just as Word handles it in WordML.

For example, here is the first list item when transformed by the word2html.xsl:

<p class="Normal-P" style="margin-left:36pt;text-indent:-18pt;">
<span class="Normal-H"><span style="font-family:Symbol;font-style:normal;text-decoration:none;font-weight:normal;">·<span style="padding-left:12pt;"></span></span>Bulleted item 1</span>
</p>

And so, that is useless when trying to export this to HTML. Word internally handles it differently, because the "Save as..." HTML option does export it properly. However, I do not want HTML output, only XML that is compatible with the HTML list model.

Clifford
 
Clifford W. Racz

I am trying to write a simple XSLT to "beautify" any arbitrary xml, i.e. to indent it for readability and convert it to UTF-8 for use in some scripts that I authored.

If I use something like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" encoding="UTF-8" />
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>

It does the trick nicely. I want to accept any arbitrary xml language, so I don't specify a default namespace. Not a problem.

Problem: I want to output the proper DOCTYPE statement for the input file, so that I can validate it.

So, does anyone know a way to access the public and system dtd names for an input filetype?


Clifford
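For context on the question above: the XPath 1.0 data model that XSLT 1.0 works on has no node for the document type declaration, so the public and system identifiers are not visible from inside a stylesheet. One option is to read them in a separate step before the transformation. The following is a minimal sketch in Python using the standard library's xml.dom.minidom; the helper name `doctype_ids` and the sample document are assumptions made for illustration.

```python
from xml.dom import minidom

# Hypothetical sample input carrying an XHTML 1.0 Strict DOCTYPE.
SAMPLE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml"/>'
)


def doctype_ids(xml_text):
    """Return (name, public_id, system_id) of the DOCTYPE, or None if absent."""
    doctype = minidom.parseString(xml_text).doctype
    if doctype is None:
        return None
    return doctype.name, doctype.publicId, doctype.systemId


print(doctype_ids(SAMPLE))
```

The identifiers found this way would still have to be written into the output by the calling script, since the doctype-public and doctype-system attributes of xsl:output are fixed when the stylesheet is written.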
 
