Re: Problems creating an automatic index for XHTML with XSLT

Discussion in 'XML' started by Marrow, Sep 11, 2003.

  1. Marrow

    Marrow Guest

    Hi Alex,

    It seems that you want to structure your flat <H?> elements into a
    hierarchical item list. Something like the following stylesheet will give
    you the output you wanted...

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <!-- key for grouping H? elements by their parent H? element -->
    <xsl:key name="kHGroups" match="H2"
    use="concat('1|',generate-id(preceding-sibling::H1[1]))"/>
    <xsl:key name="kHGroups" match="H3"
    use="concat('2|',generate-id(preceding-sibling::H2[1]))"/>
    <xsl:key name="kHGroups" match="H4"
    use="concat('3|',generate-id(preceding-sibling::H3[1]))"/>
    <xsl:key name="kHGroups" match="H5"
    use="concat('4|',generate-id(preceding-sibling::H4[1]))"/>
    <xsl:key name="kHGroups" match="H6"
    use="concat('5|',generate-id(preceding-sibling::H5[1]))"/>
    <xsl:template match="HTML">
    <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates/>
    </xsl:copy>
    </xsl:template>

    <xsl:template match="BODY">
    <xsl:copy>
    <xsl:copy-of select="@*"/>
    <H1>Index</H1>
    <OL>
    <xsl:apply-templates select="H1" mode="struc"/>
    </OL>
    <xsl:apply-templates select="H1 | H2 | H3 | H4 | H5 | H6"/>
    </xsl:copy>
    </xsl:template>

    <!-- template for structuring H? elements into item lists -->
    <xsl:template match="H1 | H2 | H3 | H4 | H5 | H6" mode="struc">
    <LI>
    <a href="#{.}">
    <xsl:value-of select="."/>
    </a>
    <!-- get the children of this H -->
    <xsl:variable name="h-children"
    select="key('kHGroups',concat(substring(name(),2,1),'|',generate-id()))"/>
    <xsl:if test="$h-children">
    <OL>
    <xsl:apply-templates select="$h-children" mode="struc"/>
    </OL>
    </xsl:if>
    </LI>
    </xsl:template>

    <!-- template for listing H? elements as <a> links -->
    <xsl:template match="H1 | H2 | H3 | H4 | H5 | H6">
    <a name="{.}">
    <xsl:copy-of select="."/>
    </a>
    </xsl:template>

    </xsl:stylesheet>

    Hope this helps
    Marrow
    http://www.marrowsoft.com - home of Xselerator (XSLT IDE and debugger)
    http://www.topxml.com/Xselerator


    "Alex Geller" <> wrote in message
    news:-ig.de...
    > Hi,
    > I am trying to add an index at the front of an XHTML document using XSLT.
    > The style should find the H1, H2 and H3 elements and build some sort of
    > index from them (currently I use nested OL/LI).
    > Example:
    > $cat test.html
    > <HTML>
    > <BODY>
    > <H1>H1 1</H1>
    > <H2>H2 1.1</H2>
    > <H3>H3 1.1.1</H3>
    > <H3>H3 1.1.2</H3>
    > <H3>H3 1.1.3</H3>
    > <H2>H2 1.2</H2>
    > <H3>H3 1.2.1</H3>
    > <H3>H3 1.2.2</H3>
    > <H3>H3 1.2.3</H3>
    > <H1>H1 2</H1>
    > <H2>H2 2.1</H2>
    > <H3>H3 2.1.1</H3>
    > <H3>H3 2.1.2</H3>
    > <H3>H3 2.1.3</H3>
    > <H2>H2 2.2</H2>
    > <H3>H3 2.2.1</H3>
    > <H3>H3 2.2.2</H3>
    > <H3>H3 2.2.3</H3>
    > </BODY>
    > </HTML>
    > $xalan -in test.html -xsl mkindex.xslt -out result.html
    > $cat result.html
    > <?xml version="1.0" encoding="UTF-8"?>
    > <HTML>
    > <BODY>
    > <H1>Index</H1>
    > <OL>
    > <LI><a href="#H1 1">H1 1</a>
    > <OL>
    > <LI><a href="#H2 1.1">H2 1.1</a>
    > <OL>
    > <LI><a href="#H3 1.1.1">H3 1.1.1</a></LI>
    > <LI><a href="#H3 1.1.2">H3 1.1.2</a>
    > </LI><LI><a href="#H3 1.1.3">H3 1.1.3</a></LI>
    > </OL>
    > </LI>
    > <LI><a href="#H2 1.2">H2 1.2</a>
    > <OL>
    > <LI><a href="#H3 1.2.1">H3 1.2.1</a></LI>
    > <LI><a href="#H3 1.2.2">H3 1.2.2</a></LI>
    > <LI><a href="#H3 1.2.3">H3 1.2.3</a></LI>
    > </OL>
    > </LI>
    > </OL>
    > </LI>
    > <LI><a href="#H1 2">H1 2</a>
    > <OL>
    > <LI><a href="#H2 2.1">H2 2.1</a>
    > <OL>
    > <LI><a href="#H3 2.1.1">H3 2.1.1</a></LI>
    > <LI><a href="#H3 2.1.2">H3 2.1.2</a>
    > </LI><LI><a href="#H3 2.1.3">H3 2.1.3</a></LI>
    > </OL>
    > </LI>
    > <LI><a href="#H2 2.2">H2 2.2</a>
    > <OL>
    > <LI><a href="#H3 2.2.1">H3 2.2.1</a></LI>
    > <LI><a href="#H3 2.2.2">H3 2.2.2</a></LI>
    > <LI><a href="#H3 2.2.3">H3 2.2.3</a></LI>
    > </OL>
    > </LI>
    > </OL>
    > </LI>
    > </OL>
    > <a name="H1 1"><H1>H1 1</H1></a>
    > <a name="H2 1.1"><H2>H2 1.1</H2></a>
    > <a name="H3 1.1.1"><H3>H3 1.1.1</H3></a>
    > <a name="H3 1.1.2"><H3>H3 1.1.2</H3></a>
    > <a name="H3 1.1.3"><H3>H3 1.1.3</H3></a>
    > <a name="H2 1.2"><H2>H2 1.2</H2></a>
    > <a name="H3 1.2.1"><H3>H3 1.2.1</H3></a>
    > <a name="H3 1.2.2"><H3>H3 1.2.2</H3></a>
    > <a name="H3 1.2.3"><H3>H3 1.2.3</H3></a>
    > <a name="H1 2"><H1>H1 2</H1></a>
    > <a name="H2 2.1"><H2>H2 2.1</H2></a>
    > <a name="H3 2.1.1"><H3>H3 2.1.1</H3></a>
    > <a name="H3 2.1.2"><H3>H3 2.1.2</H3></a>
    > <a name="H3 2.1.3"><H3>H3 2.1.3</H3></a>
    > <a name="H2 2.2"><H2>H2 2.2</H2></a>
    > <a name="H3 2.2.1"><H3>H3 2.2.1</H3></a>
    > <a name="H3 2.2.2"><H3>H3 2.2.2</H3></a>
    > <a name="H3 2.2.3"><H3>H3 2.2.3</H3></a>
    > </BODY>
    > </HTML>
    > I have found a solution that works for me but which is not very good. Maybe
    > it's helpful as a starting point, or just to prove that I have tried a
    > little before posting.
    > The solution has at least the following problems:
    > - In order to detect all H2 siblings between two H1 elements H1a and
    > H1b, I create a node list of all siblings following H1a and then, with a
    > conditional, check whether the previous H1 sibling of the current H2 is H1a.
    > The check is done by comparing the text of the H1 nodes, so it breaks as
    > soon as two H1s have the same text.
    > - The template fails as soon as the Hn tags are not all siblings in the same
    > list. Suppose we introduce a <div> in the document so that one or more Hn
    > elements become descendants of this element; then my scheme breaks.
    > $cat mkindex.xslt
    > <?xml version="1.0" encoding="ISO-8859-1"?>
    > <xsl:stylesheet
    > xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    > version="1.0">
    > <xsl:output method="xml"/>
    >
    > <xsl:template match="@*|node()">
    > <xsl:copy>
    > <xsl:apply-templates select="@*|node()"/>
    > </xsl:copy>
    > </xsl:template>
    > <xsl:template match="BODY">
    > <BODY>
    > <H1>Index</H1>
    > <OL>
    > <xsl:for-each select="H1">
    > <xsl:variable name="h1text" select="text()"/>
    > <LI>
    > <a href="#{text()}">
    > <xsl:value-of select="text()"/>
    > </a>
    > <OL>
    > <xsl:for-each select="following-sibling::H2">
    > <xsl:if
    > test="preceding-sibling::H1[position()=1]/text()=$h1text">
    > <xsl:variable name="h2text"
    > select="text()"/>
    > <LI>
    > <a href="#{text()}">
    > <xsl:value-of
    > select="text()"/>
    > </a>
    > <OL>
    > <xsl:for-each
    > select="following-sibling::H3">
    > <xsl:if
    > test="preceding-sibling::H2[position()=1]/text()=$h2text">
    > <LI>
    > <a
    > href="#{text()}">
    > <xsl:value-of
    > select="text()"/>
    > </a>
    > </LI>
    > </xsl:if>
    > </xsl:for-each>
    > </OL>
    > </LI>
    > </xsl:if>
    > </xsl:for-each>
    > </OL>
    > </LI>
    > </xsl:for-each>
    > </OL>
    > <xsl:apply-templates/>
    > </BODY>
    > </xsl:template>
    > <xsl:template match="H1">
    > <a name="{text()}">
    > <H1>
    > <xsl:apply-templates/>
    > </H1>
    > </a>
    > </xsl:template>
    > <xsl:template match="H2">
    > <a name="{text()}">
    > <H2>
    > <xsl:apply-templates/>
    > </H2>
    > </a>
    > </xsl:template>
    > <xsl:template match="H3">
    > <a name="{text()}">
    > <H3>
    > <xsl:apply-templates/>
    > </H3>
    > </a>
    > </xsl:template>
    > </xsl:stylesheet>
    > Thank you for your help
    > Regards,
    > Alex
    Marrow, Sep 11, 2003
    #1

  2. Alex Geller

    Alex Geller Guest

    Hi Marrow,
    thank you for your help.
    Marrow wrote:

    > Hi Alex,
    >
    > It seems that you want to structure your flat <H?> elements into a
    > hierarchical item list.

    Exactly
    > Something like the following stylesheet will give
    > you the output you wanted...
    >
    > <?xml version="1.0"?>
    > <xsl:stylesheet version="1.0"
    > xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    >..

    Your style works for the test case, and it seems that you have solved the
    first problem (comparing the heading text). The second problem, as I pointed
    out, is that the template should work for almost arbitrary HTML documents,
    where the Hns are not necessarily siblings but may be partially enclosed in,
    for example, a <div>.
    Consider the following fragment:
    <H1>H1 1</H1>
    <div class="examples">
    <H2>H2 1.1</H2>
    <H2>H2 1.2</H2>
    </div>
    <H1>H1 2...
    In this case both my style and yours fail to see the two nested H2s.
    Your style has another problem: it does not copy the rest of the document
    content. My first naive attempt at adding a general copy template didn't work.
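    To make the sibling problem concrete, here is a small Python sketch (not XSLT) using the standard-library xml.etree.ElementTree. The fragment above is truncated, so the sketch assumes it is closed with an invented `<H1>H1 2</H1>` and wrapped in BODY so that it parses. It contrasts looking only at BODY's direct children, roughly what the preceding-sibling based keys see, with a pre-order walk of the whole subtree, where the H2s inside the div do show up.

```python
import xml.etree.ElementTree as ET

# Alex's fragment; the closing "<H1>H1 2</H1>" and the BODY wrapper are
# assumptions added so the sample is well-formed.
FRAGMENT = """<BODY>
<H1>H1 1</H1>
<div class="examples">
<H2>H2 1.1</H2>
<H2>H2 1.2</H2>
</div>
<H1>H1 2</H1>
</BODY>"""

HEADING_TAGS = {"H1", "H2", "H3", "H4", "H5", "H6"}

body = ET.fromstring(FRAGMENT)

# Sibling-only view: iterating an Element yields just its direct children,
# so headings nested inside the div are never reached.
siblings = [el.text for el in body if el.tag in HEADING_TAGS]

# Document-order view: iter() walks the whole subtree in pre-order.
document_order = [el.text for el in body.iter() if el.tag in HEADING_TAGS]

print(siblings)
print(document_order)
```

    The first list contains only the two H1 texts; the second also contains "H2 1.1" and "H2 1.2", in document order.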

    Thank you anyway; I will study your style and try to understand how it
    works.

    Regards,
    Alex
    Alex Geller, Sep 11, 2003
    #2

  3. Dimitre Novatchev

    "Alex Geller" <> wrote in message news:-ig.de...
    > Hi Dimitre,
    > Dimitre Novatchev wrote:
    >
    > > Excuse me, but it is not clear what exactly you want to produce from your
    > > source xhtml -- how the output is related to the input (they seem
    > > essentially to have the same structure),

    > Well, not quite. The input H?s have a flat structure (siblings), while in the
    > output they are nested (Hn+1 elements become descendants of Hn). Maybe you were
    > fooled by the indentation of the input HTML.
    > >how the output must be structured

    > Exactly as shown (best is, you view both the input and the output in a
    > browser).
    > > and what requirements it must satisfy.

    >
    > >
    > > In other words, can you define what you mean by "index"?

    >
    > I want to automatically generate a table of contents at the front of an
    > arbitrary HTML document in which chapters and subchapters are denoted by H?
    > tags. The style should search for these tags in the document, create the
    > table of contents from them and then copy the document itself. The
    > items in the table of contents should be linked via <a href=".."> to their
    > respective chapters in the document. The table of contents should have a
    > hierarchical structure using numbered lists, as shown in the example. The
    > rules for the structure could be defined as follows:
    > Let v be the sequence of all H? elements found in a pre-order traversal of
    > the document tree.
    > For example:
    > v=H1,H2,H2,H3,H2,H3,H1,H1,H2,H3,H2,H3
    > We call the n of an Hn element its hierarchy value.
    > Create a result tree r that contains all nodes from the source vector v.
    > In r, every node v[i] becomes a child of the nearest preceding node v[j]
    > (j < i) whose hierarchy value is lower than that of v[i]; if there is no
    > such node, v[i] is a top-level node of r:
    > r=H1(H2,H2(H3),H2(H3)),H1,H1(H2(H3),H2(H3))
    >
    > Thank you,
    > Alex
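    Alex's rule amounts to a single pass over v with a stack of the currently open headings. The following Python sketch is an illustration rather than XSLT; the helper names `nest` and `show` are made up, and each heading is represented only by its level (1 for H1, 2 for H2, ...). It reproduces the tree r for the example vector v.

```python
def nest(levels):
    """Nest a flat, pre-order list of heading levels (1 for H1, 2 for H2, ...).

    Each node is a (level, children) tuple. A heading becomes a child of the
    nearest preceding heading with a lower level; if there is none, it is a root.
    """
    roots = []
    stack = []  # currently open headings; their levels strictly increase
    for level in levels:
        node = (level, [])
        # close every open heading that is not above this one in the hierarchy
        while stack and stack[-1][0] >= level:
            stack.pop()
        parent_children = stack[-1][1] if stack else roots
        parent_children.append(node)
        stack.append(node)
    return roots


def show(nodes):
    """Render nodes in the H1(H2,...) notation used in the rule above."""
    return ",".join(
        f"H{level}" + (f"({show(children)})" if children else "")
        for level, children in nodes
    )


v = [1, 2, 2, 3, 2, 3, 1, 1, 2, 3, 2, 3]
print(show(nest(v)))  # H1(H2,H2(H3),H2(H3)),H1,H1(H2(H3),H2(H3))
```

    One design consequence: for a skipped level such as an H3 directly after an H1, this sketch makes the H3 a child of the H1, whereas the key-based stylesheets in this thread look only for the nearest H2 and therefore leave such an H3 out of the index.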


    Hi Alex,

    The following transformation implements all your requirements,
    including the one that the Hx elements need not be siblings but
    may be children of other elements, e.g. div.

    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:key name="kChildren"
    match="H2"
    use="generate-id(
    (ancestor::H1 | preceding::H1)
    [last()]
    )"/>
    <xsl:key name="kChildren"
    match="H3"
    use="generate-id(
    (ancestor::H2 | preceding::H2)
    [last()]
    )"/>
    <xsl:key name="kChildren"
    match="H4"
    use="generate-id(
    (ancestor::H3 | preceding::H3)
    [last()]
    )"/>
    <xsl:key name="kChildren"
    match="H5"
    use="generate-id(
    (ancestor::H4 | preceding::H4)
    [last()]
    )"/>
    <xsl:key name="kChildren"
    match="H6"
    use="generate-id(
    (ancestor::H5 | preceding::H5)
    [last()]
    )"/>
    <xsl:template match="/">
    <html>
    <xsl:apply-templates select="/*/*/H1" mode="TOC"/>
    <xsl:apply-templates/>
    </html>
    </xsl:template>

    <xsl:template match="H1 | H2 | H3 | H4 | H5 | H6"
    mode="TOC">
    <LI><a href="#{.}"><xsl:value-of select="."/></a>
    <OL>
    <xsl:apply-templates
    select="key('kChildren', generate-id())"
    mode="TOC"/>
    </OL>
    </LI>
    </xsl:template>

    <xsl:template match="@* | node()">
    <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
    </xsl:template>

    <xsl:template match="H1 | H2 | H3 | H4 | H5 | H6">
    <a name="{.}"/>
    <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
    </xsl:template>
    </xsl:stylesheet>

    When applied to this source.xhtml:

    <HTML>
    <BODY>
    <H1>H1 1</H1>
    <H2>H2 1.1</H2>
    <div>
    <H3>H3 1.1.1</H3>
    <H3>H3 1.1.2</H3>
    </div>
    <H3>H3 1.1.3</H3>
    <H2>H2 1.2</H2>
    <H3>H3 1.2.1</H3>
    <H3>H3 1.2.2</H3>
    <H3>H3 1.2.3</H3>
    <H1>H1 2</H1>
    <H2>H2 2.1</H2>
    <H3>H3 2.1.1</H3>
    <H3>H3 2.1.2</H3>
    <H3>H3 2.1.3</H3>
    <H2>H2 2.2</H2>
    <H3>H3 2.2.1</H3>
    <H3>H3 2.2.2</H3>
    <H3>H3 2.2.3</H3>
    </BODY>
    </HTML>

    the wanted result is produced:

    <html>
    <LI><a href="#H1 1">H1 1</a><OL>
    <LI><a href="#H2 1.1">H2 1.1</a><OL>
    <LI><a href="#H3 1.1.1">H3 1.1.1</a><OL></OL>
    </LI>
    <LI><a href="#H3 1.1.2">H3 1.1.2</a><OL></OL>
    </LI>
    <LI><a href="#H3 1.1.3">H3 1.1.3</a><OL></OL>
    </LI>
    </OL>
    </LI>
    <LI><a href="#H2 1.2">H2 1.2</a><OL>
    <LI><a href="#H3 1.2.1">H3 1.2.1</a><OL></OL>
    </LI>
    <LI><a href="#H3 1.2.2">H3 1.2.2</a><OL></OL>
    </LI>
    <LI><a href="#H3 1.2.3">H3 1.2.3</a><OL></OL>
    </LI>
    </OL>
    </LI>
    </OL>
    </LI>
    <LI><a href="#H1 2">H1 2</a><OL>
    <LI><a href="#H2 2.1">H2 2.1</a><OL>
    <LI><a href="#H3 2.1.1">H3 2.1.1</a><OL></OL>
    </LI>
    <LI><a href="#H3 2.1.2">H3 2.1.2</a><OL></OL>
    </LI>
    <LI><a href="#H3 2.1.3">H3 2.1.3</a><OL></OL>
    </LI>
    </OL>
    </LI>
    <LI><a href="#H2 2.2">H2 2.2</a><OL>
    <LI><a href="#H3 2.2.1">H3 2.2.1</a><OL></OL>
    </LI>
    <LI><a href="#H3 2.2.2">H3 2.2.2</a><OL></OL>
    </LI>
    <LI><a href="#H3 2.2.3">H3 2.2.3</a><OL></OL>
    </LI>
    </OL>
    </LI>
    </OL>
    </LI>
    <HTML>

    <BODY>
    <a name="H1 1"></a><H1>H1 1</H1>
    <a name="H2 1.1"></a><H2>H2 1.1</H2>

    <div>
    <a name="H3 1.1.1"></a><H3>H3 1.1.1</H3>
    <a name="H3 1.1.2"></a><H3>H3 1.1.2</H3>

    </div>
    <a name="H3 1.1.3"></a><H3>H3 1.1.3</H3>
    <a name="H2 1.2"></a><H2>H2 1.2</H2>
    <a name="H3 1.2.1"></a><H3>H3 1.2.1</H3>
    <a name="H3 1.2.2"></a><H3>H3 1.2.2</H3>
    <a name="H3 1.2.3"></a><H3>H3 1.2.3</H3>
    <a name="H1 2"></a><H1>H1 2</H1>
    <a name="H2 2.1"></a><H2>H2 2.1</H2>
    <a name="H3 2.1.1"></a><H3>H3 2.1.1</H3>
    <a name="H3 2.1.2"></a><H3>H3 2.1.2</H3>
    <a name="H3 2.1.3"></a><H3>H3 2.1.3</H3>
    <a name="H2 2.2"></a><H2>H2 2.2</H2>
    <a name="H3 2.2.1"></a><H3>H3 2.2.1</H3>
    <a name="H3 2.2.2"></a><H3>H3 2.2.2</H3>
    <a name="H3 2.2.3"></a><H3>H3 2.2.3</H3>

    </BODY>

    </HTML>
    </html>
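    For readers who want to check the kChildren key logic outside an XSLT processor, here is a rough Python equivalent using xml.etree.ElementTree; the helper names `toc_children` and `outline` are hypothetical, and the input is the test document above shortened to its first chapter. Walking the tree with `iter()` visits elements in document order, so "the last H(n-1) seen so far" plays the role of `(ancestor::H(n-1) | preceding::H(n-1))[last()]`, and the H3s inside the div are picked up.

```python
import xml.etree.ElementTree as ET

# The test document from this post, shortened to the first chapter (for brevity).
SOURCE = """<HTML><BODY>
<H1>H1 1</H1>
<H2>H2 1.1</H2>
<div>
<H3>H3 1.1.1</H3>
<H3>H3 1.1.2</H3>
</div>
<H3>H3 1.1.3</H3>
<H2>H2 1.2</H2>
<H3>H3 1.2.1</H3>
</BODY></HTML>"""

LEVELS = {f"H{n}": n for n in range(1, 7)}


def toc_children(root):
    """Return (list of H1 elements, {heading: [child headings]}).

    Mirrors the kChildren key: the parent of an Hn (n > 1) is the last
    H(n-1) that comes before it in document order.
    """
    last_at_level = {}  # level -> most recent heading of that level seen so far
    children = {}
    tops = []
    for el in root.iter():  # pre-order, i.e. document order
        level = LEVELS.get(el.tag)
        if level is None:
            continue
        children[el] = []
        if level == 1:
            tops.append(el)
        elif level - 1 in last_at_level:
            children[last_at_level[level - 1]].append(el)
        # an Hn with no earlier H(n-1) gets no parent, like an empty key lookup
        last_at_level[level] = el
    return tops, children


def outline(headings, children, depth=0):
    """Render the nested table of contents as indented text lines."""
    lines = []
    for h in headings:
        lines.append("  " * depth + h.text)
        lines.extend(outline(children[h], children, depth + 1))
    return lines


tops, children = toc_children(ET.fromstring(SOURCE))
print("\n".join(outline(tops, children)))
```

    Unlike the stylesheet, which starts the TOC from `/*/*/H1`, this sketch treats every H1 in the document as top-level.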

    Hope this helped.


    =====
    Cheers,

    Dimitre Novatchev.
    http://fxsl.sourceforge.net/ -- the home of FXSL
    Dimitre Novatchev, Sep 11, 2003
    #3
  4. Alex Geller

    Alex Geller Guest


    > The following transformation implements all your requirements,
    > including the one that different Hx may not always be siblings, but
    > may be children of other elements, e.g. div.

    Looks elegant, works, thank you very much!
    Regards,
    Alex
    Alex Geller, Sep 12, 2003
    #4

Similar Threads
  1. Guest
    Replies:
    1
    Views:
    774
    Guest
    Jun 29, 2004
  2. Dimitre Novatchev
    Replies:
    1
    Views:
    621
    Alex Geller
    Sep 11, 2003
  3. Replies:
    7
    Views:
    905
  4. Usha2009
    Replies:
    0
    Views:
    1,143
    Usha2009
    Dec 20, 2009
  5. Tomasz Chmielewski

    sorting index-15, index-9, index-110 "the human way"?

    Tomasz Chmielewski, Mar 4, 2008, in forum: Perl Misc
    Replies:
    4
    Views:
    315
    Tomasz Chmielewski
    Mar 4, 2008