[xsl] speed up triple-Muench

Chris Huebsch

Hello,

I have a Real-World-XSLT-Problem.(tm)

The problem: an XML document of about 75,000 elements contains
<person> elements at various depths of the tree. Each person consists of
a <name>, a <vorname> (first name) and a <titel> (title). There are about
8,000 persons in this tree. Many of them occur more than once, i.e. there
are <person> elements with identical content.

The tree consists of a few dozen subtrees, each of which is identified by
one special element at the root of the subtree. The name of this element
varies between subtrees.

I need an alphabetically sorted list of persons. In this list I have to
use the first occurrence of each person, because that occurrence serves as
the anchor for a page number in a FOP-generated PDF. The <titel> element
does not matter in this list.

I apply the Muenchian method three times. The problem that arises is both
memory usage and CPU time: on my P4 2.6 GHz the transformation takes
10 minutes and 1.4 GB of RAM. Both are a "little bit" too much. All other
transformations finish in a fraction of a second.

I am using libxslt 1.1.5.

Here is my xslt-template (sorry for the long lines):

#v+
<?xml version="1.0" encoding="iso-8859-1"?>

<!-- common namespaces for all template.fo files -->
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

xmlns:foberon="http://rnvs.informatik.tu-chemnitz.de/foberon"

xmlns:date="http://exslt.org/dates-and-times"
xmlns:str="http://exslt.org/strings"
xmlns:dynamic="http://exslt.org/dynamic"
xmlns:exsl="http://exslt.org/common"
xmlns:func="http://exslt.org/functions"

xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:fox="http://xml.apache.org/fop/extensions"

exclude-result-prefixes="xsl foberon date str dynamic exsl">

<xsl:output method="xml" indent="yes"/>

<!-- get the identifier for the subtree recursively towards the root -->
<func:function name="foberon:groupid">
<xsl:param name="node"/>
<xsl:variable name="res">
<xsl:choose>
<xsl:when test="$node/foberon:meta/foberon:strukturnummer">
<xsl:value-of select="$node/foberon:meta/foberon:strukturnummer"/>
</xsl:when>
<xsl:when test="$node/foberon:meta/foberon:nummer">
<xsl:value-of select="$node/foberon:meta/foberon:nummer"/>
</xsl:when>
<xsl:when test="$node/foberon:meta/foberon:jahr">
<xsl:value-of select="$node/foberon:meta/foberon:jahr"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="foberon:groupid($node/..)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<func:result select="$res"/>
</func:function>

<xsl:key name="kDistinctName" match="foberon:person" use="foberon:name"/>
<xsl:key name="kDistinctNameAndVorname" match="foberon:person" use="concat(foberon:name,'||',foberon:vorname)"/>
<xsl:key name="kDistinctNameAndVornameAndSNR" match="foberon:person" use="concat(foberon:name,'||',foberon:vorname,'||',foberon:groupid(.))"/>

<xsl:template name="foberon:personenverzeichnis">
<fo:block xsl:use-attribute-sets="profstart">Personenverzeichnis</fo:block>
<xsl:for-each select="//foberon:person[generate-id() = generate-id(key('kDistinctName', foberon:name))]">
<!-- sort by name -->
<xsl:sort select="foberon:name"/>
<xsl:variable name="name" select="foberon:name"/>
<!-- find all different Name-Vorname-Pairs -->
<xsl:for-each select="//foberon:person[generate-id() = generate-id(key('kDistinctNameAndVorname',concat($name,'||',foberon:vorname)))]">
<!-- sort by vorname -->
<xsl:sort select="foberon:vorname"/>
<xsl:variable name="vorname" select="foberon:vorname"/>
<fo:block>
<xsl:value-of select="foberon:name"/>, <xsl:value-of select="substring(foberon:vorname,0,2)"/><xsl:text>. </xsl:text>
<!-- find first occurrence of name-vorname in a subtree -->
<xsl:for-each select="//foberon:person[generate-id() = generate-id(key('kDistinctNameAndVornameAndSNR', concat($name,'||',$vorname,'||',foberon:groupid(.))))]">
<fo:basic-link internal-destination="{generate-id(.)}">
<xsl:choose>
<xsl:when test="not(position()=last())">
<fo:page-number-citation ref-id="{generate-id(.)}"/><xsl:text>, </xsl:text>
</xsl:when>
<xsl:otherwise>
<fo:page-number-citation ref-id="{generate-id(.)}"/><xsl:text>. </xsl:text>
</xsl:otherwise>
</xsl:choose>
</fo:basic-link>
</xsl:for-each>
</fo:block>
</xsl:for-each>
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>
#v-

I tried the following:

1) Instead of writing //foberon:person three times, I created a variable
$persons=//foberon:person and applied the for-each loops to this
variable (for-each select=$persons[....]) -> no success

2) Not using a variable in the function groupid, but writing
<func:result><xsl:choose>...</xsl:choose></func:result> directly -> no success
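
For reference, the two attempts look roughly like the fragments below. This is only a trimmed sketch, not the full stylesheet: it assumes the namespace declarations and keys from the stylesheet above, shows only the outermost loop of attempt 1, and abbreviates the unchanged branches of the choose in attempt 2 with a comment.

```xml
<!-- Attempt 1 (sketch): select all persons once, then filter the variable.
     Declared at the top of the named template foberon:personenverzeichnis. -->
<xsl:variable name="persons" select="//foberon:person"/>

<xsl:for-each select="$persons[generate-id() = generate-id(key('kDistinctName', foberon:name))]">
  <xsl:sort select="foberon:name"/>
  <!-- the two inner loops filter $persons[...] in the same way,
       using kDistinctNameAndVorname and kDistinctNameAndVornameAndSNR -->
</xsl:for-each>

<!-- Attempt 2 (sketch): func:result with content instead of a select
     attribute, so the intermediate variable $res is no longer needed. -->
<func:function name="foberon:groupid">
  <xsl:param name="node"/>
  <func:result>
    <xsl:choose>
      <xsl:when test="$node/foberon:meta/foberon:strukturnummer">
        <xsl:value-of select="$node/foberon:meta/foberon:strukturnummer"/>
      </xsl:when>
      <!-- the foberon:nummer and foberon:jahr branches are unchanged -->
      <xsl:otherwise>
        <xsl:value-of select="foberon:groupid($node/..)"/>
      </xsl:otherwise>
    </xsl:choose>
  </func:result>
</func:function>
```

Neither variant changed the run time or the memory usage noticeably.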

There seems to be a memory-leak in libxslt.

Can you give me a hint or idea how to improve the performance of this
xslt?

Thank you in advance.


Chris
 
