Chris Huebsch
Hello,
I have a Real-World-XSLT-Problem.(tm)
That is: An XML document of about 75,000 elements contains
<person> elements at various depths of the tree. Each person consists of
a <name>, <vorname> and <titel>. There are about 8000 persons in this tree.
Many of them occur more than once. (That is, there are
person elements with identical content.)
The tree consists of a few dozen subtrees, each of which is identified by
one special element at the root of the subtree. The name of this element
varies between subtrees.
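To make the structure concrete, here is a tiny made-up sample. Only
person/name/vorname/titel and the meta children (strukturnummer, nummer,
jahr) correspond to the stylesheet; the wrapper element names and all
values are invented for illustration, and the real nesting is deeper:
#v+
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- hypothetical wrapper names: bericht, struktur, projekt, publikationen -->
<foberon:bericht xmlns:foberon="http://rnvs.informatik.tu-chemnitz.de/foberon">
  <!-- subtree identified by meta/strukturnummer -->
  <foberon:struktur>
    <foberon:meta>
      <foberon:strukturnummer>101</foberon:strukturnummer>
    </foberon:meta>
    <foberon:projekt>
      <foberon:person>
        <foberon:name>Mueller</foberon:name>
        <foberon:vorname>Anna</foberon:vorname>
        <foberon:titel>Dr.</foberon:titel>
      </foberon:person>
    </foberon:projekt>
  </foberon:struktur>
  <!-- another subtree, identified by meta/jahr instead -->
  <foberon:publikationen>
    <foberon:meta>
      <foberon:jahr>2003</foberon:jahr>
    </foberon:meta>
    <!-- same person again, with identical content -->
    <foberon:person>
      <foberon:name>Mueller</foberon:name>
      <foberon:vorname>Anna</foberon:vorname>
      <foberon:titel>Dr.</foberon:titel>
    </foberon:person>
  </foberon:publikationen>
</foberon:bericht>
#v-
In this sample "Mueller, A." should appear once in the list, with one
page reference per subtree.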
I need an alphabetically sorted list of persons. In this list I have to
use the first occurrence of each person, because these occurrences serve
in a FOP-generated PDF as anchors for page numbers. The <titel> element
does not matter in this list.
I apply the Muench method three times. The problem that arises is both
memory usage and CPU time. My P4-2.6 takes 10 minutes and 1.4 GB RAM.
Both are a "little bit" too much. All other transformations are done in a
fraction of a second.
I am using libxslt 1.1.5.
Here is my xslt-template (sorry for the long lines):
#v+
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- common namespaces for all template.fo files -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foberon="http://rnvs.informatik.tu-chemnitz.de/foberon"
    xmlns:date="http://exslt.org/dates-and-times"
    xmlns:str="http://exslt.org/strings"
    xmlns:dynamic="http://exslt.org/dynamic"
    xmlns:exsl="http://exslt.org/common"
    xmlns:func="http://exslt.org/functions"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:fox="http://xml.apache.org/fop/extensions"
    exclude-result-prefixes="xsl foberon date str dynamic exsl">
  <xsl:output method="xml" indent="yes"/>
  <!-- get the identifier for the subtree recursively towards the root -->
  <func:function name="foberon:groupid">
    <xsl:param name="node"/>
    <xsl:variable name="res">
      <xsl:choose>
        <xsl:when test="$node/foberon:meta/foberon:strukturnummer">
          <xsl:value-of select="$node/foberon:meta/foberon:strukturnummer"/>
        </xsl:when>
        <xsl:when test="$node/foberon:meta/foberon:nummer">
          <xsl:value-of select="$node/foberon:meta/foberon:nummer"/>
        </xsl:when>
        <xsl:when test="$node/foberon:meta/foberon:jahr">
          <xsl:value-of select="$node/foberon:meta/foberon:jahr"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="foberon:groupid($node/..)"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <func:result select="$res"/>
  </func:function>
  <xsl:key name="kDistinctName" match="foberon:person" use="foberon:name"/>
  <xsl:key name="kDistinctNameAndVorname" match="foberon:person" use="concat(foberon:name,'||',foberon:vorname)"/>
  <xsl:key name="kDistinctNameAndVornameAndSNR" match="foberon:person" use="concat(foberon:name,'||',foberon:vorname,'||',foberon:groupid(.))"/>
  <xsl:template name="foberon:personenverzeichnis">
    <fo:block xsl:use-attribute-sets="profstart">Personenverzeichnis</fo:block>
    <xsl:for-each select="//foberon:person[generate-id() = generate-id(key('kDistinctName', foberon:name))]">
      <!-- sort by name -->
      <xsl:sort select="foberon:name"/>
      <xsl:variable name="name" select="foberon:name"/>
      <!-- find all different name-vorname pairs -->
      <xsl:for-each select="//foberon:person[generate-id() = generate-id(key('kDistinctNameAndVorname',concat($name,'||',foberon:vorname)))]">
        <!-- sort by vorname -->
        <xsl:sort select="foberon:vorname"/>
        <xsl:variable name="vorname" select="foberon:vorname"/>
        <fo:block>
          <xsl:value-of select="foberon:name"/>, <xsl:value-of select="substring(foberon:vorname,0,2)"/><xsl:text>. </xsl:text>
          <!-- find first occurrence of name-vorname in a subtree -->
          <xsl:for-each select="//foberon:person[generate-id() = generate-id(key('kDistinctNameAndVornameAndSNR', concat($name,'||',$vorname,'||',foberon:groupid(.))))]">
            <fo:basic-link internal-destination="{generate-id(.)}">
              <xsl:choose>
                <xsl:when test="not(position()=last())">
                  <fo:page-number-citation ref-id="{generate-id(.)}"/><xsl:text>, </xsl:text>
                </xsl:when>
                <xsl:otherwise>
                  <fo:page-number-citation ref-id="{generate-id(.)}"/><xsl:text>. </xsl:text>
                </xsl:otherwise>
              </xsl:choose>
            </fo:basic-link>
          </xsl:for-each>
        </fo:block>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
#v-
I tried the following:
1) Instead of evaluating //foberon:person three times, I created a variable
$persons=//foberon:person and applied the for-each loops to this
variable (for-each select=$persons[....]) -> no success
2) Not using a variable in the groupid function, but writing:
<func:result><xsl:choose>...</xsl:choose></func:result> -> no success
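For reference, attempt 1) looked roughly like this, reduced to the
outermost loop. This is a stripped-down sketch, not the full stylesheet:
the text output and the match="/" template are only there so that it
runs on its own, and the two inner loops were changed the same way.
#v+
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foberon="http://rnvs.informatik.tu-chemnitz.de/foberon">
  <xsl:output method="text"/>
  <xsl:key name="kDistinctName" match="foberon:person" use="foberon:name"/>
  <!-- collect all persons once instead of evaluating //foberon:person in every loop -->
  <xsl:variable name="persons" select="//foberon:person"/>
  <xsl:template match="/">
    <!-- keep only the first person for each distinct name -->
    <xsl:for-each select="$persons[generate-id() = generate-id(key('kDistinctName', foberon:name))]">
      <xsl:sort select="foberon:name"/>
      <xsl:value-of select="foberon:name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
#v-
The predicate is still evaluated against every person in each loop, so
only the repeated tree walk for // goes away.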
There seems to be a memory-leak in libxslt.
Can you give me a hint or idea how to improve the performance of this
xslt?
Thank you in advance.
Chris