XSLT to "normalize" weight attribute

arnold

Hi,

I've been knocking my head against the wall trying to create an
XSL transform to perform "normalizations" of a set of XML files
that have a common structure.


% XML file before transform


<base>
<foo>
<bar weight="20">
<elementOne>asfd</elementOne>
<elementTwo>qwer</elementTwo>
</bar>
<bar weight="5">
<elementOne>asfd</elementOne>
<elementTwo>qwer</elementTwo>
</bar>
<bar weight="30">
<elementOne>asfd</elementOne>
<elementTwo>qwer</elementTwo>
</bar>
</foo>
</base>


% XML file after transform


<base>
<foo weightSum="55">
<bar weight="20" lower="1" upper="20">
<elementOne>asfd</elementOne>
<elementTwo>qwer</elementTwo>
</bar>
<bar weight="5" lower="21" upper="25">
<elementOne>asfd</elementOne>
<elementTwo>qwer</elementTwo>
</bar>
<bar weight="30" lower="26" upper="55">
<elementOne>asfd</elementOne>
<elementTwo>qwer</elementTwo>
</bar>
</foo>
</base>

The idea is that a random number between 1 and weightSum is drawn, and
the child of the element carrying the weightSum attribute that satisfies
lower <= randomNum <= upper is selected. The transformation would only
be run when the underlying XML files have been updated, so its speed is
not an issue.
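
To make the selection step concrete, here is a minimal Python sketch of the scheme described above, assuming 1-based inclusive ranges as in the example. The names `build_ranges` and `pick` are made up for illustration and do not come from any existing tool.

```python
import random
from itertools import accumulate

def build_ranges(weights):
    """Return 1-based inclusive (lower, upper) pairs, one per weight."""
    uppers = list(accumulate(weights))            # running totals: 20, 25, 55
    lowers = [1] + [u + 1 for u in uppers[:-1]]   # each range starts after the previous one
    return list(zip(lowers, uppers))

def pick(ranges, random_num):
    """Return the index of the range with lower <= random_num <= upper."""
    for index, (lower, upper) in enumerate(ranges):
        if lower <= random_num <= upper:
            return index
    raise ValueError("random_num is outside 1..weightSum")

weights = [20, 5, 30]                 # the three 'bar' weights from the example
ranges = build_ranges(weights)        # [(1, 20), (21, 25), (26, 55)]
weight_sum = ranges[-1][1]            # 55
chosen = pick(ranges, random.randint(1, weight_sum))   # randint includes both ends
```

A child with weight 0 would get a range whose lower bound exceeds its upper bound, so it would never be picked.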

Constraints: I have many files with this 'weightSum'-'weight' pattern,
and the element names ('foo' and 'bar' in the example above) differ
from file to file. Furthermore, it would be great if the transform
worked on nested 'weightSum'-'weight' patterns, such as the following.


% XML file before transform


<base>
<foo>
<bar weight="20">
<elementOne weight="3">asfd</elementOne>
<elementTwo weight="8">qwer</elementTwo>
</bar>
...


</foo>
</base>


% XML file after transform


<base>
<foo weightSum="55">
<bar weight="20" lower="1" upper="20" weightSum="11">
<elementOne weight="3" lower="1" upper="3">asfd</elementOne>
<elementTwo weight="8" lower="4" upper="11">qwer</elementTwo>
</bar>
...
</foo>
</base>


Any help would be appreciated.


- Arnold
 
arnold

Here is a solution to my earlier post. I used the Saxon8.7b parser.
I don't know whether the solution relies on any XSLT 2.0 capabilities;
I still need to test it with an XSLT 1.0 compliant parser.

The setup is as follows: A parent "container" element holds a
number of child elements with the same tag name. You want to
make it easy for a program to randomly select a child element with
a frequency that varies for each child. So in the example 'XML
input file' below, the first parent "container" element is named
'people' and there are three children with the tag name 'person'.
The weights for the three children are '80', '10' and '40'. So I
want to select the first 'person' element with probability
80/(80+10+40). Likewise, within the first person element, I want to
select the first 'given' element with probability 35/(35+25+10).
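
For anyone who wants the arithmetic spelled out, the short Python sketch below computes those selection frequencies as exact fractions. The helper name `selection_probabilities` is only illustrative.

```python
from fractions import Fraction

def selection_probabilities(weights):
    """Return each weight divided by the sum of all weights, as exact fractions."""
    total = sum(weights)
    return [Fraction(weight, total) for weight in weights]

person_probs = selection_probabilities([80, 10, 40])   # the three 'person' weights
given_probs = selection_probabilities([35, 25, 10])    # the three 'given' weights

# First 'person': 80/130 reduces to 8/13, roughly 61.5% of draws.
# First 'given': 35/70 reduces to 1/2.
print(person_probs[0], given_probs[0])
```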

Notes:
- The solution seems to work on nested weightSum-weight
combinations.
- For reasons I don't understand, simply applying the
transformation to the XML input file results in extra blank
lines. I use awk in a shell script to get rid of the blank
lines.
- Referring to the 'XML output file', a program would randomly
select (say) a 'person' by
1- reading the value of the 'weightSum' attribute of the
parent element 'people'
2- randomly drawing an integer between 0 and weightSum-1
3- locating the 'person' element such that the random number is
>= the 'lower' attribute value and < the 'upper'
attribute value.
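
The three selection steps could look roughly like this in Python, using the standard `xml.etree.ElementTree` module. This is only a sketch: the function name `select_child` and its optional `draw` argument are my own, and the XML string is an abbreviated copy of the 'XML output file' with the 'givens' elements left out.

```python
import random
import xml.etree.ElementTree as ET

# Abbreviated copy of the 'XML output file' (only the 'person' level is kept).
OUTPUT_XML = """<people weightSum="130">
  <person weight="80" lower="0" upper="80"><family>Newman</family></person>
  <person weight="10" lower="80" upper="90"><family>Newman</family></person>
  <person weight="40" lower="90" upper="130"><family>Newman</family></person>
</people>"""

def select_child(parent, draw=None):
    """Pick the child whose half-open range [lower, upper) contains the draw."""
    weight_sum = int(parent.get("weightSum"))       # step 1
    if draw is None:
        draw = random.randrange(weight_sum)         # step 2: integer in 0 .. weightSum-1
    for child in parent:                            # step 3
        lower, upper = child.get("lower"), child.get("upper")
        if lower is None or upper is None:
            continue                                # child is not part of the weighted set
        if int(lower) <= draw < int(upper):
            return child
    raise ValueError(f"no child covers draw {draw}")

people = ET.fromstring(OUTPUT_XML)
chosen = select_child(people)
print(chosen.get("weight"))
```

Passing an explicit `draw` makes the function easy to check by hand, e.g. a draw of 80 falls in the second 'person' because the ranges are closed below and open above.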




%------------------- XML input file ------------------------------
<?xml version="1.0"?>
<people weightSum="100">
<person weight="80">
<givens weightSum="0">
<given weight="35">Alfred</given>
<given weight="25">Fred</given>
<given weight="10">Wilfred</given>
</givens>
<family>Newman</family>
</person>
<person weight="10">
<givens>
<given>Leslie</given>
</givens>
<family>Newman</family>
</person>
<person weight="40">
<givens>
<given>Maria</given>
</givens>
<family>Newman</family>
</person>
</people>


%------------------- XML output file -----------------------------
<?xml version="1.0" encoding="UTF-8"?>
<people weightSum="130">
<person weight="80" lower="0" upper="80">
<givens weightSum="70">
<given weight="35" lower="0" upper="35">Alfred</given>
<given weight="25" lower="35" upper="60">Fred</given>
<given weight="10" lower="60" upper="70">Wilfred</given>
</givens>
<family>Newman</family>
</person>
<person weight="10" lower="80" upper="90">
<givens>
<given>Leslie</given>
</givens>
<family>Newman</family>
</person>
<person weight="40" lower="90" upper="130">
<givens>
<given>Maria</given>
</givens>
<family>Newman</family>
</person>
</people>




%------------------- XSLT file -----------------------------------
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" indent="yes"/>


<!-- The xsl:choose statement makes this default template copy -->
<!-- everything EXCEPT elements with a 'weight' attribute. -->
<xsl:template match="@*|node()">
<xsl:choose>
<xsl:when test="@weight"></xsl:when>
<xsl:otherwise>
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:template>

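<!-- Here we match the 'weightSum' attribute of a container element, -->
<!-- recompute it, and copy the container's child elements with -->
<!-- 'lower' and 'upper' attributes added. -->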
<xsl:template match="attribute::weightSum">
<xsl:attribute name="weightSum">
<xsl:value-of select="sum(../child::*/attribute::weight)" />
</xsl:attribute>

<xsl:for-each select="../child::*">
<xsl:variable name="weight" select="attribute::weight" />
<xsl:variable name="from"
select="sum(./preceding-sibling::*/attribute::weight)" />
<xsl:variable name="to"
select="sum(./preceding-sibling::*/attribute::weight)+$weight" />

<xsl:copy>
<xsl:attribute name="weight" >
<xsl:value-of select="$weight" />
</xsl:attribute>
<xsl:attribute name="lower">
<xsl:value-of select="$from" />
</xsl:attribute>
<xsl:attribute name="upper">
<xsl:value-of select="$to" />
</xsl:attribute>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:for-each>

</xsl:template>

</xsl:stylesheet>
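
As a cross-check on the stylesheet's output, the same numbers can be recomputed outside XSLT. The Python sketch below is not a port of the stylesheet: it assumes integer weights, and it annotates every element that has weighted children, whereas the stylesheet only acts on elements that already carry a 'weightSum' attribute. The function name `normalize` is made up, and the input is an abbreviated copy of the 'XML input file'.

```python
import xml.etree.ElementTree as ET

def normalize(parent):
    """Add lower/upper to weighted children and weightSum to their parent, recursively."""
    weighted = [child for child in parent if child.get("weight") is not None]
    if weighted:
        running = 0                                 # sum of the preceding siblings' weights
        for child in weighted:
            child.set("lower", str(running))
            running += int(child.get("weight"))     # assumes integer weights
            child.set("upper", str(running))
        parent.set("weightSum", str(running))       # overwrites any placeholder value
    for child in parent:
        normalize(child)                            # handle nested weightSum-weight patterns

# Abbreviated copy of the 'XML input file'.
root = ET.fromstring(
    '<people weightSum="100">'
    '<person weight="80"><givens weightSum="0">'
    '<given weight="35">Alfred</given>'
    '<given weight="25">Fred</given>'
    '<given weight="10">Wilfred</given>'
    '</givens></person>'
    '<person weight="10"/><person weight="40"/>'
    '</people>'
)
normalize(root)
print(ET.tostring(root, encoding="unicode"))
```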


%------------- Script to remove extra blank lines ----------------
#!/bin/bash

argc="$#"

if [ \( "$argc" -lt 1 \) -o \( "$argc" -gt 2 \) ]; then
printf "\n\n"
printf " Usage: NormalizeWeights.sh data.xml [output_file]"
printf "\n\n"
exit 1
fi

if [ "$argc" -eq 1 ]; then
    inputXmlFname=$1
    # Transform to stdout and drop lines that contain only spaces
    /usr/bin/java -jar "$HOME/sbox/software/lib/saxon8.7/saxon8.jar" -t \
        "$inputXmlFname" NormalizeWeights.xsl | /usr/bin/awk '!/^( )+$/{print $0;}'
elif [ "$argc" -eq 2 ]; then
    inputXmlFname=$1
    outputXmlFname=$2
    if [ -f "$outputXmlFname" ]; then
        backupName="${outputXmlFname}.bac"
        echo "File $outputXmlFname exists, making backup named $backupName"
        /bin/cp "$outputXmlFname" "$backupName"
    fi
    /usr/bin/java -jar "$HOME/sbox/software/lib/saxon8.7/saxon8.jar" -t \
        -o "$outputXmlFname" "$inputXmlFname" NormalizeWeights.xsl
    # Drop lines that contain only spaces, then replace the output file
    /usr/bin/awk '!/^( )+$/{print $0;}' "$outputXmlFname" > "tmp$$"
    /bin/mv "tmp$$" "$outputXmlFname"
    /bin/rm -f "tmp$$"
fi
 
