XSLT to "normalize" weight attribute

Discussion in 'XML' started by arnold, Mar 2, 2006.

  1. arnold

    arnold Guest

    Hi,

    I've been knocking my head against the wall trying to create an
    XSL transform to perform "normalizations" of a set of XML files
    that have a common structure.


    % XML file before transform


    <base>
    <foo>
    <bar weight="20">
    <elementOne>asfd</elementOne>
    <elementTwo>qwer</elementTwo>
    </bar>
    <bar weight="5">
    <elementOne>asfd</elementOne>
    <elementTwo>qwer</elementTwo>
    </bar>
    <bar weight="30">
    <elementOne>asfd</elementOne>
    <elementTwo>qwer</elementTwo>
    </bar>
    </foo>
    </base>


    % XML file after transform


    <base>
    <foo weightSum="55">
    <bar weight="20" lower="1" upper="20">
    <elementOne>asfd</elementOne>
    <elementTwo>qwer</elementTwo>
    </bar>
    <bar weight="5" lower="21" upper="25">
    <elementOne>asfd</elementOne>
    <elementTwo>qwer</elementTwo>
    </bar>
    <bar weight="30" lower="26" upper="55">
    <elementOne>asfd</elementOne>
    <elementTwo>qwer</elementTwo>
    </bar>
    </foo>
    </base>

    The idea is that a program draws a random number between 1 and
    weightSum, then selects the child of the element carrying the
    weightSum attribute for which lower <= randomNum <= upper. The
    transformation would only be run when the underlying XML files have
    been updated, so transformation speed is not an issue.
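
To make the intended lookup concrete, here is a minimal Python sketch of the selection step against the "after transform" document above. The function names `pick_child` and `pick_random_child` are hypothetical, and the sample keeps only one child element per `bar` for brevity. It assumes integer weights and the inclusive, 1-based ranges shown in the example.

```python
import random
import xml.etree.ElementTree as ET

# Abbreviated copy of the "after transform" example (inclusive 1-based ranges).
NORMALIZED = """<base>
<foo weightSum="55">
<bar weight="20" lower="1" upper="20"><elementOne>asfd</elementOne></bar>
<bar weight="5" lower="21" upper="25"><elementOne>asfd</elementOne></bar>
<bar weight="30" lower="26" upper="55"><elementOne>asfd</elementOne></bar>
</foo>
</base>"""


def pick_child(parent, random_num):
    """Return the child whose inclusive [lower, upper] range contains random_num."""
    for child in parent:
        lower, upper = child.get("lower"), child.get("upper")
        if lower is None or upper is None:
            continue  # this child carries no range information
        if int(lower) <= random_num <= int(upper):
            return child
    return None  # random_num fell outside every range


def pick_random_child(parent, rng=random):
    """Draw a number in 1..weightSum (inclusive) and return the matching child."""
    weight_sum = int(parent.get("weightSum"))
    return pick_child(parent, rng.randint(1, weight_sum))


foo = ET.fromstring(NORMALIZED).find("foo")
chosen = pick_random_child(foo)
print(chosen.tag, chosen.get("weight"))
```

Because the lookup only reads attribute names, the same function works whatever the element names are in a given file.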

    Constraints: I have many files with this 'weightSum'-'weight' pattern,
    and the element names ('foo' and 'bar' in the example above) differ
    from file to file. Furthermore, it would be great if the transform
    worked on nested 'weightSum'-'weight' patterns, such as the following.


    % XML file before transform


    <base>
    <foo>
    <bar weight="20">
    <elementOne weight="3">asfd</elementOne>
    <elementTwo weight="8">qwer</elementTwo>
    </bar>
    ...


    </foo>
    </base>


    % XML file after transform


    <base>
    <foo weightSum="55">
    <bar weight="20" lower="1" upper="20" weightSum="11">
    <elementOne weight="3" lower="1" upper="3">asfd</elementOne>
    <elementTwo weight="8" lower="4" upper="11">qwer</elementTwo>
    </bar>
    ...
    </foo>
    </base>
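
As a reference for what the requested transform has to compute, here is a rough sketch in Python rather than XSLT. The function name `annotate` is hypothetical. The sample input is an assumption that combines the three `bar` weights from the first example with the nested weights from the second, and the sketch assumes integer weights with inclusive ranges starting at 1.

```python
import xml.etree.ElementTree as ET


def annotate(parent):
    """Recursively add weightSum to parents and lower/upper to weighted children."""
    running_total = 0
    for child in parent:
        weight = child.get("weight")
        if weight is not None:
            child.set("lower", str(running_total + 1))  # ranges start at 1
            running_total += int(weight)
            child.set("upper", str(running_total))      # inclusive upper bound
        annotate(child)  # handles nested weight patterns at any depth
    if running_total > 0:
        # Only parents that actually have weighted children get a weightSum.
        parent.set("weightSum", str(running_total))


SOURCE = """<base>
<foo>
<bar weight="20">
<elementOne weight="3">asfd</elementOne>
<elementTwo weight="8">qwer</elementTwo>
</bar>
<bar weight="5"/>
<bar weight="30"/>
</foo>
</base>"""

root = ET.fromstring(SOURCE)
annotate(root)
print(ET.tostring(root, encoding="unicode"))
```

The function never looks at tag names, only at the `weight` attribute, which is what lets it cope with element names that differ from file to file.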


    Any help would be appreciated.


    - Arnold
    arnold, Mar 2, 2006
    #1

  2. arnold

    arnold Guest

    Here is a solution to my earlier post. I used the Saxon8.7b
    processor. I don't know whether the solution relies on any XSLT 2.0
    capabilities; I still need to test it with an XSLT 1.0 compliant
    processor.

    The setup is as follows: A parent "container" element holds a
    number of child elements with the same tag name. You want to
    make it easy for a program to randomly select a child element with
    a frequency that varies for each child. So in the example 'XML
    input file' below, the first parent "container" element is named
    'people' and there are three children with the tag name 'person'.
    The weights for the three children are '80', '10' and '40'. So
    80/(80+10+40) of the time (about 62%) I want to select the first
    'person' element. Likewise, within the first person element, I want
    to select the first 'given' element 35/(35+25+10) of the time (50%).
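
The arithmetic behind those frequencies can be checked with a few lines of Python. This is only an illustration; the helper name `selection_probabilities` is hypothetical, and the weights are the ones from the 'XML input file' below.

```python
from fractions import Fraction


def selection_probabilities(weights):
    """Return each weight divided by the sum of all weights, as exact fractions."""
    total = sum(weights)
    return [Fraction(weight, total) for weight in weights]


person_probs = selection_probabilities([80, 10, 40])  # the three 'person' weights
given_probs = selection_probabilities([35, 25, 10])   # the three 'given' weights

for prob in person_probs + given_probs:
    print(f"{prob} = {float(prob):.1%}")
```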

    Notes:
    - The solution seems to work on nested weightSum-weight
    combinations.
    - For reasons I don't understand, simply applying the
    transformation to the XML input file results in extra blank
    lines. I use awk in a shell script to get rid of the blank
    lines.
    - Referring to the 'XML output file', a program would randomly
    select (say) a 'person' by
    1- reading the value of the 'weightSum' attribute of the
    parent element 'people'
    2- randomly drawing an integer between 0 and weightSum-1
    3- locating the 'person' element such that the random number is
    >= the 'lower' attribute value and < the 'upper' attribute value.
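
The three steps above could be sketched in Python roughly as follows. The function name `select_weighted` and its optional `draw` parameter are hypothetical, and the sketch assumes integer weights and the half-open [lower, upper) ranges of the 'XML output file' below.

```python
import random
import xml.etree.ElementTree as ET

# Copy of the 'XML output file' (ranges are half-open: lower <= n < upper).
OUTPUT_XML = """<people weightSum="130">
<person weight="80" lower="0" upper="80">
<givens weightSum="70">
<given weight="35" lower="0" upper="35">Alfred</given>
<given weight="25" lower="35" upper="60">Fred</given>
<given weight="10" lower="60" upper="70">Wilfred</given>
</givens>
<family>Newman</family>
</person>
<person weight="10" lower="80" upper="90">
<givens><given>Leslie</given></givens>
<family>Newman</family>
</person>
<person weight="40" lower="90" upper="130">
<givens><given>Maria</given></givens>
<family>Newman</family>
</person>
</people>"""


def select_weighted(parent, draw=None):
    """Pick the child of parent whose [lower, upper) range contains the draw."""
    weight_sum = int(parent.get("weightSum"))   # step 1: read weightSum
    if draw is None:
        draw = random.randrange(weight_sum)     # step 2: integer in 0..weightSum-1
    for child in parent:                        # step 3: locate the matching child
        lower, upper = child.get("lower"), child.get("upper")
        if lower is not None and upper is not None:
            if int(lower) <= draw < int(upper):
                return child
    raise ValueError(f"no child range contains {draw}")


people = ET.fromstring(OUTPUT_XML)
person = select_weighted(people)
givens = person.find("givens")
if givens.get("weightSum") is not None:
    # Nested pattern: draw again among the weighted 'given' elements.
    given = select_weighted(givens)
else:
    given = givens.find("given")
print(given.text, person.findtext("family"))
```

Passing an explicit `draw` makes the lookup deterministic, which is handy for checking the range boundaries.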




    %------------------- XML input file ------------------------------
    <?xml version="1.0"?>
    <people weightSum="100">
    <person weight="80">
    <givens weightSum="0">
    <given weight="35">Alfred</given>
    <given weight="25">Fred</given>
    <given weight="10">Wilfred</given>
    </givens>
    <family>Newman</family>
    </person>
    <person weight="10">
    <givens>
    <given>Leslie</given>
    </givens>
    <family>Newman</family>
    </person>
    <person weight="40">
    <givens>
    <given>Maria</given>
    </givens>
    <family>Newman</family>
    </person>
    </people>


    %------------------- XML output file -----------------------------
    <?xml version="1.0" encoding="UTF-8"?>
    <people weightSum="130">
    <person weight="80" lower="0" upper="80">
    <givens weightSum="70">
    <given weight="35" lower="0" upper="35">Alfred</given>
    <given weight="25" lower="35" upper="60">Fred</given>
    <given weight="10" lower="60" upper="70">Wilfred</given>
    </givens>
    <family>Newman</family>
    </person>
    <person weight="10" lower="80" upper="90">
    <givens>
    <given>Leslie</given>
    </givens>
    <family>Newman</family>
    </person>
    <person weight="40" lower="90" upper="130">
    <givens>
    <given>Maria</given>
    </givens>
    <family>Newman</family>
    </person>
    </people>




    %------------------- XSLT file -----------------------------------
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="xml" indent="yes"/>


    <!-- The xsl:choose statement makes this default template copy -->
    <!-- everything EXCEPT elements with a 'weight' attribute.     -->
    <!-- Those elements are emitted by the 'weightSum' template.   -->
    <xsl:template match="@*|node()">
    <xsl:choose>
    <xsl:when test="@weight"></xsl:when>
    <xsl:otherwise>
    <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    </xsl:otherwise>
    </xsl:choose>
    </xsl:template>

    <!-- Here we match the 'weightSum' attribute: recompute the sum, then -->
    <!-- copy each child of its parent, adding 'lower' and 'upper'.      -->
    <xsl:template match="attribute::weightSum">
    <xsl:attribute name="weightSum">
    <xsl:value-of select="sum(../child::*/attribute::weight)" />
    </xsl:attribute>

    <xsl:for-each select="../child::*">
    <xsl:variable name="weight" select="attribute::weight" />
    <xsl:variable name="from"
    select="sum(./preceding-sibling::*/attribute::weight)" />
    <xsl:variable name="to" select="$from + $weight" />

    <xsl:copy>
    <xsl:attribute name="weight" >
    <xsl:value-of select="$weight" />
    </xsl:attribute>
    <xsl:attribute name="lower">
    <xsl:value-of select="$from" />
    </xsl:attribute>
    <xsl:attribute name="upper">
    <xsl:value-of select="$to" />
    </xsl:attribute>
    <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    </xsl:for-each>

    </xsl:template>

    </xsl:stylesheet>


    %------------- Script to remove extra blank lines ----------------
    #!/bin/bash

    argc="$#"

    if [ \( "$argc" -lt 1 \) -o \( "$argc" -gt 2 \) ]; then
    printf "\n\n"
    printf " Usage: NormalizeWeights.sh data.xml [output_file]"
    printf "\n\n"
    exit 1
    fi

    if [ "$argc" -eq 1 ]; then
    inputXmlFname=$1
    # Transform to stdout, dropping lines that contain only spaces.
    /usr/bin/java -jar "$HOME/sbox/software/lib/saxon8.7/saxon8.jar" -t \
    "$inputXmlFname" NormalizeWeights.xsl | /usr/bin/awk '!/^( )+$/{print $0;}'
    elif [ "$argc" -eq 2 ]; then
    inputXmlFname=$1
    outputXmlFname=$2
    if [ -f "$outputXmlFname" ]; then
    backupName="${outputXmlFname}.bac"
    echo "File $outputXmlFname exists, making backup named $backupName"
    /bin/cp "$outputXmlFname" "$backupName"
    fi
    /usr/bin/java -jar "$HOME/sbox/software/lib/saxon8.7/saxon8.jar" -t \
    -o "$outputXmlFname" "$inputXmlFname" NormalizeWeights.xsl
    # Drop lines that contain only spaces, then replace the output file.
    /usr/bin/awk '!/^( )+$/{print $0;}' "$outputXmlFname" > tmp$$
    /bin/mv tmp$$ "$outputXmlFname"
    fi
    arnold, Mar 5, 2006
    #2