Use variables to get unique nodes

Rolf Kemper

Dear Experts,

I got stuck with the following problem and need your help.

What I want to do is to get a set of distinct nodes.
Before the distinct step I have already selected the multiple occurrences
successfully. However, the rest does not work as expected.

Hope someone can help on that.
Rolf

############################### DATA ################################

My XML DATA:
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<TEST>
<Pin PinName="A" />
<Pin PinName="B" />
<Pin PinName="B" />
<Pin PinName="C" />
<Pin PinName="X" />
</TEST>
<TEST>
<Pin PinName="A" />
<Pin PinName="D" />
<Pin PinName="C" />
<Pin PinName="X" />
<Pin PinName="A" />
</TEST>
</Root>


My Test XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output method="text"/>
<xsl:variable name="NewLine" select="'&#10;'"/>

<xsl:variable name="TestNodes" select="Root/TEST"/>
<xsl:variable name="MultiplePins"
select="$TestNodes/Pin[@PinName = preceding::Pin/@PinName]"/>
<xsl:variable name="UniqueMultiplePins"
select="$MultiplePins[@PinName != preceding::*/@PinName]"/>

<xsl:template match="/">
<xsl:value-of select="concat('all pins of all test nodes ==> OK ',$NewLine)"/>
<xsl:for-each select="$TestNodes/Pin">
<xsl:value-of select="concat(@PinName,$NewLine)"/>
</xsl:for-each>
<xsl:value-of select="$NewLine"/>

<xsl:value-of select="concat('multiple pins ==> OK ',$NewLine)"/>
<xsl:for-each select="$MultiplePins">
<xsl:value-of select="concat(@PinName,$NewLine)"/>
</xsl:for-each>
<xsl:value-of select="$NewLine"/>

<xsl:value-of select="concat('unique pins ==> NOT GOOD !! ',$NewLine)"/>
<xsl:for-each select="$UniqueMultiplePins">
<xsl:value-of select="concat(@PinName,$NewLine)"/>
</xsl:for-each>

</xsl:template>

</xsl:stylesheet>

My results (gained by XMLSpy debug mode):

all pins of all test nodes ==> OK
A
B
B
C
X
A
D
C
X
A

multiple pins ==> OK
B
A
C
X
A

unique pins ==> NOT GOOD !!
B
A
C
X
A

( I expected B A C X )
#################### End #################################
 
Jürgen Kahrs

Rolf said:
What I want to do is to get a set of distinct nodes.

What is "distinct"? Unique within one TEST node?
Or unique within one Root node?

I wrote a short xmlgawk script, which tries to
reproduce your results. The script even looks
readable to me. Half of it consists of printing
test results:

# distinct_nodes.awk
# comp.text.xml 2004-10-01
# Read all nodes of type pin and find the ones
# which have a unique name attribute.
# JK 2004-10-01

BEGIN {
    XMLMODE=1
    print "all pins of all test nodes ==>"
}

XMLSTARTELEM == "Pin" {
    count[XMLATTR["PinName"]] ++
    print XMLATTR["PinName"]
}

END {
    print "multiple pins ==>"
    for (PinName in count) {
        if (count[PinName] > 1)
            print PinName, count[PinName]
    }
    print "unique pins ==>"
    for (PinName in count) {
        if (count[PinName] == 1)
            print PinName, count[PinName]
    }
}

all pins of all test nodes ==> OK
A
B
B
C
X
A
D
C
X
A

multiple pins ==> OK
B
A
C
X
A

unique pins ==> NOT GOOD !!
B
A
C
X
A

The results I get are:
all pins of all test nodes ==>
A
B
B
C
X
A
D
C
X
A
multiple pins ==>
A 3
B 2
C 2
X 2
unique pins ==>
D 1
( I expected B A C X )
#################### End #################################

Why do you expect B A C X ?
No matter how I understand "distinct", I would not
call B A C X "distinct" pins.


BTW: Do the Electrical Engineers at NEC really use XML
for counting their beans .. errhhh pins .. ?
 
Marrow

Hi Rolf,

When comparing node-sets it is as well to remember that x != y is not quite
the same as not(x = y).
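
A minimal sketch of that difference, run against the sample data from the original post. It is an illustration only and assumes the element name is spelled Pin, exactly as in the data. With node-sets, x != y is true as soon as any pair of values differs, while not(x = y) is true only when no pair is equal.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- "!=": true if ANY preceding Pin has a different name,
         so almost every Pin passes the test. -->
    <xsl:text>!= :</xsl:text>
    <xsl:for-each select="Root/TEST/Pin[@PinName != preceding::Pin/@PinName]">
      <xsl:text> </xsl:text>
      <xsl:value-of select="@PinName"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>

    <!-- not(... = ...): true only if NO preceding Pin has the same name,
         i.e. for the first occurrence of each name. -->
    <xsl:text>not(=) :</xsl:text>
    <xsl:for-each select="Root/TEST/Pin[not(@PinName = preceding::Pin/@PinName)]">
      <xsl:text> </xsl:text>
      <xsl:value-of select="@PinName"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

On the sample data the first line should list B B C X A D C X A and the second A B C X D. This is also why the original filter returned every node of $MultiplePins: each of them has at least one preceding element with a different PinName. Note that $MultiplePins holds only pins that already have an earlier namesake, so simply swapping in not(=) on the preceding axis there would return an empty set.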

But anyway, I don't think it's worth trying to do what you want using the
preceding method for obtaining uniques - the Muenchian technique will be
much easier and yield far better performance, e.g.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<xsl:key name="kDistinctPins" match="Pin" use="@PinName"/>

<xsl:variable name="TestNodes" select="Root/TEST/Pin"/>
<xsl:variable name="MultiplePins"
select="$TestNodes[count(key('kDistinctPins',@PinName)) &gt; 1]"/>
<xsl:variable name="UniqueMultiplePins"
select="$MultiplePins[generate-id() =
generate-id(key('kDistinctPins',@PinName))]"/>

<xsl:template match="/">
<xsl:text>all pins of all test nodes ==> OK
</xsl:text>
<xsl:for-each select="$TestNodes">
<xsl:value-of select="@PinName"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
<xsl:text>
multiple pins ==> OK
</xsl:text>
<xsl:for-each select="$MultiplePins">
<xsl:value-of select="@PinName"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
<xsl:text>
unique pins ==> OK
</xsl:text>
<xsl:for-each select="$UniqueMultiplePins">
<xsl:value-of select="@PinName"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>


Btw, don't overuse the concat() function - especially not for literal
output... you are just doing concatenation where the transformation engine
will already serialize the output.

HTH
Marrow
http://www.marrowsoft.com - home of Xselerator (XSLT IDE and debugger)
http://www.topxml.com/Xselerator


 
Rolf Kemper

Hi Jürgen,

thank you for your feedback.
Let me explain it once again, because my text was a bit weak.
What I need is a list of pins which occur more than once.
As a small hurdle I have to do some preselection by some other
properties first. Hence the TestNodes.

So, the output should be the opposite of yours.

I have not tried the code provided by Marrow yet, but it explains
somehow what I have done wrong.

However, can you drop me some lines on xmlgawk?
Can I use it with the MSXML4 processor?
How about schema checking?

As you stated, YES, the example is like counting some beans, but just
for the sake of demonstration. It addresses a very tiny problem of a
5M mixed data set. So my real world is a farm of beans, fruits,
nuts, vegetables and many other things.
I am just creating different dishes out of it. XSLT in
combination with XPath is a great help, as I have to present the same
dish in different restaurants (HTML, Text, Excel, Spice) too.
Finally, in response to your valid comment and interest: we usually do
a lot of Perl scripting.

Thanks a lot
Rolf


 
Rolf Kemper

Dear Marrow,

thank you for your good input. Your code is basically working.
But my real world is much more complex, and on top of that my
understanding about match seems to be wrong.
1) I guess match="Pin" will match Pin at any level of what is
currently selected.
Right?
2) I still have a problem when I have a kind of preselection and
a node that is NOT selected also has a Pin which would normally match.
So I have made a more complex example to demonstrate that. Please find it
below.
(have a look at ... select="Root/x/TEST[44 >= @Index and @Index >=
43]" )
Maybe you can explain why this happens.

In the failing case, the $UniqueMultiplePins node set is empty!

BTW, you stated that my concat is somehow ineffective. Does this mean
that several TEXT and VALUE-OF elements would be better?

Thank you very much for your kind help
Rolf


######### new data #############################
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<E>
<!-- If the key match is Pin only,
the XSLT does not work. If the match is at least TEST/Pin,
it is OK
-->
<Pin PinName="A" PinClass="T"/>
<!--<Pin PinName="AX" PinClass="T"/> -->
</E>
<x>
<!-- in case the TEST node below is not selected by Index
AND PinName="A" and CellType="M" matches
the result is wrong
-->
<TEST Index="42" CellType="M">
<Pin PinName="z" PinClass="R"/>
<!--<Pin PinName="AX" PinClass="T"/>-->
<Pin PinName="A" PinClass="T"/>
</TEST>
<TEST Index="43" CellType="M">
<Pin PinName="A" PinClass="T"/>
<Pin PinName="D" PinClass="R"/>
<Pin PinName="C" PinClass="R"/>
<Pin PinName="X" PinClass="R"/>
</TEST>
<TEST Index="44" CellType="M">
<Pin PinName="D" PinClass="R"/>
<Pin PinName="C" PinClass="R"/>
<Pin PinName="X" PinClass="R"/>
<Pin PinName="A" PinClass="T"/>
</TEST>
<TEST Index="45" CellType="M">
<Pin PinName="z" PinClass="R"/>
</TEST>
</x>
</Root>

############# new xslt ############################################

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<xsl:key name="kDistinctPins" match="TEST/Pin" use="@PinName"/>

<!-- @Index >= 43 fails, @Index >= 42 OK -->
<xsl:variable name="Selected"
select="Root/x/TEST[44 >= @Index and @Index >= 43]"/>
<xsl:variable name="Macros" select="$Selected[@CellType='M']"/>
<xsl:variable name="TopNetPins"
select="$Macros/Pin[@PinClass='T']"/>
<xsl:variable name="MultiplePins"
select="$TopNetPins[count(key('kDistinctPins',@PinName)) &gt; 1]"/>
<xsl:variable name="UniqueMultiplePins"
select="$MultiplePins[generate-id() =
generate-id(key('kDistinctPins',@PinName))]"/>

<xsl:template match="/">
<xsl:text>all pins of all test nodes ==> OK
</xsl:text>
<xsl:for-each select="$TopNetPins">
<xsl:value-of select="@PinName"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
<xsl:text>
multiple pins ==> OK
</xsl:text>
<xsl:for-each select="$MultiplePins">
<xsl:value-of select="@PinName"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
<xsl:text>
unique pins ==> OK
</xsl:text>
<xsl:for-each select="$UniqueMultiplePins">
<xsl:value-of select="@PinName"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

########## end ###################


 
Rolf Kemper

Marrow,

sorry, I forgot one important thing. The constants 44, 43 etc. used in
the preselection are actually variables in my case and probably must
also be used in the key. But I think it is not allowed to have
variables in the match attribute!
This probably makes it even more difficult.

Thanks
Rolf


 
Marrow

Hi Rolf,

First off...
BTW, you stated that my concat is somehow ineffective. Does this mean
that several TEXT and VALUE-OF elements would be better ?

Yes, it is actually more efficient during execution (in all transformation
engines I have come across) to do several <xsl:text>'s and <xsl:value-of>'s
than one <xsl:value-of> with a concat(). The reason is that concat() is an
operation that takes time - and an operation that the transformation was
going to do anyway... so you are almost doing the same thing twice by
having things like <xsl:value-of select="concat('literal',value,...)"/>.
Much the same as in many computer languages - where you would avoid
concatenations - especially repeated concatenations of literal/static text.

If you reduce it to a simple test...

<?xml version="1.0"?>
<root>
<item>xxxxxxxxxxxxx</item>
<item>xxxxxxxxxxxxx</item>
<item>xxxxxxxxxxxxx</item>
<item>xxxxxxxxxxxxx</item>
<item>xxxxxxxxxxxxx</item>
<item>xxxxxxxxxxxxx</item>
<item>xxxxxxxxxxxxx</item>
<item>xxxxxxxxxxxxx</item>
<item>xxxxxxxxxxxxx</item>
<item>xxxxxxxxxxxxx</item>
<!-- repeat a few thousand times -->
</root>

and run these two stylesheets as comparisons...

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:for-each select="root/item">
<xsl:text>Item: </xsl:text>
<xsl:value-of select="."/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

and...

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:for-each select="root/item">
<xsl:value-of select="concat('Item: ',.,'&#10;')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

On MSXML 4.0, which you mentioned, the first stylesheet takes only
about 30% of the time it takes for the second stylesheet.


And the main problem...
1) I guess match="Pin" will match Pin at any level of what is
currently selected.
Right?

Correct - so as with any match pattern you keep specifying it from right to
left until it matches the required nodes. As you have done.
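
A minimal sketch of how the match pattern changes what ends up in a key, assuming the "new data" document from Rolf's post as input. The key names kAnyPin and kTestPin are hypothetical and exist only for this illustration. A key pattern is tested against every node in the document, independently of any variable or preselection, so match="Pin" also picks up the Pin under E.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical key names, for illustration only -->
  <xsl:key name="kAnyPin"  match="Pin"      use="@PinName"/>
  <xsl:key name="kTestPin" match="TEST/Pin" use="@PinName"/>

  <xsl:template match="/">
    <!-- Every Pin named A in the whole document, including E/Pin -->
    <xsl:text>match="Pin": </xsl:text>
    <xsl:value-of select="count(key('kAnyPin','A'))"/>
    <!-- Only Pin elements whose parent is a TEST element -->
    <xsl:text>&#10;match="TEST/Pin": </xsl:text>
    <xsl:value-of select="count(key('kTestPin','A'))"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With that input the two counts should be 4 and 3: the narrower pattern drops the Pin under E, but still includes the Pin in the TEST with Index 42.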
2) I still have a problem when I have a kind of preselection and
a node that is NOT selected also has a Pin which would normally match.
So I have made a more complex example to demonstrate that. Please find it
below.
(have a look at ... select="Root/x/TEST[44 >= @Index and @Index >=
43]" )
Maybe you can explain why this happens.

You need to have a pretty good understanding of how the Muenchian technique
works... it works by asking "is this node the same as the first occurrence of
the nodes with this specific value?". Therefore, any filtering needs to be
applied consistently for it to work correctly.
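
A minimal sketch of why the unfiltered key fails, again assuming Rolf's "new data" document as input and the literal bounds 43 and 44 from his preselection. generate-id() applied to a node-set uses its first node in document order, so the comparison only succeeds if that first node is itself among the selected pins.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:key name="kDistinctPins" match="TEST/Pin" use="@PinName"/>

  <xsl:template match="/">
    <!-- Unfiltered: the first A in the key belongs to a TEST that the
         preselection excludes, so no selected pin can ever equal it. -->
    <xsl:text>first A in key: TEST </xsl:text>
    <xsl:value-of select="key('kDistinctPins','A')[1]/../@Index"/>
    <!-- Same predicate as the preselection applied to the key result:
         the first A is now one of the selected pins. -->
    <xsl:text>&#10;first A after filter: TEST </xsl:text>
    <xsl:value-of select="key('kDistinctPins','A')[../@Index &gt;= 43 and ../@Index &lt;= 44][1]/../@Index"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

On that input the first line should report TEST 42 and the second TEST 43, which matches the observation that $UniqueMultiplePins comes out empty unless TEST 42 is part of the selection.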

Filtering whilst trying to find distincts can be tricky at the best of times -
and would be far more tricky (with hideously complex XPaths) if trying to
pursue the preceding method of finding distincts.

If you use keys, you will be able to do some of the filtering within the
keys - by making the selection filter part of the key selection value. And
it is as well to learn keys and become very familiar with them - as they are
extremely powerful and useful.
The down side to keys is that you cannot use variables in the @match or @use
of an <xsl:key> - which means any selective filtering needs to be placed as a
value in the @use and then passed into the key() value. Which means you could
do your @CellType and @PinClass filtering within the keys - but the comparator
on @Index would be impossible... so that would need to be added consistently as
a predicate filter.

Something like...


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="idx_from" select="43"/>
<xsl:param name="idx_to" select="44"/>
<xsl:param name="cell_type" select="'M'"/>
<xsl:param name="pin_class" select="'T'"/>
<xsl:output method="text"/>

<!-- define a key that is capable of doing some of the filtering by
passing in key selection values -->
<xsl:key name="kSelectPins" match="TEST/Pin"
use="concat(../@CellType,'|',@PinClass)"/>

<!-- define the key that will filter and be used for finding distincts
with that filtering applied -->
<xsl:key name="kDistinctPins" match="TEST/Pin"
use="concat(../@CellType,'|',@PinClass,'|',@PinName)"/>

<xsl:variable name="TopNetPins"
select="key('kSelectPins',concat($cell_type,'|',$pin_class))
[../@Index &gt;= $idx_from and ../@Index &lt;= $idx_to]"/>
<xsl:variable name="MultiplePins"
select="$TopNetPins[count(key('kDistinctPins',
concat(../@CellType,'|',@PinClass,'|',@PinName))
[../@Index &gt;= $idx_from and ../@Index &lt;= $idx_to]) &gt; 1]"/>
<xsl:variable name="UniqueMultiplePins"
select="$MultiplePins[generate-id() =
generate-id(key('kDistinctPins',
concat(../@CellType,'|',@PinClass,'|',@PinName))
[../@Index &gt;= $idx_from and ../@Index &lt;= $idx_to])]"/>

<xsl:template match="/">
<xsl:text>all pins of all test nodes ==> OK
</xsl:text>
<xsl:for-each select="$TopNetPins">
<xsl:value-of select="@PinName"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
<xsl:text>
multiple pins ==> OK
</xsl:text>
<xsl:for-each select="$MultiplePins">
<xsl:value-of select="@PinName"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
<xsl:text>
unique pins ==> OK
</xsl:text>
<xsl:for-each select="$UniqueMultiplePins">
<xsl:value-of select="@PinName"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>


And yes, I have used concat() a fair bit - but not for serializing output.
;)

HTH
Marrow
http://www.marrowsoft.com - home of Xselerator (XSLT IDE and debugger)
http://www.topxml.com/Xselerator



Rolf Kemper said:
Dear Marrow,

thank you for your good input. Your code is basically working.
But my real world is much more complex, and on top of that my
understanding about match seems to be wrong.
1) I guess macth="Pin" will match Pin at any level of what is
currently selected.
Right ?
2) I still have some problem when I have a kind of preselection and
the node NOT selected has also a Pin which would normally match.
So I have made a more complex code to demonstrate that. Please find it
below.
(have a look to ... select="Root/x/TEST[44 >= @Index and @Index >=
43]" )
Maybe you can explain why this happens.

In case of fail, the $UniqueMultiplePins node set is empty !!

BTW, you stated that my concat is somehow ineffective. Does this mean
that several TEXT and VALUE-OF elements would be better ?

Thank you very much for your kind help
Rolf


######### new data #############################
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<E>
<!-- If we have the key match with pin only
the xslt does not work. If the maych is at least TEST/Pin
it is OK
-->
<Pin PinName="A" PinClass="T"/>
<!--<Pin PinName="AX" PinClass="T"/> -->
</E>
<x>
<!-- in case the TEST node below is not selected by Index
AND PinName="A" and CellType="M" matches
the result is wrong
-->
<TEST Index="42" CellType="M">
<Pin PinName="z" PinClass="R"/>
<!--<Pin PinName="AX" PinClass="T"/>-->
<Pin PinName="A" PinClass="T"/>
</TEST>
<TEST Index="43" CellType="M">
<Pin PinName="A" PinClass="T"/>
<Pin PinName="D" PinClass="R"/>
<Pin PinName="C" PinClass="R"/>
<Pin PinName="X" PinClass="R"/>
</TEST>
<TEST Index="44" CellType="M">
<Pin PinName="D" PinClass="R"/>
<Pin PinName="C" PinClass="R"/>
<Pin PinName="X" PinClass="R"/>
<Pin PinName="A" PinClass="T"/>
</TEST>
<TEST Index="45" CellType="M">
<Pin PinName="z" PinClass="R"/>
</TEST>
</x>
</Root>

############# new xslt ############################################

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:eek:utput method="text"/>

<xsl:key name="kDistinctPins" match="TEST/Pin" use="@PinName"/>

<!-- @Index >= 43 fails , @Index >= 42 OK -->
<xsl:variable name="Selected" select="Root/x/TEST[44 >= @Index and
@Index >= 43]"/>
<xsl:variable name="Macros" select="$Selected[@CellType='M']"/>
<xsl:variable name="TopNetPins"
select="$Macros/Pin[@PinClass='T']"/>
<xsl:variable name="MultiplePins"
select="$TopNetPins[count(key('kDistinctPins',@PinName)) &gt; 1]"/>
<xsl:variable name="UniqueMultiplePins"
select="$MultiplePins[generate-id() =
generate-id(key('kDistinctPins',@PinName))]"/>

<xsl:template match="/">
<xsl:text>all pins of all test nodes ==> OK
</xsl:text>
<xsl:for-each select="$TopNetPins">
<xsl:value-of select="@PinName"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
<xsl:text>
multiple pins ==> OK
</xsl:text>
<xsl:for-each select="$MultiplePins">
<xsl:value-of select="@PinName"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
<xsl:text>
unique pins ==> OK
</xsl:text>
<xsl:for-each select="$UniqueMultiplePins">
<xsl:value-of select="@PinName"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

########## end ###################


"Marrow" <[email protected]> wrote in message
Hi Rolf,

When comparing node-sets it is as well to remember that x != y is not quite
the same as not(x = y).

But anyway, I don't think it's worth trying to do what you want using the
preceding method for obtaining uniques - the Muenchian technique will be
much easier and yield far better performance, e.g.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:eek:utput method="text"/>

<xsl:key name="kDistinctPins" match="Pin" use="@PinName"/>

<xsl:variable name="TestNodes" select="Root/TEST/Pin"/>
<xsl:variable name="MultiplePins"
select="$TestNodes[count(key('kDistinctPins',@PinName)) &gt; 1]"/>
<xsl:variable name="UniqueMultiplePins"
select="$MultiplePins[generate-id() =
generate-id(key('kDistinctPins',@PinName))]"/>

<xsl:template match="/">
<xsl:text>all pins of all test nodes ==> OK
</xsl:text>
<xsl:for-each select="$TestNodes">
<xsl:value-of select="@PinName"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
<xsl:text>
multiple pins ==> OK
</xsl:text>
<xsl:for-each select="$MultiplePins">
<xsl:value-of select="@PinName"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
<xsl:text>
unique pins ==> OK
</xsl:text>
<xsl:for-each select="$UniqueMultiplePins">
<xsl:value-of select="@PinName"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>


Btw, don't overuse the concat() function - especially not for literal
output... you are just doing concatenation where the transformation engine
will already serialize the output.

HTH
Marrow
http://www.marrowsoft.com - home of Xselerator (XSLT IDE and debugger)
http://www.topxml.com/Xselerator


Rolf Kemper said:
Dear Experts,

I got stuck with the following problem and need your help.

What I wnat to do is to get a set of distinct nodes.
Before the distinct I have selected the multiple occourences already
sucsessfully. However , the rest does not work as expected.

Hope someone can help on that.
Rolf

############################### DATA ################################

My XML DATA:
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<TEST>
<Pin PinName="A" />
<Pin PinName="B" />
<Pin PinName="B" />
<Pin PinName="C" />
<Pin PinName="X" />
</TEST>
<TEST>
<Pin PinName="A" />
<Pin PinName="D" />
<Pin PinName="C" />
<Pin PinName="X" />
<Pin PinName="A" />
</TEST>
</Root>


My Test XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output method="text"/>
<xsl:variable name="NewLine" select="'
'"/>

<xsl:variable name="TestNodes" select="Root/TEST"/>
<xsl:variable name="MultiplePins" select="$TestNodes/Pin[@PinName =
preceding::pin/@PinName]"/>
<xsl:variable name="UniqueMultiplePins"
select="$MultiplePins[@PinName != preceding::*/@PinName]"/>

<xsl:template match="/">
<xsl:value-of select="concat('all pins of all test nodes ==>
OK ',$NewLine)"/>
<xsl:for-each select="$TestNodes/Pin">
<xsl:value-of select="concat(@PinName,$NewLine)"/>
</xsl:for-each>
<xsl:value-of select="$NewLine"/>

<xsl:value-of select="concat('multiple pins ==> OK ',$NewLine)"/>
<xsl:for-each select="$MultiplePins">
<xsl:value-of select="concat(@PinName,$NewLine)"/>
</xsl:for-each>
<xsl:value-of select="$NewLine"/>

<xsl:value-of select="concat('unique pins ==> NOT GOOD !!
',$NewLine)"/>
<xsl:for-each select="$UniqueMultiplePins">
<xsl:value-of select="concat(@PinName,$NewLine)"/>
</xsl:for-each>

</xsl:template>

</xsl:stylesheet>

My results (gained by XMLSpy debug mode):

all pins of all test nodes ==> OK
A
B
B
C
X
A
D
C
X
A

multiple pins ==> OK
B
A
C
X
A

unique pins ==> NOT GOOD !!
B
A
C
X
A

( I expected B A C X )
#################### End #################################
 
J

Jürgen Kahrs

Rolf said:
However, can you drop me some lines on xmlgawk ?

xmlgawk currently has the status of an experimental
extension of GNU Awk. You can find more information
about it in my posting in the newsgroup comp.lang.awk:

http://groups.google.de/groups?hl=d...2fdd&[email protected]#link1

I have just started writing some GNU-like doc about it
but the doc is not ready yet. Until then, Stefan Tramm's
description may be helpful:

http://homepage.mac.com/stefan.tramm/iWiki/XmlGawkMacOSX.html

Rolf said:
Can I use this within MSXML4 processor ?

I don't know too much about XML tools in the Microsoft Universe.

Rolf said:
How about schema check ?

xmlgawk is not meant to validate XML files.
xmlgawk does not use a DOM representation (although
Manuel Collado has written one in xmlgawk). Are you
sure you need a DOM-representation to do your work ?
My experience is that users often over-estimate the
need for a DOM. But I am ready to be convinced.

Rolf said:
As you stated, YES, the example is like counting some beans, but just
for the sake of demonstration. It addresses a very tiny problem of a
5M mixed data set. So my real world is a farm of beans, fruits,
nuts, vegetables and many other things.

XML files with several MB of data, sounds interesting.
How long does it take your XSL processor to do a run
over these files ? Several seconds or more ? I am asking
because I am interested in finding out typical uses of
XML files (outside web design), including turnaround times.

Rolf said:
I myself am just creating different dishes out of it. XSLT in
combination with XPath is a great help, as I have to present the same
dish in different restaurants (HTML, Text, Excel, Spice) too.

Spice ! Is this data-flow typical for industrial environments ?
If so, I would take this as an indication that XML has definitely
broken out of the web design niche.

Thanks for posting such an interesting use-case.
 
R

Rolf Kemper

Hi Juergen,

thank you for your info about xmlgawk and sorry for the late reply.

Concerning XML in general, my view is this:

1) XML is much more than just another web technology.
It is a data carrier!

2) As it really separates data from format, it is very helpful for
data exchange. In my opinion it has the potential to make a lot of
specific converters obsolete.
Example:
We store a lot of data in just one XML file, which is about 5M.
Then we compile different 'VIEWS' out of this big data set.
It usually takes some seconds, but this is totally OK as we do not
need realtime web services for thousands of users simultaneously.
If you create an Excel sheet and store it as XML, you will see how
easily you can create such a file with an XSLT. That means we select
certain information out of the big XML and present it in Excel.
Another user may need a top-level Spice representation of some data. So
we can again select the required information and present it as text
(formatted as Spice).

3) I'm quite fed up with small text files which store different values
(parameters, configurations, etc.) in nearly every format you can
think of.
Writing these small files in XML is big progress, as you now deal
with a standardized structure.
On top of that you can have a schema file which specifies exactly
what structure and data range is allowed. That means if you create a
new config you have a formal check BEFORE you get to the application!
Just now a colleague told me that the first EDA tool vendors use XML for
such config files.
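
To make the 'VIEWS' idea from point 2 a bit more concrete, here is a very small sketch of a text view. It is not our real data or stylesheet: it simply reuses the Root/TEST/Pin test file from the start of this thread, and the comma-separated layout with the header line "test,pins" is invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Header line of the (hypothetical) text view -->
    <xsl:text>test,pins&#10;</xsl:text>
    <xsl:for-each select="Root/TEST">
      <!-- Running number of the TEST element -->
      <xsl:value-of select="position()"/>
      <xsl:text>,</xsl:text>
      <xsl:for-each select="Pin">
        <xsl:value-of select="@PinName"/>
        <!-- Blank between pin names, but not after the last one -->
        <xsl:if test="position() != last()">
          <xsl:text> </xsl:text>
        </xsl:if>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

A second stylesheet run against the same XML file could produce an HTML or Excel view instead; only the output method and the literal text around the selected values change.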

Hope that helps to understand our interest in xml

Rolf
 
J

Jürgen Kahrs

Rolf said:
Hope that helps to understand our interest in xml

Yes, thanks for your detailed report.
I am currently writing the doc for xmlgawk
and I am still searching for some examples
_other_ than web pages or text documents.
Could you send me a trivial example file
which reflects your typical data flow ?
 
