"walk over," and XPath-based substitutions?


Ivan Shmakov

[Cross-posting to yet omitting it from
Followup-To:, for I'm primarily interested in Perl-based
solutions.]

Is there an easy way to invoke a particular piece of code for each
XML node that satisfies an XPath expression from a certain list?

A simple-minded approach (based on XML::LibXML) could be like:

    require XML::LibXML;

    ## map each XPath expression to its handler
    my %xpath_sub = (
        q {//node ()[@foo = "bar"]} => \&foo_bar,
        q {//node ()[@baz = "qux"]} => sub { baz ("qux", @_); }
    );

    ## $context is assumed to be an already parsed document or node
    foreach my $xpath (keys (%xpath_sub)) {
        my $sub
            = $xpath_sub{$xpath};
        foreach my $node ($context->findnodes ($xpath)) {
            $sub->($node);
        }
    }

However, AIUI, the code above implies that the XML tree is to be
traversed multiple times, which could probably be avoided by
traversing the tree explicitly, as in:

    sub traverse {
        my ($node, $xsubs) = @_;
        foreach my $xpath (keys (%$xsubs)) {
            next
                unless ($node->find ($xpath));
            ## FIXME: check if the result is a boolean?
            $xsubs->{$xpath}->($node);
            ## FIXME: there, one may wish for a recursion; or not
        }
        ## recurse over the children
        foreach my $child ($node->childNodes ()) {
            traverse ($child, $xsubs);
        }
        ## .
    }

Still, it may repeatedly traverse the children of $node while
computing ->find () for each of the XPath expressions. (Unlike
the way an "optimized," or "compiled," regular expression would
be handled, IIUC.)
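
The closest analogue to a compiled regular expression that XML::LibXML documents is XML::LibXML::XPathExpression, which parses an expression once so that it can be reused. The sketch below assumes a small made-up document and hypothetical counting handlers; note that it only saves the repeated parsing of the expressions, and each one is still evaluated over the tree separately.

```perl
use strict;
use warnings;
use XML::LibXML;

## made-up sample document, for illustration only
my $doc = XML::LibXML->load_xml (string =>
    '<root><a foo="bar"/><b baz="qux"/><c foo="bar"/></root>');

my %count;
## compile each expression once; the handlers are hypothetical
my @rules = (
    [ XML::LibXML::XPathExpression->new (q {//*[@foo = "bar"]}),
      sub { $count{foo_bar}++; } ],
    [ XML::LibXML::XPathExpression->new (q {//*[@baz = "qux"]}),
      sub { $count{baz_qux}++; } ],
);

foreach my $rule (@rules) {
    my ($compiled, $sub) = @$rule;
    ## findnodes accepts a compiled expression in place of a string
    foreach my $node ($doc->findnodes ($compiled)) {
        $sub->($node);
    }
}
print "$_: $count{$_}\n" foreach sort keys %count;
```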

The question is: does LibXML (or some other library) provide a
way to make such a task both simpler to code and more efficient
on execution?

... Or do I "optimize" all the XPath expressions themselves into
a single one somehow?
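
One way to read "a single one" is to join the predicates with "or", run one query to collect the candidate nodes, and then re-test each predicate against the individual node to decide which handlers to call. This is only a sketch under assumptions: the rule table, the sample document, and the logging handlers are made up, and //* stands in for //node () on the grounds that only elements carry attributes.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml (string => <<'XML');
<root>
  <a foo="bar"/>
  <b baz="qux"><c foo="bar" baz="qux"/></b>
  <d foo="other"/>
</root>
XML

my @log;
## hypothetical rule table: [ XPath predicate, handler ]
my @rules = (
    [ q {@foo = "bar"}, sub { push (@log, "foo_bar:" . $_[0]->nodeName ()); } ],
    [ q {@baz = "qux"}, sub { push (@log, "baz_qux:" . $_[0]->nodeName ()); } ],
);

## one combined query selects every element matching any rule
my $any = join (" or ", map ("($_->[0])", @rules));
my $combined = "//*[$any]";

foreach my $node ($doc->findnodes ($combined)) {
    foreach my $rule (@rules) {
        my ($pred, $sub) = @$rule;
        ## re-test the predicate against this node alone
        $sub->($node)
            if ($node->exists ("self::*[$pred]"));
    }
}
print "$_\n" foreach @log;
```

The per-node re-test looks only at the node itself, so the tree is searched once; the cost is that each predicate is evaluated a second time for every candidate.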

TIA.
 

Ivan Shmakov

[Given that there was little Perl-specific matter in this
subthread, cross-posting back to and setting
Followup-To: there.]
> First off, I'd suggest that you consider XSLT or XQuery, which are
> specifically designed for this kind of find-and-process operation.

I see little advantage in using XSLT for my task (and I'm not
familiar with XQuery), as XML is not the only data source I need
to interface with. (E.g., I'm also accessing an SQLite database.)
The usual benefits of XSLT -- the existence of browser-based
implementations and its "Lisp-like" nature (in that it uses the
same syntax for both the code and data) -- do not seem to apply.
> What you're looking for is a "streaming processor" -- one which
> rewrites the complete set of operations into a state machine which
> can produce its results in a single pass over the nodes.

Indeed, thanks for the clarification!
> There are XPath/XSLT/XQuery systems which attempt to do this for a
> subset of the query language -- I think Xerces

Is it Apache Xerces [1]? It doesn't seem to include either XSLT
or XQuery.

[1] https://xerces.apache.org/
> and the IBM XML parser

Which is?
> have streaming-subset XPath evaluators, and I know the DataPower "xml
> appliance" machines have some limited XSLT streaming capability --
> but even as subsets, those are fairly rare, and while they may be
> able to reduce storage by not keeping the entire document model in
> memory they may not reduce computational load. If you're looking for
> something off-the-shelf, that's where I'd start.

ACK, thanks. My XML documents are rather small, so I'm more
interested in reducing computational load than memory usage. But
even that is not a priority right now. Rather, I'm looking for ways
to avoid a total code rewrite at some later point.

I guess I should check XML::Twig. Or, given that the conditions
that I currently need to consider are rather simple, a
straightforward ->childNodes ()-based, no-XPath implementation
may be possible.
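
A ->childNodes ()-based walk of that kind might look like the sketch below, which visits each node once and compares attribute values directly instead of evaluating XPath. The rule table layout, the sample document, and the logging handlers are assumptions made for illustration; they are not from the thread.

```perl
use strict;
use warnings;
use XML::LibXML;

my @seen;
## hypothetical rule table: [ attribute name, expected value, handler ]
my @rules = (
    [ "foo", "bar", sub { push (@seen, "foo_bar:" . $_[0]->nodeName ()); } ],
    [ "baz", "qux", sub { push (@seen, "baz_qux:" . $_[0]->nodeName ()); } ],
);

sub walk {
    my ($node, $rules) = @_;
    ## only elements carry attributes; skip text, comments, etc.
    if ($node->isa ("XML::LibXML::Element")) {
        foreach my $rule (@$rules) {
            my ($attr, $value, $sub) = @$rule;
            my $got = $node->getAttribute ($attr);
            $sub->($node)
                if (defined ($got) && $got eq $value);
        }
    }
    ## visit each child exactly once
    foreach my $child ($node->childNodes ()) {
        walk ($child, $rules);
    }
}

## made-up sample document, for illustration only
my $doc = XML::LibXML->load_xml (string =>
    '<root><a foo="bar"/><b baz="qux"><c foo="bar"/></b></root>');
walk ($doc->documentElement (), \@rules);
print "$_\n" foreach @seen;
```

This trades the generality of XPath for a single pass, so it fits only while the conditions stay as simple as plain attribute comparisons.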

[...]
> (I'm one of the authors of a patent on that topic, actually -- US
> 8,120,789 B2 -- but unfortunately our group didn't get the funding to
> finish a product-quality implementation of that logic so it isn't
> available for use. If someone wants to license the patent, I'm sure
> IBM would be delighted to talk to you...)

I believe that I may be under a jurisdiction which has no notion
of software patents. (Subject to the reading of TRIPS, though.)
 

Joe Kesselman

> I see little advantage in using XSLT for my task (and I'm not
> familiar with XQuery), as XML is not the only data source I need
> to interface with. (E.g., I'm also accessing an SQLite database.)

That's a valid point.
> The usual benefits of XSLT -- the existence of browser-based
> implementations and its "Lisp-like" nature (in that it uses the
> same syntax for both the code and data) -- do not seem to apply.

Those aren't the main benefits. The benefit of XSLT and XQuery is that they are query
languages specialized for constructing new documents from XML input, and
for XSLT in particular that it's a nonprocedural language for the
purpose. This makes writing and maintaining the transformations easier,
and may permit optimizations under the covers that would otherwise cost
you a lot of coding effort.

Same reasons you use SQL rather than hand-coding your own database.

Not always the right answer, by any means. And "may" is indeed a valid
caveat; that's a quality-of-implementation issue, as is true any time
you use software provided by someone else (including compiler libraries)
rather than coding it yourself. But don't sell these short by assuming
that they are only for browsers, and don't get too hung up on the fact
that XSLT happens to be expressed in XML. (XQuery isn't, and the two are
in many ways isomorphic.)

Though in fact the ability to use XML tools -- including XSLT -- to
manipulate XSLT itself is sometimes surprisingly useful.
> Is it Apache Xerces [1]? It doesn't seem to include either XSLT
> or XQuery.

Xerces interacts with Xalan (the Apache XPath/XSLT code), but I *think*
Xerces also had an implementation of a small subset of XPath that could
operate in streaming mode.
> Which is?

Not available independently, alas. XL-TXE 1.0 ships with IBM's JRE,
which in turn ships only with IBM products. XL-TXE 2.0 (which adds XPath
2.0, XSLT 2.0, and XQuery support) currently ships only as the IBM
WebSphere Application Server's "XML feature".

(IBM had been actively contributing to Xerces and Xalan -- in fact, we'd
been pretty much carrying those projects -- but had to reassign that
resource. XL-TXE is a major reimplementation.)
> I guess I should check XML::Twig.

Not familiar with it; can't advise.

--
Joe Kesselman,
http://www.love-song-productions.com/people/keshlam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
 
