"walk over," and XPath-based substitutions?

Discussion in 'XML' started by Ivan Shmakov, Apr 6, 2013.

  1. Ivan Shmakov

    Ivan Shmakov Guest

    [Cross-posting to news:comp.text.xml, yet omitting it from
    Followup-To:, for I'm primarily interested in Perl-based
    solutions.]

    Is there an easy way to invoke a particular piece of code for each
    XML node that satisfies one of a given list of XPath expressions?

    A simple-minded approach (based on XML::LibXML) could be like:

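    require XML::LibXML;

    ## NB: a hash is initialized from a list, (...); assigning a hash
    ## reference, {...}, would leave a single bogus key instead
    my %xpath_sub = (
        q {//node ()[@foo = "bar"]} => \&foo_bar,
        q {//node ()[@baz = "qux"]} => sub { baz ("qux", @_); },
    );

    ## $context is assumed to be an XML::LibXML document or node
    foreach my $xpath (keys (%xpath_sub)) {
        my $sub
            = $xpath_sub{$xpath};
        foreach my $node ($context->findnodes ($xpath)) {
            $sub->($node);
        }
    }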

    However, AIUI, the code above implies that the XML tree is to be
    traversed multiple times. Which could probably be avoided by
    traversing the tree explicitly, as in:

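    sub traverse {
        my ($node, $xsubs) = @_;
        ## NB: the expressions have to be relative to $node here, as in
        ## self::node ()[@foo = "bar"]; an absolute //node ()[...] gives
        ## the same result at every node of the document
        foreach my $xpath (keys (%$xsubs)) {
            next
                unless ($node->find ($xpath));
            ## FIXME: check if the result is a boolean?
            $xsubs->{$xpath}->($node);
            ## FIXME: there, one may wish for a recursion; or not
        }
        ## recurse over the children
        foreach my $child ($node->childNodes ()) {
            traverse ($child, $xsubs);
        }
        ## .
    }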

    Still, it may repeatedly traverse the children of $node while
    computing ->find () for each of the XPath expressions. (Unlike
    the way an "optimized," or "compiled," regular expression would
    be handled, IIUC.)
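
The re-parsing of each expression at every node can at least be avoided by compiling it once with XML::LibXML::XPathExpression and keeping it relative to the node under test. What follows is a minimal sketch under those assumptions; the sample document, the @rules layout and the @seen log are hypothetical, and the per-node evaluation cost itself remains.

```perl
use strict;
use warnings;
use XML::LibXML;

## Hypothetical sample document, for illustration only.
my $doc = XML::LibXML->load_xml (string =>
    '<r><a foo="bar"/><b baz="qux"><c foo="bar"/></b></r>');

my @seen;

## Each rule pairs a precompiled, node-relative expression with a
## handler.  A list of pairs is used because a compiled expression
## object would be stringified if it were used as a hash key.
my @rules = (
    [ XML::LibXML::XPathExpression->new ('self::*[@foo = "bar"]'),
      sub { push (@seen, "foo_bar:" . $_[0]->nodeName ()); } ],
    [ XML::LibXML::XPathExpression->new ('self::*[@baz = "qux"]'),
      sub { push (@seen, "baz_qux:" . $_[0]->nodeName ()); } ],
);

sub traverse {
    my ($node, $rules) = @_;
    foreach my $rule (@$rules) {
        my ($expr, $sub) = @$rule;
        ## the list assignment yields a plain list of matching nodes
        my @hit = $node->findnodes ($expr);
        $sub->($node)
            if (@hit);
    }
    ## recurse over the children, in document order
    foreach my $child ($node->childNodes ()) {
        traverse ($child, $rules);
    }
}

traverse ($doc->documentElement (), \@rules);
print join ("\n", @seen), "\n";
```

Each node is still tested once per rule, so this only trades repeated expression parsing for a one-time compilation.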

    The question is: does LibXML (or some other library) provide a
    way to make such a task both simpler to code and more efficient
    on execution?

    ... Or do I "optimize" all the XPath expressions themselves into
    a single one somehow?
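
One way to read "a single expression" is to join the individual predicates with "or", so that the document is queried once, and then to re-test only the matched nodes to pick the handler. A minimal sketch, assuming every rule can be written as a predicate on an element; the sample document, @rules and @seen are hypothetical, and whether this is actually faster depends on the XPath engine.

```perl
use strict;
use warnings;
use XML::LibXML;

## Hypothetical sample document, for illustration only.
my $doc = XML::LibXML->load_xml (string =>
    '<r><a foo="bar"/><b baz="qux"><c foo="bar"/></b></r>');

my @seen;

## Each rule pairs an XPath predicate with a handler.
my @rules = (
    [ q {@foo = "bar"},
      sub { push (@seen, "foo_bar:" . $_[0]->nodeName ()); } ],
    [ q {@baz = "qux"},
      sub { push (@seen, "baz_qux:" . $_[0]->nodeName ()); } ],
);

## Builds //*[(@foo = "bar") or (@baz = "qux")]
my $any      = join (" or ", map { "(" . $_->[0] . ")" } @rules);
my $combined = "//*[$any]";

## One query over the document; the predicates are then re-tested
## on the matched nodes only, to find out which handlers apply.
foreach my $node ($doc->findnodes ($combined)) {
    foreach my $rule (@rules) {
        my ($pred, $sub) = @$rule;
        my @hit = $node->findnodes ("self::*[$pred]");
        $sub->($node)
            if (@hit);
    }
}

print join ("\n", @seen), "\n";
```

A node that satisfies several predicates gets every matching handler called, in the order the rules are listed.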

    TIA.

    --
    FSF associate member #7257 http://hfday.org/
    Ivan Shmakov, Apr 6, 2013
    #1

  2. Ivan Shmakov

    Ivan Shmakov Guest

    >>>>> Joe Kesselman writes:
    >>>>> On 4/6/2013 7:32 AM, Ivan Shmakov wrote:


    [Given that there was little Perl-specific matter in this
    subthread, cross-posting back to news:comp.text.xml, and setting
    Followup-To: there.]

    >> However, AIUI, the code above implies that the XML tree is to be
    >> traversed multiple times.


    > First off, I'd suggest that you consider XSLT or XQuery, which are
    > specifically designed for this kind of find-and-process operation.


    I see little advantage in using XSLT for my task (and I'm not
    familiar with XQuery), as XML is not the only data source I need
    to interface with. (E.g., I'm also accessing an SQLite database.)
    The usual benefits of XSLT -- the existence of browser-based
    implementations and its "Lisp-like" nature (in that it uses the
    same syntax for both the code and data) -- do not seem to apply.

    > What you're looking for is a "streaming processor" -- one which
    > rewrites the complete set of operations into a state machine which
    > can produce its results in a single pass over the nodes.


    Indeed, thanks for the clarification!
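
For comparison, a single pass over the nodes can be hand-coded with the pull parser in XML::LibXML::Reader. The sketch below is not a compiled XPath state machine of the kind described above; it only tests attributes directly as each element is read. The sample document, @rules and @seen are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

## Hypothetical sample document, for illustration only.
my $reader = XML::LibXML::Reader->new (string =>
    '<r><a foo="bar"/><b baz="qux"><c foo="bar"/></b></r>');

my @seen;

## Each rule: attribute name, wanted value, handler.  The handlers
## receive the element name, since a reader position is not a node
## that can be kept after the reader moves on.
my @rules = (
    [ "foo", "bar", sub { push (@seen, "foo_bar:" . $_[0]); } ],
    [ "baz", "qux", sub { push (@seen, "baz_qux:" . $_[0]); } ],
);

## A single forward pass; no document tree is kept in memory.
while ($reader->read ()) {
    next
        unless ($reader->nodeType () == XML_READER_TYPE_ELEMENT);
    foreach my $rule (@rules) {
        my ($attr, $want, $sub) = @$rule;
        my $got = $reader->getAttribute ($attr);
        $sub->($reader->name ())
            if (defined ($got) && $got eq $want);
    }
}

print join ("\n", @seen), "\n";
```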

    > There are XPath/XSLT/XQuery systems which attempt to do this for a
    > subset of the query language -- I think Xerces


    Is it Apache Xerces [1]? It doesn't seem to include either XSLT
    or XQuery.

    [1] https://xerces.apache.org/

    > and the IBM XML parser


    Which is?

    > have streaming-subset XPath evaluators, and I know the DataPower "xml
    > appliance" machines have some limited XSLT streaming capability --
    > but even as subsets, those are fairly rare, and while they may be
    > able to reduce storage by not keeping the entire document model in
    > memory they may not reduce computational load. If you're looking for
    > something off-the-shelf, that's where I'd start.


    ACK, thanks. My XMLs are rather small, so I'm more interested
    in reducing computational load than memory usage. But even that
    is not a priority right now. Rather, I'm looking for ways
    to avoid a total code rewrite at some later point.

    I guess I should check XML::Twig. Or, given that the conditions
    that I currently need to consider are rather simple, a
    straightforward ->childNodes ()-based, no-XPath implementation
    may be possible.
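
A ->childNodes ()-based walk without any XPath could look roughly like the following. This is only a sketch, assuming the conditions are plain attribute-equals-value tests; the sample document, the @rules layout and the @seen log are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML qw(:libxml);

## Hypothetical sample document, for illustration only.
my $doc = XML::LibXML->load_xml (string =>
    '<r><a foo="bar"/><b baz="qux"><c foo="bar"/></b></r>');

my @seen;

## Each rule: attribute name, wanted value, handler.
my @rules = (
    [ "foo", "bar",
      sub { push (@seen, "foo_bar:" . $_[0]->nodeName ()); } ],
    [ "baz", "qux",
      sub { push (@seen, "baz_qux:" . $_[0]->nodeName ()); } ],
);

sub walk {
    my ($node, $rules) = @_;
    ## only elements carry attributes; skip text, comments, etc.
    if ($node->nodeType () == XML_ELEMENT_NODE) {
        foreach my $rule (@$rules) {
            my ($attr, $want, $sub) = @$rule;
            my $got = $node->getAttribute ($attr);
            $sub->($node)
                if (defined ($got) && $got eq $want);
        }
    }
    ## recurse over the children, in document order
    foreach my $child ($node->childNodes ()) {
        walk ($child, $rules);
    }
}

walk ($doc->documentElement (), \@rules);
print join ("\n", @seen), "\n";
```

Every node is visited exactly once, and each test is a direct attribute lookup rather than an XPath evaluation.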

    [...]

    > (I'm one of the authors of a patent on that topic, actually -- US
    > 8,120,789 B2 -- but unfortunately our group didn't get the funding to
    > finish a product-quality implementation of that logic so it isn't
    > available for use. If someone wants to license the patent, I'm sure
    > IBM would be delighted to talk to you...)


    I believe that I may be under a jurisdiction which has no notion
    of software patents. (Subject to the reading of TRIPS, though.)

    --
    FSF associate member #7257 http://hfday.org/
    Ivan Shmakov, Apr 7, 2013
    #2

  3. On 4/7/2013 5:22 AM, Ivan Shmakov wrote:
    > I see little advantage in using XSLT for my task (and I'm not
    > familiar with XQuery), as XML is not the only data source I need
    > to interface with. (E.g., I'm also accessing an SQLite database.)


    That's a valid point.

    > The usual benefits of XSLT -- the existence of browser-based
    > implementations and its "Lisp-like" nature (in that it uses the
    > same syntax for both the code and data) -- do not seem to apply.


    Those aren't the usual benefits. The benefit of XSLT and XQuery is that they are query
    languages specialized for constructing new documents from XML input, and
    for XSLT in particular that it's a nonprocedural language for the
    purpose. This makes writing and maintaining the transformations easier,
    and may permit optimizations under the covers that would otherwise cost
    you a lot of coding effort.

    Same reasons you use SQL rather than hand-coding your own database.

    Not always the right answer, by any means. And "may" is indeed a valid
    caveat; that's a quality-of-implementation issue, as is true any time
    you use software provided by someone else (including compiler libraries)
    rather than coding it yourself. But don't sell these short by assuming
    that they are only for browsers, and don't get too hung up on the fact
    that XSLT happens to be expressed in XML. (XQuery isn't, and the two are
    in many ways isomorphic.)

    Though in fact the ability to use XML tools -- including XSLT -- to
    manipulate XSLT itself is sometimes surprisingly useful.

    > Is it Apache Xerces [1]? It doesn't seem to include either XSLT
    > or XQuery.


    Xerces interacts with Xalan (the Apache XPath/XSLT code), but I *think*
    Xerces also had an implementation of a small subset of XPath that could
    operate in streaming mode.

    > > and the IBM XML parser

    > Which is?


    Not available independently, alas. XL-TXE 1.0 ships with IBM's JRE,
    which in turn ships only with IBM products. XL-TXE 2.0 (which adds XPath
    2.0, XSLT 2.0, and XQuery support) currently ships only as the IBM
    WebSphere Application Server's "XML feature".

    (IBM had been actively contributing to Xerces and Xalan -- in fact, we'd
    been pretty much carrying those projects -- but had to reassign that
    resource. XL-TXE is a major reimplementation.)

    > I guess I should check XML::Twig.


    Not familiar with it; can't advise.

    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
    Joe Kesselman, Apr 7, 2013
    #3
