C++ parsing with a mix of SAX and DOM for large files

alex masselot

Hello

I'm not familiar with Xerces in C++.

Currently, we parse XML files with Perl (typically XML::Twig) and Java
(dom4j).
Both APIs offer a very comfortable way to mix SAX and DOM, by
setting handlers on certain element paths.

The XML file is parsed; once a defined path is reached, the
element is handed to a handler subroutine.
The whole subtree can then be explored with DOM-like calls (XPath etc.) as
an in-memory element.
Afterwards, the tree can be purged and the memory released.

It's quite a convenient merge, getting the best of both worlds.

Is that possible with Xerces in C++?
I cannot find any simple answer in the Apache docs.

thanks
Alex
 
Philippe Poulard

alex said:
> Hello
>
> I'm not familiar with Xerces in C++.
>
> Currently, we parse XML files with Perl (typically XML::Twig) and Java
> (dom4j).
> Both APIs offer a very comfortable way to mix SAX and DOM, by
> setting handlers on certain element paths.
>
> The XML file is parsed; once a defined path is reached, the
> element is handed to a handler subroutine.
> The whole subtree can then be explored with DOM-like calls (XPath etc.) as
> an in-memory element.
> Afterwards, the tree can be purged and the memory released.

It's a job for Active Tags and the XML Control Language!

XCL pipelines work the same way in RefleX (the engine);
however, you can also use XPath directly on SAX streams:
you can define XPath patterns for filtering (as in XSLT), except that
large files are supported as well.

Additionally, you can "cast" a tree or a subtree from DOM to SAX or from
SAX to DOM at will.

Here are some examples:
http://reflex.gforge.inria.fr/saxPatterns.html#N802B53
http://reflex.gforge.inria.fr/tutorial.html#N801C30

And the slides that were shown at <XML2006> in Boston:
http://disc.inria.fr/perso/philippe.poulard/xml/active-tags.pdf (pages 7
and 8)

> It's quite a convenient merge, getting the best of both worlds.

That is also my opinion; you can achieve very complex things with
very few active tags.

> Is that possible with Xerces in C++?

Sure! As you explain it yourself, it's not a question of language.

> I cannot find any simple answer in the Apache docs.
>
> thanks
> Alex


--
Cordialement,

///
(. .)
--------ooO--(_)--Ooo--------
| Philippe Poulard |
-----------------------------
http://reflex.gforge.inria.fr/
Have the RefleX !
 
Joseph Kesselman

The traditional technique for mixing SAX and DOM is to use a SAX parser
together with a SAX-driven DOM-tree builder, and to write a SAX handler
that filters the events appropriately before passing them to the builder.

Once you've got your filtered DOM, you can of course run a compatible
XPath implementation against it. DOM Level 3 introduced XPath support,
though not all DOMs implement that optional feature and I'm not sure
offhand whether Xerces-C's DOM includes it or not. If not, I presume
Xalan-C has an XPath API, though I'm not sure how efficiently it
interoperates with the Xerces-C DOM (Xalan prefers to manipulate its own
data model).

So the answer is: Yes, it's possible, though you may need to write a bit
of code to glue it all together.
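
To make the glue code a bit more concrete, here is a minimal sketch of the
filtering idea in plain C++, with no Xerces dependency. Node, SubtreeFilter
and demoTitles are hypothetical names invented for this illustration; they
are not Xerces-C API. The assumption is that, in a real program, the three
methods of SubtreeFilter would be called from the matching callbacks of your
SAX handler, and that Node would be replaced by whatever tree your DOM
builder produces.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory node; a stand-in for a DOM element.
struct Node {
    std::string name;
    std::string text;  // all character data of the element, concatenated
    Node* parent = nullptr;
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical filter: receives SAX-style events, builds a tree only for
// elements named `target`, hands each finished tree to `handler`, then
// frees it so memory use is bounded by the largest matching subtree.
class SubtreeFilter {
public:
    using Handler = std::function<void(const Node&)>;

    SubtreeFilter(std::string target, Handler handler)
        : target_(std::move(target)), handler_(std::move(handler)) {}

    void startElement(const std::string& name) {
        if (!current_) {
            if (name != target_) return;  // outside a target: drop the event
            root_ = std::make_unique<Node>();
            root_->name = name;
            current_ = root_.get();
            return;
        }
        // Inside a target subtree: attach a new child and descend into it.
        auto child = std::make_unique<Node>();
        child->name = name;
        child->parent = current_;
        Node* raw = child.get();
        current_->children.push_back(std::move(child));
        current_ = raw;
    }

    void characters(const std::string& chars) {
        if (current_) current_->text += chars;
    }

    void endElement(const std::string& /*name*/) {
        if (!current_) return;  // closing an element we never recorded
        if (current_ == root_.get()) {
            handler_(*root_);   // subtree complete: explore it DOM-style
            root_.reset();      // purge the subtree, releasing its memory
            current_ = nullptr;
        } else {
            assert(current_->parent != nullptr);
            current_ = current_->parent;
        }
    }

private:
    std::string target_;
    Handler handler_;
    std::unique_ptr<Node> root_;
    Node* current_ = nullptr;
};

// Feeds the filter, by hand, the events a SAX parser would emit for
// <lib><book><title>A</title></book><book><title>B</title></book></lib>
// and collects the title of every <book>.
std::vector<std::string> demoTitles() {
    std::vector<std::string> titles;
    SubtreeFilter filter("book", [&titles](const Node& book) {
        for (const auto& child : book.children)
            if (child->name == "title") titles.push_back(child->text);
    });
    const std::vector<std::string> inputs = {"A", "B"};
    filter.startElement("lib");
    for (const std::string& t : inputs) {
        filter.startElement("book");
        filter.startElement("title");
        filter.characters(t);
        filter.endElement("title");
        filter.endElement("book");
    }
    filter.endElement("lib");
    return titles;
}
```

The sketch matches on a bare element name rather than a path, and it ignores
attributes and namespaces; a path-based match would need a stack of the
currently open element names. Only one matching subtree is held in memory at
a time, which is the point of the approach for large files.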
 
