DOM sub trees whilst SAX'ing in perl?


bugbear

I need to process some XML files that are rather large.
However, their structure can usefully be expressed as:
<!ELEMENT FILE (RECORD)+>
..
..
..

Each record is a few KB; the files are many tens of megabytes.

I would (dearly) like to use DOM to process each record,
since it's easier to get my head round than SAX events.

But I don't want to pull the whole file into
a DOM tree; it's too big.

These people have come up with a perfect (and obvious?)
solution:
http://www.devsphere.com/xml/saxdomix/

But I'm coding in a Perl environment.

Is there a similar module for Perl that generates separate
DOM subtrees?

BugBear
 

Michel Rodriguez

bugbear said:
Is there a similar Module, generating separate
DOM sub trees for Perl?

It looks like what XML::Twig does, except XML::Twig is not SAX/DOM based.
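For reference, the XML::Twig approach looks roughly like this. It is a minimal sketch, assuming XML::Twig is installed and that each RECORD carries an `id` attribute (a hypothetical field used only for illustration). A handler fires as each RECORD is completed and receives just that record as a small tree; purge() then frees what has been parsed so far, so memory use stays around one record rather than the whole file.

```perl
use strict;
use warnings;
use XML::Twig;

# Collect the (hypothetical) id attribute of every RECORD while keeping
# only about one record's tree in memory at a time.
sub record_ids {
    my ($xml) = @_;
    my @ids;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once each RECORD element has been completely parsed;
            # $record is an XML::Twig::Elt holding just that subtree.
            RECORD => sub {
                my ($t, $record) = @_;
                push @ids, $record->att('id');
                $t->purge;    # delete all elements completely parsed so far
            },
        },
    );
    $twig->parse($xml);       # for a file on disk, use $twig->parsefile($path)
    return @ids;
}
```

Note that $record is an XML::Twig::Elt rather than a W3C DOM node, which is the trade-off mentioned above.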
 

bugbear

Michel said:
It looks like what XML::Twig does, except XML::Twig is not SAX/DOM based.

OK. That does the right thing, but I'd prefer to stay with standards
(i.e. SAX and DOM) if possible. I'll keep looking, and keep
XML::Twig in mind as a fallback.

BugBear
 

SL

bugbear said:
OK. That does the right thing; I'd prefer to stay with standards
(i.e. SAX and DOM) if possible. I'll keep looking, and bear
XML::Twig in mind as a fall back position.

I haven't used it for a while, but there is (or was) a package on CPAN that
does what you want: DocSplitter in XML::SAX::Machines. It lets you split a
SAX stream into several smaller documents by emitting start_document() and
end_document() events before and after a particular element. For instance,
you can split your stream on each RECORD element, so that every filter further
down the pipeline sees each RECORD element as the root element of a separate
document. This is particularly useful with Matt Sergeant's XML::Filter::XSLT
filter. If you want to merge the results of the transformation back into one
big document, you can add a "Merger" to the pipeline; it works with the
splitter to remove the extra start_document() and end_document() events.
XML::SAX::Machines provides several other facilities for building SAX
pipelines.
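The splitting idea can be sketched by hand with XML::SAX::Base. This is not DocSplitter's actual interface: `RecordSplitter`, `DocCounter` and the `Record` option are hypothetical names chosen for illustration, and it assumes XML::SAX is installed. The filter drops the outer start_document()/end_document() pair and emits a fresh pair around each RECORD, so whatever handler sits below it sees a series of small documents.

```perl
use strict;
use warnings;

# Hypothetical splitter, NOT XML::Filter::DocSplitter's real interface.
# It drops the outer document boundaries and re-emits every element named
# by the Record option to its Handler as a separate document.
# (Comments, PIs and whitespace outside records are not filtered here.)
package RecordSplitter;
use base 'XML::SAX::Base';

sub start_document { }    # swallow the outer start_document()
sub end_document   { }    # ... and the outer end_document()

sub start_element {
    my ($self, $el) = @_;
    my $name = $el->{LocalName} // $el->{Name};
    # Outside a record, only the start of a new record is of interest.
    return unless $self->{_depth} || $name eq $self->{Record};
    $self->SUPER::start_document({}) unless $self->{_depth};
    $self->{_depth}++;
    $self->SUPER::start_element($el);
}

sub characters {
    my ($self, $data) = @_;
    $self->SUPER::characters($data) if $self->{_depth};
}

sub end_element {
    my ($self, $el) = @_;
    return unless $self->{_depth};
    $self->SUPER::end_element($el);
    # Closing the record element also closes its synthetic document.
    $self->SUPER::end_document({}) if --$self->{_depth} == 0;
}

# Demo downstream handler: counts documents and notes each root element name.
package DocCounter;
use base 'XML::SAX::Base';

sub start_document {
    my ($self) = @_;
    $self->{_docs}++;
    $self->{_at_root} = 1;
}

sub start_element {
    my ($self, $el) = @_;
    push @{ $self->{_roots} }, $el->{Name} if $self->{_at_root};
    $self->{_at_root} = 0;
}

package main;
use XML::SAX::ParserFactory;

# Returns (number of documents seen downstream, their root element names).
sub split_roots {
    my ($xml) = @_;
    my $sink     = DocCounter->new;
    my $splitter = RecordSplitter->new(Record => 'RECORD', Handler => $sink);
    XML::SAX::ParserFactory->parser(Handler => $splitter)->parse_string($xml);
    return ($sink->{_docs} // 0, @{ $sink->{_roots} || [] });
}
```

Any SAX filter or handler can take DocCounter's place; that substitution is what makes the pipeline approach composable.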

HTH,
SL
 

bugbear

SL said:
I haven't used it since a while, but there is (or was) a package doing what
you want on CPAN: DocSplitter in XML::SAX::Machines.

So how do I get my DOM(s)?

BugBear
 

SL

bugbear said:
So how do I get my DOM(s)?

Look at the XML::Filter::XSLT::LibXSLT filter: it uses
XML::LibXML::SAX::Builder to build a DOM from the SAX events it receives.
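Applied to this problem, a handler can feed each record's events into a fresh XML::LibXML::SAX::Builder and take the document that the builder's end_document() returns. A minimal sketch, assuming XML::SAX and XML::LibXML are installed; `RecordDOMSplitter`, its `Record`/`OnRecord` options and the `id` attribute are hypothetical, and namespace prefix-mapping events are not forwarded, so as written it only suits non-namespaced input.

```perl
use strict;
use warnings;

# Hypothetical SAX handler: builds a separate DOM (XML::LibXML::Document)
# for every element named by the Record option and passes it to OnRecord.
package RecordDOMSplitter;
use base 'XML::SAX::Base';
use XML::LibXML::SAX::Builder;

sub start_element {
    my ($self, $el) = @_;
    if ($self->{_builder}) {                  # inside a record: keep building
        $self->{_depth}++;
        $self->{_builder}->start_element($el);
        return;
    }
    my $name = $el->{LocalName} // $el->{Name};
    return unless $name eq $self->{Record};
    # Start of a record: open a fresh DOM with this element as its root.
    my $builder = XML::LibXML::SAX::Builder->new;
    $builder->start_document({});
    $builder->start_element($el);
    $self->{_builder} = $builder;
    $self->{_depth}   = 1;
}

sub characters {
    my ($self, $data) = @_;
    $self->{_builder}->characters($data) if $self->{_builder};
}

sub end_element {
    my ($self, $el) = @_;
    return unless $self->{_builder};
    $self->{_builder}->end_element($el);
    return if --$self->{_depth};
    # Record closed: the builder's end_document() returns the finished DOM.
    my $builder = delete $self->{_builder};
    my $doc     = $builder->end_document({});
    $self->{OnRecord}->($doc);
}

package main;
use XML::SAX::ParserFactory;

# Collect the hypothetical id attribute of each record via its own DOM.
sub record_ids {
    my ($xml) = @_;
    my @ids;
    my $handler = RecordDOMSplitter->new(
        Record   => 'RECORD',
        OnRecord => sub {
            my ($doc) = @_;    # a standalone XML::LibXML::Document
            push @ids, $doc->documentElement->getAttribute('id');
        },
    );
    XML::SAX::ParserFactory->parser(Handler => $handler)->parse_string($xml);
    return @ids;
}
```

Each callback receives an ordinary XML::LibXML::Document, so the usual DOM methods (getElementsByTagName, getAttribute, findnodes and so on) should be available per record. Only one record's tree is built at a time.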

SL
 
