DOM sub trees whilst SAX'ing in perl?

Discussion in 'XML' started by bugbear, Feb 17, 2005.

  1. bugbear

    bugbear Guest

    I need to process some XML files that are rather large.
    However, their structure can usefully be expressed
    as
    <!ELEMENT FILE (RECORD)+>
    ..
    ..
    ..

    Each record is a few KB. The files are many tens of megabytes.

    I would (dearly) like to use DOM to process each record,
    since it's easier to get my head round than SAX events.

    But I don't want to pull the whole file into
    a DOM tree; it's too big.

    These people have come up with a perfect (and obvious?)
    solution:
    http://www.devsphere.com/xml/saxdomix/

    But I'm coding in a Perl environment.

    Is there a similar module for Perl that generates
    separate DOM subtrees?

    BugBear
    bugbear, Feb 17, 2005
    #1

  2. bugbear wrote:
    > I need to process some XML files that are rather large.
    > However, their structure can usefully be expressed
    > as
    > <!ELEMENT FILE (RECORD)+>
    > .
    > .
    > .
    >
    > Each record is a few KB. The files are many tens of megabytes.
    >
    > I would (dearly) like to use DOM to process each record,
    > since it's easier to get my head round than SAX events.
    >
    > But I don't want to pull the whole file into
    > a DOM tree; it's too big.
    >
    > These people have come up with a perfect (and obvious?)
    > solution:
    > http://www.devsphere.com/xml/saxdomix/
    >
    > But I'm coding in a Perl environment.
    >
    > Is there a similar Module, generating separate
    > DOM sub trees for Perl?


    It looks like what XML::Twig does, except XML::Twig is not SAX/DOM based.
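
For reference, a minimal sketch of how that might look with XML::Twig, assuming the FILE/RECORD layout from the original post. The NAME child element and the sample data are made up for illustration. Note that the elements handed to the handler are XML::Twig elements, not W3C DOM nodes.

```perl
use strict;
use warnings;
use XML::Twig;

# Made-up sample data; a real run would use parsefile() on the big file.
my $xml = '<FILE>'
        . '<RECORD><NAME>alice</NAME></RECORD>'
        . '<RECORD><NAME>bob</NAME></RECORD>'
        . '</FILE>';

my @names;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once per RECORD, as soon as the element is fully parsed.
        RECORD => sub {
            my ($t, $record) = @_;
            # $record is a small in-memory tree for this record only.
            push @names, $record->first_child_text('NAME');
            # Free everything parsed so far so memory use stays flat.
            $t->purge;
        },
    },
);
$twig->parse($xml);

print "$_\n" for @names;
```

For a file on disk, `$twig->parsefile($path)` would replace the `parse` call; the `purge` in the handler is what keeps the whole file from accumulating in memory.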

    --
    mirod
    Michel Rodriguez, Feb 17, 2005
    #2

  3. bugbear

    bugbear Guest

    Michel Rodriguez wrote:
    > bugbear wrote:
    >> These people have come up with a perfect (and obvious?)
    >> solution:
    >> http://www.devsphere.com/xml/saxdomix/
    >>
    >> But I'm coding in a Perl environment.
    >>
    >> Is there a similar Module, generating separate
    >> DOM sub trees for Perl?
    >
    > It looks like what XML::Twig does, except XML::Twig is not SAX/DOM based.

    OK. That does the right thing, but I'd prefer to stay with standards
    (i.e. SAX and DOM) if possible. I'll keep looking, and bear
    XML::Twig in mind as a fallback.

    BugBear
    bugbear, Feb 17, 2005
    #3
  4. SL

    SL Guest

    >>> Is there a similar Module, generating separate
    >>> DOM sub trees for Perl?
    >>
    >> It looks like what XML::Twig does, except XML::Twig is not SAX/DOM based.
    >
    > OK. That does the right thing; I'd prefer to stay with standards
    > (i.e. SAX and DOM) if possible. I'll keep looking, and bear
    > XML::Twig in mind as a fall back position.

    I haven't used it in a while, but there is (or was) a package on CPAN that
    does what you want: DocSplitter (XML::Filter::DocSplitter) in
    XML::SAX::Machines. It lets you split a SAX stream into several smaller
    documents by emitting a start_document() and an end_document() event before
    and after a particular element. For instance, you can split your stream on
    each RECORD element, so that every filter further down the pipeline
    processes each RECORD element as the root element of a distinct document.
    This is particularly useful with the XML::Filter::XSLT filter by Matt
    Sergeant. If you want to merge the results of the transformation back into
    one big document, you can use a "Merger" from the same package in the
    pipeline; it works with the splitter to remove the extra start_document()
    and end_document() events. Machines provides several facilities for dealing
    with SAX pipelines.

    HTH,
    SL
    SL, Feb 17, 2005
    #4
  5. bugbear

    bugbear Guest

    SL wrote:
    >>>> Is there a similar Module, generating separate
    >>>> DOM sub trees for Perl?
    >>>
    >>> It looks like what XML::Twig does, except XML::Twig is not SAX/DOM based.
    >>
    >> OK. That does the right thing; I'd prefer to stay with standards
    >> (i.e. SAX and DOM) if possible. I'll keep looking, and bear
    >> XML::Twig in mind as a fall back position.
    >
    > I haven't used it in a while, but there is (or was) a package on CPAN that
    > does what you want: DocSplitter (XML::Filter::DocSplitter) in
    > XML::SAX::Machines. It lets you split a SAX stream into several smaller
    > documents by emitting a start_document() and an end_document() event before
    > and after a particular element. For instance, you can split your stream on
    > each RECORD element, so that every filter further down the pipeline
    > processes each RECORD element as the root element of a distinct document.
    > This is particularly useful with the XML::Filter::XSLT filter by Matt
    > Sergeant. If you want to merge the results of the transformation back into
    > one big document, you can use a "Merger" from the same package in the
    > pipeline; it works with the splitter to remove the extra start_document()
    > and end_document() events. Machines provides several facilities for dealing
    > with SAX pipelines.


    So how do I get my DOM(s)?

    BugBear
    bugbear, Feb 17, 2005
    #5
  6. SL

    SL Guest

    > So how do I get my DOM(s)?

    Look at the XML::Filter::XSLT::LibXSLT filter: it uses
    XML::LibXML::SAX::Builder to build a DOM from the SAX events it receives.
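
To make that concrete, here is a rough sketch of the idea, not code taken from XML::Filter::XSLT::LibXSLT. It is a hand-written SAX handler that opens a fresh XML::LibXML::SAX::Builder for every RECORD element and hands the finished document to a callback. The class name RecordDomFilter, the RecordName and OnRecord options, and the NAME child element are all invented for this example. It assumes that the builder's end_document() returns the document it built.

```perl
use strict;
use warnings;

package RecordDomFilter;    # hypothetical name, not a CPAN module
use base 'XML::SAX::Base';
use XML::LibXML;
use XML::LibXML::SAX::Builder;

# XML::SAX::Base stores the options given to new() on $self, so
# RecordName (element to split on) and OnRecord (callback) are read from there.

sub start_element {
    my ($self, $el) = @_;
    if (!$self->{builder} && $el->{Name} eq $self->{RecordName}) {
        # A record starts: begin a new, separate document for it.
        $self->{builder} = XML::LibXML::SAX::Builder->new;
        $self->{builder}->start_document({});
        $self->{depth} = 0;
    }
    return unless $self->{builder};    # outside any record: ignore
    $self->{depth}++;
    $self->{builder}->start_element($el);
}

sub characters {
    my ($self, $data) = @_;
    $self->{builder}->characters($data) if $self->{builder};
}

sub end_element {
    my ($self, $el) = @_;
    my $builder = $self->{builder} or return;
    $builder->end_element($el);
    return if --$self->{depth} > 0;    # still inside the record
    # The record is closed: finish the document and pass it on.
    my $doc = $builder->end_document({});
    delete $self->{builder};
    $self->{OnRecord}->($doc);
}

package main;
use XML::SAX::ParserFactory;

# Made-up sample data; parse_uri() would be used for a real file.
my $xml = '<FILE>'
        . '<RECORD><NAME>alice</NAME></RECORD>'
        . '<RECORD><NAME>bob</NAME></RECORD>'
        . '</FILE>';

my @names;
my $filter = RecordDomFilter->new(
    RecordName => 'RECORD',
    OnRecord   => sub {
        my ($doc) = @_;    # an XML::LibXML::Document holding one record
        push @names, $doc->documentElement->findvalue('NAME');
    },
);
XML::SAX::ParserFactory->parser(Handler => $filter)->parse_string($xml);

print "$_\n" for @names;
```

This sketch only forwards element and text events; namespace prefix mappings, comments and processing instructions would need the same treatment if the records use them.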

    SL
    SL, Feb 17, 2005
    #6
