Parsing multiple XML trees?

Discussion in 'XML' started by David Svoboda, Dec 15, 2005.

  1. I have a server program that takes commands and acts on them. The
    server program can also take these commands from an input file or
    standard input (mainly for testing purposes). As such, I often have
    files full of input commands to feed to the server.

    Right now the commands that the server takes are well-defined, but not
    in XML. Since the commands are not self-delimiting, I have to prepend
    each command with a 'length' number indicating how many chars the
    command takes.
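
    For illustration, length-prefixed framing of this kind might be read
    as in the sketch below. The exact wire format is not given in this
    post, so the layout used here (a decimal character count on its own
    line, followed by exactly that many characters) and the name
    LengthPrefixed are assumptions, not the server's actual protocol.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class LengthPrefixed {
    // Assumed format (hypothetical): a decimal length, a newline,
    // then exactly that many characters of command text.
    static List<String> readAll(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        List<String> commands = new ArrayList<>();
        String header;
        while ((header = reader.readLine()) != null) {
            if (header.trim().isEmpty()) {
                continue; // skip blank lines between commands
            }
            int length = Integer.parseInt(header.trim());
            char[] body = new char[length];
            int read = 0;
            // read() may return fewer chars than requested, so loop until full.
            while (read < length) {
                int n = reader.read(body, read, length - read);
                if (n < 0) {
                    throw new IOException("stream ended inside a command");
                }
                read += n;
            }
            commands.add(new String(body));
        }
        return commands;
    }
}
```

    The reader never has to understand the command text to find its end,
    which is the property a self-delimiting format would have to replace.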

    I would like to change the server to accept XML commands, and provide
    a DTD (or Schema or RelaxNG or ...) to ensure that the server only
    receives valid commands.

    My question is this: Can I take the length number out of my input
    files & network commands? Since XML is self-delimiting (tags must
    balance) this should be possible. However, every time I try to run a
    Xerces (Java) parser on a file full of XML commands (with no length
    info), it silently discards all but the first command.

    I guess what I want to know is, can Xerces take an input stream full
    of multiple XML trees and give me each XML tree in turn w/o discarding
    any of them? (I can use either SAX or DOM or SAX2 to accomplish this.)

    Several friends have suggested that I wrap the entire input file
    around a <root> tag, which would make the series of commands into one
    big giant happy XML file. I suppose that could work, but that has
    several problems: (1) it requires a different DTD to handle multiple
    commands than it does to handle one command. (2) as a server it
    precludes me from using DOM since I need to act on each command before
    the entire stream has been parsed.

    Maybe this is the wrong forum to ask, but it's not clear what the
    right forum would be. Is this feature covered in SAX? DOM? Is it
    specific to Xerces?

    ~David Svoboda
    David Svoboda, Dec 15, 2005
    #1

  2. David Svoboda wrote:


    > However, every time I try to run a
    > Xerces (Java) parser on a file full of XML commands (with no length
    > info), it silently discards all but the first command.


    > Several friends have suggested that I wrap the entire input file
    > around a <root> tag, which would make the series of commands into one
    > big giant happy XML file. I suppose that could work, but that has
    > several problems: (1) it requires a different DTD to handle multiple
    > commands than it does to handle one command. (2) as a server it
    > precludes me from using DOM since I need to act on each command before
    > the entire stream has been parsed.


    One of the requirements for markup to be called XML is a single root
    element. So if you want to process some markup with XML tools, you
    need to have a single root element, e.g.
    <commands>
    <command />
    <command />
    </commands>
    If you instead have, e.g.,
    <command />
    <command />
    then that is not XML, because it is not well-formed markup.
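
    The difference can be checked with the JAXP parser that ships with
    Java. This is a minimal sketch; the class and method names
    (WellFormedCheck, isWellFormed) are made up for illustration, and the
    check only tests well-formedness, not validity against a DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if the text parses as one well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Any well-formedness violation ends up here.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected for a string.
            throw new IllegalStateException(e);
        }
    }
}
```

    With this helper, the wrapped <commands> example should be accepted,
    while two sibling <command /> elements with no common root should be
    rejected.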


    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, Dec 15, 2005
    #2

  3. Martin Honnen wrote:
    >
    >
    > David Svoboda wrote:
    >
    >
    >> However, every time I try to run a
    >> Xerces (Java) parser on a file full of XML commands (with no length
    >> info), it silently discards all but the first command.

    >
    >
    >> Several friends have suggested that I wrap the entire input file
    >> around a <root> tag, which would make the series of commands into one
    >> big giant happy XML file. I suppose that could work, but that has
    >> several problems: (1) it requires a different DTD to handle multiple
    >> commands than it does to handle one command. (2) as a server it
    >> precludes me from using DOM since I need to act on each command before
    >> the entire stream has been parsed.

    >
    >
    > One of the requirements of markup to be called XML is a single root
    > element thus if you want to process some markup with XML tools then you
    > need to have a single root element e.g.
    > <commands>
    > <command />
    > <command />
    > </commands>
    > if you have e.g.
    > <command />
    > <command />
    > then that is not XML as that is not well-formed markup.
    >
    >


    So does that mean if I'm running a server I can only send it one XML
    command? That seems to mean that sending multiple XML commands is invalid.

    What if a client sends two XML commands really quickly, and my server
    'forgets' the second one? How does my server 'pop' exactly one XML
    command off the socket?
    ~Dave
    David Svoboda, Dec 15, 2005
    #3
  4. David Svoboda wrote:
    > Maybe this is the wrong forum to ask, but it's not clear what the
    > right forum would be. Is this feature covered in SAX? DOM? Is it
    > specific to Xerces?


    I'm not sure this will be at all helpful, but we confronted this same
    issue when designing an XML parsing extension to gawk. If XMLMODE is
    positive, we allow only a single XML document to be parsed. But if
    XMLMODE is negative, we parse a stream of concatenated documents
    (issuing an "ENDDOCUMENT" event between documents).

    We do this using the expat parser. The basic approach is to keep
    parsing until an error is encountered. When we get a parse error, we
    check to see whether the current parse depth is 0 and more than 0
    elements have been parsed already. If so, we infer that we are done
    parsing a single XML document, so we issue the "ENDDOCUMENT" event
    and try to proceed with the next document. We do that by calling the
    XML_GetCurrentByteIndex() function to determine where in the input
    the error occurred. We use that offset value to identify where in the
    input to attempt to start parsing a new document.

    If that's of any interest, you can take a look at the code here:
    http://sourceforge.net/projects/xmlgawk
    This could be directly useful (if you want to use xgawk's XML
    extension), or the code may serve as a guide for how to implement
    this in your environment.
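
    For a Java environment, one possible adaptation is sketched below.
    It is not the expat byte-offset restart described above. Instead it
    wraps the incoming stream in a synthetic root element and uses the
    same depth-tracking idea to notice when each top-level command ends,
    so a command can be acted on before the rest of the stream arrives.
    The names CommandStream, parse, and <stream> are made up for
    illustration. The sketch also assumes the commands are UTF-8 and
    carry no XML declaration or DOCTYPE of their own, and it does not
    address how promptly a parser delivers events from a live socket.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class CommandStream {
    // Calls onCommand with the element name each time a top-level command closes.
    static void parse(InputStream commands, Consumer<String> onCommand) throws Exception {
        // Surround the raw command stream with a synthetic root element.
        InputStream open = new ByteArrayInputStream(
                "<stream>".getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream(
                "</stream>".getBytes(StandardCharsets.UTF_8));
        InputStream wrapped = new SequenceInputStream(
                Collections.enumeration(Arrays.asList(open, commands, close)));

        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // 0 = outside <stream>, 1 = directly inside it

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
                // Back at depth 1 means one complete command has just ended.
                if (depth == 1) {
                    onCommand.accept(qName);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(wrapped, handler);
    }
}
```

    A real handler would collect the attributes and text of each command
    rather than just its name. The depth check is the part that plays
    the role of the "ENDDOCUMENT" event.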

    Regards,
    Andy
    Andrew Schorr, Dec 16, 2005
    #4
