DOM Partial Document Parsing

Gary V

I am trying to build a Java client which will read a never ending XML
data stream from a socket. Here is a simplified example of the XML
document:

<?xml version='1.0' encoding='us-ascii'?>
<NeverendingDataStream>
<Data>1.0</Data>
<Data>2.0</Data>

followed by a continuous stream of <Data> elements from a server
application that may not send the closing </NeverendingDataStream> tag
for several hours or days.

The DocumentBuilderFactory and DocumentBuilder parser for building a
DOM object tree fails because of a missing end tag.

Is there a way to force partial document parsing? I have turned
validating off but it continues to throw a fatal error.

Thanks,
Gary V
 
Alan

Well, no, there is no way to get a partial DOM tree out of the DOM parser:
DocumentBuilder.parse() only returns once the whole document has been read.
But if you are talking about streaming, you are talking about SAX. This is
exactly what the SAX parser is for.

If you aren't familiar with SAX, the brief description from Sun is:

"The 'Simple API' for XML (SAX) is the event-driven, serial-access
mechanism that does element-by-element processing".

So, what does that mean? I think I should just point you to the right
place:

http://java.sun.com/j2ee/1.4/docs/tutorial/doc/JAXPSAX.html

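All you do is listen to your stream of XML data, and the SAX parser
invokes callback methods on your handler as each start tag (with its
attributes), end tag, and run of character data is detected. Comments are
only reported if you also register an org.xml.sax.ext.LexicalHandler.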
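
To give an idea of what those callbacks look like, here is a minimal sketch of my own (it is not taken from the tutorial). The class name DataStreamReader and the helper readValues are made up for this example, and a byte array stands in for the socket stream. Because the sample is cut off before the closing root tag, the parser eventually throws at end of input, but by then every complete <Data> element has already been delivered to the handler.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; the names are illustrative only.
public class DataStreamReader {

    /** Collects the text of each <Data> element when its end tag is seen. */
    static class DataHandler extends DefaultHandler {
        final List<String> values = new ArrayList<>();
        private StringBuilder text; // non-null only while inside <Data>

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if ("Data".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one element, so accumulate.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("Data".equals(qName)) {
                values.add(text.toString());
                text = null;
            }
        }
    }

    static List<String> readValues(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DataHandler handler = new DataHandler();
        try {
            parser.parse(in, handler);
        } catch (SAXException e) {
            // In this demo the input ends before the root element is closed.
            // The values collected so far are still valid.
        }
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version='1.0' encoding='us-ascii'?>\n"
                + "<NeverendingDataStream>\n"
                + "<Data>1.0</Data>\n"
                + "<Data>2.0</Data>\n";
        InputStream in =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.US_ASCII));
        System.out.println(readValues(in)); // prints [1.0, 2.0]
    }
}
```

With a real socket you would pass socket.getInputStream() instead, and act on each value inside endElement rather than collecting them in a list, since parse() will not return while the server keeps the stream open.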


I would also recommend reading the O'Reilly book "Java & XML, 2nd Edition:
Solutions to Real-World Problems" by Brett McLaughlin.

Hope that gets you on your way.
 
Anton Spaans

If you call builder.parse(inputStream), where 'builder' is a DocumentBuilder
instance and 'inputStream' is obtained from socket.getInputStream(), does
that work at all?

Even if it does, the parse(inputStream) call won't return until all the data
has been received. This means that the thread calling the parse(...) method
will block until the whole document has been read.

So the problem remains that the Document returned by parse is not available
until the end tag has been received.

Therefore, you should use the saxParser =
javax.xml.parsers.SAXParserFactory.newInstance().newSAXParser() call to
obtain a SAXParser.

Then call saxParser.parse(inputStream, dh), where 'dh' is your own subclass
of the org.xml.sax.helpers.DefaultHandler class. Its startElement(...) and
endElement(...) methods are called as elements become available, so you can
track the progress of the document (and build your own object tree in the
meantime) while the stream is still open.
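
As a hedged sketch of the "build the tree in the meantime" idea: the handler below turns each child of the root element into a small standalone DOM Element and hands it to a callback as soon as its end tag arrives. The class name RecordDomBuilder, the 'sink' callback and the parseRecords helper are my own inventions for illustration, and it assumes the default (non-namespace-aware) parser, so element names come from qName.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: builds one DOM subtree per child of the root element.
public class RecordDomBuilder extends DefaultHandler {
    private final Document doc;                 // factory for nodes only
    private final Consumer<Element> sink;       // receives each finished record
    private final Deque<Element> open = new ArrayDeque<>();
    private int depth = 0;

    public RecordDomBuilder(Consumer<Element> sink) throws Exception {
        this.doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        depth++;
        if (depth == 1) {
            return; // the never-ending root element is not materialized
        }
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (!open.isEmpty()) {
            open.peek().appendChild(e); // nested element inside a record
        }
        open.push(e);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text directly under the root (whitespace between records) is skipped.
        if (!open.isEmpty()) {
            open.peek().appendChild(
                    doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
        if (open.isEmpty()) {
            return; // end tag of the root element
        }
        Element done = open.pop();
        if (open.isEmpty()) {
            sink.accept(done); // a complete record: hand it over right away
        }
    }

    /** Demo helper: parses the bytes and returns "tag=text" for each record. */
    static List<String> parseRecords(byte[] xml) throws Exception {
        List<String> seen = new ArrayList<>();
        RecordDomBuilder handler = new RecordDomBuilder(
                e -> seen.add(e.getTagName() + "=" + e.getTextContent()));
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), handler);
        } catch (SAXException e) {
            // Demo input may end before the root end tag; records are kept.
        }
        return seen;
    }
}
```

Each Element passed to the callback can be processed with the normal DOM API and then dropped, so memory use stays bounded however long the stream runs.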

-- Anton.
 
Gary V

Thanks for the informative reply. I am using an InputStream and have actually
used the SAX parser to parse the stream as it becomes available. I was
hoping to use DOM to parse the XML stream into an object tree, but the more
I thought about it, the more I realized it just can't work on a continuous
stream. I thought there might be something I had overlooked.

Thanks again,
Gary V
 
