Progressive Parsing using Xerces C++

Girish

Hi All,

I have written a component (ATL COM) that wraps the Xerces C++ parser.
I am firing the necessary events for each of the notifications that I have
handled for the Content and Error handlers. The events can then be trapped
at the client end. I am able to parse XML input in the form of files.
I have also provided support for parsing XML content in the form of
string data. I do so by creating a MemBufInputSource object
from the XML content provided to the component. In this case I am
providing the full data as input. I want to avoid keeping the entire
XML content in memory.

Is it possible to parse the same XML content in chunks, i.e. in a
progressive manner? Another scenario could be that the data is passed
in chunks to the component for parsing. Can we use the parseFirst and
parseNext methods to achieve this?

Thanking you in advance!

regards,
Girish
 

Fabien R

Girish said:
Is it possible to parse the same XML content in chunks i.e. in a
progressive manner?

Why don't you use the SAX API of Xerces?
-
Fabien
 

Girish

Hi,

My apologies for not framing the issue properly. Here is the
updated one.

I have written a component (ATL COM) that wraps the Xerces C++ parser.
I am firing the necessary events for each of the methods that I have
handled for the Content and Error handlers. These events can be trapped
at the client end. I am able to successfully parse XML input in the
form of files.

I have also provided support for parsing XML content in the
form of string data. To do so I create a "MemBufInputSource" object
from the input and pass it to the parse method. Here I am providing
the entire XML data as input. This approach works but will cause
problems when I have a large amount of data to be parsed, because I
will have to load the entire data into memory.

The alternative to the above approach is to get the XML data in
chunks and then parse it. I have a few queries related to this approach:
Is it possible to parse XML content in chunks, i.e. in a progressive
manner?

I have tried the parseFirst and parseNext methods to achieve this. But
here again the entire data has to be passed or pointed to in the
parseFirst method. Is there some other way to use these methods?

Thanking you in advance!

regards,
Girish
 

Nick Kew

Girish said:
I have tried the ParseFirst and ParseNext methods to achieve this. But
here again the entire data is to be passed or pointed to in parseFirst
method. Is there some other way to use these methods?

What you're looking for is a parseChunk API. Available in expat and
libxml2, but not in Xerces.
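
To make "parseChunk" concrete: in a push-style API the caller hands the parser one buffer at a time, plus a flag marking the last buffer, and the parser keeps its own state between calls (expat's XML_Parse and libxml2's xmlParseChunk follow this pattern). The following is only a minimal sketch of that calling pattern. ChunkedTagCounter and countStartTags are hypothetical names invented for this illustration; the class is a toy that counts element start tags, not a real XML parser, and it uses no Xerces, expat, or libxml2 code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy scanner (an assumption for illustration only).
// It counts element start tags and keeps its state between feed() calls,
// so a tag may be split across two chunks. It does not validate XML and
// does not handle '<' or '>' inside comments, CDATA, or attribute values.
class ChunkedTagCounter {
public:
    // Push the next chunk. Pass isFinal = true together with the last chunk.
    // Returns false if the final chunk ends in the middle of a tag.
    bool feed(const std::string& chunk, bool isFinal) {
        assert(!finished_ && "feed() called after the final chunk");
        for (char c : chunk) {
            switch (state_) {
            case State::Text:
                if (c == '<') state_ = State::TagOpen;
                break;
            case State::TagOpen:
                if (c == '>') {            // "<>" is malformed, do not count it
                    state_ = State::Text;
                    break;
                }
                // '/', '?' and '!' start end tags, PIs, comments, declarations
                if (c != '/' && c != '?' && c != '!') ++startTags_;
                state_ = State::InTag;
                break;
            case State::InTag:
                if (c == '>') state_ = State::Text;
                break;
            }
        }
        if (isFinal) finished_ = true;
        return !(isFinal && state_ != State::Text);
    }

    std::size_t startTags() const { return startTags_; }

private:
    enum class State { Text, TagOpen, InTag };
    State state_ = State::Text;
    std::size_t startTags_ = 0;
    bool finished_ = false;
};

// Feeds every chunk in order, flagging the last one as final.
std::size_t countStartTags(const std::vector<std::string>& chunks) {
    ChunkedTagCounter counter;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        counter.feed(chunks[i], i + 1 == chunks.size());
    }
    return counter.startTags();
}
```

The point of the sketch is that only one chunk has to be in memory at a time: the caller can read a block from a file or socket, call feed(), and discard the block before reading the next one.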
 
