XML parsing problem

Kurt Klinner

Hello,

While trying to parse a "large" XML document, I found some
strange behaviour in the parser modules (XML::Parser::PerlSAX,
XML::Parser, XML::Parser::Expat).

If my XML file is larger than 65536 bytes,
the current character string is interrupted and a whitespace
is added.

For example:

<DATASET>
<DATA><![CDATA["NOVDEC_B"]]></DATA>
<DATA><![CDATA["November\December"]]></DATA>
<DATA><![CDATA["Nov\Dec"]]></DATA>
<DATA><![CDATA["01.11."]]></DATA>
<DATA><![CDATA[11]]></DATA>
<DATA><![CDATA["begin_2month"]]></DATA>
<DATA><![CDATA[11]]></DATA>
</DATASET>

If "November\December" happens to fall on the 65536-byte boundary, the string is
split into "Nov WHITESPACE ember\December".

Any ideas how to avoid or fix this problem?


Thanks in advance

Regards

Kurt
 
Michel Rodriguez

Kurt said:
While trying to parse a "large" XML document, I found some
strange behaviour in the parser modules (XML::Parser::PerlSAX,
XML::Parser, XML::Parser::Expat).

If my XML file is larger than 65536 bytes,
the current character string is interrupted and a whitespace
is added.


This is documented behaviour:

in the XML::Parser::Expat documentation (I know, you have to know where to look ;--(

· Char (Parser, String)
This event is generated when non-markup is recognized. The
non-markup sequence of characters is in String. A single non-markup
sequence of characters may generate multiple calls to this
handler. Whatever the encoding of the string in the original
document, this is given to the handler in UTF-8.

All books and tutorials about XML::Parser show how to deal with this: buffer the
text in the character handler and output it when you receive any other event.
If you use SAX you can use XML::Filter::BufferText (set up a pipeline using
XML::SAX::Machines and have an XML::Filter::BufferText object as the first
handler in the pipeline).
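
The buffering approach described above might look roughly like this with plain XML::Parser. This is only a minimal sketch: the variable names ($buffer, @data) and the inline sample document are made up for illustration, and it assumes you only care about the text inside <DATA> elements.

```perl
use strict;
use warnings;
use XML::Parser;

my @data;           # complete text of each <DATA> element (illustrative name)
my $buffer = '';    # text collected since the last non-character event

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element) = @_;
            # Start collecting afresh for every <DATA> element.
            $buffer = '' if $element eq 'DATA';
        },
        Char => sub {
            my ($expat, $string) = @_;
            # May be called several times for one run of text,
            # so append instead of treating each call as a whole value.
            $buffer .= $string;
        },
        End => sub {
            my ($expat, $element) = @_;
            # Only now is the text of the element known to be complete.
            push @data, $buffer if $element eq 'DATA';
            $buffer = '';
        },
    },
);

# Small sample document; in practice use $parser->parsefile($filename).
my $xml = '<DATASET><DATA><![CDATA["November\December"]]></DATA>'
        . '<DATA><![CDATA[11]]></DATA></DATASET>';
$parser->parse($xml);

print "$_\n" for @data;
```

The point is that the Char handler never prints or stores a value itself; it only appends, and the End handler decides when the string is finished. That way it no longer matters where the parser happens to cut the text.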

Incidentally, I believe most SAX parsers behave that way: character handlers
can be called several times for the content of a single element.

__
Michel Rodriguez
Perl & XML
http://xmltwig.com
 
