Expat 1.95.8 fails on XML with newline

Jeff Lambert

I saw something similar on the SourceForge bug list, but it was from
2001, so I assume it's fixed by now.

O/S: WinXP SP2 and WinCE. Expat lib linked in VC++ 6 SP6.

I have the following XML (simplified for discussion purposes). The
braces mark where the XML starts and ends.

{

<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
</soapenv:Body>
</soapenv:Envelope>}

Notice the newline (0x0A) at the beginning of the file. I use the
following C++ code to read from the XML file:

do
{
    size_t len = fread(buf, 1, sizeof(buf), xmlfile);
    done = len < sizeof(buf);
    if (XML_Parse(parser, buf, len, done) == XML_STATUS_ERROR)
    {
        return ReturnLua(State, 1, "Error while parsing.");
    }
} while (!done);

This works fine on all other XML files, but not with that one. This is
how the XML file is returned to me by a SOAP server. On the first pass
through the loop, XML_Parse doesn't even call the handlers I registered
earlier; it immediately returns XML_STATUS_ERROR, and the rest is
history.

I would like to know if the error is in my file reading code or an
Expat bug. If it is the latter, is there a patch or quick fix? And if
it is my code, then what could I do to strip the initial newline(s)?

Thanks in advance!

Jeff Lambert
 
David Carlisle

Jeff said:
NOTICE the newline(0A) at the beginning of the file.

That makes the file not well-formed: nothing, not even whitespace, may
precede the XML declaration, so any conforming parser should reject it.
Harsh but fair :)

David
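
On Jeff's question about stripping the initial newline(s), one possible
workaround is to drop leading whitespace before the data reaches the
parser. The sketch below is only an illustration: the class name
LeadingWhitespaceStripper and its trim() method are hypothetical (they
are not part of Expat), and it assumes a C++11 compiler, which VC++ 6 is
not. It keeps a flag across chunks so that whitespace spanning more than
one fread() buffer is also skipped.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper (not part of Expat): skips XML whitespace bytes
// (space, tab, CR, LF) at the very start of a chunked byte stream.
class LeadingWhitespaceStripper {
public:
    // Returns the part of [data, data + len) that should be handed to
    // the parser. Until the first non-whitespace byte has been seen,
    // leading whitespace is skipped; after that, chunks pass through
    // unchanged.
    std::pair<const char*, std::size_t> trim(const char* data, std::size_t len) {
        if (started_) {
            return {data, len};
        }
        std::size_t i = 0;
        while (i < len && is_xml_space(data[i])) {
            ++i;
        }
        if (i < len) {
            started_ = true;  // real content begins in this chunk
        }
        return {data + i, len - i};
    }

private:
    static bool is_xml_space(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    bool started_ = false;
};
```

In the read loop from the original post, each buffer would go through
trim() first, and the returned pointer and length (cast to int) would be
passed to XML_Parse in place of buf and len. This only hides the
problem; the cleaner fix is for the SOAP server to stop emitting the
stray newline.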
 
Manuel Collado

The given sample XML file (with whitespace before the xml declaration)
is processed without errors in my build of XMLgawk with Expat 1.95.8 on
Windows XP. You can find the sources at:

http://home.vrweb.de/~juergen.kahrs/gawk/XML/

You can download the source of "XMLpuller" and look at how it uses Expat.

NOTE: I work in Windows, but I've tested the input file with both Unix
(LF) and Windows (CR+LF) newlines.
 
Manuel Collado

Please disregard my previous (cancelled) post about correct processing
in Windows. I was wrong.

My apologies.
 
