Multiple char[] causing problems during SAX parsing

hamacher

I am SAX parsing an XML file using a schema (Xerces, XMLReader). As I
understand SAX, it creates a single char[] that holds every character
in the document. However, when I step through my code in the debugger,
different arrays are passed as the ch argument to characters(ch,
start, length). The result is an eventual NumberFormatException.

Unfortunately, I am not able to show you my code. However, this
behavior started when I switched from WebSphere 4.0 to Tomcat 5.0 using
the same version of Xerces.

It is as if the parser started to perceive my XML as multiple
documents. The arrays often contain whitespace, even though I use an
empty ignoreWhitespace() method.

Anybody relate to any part of this?
 
Sudsy

The method should be called ignorableWhitespace, and your understanding
of the characters callback is way off base. RTFM. It says, and I quote:

"The Parser will call this method to report each chunk of character
data. SAX parsers may return all contiguous character data in a single
chunk, or they may split it into several chunks; however, all of the
characters in any single event must come from the same external entity
so that the Locator provides useful information."

So where you get the impression that "it creates a single char[] that
holds every character in the document" is beyond me.
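
For what it's worth, the usual way to cope with chunked delivery is to
buffer the text in characters() and only convert it once endElement()
fires. The following is a minimal sketch, not the original poster's code:
the class name BufferingHandler, the element name "quantity" and the
helper parseQuantities are all made up for illustration. It uses
StringBuilder and generics, so it assumes Java 5 or later; on 1.4 a
StringBuffer and a raw List would play the same role.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BufferingHandler extends DefaultHandler {
    // Collects the text of the element currently being read.
    private final StringBuilder text = new StringBuilder();
    // Hypothetical result: every <quantity> value found in the document.
    private final List<Integer> quantities = new ArrayList<Integer>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Discard whitespace or text left over from before this start tag.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, and ch may be a
        // different (or much larger) array each time. Only the slice
        // [start, start + length) belongs to this event.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is the element's text known to be complete.
        if ("quantity".equals(qName)) {
            quantities.add(Integer.parseInt(text.toString().trim()));
        }
        text.setLength(0);
    }

    public List<Integer> getQuantities() {
        return quantities;
    }

    // Convenience helper (hypothetical) that parses an XML string.
    public static List<Integer> parseQuantities(String xml) throws Exception {
        BufferingHandler handler = new BufferingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getQuantities();
    }
}
```

Calling Integer.parseInt on new String(ch) or on a single characters()
chunk is the kind of thing that tends to end in a NumberFormatException,
because the array can hold far more than the current text node.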
 
hamacher

Interesting. Well, my parser is often passing char arrays consisting
of thousands of characters to characters(). It is as if the entire XML
document is being treated like a text node. Other times it consists of
characters from the wrong text nodes.

I am wondering if my migration from WebSphere to Tomcat affected my
build order. We are now using Java 1.4 for the first time, which
contains Xerces.
Or perhaps the parser is not finding my schema, but that is unlikely.
Oh well, I need to keep trying.
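
One way to test the build-order theory is to print which parser classes
JAXP actually hands out and where they were loaded from. This is only a
rough sketch: the class name WhichParser and the helper describe are
invented here, and it assumes the code runs inside the same webapp
(same classpath) as the real parsing code.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class WhichParser {
    // Returns the implementation class of an object and the jar or
    // directory it came from, if the JVM exposes one.
    public static String describe(Object o) {
        Class<?> c = o.getClass();
        CodeSource src = c.getProtectionDomain().getCodeSource();
        String where = (src == null)
                ? "no code source (typically the JRE's own classes)"
                : String.valueOf(src.getLocation());
        return c.getName() + " from " + where;
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        XMLReader reader = factory.newSAXParser().getXMLReader();
        System.out.println("factory: " + describe(factory));
        System.out.println("reader:  " + describe(reader));
    }
}
```

If the output differs between the old and the new server, the two
environments are not really running the same parser.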
 
Sudsy

In that case your document format might be to blame. Have you tried
running it through a validating DOM parser?
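
A quick check along those lines could look like the sketch below. It is
an illustration under assumptions, not a drop-in tool: the class name
DomValidationCheck and the method validate are made up, the schema is
passed in as a string for brevity, and it relies on the
javax.xml.validation API, which arrived in Java 5 and is therefore newer
than the 1.4 runtime mentioned above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class DomValidationCheck {
    // Parses xml with a DOM parser that validates against xsd and returns
    // every warning and validation error that was reported.
    public static List<String> validate(String xml, String xsd) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // schema validation needs namespaces
        factory.setSchema(schema);

        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> problems = new ArrayList<String>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                problems.add(describe("warning", e));
            }
            public void error(SAXParseException e) {
                // Validation errors are recoverable: record and keep going.
                problems.add(describe("error", e));
            }
            public void fatalError(SAXParseException e) throws SAXParseException {
                // Not well-formed: nothing sensible to continue with.
                throw e;
            }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    private static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber() + ": " + e.getMessage();
    }
}
```

An empty list means the document is well-formed and valid against the
schema, which would point the finger back at the handler code rather
than at the document.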
 
