Multiple char[] causing problems during SAX parsing

hamacher · Dec 28, 2004

I am SAX parsing an XML file using a schema (Xerces, XMLReader). As I
understand SAX, it creates a single char[] that holds every character
in the document. However, when I step through my code in the debugger,
it shows different arrays being passed as the argument to characters(ch,
start, length). The result is an eventual NumberFormatException.

Unfortunately, I am not able to show you my code. However, this
behavior started when I switched from WebSphere 4.0 to Tomcat 5.0 using
the same version of Xerces.

It is as if the parser started to perceive my XML as multiple
documents. The arrays often contain whitespace, even though I use an
empty ignoreWhitespace() method.

Anybody relate to any part of this?

Sudsy · Dec 28, 2004

The method should be called ignorableWhitespace, and your understanding
of the characters() callback is way off base. RTFM. It says, and I quote:

"The Parser will call this method to report each chunk of character
data. SAX parsers may return all contiguous character data in a single
chunk, or they may split it into several chunks; however, all of the
characters in any single event must come from the same external entity
so that the Locator provides useful information."

So where you get the impression that "it creates a single char[] that
holds every character in the document" is beyond me.
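
Given the documentation quoted above, the usual way to cope with split chunks is to buffer the text in characters() and convert it only in endElement(). The following is a minimal sketch, not the original poster's code: the class name QtyHandler and the element names order and qty are made up for illustration, and it assumes the numbers sit in simple leaf elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the integer content of every <qty> element.
class QtyHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    final List<Integer> quantities = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Discard any text (usually whitespace) seen before this start tag.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only ch[start] up to ch[start + length - 1] belongs to this event.
        // The rest of the array is the parser's internal buffer and may hold
        // unrelated parts of the document.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("qty".equals(qName)) {
            // All chunks for this element have arrived, so parsing is safe now.
            quantities.add(Integer.parseInt(text.toString().trim()));
        }
        text.setLength(0);
    }

    // Convenience entry point for the sketch: parse a document held in a String.
    static List<Integer> parse(String xml) throws Exception {
        QtyHandler handler = new QtyHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.quantities;
    }
}
```

Calling Integer.parseInt directly on new String(ch) inside characters() is the typical source of a NumberFormatException in this situation, because it ignores start and length and assumes one call per element.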

hamacher · Dec 29, 2004

Interesting. Well, my parser is often passing char arrays consisting
of thousands of characters to characters(). It is as if the entire XML
document is being treated as a single text node. Other times the array
contains characters from the wrong text nodes.

I am wondering if my migration from WebSphere to Tomcat affected my
build order. We are now using Java 1.4 for the first time, which
contains Xerces. Or perhaps the parser is not finding my schema, but
that is unlikely. Oh well, I need to keep trying.
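
One way to test the build-order theory is to print which XMLReader implementation JAXP actually hands out inside the new container, and where that class was loaded from. This is only a sketch: the class name ParserCheck is made up, and it assumes the application obtains its parser through SAXParserFactory rather than instantiating a Xerces class directly.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical diagnostic: reports the parser implementation chosen at runtime.
class ParserCheck {
    static XMLReader newReader() throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        return parser.getXMLReader();
    }

    // Fully qualified name of the XMLReader implementation class.
    static String readerClassName() throws Exception {
        return newReader().getClass().getName();
    }

    // Jar or directory the implementation was loaded from, if it can be determined.
    static String loadedFrom() throws Exception {
        CodeSource source = newReader().getClass().getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bundled with the Java runtime)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("XMLReader implementation: " + readerClassName());
        System.out.println("Loaded from: " + loadedFrom());
    }
}
```

Running the same check in both environments shows whether the two deployments really use the same parser.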

Sudsy · Dec 29, 2004

In that case your document format might be to blame. Have you tried
running it through a validating DOM parser?
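
A rough sketch of such a check follows. It assumes, for the sake of a self-contained example, that the document carries a DTD in its DOCTYPE; the class name ValidatingDomCheck is made up. For an XSD, as in this thread, the factory would need to be configured with the schema instead (for example via DocumentBuilderFactory.setSchema on Java 5 or later).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper: parses with DTD validation on and returns every problem reported.
class ValidatingDomCheck {
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // validate against the DTD named in the DOCTYPE
        DocumentBuilder builder = factory.newDocumentBuilder();

        final List<String> problems = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                problems.add("warning: " + describe(e));
            }
            public void error(SAXParseException e) {
                // Validity errors are recoverable, so record them and keep parsing.
                problems.add("error: " + describe(e));
            }
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // not well-formed; nothing sensible to continue with
            }
        });

        builder.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    private static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ": " + e.getMessage();
    }
}
```

An empty list means the document is valid against its DTD; any entries point at the line where the structure departs from what the handler expects.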
