SAX parser problem: XML encoding of string?

brightoceanlight

I'm trying to parse an XML string with Java's SaxParser. The program
fails at the end of an element or at the beginning of a new element.

Is my XML string okay?

<?xml version="1.0" encoding="utf-8"?><DTYPE>OVL</DTYPE><DTYPE>IMG</DTYPE>. . .

It fails after </DTYPE> and before the next <DTYPE>

The Sax Parser works fine on files.

Here is the code I use to send the string to the parser :

// Parse the input
SAXParser saxParser = factory.newSAXParser();
InputStream is = new ByteArrayInputStream(stringToParse.getBytes());
saxParser.parse( is, handler );

Here is the error message I get :

org.xml.sax.SAXParseException: Unzulässiges Zeichen am Dokumentende, <
at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3376)
at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3370)
at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:673)
at org.apache.crimson.parser.Parser2.parse(Parser2.java:337)
at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:448)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:345)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:143)
at EchoSaxParser.parse(EchoSaxParser.java:51)
at StringParser.parse(StringParser.java:23)
at Servant.stringparse(Servant.java:30)
at TEST._TestImplBase._invoke(_TestImplBase.java:43)
at com.sun.corba.se.internal.corba.ServerDelegate.dispatch(ServerDelegate.java:353)
at com.sun.corba.se.internal.iiop.ORB.process(ORB.java:280)
at com.sun.corba.se.internal.iiop.RequestProcessor.process(RequestProcessor.java:81)
at com.sun.corba.se.internal.orbutil.ThreadPool$PooledThread.run(ThreadPool.java:106)

Any help would be greatly appreciated!

Thank you!

Gil
 
jan V

> Here is the error message I get :
> org.xml.sax.SAXParseException: Unzulässiges Zeichen am Dokumentende, <

And what does this mean in English? Seems like a key piece of info..
 
Thomas Fritsch

> I'm trying to parse an XML string with Java's SaxParser. The program
> fails at the end of an element or at the beginning of a new element.
>
> Is my XML string okay?
>
> <?xml version="1.0" encoding="utf-8"?><DTYPE>OVL</DTYPE><DTYPE>IMG</DTYPE>. . .
>
> It fails after </DTYPE> and before the next <DTYPE>

If I recall correctly, XML must have exactly *one* top-level element.
Your top-level element is DTYPE, and you have more than one.
The solution could be to invent a top-level element surrounding all your
DTYPE elements. For example:
<?xml version="1.0" encoding="utf-8"?>
<BLA>
<DTYPE>OVL</DTYPE><DTYPE>IMG</DTYPE>
</BLA>
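
As a rough illustration of the wrapping idea, here is a minimal sketch. The class RootWrapper and its methods wrapInRoot and parseDtypes are hypothetical names made up for this example, and the sketch assumes the only thing before the first element is an optional XML declaration. It parses from a StringReader, so no byte encoding is involved.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original poster's code.
class RootWrapper {

    // Inserts <rootName> right after the XML declaration (if there is one)
    // and appends </rootName> at the end of the string.
    static String wrapInRoot(String xml, String rootName) {
        int bodyStart = 0;
        if (xml.startsWith("<?xml")) {
            int declEnd = xml.indexOf("?>");
            if (declEnd >= 0) {
                bodyStart = declEnd + 2;
            }
        }
        return xml.substring(0, bodyStart)
                + "<" + rootName + ">"
                + xml.substring(bodyStart)
                + "</" + rootName + ">";
    }

    // Parses the XML and collects the text content of every DTYPE element.
    static List<String> parseDtypes(String xml) throws Exception {
        final List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a DTYPE element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("DTYPE".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("DTYPE".equals(qName)) {
                    values.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return values;
    }
}
```

Calling RootWrapper.parseDtypes(RootWrapper.wrapInRoot(stringToParse, "BLA")) on the two-element string from the question should then give the DTYPE values instead of an exception. If you control the code that builds the string, adding the root element there is cleaner than patching it in afterwards.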

> The Sax Parser works fine on files.
>
> Here is the code I use to send the string to the parser :
>
> // Parse the input
> SAXParser saxParser = factory.newSAXParser();
> InputStream is = new ByteArrayInputStream(stringToParse.getBytes());

One further aside (not your posted problem):
You shouldn't call getBytes() without an argument and thus rely on the
system's default encoding, whatever that might be. Doing so can provoke
very obscure problems if the XML's encoding differs from the
system default encoding *and* the XML contains non-ASCII characters.
Instead, call getBytes("utf-8") in accordance with the encoding given in
the XML header.
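
A minimal sketch of the explicit-encoding variant follows. The class Utf8StringParse and its method parseText are hypothetical names for this example. StandardCharsets needs Java 7 or later; on older JDKs, getBytes("utf-8") does the same job but throws a checked UnsupportedEncodingException.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of the original poster's code.
class Utf8StringParse {

    // Parses the XML string and returns all character data it contains.
    static String parseText(String stringToParse) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // Encode with the charset named in the XML declaration (utf-8 here),
        // not with whatever the platform default happens to be.
        byte[] bytes = stringToParse.getBytes(StandardCharsets.UTF_8);
        try (InputStream is = new ByteArrayInputStream(bytes)) {
            SAXParserFactory.newInstance().newSAXParser().parse(is, handler);
        }
        return text.toString();
    }
}
```

Another option is to skip the byte conversion entirely and hand the parser an InputSource built from a StringReader; the parser then reads characters directly, so no encoding mismatch can occur.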

> saxParser.parse( is, handler );
>
> Here is the error message I get :
>
> org.xml.sax.SAXParseException: Unzulässiges Zeichen am Dokumentende, <

The illegal character at the end of the document is '<', so your analysis (in line 2 of your post) is correct.
 
brightoceanlight

> If I recall correctly, XML must have exactly *one* top-level element.
> Your top-level element is DTYPE, and you have more than one.
> The solution could be to invent a top-level element surrounding all your
> DTYPE elements. For example:
> <?xml version="1.0" encoding="utf-8"?>
> <BLA>
> <DTYPE>OVL</DTYPE><DTYPE>IMG</DTYPE>
> </BLA>

Yes! It worked! Thank you so much for your help!

> Instead, call getBytes("utf-8") in accordance with the encoding given in
> the XML header.

Thanks for the tip!

Gil
 
Steve W. Jackson

> I'm trying to parse an XML string with Java's SaxParser. The program
> fails at the end of an element or at the beginning of a new element.
>
> Is my XML string okay?
>
> <?xml version="1.0" encoding="utf-8"?><DTYPE>OVL</DTYPE><DTYPE>IMG</DTYPE>. . .
>
> It fails after </DTYPE> and before the next <DTYPE>
>
> [ snip ]
> Thank you!
>
> Gil

The example above demonstrates why it *should* fail. Every XML document
is required to have one and only one root element. So the first closing
DTYPE tag ends the document and must not be followed by another element.
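
To see the rule in action, here is a small sketch of a well-formedness check. The class WellFormedCheck and its method isWellFormed are hypothetical names for this example. It treats any SAXException from the parser as "not well-formed".

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class for illustration only.
class WellFormedCheck {

    // Returns true if the parser accepts the string, false if it reports
    // a parse error such as a second top-level element.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Setup or I/O trouble is not a statement about the XML itself.
            throw new IllegalStateException(e);
        }
    }
}
```

With this, isWellFormed("<DTYPE>OVL</DTYPE><DTYPE>IMG</DTYPE>") should come back false, while the same elements wrapped in a single <BLA> root should come back true.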
 
