Help needed parsing a UTF-8 XML file

Huzefa

I have an XML file encoded in UTF-8. The parser works fine when
there are only English characters in the file.

However, when I put some Chinese characters in the file, I get the
following error:

org.xml.sax.SAXParseException: Content is not allowed in prolog.
org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
com.xyz.pqr.ParseXmlFile.<init>(ParseXmlFile.java:34)
org.apache.jsp.index3_jsp._jspService(index3_jsp.java:59)
org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:324)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:292)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:236)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)

I am setting the character encoding of the InputSource.
My code for doing so looks like this:

InputSource input = new InputSource(file); // file is a FileReader
input.setEncoding("UTF-8");

DOMParser parser = new DOMParser();
parser.parse(input);

How can I get it to read Chinese/Japanese characters?

Any help would be appreciated.

Thanks

Huzefa Khalil
 
Keith M. Corbett

Huzefa said:
I have an XML file encoded in UTF-8. The parser works fine when
there are only English characters in the file.

However, when I put some Chinese characters in the file, I get the
following error:

org.xml.sax.SAXParseException: Content is not allowed in prolog.
org.apache.xerces.parsers.DOMParser.parse(Unknown Source)

The error suggests the XML may not be well-formed.

It would be easier to diagnose this by looking at a set of sample XML files.
Can you upload some samples to a server somewhere with public access? Or
send me a zip file, email to kmc(at)world.std.com.

/kmc
 
Malcolm Dew-Jones

Huzefa wrote:
: I have an XML file encoded in UTF-8. The parser works fine when
: there are only English characters in the file.

: However, when I put some Chinese characters in the file, I get the
: following error:

: org.xml.sax.SAXParseException: Content is not allowed in prolog.

Perhaps you put some white space at the top of the file. The <? must be
the very first thing, and perhaps no white space before the first tag's <
either.
 
Keith M. Corbett

Malcolm Dew-Jones said:
Huzefa wrote:
: I have an XML file encoded in UTF-8. The parser works fine when
: there are only English characters in the file.

: However, when I put some Chinese characters in the file, I get the
: following error:

: org.xml.sax.SAXParseException: Content is not allowed in prolog.

Perhaps you put some white space at the top of the file. The <? must be
the very first thing, [snip]

I believe a Unicode Byte Order Mark (BOM) may precede the XML declaration.
Per the XML 1.1 TR:

"Entities encoded in UTF-16 MUST and entities encoded in UTF-8 MAY begin
with the Byte Order Mark described in ISO/IEC 10646" etc.

[...] and perhaps no white space before the first tag's <
either.

I believe white space may appear in the prolog, after the XML declaration
and before or after the document type declaration.

[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?

[27] Misc ::= Comment | PI | S
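
The two productions above can be checked against a real parser. This is only a sketch: it uses the JDK's built-in JAXP DocumentBuilder rather than the Xerces DOMParser from the original post, and the class name PrologCheck and the sample strings are made up for illustration. It suggests that white space is fine after the XML declaration, or before the root element when there is no declaration, but not in front of the declaration itself.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original post.
class PrologCheck {
    // Returns true if the JDK's default parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Includes SAXParseException such as "Content is not allowed in prolog."
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // White space after the XML declaration (Misc* in production 22).
        System.out.println(isWellFormed("<?xml version=\"1.0\"?>\n\n<a/>"));
        // No XML declaration, white space before the root element.
        System.out.println(isWellFormed("   <a/>"));
        // White space before the XML declaration.
        System.out.println(isWellFormed(" <?xml version=\"1.0\"?><a/>"));
        // Non-white-space text before the root element.
        System.out.println(isWellFormed("junk<a/>"));
    }
}
```

The first two inputs should be accepted and the last two rejected, so white space alone would only explain the error if it sat in front of the XML declaration.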

/kmc
 
Chris Uppal

Huzefa said:
InputSource input = new InputSource(file); //File is the FileReader
input.setEncoding("UTF-8");

From the JavaDoc for org.xml.sax.InputSource.setEncoding():

This method has no effect when the application provides a character stream.

which may be your problem, since you are providing a character stream in your
constructor. There's more information in the intro to the class in the same
JavaDoc.
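
One way to act on this is to hand the parser a byte stream instead of a Reader, so that it does the decoding itself. This is a minimal sketch, assuming the JDK's JAXP DocumentBuilder in place of the Xerces DOMParser from the original post; the class name ByteStreamParse and the in-memory sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original post.
class ByteStreamParse {
    // Parses XML from raw bytes; the parser decodes them itself.
    static Document parse(InputStream bytes) throws Exception {
        InputSource input = new InputSource(bytes);
        // With a byte stream (not a Reader) this setting is taken into account.
        input.setEncoding("UTF-8");
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(input);
    }

    public static void main(String[] args) throws Exception {
        // Sample document containing two Chinese characters, encoded as UTF-8 bytes.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><greeting>\u4f60\u597d</greeting>";
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        Document doc = parse(new ByteArrayInputStream(utf8));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

For a file on disk, the same method could be called with a java.io.FileInputStream instead of the FileReader. A FileReader decodes with the platform default charset, which may not be UTF-8.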

BTW, on the subject of the BOM (which someone mentioned elsewhere in this
thread) the JavaDoc for that constructor states:

The character stream shall not include a byte order mark.
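
If a character stream really is needed, the BOM has to be removed before the parser sees it, because Java's UTF-8 decoder passes it through as the character U+FEFF. The following is a sketch of one way to do that, not anything from the SAX API itself; the class BomSkipper and the method skipBom are made-up names, and the parser is again the JDK's JAXP DocumentBuilder rather than the Xerces DOMParser.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original post.
class BomSkipper {
    // Returns a reader positioned after a leading U+FEFF, if one is present.
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader reader = new PushbackReader(in, 1);
        int first = reader.read();
        if (first != -1 && first != 0xFEFF) {
            // Not a BOM: put the character back so the parser still sees it.
            reader.unread(first);
        }
        return reader;
    }

    public static void main(String[] args) throws Exception {
        // A decoded document that still carries the BOM as its first character.
        String decoded = "\uFEFF<?xml version=\"1.0\"?><root/>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(skipBom(new StringReader(decoded))));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Passing a byte stream to the InputSource is usually simpler, since the parser then handles the BOM on its own.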

HTH.

-- chris
 
