Facing exception: Invalid byte 2 of 4-byte UTF-8 sequence.

dk

Hi All,

When I use some UTF-8 characters in my XML and parse it with the JDOM
parser, I get the exception below:

Malformed XML, Caused by: 'Invalid byte 2 of 4-byte UTF-8 sequence.'
    at com.clarify.boss.utility.xml.SimpleXmlParser.build(SimpleXmlParser.java:236)
    at com.clarify.boss.msf.handler.RespHeaderInitiateHandler.getStandardHeader(RespHeaderInitiateHandler.java:366)
    at com.clarify.boss.msf.handler.RespHeaderInitiateHandler.execute(RespHeaderInitiateHandler.java:289)
    at com.clarify.boss.utility.appcontroller.support.AbstractHandler.execute(AbstractHandler.java:42)
    at com.clarify.boss.utility.appcontroller.support.ApplicationControllerImpl.handleRequest(ApplicationControllerImpl.java:174)
    at com.clarify.boss.utility.appcontroller.support.ApplicationControllerImpl.execute(ApplicationControllerImpl.java:311)
    at com.clarify.boss.msf.support.ServiceFaultPublisherAB.executeImpl(ServiceFaultPublisherAB.java:87)
    at com.clarify.boss.common.base.BossActionBeanBase.execute(BossActionBeanBase.java:125)
    at com.clarify.boss.sa.msf.xbean.InvokeResponseXB.executeImpl(InvokeResponseXB.java:198)
    at com.clarify.cbo.XBeanImpl.baselineExecuteImpl_(XBeanImpl.java:275)
    at com.amdocs.oss.sm.core.common.XBeanBase.baselineExecuteImpl_(XBeanBase.java:75)
    at com.clarify.cbo.XBeanImpl.execute(XBeanImpl.java:197)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:615)
    at com.clarify.sam.JavaDispatch.invokeMethodImp(JavaDispatch.java:396)
    at com.clarify.sam.JavaDispatch.invokeMethod(JavaDispatch.java:348)
    at com.clarify.sam.ActionBeanService.invokeBeanMethod(ActionBeanService.java:509)
    at com.clarify.sam.ActionBeanService.invokeAifOperation(ActionBeanService.java:128)
    at com.clarify.sam.AppFrameworkBindingHandler.executeOperation(AppFrameworkBindingHandler.java:69)
    at com.amdocs.aif.consumer.ServiceContext.executeWithRetries(ServiceContext.java:900)
    at com.amdocs.aif.consumer.ServiceContext.executeOperationImpl(ServiceContext.java:756)
    at com.amdocs.aif.consumer.ServiceContext.executeOperation(ServiceContext.java:676)
    at com.amdocs.aif.consumer.ServiceContext.executeOperation(ServiceContext.java:323)
    at com.clarify.boss.errorhandler.resolver.ResolverLauncherSynchXB.executeImpl(ResolverLauncherSynchXB.java:157)
    ... 35 more
Caused by: org.jdom.input.JDOMParseException: Error on line 72: Invalid byte 2 of 4-byte UTF-8 sequence.
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:468)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:770)
    at com.clarify.boss.utility.xml.SimpleXmlParser.build(SimpleXmlParser.java:231)
    ... 60 more
Caused by: org.xml.sax.SAXParseException: Invalid byte 2 of 4-byte UTF-8 sequence.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:453)
    ... 62 more

I have declared the encoding in my XML as UTF-8:
<?xml version="1.0" encoding="UTF-8"?>

Initially I suspected that the XML backup had some problem, because
the same XML worked as input when I tried it on the same application
server, but it failed from a friend's machine. Could this be the
cause?

But now I have found something even more interesting. I tried
changing the encoding to ISO-8859-1, i.e. <?xml version="1.0"
encoding="ISO-8859-1"?>, and to my surprise it worked.

Now this has led to a confusion. I thought ISO-8859-1 is a charset
which is subset of UTF-8. Then why didn't UTF-8 work whereas
ISO-8859-1 worked?

Lastly, I can't change the encoding in my XML, because I would then
have to redo all the regression testing on my application. So please
let me know where I have gone wrong.

The Java code that I'm using is:

/*
 * (non-Javadoc)
 *
 * @see com.clarify.boss.utility.xml.XmlParser#build(org.springframework.core.io.Resource)
 */
public Document build(Resource source) {
    try {
        return (getSystemId() == null
                ? getSaxBuilder().build(source.getInputStream())
                : getSaxBuilder().build(source.getInputStream(), getSystemId()));
    } catch (Exception e) {
        e.printStackTrace();
        BossErrorCode bossErrorCode = new BossErrorCode(ErrorCode.BOSS_MALFORMED_XML);
        throw new BossException(bossErrorCode, new String[] {e.getCause().getMessage()}, e);
    }
}

The SAX builder method is:

/**
 * Getter method for the <b>saxBuilder</b> property
 *
 * @return Returns the saxBuilder.
 */
private PropertyAwareSAXBuilder getSaxBuilder() {
    if (saxBuilder == null) {
        PropertyAwareSAXBuilder myParser = new PropertyAwareSAXBuilder(isValidate());

        myParser.setFeature("http://apache.org/xml/features/validation/schema", isValidate());
        myParser.setFeature("http://xml.org/sax/features/namespaces", true);

        // CatalogResolver myResolver = new CatalogResolver();
        CatalogResolver myResolver = getCatalogResolver();

        myParser.setEntityResolver(myResolver);
        setSaxBuilder(myParser);

        Iterator it = getProperties().keySet().iterator();
        while (it.hasNext()) {
            String name = (String) it.next();
            saxBuilder.setProperty(name, getProperties().get(name));
        }
    }
    return saxBuilder;
}

Regards,
Dhirendra
 
Roedy Green

While I'm trying to use some UTF-8 characters in my xml while parsing
the xml using JDOM parser I'm getting this below exception:

Partition your problem. Is it that the file is malformed or is the
problem getting the XML parser to understand the file is in UTF-8
encoding?

You can examine your file in a hex viewer if you are familiar with
UTF-8 encoding, or you could feed it to the Sun utility native2ascii
to see if it likes it.

See http://mindprod.com/jgloss/utf.html
http://mindprod.com/jgloss/encoding.html
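
As a rough alternative to eyeballing a hex viewer, the check "is this file really valid UTF-8?" can be sketched in plain Java. This is only an illustration, not code from the thread: the class name `Utf8Check` and the method `firstMalformedOffset` are made up here, and `StandardCharsets` assumes Java 7 or later. It uses a `CharsetDecoder` configured to report malformed input instead of silently replacing it.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: strict UTF-8 validation of a byte array.
class Utf8Check {

    /** Returns the byte offset of the first malformed UTF-8 sequence, or -1 if all bytes decode. */
    static int firstMalformedOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            // endOfInput = true: a truncated sequence at the end counts as malformed.
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // The input position is left at the start of the bad sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // everything was consumed without error
            }
            out.clear(); // overflow: discard decoded chars and keep going
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstMalformedOffset(data);
        System.out.println(offset < 0
                ? "valid UTF-8"
                : "first malformed sequence starts at byte offset " + offset);
    }
}
```

Running it against the XML file that fails would show whether the bytes themselves are bad, independent of the parser.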

You could also give up and use entities (NCRs).
see http://mindprod.com/jgloss/xml.html#AWKWARD
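
The "use entities (NCRs)" fallback can be sketched as follows. The class `NcrEscaper` and its method are hypothetical names introduced for illustration. The sketch rewrites every non-ASCII code point as a hexadecimal numeric character reference, so the resulting file is pure ASCII and reads the same under either declared encoding. It assumes the input is already well-formed markup and does not escape `<` or `&`.

```java
import java.util.Locale;

// Hypothetical helper: replace non-ASCII characters with XML numeric character references.
class NcrEscaper {

    static String toNumericReferences(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // codePointAt joins a surrogate pair into one code point.
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                sb.append((char) cp); // ASCII passes through unchanged
            } else {
                sb.append("&#x")
                  .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                  .append(';');
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "caf\u00E9 " + new String(Character.toChars(0x1F600));
        System.out.println(toNumericReferences(sample));
    }
}
```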
 
dk

Partition your problem.  Is it that the file is malformed or is the
problem getting the XML parser to understand the file is in UTF-8
encoding?

You can examine your file in a hex viewer if you are familiar with
UTF-8 encoding, or you could feed it to the Sun utility native2ascii
to see if it likes it.

See http://mindprod.com/jgloss/utf.html
http://mindprod.com/jgloss/encoding.html

You could also give up and use entities (NCRs).
see http://mindprod.com/jgloss/xml.html#AWKWARD


@BugBear: yeah the xml is a well formed and properly validated xml.

@Roedy: right now I'm using UltraEdit and inserting the characters
from the ASCII table that it has. I have even tried seeing it in hex
mode and I got the same value from both the places.

Meanwhile I have found something more interesting: when reading the
input stream from my XML, if I explicitly define it to be formatted as
UTF-8 in getByteStream, it works fine. So is this a Java bug
(1.5.0.12), or something else?
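
What exactly was changed around getByteStream is not shown in the thread, so the following is only a guess at one way the encoding might have been forced. It is a sketch using the JDK's built-in JAXP parser rather than JDOM, and `ExplicitEncodingParse` and `parseAsUtf8` are hypothetical names. When an `InputSource` carries a character stream, the parser reads that stream directly and disregards the encoding declaration. An `InputStreamReader` also substitutes U+FFFD for malformed bytes rather than failing, which could make a bad file appear to "work".

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: decode the bytes ourselves, then hand characters to the parser.
class ExplicitEncodingParse {

    static Document parseAsUtf8(InputStream bytes) {
        try {
            // The reader decodes as UTF-8; malformed bytes become U+FFFD instead of an error.
            Reader reader = new InputStreamReader(bytes, StandardCharsets.UTF_8);
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(reader));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>caf\u00E9</a>"
                .getBytes(StandardCharsets.UTF_8);
        Document doc = parseAsUtf8(new ByteArrayInputStream(xml));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

If this is what happened, the parse succeeding would hide the bad byte rather than fix it, so the file would still be worth checking.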
 
Mike Schilling

It may be a clue that 4-byte UTF-8 sequences only occur with
surrogates (characters outside the Basic Multilingual Plane), for
which there are two plausible encodings:

1. Encode the code point as 4 bytes
2. Encode each 16-bit "char" as 3 bytes

Only 1 is correct, but I'm sure there's lots of non-surrogate-aware
code that does 2.
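
The two encodings listed above can be seen side by side in Java itself. In this sketch (class and method names are made up for illustration) `String.getBytes` with UTF-8 produces option 1, while `DataOutputStream.writeUTF`, which writes Java's "modified UTF-8", produces option 2 by encoding each surrogate as its own 3-byte sequence. It does not claim that this is what produced the file in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: one supplementary character, two byte encodings.
class SurrogateEncodings {

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Option 1: the code point as one 4-byte sequence (standard UTF-8). */
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Option 2: each 16-bit char as 3 bytes (Java's modified UTF-8, not valid UTF-8). */
    static byte[] modifiedUtf8(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);
            out.flush();
            byte[] all = buf.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length); // drop writeUTF's 2-byte length prefix
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x1F600)); // a character outside the BMP
        System.out.println("standard UTF-8: " + hex(standardUtf8(s)));
        System.out.println("modified UTF-8: " + hex(modifiedUtf8(s)));
    }
}
```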
 
Lew

dk said:
@BugBear: yeah the xml [sic] is a well formed and properly validated xml [sic].

That didn't answer his question. Answer his question.
"Have you checked that your data IS valid UTF-8 ?"

Clearly there is an improperly-encoded character in your XML file.
Find that and fix it.
@Roedy: right now I'm using ultraEdit and inserting the characters
from the ASCII table that it has. I have even tried seeing it in hex
mode and I got the same value from both the places.

ASCII != UTF-8.

Does the hex value for the bad character match the UTF-8 encoding of
that character? Is it four bytes long? What character is it, and what
is the hex value you observe? (Note: that's four questions, so there
ought to be four answers.)
Meanwhile I have found something more interesting while reading the
input stream from my xml [sic] if I exclusively define it to be formatted to
UTF-8 in getByteStream it is working fine. Now here is this a Java bug
(1.5.0.12)? or something else?

It's not a Java bug.
Now this has led to a confusion. I thought ISO-8859-1 is a charset

Did you mean "encoding"?
which is subset of UTF-8. Then why didn't UTF-8 work whereas
ISO-8859-1 worked?

Because you were wrong. The two encodings differ.

If you have an assumption, let's call it an hypothesis, and the
evidence contradicts the hypothesis, then the hypothesis is wrong.
Simple.
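
A small sketch of how the two encodings differ, with class and method names invented for illustration. They agree only on the ASCII range. Above that, ISO-8859-1 uses one byte per character and UTF-8 uses two or more. Because every byte value decodes to some character in ISO-8859-1, switching the declaration can never produce a "malformed sequence" error, even when the decoded characters are wrong.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same character, encoded two ways.
class Latin1VersusUtf8 {

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Strict check: a fresh decoder reports malformed input by throwing. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9";
        byte[] latin1 = eAcute.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = eAcute.getBytes(StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1 bytes: " + hex(latin1));
        System.out.println("UTF-8 bytes:      " + hex(utf8));
        System.out.println("Latin-1 bytes valid as UTF-8? " + isValidUtf8(latin1));
        // Reading UTF-8 bytes as ISO-8859-1 never fails, it just yields two wrong characters.
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 bytes read as ISO-8859-1 give " + misread.length() + " chars");
    }
}
```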
 
Arne Vajhøj

Meanwhile I have found something more interesting while reading the
input stream from my xml if I exclusively define it to be formatted to
UTF-8 in getByteStream it is working fine. Now here is this a Java bug
(1.5.0.12)? or something else?

If you post the XML input and the Java code, then we can
tell you.

Arne
 
Roedy Green

@Roedy: right now I'm using ultraEdit and inserting the characters
from the ASCII table that it has. I have even tried seeing it in hex
mode and I got the same value from both the places.

You need to know what the hex SHOULD look like.
See http://mindprod.com/jgloss/utf8.html

You need a tool to see what it DOES look like.
See http://www.sweetscape.com/010editor/
http://funduc.com/otsoft.htm#hexview
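
If no hex viewer is at hand, a few lines of Java can show what a given line of the file DOES look like. This is a minimal sketch with hypothetical names (`LineHexDump`, `hexOfLine`). It splits on the byte 0x0A, which never occurs inside a multi-byte UTF-8 sequence, and prints the raw bytes of one line, for example the line 72 named in the exception.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: hex-dump the raw bytes of one line of a file.
class LineHexDump {

    /** Returns the bytes of the 1-based line as hex, or null if the line does not exist. */
    static String hexOfLine(byte[] data, int lineNumber) {
        int line = 1;
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            if (i == data.length || data[i] == '\n') {
                if (line == lineNumber) {
                    return hex(data, start, i);
                }
                line++;
                start = i + 1;
            }
        }
        return null;
    }

    static String hex(byte[] data, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java LineHexDump <file> <line>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(hexOfLine(data, Integer.parseInt(args[1])));
    }
}
```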

And a tool to validate the encoding:
http://mindprod.com/jgloss/native2asciiexe.html
http://mindprod.com/applet/ecodingrecogniser.html
 
