SAXParser and preserving special characters

User

I am trying to use JDOM's SAXBuilder to parse an XML document that contains
encoded Latin-1 characters. After I parse the document, the encoded
character strings appear to be replaced with the Unicode characters they
represent (e.g., the String "&#174;" is replaced with a single character
whose decimal value is 174). I was expecting SAXBuilder to preserve the
String "&#174;" as written. Is it possible to instruct the SAX parser to
preserve the special character encodings?

The following is sample code that illustrates the issue that I am observing:

import java.io.ByteArrayInputStream;

import org.jdom.Document;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

public class TestProductBuilder {

    public static void main(String[] args) {
        ByteArrayInputStream bis = null;
        try {
            // The name ends with a numeric character reference (decimal 174).
            String product = "<?xml version=\"1.0\"?>" +
                "<product>" +
                " <name>My Product &#174;</name>" +
                "</product>";

            bis = new ByteArrayInputStream(product.getBytes());
            SAXBuilder builder = new SAXBuilder(false); // non-validating
            Document productDoc = builder.build(bis);

            // Serialize the parsed document back to a String.
            XMLOutputter outputter = new XMLOutputter("\t", true);
            String productFromSAXBuilder = outputter.outputString(productDoc);
        } catch (Exception e) {
            System.err.println(e.getMessage());
        } finally {
            if (bis != null) { try { bis.close(); } catch (Exception e) { /* ignored */ } }
        }
    }
}

The following is the value for "productFromSAXBuilder":
<?xml version="1.0" encoding="UTF-8"?>
<product>
<name>My Product ®</name>
</product>
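
The behaviour described above can be reproduced without JDOM, which may help narrow down where the substitution happens. The following is only a minimal sketch using the JDK's built-in parser, and it assumes the input spelled the character as the numeric reference "&#174;". An XML parser resolves such references while parsing, so the parsed text holds the single character 174. One possible workaround is to re-escape non-ASCII characters when writing the result out. The class name CharRefDemo and the helpers parseNameText and escapeNonAscii are hypothetical names introduced for this illustration; they are not part of JDOM or the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CharRefDemo {

    // Hypothetical helper: parses the XML with the JDK parser and
    // returns the text content of the first <name> element.
    static String parseNameText(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("name").item(0).getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Hypothetical helper: rewrites every code point above 127 as a
    // decimal character reference and leaves ASCII untouched.
    // It does not escape markup characters such as '&' or '<'.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp > 127) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
            + "<product><name>My Product &#174;</name></product>";

        String name = parseNameText(xml);
        // The reference has already been resolved to one character here.
        System.out.println((int) name.charAt(name.length() - 1));
        // Re-escaping restores the numeric form for output.
        System.out.println(escapeNonAscii(name));
    }
}
```

The same re-escaping could in principle be applied to the String returned by outputString, since a character reference is allowed wherever a non-ASCII character appears in element text or attribute values. Comments and CDATA sections would need separate handling, because references are not interpreted there.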
 
