Detecting and using the encoding of an XML file


Nomak

Hello,

I'm reading XML files (with Xerces SAX2). The problem is that the strings are read as 8-bit ASCII instead of UTF-8, even though UTF-8 is specified as the encoding of the XML file.

I googled a bit but didn't find the recommended way to read strings from XML in Java, so I'm asking here.

Here is my base code:

parserClassName = "org.apache.xerces.parsers.SAXParser";
....

XMLReader reader = null;
try {
    reader = XMLReaderFactory.createXMLReader(parserClassName);
} catch (Exception ex) {
    ex.printStackTrace();
}

try {
    try {
        // Validation is optional; a parser may refuse the feature.
        reader.setFeature("http://xml.org/sax/features/validation", true);
    } catch (SAXException ex) {
        ex.printStackTrace();
    }

    reader.setContentHandler(myContentHandler);
    reader.setErrorHandler(myErrorHandler);
    InputSource inputSource = new InputSource(xmlURI);

    System.err.println("encoding = " + inputSource.getEncoding());
    System.err.println("public id = " + inputSource.getPublicId());
    System.err.println("system id = " + inputSource.getSystemId());

    reader.parse(inputSource);

    // String charsetName = reader...getCharset();
} catch (Exception ex) {
    // The outer try had no catch in the original post; parse() can throw
    // IOException or SAXException.
    ex.printStackTrace();
}


What must I add, remove, or modify to get my strings decoded properly?

TIA
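
A minimal sketch of how the encoding could be checked, assuming the parser is handed the raw bytes (or a system ID) and no encoding is set on the InputSource. In that case the parser is expected to pick the encoding up from the XML declaration itself, while InputSource.getEncoding() only returns what was explicitly set on it (null otherwise), so printing it says nothing about the file. The optional SAX2 extension org.xml.sax.ext.Locator2 has a getEncoding() method that reports what the parser used, when the parser supports it. The names EncodingSketch and TextCollector are made up for illustration, and the JDK's default SAX parser stands in for an explicitly named Xerces class.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingSketch {

    /** Hypothetical handler: collects text and remembers the reported encoding. */
    public static class TextCollector extends DefaultHandler {
        public final StringBuilder text = new StringBuilder();
        public String detectedEncoding; // stays null if the parser does not expose it
        private Locator locator;

        @Override
        public void setDocumentLocator(Locator locator) {
            this.locator = locator;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Locator2 is an optional extension, so check before casting.
            if (detectedEncoding == null && locator instanceof Locator2) {
                detectedEncoding = ((Locator2) locator).getEncoding();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The chars arriving here are already decoded Java (UTF-16) chars.
            text.append(ch, start, length);
        }
    }

    /** Parses raw bytes; the encoding comes from the XML declaration, not from us. */
    public static TextCollector parse(byte[] xmlBytes) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        TextCollector handler = new TextCollector();
        reader.setContentHandler(handler);
        // Byte stream, no setEncoding(): the parser has to detect the encoding.
        reader.parse(new InputSource(new ByteArrayInputStream(xmlBytes)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>caf\u00e9</name>";
        TextCollector result = parse(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println("reported encoding = " + result.detectedEncoding);
        System.out.println("decoded correctly = " + result.text.toString().equals("caf\u00e9"));
    }
}
```

If the handler receives correct strings but they look wrong when printed, the charset of the console or output writer is another place to look, since Java strings are UTF-16 internally whatever the file's encoding was.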
 

iksrazal

Here's a utility class with some static methods I use for this:

package com.hostedtelecom.callcentreweb.util;

import java.io.*;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.dom.DOMSource;
import org.xml.sax.InputSource;

/**
 * Utility class for basic XML tasks.
 * XMLHelperException is a custom exception class that is not shown in this post.
 */
public class XMLHelper
{
    /** Convert W3C XML Document to String.
     @param document
     @return String
     @throws XMLHelperException
     */
    public static final String getDocumentAsString(Document document)
        throws XMLHelperException
    {
        try
        {
            // Create source and result objects
            Source source = new DOMSource(document);
            StringWriter out = new StringWriter();
            Result result = new StreamResult(out);
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer();
            transformer.transform(source, result);
            return out.toString();
        }
        catch(Exception e)
        {
            throw new XMLHelperException("XML Document to String Error", e);
        }
    }


    /** Convert String to a W3C XML Document.
     @param xmlString
     @return Document
     @throws XMLHelperException
     */
    public static final Document getDocument(String xmlString) throws
        XMLHelperException
    {
        try
        {
            // An XML document cannot have whitespace before the declaration,
            // so always strip anything that precedes the first '<'.
            // (The original test was inverted and skipped the strip exactly
            // when the string started with a space.)
            String nstr = removeInitialWS(xmlString);

            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setIgnoringElementContentWhitespace(false);
            DocumentBuilder builder = factory.newDocumentBuilder();

            InputSource isXml = new InputSource(new StringReader(nstr));
            return builder.parse(isXml);
        }
        catch(Exception e)
        {
            throw new XMLHelperException(
                "String to XML Document Error for String:\n\n " + xmlString + " ", e);
        }
    }

    /**
     Remove any blank spaces before the beginning of the XML declaration
     */
    public static final String removeInitialWS(String xmlString) throws
        XMLHelperException
    {
        try
        {
            int pos = xmlString.indexOf("<");
            if (-1 == pos)
            {
                throw new Exception("Invalid XML, char '<' not found");
            }

            return xmlString.substring(pos);
        }
        catch(Exception e)
        {
            throw new XMLHelperException(
                "String to XML Document Error for String: \n\n " + xmlString + " ", e);
        }
    }

    /** Convert a single DOM Node to String, without the XML declaration. */
    public static final String getNodeToString(Node node) throws
        XMLHelperException
    {
        try
        {
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter sw = new StringWriter();
            StreamResult result = new StreamResult(sw);
            DOMSource source = new DOMSource(node);
            transformer.transform(source, result);

            return sw.getBuffer().toString();
        }
        catch(Exception e)
        {
            throw new XMLHelperException("XML Document to String Err", e);
        }
    }
}
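
One thing to keep in mind with getDocument(String): by the time the XML is a String, something else has already decoded the bytes, so the encoding declaration inside the document no longer drives the decoding. A rough sketch of a byte-level variant follows, assuming the caller has the raw bytes of the file. The names ByteLevelXmlSketch, parseBytes and toBytes are hypothetical and not part of XMLHelper. Parsing from an InputStream leaves encoding detection to the parser, and OutputKeys.ENCODING asks the transformer for a specific output encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class ByteLevelXmlSketch {

    /** Parses raw bytes so the parser can honor the declared encoding. */
    public static Document parseBytes(byte[] xmlBytes) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xmlBytes));
    }

    /** Serializes the document to bytes in the requested encoding. */
    public static byte[] toBytes(Document document, String encoding) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Writing to an OutputStream lets the transformer do the encoding itself.
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Input declared and encoded as ISO-8859-1.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>";
        Document doc = parseBytes(xml.getBytes(StandardCharsets.ISO_8859_1));

        // Re-serialize as UTF-8 and parse again: the text should survive unchanged.
        Document again = parseBytes(toBytes(doc, "UTF-8"));
        String text = again.getDocumentElement().getTextContent();
        System.out.println("round trip ok = " + text.equals("caf\u00e9"));
    }
}
```

The String-based helpers above remain fine when the text is already correctly decoded; the byte-level form mainly matters when reading from or writing to files and sockets.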


HTH,
iksrazal
http://www.braziloutsource.com
 
