Problems with UTF-8 encoded XML files

El Durango

I have an application that uses an XSLT file together with an XML file to
create the desired output. However, when I use UTF-8 encoded files it gives
an error; when I use ANSI files it works fine. I cannot figure this out and I
am stuck at the moment. If anyone knows anything about this, I would
appreciate it. Below is a snippet of the simple code I am using.
It is nothing special, so I don't understand why it won't work. I am using
the latest Xerces and Xalan from Apache.


public static void main(String[] args) throws TransformerException {
    String xmlSourceFileName = null;
    String xsltFileName = null;
    String xmlResultFileName = null;
    String lastSwitch = null;

    // Check command-line arguments: three switches, each followed by a value.
    if (args.length != 6) {
        System.err.println("Usage:");
        System.err.println("java " + RFKGenerator.class.getName() + " " +
                IN_ARG + " xml-source " + XSL_ARG +
                " xslt-file-name " + OUT_ARG + " xml-result");
        System.exit(1);
    }
    else {
        for (int i = 0; i < args.length; i++) {
            // Remember the most recent switch; the next argument is its value.
            if (args[i].equals(IN_ARG) || args[i].equals(XSL_ARG) ||
                    args[i].equals(OUT_ARG)) {
                lastSwitch = args[i];
            }
            else {
                // Constant-first equals() avoids a NullPointerException
                // when a value appears before any switch.
                if (IN_ARG.equals(lastSwitch)) {
                    xmlSourceFileName = args[i];
                }
                else if (XSL_ARG.equals(lastSwitch)) {
                    xsltFileName = args[i];
                }
                else if (OUT_ARG.equals(lastSwitch)) {
                    xmlResultFileName = args[i];
                }
            }
        }
    }

    if ((xmlSourceFileName != null) && (xsltFileName != null) &&
            (xmlResultFileName != null)) {

        File xmlSourceFile = new File(xmlSourceFileName);
        File xsltFile = new File(xsltFileName);

        // System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
        //         "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");

        Source xmlSource = new StreamSource(xmlSourceFile);
        Source xsltSource = new StreamSource(xsltFile);
        DOMResult result = new DOMResult();

        // Get a transformer that uses the XSLT input.
        TransformerFactory transFact = TransformerFactory.newInstance();
        Transformer trans = transFact.newTransformer(xsltSource);

        // Apply the transform.
        System.out.println("Applying stylesheet " + xsltFileName + "...");
        trans.transform(xmlSource, result); // crashes here with UTF-8 encoded files.
 
X_AWemner_X

The problem probably lies in the "new File(fileName)" code. I suspect the
file ends up being read with the platform default encoding (or is it always
ISO-8859-1 in Java?).

You should create an InputStreamReader with the proper encoding and then
pass it to the StreamSource instance.
FileInputStream fis = new FileInputStream(fileName);
Reader r = new InputStreamReader(fis, "UTF-8");
Source xmlSource = new StreamSource(r);
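
To make the suggestion above concrete, here is a minimal sketch of a transform where both inputs are decoded explicitly as UTF-8 before they reach the transformer. The class name Utf8SourceSketch, its method names, and the tiny in-memory XML and XSLT documents are made up for illustration; only the javax.xml.transform and java.io calls are standard. Note that a StreamSource built from a Reader has no system ID, so a stylesheet with relative xsl:import or xsl:include would probably also need setSystemId().

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of the original application.
class Utf8SourceSketch {

    // Wrap a byte stream in a Reader that decodes it as UTF-8,
    // so the decoding does not depend on the platform default.
    static StreamSource utf8Source(InputStream in) {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        return new StreamSource(reader);
    }

    // Apply the stylesheet to the XML input and return the result as a String.
    static String transform(InputStream xml, InputStream xslt)
            throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer trans = factory.newTransformer(utf8Source(xslt));
        StringWriter out = new StringWriter();
        trans.transform(utf8Source(xml), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Illustrative documents only; \u00fc is a non-ASCII character.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<name>D\u00fcrango</name>";
        String xslt = "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/\">"
                + "<xsl:value-of select=\"name\"/>"
                + "</xsl:template></xsl:stylesheet>";
        String result = transform(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new ByteArrayInputStream(xslt.getBytes(StandardCharsets.UTF_8)));
        System.out.println(result);
    }
}
```

In a real program the two ByteArrayInputStream objects would be replaced by FileInputStream instances opened on the XML and XSLT files.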

But then you must be aware of the UTF-8 BOM problem in current Java. It
recognizes the BOM in UTF-16 files but unfortunately not in UTF-8 files, so
the BOM shows up as a stray character at the start of the text.

The page below has a short description of the problem and my two stream
classes that solve it.
http://koti.mbnet.fi/akini/java/unicodereader/
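
For readers who only want the idea, here is a rough sketch of one way to skip a leading UTF-8 BOM (the bytes EF BB BF) before decoding. It is not the code from the page linked above; the class Utf8BomSkipper and its methods are hypothetical names chosen for this example, and it handles UTF-8 only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: strips a UTF-8 BOM if one is present.
class Utf8BomSkipper {

    // Return a UTF-8 Reader positioned after the BOM, if there is one.
    static Reader openUtf8(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may deliver fewer bytes than requested, so loop until
        // three bytes are read or the stream ends.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        // No BOM: push the bytes back so the caller still sees them.
        if (!hasBom && n > 0) {
            pin.unread(head, 0, n);
        }
        return new InputStreamReader(pin, StandardCharsets.UTF_8);
    }

    // Small utility for the demo: read the whole Reader into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        String text = readAll(openUtf8(new ByteArrayInputStream(withBom)));
        System.out.println(text.length()); // 4: the BOM is not part of the text
    }
}
```

The Reader returned by openUtf8 can then be handed to new StreamSource(reader) in place of a plain InputStreamReader.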
 
