Detecting the codepage of a file being uploaded?


Mehmet Gunacti

Hello,
Users of our web application, which is written in Java, upload XML
files to us. We published an XSD file, so the XML files we get are well
formed. But some users generate the XML files under DOS using the CP857
codepage, which includes Turkish characters.
After we receive an XML file we don't save it to disk; instead we
process the data and save it to a database. But the Turkish characters
are corrupted because of the "wrong" character set of the XML file.
Although the first line of the XML file is
<?xml version="1.0" encoding="iSO-8859-9"?>
the Turkish characters it contains aren't saved correctly to the database.
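
To make the mismatch concrete, here is a minimal sketch of what seems to be happening: text is written as CP857 bytes but decoded as ISO-8859-9 because the declaration says so. The class name, method name, and sample string are made up for illustration, and the sketch assumes the JVM ships the IBM857 and ISO-8859-9 charsets (IBM857 is the canonical Java name for CP857).

```java
import java.io.UnsupportedEncodingException;

// Hypothetical demo class; not part of the application described above.
class CodepageMismatchDemo {

    // Turkish-specific letters (G-breve, dotted I, dotless i, S-cedilla, both
    // cases where they exist), written as Unicode escapes so the encoding of
    // this source file cannot affect them.
    static final String SAMPLE = "\u011E\u011F\u0130\u0131\u015E\u015F";

    // Encodes text with one charset and decodes the resulting bytes with another,
    // which is what happens when a file's declared encoding does not match
    // the encoding it was actually written in.
    static String transcode(String text, String writtenAs, String readAs) {
        try {
            byte[] raw = text.getBytes(writtenAs);
            return new String(raw, readAs);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("Charset not available: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        String wrong = transcode(SAMPLE, "IBM857", "ISO-8859-9");
        String right = transcode(SAMPLE, "IBM857", "IBM857");
        System.out.println("read as ISO-8859-9, matches original: " + SAMPLE.equals(wrong));
        System.out.println("read as IBM857, matches original:     " + SAMPLE.equals(right));
    }
}
```

The two charsets place the Turkish letters at different byte values, so the first comparison fails while the second succeeds. The bytes themselves are intact; only the decoding step picks the wrong characters.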

If there were some method like getEncoding() that returned "cp857",
we could tell the user to generate the file under Windows. But after
researching for days we couldn't find any useful API. We only ever get
CP1254, for all kinds of XML files, whether generated under DOS or
Windows, and that doesn't solve our problem.

How can we detect the character set of the incoming file?
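
One possible direction, sketched here only as a heuristic: the raw bytes of the upload can be scored against each candidate charset by counting how many of them are Turkish letters under that charset. The class, its methods, and the candidate list are hypothetical, and the approach assumes the data actually contains Turkish letters. It cannot tell ISO-8859-9 from CP1254, since both put those letters at the same byte values.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper; a heuristic sketch, not a reliable detector.
class TurkishCodepageGuesser {

    // C-cedilla, G-breve, dotted/dotless I, O-umlaut, S-cedilla, U-umlaut,
    // upper and lower case, as Unicode escapes.
    static final String TURKISH_LETTERS =
        "\u00C7\u00E7\u011E\u011F\u0130\u0131\u00D6\u00F6\u015E\u015F\u00DC\u00FC";

    // Charsets to choose between (an assumption; extend as needed).
    static final String[] CANDIDATES = { "IBM857", "ISO-8859-9" };

    // Encodes text without exposing the checked exception to callers.
    static byte[] encode(String text, String charsetName) {
        try {
            return text.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("Charset not available: " + e.getMessage());
        }
    }

    // Counts the bytes in data that encode a Turkish letter under charsetName.
    static int score(byte[] data, String charsetName) {
        boolean[] isLetter = new boolean[256];
        byte[] encoded = encode(TURKISH_LETTERS, charsetName);
        for (int i = 0; i < encoded.length; i++) {
            isLetter[encoded[i] & 0xFF] = true;
        }
        int count = 0;
        for (int i = 0; i < data.length; i++) {
            if (isLetter[data[i] & 0xFF]) {
                count++;
            }
        }
        return count;
    }

    // Returns the best-scoring candidate, or null when no byte matches any
    // candidate. On a tie the earlier candidate is kept.
    static String guess(byte[] data) {
        String best = null;
        int bestScore = 0;
        for (int i = 0; i < CANDIDATES.length; i++) {
            int s = score(data, CANDIDATES[i]);
            if (s > bestScore) {
                bestScore = s;
                best = CANDIDATES[i];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        byte[] upload = encode("<ad>\u015Fi\u015Fe</ad>", "IBM857");
        System.out.println("guessed charset: " + guess(upload));
    }
}
```

If the guess differs from the encoding named in the XML declaration, the upload could be rejected with a message to the user, or the bytes could be decoded with the guessed charset before parsing. Either way, a byte-counting guess like this can be wrong on short or unusual input.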

Thanks in advance
Mehmet Gunacti

PS: We use Java 1.4 under Windows.
 
