Is the default Java character encoding always Cp1252?

Mickey Segal

I switched the default to Turkish on a Windows XP computer and found that
the default character encoding in our Java applet is still the same as with
US English: Cp1252. I tested using both the system property
"file.encoding" and OutputStreamWriter.getEncoding().

Is Cp1252 always the default? Should one always specify UTF-8 if expecting
users with a large variety of language defaults or are there better
approaches?
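
For reference, the two checks described above can be run side by side with a small probe like the one below. This is only a sketch: the class name DefaultEncodingProbe is made up for illustration, and Charset.defaultCharset() assumes Java 5 or later. Note that the property name is case-sensitive, and that on JDK 18 and later the default charset is UTF-8 unless overridden, so the output depends on the runtime as well as the operating system settings.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original applet.
class DefaultEncodingProbe {

    // Encoding name reported by a writer that was created without an explicit charset.
    // getEncoding() may return a historical name such as "Cp1252" or "UTF8".
    static String writerEncoding() {
        OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream());
        return writer.getEncoding();
    }

    public static void main(String[] args) {
        // The system property is spelled with a lowercase "f".
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset  = " + Charset.defaultCharset().name());
        System.out.println("writer encoding = " + writerEncoding());
    }
}
```

The writer's name and the charset name can differ in spelling (for example "Cp1252" versus "windows-1252") while still referring to the same charset.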
 
asaguden

I'm no expert, but AFAIK Cp1252 is the default encoding on Windows
platforms. That said, it is probably your Windows IDE that has that
setting. Try changing it there and recompiling.
 
Mickey Segal

Our test Java code does not specify an encoding, yet we get Cp1252 when
checking encoding, even when supposedly set to Turkish. It does not look
like there is an IDE setting to specify an encoding, though we could do so
in our code by using method forms such as OutputStreamWriter(OutputStream
out, String enc) and choosing an encoding such as UTF-8.

What I am confused about is whether Java is not seeing my computer as
Turkish or instead that Java gives a default encoding of Cp1252 even to
computers seen as Turkish. It would be good to know the answer so I would
know if we are able to test and be treated as Turkish.

In either case the answer may just be to specify UTF-8 explicitly, but for
testing purposes it would be nice to know if we are really simulating the
Turkish user.
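
One way to separate the two possibilities is to print what Java reports for the default locale next to the default charset. The sketch below does only that; the class name LocaleProbe is made up for illustration. If the language comes back as "tr" while the charset is still Cp1252, Java sees the Turkish locale but takes its encoding from somewhere else. On Windows XP the code page for non-Unicode programs is a separate setting from the regional format, though whether that is what drives Java's default here is an assumption that has not been verified.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper class for checking what locale and charset the JVM reports.
class LocaleProbe {

    // Collects the locale and charset values into one line for logging.
    static String describe() {
        Locale locale = Locale.getDefault();
        return "locale=" + locale
                + " language=" + locale.getLanguage()
                + " country=" + locale.getCountry()
                + " user.language=" + System.getProperty("user.language")
                + " charset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```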
 
Mickey Segal

A further issue is character encoding for URLConnections. What encoding is
used for the URL? More concretely, if you use code such as this for an open
urlConnection object:

OutputStream outputStream = urlConnection.getOutputStream();
OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream, "UTF8");
BufferedWriter bufferedWriter = new BufferedWriter(outputStreamWriter);
bufferedWriter.write(query,0,query.length());
bufferedWriter.flush();
bufferedWriter.close();

are you able to impose UTF-8 encoding (or another encoding), or is the URL
connection unable to pass such an encoding along?
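
An OutputStream only carries bytes, so the encoding named when the OutputStreamWriter is created decides which bytes are sent. The sketch below illustrates this with a ByteArrayOutputStream standing in for urlConnection.getOutputStream(). The class name Utf8WriteSketch and the sample query string are assumptions made for the example. Whether the receiving server then interprets those bytes as UTF-8 is a separate matter that this code does not address.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: writes a query to any OutputStream as UTF-8 bytes.
class Utf8WriteSketch {

    // The writer, not the stream, decides how characters become bytes.
    static void writeQuery(OutputStream out, String query) throws IOException {
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            writer.write(query);
        } // closing the writer flushes it and closes the underlying stream
    }

    // Convenience wrapper that captures the bytes in memory instead of sending them.
    static byte[] encode(String query) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            writeQuery(sink, query);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // U+015F is the Turkish letter s with cedilla; it takes two bytes in UTF-8.
        byte[] bytes = encode("name=\u015F");
        for (byte b : bytes) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```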
 
Bryce

Mickey Segal said:
Our test Java code does not specify an encoding, yet we get Cp1252 when
checking encoding, even when supposedly set to Turkish. It does not look
like there is an IDE setting to specify an encoding, though we could do so
in our code by using method forms such as OutputStreamWriter(OutputStream
out, String enc) and choosing an encoding such as UTF-8.

Check the system property "file.encoding"

System.getProperty("file.encoding");

In some instances, you may need to specify -Dfile.encoding=UTF8 at the
command line.
 
Mickey Segal

Bryce said:
Check the system property "file.encoding"

System.getProperty("file.encoding");

This is one of the two ways I had checked the encoding, and found it to be
Cp1252 despite the Turkish Windows settings. (The other way was by creating
an OutputStreamWriter and checking its encoding).

Bryce said:
In some instances, you may need to specify -Dfile.encoding=UTF8 at the
command line.

That may explain why our testing always showed Cp1252. It sounds like we
will need to do a lot of explicit specification of encoding. As pointed out
(also by Bryce) in the "POSTing: can character encoding be specified?"
thread, it looks like we need to specify the encoding in the "Content-Type"
header for the data sent via URLConnection, and to set the encoding
explicitly in Readers and Writers.
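
A minimal sketch of that explicit approach follows, under stated assumptions. The class name ExplicitCharsetSketch, the helper names, and the form content type are illustrative and not taken from the other thread, and whether a given server honours the charset parameter is not something this code can guarantee. It names the charset on both the Writer and the Reader, and shows where the "Content-Type" request header would be set on a URLConnection before connecting.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: never rely on the platform default charset.
class ExplicitCharsetSketch {

    // Assumed header value for a form POST whose body is UTF-8.
    static final String FORM_UTF8 = "application/x-www-form-urlencoded; charset=UTF-8";

    // Declares the body encoding; must be called before the connection is opened.
    static void declareUtf8Body(URLConnection connection) {
        connection.setDoOutput(true);
        connection.setRequestProperty("Content-Type", FORM_UTF8);
    }

    // Writer side: the charset is passed explicitly.
    static byte[] toBytes(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    // Reader side: the same charset must be named again when decoding.
    static String fromBytes(byte[] bytes, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String original = "\u011F\u015F\u0131"; // Turkish g-breve, s-cedilla, dotless i
        byte[] wire = toBytes(original, StandardCharsets.UTF_8);
        // Matching charsets round-trip the text; a mismatched reader does not.
        System.out.println(original.equals(fromBytes(wire, StandardCharsets.UTF_8)));
        System.out.println(original.equals(fromBytes(wire, StandardCharsets.ISO_8859_1)));
    }
}
```

The second comparison fails because the six UTF-8 bytes are decoded as six separate Latin-1 characters, which is the kind of corruption an unstated default encoding can cause.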
 
