cs_professional
I understand that Java Strings are Unicode (the character set), but how
are Java Strings stored in memory? As UTF-16, or using the platform's
default charset?
There seems to be conflicting information on this. The official String
javadoc says the platform's default charset:
http://download.oracle.com/javase/6/docs/api/java/lang/String.html#String(byte[])
"Constructs a new String by decoding the specified array of bytes
using the platform's default charset."
I assume the platform's default charset is what you get by calling
either System.getProperty("file.encoding") or Charset.defaultCharset():
http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html#defaultCharset()
On my Windows machine both calls return Windows-1252, also known as
CP-1252 (they are the same thing: http://en.wikipedia.org/wiki/Windows-1252).
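For reference, the two lookups mentioned above can be checked with a small program like the one below. This is only a sketch: the class name DefaultCharsetDemo and the helper defaultCharsetName() are made up for illustration, and the values printed depend on the machine and JVM settings.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {
    // Hypothetical helper: name of the charset the JVM falls back to
    // when bytes are decoded or encoded without naming a charset.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The system property may be null or differ from the effective default
        // on some JVMs, so both values are printed side by side.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + defaultCharsetName());
    }
}
```

Both values describe how bytes are converted to and from Strings when no charset is named; on their own they say nothing about how a String is held once it exists.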
So does this mean all Java Strings are encoded and stored in memory in
this Windows-1252 or CP-1252 format?
However, the "Java Internationalization FAQ" says UTF-16:
http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#recommended-charset
"... internal representation in Java, which is UTF-16".
So, what is the correct answer? Are Java Strings stored in memory as
UTF-16 or in the platform's default charset?
Btw, I'm trying to understand this so I know what to expect in a more
complex i18n Browser-Servlet scenario.
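One way to probe the two quoted statements is to decode the same character from two different byte encodings and then inspect the resulting Strings. The sketch below assumes nothing beyond the standard library; the class name StringUnitsDemo and the helper decode() are hypothetical. It only shows what the String API exposes (char values are UTF-16 code units), not how a particular JVM lays out the bytes internally, which is an implementation detail.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringUnitsDemo {
    // Hypothetical helper: decode bytes with an explicitly named charset
    // instead of relying on the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The letter e-acute (U+00E9) as one ISO-8859-1 byte and as two UTF-8 bytes.
        byte[] latin1Bytes = { (byte) 0xE9 };
        byte[] utf8Bytes = { (byte) 0xC3, (byte) 0xA9 };

        String fromLatin1 = decode(latin1Bytes, StandardCharsets.ISO_8859_1);
        String fromUtf8 = decode(utf8Bytes, StandardCharsets.UTF_8);

        // After decoding, the source charset no longer matters.
        System.out.println(fromLatin1.equals(fromUtf8));   // true
        System.out.println((int) fromLatin1.charAt(0));    // 233 (0xE9)

        // A code point outside the Basic Multilingual Plane takes two chars
        // (a surrogate pair), which is how UTF-16 represents it.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                           // 2
        System.out.println(emoji.codePointCount(0, emoji.length()));  // 1
    }
}
```

If this reading is right, the charset in the String(byte[]) javadoc applies only to the byte-to-String conversion step, which is why naming the charset explicitly at each boundary (request, response, file) tends to be the safer habit in a browser-servlet setup.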