Would you mind elaborating? java.lang.Character's javadoc seems to
indicate that chars are UTF-16, and therefore that this is enforced by
the char type itself.
It seems to me that allowing it to be otherwise would cause rather
serious regressions in Java applications that handle characters outside
the BMP, where you at least need the java.lang.Character methods that
depend on chars and Strings being UTF-16, or something close enough.
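For instance (a minimal sketch; U+1D541 is just an arbitrary
supplementary character), even counting the characters in a String
correctly forces you through those UTF-16-aware methods:

String s = "\uD835\uDD41";                           // U+1D541, outside the BMP
System.out.println(s.length());                      // 2 -- UTF-16 code units
System.out.println(s.codePointCount(0, s.length())); // 1 -- actual code points
System.out.println(Integer.toHexString(s.codePointAt(0))); // "1d541"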
From the JLS, s. 4.2.1:
The values of the integral types are integers in the following ranges:
... For char, from '\u0000' to '\uffff' inclusive, that is, from 0 to 65535
A 'char', in other words, is an unsigned 16-bit integer. The primitive
type enforces nothing character-ish; only the methods of 'Character'
and 'String' do that. The type itself is numeric, not
inherently UTF-16.
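A quick sketch of that purely numeric behavior (nothing here is
Unicode-aware):

char c = '\uffff'; // 65535, the maximum char value
c++;               // wraps to 0, like any unsigned 16-bit integer
int n = 'A' + 1;   // 66 -- char promotes to int in arithmetic
char b = (char) n; // 'B', only because the code chart happens to be ordered that way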
Consider:
char schuss = 0xDF;              // 'ß', U+00DF, Latin small letter sharp s
char div = 0xF7;                 // '÷', U+00F7, division sign
char x = (char)( schuss + div ); // legal: plain numeric addition, yielding 0x1D6
That makes no sense in terms of Unicode, but it is perfectly legal:
0xDF + 0xF7 is 0x1D6. What is the value of 'ß' + '÷' in Unicode? Would
you expect it to be 'ǖ' (U+01D6, Latin small letter U with diæresis and
macron)?
'char' is a *numeric* type. Its use to represent code points
(including those outside the BMP, where one code point is a surrogate
pair of two 'char's) is a matter of correspondence between the numeric
value and the character it represents, and is not intrinsic to the type.
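To make the surrogate-pair point concrete, here is a minimal sketch
(U+1F600 is just an arbitrary code point that takes two 'char's):

char[] pair = Character.toChars(0x1F600);               // builds the surrogate pair
System.out.println(pair.length);                        // 2
System.out.println(Character.isHighSurrogate(pair[0])); // true (0xD83D)
System.out.println(Character.isLowSurrogate(pair[1]));  // true (0xDE00)
System.out.println(Integer.toHexString(
        Character.toCodePoint(pair[0], pair[1])));      // "1f600"

Taken individually, each element of 'pair' is just a 16-bit number;
only Character's methods give the pair its code-point meaning.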