convert CharArray to ByteArray

Mike Schilling

markspace said:
Mike Schilling wrote:




Well you can't catch a RuntimeException without having to catch ALL
runtime exceptions and then try to sort out which one it actually
was.

You'd never write code to catch this one.
Sure it never "should" happen but weird things happen in test
environments, and sometimes there are funky customer environments
too.

I suppose you'd have to ask them to upgrade to a non-broken JVM.
 
markspace

Mike said:
You'd never write code to catch this one.


That's the assumption I'm challenging. Given the presumption "no one
will ever do this," someone will. Someone somewhere will find a need
or reason to catch it. Code defensively. Programming sometimes is like
a Vaudeville skit and the trick is to make sure it's not your face with
pie on it.
 
Lew

Would you mind elaborating? java.lang.Character's javadoc seems to
indicate that chars are UTF-16, and therefore it is enforced by the char
type itself.

It seems to me that making it otherwise would cause rather serious
regressions in out-of-BMP-enabled Java applications (where you at
least need to use java.lang.Character methods that depend on chars and
Strings to be UTF-16 or something close enough.)

From the JLS, s. 4.2.1:
The values of the integral types are integers in the following ranges:
... For char, from '\u0000' to '\uffff' inclusive, that is, from 0 to 65535

A 'char' is an unsigned short integer, in other words. The primitive
type enforces nothing character-ish; only the methods of 'Character'
and 'String' do that. The primitive type itself is numeric, not
inherently UTF-16.

Consider:

char schuss = 0xDF;                // 'ß', U+00DF
char div = 0xF7;                   // '÷', U+00F7
char x = (char)( schuss + div );   // 0x01D6; the sum is an int, hence the cast

That makes no sense in terms of Unicode, but is perfectly legal. What
is the value of 'ß' + '÷' in Unicode? Would you expect it to be
'ǖ' (Latin small letter U with diæresis and macron)?

'char' is a *numeric* type. Its use to represent code points
(including surrogate pairs comprising more than one 'char') is a
matter of correspondence between the numeric value and the character
it represents, and is not intrinsic.
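
To make that concrete, here is a minimal sketch (the class and method names are made up for illustration, not taken from any library). It shows that 'char' arithmetic is plain integer arithmetic, and that a 'char[]' can hold a surrogate with no partner, which is not well-formed UTF-16, without the language objecting. It assumes Java 5 or later for the surrogate tests in 'Character'.

```java
class CharIsNumeric {

    // Adds two char values as plain integers and narrows the result back to char.
    static char add(char a, char b) {
        // a + b is promoted to int, so an explicit cast is required.
        return (char) (a + b);
    }

    // Returns true if the array contains a surrogate that is not part of a
    // well-formed high/low pair. Nothing in the char type prevents this.
    static boolean hasLoneSurrogate(char[] chars) {
        for (int i = 0; i < chars.length; i++) {
            if (Character.isHighSurrogate(chars[i])) {
                if (i + 1 < chars.length && Character.isLowSurrogate(chars[i + 1])) {
                    i++; // skip the low half of a well-formed pair
                } else {
                    return true; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(chars[i])) {
                return true; // low surrogate with no high surrogate before it
            }
        }
        return false;
    }

    public static void main(String[] args) {
        char x = add('\u00DF', '\u00F7');
        System.out.println(Integer.toHexString(x)); // prints 1d6

        char[] illFormed = { 'a', '\uD800', 'b' };   // compiles and runs fine
        System.out.println(hasLoneSurrogate(illFormed)); // prints true

        char[] wellFormed = "\uD83D\uDE00".toCharArray(); // one code point, two chars
        System.out.println(hasLoneSurrogate(wellFormed)); // prints false
    }
}
```

The only place UTF-16 rules show up is in the 'Character' methods called by the helper; the assignments and the addition themselves are checked as numbers only.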
 
Mike Schilling

markspace said:
That's the assumption I'm challenging. Given the presumption "no one
will ever do this," someone will.

It throws only when the JVM is broken. You fix that by replacing the JVM.
 
Mayeul

Lew said:
From the JLS, s. 4.2.1:

A 'char' is an unsigned short integer, in other words. The primitive
type enforces nothing character-ish; only the methods of 'Character'
and 'String' do that. The primitive type itself is numeric, not
inherently UTF-16.

Consider:

char schuss = 0xDF;                // 'ß', U+00DF
char div = 0xF7;                   // '÷', U+00F7
char x = (char)( schuss + div );   // 0x01D6; the sum is an int, hence the cast

That makes no sense in terms of Unicode, but is perfectly legal. What
is the value of 'ß' + '÷' in Unicode? Would you expect it to be
'ǖ' (Latin small letter U with diæresis and macron)?

'char' is a *numeric* type. Its use to represent code points
(including surrogate pairs comprising more than one 'char') is a
matter of correspondence between the numeric value and the character
it represents, and is not intrinsic.

OK. Since Arne was pointing out the method he suggested would still
work, I had the impression you wanted to point out the existence of use
cases where it wouldn't.

Glad it's clarified.
 
