Strange way to convert char to int (?)

john

I found this code-snippet in a book (Killer Game Programming in Java, O'Reilly) :

int i = ch - '0'; // We assume that ch is a digit ranging from 0 to 9.

How does this work? Why/how does subtracting
two chars result in an int?

Pep

Primitive data types. A char is an integral type, so in arithmetic it is treated as an int.

john

Ah, ok. Thanks !

Thomas Hawtin

Any arithmetic on bytes, shorts, chars and ints is always done by first
widening the operands to int. For a char, the unsigned 16-bit Unicode value
is used. Implicitly converting a character to a number was probably a
mistake in the language design, but it's not about to change now.

Anyway, in ASCII, and therefore in Unicode, the characters '0' to '9'
have the values 48, 49, 50, ... 57. So if you do, say, '1' - '0', then
that is the equivalent of 49 - 48, i.e. 1.
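
To make the promotion concrete, here is a minimal sketch. The class and method names (PromotionDemo, asciiDigitToInt) are only illustrative and are not from the book. It shows that char - char has type int, and that putting an arithmetic result back into a char needs an explicit cast.

```java
class PromotionDemo {
    // Converts an ASCII digit to its int value; assumes '0' <= ch <= '9'.
    static int asciiDigitToInt(char ch) {
        return ch - '0'; // both operands are widened to int before the subtraction
    }

    public static void main(String[] args) {
        char one = '1';
        int code = one;               // implicit widening from char to int
        int value = one - '0';        // 49 - 48
        char next = (char) (one + 1); // the int result needs a cast to become a char again

        System.out.println(code);     // prints 49
        System.out.println(value);    // prints 1
        System.out.println(next);     // prints 2 (the character '2')
    }
}
```

Without the cast, the line assigning to next would not compile, because one + 1 is an int.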

The code isn't the most reliable way of doing it. Unicode has a number of
ranges of digits besides '0' to '9'. Character.digit is better.
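
A small sketch of the difference, assuming radix 10. The class and helper names (DigitDemo, toDigit) are made up for illustration. Character.digit returns -1 for anything that is not a digit in the given radix, and it understands digits outside the ASCII range, where the subtraction trick gives a meaningless number.

```java
class DigitDemo {
    // Thin wrapper around Character.digit for base 10; returns -1 for non-digits.
    static int toDigit(char ch) {
        return Character.digit(ch, 10);
    }

    public static void main(String[] args) {
        char ascii = '7';
        char arabicIndic = '\u0663'; // ARABIC-INDIC DIGIT THREE
        char letter = 'x';

        System.out.println(toDigit(ascii));       // prints 7
        System.out.println(toDigit(arabicIndic)); // prints 3
        System.out.println(toDigit(letter));      // prints -1

        // The subtraction trick on the same non-ASCII digit:
        System.out.println(arabicIndic - '0');    // prints 1587
    }
}
```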

Tom Hawtin

john

Ok, thanks for the insight.

Roedy Green

chars are automatically promoted to ints before doing arithmetic. So
are bytes. So are shorts. The JVM has no arithmetic instructions for
types narrower than int.
So '2' - '0'
becomes
50 - 48 = 2

This is a fast way of converting a single digit char to its int value.

He is computing the difference between the codes for '2' and '0',
which conveniently is 2 because the digits '0' to '9' are assigned
consecutive codes. See http://mindprod.com/jgloss/unicode.html
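
The same trick extends from one digit to a whole string of digits. The following is only a sketch: the names ParseDigitsDemo and parseAsciiDigits are hypothetical, it assumes a non-empty string, and it has no sign or overflow handling. For real input, Integer.parseInt is the usual choice.

```java
class ParseDigitsDemo {
    // Parses a string of ASCII digits into an int, one character at a time.
    static int parseAsciiDigits(String s) {
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < '0' || ch > '9') {
                throw new NumberFormatException("not an ASCII digit: " + ch);
            }
            // Shift the digits seen so far one decimal place left, then add the new one.
            result = result * 10 + (ch - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseAsciiDigits("2024")); // prints 2024
    }
}
```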

john

Roedy, thanks.

Oliver Wong

BTW, I think this is a bad way to convert characters to the numbers they
represent. I've written about a safer alternative on my blog at
http://nebupookins.net/entry.php?id=260 which correctly converts the Unicode
characters for Roman numerals and Chinese/Japanese characters to the integer
they represent, for example.

I see from the book title that this is for a game, and one could argue that
since this is a game, the code should be very fast. My counterargument to
that is that I seriously doubt that converting chars to integers is going to
be the bottleneck in your game.

- Oliver

Roedy Green

On the other hand, your game is not defined to work with Roman
numerals. Such input would be considered an error.

I think there is room for both. The strongest argument for using your
way is that it leaves programs open to easier internationalisation.
English-speaking programmers tend to forget that their code, if
successful, will be internationalised.

Oliver Wong

Roedy Green said:
On the other hand, your game is not defined to work with Roman
numerals. Such input would be considered an error.

If the design document doesn't specify a behaviour for Roman numeral
input one way or another, I think actually parsing those Roman numerals
would be "good" in the sense of "least surprising for the user" and "more
robust", as opposed to, say, crashing, or returning an undefined value (and
then later crashing).

If the design document DOES say that upon detecting a Roman numeral, an
error should be reported (or more likely "On any value other than 0, 1, 2,
3, 4, 5, 6, 7, 8 or 9, an error should be reported"), then obviously my
solution would be violating the requirements of the program.

- Oliver

Roedy Green

On the other paw, perhaps one in 10,000 people entering a Roman
numeral into your program would do it on purpose. So the principle of
least astonishment suggests the best thing to do is reject it.

Oliver Wong

Roedy Green said:
On the other paw, perhaps one in 10,000 people entering a Roman
numeral into your program would do it on purpose. So the principle of
least astonishment suggests the best thing to do is reject it.

When I say that Character.getNumericValue() parses Roman numerals, I
don't mean the string "VIII", but the actual Unicode character whose
code point in hexadecimal is 0x2167. So I personally think it'd be unlikely
that someone would "accidentally" enter that character.
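
As a rough sketch of the two policies being debated here: the names NumericValueDemo, strictDigit and lenientDigit are invented for this example. The strict version reports an error for anything other than '0' to '9', while the lenient version defers to Character.getNumericValue.

```java
class NumericValueDemo {
    // Strict policy: accept only ASCII '0'..'9', otherwise report an error.
    static int strictDigit(char ch) {
        if (ch < '0' || ch > '9') {
            throw new IllegalArgumentException("not a digit from 0 to 9: " + ch);
        }
        return ch - '0';
    }

    // Lenient policy: accept whatever Character.getNumericValue understands.
    // It returns -1 when the character has no numeric value at all.
    static int lenientDigit(char ch) {
        return Character.getNumericValue(ch);
    }

    public static void main(String[] args) {
        char romanEight = '\u2167'; // ROMAN NUMERAL EIGHT, a single character

        System.out.println(lenientDigit('5'));                 // prints 5
        System.out.println(lenientDigit(romanEight));          // prints 8
        System.out.println(Character.digit(romanEight, 10));   // prints -1
        try {
            strictDigit(romanEight);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected");                    // prints rejected
        }
    }
}
```

One thing to keep in mind with the lenient version is that Character.getNumericValue also maps the Latin letters a to z (in either case) to 10 through 35, so it accepts more than just digit-like characters.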

Also, I don't know if this is the case for digits, but there are
distinct alphabetic characters in Unicode which, in every font I've seen,
look identical. The Cyrillic character \u0430 and Latin character \u0061
both look like 'a' in most fonts. If this ever happens for digits as well,
the user's keyboard might be mapped to a locale in which the character
that the key labelled '9' generates looks identical to '9', but that
character minus '0' equals 400 or something. This would be an example of
accidental usage of an international character, but one which should be
accepted to generate the least astonishment.
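
One case where this does come up for digits is the fullwidth forms (U+FF10 to U+FF19) that East Asian input methods can produce, which look much like the ASCII digits in many fonts. A minimal sketch, with illustrative names (LookalikeDemo, naive):

```java
class LookalikeDemo {
    // The subtraction trick from the book, with no validation.
    static int naive(char ch) {
        return ch - '0';
    }

    public static void main(String[] args) {
        char fullwidthNine = '\uFF19'; // FULLWIDTH DIGIT NINE

        System.out.println(fullwidthNine == '9');                // prints false
        System.out.println(naive(fullwidthNine));                // prints 65257
        System.out.println(Character.digit(fullwidthNine, 10));  // prints 9

        // The Cyrillic and Latin letters mentioned above are distinct characters too.
        System.out.println('\u0430' == '\u0061');                // prints false
    }
}
```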

- Oliver