Mike Schilling said:
Which was, by the way, a bad idea. I've never seen a program that wants
to do signed arithmetic on bytes. I've seen lots that assemble bytes
into chars and ints by shifting and or-ing, which would be much simpler
and more reliable if the damned things didn't sign-extend.
Yeah, signed bytes are stupid.
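To make the complaint concrete, here is a small Java sketch (the class and
method names are just mine, for illustration) of the shift-and-OR assembly
being described. Because bytes are signed, each one sign extends the moment
it is widened to int, so every byte has to be masked back to 0..255 or the
high-order bits of the result get clobbered.

    // Hypothetical helper showing the usual workaround for signed bytes.
    public class PackBytes {

        // Assemble four bytes, big-endian, into one int.
        static int packBigEndian(byte b0, byte b1, byte b2, byte b3) {
            // Each byte sign extends when promoted to int, so it must be
            // masked back to 0..255 before shifting; otherwise any byte of
            // 0x80 or above ORs 1-bits over the higher-order positions.
            return ((b0 & 0xFF) << 24)
                 | ((b1 & 0xFF) << 16)
                 | ((b2 & 0xFF) << 8)
                 |  (b3 & 0xFF);
        }

        public static void main(String[] args) {
            int v = packBigEndian((byte) 0x12, (byte) 0x80,
                                  (byte) 0xAB, (byte) 0xCD);
            System.out.printf("%08X%n", v);  // prints 1280ABCD with the masks;
                                             // drop them and the sign extension
                                             // of 0x80, 0xAB and 0xCD wrecks it
        }
    }

If bytes were unsigned, the three & 0xFF masks would simply disappear, which
is the "simpler and more reliable" version being asked for.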
I don't know the actual history of what happened in Java, but I assume Java
just copied the C and C++ convention of chars being signed. The reason
bytes are signed in C is the PDP-11. When you did a byte move
into a register (registers were all 16 bits), it was sign extended by
default. There was no way to do a byte move to a register without the sign
extension happening, and there was no way to reference the high and low
bytes of a register as if they were separate byte registers. All you
could do was a byte move from a memory location to one of the general
16-bit registers, and you got the sign extension whether you wanted it or
not. If you wanted to undo the sign extension, you just had to zero out
the high-order bits with a bit clear (BIC) instruction.
Given the limited complexity of the computers in those days (very small
instruction sets), you can see why the designers of the PDP-11 might
have chosen to make it work that way. They couldn't justify having both
signed and unsigned byte move instructions, so they had to pick one
behavior for the byte move. It was a 16-bit machine with byte-addressable
memory. All
the registers were 16 bits (including the program counter and stack
pointer). If you did a byte move to a register, did 16-bit math on it, and
did a byte move back to memory or to an I/O device, what happened in the
high-order 8 bits wasn't important. It didn't matter whether it was sign
extended or not. In those days, 8-bit signed ints were more common because
machines were so small (64K of memory was a full address space) - today we
have so much memory that we would never bother to use a signed 8-bit
variable just to save space; we use signed 32-bit variables even when all
we do is count from 1 to 10.
The design trade-off was whether it was better to make the byte move signed
or unsigned. If unsigned, a programmer who wanted a signed value would have
to add two instructions to do the sign extension (a test and a bit set).
But if the hardware did the sign extension by default and the programmer
didn't want it, all they had to do was add one unconditional bit clear.
So, given that for pure 8-bit operations it made no difference which way it
worked, and that for 8-bit to 16-bit conversion one default took two extra
instructions to undo while the other took only one, they picked the default
whose inverse was cheaper.
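The same asymmetry is easy to see at the Java level - this is only a loose
analogy to the PDP-11 situation, with names of my own choosing, not actual
machine code. Undoing an unwanted sign extension is a single mask (the
software cousin of the bit clear), while manufacturing a sign extension
from a zero-extended value takes a test and a bit set, or the usual pair
of shifts.

    // Loose Java analogy to the signed-vs-unsigned byte move trade-off.
    public class SignExtendDemo {
        public static void main(String[] args) {
            byte b = (byte) 0xFE;             // bit pattern 1111 1110, i.e. -2

            // The "signed by default" choice: widening sign extends for free.
            int signExtended = b;             // 0xFFFFFFFE == -2

            // Didn't want it? One mask clears the high-order bits,
            // like the single bit clear instruction described above.
            int zeroExtended = signExtended & 0xFF;    // 0x000000FE == 254

            // With the other default you would pay more: sign extending a
            // zero-extended value by hand takes a test and a bit set...
            int manual = zeroExtended;
            if ((manual & 0x80) != 0) {
                manual |= 0xFFFFFF00;         // set the high-order bits
            }

            // ...or the equivalent pair of shifts.
            int shifted = (zeroExtended << 24) >> 24;  // also -2

            System.out.println(signExtended + " " + zeroExtended + " "
                    + manual + " " + shifted);         // -2 254 -2 -2
        }
    }

Either way some conversion costs an instruction or two; the designers just
put the cost on the side that was cheaper to pay.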
If C had made chars unsigned by default, it would have ended up generating
a lot of extra code to constantly undo the sign extension every time a char
was returned from a function, since all function return values were
promoted to ints in those days. So C simply followed the convention of the
very
limited PDP-11 hardware of those early days.
So now we have it in Java, even though it really makes no sense at all in
our modern environment - except for the advantage of
backward compatibility (which is important).