Keith Thompson
bartc said:
Dik T. Winter said:
int data[256]={0};
data['ú'] += 1;
I would not expect a line like that in code, more something like:
data[c] += 1;
where c is the return value of getchar().
Yet another source of confusion: getchar returns codes 128 to 255 as
positive values, but put that value into a char type, and it becomes
negative: char c; data[getchar()] works, but data[c=getchar()] doesn't.
getchar() returns an int; you should never store the result of
getchar() directly in an object of type char.
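To make that concrete, here is a minimal sketch of the counting loop with the result kept in an int. The helper name count_bytes and its parameters are made up for illustration; it reads with getc(in), which behaves like getchar() when in is stdin.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: tally how often each byte value occurs in a stream.
   counts must have at least UCHAR_MAX + 1 elements. */
void count_bytes(FILE *in, unsigned long counts[])
{
    int c;  /* int, not char: it has to hold every unsigned char value plus EOF */

    assert(in != NULL && counts != NULL);
    while ((c = getc(in)) != EOF) {
        /* Anything other than EOF is an unsigned char value converted to int,
           so on ordinary hosted systems c is in 0..UCHAR_MAX here. */
        counts[c] += 1;
    }
}
```

With c declared as char instead, a byte such as 0xFF could either compare equal to EOF or become a negative index where plain char is signed, and EOF might never be detected where plain char is unsigned.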
And why can't someone write: char text[100]; data[text[i]] ?
Um, because char might be a signed type, and text[i] might be
negative. Yeah, I know that's not really what you were asking.
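For what it's worth, the usual workaround when indexing with an element of a char array such as text is to convert it to unsigned char first. A minimal sketch, with a made-up helper name count_chars and the assumption that the array holds a NUL-terminated string:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: tally the characters of a NUL-terminated string.
   counts must have at least UCHAR_MAX + 1 elements. */
void count_chars(const char *text, unsigned long counts[])
{
    size_t i;

    assert(text != NULL && counts != NULL);
    for (i = 0; text[i] != '\0'; i++) {
        /* text[i] may be negative where plain char is signed; converting
           to unsigned char yields a value in 0..UCHAR_MAX, a valid index. */
        counts[(unsigned char)text[i]] += 1;
    }
}
```

The cast costs nothing on typical implementations and behaves the same whether plain char is signed or unsigned.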
Making arbitrary rules about what can or can't be coded is not really
helpful; why not just admit that negative characters are a bad idea as
Eric Sosman did in this thread?
The rules aren't arbitrary. They follow from the fact that plain
char may be either signed or unsigned. I certainly admit that that
causes problems, and all else being equal I would prefer plain char
to be unsigned.
Historically, I believe that making plain char signed made for more
efficient code on the PDP-11. This was before the types signed char
and unsigned char had been introduced. Since there probably were
other systems on which making plain char unsigned made for faster
code, the choice was left up to the implementation. On ASCII-based
systems at the time, character values outside the range 0..127 were
rare (Accented letters? On a computer? You're lucky to get lower
case!), so it wasn't much of an issue. EBCDIC-based systems made
plain char unsigned anyway.
If the standard were changed to require plain char to be unsigned,
it would not break any existing portable code. It might break some
existing non-portable code that assumes plain char is signed --
and that might be a reasonable assumption for code that's intended
only for a single target system (though I'd still prefer to use
signed char explicitly).
Any implementation could avoid the problems of negative characters
by making plain char unsigned. But most compilers I've used still
make plain char signed by default (though some have an option to
change it). I have to assume there's some valid reason for that,
though perhaps it's just inertia.