Spiros Bousbouras said:
On 10 Mar 2011 16:49:57 GMT
If you are reading from a file by successively calling fgetc() is there
any point in storing what you read in anything other than unsigned
char ?
Yes, when you read EOF, which is not an unsigned char.
In my mind I was making a distinction between storing and temporarily
assigning but I guess it wasn't clear. What I had in mind was something
like:
unsigned char arr[some_size] ;
int a ;
while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
Would there be any reason for arr to be something other than
unsigned char ?
char is normally used for storing characters, and I think that is what
it was designed for. So it seems a bit odd not to use it.
But if arr[] is char how do you avoid the implementation defined
behavior when doing arr[position++] = a ?
Typically by ignoring the issue. (Well, this doesn't avoid
the implementation defined behavior; it just assumes it's
ok.) On any system where this is a sensible thing to do, the
implementation-defined behavior is almost certain to be what you
want. Assigning a value exceeding CHAR_MAX to a char (assuming
plain char is signed) *could* give you a strange result, or even
raise an implementation-defined signal, but any implementation that
chose to do such a thing would break a lot of existing code.
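
To make the step in question concrete, here is a minimal sketch. The helper names store_as_char() and byte_value() are made up for illustration and are not part of any library. It assumes the caller has already ruled out EOF, so the argument is in the range 0..UCHAR_MAX.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: store one fgetc() result in a plain char. */
char store_as_char(int a)
{
    assert(a >= 0 && a <= UCHAR_MAX);   /* caller has already excluded EOF */

    /* If plain char is signed and a > CHAR_MAX, the result of this
       conversion is implementation-defined (or an implementation-defined
       signal is raised). Most implementations simply wrap around. */
    return (char)a;
}

/* Hypothetical helper: recover the byte value from a plain char.
   Conversion to unsigned char is always well defined: the value is
   reduced modulo UCHAR_MAX + 1. */
int byte_value(char c)
{
    return (unsigned char)c;
}
```

On the usual implementations, where an out-of-range value wraps, byte_value(store_as_char(a)) gives back a for every byte value, which is why ignoring the issue works in practice.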
C uses plain char (which may be signed) for strings, but it reads
characters from files as unsigned char values. IMHO this is a flaw
in the language. A byte read from a file with a representation
of 10101001 (0xa9) is far more likely to mean 169 than -87 (it's
a copyright symbol in Latin-1, 'z' in EBCDIC).
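
The arithmetic behind 169 vs. -87 can be sketched as follows. The function names are invented for this example, and the code assumes 8-bit bytes and a two's-complement reading of the bit pattern.

```c
#include <assert.h>
#include <limits.h>

/* Value of the bit pattern read as an unsigned 8-bit quantity. */
int as_unsigned(unsigned char byte)
{
    return byte;
}

/* Value of the same bit pattern read as 8-bit two's complement:
   patterns with the top bit set represent byte - 256. */
int as_twos_complement(unsigned char byte)
{
    assert(CHAR_BIT == 8);              /* assumption of this sketch */
    return byte >= 128 ? byte - 256 : byte;
}
```

For 0xa9 the first function yields 169 and the second yields 169 - 256 = -87.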
One solution might be to require plain char to be unsigned, but that
causes inefficient code for some operations -- which was more of an
issue in the PDP-11 days than it is now, but it's probably still
significant.
Another might be to have fgetc() return an int representing either
a *plain* char value or EOF, but it's too late to change that.
I'm usually a strong advocate for writing code as portably as possible,
but in this case I suspect that working around the unsigned char vs.
plain char mismatch would be more effort than it's worth.
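
For anyone who does want to avoid the conversion, one possible workaround is to fill a plain char buffer through an unsigned char pointer, so that no out-of-range int is ever converted to char. This is only a sketch: read_chars() is a hypothetical helper, not a standard function, and whether the extra care is worth it is exactly the judgment call discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: read up to cap bytes from f into buf.
   Returns the number of bytes stored. */
size_t read_chars(FILE *f, char *buf, size_t cap)
{
    /* Any object may be accessed through an unsigned char pointer. */
    unsigned char *p = (unsigned char *)buf;
    size_t n = 0;
    int a;

    assert(f != NULL && buf != NULL);

    /* Check the capacity first so no byte is read and then dropped. */
    while (n < cap && (a = fgetc(f)) != EOF)
        p[n++] = (unsigned char)a;      /* a is in 0..UCHAR_MAX here */

    return n;
}
```

The buffer can then be used as ordinary char data; code that needs the original byte value can convert each element back with (unsigned char)buf[i].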