Michael said:
Eric said:
if (isdigit(a[0]) && isdigit(a[1])) { printf("japp"); }
Here's another problem, probably not involved in your
trouble but a problem nonetheless. Write
if (isdigit( (unsigned char)a[0] ) && ...
to guard against character codes with negative values. All
of the digits 0-9 have positive codes, but characters like
'ß' and 'ø' may be negative on some machines. If the user
enters such a character, you cannot safely use isdigit() on
it until you convert it to a non-negative value.
I have a question on this one: Does this follow from
C99, 7.4 Character handling
#1:
"The header <ctype.h> declares several functions useful for classifying
and mapping characters. In all cases the argument is an int, the value
of which shall be representable as an unsigned char or shall equal the
value of the macro EOF. If the argument has any other value, the
behavior is undefined."
or is there another source as well?
Yes, that's the crucial paragraph. You must guarantee
that the argument to isxxx() is either EOF or a non-negative
value, specifically, a non-negative value corresponding to
a character code represented as an unsigned char. Informally,
the isxxx() argument must be in { EOF, 0..UCHAR_MAX }.
For values returned by getc() and friends no special
precautions need be taken: the returned value is exactly what
isxxx() expects, and you can pass it directly without fuss.
The surprises occur when you store a character code in a `char',
because `char' is a signed type on some implementations. If a
particular `char' has a negative value, it will (usually) become
a negative-valued `int' upon promotion, and if so will not be
in the range { 0..UCHAR_MAX }. If you're lucky(?) the negative
`char' will just happen to have the same value as EOF and you'll
avoid undefined behavior -- but at the cost of massive confusion.
Peter Nilsson suggests processing a string by converting its
`char*' pointer to `unsigned char*':
char *string = ...;
unsigned char *safe = (unsigned char*)string;
if (isxxx(*safe)) ...
While this will work on Every Single C Implementation I Have
Ever Encountered, it is in fact "lying to the compiler," and
thus braving the compiler's revenge. The thing that `string'
points to is a plain `char', not an `unsigned char', and it is
(very mildly) risky to pretend that the one is the other. I am
too full of post-prandial Cognac at the moment to work out the
details, but I rather suspect that a system with signed `char'
and ones' complement representation for negatives might get into
some trouble with such a lie. Since the incorruptibly honest
convert-at-point-of-call idiom cannot fail while the pointer-
twiddling approach carries with it a faint whiff of impropriety,
I see no reason to use the latter.
Incidentally, I just got a request for an "ANSI C" program where
I have to use isalpha() extensively. Depending on the locale, the
above might be an issue, so I want to get it right and know why.
Essentially, I would also have to assert(UCHAR_MAX <= INT_MAX),
wouldn't I?
The characteristics of getc() and isxxx() seem to rule out
some "exotic" architectures, or at least to constrain them.
If UCHAR_MAX <= INT_MAX all's well, but if UCHAR_MAX > INT_MAX
there are some `unsigned char' values that cannot be returned from
getc() or passed to isxxx(). The only way out of this predicament,
I think, is to say that such values do not correspond to legitimate
character codes for the implementation at hand. You might have
INT_MAX == 32767 and UCHAR_MAX == 65535, but if getc() can only
return values in the set { EOF, 0..511 }, say, all will still be
well.
Incidentally, note that a system with signed `char' must
either limit itself to non-negative codes for actual characters
or must provide a useful definition for the conversion of out-
of-range `unsigned char' values to plain signed `char'. It
would be unconscionable if
int ch;
char c;
if ((ch = getchar()) != EOF)
c = ch;
.... were to raise a signal or do "something weird" if `ch'
happened to exceed CHAR_MAX. The library was born as the servant
of the language, but now and then becomes its master.
With 20-20 hindsight one can opine that the original decision
to leave the signedness of `char' unspecified was unfortunate.
However, one can't lay the blame on Ritchie; he observed that
different machines took different positions on the matter and
decided not to make implementing his new language difficult for
half of them. So many of his other decisions turned out well that
it's hard to chastise him -- indeed, it's hard to do other than
admire him. "Too bad," nonetheless.