How should ctype.h's functions be used?

Sarloc

I'm a user of a Brazilian programming forum and I've been reading a
discussion that never seems to end on how ctype.h's functions should be
used. One of the guys keeps saying that values passed to any of the
functions should be cast to unsigned char and the other guy keeps
saying that there is no need for that. Both of them seem to know what
they are talking about and I am confused. So I thought to myself: hey,
maybe I should ask other people and see what they think. That is the
purpose of this topic. Thank you for your time.
 
Ben Pfaff

Sarloc said:
I'm a user of a Brazilian programming forum and I've been reading a
discussion that never seems to end on how ctype.h's functions should be
used. One of the guys keeps saying that values passed to any of the
functions should be cast to unsigned char and the other guy keeps
saying that there is no need for that.

With the to*() and is*() functions, you should be careful to cast
char arguments to unsigned char before calling them. Type `char'
may be signed or unsigned, depending on your compiler or its
configuration. If char is signed, then some characters have
negative values; however, the arguments to is*() and to*()
functions must be nonnegative (or EOF). Casting to unsigned char
fixes this problem by forcing the character to the corresponding
positive value.
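
A minimal sketch of the cast Ben describes, applied to every character of a string. The helper name upcase_in_place is hypothetical and not from the thread; the cast back to char on assignment is one reasonable choice, not the only one.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: upper-case a NUL-terminated string in place. */
void upcase_in_place(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Cast first, so toupper() never sees a negative value,
           then convert the int result back to char for storage. */
        *s = (char)toupper((unsigned char)*s);
    }
}
```

Called on a writable buffer holding "ctype.h", this leaves "CTYPE.H" in the buffer under the default "C" locale.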
 
Keith Thompson

Sarloc said:
I'm a user of a Brazilian programming forum and I've been reading a
discussion that never seems to end on how ctype.h's functions should be
used. One of the guys keeps saying that values passed to any of the
functions should be cast to unsigned char and the other guy keeps
saying that there is no need for that. Both of them seem to know what
they are talking about and I am confused. So I thought to myself: hey,
maybe I should ask other people and see what they think. That is the
purpose of this topic. Thank you for your time.

You need to cast the arguments to unsigned char.

C99 7.4p1 says:

The header <ctype.h> declares several functions useful for
classifying and mapping characters. In all cases the argument is
an int, the value of which shall be representable as an unsigned
char or shall equal the value of the macro EOF. If the argument
has any other value, the behavior is undefined.

If c is an object of type char, and plain char happens to be signed in
your implementation, and the value of c happens to be a negative value
other than EOF, then isalpha(c), for example, will invoke undefined
behavior (after the value of c is promoted to int).

You're likely to get away with it if you happen not to use negative
char values, or if plain char is unsigned, or if the implementation
accommodates negative values (I've seen some that do).
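
To make the range requirement concrete, here is a small sketch that converts a plain char into a value the <ctype.h> functions accept. The names ctype_arg and is_alpha_char are hypothetical, and the sketch assumes the common case where UCHAR_MAX fits in an int.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* Hypothetical helper: map a plain char to a value <ctype.h> accepts. */
int ctype_arg(char c)
{
    /* Conversion to unsigned char is well defined for every char value,
       whether plain char is signed or unsigned. */
    int v = (unsigned char)c;

    /* Postcondition required by C99 7.4p1 for non-EOF arguments. */
    assert(v >= 0 && v <= UCHAR_MAX);
    return v;
}

/* Hypothetical wrapper: an isalpha() that is safe for any plain char. */
int is_alpha_char(char c)
{
    return isalpha(ctype_arg(c)) != 0;
}
```

With an 8-bit char, a char holding -36 maps to 220, which is in range; passing -36 directly to isalpha() would be undefined behavior.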
 
Eric Sosman

Ben said:
With the to*() and is*() functions, you should be careful to cast
char arguments to unsigned char before calling them. Type `char'
may be signed or unsigned, depending on your compiler or its
configuration. If char is signed, then some characters have
negative values; however, the arguments to is*() and to*()
functions must be nonnegative (or EOF). Casting to unsigned char
fixes this problem by forcing the character to the corresponding
positive value.

There's a subtlety here that's worth pointing out: Ben
says (correctly) that you must cast a _char_ argument to
unsigned char when using a <ctype.h> function. However, if
the argument is an int obtained from getc() or something of
the sort, you must _not_ cast it.

The reason for casting a char argument to unsigned char
is, as Ben explains, to handle C's built-in uncertainty as
to whether char is signed or unsigned. The reason for _not_
casting the int returned by getchar() is that this int already
has the proper non-negative value for a "legitimate" character
or has the negative value EOF. That is, getchar() can return
more distinct values than a char can represent (the extra value
being EOF), and if you coerce EOF to an unsigned char you'll
lose the ability to distinguish it from a legitimate character.

char *p = "Überwald";
if (isupper(*p)) /* wrong */
if (isupper((unsigned char)*p)) /* right */

int ch = getchar();
if (isupper(ch)) /* right */
if (isupper((unsigned char)ch)) /* wrong */
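
The getchar() case can be sketched as a complete read loop. The function name count_upper is hypothetical; it uses getc() on a caller-supplied stream so that it is easy to exercise, and it relies only on getc() returning either EOF or a character value converted to unsigned char.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: count upper-case characters in a stream. */
long count_upper(FILE *in)
{
    long n = 0;
    int ch;  /* int, not char, so EOF stays distinct from every character */

    assert(in != NULL);
    while ((ch = getc(in)) != EOF) {
        /* ch is already in 0..UCHAR_MAX here, so no cast is needed. */
        if (isupper(ch))
            n++;
    }
    return n;
}
```

Casting ch to unsigned char before the EOF comparison would break the loop: on a typical implementation where EOF is -1 and char has 8 bits, (unsigned char)EOF is 255, which never compares equal to EOF.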
 
Ben Pfaff

Eric Sosman said:
There's a subtlety here that's worth pointing out: Ben
says (correctly) that you must cast a _char_ argument to
unsigned char when using a <ctype.h> function. However, if
the argument is an int obtained from getc() or something of
the sort, you must _not_ cast it.

Good point. I'll add that to my boilerplate text.
 
Dave Thompson

Keith Thompson said:
Sarloc said:
I'm a user of a Brazilian programming forum and I've been reading a
discussion that never seems to end on [... whether to cast for ctype.h]

You need to cast the arguments to unsigned char.

C99 7.4p1 says: <snip>

Not always. You must ensure the value is in the range of unsigned
char, or EOF (which last is rarely useful). Casting to unsigned char
is one simple and effective way of doing this, but not the only one.
For example if you use an int variable that was last assigned the
return from getchar(), it is guaranteed to be in range without
casting, except on the rare systems where UCHAR_MAX > INT_MAX, where
you are unlikely to be able to use stdio.h anyway.
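
One alternative to a cast at every call, sketched under the assumption that the data lives in an ordinary char string: read the characters through a pointer to unsigned char, so each value is already in range. The function name count_digits is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count decimal digits in a NUL-terminated string. */
size_t count_digits(const char *s)
{
    assert(s != NULL);

    /* Reading through unsigned char yields values in 0..UCHAR_MAX,
       so the isdigit() call below needs no per-call cast. */
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    for (; *p != '\0'; p++) {
        if (isdigit(*p))
            n++;
    }
    return n;
}
```

This still involves one pointer cast, but it happens once per string rather than once per call.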

- David.Thompson1 at worldnet.att.net
 
