Casting to unsigned char for isupper() and friends

Francine.Neary

I've read that you should always cast the argument you pass to
isupper(), isalnum(), etc. to unsigned char, even though their
signature is int is...(int).

This confuses me, for the following reason. The is...() functions can
either accept a character, or EOF. But now suppose (as is common) that
EOF==(int) -1. Then (unsigned char) EOF will be 255, which is a valid
character value! So this casting destroys the possibility to pass EOF
to is...(), and in fact gives misleading results in this case.
 
Keith Thompson

I've read that you should always cast the argument you pass to
isupper(), isalnum(), etc. to unsigned char, even though their
signature is int is...(int).

This confuses me, for the following reason. The is...() functions can
either accept a character, or EOF. But now suppose (as is common) that
EOF==(int) -1. Then (unsigned char) EOF will be 255, which is a valid
character value! So this casting destroys the possibility to pass EOF
to is...(), and in fact gives misleading results in this case.

If you have a value of type (plain) char, you should cast it to
unsigned char before passing it to isupper() (or any of the is*()
functions). For example, if plain char is signed, then -42
might be a valid character; you need to convert it to unsigned char,
yielding (assuming 8-bit characters) the value 214, which isupper()
can understand.
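A minimal sketch of the cast described above, applied to a string held in plain char. The helper name count_upper is hypothetical and only for illustration; the point is the (unsigned char) conversion at the call site.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the uppercase letters in a
   NUL-terminated string. */
size_t count_upper(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* *s has type plain char and may be negative where char is
           signed. Converting to unsigned char yields a value that
           the is*() functions are required to accept. */
        if (isupper((unsigned char)*s))
            n++;
    }
    return n;
}
```

Without the cast, a byte such as '\xE9' could reach isupper() as a negative int on an implementation where plain char is signed, which is undefined behaviour.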

If you have the value EOF, then presumably you haven't tried to store
it in a variable of type char. For example, if it's the result of the
getchar() function, then it's already of type int (and any characters
that have negative values as signed char are already converted to
unsigned char), so no cast is necessary. Casting it to unsigned char
would, as you say, lose information.

So saying that you should *always* cast the argument to unsigned char
isn't quite correct. But the ability to pass the value EOF to the
is*() functions is fairly obscure, and it's not something I've ever
seen a use for. You're correct that EOF is an exception to the rule,
but I'd recommend just avoiding EOF in this context in the first
place.
 
Mark McIntyre

On 23 Mar 2007 16:30:13 -0700, in comp.lang.c ,
I've read that you should always cast the argument you pass to
isupper(), isalnum(), etc. to unsigned char, even though their
signature is int is...(int).

This confuses me, for the following reason. The is...() functions can
either accept a character, or EOF. But now suppose (as is common) that
EOF==(int) -1. Then (unsigned char) EOF will be 255, which is a valid
character value! So this casting destroys the possibility to pass EOF
to is...(), and in fact gives misleading results in this case.

While you can pass EOF to these functions it serves no useful purpose
to do so that I can think of. I suspect it's there because getchar()
and its ilk can return it.

On the other hand, any other value outside the range of unsigned char
would invoke undefined behaviour. The cast is thus a safety measure to
prevent accidental invocation of UB.

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
Richard Tobin

But the ability to pass the value EOF to the
is*() functions is fairly obscure, and it's not something I've ever
seen a use for.

I suppose if you have a series of tests like

    c = getchar();
    if(isupper(c))
        ...;
    else if(isdigit(c))
        ...;
    else if(c == '*')
        ...;
    else if(c == EOF)
        ...;

you can do it without worrying about the order of the tests, just as if
it only had equality tests.
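As a rough, compilable version of that chain (the enum and the function name classify are made up for the example, not taken from the post), assuming c holds a value as returned by getchar():

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical categories, for illustration only. */
enum kind { KIND_UPPER, KIND_DIGIT, KIND_STAR, KIND_EOF, KIND_OTHER };

/* c must be an unsigned char value or EOF, as getchar() returns. */
enum kind classify(int c)
{
    if (isupper(c))          /* false for EOF, so order is irrelevant */
        return KIND_UPPER;
    else if (isdigit(c))
        return KIND_DIGIT;
    else if (c == '*')
        return KIND_STAR;
    else if (c == EOF)
        return KIND_EOF;
    return KIND_OTHER;
}
```

Because EOF belongs to none of the character classes, the EOF test can sit anywhere in the chain without changing the result.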

-- Richard
 
Flash Gordon

Mark McIntyre wrote, On 24/03/07 00:04:
On 23 Mar 2007 16:30:13 -0700, in comp.lang.c ,


While you can pass EOF to these functions it serves no useful purpose
to do so that I can think of. I suspect it's there because getchar()
and its ilk can return it.

I can see a useful purpose. On the assumption that EOF is the rare case
you can produce efficient code with
while ((c = getchar()) && isspace(c) && !(c == EOF)) continue;
for skipping white space. There are times when this is both efficient
and convenient. It is efficient because normally when the loop
terminates it is because of isspace failing. I'm not sure what isspace
returns if the input is EOF, it might mean you don't even need the last
test!

On the other hand, any other value outside the range of unsigned char
would invoke undefined behaviour. The cast is thus a safety measure to
prevent accidental invocation of UB.

The cast is a safety measure when the argument is not an int value that
is the result of getchar.
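On the open question about the whitespace-skipping loop: as far as I read the standard, EOF is not a member of any character class, so isspace(EOF) returns 0 and the explicit EOF test is not needed. A sketch under that assumption, with a hypothetical helper name skip_space and reading from an arbitrary stream rather than stdin:

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: consumes leading whitespace from fp and
   returns the first non-whitespace character, or EOF. */
int skip_space(FILE *fp)
{
    int c;  /* int, so that it can hold EOF */

    /* fgetc() yields an unsigned char value or EOF, both of which
       isspace() accepts; the loop stops on EOF because isspace(EOF)
       is 0. */
    while (isspace(c = fgetc(fp)))
        continue;
    return c;
}
```

The caller still has to compare the result against EOF afterwards, but only once, outside the loop.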
 
CBFalconer

Richard said:
.... snip ...

I suppose if you have a series of tests like

    c = getchar();
    if(isupper(c))
        ...;
    else if(isdigit(c))
        ...;
    else if(c == '*')
        ...;
    else if(c == EOF)
        ...;

you can do it without worrying about the order of the tests, just
as if it only had equality tests.

You can do this BECAUSE getchar (and fgetc and getc) return the int
equivalent of an unsigned char, or EOF. Note that c above MUST be
an int.
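To illustrate why the variable has to be an int, here is a small sketch (count_bytes is a hypothetical name). If c were a plain char, a byte with value 0xFF could compare equal to EOF where char is signed, and EOF could never be detected where char is unsigned.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining on fp. */
long count_bytes(FILE *fp)
{
    int c;       /* must hold every unsigned char value plus EOF */
    long n = 0;

    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}
```

With int c, a 0xFF byte arrives as 255, which is distinct from EOF, so the loop runs to the real end of the stream.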

Stylewar note: if is not a function, so follow it with a blank.
 
jaysome

You can do this BECAUSE getchar (and fgetc and getc) return the int
equivalent of an unsigned char, or EOF. Note that c above MUST be
an int.

Stylewar note: if is not a function, so follow it with a blank.

Yes! And neither are else, switch, for, or while functions.
 
Keith Thompson

jaysome said:
Richard Tobin wrote: [...]
else if(isdigit(c))
...;
else if(c == '*')
[...]
Stylewar note: if is not a function, so follow it with a blank.

Yes! And neither are else, switch, for, or while functions.

True, but else is seldom a problem. I don't think I've ever seen an
else immediately followed by a left parenthesis. At least, I hadn't
until a couple of minutes ago, when I wrote this silly little program:

#include <stdio.h>
int main(int argc, char **argv)
{
    if (argc == 1)
        puts("No arguments");
    else(puts("One or more arguments"));
    return 0;
}

(Or I could have added a cast to void rather than enclosing the entire
call in parentheses.)

But I agree with your actual point.
 
Keith Thompson

Richard Heathfield said:
CBFalconer said:


Why? (The stated reason is considered insufficient.)

Jumping into the middle of this ...

Of course the compiler doesn't care whether there's a blank between
the "if" and the "(", so readability is the only issue.

In my opinion, function calls should look like function calls, and
things that are not function calls should not look like function calls
(except for invocations of function-like macros, which are supposed to
look and act like function calls). By convention, in a function call,
there's no whitespace between the function name and the "(":

printf("Hello, world\n");

By convention, if a keyword is followed by something in parentheses,
there should be whitespace:

if (condition) ...
while (condition) ...
for (expr; condition; expr) ...
switch (expr) ...

It's "merely" a matter of style, and opinions can legitimately differ.
Most of us know that if, while, for, and switch are keywords, not
functions. But for me, this consistent convention makes the code just
a little bit easier to read. And we've seen newbies here, misled by
seeing things like "return(0);", asking why return doesn't act like
other functions. Anything we can do to prevent that kind of
misconception, as long as there are no bad side effects (as there
aren't in this case), is a good thing.
 
CBFalconer

Richard said:
CBFalconer said:


Why? (The stated reason is considered insufficient.)

I know we don't agree, but a keyword is not a function, and it is
pleasant to easily differentiate between those classes during
source scans. I do not expect an identifier immediately followed
by a '(' to be the controlling element for further code. In
addition, whitespace generally prevents hidden typo errors.
 
pete

CBFalconer said:
I know we don't agree, but a keyword is not a function, and it is
pleasant to easily differentiate between those classes during
source scans. I do not expect an identifier immediately followed
by a '(' to be the controlling element for further code. In
addition, whitespace generally prevents hidden typo errors.

sizeof(int)
or
sizeof (int)
?

I was looking at
http://www.chris-lott.org/resources/cstyle/indhill-cstyle.html

They don't completely explain their spacing policy
and their examples aren't consistent.
They have
oogle (zork)
which I think is a function call, and
func()

They also have
sizeof(int)
and
return (NULL)
 
Keith Thompson

CBFalconer said:
The latter. sizeof is not a function. Consistency pays.

Thanks for pointing this out. I've been using "sizeof(int)" myself,
without really thinking about it. I'll try to remember to insert a
space from now on.

On the other hand, sizeof is a unary operator, and it's common to
leave no space between a unary operator and its operand: "-1" or
"!condition", for example. But in this case, I think making it not
look like a function call is more significant.

Informed opinions will inevitably vary, and I won't object to anyone
else writing "sizeof(int)". (For that matter, it's usually better to
apply sizeof to an expression, typically an object, rather than to a
type -- but "sizeof (type-name)" exists for a reason.)
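A short sketch of the parenthetical point about applying sizeof to an expression rather than a type. The helper name make_ints is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates a zero-initialized array of n ints.
   The caller is responsible for calling free() on the result. */
int *make_ints(size_t n)
{
    /* sizeof *p follows the declared type of p, so this line stays
       correct if p is later changed to, say, long *. */
    int *p = calloc(n, sizeof *p);

    return p;
}
```

Note that when the operand is an expression, no parentheses are required at all, which sidesteps the spacing question.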
 
Old Wolf

I can see a useful purpose. On the assumption that EOF is the rare case
you can produce efficient code with
while ((c = getchar()) && isspace(c) && !(c == EOF)) continue;

A more common use might be:
int ch = toupper( getchar() );

and the isxxxx functions are the same for consistency.

The cast is a safety measure when the argument is not an int value that
is the result of getchar.

To be clear, it is a necessary safety measure, there's no reason
to expect that isxxx functions will accept negative values other
than EOF.
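A sketch of the toupper( getchar() ) idiom mentioned above, generalized to arbitrary streams. The helper name copy_upper is hypothetical. It relies on toupper() returning its argument unchanged when there is nothing to convert, which covers EOF.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: copies in to out, converting lowercase
   letters to uppercase. Returns the number of characters written. */
long copy_upper(FILE *in, FILE *out)
{
    int ch;      /* int: holds an unsigned char value or EOF */
    long n = 0;

    /* No cast is needed here: fgetc() never returns a negative
       value other than EOF, and toupper(EOF) is EOF. */
    while ((ch = toupper(fgetc(in))) != EOF) {
        fputc(ch, out);
        n++;
    }
    return n;
}
```

If the characters came from a char array instead, each one would need the (unsigned char) cast before being passed to toupper().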
 
Richard Heathfield

Keith Thompson said:
Jumping into the middle of this ...

Of course the compiler doesn't care whether there's a blank between
the "if" and the "(", so readability is the only issue.

Right, and so we're heading towards subjectivity.

In my opinion, function calls should look like function calls, and
things that are not function calls should not look like function calls
(except for invocations of function-like macros, which are supposed to
look and act like function calls).

Ish.

By convention, in a function call,
there's no whitespace between the function name and the "(":

By some conventions, yes (including mine). But there's no actual rule.

printf("Hello, world\n");

By convention, if a keyword is followed by something in parentheses,
there should be whitespace:

By some conventions, yes - but not mine. And there's no actual rule.

if (condition) ...
while (condition) ...
for (expr; condition; expr) ...
switch (expr) ...

I use if(, while(, for(, switch(.

It's "merely" a matter of style, and opinions can legitimately differ.

Right. Which is why it is not my place, or yours, or - more particularly
in this case - Chuck's, to insist that people adopt a particular style.

<snip>
 
Richard Heathfield

CBFalconer said:
I know we don't agree, but a keyword is not a function, and it is
pleasant to easily differentiate between those classes during
source scans.

I can do so very easily without requiring a spurious space. So can you,
I'm sure, since you've never raised the issue when reading /my/ code
(as far as I can recall), and I never use a blank after if, while, etc.
 
CBFalconer

Richard said:
CBFalconer said:

I can do so very easily without requiring a spurious space. So can
you, I'm sure, since you've never raised the issue when reading
/my/ code (as far as I can recall), and I never use a blank after
if, while, etc.

I can remember raising it, which is why my comment above. But I
see no point to arguing about it. I simply stated my opinion.
 
