Casting to unsigned char for isupper() and friends

Discussion in 'C Programming' started by Francine.Neary@googlemail.com, Mar 23, 2007.

  1. Guest

    I've read that you should always cast the argument you pass to
    isupper(), isalnum(), etc. to unsigned char, even though their
    signature is int is...(int).

    This confuses me, for the following reason. The is...() functions can
    either accept a character, or EOF. But now suppose (as is common) that
    EOF==(int) -1. Then (unsigned char) EOF will be 255, which is a valid
    character value! So this casting destroys the possibility to pass EOF
    to is...(), and in fact gives misleading results in this case.
     
    Francine.Neary@googlemail.com, Mar 23, 2007
    #1
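
A minimal sketch of the arithmetic in the question above. The helper names `as_uchar` and `cast_loses_eof` are hypothetical, chosen only for illustration. Converting a negative int to unsigned char wraps it into the range 0..UCHAR_MAX, so -1 becomes UCHAR_MAX (255 when CHAR_BIT is 8), and the result can never compare equal to EOF, which is negative.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: converts an int to unsigned char, as the cast in
   the question does, and returns the result widened back to int. */
static int as_uchar(int value)
{
    return (unsigned char)value;
}

/* Returns 1 if casting EOF to unsigned char yields something other than
   EOF. EOF is negative and the cast result is not, so the information
   "this was EOF" is lost. */
static int cast_loses_eof(void)
{
    return as_uchar(EOF) != EOF;
}
```

Calling `as_uchar(-1)` gives `UCHAR_MAX`, which is the "255" in the question on a platform with 8-bit characters.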

  2. writes:
    > I've read that you should always cast the argument you pass to
    > isupper(), isalnum(), etc. to unsigned char, even though their
    > signature is int is...(int).
    >
    > This confuses me, for the following reason. The is...() functions can
    > either accept a character, or EOF. But now suppose (as is common) that
    > EOF==(int) -1. Then (unsigned char) EOF will be 255, which is a valid
    > character value! So this casting destroys the possibility to pass EOF
    > to is...(), and in fact gives misleading results in this case.


    If you have a value of type (plain) char, you should cast it to
    unsigned char before passing it to isupper() (or any of the is*()
    functions). For example, if plain char is signed, then -42
    might be a valid character; you need to convert it to unsigned char,
    yielding (assuming 8-bit characters) the value 214, which isupper()
    can understand.

    If you have the value EOF, then presumably you haven't tried to store
    it in a variable of type char. For example, if it's the result of the
    getchar() function, then it's already of type int (and any characters
    that have negative values as signed char are already converted to
    unsigned char), so no cast is necessary. Casting it to unsigned char
    would, as you say, lose information.

    So saying that you should *always* cast the argument to unsigned char
    isn't quite correct. But the ability to pass the value EOF to the
    is*() functions is fairly obscure, and it's not something I've ever
    seen a use for. You're correct that EOF is an exception to the rule,
    but I'd recommend just avoiding EOF in this context in the first
    place.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Mar 23, 2007
    #2
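
A short sketch of the first case described in the reply above: the values come from an array of plain char, so each one is converted to unsigned char before the call. The function name `count_upper` is an assumption made for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts the uppercase letters in a string. Each element has type plain
   char, which may be signed, so it is converted to unsigned char before
   being passed to isupper(). */
static size_t count_upper(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (isupper((unsigned char)*s))
            n++;
    }
    return n;
}
```

Without the cast, a string containing a byte such as 0xD6 could pass a negative value other than EOF to isupper() on a platform where plain char is signed.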

  3. On 23 Mar 2007 16:30:13 -0700, in comp.lang.c ,
    wrote:

    >I've read that you should always cast the argument you pass to
    >isupper(), isalnum(), etc. to unsigned char, even though their
    >signature is int is...(int).
    >
    >This confuses me, for the following reason. The is...() functions can
    >either accept a character, or EOF. But now suppose (as is common) that
    >EOF==(int) -1. Then (unsigned char) EOF will be 255, which is a valid
    >character value! So this casting destroys the possibility to pass EOF
    >to is...(), and in fact gives misleading results in this case.


    While you can pass EOF to these functions, it serves no useful purpose
    to do so that I can think of. I suspect it's there because getchar()
    and its ilk can return it.

    On the other hand, any other value outside the range of unsigned char
    would invoke undefined behaviour. The cast is thus a safety measure to
    prevent accidental invocation of UB.

    --
    Mark McIntyre

    "Debugging is twice as hard as writing the code in the first place.
    Therefore, if you write the code as cleverly as possible, you are,
    by definition, not smart enough to debug it."
    --Brian Kernighan
     
    Mark McIntyre, Mar 24, 2007
    #3
  4. In article <>,
    Keith Thompson <> wrote:
    >But the ability to pass the value EOF to the
    >is*() functions is fairly obscure, and it's not something I've ever
    >seen a use for.


    I suppose if you have a series of tests like

    c = getchar();
    if(isupper(c))
        ...;
    else if(isdigit(c))
        ...;
    else if(c == '*')
        ...;
    else if(c == EOF)
        ...;

    you can do it without worrying about the order of the tests, just as if
    it only had equality tests.

    -- Richard
    --
    "Consideration shall be given to the need for as many as 32 characters
    in some alphabets" - X3.4, 1963.
     
    Richard Tobin, Mar 24, 2007
    #4
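
The chain of tests in the post above can be sketched as a complete function. The name `classify` and the returned labels are hypothetical; the sketch assumes, as the post does, that the is*() functions report false for EOF, so the `c == EOF` test can sit anywhere in the chain.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical classifier mirroring the chain of tests in the post. The
   parameter is an int holding either an unsigned char value or EOF,
   which is what getchar() returns, so no cast is applied. */
static const char *classify(int c)
{
    if (isupper(c))
        return "upper";
    else if (isdigit(c))
        return "digit";
    else if (c == '*')
        return "star";
    else if (c == EOF)
        return "eof";
    return "other";
}
```

A caller would typically write `classify(getchar())`, passing the int result straight through.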
  5. Flash Gordon Guest

    Mark McIntyre wrote, On 24/03/07 00:04:
    > On 23 Mar 2007 16:30:13 -0700, in comp.lang.c ,
    > wrote:
    >
    >> I've read that you should always cast the argument you pass to
    >> isupper(), isalnum(), etc. to unsigned char, even though their
    >> signature is int is...(int).
    >>
    >> This confuses me, for the following reason. The is...() functions can
    >> either accept a character, or EOF. But now suppose (as is common) that
    >> EOF==(int) -1. Then (unsigned char) EOF will be 255, which is a valid
    >> character value! So this casting destroys the possibility to pass EOF
    >> to is...(), and in fact gives misleading results in this case.

    >
    > While you can pass EOF to these functions it serves no useful purpose
    > to do so that I can think of. I suspect its there because getchar()
    > and the ilk can return it.


    I can see a useful purpose. On the assumption that EOF is the rare case
    you can produce efficient code with
    while (isspace(c = getchar()) && !(c == EOF)) continue;
    for skipping white space. There are times when this is both efficient
    and convenient. It is efficient because normally when the loop
    terminates it is because of isspace failing. I'm not sure what isspace
    returns if the input is EOF; it might mean you don't even need the last
    test!

    > On the other hand, any other value outside the range of unsigned char
    > would invoke undefined behaviour. The cast is thus a safety measure to
    > prevent accidental invocation of UB.


    The cast is a safety measure when the argument is not an int value that
    is the result of getchar.
    --
    Flash Gordon
     
    Flash Gordon, Mar 24, 2007
    #5
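
A sketch of the white-space skipping loop from the post above, written against a FILE * so it can be exercised without standard input. The name `skip_space` is hypothetical. The assignment is parenthesized so that `c` receives the character itself rather than the result of the `&&` expression, and the EOF test is kept explicit rather than relying on what isspace() reports for EOF.

```c
#include <ctype.h>
#include <stdio.h>

/* Skips leading white space on a stream and returns the first character
   that is not white space, or EOF. The result of getc() is stored in an
   int, so it is either an unsigned char value or EOF and needs no cast. */
static int skip_space(FILE *fp)
{
    int c;
    while ((c = getc(fp)) != EOF && isspace(c))
        continue;
    return c;
}
```

Calling `skip_space(stdin)` gives the behaviour of the getchar() version in the post.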
  6. CBFalconer Guest

    Richard Tobin wrote:
    >

    .... snip ...
    >
    > I suppose if you have a series of tests like
    >
    > c = getchar();
    > if(isupper(c))
    > ...;
    > else if(isdigit(c))
    > ...;
    > else if(c == '*')
    > ...;
    > else if(c == EOF)
    > ...;
    >
    > you can do it without worrying about the order of the tests, just
    > as if it only had equality tests.


    You can do this BECAUSE getchar (and fgetc and getc) return the int
    equivalent of an unsigned char, or EOF. Note that c above MUST be
    an int.

    Stylewar note: if is not a function, so follow it with a blank.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>



    --
    Posted via a free Usenet account from http://www.teranews.com
     
    CBFalconer, Mar 24, 2007
    #6
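
A sketch of why `c` must be an int, as the post above insists. The function name `count_bytes` is hypothetical. With an int, every unsigned char value stays distinct from EOF. If `c` were a plain char instead, a byte such as 0xFF would typically compare equal to EOF where char is signed, ending the loop early, and where char is unsigned the comparison with EOF would never succeed.

```c
#include <stdio.h>

/* Counts the bytes remaining on a stream. c is an int so that every
   unsigned char value returned by getc() stays distinct from EOF. */
static long count_bytes(FILE *fp)
{
    long n = 0;
    int c;
    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```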
  7. On Mar 23, 6:51 pm, CBFalconer <> wrote:
    <major snippage>
    >
    > Stylewar note: if is not a function, so follow it with a blank.
    >


    SILENCE, NUMBER TWO!!


    Mark F. Haigh
     
    Mark F. Haigh, Mar 24, 2007
    #7
  8. jaysome Guest

    On Fri, 23 Mar 2007 21:51:18 -0500, CBFalconer <>
    wrote:

    >Richard Tobin wrote:
    >>

    >... snip ...
    >>
    >> I suppose if you have a series of tests like
    >>
    >> c = getchar();
    >> if(isupper(c))
    >> ...;
    >> else if(isdigit(c))
    >> ...;
    >> else if(c == '*')
    >> ...;
    >> else if(c == EOF)
    >> ...;
    >>
    >> you can do it without worrying about the order of the tests, just
    >> as if it only had equality tests.

    >
    >You can do this BECAUSE getchar (and fgetc and getc) return the int
    >equivalent of an unsigned char, or EOF. Note that c above MUST be
    >an int.
    >
    >Stylewar note: if is not a function, so follow it with a blank.


    Yes! And neither are else, switch, for, or while functions.

    --
    jay
     
    jaysome, Mar 24, 2007
    #8
  9. CBFalconer said:

    > Stylewar note: if is not a function, so follow it with a blank.


    Why? (The stated reason is considered insufficient.)

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at the above domain, - www.
     
    Richard Heathfield, Mar 24, 2007
    #9
  10. jaysome <> writes:
    > On Fri, 23 Mar 2007 21:51:18 -0500, CBFalconer <>
    > wrote:
    >>Richard Tobin wrote:

    [...]
    >>> else if(isdigit(c))
    >>> ...;
    >>> else if(c == '*')

    [...]
    >>Stylewar note: if is not a function, so follow it with a blank.

    >
    > Yes! And neither are else, switch, for, or while functions.


    True, but else is seldom a problem. I don't think I've ever seen an
    else immediately followed by a left parenthesis. At least, I hadn't
    until a couple of minutes ago, when I wrote this silly little program:

    #include <stdio.h>
    int main(int argc, char **argv)
    {
        if (argc == 1)
            puts("No arguments");
        else(puts("One or more arguments"));
        return 0;
    }

    (Or I could have added a cast to void rather than enclosing the entire
    call in parentheses.)

    But I agree with your actual point.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Mar 24, 2007
    #10
  11. Richard Heathfield <> writes:
    > CBFalconer said:
    >
    >> Stylewar note: if is not a function, so follow it with a blank.

    >
    > Why? (The stated reason is considered insufficient.)


    Jumping into the middle of this ...

    Of course the compiler doesn't care whether there's a blank between
    the "if" and the "(", so readability is the only issue.

    In my opinion, function calls should look like function calls, and
    things that are not function calls should not look like function calls
    (except for invocations of function-like macros, which are supposed to
    look and act like function calls). By convention, in a function call,
    there's no whitespace between the function name and the "(":

    printf("Hello, world\n");

    By convention, if a keyword is followed by something in parentheses,
    there should be whitespace:

    if (condition) ...
    while (condition) ...
    for (expr; condition; expr) ...
    switch (expr) ...

    It's "merely" a matter of style, and opinions can legitimately differ.
    Most of us know that if, while, for, and switch are keywords, not
    functions. But for me, this consistent convention makes the code just
    a little bit easier to read. And we've seen newbies here, misled by
    seeing things like "return(0);", asking why return doesn't act like
    other functions. Anything we can do to prevent that kind of
    misconception, as long as there are no bad side effects (as there
    aren't in this case), is a good thing.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Mar 24, 2007
    #11
  12. CBFalconer Guest

    Blank verse and style (was: Casting to unsigned char for isupper() and friends)

    Richard Heathfield wrote:
    > CBFalconer said:
    >
    >> Stylewar note: if is not a function, so follow it with a blank.

    >
    > Why? (The stated reason is considered insufficient.)


    I know we don't agree, but a keyword is not a function, and it is
    pleasant to easily differentiate between those classes during
    source scans. I do not expect an identifier immediately followed
    by a '(' to be the controlling element for further code. In
    addition, whitespace generally prevents hidden typo errors.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>



    --
    Posted via a free Usenet account from http://www.teranews.com
     
    CBFalconer, Mar 24, 2007
    #12
  13. pete Guest

    Re: Blank verse and style (was: Casting to unsigned char for isupper() and friends)

    CBFalconer wrote:
    >
    > Richard Heathfield wrote:
    > > CBFalconer said:
    > >
    > >> Stylewar note: if is not a function, so follow it with a blank.

    > >
    > > Why? (The stated reason is considered insufficient.)

    >
    > I know we don't agree, but a keyword is not a function, and it is
    > pleasant to easily differentiate between those classes during
    > source scans. I do not expect an identifier immediately followed
    > by a '(' to be the controlling element for further code. In
    > addition, whitespace generally prevents hidden typo errors.


    sizeof(int)
    or
    sizeof (int)
    ?

    I was looking at
    http://www.chris-lott.org/resources/cstyle/indhill-cstyle.html

    They don't completely explain their spacing policy
    and their examples aren't consistent.
    They have
    oogle (zork)
    which I think is a function call, and
    func()

    They also have
    sizeof(int)
    and
    return (NULL)

    --
    pete
     
    pete, Mar 25, 2007
    #13
  14. CBFalconer Guest

    Re: Blank verse and style (was: Casting to unsigned char for isupper() and friends)

    pete wrote:
    > CBFalconer wrote:
    >> Richard Heathfield wrote:
    >>> CBFalconer said:
    >>>
    >>>> Stylewar note: if is not a function, so follow it with a blank.
    >>>
    >>> Why? (The stated reason is considered insufficient.)

    >>
    >> I know we don't agree, but a keyword is not a function, and it is
    >> pleasant to easily differentiate between those classes during
    >> source scans. I do not expect an identifier immediately followed
    >> by a '(' to be the controlling element for further code. In
    >> addition, whitespace generally prevents hidden typo errors.

    >
    > sizeof(int)
    > or
    > sizeof (int)
    > ?


    The latter. sizeof is not a function. Consistency pays.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>



    --
    Posted via a free Usenet account from http://www.teranews.com
     
    CBFalconer, Mar 25, 2007
    #14
  15. Re: Blank verse and style (was: Casting to unsigned char for isupper() and friends)

    CBFalconer <> writes:
    > pete wrote:
    >> CBFalconer wrote:
    >>> Richard Heathfield wrote:
    >>>> CBFalconer said:
    >>>>
    >>>>> Stylewar note: if is not a function, so follow it with a blank.
    >>>>
    >>>> Why? (The stated reason is considered insufficient.)
    >>>
    >>> I know we don't agree, but a keyword is not a function, and it is
    >>> pleasant to easily differentiate between those classes during
    >>> source scans. I do not expect an identifier immediately followed
    >>> by a '(' to be the controlling element for further code. In
    >>> addition, whitespace generally prevents hidden typo errors.

    >>
    >> sizeof(int)
    >> or
    >> sizeof (int)
    >> ?

    >
    > The latter. sizeof is not a function. Consistency pays.


    Thanks for pointing this out. I've been using "sizeof(int)" myself,
    without really thinking about it. I'll try to remember to insert a
    space from now on.

    On the other hand, sizeof is a unary operator, and it's common to
    leave no space between a unary operator and its operand: "-1" or
    "!condition", for example. But in this case, I think making it not
    look like a function call is more significant.

    Informed opinions will inevitably vary, and I won't object to anyone
    else writing "sizeof(int)". (For that matter, it's usually better to
    apply sizeof to an expression, typically an object, rather than to a
    type -- but "sizeof (type-name)" exists for a reason.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Mar 25, 2007
    #15
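
A brief sketch of the closing remark in the post above, that sizeof is usually better applied to an expression than to a type. The helper name `make_ints` is hypothetical. Because the size is written as `sizeof *p`, no parentheses are needed and the allocation stays correct if the element type of `p` is changed later.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates a zero-filled array of n ints, or
   returns NULL on failure. The element size is taken from the
   pointed-to object (sizeof *p) rather than from a type name. */
static int *make_ints(size_t n)
{
    int *p = calloc(n, sizeof *p);
    return p;
}
```

The caller owns the returned block and releases it with free().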
  16. Old Wolf Guest

    On Mar 24, 2:28 pm, Flash Gordon <> wrote:
    > I can see a useful purpose. On the assumption that EOF is the rare case
    > you can produce efficient code with
    > while (isspace(c = getchar()) && !(c == EOF)) continue;


    A more common use might be:
    int ch = toupper( getchar() );

    and the isxxxx functions are the same for consistency.

    > The cast is a safety measure when the argument is not an int value that
    > is the result of getchar.


    To be clear, it is a necessary safety measure; there's no reason
    to expect that isxxx functions will accept negative values other
    than EOF.
     
    Old Wolf, Mar 26, 2007
    #16
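
A sketch of the toupper() usage mentioned in the post above, reading from a FILE * rather than standard input. The name `get_upper` is hypothetical. toupper() returns its argument unchanged when it is not a lowercase letter, so EOF passes through and the caller can still detect end of input.

```c
#include <ctype.h>
#include <stdio.h>

/* Reads one character from a stream and returns it uppercased. getc()
   returns an unsigned char value or EOF, both of which are valid
   arguments for toupper(), and EOF is returned unchanged. */
static int get_upper(FILE *fp)
{
    return toupper(getc(fp));
}
```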
  17. Keith Thompson said:

    > Richard Heathfield <> writes:
    >> CBFalconer said:
    >>
    >>> Stylewar note: if is not a function, so follow it with a blank.

    >>
    >> Why? (The stated reason is considered insufficient.)

    >
    > Jumping into the middle of this ...
    >
    > Of course the compiler doesn't care whether there's a blank between
    > the "if" and the "(", so readability is the only issue.


    Right, and so we're heading towards subjectivity.

    > In my opinion, function calls should look like function calls, and
    > things that are not function calls should not look like function calls
    > (except for invocations of function-like macros, which are supposed to
    > look and act like function calls).


    Ish.

    > By convention, in a function call,
    > there's no whitespace between the function name and the "(":


    By some conventions, yes (including mine). But there's no actual rule.

    > printf("Hello, world\n");
    >
    > By convention, if a keyword is followed by something in parentheses,
    > there should be whitespace:


    By some conventions, yes - but not mine. And there's no actual rule.

    > if (condition) ...
    > while (condition) ...
    > for (expr; condition; expr) ...
    > switch (expr) ...


    I use if(, while(, for(, switch(.

    > It's "merely" a matter of style, and opinions can legitimately differ.


    Right. Which is why it is not my place, or yours, or - more particularly
    in this case - Chuck's, to insist that people adopt a particular style.

    <snip>

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at the above domain, - www.
     
    Richard Heathfield, Mar 26, 2007
    #17
  18. Re: Blank verse and style (was: Casting to unsigned char for isupper() and friends)

    CBFalconer said:

    > Richard Heathfield wrote:
    >> CBFalconer said:
    >>
    >>> Stylewar note: if is not a function, so follow it with a blank.

    >>
    >> Why? (The stated reason is considered insufficient.)

    >
    > I know we don't agree, but a keyword is not a function, and it is
    > pleasant to easily differentiate between those classes during
    > source scans.


    I can do so very easily without requiring a spurious space. So can you,
    I'm sure, since you've never raised the issue when reading /my/ code
    (as far as I can recall), and I never use a blank after if, while, etc.

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at the above domain, - www.
     
    Richard Heathfield, Mar 26, 2007
    #18
  19. Re: Blank verse and style (was: Casting to unsigned char for isupper() and friends)

    pete said:

    > sizeof(int)
    > or
    > sizeof (int)


    sizeof i :)


    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at the above domain, - www.
     
    Richard Heathfield, Mar 26, 2007
    #19
  20. CBFalconer Guest

    Re: Blank verse and style (was: Casting to unsigned char for isupper() and friends)

    Richard Heathfield wrote:
    > CBFalconer said:
    >> Richard Heathfield wrote:
    >>> CBFalconer said:
    >>>
    >>>> Stylewar note: if is not a function, so follow it with a blank.
    >>>
    >>> Why? (The stated reason is considered insufficient.)

    >>
    >> I know we don't agree, but a keyword is not a function, and it is
    >> pleasant to easily differentiate between those classes during
    >> source scans.

    >
    > I can do so very easily without requiring a spurious space. So can
    > you, I'm sure, since you've never raised the issue when reading
    > /my/ code (as far as I can recall), and I never use a blank after
    > if, while, etc.


    I can remember raising it, which is why I made my comment above. But I
    see no point in arguing about it. I simply stated my opinion.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>



    --
    Posted via a free Usenet account from http://www.teranews.com
     
    CBFalconer, Mar 26, 2007
    #20