char * signedness

Discussion in 'C Programming' started by Pietro Cerutti, Jul 5, 2007.

  1. Hi group,
    is it always safe to pass unsigned char * variables as parameters to
    functions accepting char * arguments?

    For instance, I have to compare two unsigned char * strings.
    Can I safely use strcmp? Do I need to cast the two strings to char *?

    Thank you

    --
    Pietro Cerutti

    PGP Public Key:
    http://gahr.ch/pgp
     
    Pietro Cerutti, Jul 5, 2007
    #1

  2. Pietro Cerutti wrote:
    > Hi group,
    > is it always safe to pass unsigned char * variables as parameters to
    > functions accepting char * arguments?


    For the standard library functions, yes, because while they take char *
    arguments, they convert it to unsigned char * internally anyway.

    > For instance, I have to compare two unsigned char * strings.
    > Can I safely use strcmp?


    Yes.

    > Do I need to cast the two strings to char *?


    You need to convert them to char *. You do not necessarily need a cast for
    that; you could use an implicit conversion from unsigned char * to void *,
    and then another implicit conversion from void * to char *. In this case, a
    cast would be a good idea though.
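
    As a minimal sketch of the two routes described above (the helper names
    ustrcmp and ustrcmp_via_void are hypothetical, not part of any library):
    the first uses explicit casts, the second goes through the two implicit
    void * conversions.

```c
#include <string.h>

/* Hypothetical helper: compare two unsigned char strings with strcmp.
 * The casts are explicit because unsigned char * does not convert
 * implicitly to const char *. */
int ustrcmp(const unsigned char *a, const unsigned char *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/* Same comparison without a cast, via two implicit conversions. */
int ustrcmp_via_void(const unsigned char *a, const unsigned char *b)
{
    const void *va = a, *vb = b;   /* object pointer -> void * */
    const char *ca = va, *cb = vb; /* void * -> object pointer */
    return strcmp(ca, cb);
}
```

    Both should give the same result; the cast version is shorter and makes
    the intent visible at the call site.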
     
    Harald van Dijk, Jul 5, 2007
    #2

  3. Harald van Dijk wrote:
    > Pietro Cerutti wrote:
    >> Hi group,
    >> is it always safe to pass unsigned char * variables as parameters to
    >> functions accepting char * arguments?

    >
    > For the standard library functions, yes, because while they take char *
    > arguments, they convert it to unsigned char * internally anyway.


    This is not true for the implementation of strncmp on my system, which is:

    /*** BEGIN STRNCMP ON FREEBSD ***/
    int
    strncmp(s1, s2, n)
        const char *s1, *s2;
        size_t n;
    {

        if (n == 0)
            return (0);
        do {
            if (*s1 != *s2++)
                return (*(const unsigned char *)s1 -
                    *(const unsigned char *)(s2 - 1));
            if (*s1++ == 0)
                break;
        } while (--n != 0);
        return (0);
    }
    /*** END STRNCMP ON FREEBSD ***/

    I think I'm missing something about chars and/or implicit conversions.

    Could you please explain the output of the following program to me?
    The two chars c[0] and d[0] have different values (220 and -36), are not
    equal (the comparison operator returns 0) but the two strings c and d
    are equal to strncmp (which returns 0) and represent the same string to
    printf ("ü").

    /*** BEGIN DUMMY TEST PROGRAM ***/
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        unsigned char c[2];
        char d[2];

        c[0] = 220; c[1] = '\0';
        d[0] = c[0]; d[1] = '\0';

        printf("c is %s\n", c);
        printf("d is %s\n", d);
        printf("c[0] is %02x\n", c[0]);
        printf("d[0] is %02x\n", d[0]);
        printf("c[0] == d[0] is %d\n", (c[0] == d[0]));
        printf("strncmp(c, d, 1) is %d\n", strncmp(c, d, 1));

        return(0);
    }
    /*** END DUMMY TEST PROGRAM ***/


    Thank you!

    --
    Pietro Cerutti

    PGP Public Key:
    http://gahr.ch/pgp
     
    Pietro Cerutti, Jul 6, 2007
    #3
  4. Pietro Cerutti wrote:
    > Harald van Dijk wrote:
    > > Pietro Cerutti wrote:
    > > > is it always safe to pass unsigned char * variables as
    > > > parameters to functions accepting char * arguments?

    > >
    > > For the standard library functions, yes, because while
    > > they take char * arguments, they convert it to unsigned
    > > char * internally anyway.

    >
    > This is not true for the implementation of strncmp on
    > my system, which is:


    Yes it is, under the 'as if' rule.

    >
    > /*** BEGIN STRNCMP ON FREEBSD ***/
    > int
    > strncmp(s1, s2, n)
    > const char *s1, *s2;
    > size_t n;
    > {
    >
    > if (n == 0)
    > return (0);
    > do {
    > if (*s1 != *s2++)


    On systems where plain char is signed but unpadded,
    this will find differences irrespective of whether
    the bytes are treated as signed or unsigned char.

    > return (*(const unsigned char *)s1 -
    > *(const unsigned char *)(s2 - 1));


    Here the unsigned char rule is applied explicitly as
    required by the language specification. Note that
    on your system, unsigned char promotes to int which
    allows for negative results.
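
    A small sketch of that explicit step, assuming 8-bit bytes and an int
    wider than char (first_byte_diff is a hypothetical name): each byte is
    read as unsigned char, and the subtraction happens after promotion to
    int, so the result can be negative.

```c
/* Difference of the first bytes of two strings, each read as unsigned char.
 * Both operands are promoted to int before the subtraction, so the result
 * can be negative even though neither operand is. */
int first_byte_diff(const char *s1, const char *s2)
{
    return *(const unsigned char *)s1 - *(const unsigned char *)s2;
}
```

    For the byte 220 against 'A' (65 in ASCII) this gives 155, whether plain
    char is signed or unsigned on the system in question.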

    > if (*s1++ == 0)
    > break;
    > } while (--n != 0);
    > return (0);}
    >
    > /*** END STRNCMP ON FREEBSD ***/
    >
    > I think I'm missing something about chars and/or implicit
    > conversions.


    The problem is that plain char can be signed or unsigned.
    Character codings are all non-negative, but char is only
    required to be able to store positive values for characters
    in the basic execution character set. So characters in the
    extended character set may be negative.

    > Could you please explain the output of the following
    > program to me? The two chars c[0] and d[0] have different
    > values (220 and -36), are not equal (the comparison
    > operator returns 0) but the two strings c and d are equal
    > to strncmp (which returns 0) and represent the same string
    > to printf ("ü").
    >
    > /*** BEGIN DUMMY TEST PROGRAM ***/
    > #include <stdio.h>
    > #include <string.h>
    >
    > int main(void)
    > {
    > unsigned char c[2];
    > char d[2];
    >
    > c[0] = 220; c[1] = '\0';
    > d[0] = c[0]; d[1] = '\0';


    If plain char is signed (and 8 bits wide) on your system, this
    will put an implementation-defined value into d[0]. Most
    likely it is 220 - 256 == -36. The representation of -36 in
    two's complement is the same as the representation of 220
    in pure binary notation of an unsigned char.
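
    The conversion can be checked with a short sketch. The function names
    are hypothetical, and the wrap to -36 is an assumption about common
    two's complement implementations, not a guarantee of the language.

```c
#include <limits.h>
#include <string.h>

/* Nonzero if plain char is signed on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Value of u after conversion to plain char, widened to int.
 * If char is signed and u > CHAR_MAX the result is implementation-defined;
 * common two's complement implementations wrap modulo 256. */
int as_plain_char(unsigned char u)
{
    char s = (char)u;
    return s;
}

/* Nonzero if u and its converted plain char copy consist of the same byte. */
int same_representation(unsigned char u)
{
    char s = (char)u;
    return memcmp(&u, &s, sizeof u) == 0;
}
```

    On such a system as_plain_char(220) is -36 when char is signed and 220
    when it is unsigned, while same_representation(220) holds either way.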

    > printf("c is %s\n", c);
    > printf("d is %s\n", d);


    For the reason above, this should print the same thing.
    [Note that assuming character codings will make your code
    non-portable.]

    > printf("c[0] is %02x\n", c[0]);
    > printf("d[0] is %02x\n", d[0]);
    > printf("c[0] == d[0] is %d\n", (c[0] == d[0]));


    Both char and unsigned char values will promote to int
    which is capable of supporting the full range of both
    character types. Hence, -36 is not the same value as 220.
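
    To make the difference concrete, here is a sketch with two hypothetical
    helpers: one compares the promoted values, the other converts the plain
    char to unsigned char first, which is well-defined (reduction modulo
    UCHAR_MAX + 1).

```c
/* Compares by value: both operands are promoted to int first. */
int equal_as_values(unsigned char u, char s)
{
    return u == s;
}

/* Compares by byte: s is converted to unsigned char before the comparison. */
int equal_as_bytes(unsigned char u, char s)
{
    return u == (unsigned char)s;
}
```

    With u == 220 and s holding the converted copy, equal_as_bytes reports a
    match, whereas equal_as_values only does so if plain char is unsigned.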

    > printf("strncmp(c, d, 1) is %d\n", strncmp(c, d, 1));


    Here you are using a function which _must_ compare the
    unsigned char values of the character representation.
    Not surprisingly, 220 is the same as 220.

    > return(0);}
    >
    > /*** END DUMMY TEST PROGRAM ***/


    --
    Peter
     
    Peter Nilsson, Jul 6, 2007
    #4
  5. Peter Nilsson wrote:
    > Pietro Cerutti wrote:
    >> Harald van Dijk wrote:
    >>> Pietro Cerutti wrote:
    >>>> is it always safe to pass unsigned char * variables as
    >>>> parameters to functions accepting char * arguments?
    >>> For the standard library functions, yes, because while
    >>> they take char * arguments, they convert it to unsigned
    >>> char * internally anyway.

    >> This is not true for the implementation of strncmp on
    >> my system, which is:

    >
    > Yes it is, under the 'as if' rule.
    >
    >> /*** BEGIN STRNCMP ON FREEBSD ***/
    >> int
    >> strncmp(s1, s2, n)
    >> const char *s1, *s2;
    >> size_t n;
    >> {
    >>
    >> if (n == 0)
    >> return (0);
    >> do {
    >> if (*s1 != *s2++)

    >
    > On systems where plain char is signed but unpadded,
    > this will find differences irrespective of whether
    > the bytes are treated as signed or unsigned char.
    >
    >> return (*(const unsigned char *)s1 -
    >> *(const unsigned char *)(s2 - 1));

    >
    > Here the unsigned char rule is applied explicitly as
    > required by the language specification. Note that
    > on your system, unsigned char promotes to int which
    > allows for negative results.
    >
    >> if (*s1++ == 0)
    >> break;
    >> } while (--n != 0);
    >> return (0);}
    >>
    >> /*** END STRNCMP ON FREEBSD ***/
    >>
    >> I think I'm missing something about chars and/or implicit
    >> conversions.

    >
    > The problem is that plain char can be signed or unsigned.
    > Character codings are all non-negative, but char is only
    > required to be able to store positive values for characters
    > in the basic execution character set. So characters in the
    > extended character set may be negative.
    >
    >> Could you please explain the output of the following
    >> program to me? The two chars c[0] and d[0] have different
    >> values (220 and -36), are not equal (the comparison
    >> operator returns 0) but the two strings c and d are equal
    >> to strncmp (which returns 0) and represent the same string
    >> to printf ("ü").
    >>
    >> /*** BEGIN DUMMY TEST PROGRAM ***/
    >> #include <stdio.h>
    >> #include <string.h>
    >>
    >> int main(void)
    >> {
    >> unsigned char c[2];
    >> char d[2];
    >>
    >> c[0] = 220; c[1] = '\0';
    >> d[0] = c[0]; d[1] = '\0';

    >
    > If plain char is signed (and 8-bits) on your system, this
    > will put an implementation defined value into d[0]. Most
    > likely is 220 - 256 == -36. The representation of -36 in
    > two's complement is the same as the representation of 220
    > in pure binary notation of an unsigned char.
    >
    >> printf("c is %s\n", c);
    >> printf("d is %s\n", d);

    >
    > For the reason above, this should print the same thing.
    > [Note that assuming character codings will make your code
    > non-portable.]
    >
    >> printf("c[0] is %02x\n", c[0]);
    >> printf("d[0] is %02x\n", d[0]);
    >> printf("c[0] == d[0] is %d\n", (c[0] == d[0]));

    >
    > Both char and unsigned char values will promote to int
    > which is capable of supporting the full range of both
    > character types. Hence, -36 is not the same value as 220.
    >
    >> printf("strncmp(c, d, 1) is %d\n", strncmp(c, d, 1));

    >
    > Here you are using a function which _must_ compare the
    > unsigned char values of the character representation.
    > Not surprisingly, 220 is the same as 220.
    >
    >> return(0);}
    >>
    >> /*** END DUMMY TEST PROGRAM ***/

    >


    Thank you for the exhaustive explanation!

    Regards,

    > --
    > Peter
    >



    --
    Pietro Cerutti

    PGP Public Key:
    http://gahr.ch/pgp
     
    Pietro Cerutti, Jul 6, 2007
    #5
  6. Pietro Cerutti wrote:
    >

    .... snip ...
    >
    > /*** BEGIN STRNCMP ON FREEBSD ***/
    > int
    > strncmp(s1, s2, n)
    > const char *s1, *s2;
    > size_t n;
    > {
    >

    .... snip ...
    >
    > Could you please explain the output of the following program to
    > me? The two chars c[0] and d[0] have different values (220 and
    > -36), are not equal (the comparison operator returns 0) but the
    > two strings c and d are equal to strncmp (which returns 0) and
    > represent the same string to printf ("ü").


    Of course not. Your test program classifies one as unsigned char,
    and the other as signed char. The same bit pattern represents both
    (at least in 2's complement). The freebsd implementation does not
    have a proper prototype (uses old fashioned K&R I header), so all
    arguments are passed in as received, and then treated as "const
    char *". This makes them equal.

    There are three char types: plain, signed, and unsigned. Plain
    "char" has the same range and representation as one of the other
    two, but you don't know which without examining your compile
    system documentation.
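
    One way to see that these really are three types, as a sketch assuming
    a C11 compiler (the thread predates C11, and CHAR_KIND is a hypothetical
    macro): a _Generic selection may not list two compatible types, so the
    following only compiles because all three are distinct.

```c
/* Requires C11 for _Generic. Yields 0 for char, 1 for signed char,
 * 2 for unsigned char and -1 for anything else. */
#define CHAR_KIND(x) _Generic((x), \
    char: 0,                       \
    signed char: 1,                \
    unsigned char: 2,              \
    default: -1)
```

    Note that a character constant such as 'a' has type int in C, so
    CHAR_KIND('a') falls through to the default branch.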

    --
    <http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
    <http://www.securityfocus.com/columnists/423>
    <http://www.aaxnet.com/editor/edit043.html>
    cbfalconer at maineline dot net



    --
    Posted via a free Usenet account from http://www.teranews.com
     
    CBFalconer, Jul 6, 2007
    #6
  7. Harald van Dijk wrote:
    > Pietro Cerutti wrote:
    > > Hi group,
    > > is it always safe to pass unsigned char * variables as parameters to
    > > functions accepting char * arguments?

    >
    > For the standard library functions, yes, because while they take char *
    > arguments, they convert it to unsigned char * internally anyway.
    >

    I don't agree with Harald van Dijk. A long time back I had a similar
    query; please refer to the link below and follow the thread, as it
    will help you understand the behaviour of unsigned and signed
    values.

    http://groups.google.co.in/group/alt.comp.lang.learn.c-c++/browse_thread/thread/6b06d071ddda12bc/b6aba0a74dff26a0?lnk=st&q=&rnum=9&hl=en#b6aba0a74dff26a0

    Look for the explanation given by BARAT and KARL

    HTH
    ~Ranjeet Gupta


    > > For instance, I have to compare two unsigned char * strings.
    > > Can I safely use strcmp?

    >
    > Yes.
    >
    > > Do I need to cast the two strings to char *?

    >
    > You need to convert them to char *. You do not necessarily need a cast for
    > that; you could use an implicit conversion from unsigned char * to void *,
    > and then another implicit conversion from void * to char *. In this case, a
    > cast would be a good idea though.
     
    , Jul 6, 2007
    #7
  8. Pietro Cerutti wrote:
    > Hi group,
    > is it always safe to pass unsigned char * variables as parameters to
    > functions accepting char * arguments?
    >
    > For instance, I have to compare two unsigned char * strings.
    > Can I safely use strcmp? Do I need to cast the two strings to char *?
    >
    > Thank you
    >

    Given C89 and prototypes:

    int strcmp(const char *_s1, const char *_s2);

    Your unsigned char * arguments will be coerced automatically to the type
    required.

    --
    Joe Wright
    "Everything should be made as simple as possible, but not simpler."
    --- Albert Einstein ---
     
    Joe Wright, Jul 6, 2007
    #8
  9. writes:

    > Harald van Dijk wrote:
    >> Pietro Cerutti wrote:
    >> > Hi group,
    >> > is it always safe to pass unsigned char * variables as parameters to
    >> > functions accepting char * arguments?

    >>
    >> For the standard library functions, yes, because while they take char *
    >> arguments, they convert it to unsigned char * internally anyway.
    >>

    > I don't agree with Harald van Dijk. A long time back I had a similar
    > query; please refer to the link below and follow the thread, as it
    > will help you understand the behaviour of unsigned and signed
    > values.
    >
    > http://groups.google.co.in/group/alt.comp.lang.learn.c-c++/browse_thread/thread/6b06d071ddda12bc/b6aba0a74dff26a0?lnk=st&q=&rnum=9&hl=en#b6aba0a74dff26a0


    I see nothing there that has a bearing on this thread. You were
    asking about signed representations and got the usual mix of correct
    and incorrect replies.

    > Look for the explanation given by BARAT and KARL


    I could not find anything by BARAT but Karl misled you (at least as
    far as C is concerned) by suggesting that a left shift of a signed
    integer with negative value was well-defined.

    --
    Ben.
     
    Ben Bacarisse, Jul 6, 2007
    #9
  10. Harald van Dijk wrote:

    > Pietro Cerutti wrote:
    > > is it always safe to pass unsigned char * variables as parameters to
    > > functions accepting char * arguments?


    > For the standard library functions, yes, because while they take char *
    > arguments, they convert it to unsigned char * internally anyway.


    If by "the standard library functions", you mean strcmp() and
    strncmp(), then yes, by 7.21.4. If you intended that statement to
    include the rest of the str*() functions, then I would like to see
    C&V, as I was not able to locate any text that suggests that any of
    the other str*() functions interpret their arguments as unsigned char
    *.

    --
    C. Benson Manica | I *should* know what I'm talking about - if I
    cbmanica(at)gmail.com | don't, I need to know. Flames welcome.
     
    Christopher Benson-Manica, Jul 6, 2007
    #10
  11. Christopher Benson-Manica wrote:
    > Harald van Dijk wrote:
    >
    >> Pietro Cerutti wrote:
    >> > is it always safe to pass unsigned char * variables as parameters to
    >> > functions accepting char * arguments?

    >
    >> For the standard library functions, yes, because while they take char *
    >> arguments, they convert it to unsigned char * internally anyway.

    >
    > If by "the standard library functions", you mean strcmp() and
    > strncmp(), then yes, by 7.21.4. If you intended that statement to
    > include the rest of the str*() functions, then I would like to see
    > C&V, as I was not able to locate any text that suggests that any of
    > the other str*() functions interpret their arguments as unsigned char
    > *.


    7.21.1p3 (from n1124; it might have been added even after C99):
    "For all functions in this subclause, each character shall be interpreted as
    if it had the type unsigned char (and therefore every possible object
    representation is valid and has a different value)."
     
    Harald van Dijk, Jul 6, 2007
    #11
  12. Harald van Dijk wrote:

    > 7.21.1p3 (from n1124; it might have been added even after C99):
    > "For all functions in this subclause, each character shall be interpreted as
    > if it had the type unsigned char (and therefore every possible object
    > representation is valid and has a different value)."


    Thanks. That text is indeed not present in n869, and it's nice to see
    that the issue was (eventually) addressed. As long as OP isn't
    running on a C89 DS9K implementation, all would seem likely to be well.

    --
    C. Benson Manica | I *should* know what I'm talking about - if I
    cbmanica(at)gmail.com | don't, I need to know. Flames welcome.
     
    Christopher Benson-Manica, Jul 6, 2007
    #12
  13. On Jul 6, 9:33 am, CBFalconer wrote:
    > Pietro Cerutti wrote:
    >
    > ... snip ...
    >
    > > /*** BEGIN STRNCMP ON FREEBSD ***/
    > > int
    > > strncmp(s1, s2, n)
    > > const char *s1, *s2;
    > > size_t n;
    > > {

    >
    > ... snip ...
    >
    > > Could you please explain the output of the following program to
    > > me? The two chars c[0] and d[0] have different values (220 and
    > > -36), are not equal (the comparison operator returns 0) but the
    > > two strings c and d are equal to strncmp (which returns 0) and
    > > represent the same string to printf ("ü").

    >
    > Of course not. Your test program classifies one as unsigned char,
    > and the other as signed char. The same bit pattern represents both
    > (at least in 2's complement). The freebsd implementation does not
    > have a proper prototype (uses old fashioned K&R I header),


    > so all arguments are passed in as received, and then treated as "const
    > char *". This makes them equal.


    I thought that passing unsigned char * where char * is expected is UB
    when the function has an old K&R-style declaration.
    Signed char may have trap representations, and a pointer to a byte
    holding such a representation might be passed as an unsigned char *.
    Dereferencing it through a char * then produces UB.

    IMHO, as per the standard, a call passing unsigned char * to this
    implementation of strncmp() is UB.

    If it isn't UB, please cite the relevant wording of the standard
    that makes the behavior well-defined.
     
    CryptiqueGuy, Jul 6, 2007
    #13
  14. Harald van Dijk writes:
    [...]
    > 7.21.1p3 (from n1124; it might have been added even after C99):
    > "For all functions in this subclause, each character shall be interpreted as
    > if it had the type unsigned char (and therefore every possible object
    > representation is valid and has a different value)."


    Yes, that paragraph is new in n1124. It was added by TC 2, in response
    to DR 274, <http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_274.htm>.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Jul 6, 2007
    #14
  15. Keith Thompson wrote:
    >
    > Harald van Dijk writes:
    > [...]
    > > 7.21.1p3 (from n1124; it might have been added even after C99):
    > > "For all functions in this subclause,
    > > each character shall be interpreted as
    > > if it had the type unsigned char
    > > (and therefore every possible object
    > > representation is valid and has a different value)."

    >
    > Yes, that paragraph is new in n1124.
    > It was added by TC 2, in response
    > to DR 274, <http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_274.htm>.


    Then I would suggest changing the description of strchr
    so that the value of the c parameter is converted to
    (unsigned char) instead of (char).

    Another problem, concerning a situation where the standard
    can't possibly mean what it says,
    is that the rules concerning rank
    prevent char from being signed.

    N1124.pdf

    6.3.1 Arithmetic operands
    6.3.1.1 Boolean, characters, and integers
    1 Every integer type has an integer conversion rank
    defined as follows:
    — No two signed integer types shall have the same rank,
    even if they have the same representation.

    — The rank of char shall equal the rank of signed char
    and unsigned char.


    --
    pete
     
    pete, Aug 1, 2007
    #15
  16. pete wrote:
    > Keith Thompson wrote:
    >> Harald van Dijk writes:
    >> [...]
    >> > 7.21.1p3 (from n1124; it might have been added even after C99):
    >> > "For all functions in this subclause,
    >> > each character shall be interpreted as
    >> > if it had the type unsigned char
    >> > (and therefore every possible object
    >> > representation is valid and has a different value)."

    >>
    >> Yes, that paragraph is new in n1124.
    >> It was added by TC 2, in response
    >> to DR 274, <http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_274.htm>.

    >
    > Then I would suggest changing the description of strchr
    > so that the value of the c parameter, is converted to
    > (unsigned char) instead of (char).


    That's an interesting find. You may be right that there's a problem here.

    > Another problem concerning a situation where the standard
    > can't possibly mean what it says,
    > is that the rules concerning rank,
    > prevent char from being signed.
    >
    > N1124.pdf
    >
    > 6.3.1 Arithmetic operands
    > 6.3.1.1 Boolean, characters, and integers
    > 1 Every integer type has an integer conversion rank
    > defined as follows:
    > — No two signed integer types shall have the same rank,
    > even if they have the same representation.
    >
    > — The rank of char shall equal the rank of signed char
    > and unsigned char.


    Plain char may be signed, and an integer type, but it is never a signed
    integer type, because signed integer type has a specific definition which
    doesn't include plain char, regardless of its signedness. See 6.2.5p4.
     
    Harald van Dijk, Aug 1, 2007
    #16
  17. Harald van Dijk wrote:
    > Plain char may be signed, and an integer type, but it is never a signed
    > integer type, because signed integer type has a specific definition which
    > doesn't include plain char, regardless of its signedness. See 6.2.5p4.


    Sorry, it appears that it isn't an integer type, for the same reason that it
    isn't a signed integer type: integer type also has a specific definition
    that doesn't include plain char.
     
    Harald van Dijk, Aug 1, 2007
    #17
  18. Harald van Dijk wrote:
    >
    > Harald van Dijk wrote:
    > > Plain char may be signed, and an integer type,
    > > but it is never a signed integer type,
    > > because signed integer type has a specific definition which
    > > doesn't include plain char, regardless of its signedness.
    > > See 6.2.5p4.

    >
    > Sorry, it appears that it isn't an integer type,
    > for the same reason that it
    > isn't a signed integer type:
    > integer type also has a specific definition
    > that doesn't include plain char.


    Thank you.
    I see now that char is one of the "basic types"
    and distinct from the signed and unsigned integer types.

    --
    pete
     
    pete, Aug 2, 2007
    #18