Writing through an unsigned char pointer

Discussion in 'C Programming' started by Noob, Apr 11, 2013.

  1. Noob

    Noob Guest

    Hello,

    Is the following code valid:

    static void foo(unsigned char *buf)
    {
    int i;
    for (i = 0; i < 16; ++i) buf[i] = i;
    }

    void bar(void)
    {
    unsigned long arr[4];
    foo(arr);
    }

    The compiler points out that (unsigned long *) is not
    compatible with (unsigned char *).

    So I cast to the expected type:

    void bar(void)
    {
    unsigned long arr[4];
    foo((unsigned char *)arr);
    }

    I think it is allowed to cast to (unsigned char *)
    but I don't remember if it's allowed only to inspect
    (read) the values, or also to set them. Also the fact
    the "real" type is unsigned means there are no trap
    representations, right?

    Regards.
    Noob, Apr 11, 2013
    #1

  2. Noob

    Nobody Guest

    On Thu, 11 Apr 2013 15:52:22 +0200, Noob wrote:

    > I think it is allowed to cast to (unsigned char *)
    > but I don't remember if it's allowed only to inspect
    > (read) the values, or also to set them.


    It's allowed, but it isn't specified what the result will be.

    An unsigned long can be any number of bytes (so long as it's at least 32
    bits, which isn't necessarily the same thing as 4 bytes); on the most
    common platforms, it will be either 4 bytes or 8 bytes. The byte order can
    be big-endian, little-endian, VAX-endian, or something else.
    Nobody, Apr 11, 2013
    #2

  3. Noob

    James Kuyper Guest

    On 04/11/2013 09:52 AM, Noob wrote:
    > Hello,
    >
    > Is the following code valid:
    >
    > static void foo(unsigned char *buf)
    > {
    > int i;
    > for (i = 0; i < 16; ++i) buf[i] = i;
    > }
    >
    > void bar(void)
    > {
    > unsigned long arr[4];
    > foo(arr);
    > }


    This code assumes that sizeof(arr) == 16, or equivalently,
    sizeof(long)==4. You should either make the behavior of foo() depend
    upon sizeof(long), or at least put in assert(sizeof(long)==4).

    > The compiler points out that (unsigned long *) is not
    > compatible with (unsigned char *).
    >
    > So I cast to the expected type:
    >
    > void bar(void)
    > {
    > unsigned long arr[4];
    > foo((unsigned char *)arr);
    > }
    >
    > I think it is allowed to cast to (unsigned char *)
    > but I don't remember if it's allowed only to inspect
    > (read) the values, or also to set them. ...


    It's allowed, for both purposes.

    > ... Also the fact
    > the "real" type is unsigned means there are no trap
    > representations, right?


    Footnote 53 of n1570.pdf says, with respect to unsigned integer types,
    that "Some combinations of padding bits might generate trap
    representations." Unsigned char isn't allowed to have any padding bits,
    but unsigned long certainly can.
    --
    James Kuyper
    James Kuyper, Apr 11, 2013
    #3
  4. Noob

    Jorgen Grahn Guest

    On Thu, 2013-04-11, Noob wrote:
    > Hello,
    >
    > Is the following code valid:
    >
    > static void foo(unsigned char *buf)
    > {
    > int i;
    > for (i = 0; i < 16; ++i) buf[i] = i;
    > }
    >
    > void bar(void)
    > {
    > unsigned long arr[4];
    > foo(arr);
    > }
    >
    > The compiler points out that (unsigned long *) is not
    > compatible with (unsigned char *).
    >
    > So I cast to the expected type:
    >
    > void bar(void)
    > {
    > unsigned long arr[4];
    > foo((unsigned char *)arr);
    > }
    >
    > I think it is allowed to cast to (unsigned char *)
    > but I don't remember if it's allowed only to inspect
    > (read) the values, or also to set them. Also the fact
    > the "real" type is unsigned means there are no trap
    > representations, right?


    Don't know what the language guarantees.

    Even if I knew about trap representations and stuff, and knew a long
    is four chars on my target, it would worry me that I have no idea what
    the 16 chars look like when viewed as 4 longs. I would have
    introduced endianness issues into the program, and that's never a good
    thing -- they tend to spread.

    If I were you, at this point I'd sidestep the problem by rewriting the
    code without unusual casts. I don't think I've ever seen a problem
    which could be solved by things like the casting above, but not by
    everyday code without casts. (Ok, except for badly written third-party
    APIs, perhaps.)

    You don't show the problem you're trying to solve, so I cannot suggest
    an alternative (except for the obvious and trivial change to bar()).

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Apr 11, 2013
    #4
  5. Noob

    Tim Rentsch Guest

    Nobody <> writes:

    > On Thu, 11 Apr 2013 15:52:22 +0200, Noob wrote:
    >
    >> I think it is allowed to cast to (unsigned char *)
    >> but I don't remember if it's allowed only to inspect
    >> (read) the values, or also to set them.

    >
    > It's allowed, but it isn't specified what the result will be.
    > [snip]


    Not quite right. The Standard does specify the behavior in
    such cases, as implementation-defined. So even though the
    results won't be portable, you can find out what they will
    be.
    Tim Rentsch, Apr 12, 2013
    #5
  6. Noob

    Tim Rentsch Guest

    Jorgen Grahn <> writes:

    > On Thu, 2013-04-11, Noob wrote:
    >> Hello,
    >>
    >> Is the following code valid:
    >>
    >> static void foo(unsigned char *buf)
    >> {
    >> int i;
    >> for (i = 0; i < 16; ++i) buf[i] = i;
    >> }
    >>
    >> void bar(void)
    >> {
    >> unsigned long arr[4];
    >> foo(arr);
    >> }
    >>
    >> The compiler points out that (unsigned long *) is not
    >> compatible with (unsigned char *).
    >>
    >> So I cast to the expected type:
    >>
    >> void bar(void)
    >> {
    >> unsigned long arr[4];
    >> foo((unsigned char *)arr);
    >> }
    >>
    >> I think it is allowed to cast to (unsigned char *)
    >> but I don't remember if it's allowed only to inspect
    >> (read) the values, or also to set them. Also the fact
    >> the "real" type is unsigned means there are no trap
    >> representations, right?

    >
    > Don't know what the language guarantees.
    >
    > Even if I knew about trap representations and stuff, and knew a long
    > is four chars on my target, it would worry me that I have no idea what
    > the 16 chars look like when viewed as 4 longs. I would have
    > introduced endianness issues into the program, and that's never a good
    > thing -- they tend to spread. [snip]


    If CHAR_BIT == 8 and sizeof (long) == 4 (both of which are pretty
    likely under the circumstances, and can easily be tested statically),
    then unsigned long cannot have a trap representation, and it is easy
    to (write code that will) discover just what the representation of
    unsigned long is (and also signed long, although signed long might
    have one trap representation, which is identifiable by checking the
    value of LONG_MIN).

    It's probably true that 99 times out of 100 it's better to avoid
    using character-type access of other types. Even so, it's better to
    know what the Standard actually does require, and to convey that
    understanding to other people. Promoting a style of making decisions
    out of uncertainty, where there is no need for that uncertainty, is a
    bad habit to instill in people.
    Tim Rentsch, Apr 12, 2013
    #6
  7. Noob

    Jorgen Grahn Guest

    On Thu, 2013-04-11, Tim Rentsch wrote:
    > Jorgen Grahn <> writes:
    >
    >> On Thu, 2013-04-11, Noob wrote:
    >>> Hello,
    >>>
    >>> Is the following code valid:
    >>>
    >>> static void foo(unsigned char *buf)
    >>> {
    >>> int i;
    >>> for (i = 0; i < 16; ++i) buf[i] = i;
    >>> }
    >>>
    >>> void bar(void)
    >>> {
    >>> unsigned long arr[4];
    >>> foo(arr);
    >>> }
    >>>
    >>> The compiler points out that (unsigned long *) is not
    >>> compatible with (unsigned char *).
    >>>
    >>> So I cast to the expected type:
    >>>
    >>> void bar(void)
    >>> {
    >>> unsigned long arr[4];
    >>> foo((unsigned char *)arr);
    >>> }
    >>>
    >>> I think it is allowed to cast to (unsigned char *)
    >>> but I don't remember if it's allowed only to inspect
    >>> (read) the values, or also to set them. Also the fact
    >>> the "real" type is unsigned means there are no trap
    >>> representations, right?

    >>
    >> Don't know what the language guarantees.
    >>
    >> Even if I knew about trap representations and stuff, and knew a long
    >> is four chars on my target, it would worry me that I have no idea what
    >> the 16 chars look like when viewed as 4 longs. I would have
    >> introduced endianness issues into the program, and that's never a good
    >> thing -- they tend to spread. [snip]

    >

    ....
    > It's probably true that 99 times out of 100 it's better to avoid
    > using character-type access of other types. Even so, it's better to
    > know what the Standard actually does require, and to convey that
    > understanding to other people. Promoting a style of making decisions
    > out of uncertainty, where there is no need for that uncertainty, is a
    > bad habit to instill in people.


    Are you saying I promote such a style?

    Uncertainty is not the reason I stay away from weird casts.
    But yes, a side effect is that I don't have to waste energy trying
    to find out what they mean, in relation to the language and in
    relation to my compiler/environment.

    I do not want to forbid anyone from discussing casts, trap
    representations and UB in this thread. But someone /also/ needed to
    point out the obvious: that there are easy, portable and readable
    alternatives.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Apr 12, 2013
    #7
  8. On Thursday, April 11, 2013 2:52:22 PM UTC+1, Noob wrote:
    >
    > Is the following code valid:
    >
    > static void foo(unsigned char *buf)
    > {
    > int i;
    >
    > for (i = 0; i < 16; ++i) buf[i] = i;
    > }
    >
    > void bar(void)
    > {
    > unsigned long arr[4];
    >
    > foo(arr);
    >
    > }
    >

    It's allowed to cast any block of memory to unsigned char *, and to
    treat it as a sequence of raw bytes or bits. The chars cannot trap, and
    the cast silences the compiler warning. (Passing through void * is a
    slightly more fiddly way of achieving the same thing.)

    However going the other way, from a sequence of bytes to a higher-level
    structure like a long, is a bit problematic. On any platform you are likely
    to run on, longs will be two's complement. But they might be big-endian
    or little-endian, and occasionally they might not be four bytes.
    Technically the sequence 0 1 2 3 could be a trap representation for a
    long, but that's so highly unlikely you can ignore the issue.

    So you have to know the binary representation of a long. The only portable
    way is to copy from one to another.
    --
    Basic Algorithms. A massive compendium of C routines.
    http://www.malcolmmclean.site11.com/www
    Malcolm McLean, Apr 12, 2013
    #8
  9. Noob

    Noob Guest

    Jorgen Grahn wrote:
    > On Thu, 2013-04-11, Noob wrote:
    >> Hello,
    >>
    >> Is the following code valid:
    >>
    >> static void foo(unsigned char *buf)
    >> {
    >> int i;
    >> for (i = 0; i < 16; ++i) buf[i] = i;
    >> }
    >>
    >> void bar(void)
    >> {
    >> unsigned long arr[4];
    >> foo(arr);
    >> }
    >>
    >> The compiler points out that (unsigned long *) is not
    >> compatible with (unsigned char *).
    >>
    >> So I cast to the expected type:
    >>
    >> void bar(void)
    >> {
    >> unsigned long arr[4];
    >> foo((unsigned char *)arr);
    >> }
    >>
    >> I think it is allowed to cast to (unsigned char *)
    >> but I don't remember if it's allowed only to inspect
    >> (read) the values, or also to set them. Also the fact
    >> the "real" type is unsigned means there are no trap
    >> representations, right?

    >
    > Don't know what the language guarantees.
    >
    > Even if I knew about trap representations and stuff, and knew a long
    > is four chars on my target, it would worry me that I have no idea what
    > the 16 chars look like when viewed as 4 longs. I would have
    > introduced endianness issues into the program, and that's never a good
    > thing -- they tend to spread.
    >
    > If I were you, at this point I'd sidestep the problem by rewriting the
    > code without unusual casts. I don't think I've ever seen a problem
    > which could be solved by things like the casting above, but not by
    > everyday code without casts. (Ok, except for badly written third-party
    > APIs, perhaps.)
    >
    > You don't show the problem you're trying to solve, so I cannot suggest
    > an alternative (except for the obvious and trivial change to bar()).


    I'll tell you the whole story, so you can cringe like I did!

    What I'm REALLY working with is a 128-bit AES key.

    My "sin" is using the knowledge that CHAR_BIT is 8 on the
    two platforms I work with, which is why I'm implicitly
    using an unsigned char buf[16].

    HOWEVER, on one of the two platforms, the geniuses who
    implemented the API thought it would be a good idea to
    pass the key in an array of 4 uint32_t o_O

    Thus what's missing from my stripped-down example is:

    extern void nasty_API_func(uint32_t *key);

    static void foo(unsigned char *buf)
    {
    int i;
    /* not the actual steps to populate buf */
    for (i = 0; i < 16; ++i) buf[i] = i;
    }

    void bar(void)
    {
    unsigned long key[4];
    foo((unsigned char *)key);
    nasty_API_func(key);
    }

    I don't think I have much choice than to cast (or use
    an implicit conversion to void *) given the constraints
    of the API, do I?

    Regards.
    Noob, Apr 12, 2013
    #9
  10. Noob

    James Kuyper Guest

    On 04/12/2013 07:30 AM, Noob wrote:
    ....
    > What I'm REALLY working with is a 128-bit AES key.
    >
    > My "sin" is using the knowledge that CHAR_BIT is 8 on the
    > two platforms I work with, which is why I'm implicitly
    > using an unsigned char buf[16].


    That's not too deadly a sin; it's an accurate assumption on a great many
    platforms, but other values do exist: 16 is a popular value on some
    DSPs. At least once, somewhere in any of your code that makes that
    assumption, you should do something like

    #if CHAR_BIT != 8
    #error This code requires CHAR_BIT == 8
    #endif

    > HOWEVER, on one of the two platforms, the geniuses who
    > implemented the API thought it would be a good idea to
    > pass the key in an array of 4 uint32_t o_O
    >
    > Thus what's missing from my stripped-down example is:
    >
    > extern void nasty_API_func(uint32_t *key);
    >
    > static void foo(unsigned char *buf)
    > {
    > int i;
    > /* not the actual steps to populate buf */
    > for (i = 0; i < 16; ++i) buf[i] = i;
    > }
    >
    > void bar(void)
    > {
    > unsigned long key[4];
    > foo((unsigned char *)key);
    > nasty_API_func(key);
    > }
    >
    > I don't think I have much choice than to cast (or use
    > an implicit conversion to void *) given the constraints
    > of the API, do I?


    Well, at the very least you should change "unsigned long" to uint32_t.

    Secondly, you might need to worry about the endianness of uint32_t.
    Your code might set key[0] to 0x1020304 or 0x4030201 (among other
    possibilities); do you know which of those two values the
    nasty_API_func() should be receiving? If the API is supported only on
    platforms with a single endianness, you can get away with building that
    knowledge into foo(). However, if the API is defined for platforms with
    different endiannesses, and requires that key[0] have the value
    0x1020304, regardless of which endianness uint32_t has, you'll have to
    fill in key[] using << and | rather than by accessing it as an array of
    char.
    --
    James Kuyper
    James Kuyper, Apr 12, 2013
    #10
  11. Noob

    James Kuyper Guest

    On 04/12/2013 08:26 AM, James Kuyper wrote:
    > On 04/12/2013 07:30 AM, Noob wrote:

    ....
    >> static void foo(unsigned char *buf)
    >> {
    >> int i;
    >> /* not the actual steps to populate buf */
    >> for (i = 0; i < 16; ++i) buf[i] = i;
    >> }
    >>
    >> void bar(void)
    >> {
    >> unsigned long key[4];
    >> foo((unsigned char *)key);
    >> nasty_API_func(key);
    >> }

    ....
    > Your code might set key[0] to 0x1020304 or 0x4030201 (among other


    Correction: 0x00010203 or 0x03020100.
    James Kuyper, Apr 12, 2013
    #11
  12. Noob

    James Kuyper Guest

    On 04/12/2013 09:36 AM, David Brown wrote:
    > On 12/04/13 14:26, James Kuyper wrote:
    >> On 04/12/2013 07:30 AM, Noob wrote:
    >> ...
    >>> What I'm REALLY working with is a 128-bit AES key.
    >>>
    >>> My "sin" is using the knowledge that CHAR_BIT is 8 on the
    >>> two platforms I work with, which is why I'm implicitly
    >>> using an unsigned char buf[16].

    >>
    >> That's not too deadly a sin, it's an accurate assumption on a great many
    >> platforms, but other values do exist: 16 is a popular value on some
    >> DSPs. At least once, somewhere with any of your code that makes that
    >> assumption, you should do something like
    >>
    >> #if CHAR_BIT != 8
    >> #error This code requires CHAR_BIT == 8
    >> #endif
    >>

    >
    > I would say that unless you are writing exceptionally cross-platform
    > code, just assume CHAR_BIT is 8. It is true that there are a few cpus
    > with 16-bit, 32-bit, or even 24-bit "characters", but if you are not
    > actually using them, then your life will be easier - and your code
    > clearer ...


    I can't see my suggestion as something that impairs the clarity of the
    code. Rather the opposite, IMO.

    > ... and neater, and therefore better - if you forget about them.
    > When the day comes that you have to write code for a TMS320 with 16-bit
    > "char", you will find that you have so many other problems trying to use
    > existing code that you will be better off using code written /only/ for
    > these devices. So you lose nothing by assuming CHAR_BIT is 8.
    >
    > Portability is always a useful thing, but it is not the most important
    > aspect of writing good code - taken to extremes it leads to messy and
    > unclear code (which is always a bad thing), and often inefficient code.
    >
    > In my work - which is programming on a wide variety of embedded systems
    > - code always implicitly assumes CHAR_BIT is 8, all integer types are
    > plain powers-of-two sizes with two's complement for signed types, there
    > are no "trap bits", etc. Since there are no realistic platforms where
    > this is not true, worrying about them is a waste of time. (There are
    > some platforms that support additional types such as 20-bit or 40-bit
    > types - but these are always handled explicitly if they are used.) ...


    I attach greater importance to portability than you do. I do worry
    mainly about "realistic" platforms, but I also write my code to cope, to
    the extent possible, with unrealistic but conforming possibilities.
    Every portability issue that you mention has been important in the past,
    or the standard would not have been written to accommodate such systems.
    It's a fair bet that, even if they aren't important right now, one or
    more of those issues will become important again in the future. I don't
    want my programs to be the ones that fail because of such changes.

    > ... I
    > can't assume endianness for general code, and I can't assume integer
    > sizes - so <stdint.h> sizes are essential (as you suggest below).


    I didn't suggest using uint32_t because it has a standardized size, but
    simply because the relevant third-party function declaration uses it:
    ....
    >>> extern void nasty_API_func(uint32_t *key);

    ....
    >>> void bar(void)
    >>> {
    >>> unsigned long key[4];
    >>> foo((unsigned char *)key);
    >>> nasty_API_func(key);
    >>> }


    My first rule for choosing the type of a variable is that it should be
    compatible with the API of the standard library or third-party functions
    it will be used with, if possible. Change "should" to "must" and drop ",
    if possible", if the variable is being passed to that function via a
    pointer. For my own functions, I declare the function to match the data,
    rather than vice-versa.
    James Kuyper, Apr 12, 2013
    #12
  13. Noob

    ImpalerCore Guest

    On Apr 12, 7:30 am, Noob <r...@127.0.0.1> wrote:
    > Jorgen Grahn wrote:
    > > On Thu, 2013-04-11, Noob wrote:
    > >> Hello,

    >
    > >> Is the following code valid:

    >
    > >> static void foo(unsigned char *buf)
    > >> {
    > >>   int i;
    > >>   for (i = 0; i < 16; ++i) buf[i] = i;
    > >> }

    >
    > >> void bar(void)
    > >> {
    > >>   unsigned long arr[4];
    > >>   foo(arr);
    > >> }

    >
    > >> The compiler points out that (unsigned long *) is not
    > >> compatible with (unsigned char *).

    >
    > >> So I cast to the expected type:

    >
    > >> void bar(void)
    > >> {
    > >>   unsigned long arr[4];
    > >>   foo((unsigned char *)arr);
    > >> }

    >
    > >> I think it is allowed to cast to (unsigned char *)
    > >> but I don't remember if it's allowed only to inspect
    > >> (read) the values, or also to set them. Also the fact
    > >> the "real" type is unsigned means there are no trap
    > >> representations, right?

    >
    > > Don't know what the language guarantees.

    >
    > > Even if I knew about trap representations and stuff, and knew a long
    > > is four chars on my target, it would worry me that I have no idea what
    > > the 16 chars look like when viewed as 4 longs.  I would have
    > > introduced endianness issues into the program, and that's never a good
    > > thing -- they tend to spread.

    >
    > > If I were you, at this point I'd sidestep the problem by rewriting the
    > > code without unusual casts.  I don't think I've ever seen a problem
    > > which could be solved by things like the casting above, but not by
    > > everyday code without casts. (Ok, except for badly written third-party
    > > APIs, perhaps.)

    >
    > > You don't show the problem you're trying to solve, so I cannot suggest
    > > an alternative (except for the obvious and trivial change to bar()).

    >
    > I'll tell you the whole story, so you can cringe like I did!
    >
    > What I'm REALLY working with is a 128-bit AES key.
    >
    > My "sin" is using the knowledge that CHAR_BIT is 8 on the
    > two platforms I work with, which is why I'm implicitly
    > using an unsigned char buf[16].


    If you really want 8-bit characters, why not use uint8_t? Any system
    that doesn't support that type should give you a heads-up with a
    compiler error.

    > HOWEVER, on one of the two platforms, the geniuses who
    > implemented the API thought it would be a good idea to
    > pass the key in an array of 4 uint32_t o_O
    >
    > Thus what's missing from my stripped-down example is:
    >
    > extern void nasty_API_func(uint32_t *key);
    >
    > static void foo(unsigned char *buf)
    > {
    >   int i;
    >   /* not the actual steps to populate buf */
    >   for (i = 0; i < 16; ++i) buf[i] = i;
    >
    > }
    >
    > void bar(void)
    > {
    >   unsigned long key[4];
    >   foo((unsigned char *)key);
    >   nasty_API_func(key);
    >
    > }
    >
    > I don't think I have much choice than to cast (or use
    > an implicit conversion to void *) given the constraints
    > of the API, do I?


    The most portable method is to simply memcpy the relevant portions
    from the uint8_t[16] array to uint32_t[4].

    \code
    void bar(void)
    {
        uint8_t buf[16];
        uint32_t key[4];
        foo(buf);

        memcpy(&key[0], &buf[0],  4);   /* bytes buf[0-3]   -> key[0] */
        memcpy(&key[1], &buf[4],  4);   /* bytes buf[4-7]   -> key[1] */
        memcpy(&key[2], &buf[8],  4);   /* bytes buf[8-11]  -> key[2] */
        memcpy(&key[3], &buf[12], 4);   /* bytes buf[12-15] -> key[3] */

        nasty_API_func(key);
    }
    \endcode

    I use the technique to read IEEE-754 'float' and 'double' as
    "integers" from a packet or in a file, use a 'ntohl' type function to
    deal with endian issues, and then memcpy the bytes from a 'uint32_t'
    or 'uint64_t' type into a 'float' or 'double' type.

    I know that it's not guaranteed to be portable (trap representations,
    non IEEE-754 floating point representations, etc.), but if there's a
    better methodology, I'd like to know as I need to read IEEE-754 4 and
    8 byte floating point values collected from live traffic or packet
    dumps (that I have no control over). I do verify that 'sizeof
    (uint32_t) == sizeof (float)' and similarly for 'double'.

    Best regards,
    John D.
    ImpalerCore, Apr 12, 2013
    #13
  14. Noob

    Jorgen Grahn Guest

    On Fri, 2013-04-12, Noob wrote:
    > Jorgen Grahn wrote:

    ....
    >> If I were you, at this point I'd sidestep the problem by rewriting the
    >> code without unusual casts. I don't think I've ever seen a problem
    >> which could be solved by things like the casting above, but not by
    >> everyday code without casts. (Ok, except for badly written third-party
    >> APIs, perhaps.)
    >>
    >> You don't show the problem you're trying to solve, so I cannot suggest
    >> an alternative (except for the obvious and trivial change to bar()).

    >
    > I'll tell you the whole story, so you can cringe like I did!
    >
    > What I'm REALLY working with is a 128-bit AES key.
    >
    > My "sin" is using the knowledge that CHAR_BIT is 8 on the
    > two platforms I work with, which is why I'm implicitly
    > using an unsigned char buf[16].
    >
    > HOWEVER, on one of the two platforms, the geniuses who
    > implemented the API thought it would be a good idea to
    > pass the key in an array of 4 uint32_t o_O


    Ok, that's one of the "badly written APIs" scenarios I was thinking
    of. Crypto, compression and checksumming code often seems to play
    fast and loose with the type system.

    If it's just a matter of the 128-bit key's representation, you could
    hide that part in a translation function.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Apr 13, 2013
    #14
  15. Jorgen Grahn <> wrote:
    > On Fri, 2013-04-12, Noob wrote:

    (snip)

    >> What I'm REALLY working with is a 128-bit AES key.


    (snip)
    >> HOWEVER, on one of the two platforms, the geniuses who
    >> implemented the API thought it would be a good idea to
    >> pass the key in an array of 4 uint32_t o_O


    > Ok, that's one of the "badly written APIs" scenarios I was thinking
    > of. Crypto, compression and checksumming code often seems to play
    > fast and loose with the type system.


    > If it's just a matter of the 128-bit key's representation, you could
    > hide that part in a translation function.


    Just a note: an AES key is not a 128-bit number, but
    a 128-bit string of bits. (That is, the bits don't have place
    value like integers have.)

    That doesn't mean that you don't have to worry about bit,
    byte, or word ordering, but that the question isn't the
    same as when working with integers.

    -- glen
    glen herrmannsfeldt, Apr 13, 2013
    #15
  16. Noob

    Tim Rentsch Guest

    Jorgen Grahn <> writes:

    > On Thu, 2013-04-11, Tim Rentsch wrote:
    >> Jorgen Grahn <> writes:
    >>
    >>> On Thu, 2013-04-11, Noob wrote:
    >>>> Hello,
    >>>>
    >>>> Is the following code valid:
    >>>>
    >>>> static void foo(unsigned char *buf)
    >>>> {
    >>>> int i;
    >>>> for (i = 0; i < 16; ++i) buf[i] = i;
    >>>> }
    >>>>
    >>>> void bar(void)
    >>>> {
    >>>> unsigned long arr[4];
    >>>> foo(arr);
    >>>> }
    >>>>
    >>>> The compiler points out that (unsigned long *) is not
    >>>> compatible with (unsigned char *).
    >>>>
    >>>> So I cast to the expected type:
    >>>>
    >>>> void bar(void)
    >>>> {
    >>>> unsigned long arr[4];
    >>>> foo((unsigned char *)arr);
    >>>> }
    >>>>
    >>>> I think it is allowed to cast to (unsigned char *)
    >>>> but I don't remember if it's allowed only to inspect
    >>>> (read) the values, or also to set them. Also the fact
    >>>> the "real" type is unsigned means there are no trap
    >>>> representations, right?
    >>>
    >>> Don't know what the language guarantees.
    >>>
    >>> Even if I knew about trap representations and stuff, and knew a long
    >>> is four chars on my target, it would worry me that I have no idea what
    >>> the 16 chars look like when viewed as 4 longs. I would have
    >>> introduced endianness issues into the program, and that's never a good
    >>> thing -- they tend to spread. [snip]

    >>

    > ...
    >> It's probably true that 99 times out of 100 it's better to avoid
    >> using character-type access of other types. Even so, it's better to
    >> know what the Standard actually does require, and to convey that
    >> understanding to other people. Promoting a style of making decisions
    >> out of uncertainty, where there is no need for that uncertainty, is a
    >> bad habit to instill in people.

    >
    > Are you saying I promote such a style?


    Of course I don't know what you meant to suggest, or even might
    have meant to suggest. But I do think someone reading the
    posting may come away with the impression that this advice was
    being offered, and take it to heart (even if perhaps not being
    conscious of doing so), whether you meant it that way or not.

    > Uncertainty is not the reason I stay away from weird casts.
    > But yes, a side effect is that I don't have to waste energy trying
    > to find out what they mean, in relation to the language and in
    > relation to my compiler/environment.


    As a general principle, I think saying most casts should be
    avoided is good advice to give. But that advice is good
    because in many or most cases a "suspect" cast is an indication
    that whoever wrote the code was thinking about the problem the
    wrong way -- not because casting is always dangerous, or poorly
    defined, or necessarily a poor design choice. I think it's
    important not to blur the distinction between those two lines
    of reasoning, and I think your comments are likely to be taken
    that way, even if that isn't how you meant them.

    > I do not want to forbid anyone from discussing casts, trap
    > representations and UB in this thread. But someone /also/
    > needed to point out the obvious: that there are easy,
    > portable and readable alternatives.


    Forgive me if this sounds harsh, but that seems like a pretty
    arrogant statement. You don't know what the questioner wants
    to do exactly, or why he wants to do it. Yet you presume to
    give advice assuming that you do know, even after admitting you
    don't know exactly what the rules of the language are for the
    question he's asking about. IMO the message that came across
    is very much the wrong message.

    In cases like this one, I think a better way to proceed is to
    first answer the question that was asked: "The Standard says
    that blah blah blah...". Then, second, to point out the kinds
    of problems that might come up, and how to guard against them:
    "It looks like what you're doing assumes CHAR_BIT == 8 and blah
    blah blah, which can be statically tested using blah and blah.
    Also different byte orderings might cause endianness problems,
    so you might want to check that with blah blah blah." Lastly,
    after the preceding two kinds of responses, then and only then
    give the general or generic kinds of advice: "Usually casting
    indicates some kind of deeper problem in the approach being
    used. As a general rule it's better to avoid casting, both
    because it relies on less common parts of the Standard, and
    because it tends to make programs more brittle in terms of
    depending on specific implementation choices. You might
    consider writing this instead as blah blah blah (assuming of
    course that your application is amenable to that), and see
    if the question might be avoided altogether."
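    [Editor's note: as a concrete sketch of the kind of checks described
    above -- the typedef trick and the probe function's name are invented
    here for illustration, not taken from the thread:]

    ```c
    #include <limits.h>
    #include <stdint.h>

    /* Compile-time check: the typedef'd array has a negative size,
       and so fails to compile, whenever CHAR_BIT != 8.
       (C11 code could use _Static_assert instead.) */
    typedef char check_char_bit_is_8[CHAR_BIT == 8 ? 1 : -1];

    /* Runtime byte-order probe: inspect the first byte of a known
       32-bit value through an unsigned char pointer, which is
       always permitted for reading an object's representation. */
    static int is_little_endian(void)
    {
        uint32_t probe = 1;
        return *(unsigned char *)&probe == 1;
    }
    ```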

    Again, I'm sorry if my response here comes across as too
    harsh, I don't mean it to be. I appreciate what you are
    trying to do -- it's just that how it comes across is (I
    think) not how you mean it. (And hopefully what I am trying
    to say is coming across as I mean it.)
    Tim Rentsch, Apr 14, 2013
    #16
  17. Noob

    Tim Rentsch Guest

    Noob <root@127.0.0.1> writes:

    > [asking about accessing an (unsigned long [4]) array
    > using (unsigned char *)]
    >
    > I'll tell you the whole story, so you can cringe like I did!
    >
    > What I'm REALLY working with is a 128-bit AES key.
    >
    > My "sin" is use the knowledge that CHAR_BIT is 8 on the
    > two platforms I work with, which is why I'm implicitly
    > using an unsigned char buf[16].
    >
    > HOWEVER, on one of the two platforms, the geniuses who
    > implemented the API thought it would be a good idea to
    > pass the key in an array of 4 uint32_t o_O
    >
    > Thus what's missing from my stripped-down example is:
    >
    > extern void nasty_API_func(uint32_t *key);
    >
    > static void foo(unsigned char *buf)
    > {
    > int i;
    > /* not the actual steps to populate buf */
    > for (i = 0; i < 16; ++i) buf[i] = i;
    > }
    >
    > void bar(void)
    > {
    > unsigned long key[4];
    > foo((unsigned char *)key);
    > nasty_API_func(key);
    > }
    >
    > I don't think I have much choice than to cast (or use
    > an implicit conversion to void *) given the constraints
    > of the API, do I?


    Actually you do:

    /* ... foo() as above ... */

    void
    bar( void ){
        union { uint32_t u32[4]; unsigned char uc[16]; } key;
        extern char checkit[ sizeof key.u32 == sizeof key.uc ? 1 : -1 ];
        foo( key.uc );
        nasty_API_func( key.u32 );
    }

    IMO writing bar() like this gives a better indication of what's
    going on, and why. (It also checks to make sure the different
    types used are appropriately simpatico in this implementation,
    but that's just my reflexive programming habit.)
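    [Editor's note: for comparison, a compilable sketch of the same union
    trick using C11's _Static_assert in place of the negative-array-size
    check; foo() follows the thread's loop, and nasty_API_func() is a
    do-nothing stand-in for the real API call:]

    ```c
    #include <stdint.h>

    /* The thread's buffer-filling routine, with the loop body
       writing each index value into the corresponding byte. */
    static void foo(unsigned char *buf)
    {
        int i;
        for (i = 0; i < 16; ++i) buf[i] = (unsigned char)i;
    }

    /* Stand-in for the API call from the thread. */
    static void nasty_API_func(uint32_t *key) { (void)key; }

    void bar(void)
    {
        union { uint32_t u32[4]; unsigned char uc[16]; } key;
        /* C11 spelling of the checkit[] trick above */
        _Static_assert(sizeof key.u32 == sizeof key.uc,
                       "uint32_t[4] and unsigned char[16] differ in size");
        foo(key.uc);
        nasty_API_func(key.u32);
    }
    ```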
    Tim Rentsch, Apr 15, 2013
    #17
  18. Noob

    Ike Naar Guest

    On 2013-04-12, Noob <root@127.0.0.1> wrote:
    > What I'm REALLY working with is a 128-bit AES key.
    >
    > My "sin" is use the knowledge that CHAR_BIT is 8 on the
    > two platforms I work with, which is why I'm implicitly
    > using an unsigned char buf[16].
    >
    > HOWEVER, on one of the two platforms, the geniuses who
    > implemented the API thought it would be a good idea to
    > pass the key in an array of 4 uint32_t o_O


    And the API lacks a function to initialize a key?
    Ike Naar, Apr 15, 2013
    #18
  19. Noob

    Tim Rentsch Guest

    David Brown <> writes:

    > On 12/04/13 14:26, James Kuyper wrote:
    >> On 04/12/2013 07:30 AM, Noob wrote:
    >> ...
    >>> What I'm REALLY working with is a 128-bit AES key.
    >>>
    >>> My "sin" is use the knowledge that CHAR_BIT is 8 on the
    >>> two platforms I work with, which is why I'm implicitly
    >>> using an unsigned char buf[16].

    >>
    >> That's not too deadly a sin; it's an accurate assumption on a great many
    >> platforms, but other values do exist: 16 is a popular value on some
    >> DSPs. At least once, somewhere with any of your code that makes that
    >> assumption, you should do something like
    >>
    >> #if CHAR_BIT != 8
    >> #error This code requires CHAR_BIT == 8
    >> #endif
    >>

    >


    > I would say that unless you are writing exceptionally cross-platform
    > code, just assume CHAR_BIT is 8. It is true that there are a few
    > cpus with 16-bit, 32-bit, or even 24-bit "characters", but if you
    > are not actually using them, then your life will be easier - and
    > your code clearer and neater, and therefore better - if you forget
    > about them. When the day comes that you have to write code for a
    > TMS320 with 16-bit "char", you will find that you have so many other
    > problems trying to use existing code that you will be better off
    > using code written /only/ for these devices. So you lose nothing by
    > assuming CHAR_BIT is 8.
    >
    > Portability is always a useful thing, but it is not the most important
    > aspect of writing good code - taken to extremes it leads to messy and
    > unclear code (which is always a bad thing), and often inefficient code.
    >
    > In my work - which is programming on a wide variety of embedded systems
    > - code always implicitly assumes CHAR_BIT is 8, all integer types are
    > plain powers-of-two sizes with two's complement for signed types, there
    > are no "trap bits", etc. Since there are no realistic platforms where
    > this is not true, worrying about them is a waste of time. (There are
    > some platforms that support additional types such as 20-bit or 40-bit
    > types - but these are always handled explicitly if they are used.) I
    > can't assume endianness for general code, and I can't assume integer
    > sizes - so <stdint.h> sizes are essential (as you suggest below).


    Assuming is okay. Implicitly assuming is not.

    There are two benefits to writing an explicit check:

    1. A clear indication of what condition (or assumption) is
    being violated, in the rare event that one of those unusual
    implementations is used.

    2. More importantly, a clear indication to a human reader that
    the assumption has been made, and isn't just an oversight
    or an unconscious presumption made out of ignorance.

    If you have a set of assumptions that you routinely rely on in
    code that you write, put all the checks together in a header
    file, and then just do

    #include "standard_assumptions.h"

    which provides all the benefits of making assumptions explicit,
    but at essentially zero cost after the header is first written.
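    [Editor's note: a minimal sketch of what such a header might contain;
    the macro and check names here are invented for illustration, not from
    any real project:]

    ```c
    /* standard_assumptions.h -- collect routinely relied-upon
       assumptions as compile-time checks, per Tim's suggestion. */
    #ifndef STANDARD_ASSUMPTIONS_H
    #define STANDARD_ASSUMPTIONS_H

    #include <limits.h>

    /* C90-compatible static check: the typedef'd array type has a
       negative size, a constraint violation, when cond is false. */
    #define STATIC_CHECK(cond, name) typedef char name[(cond) ? 1 : -1]

    STATIC_CHECK(CHAR_BIT == 8, assume_8_bit_char);
    STATIC_CHECK(sizeof(long) >= 4, assume_long_at_least_32_bits);
    /* -1 & 3 is 3 only in two's complement (1 in sign-magnitude,
       2 in ones' complement). */
    STATIC_CHECK((-1 & 3) == 3, assume_twos_complement);

    #endif /* STANDARD_ASSUMPTIONS_H */
    ```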
    Tim Rentsch, Apr 15, 2013
    #19
  20. Noob

    Tim Rentsch Guest

    ImpalerCore <> writes:

    > On Apr 12, 7:30 am, Noob <r...@127.0.0.1> wrote:
    >> Jorgen Grahn wrote:
    >> > On Thu, 2013-04-11, Noob wrote:
    >> >> Hello,

    >>
    >> >> Is the following code valid:

    >>
    >> >> static void foo(unsigned char *buf)
    >> >> {
    >> >> int i;
    >> >> for (i = 0; i < 16; ++i) buf[i] = i;
    >> >> }

    >>
    >> >> void bar(void)
    >> >> {
    >> >> unsigned long arr[4];
    >> >> foo(arr);
    >> >> }

    >>
    >> >> The compiler points out that (unsigned long *) is not
    >> >> compatible with (unsigned char *).

    >>
    >> >> So I cast to the expected type:

    >>
    >> >> void bar(void)
    >> >> {
    >> >> unsigned long arr[4];
    >> >> foo((unsigned char *)arr);
    >> >> }

    >>
    >> >> I think it is allowed to cast to (unsigned char *)
    >> >> but I don't remember if it's allowed only to inspect
    >> >> (read) the values, or also to set them. Also the fact
    >> >> the "real" type is unsigned means there are no trap
    >> >> representations, right?

    >>
    >> > Don't know what the language guarantees.

    >>
    >> > Even if I knew about trap representations and stuff, and knew a long
    >> > is four chars on my target, it would worry me that I have no idea what
    >> > the 16 chars look like when viewed as 4 longs. I would have
    >> > introduced endianness issues into the program, and that's never a good
    >> > thing -- they tend to spread.

    >>
    >> > If I were you, at this point I'd sidestep the problem by rewriting the
    >> > code without unusual casts. I don't think I've ever seen a problem
    >> > which could be solved by things like the casting above, but not by
    >> > everyday code without casts. (Ok, except for badly written third-party
    >> > APIs, perhaps.)

    >>
    >> > You don't show the problem you're trying to solve, so I cannot suggest
    >> > an alternative (except for the obvious and trivial change to bar()).

    >>
    >> I'll tell you the whole story, so you can cringe like I did!
    >>
    >> What I'm REALLY working with is a 128-bit AES key.
    >>
    >> My "sin" is use the knowledge that CHAR_BIT is 8 on the
    >> two platforms I work with, which is why I'm implicitly
    >> using an unsigned char buf[16].

    >
    > If you really want 8-bit characters, why not use uint8_t?
    > Any system that doesn't support that type should give you
    > a heads-up with a compiler error.


    Using unsigned char (perhaps via a typedef), and checking that
    CHAR_BIT == 8 is more portable, specifically for people who might
    be using C90 rather than C99/C11. But beyond that, there are
    some subtle reasons why uint8_t should not be used as a synonym
    for an 8-bit unsigned char:

    1. A uint8_t type isn't necessarily compatible with an 8-bit
    unsigned char. This may cause unexpected problems, e.g.,
    passing a (unsigned char *) to a function that expects a
    (uint8_t *).

    2. A uint8_t type is not necessarily a character type. This
    means accessing an area of memory using a (uint8_t *)
    could produce undefined behavior through violation of
    effective type rules, whereas using (unsigned char *)
    would not.

    These choices may seem rather far-fetched, but there is an
    incentive for implementors to adopt them: by making uint8_t
    be different from unsigned char, better aliasing information
    can be gleaned in some cases, which might allow better code to
    be generated. Existing compilers are already pushing pretty
    hard at the edges of the undefined-behavior envelope, in the
    quest for better and better performance; it isn't hard to
    imagine an implementation making uint8_t be a non-character
    integer type, if it results in better performance, or even if
    the implementors just think it _might_ result in better
    performance.
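    [Editor's note: a small sketch of why character-type access matters
    here -- reading an object's bytes through (unsigned char *) is always
    well defined, whereas the same loop written with a (uint8_t *) would
    be questionable if uint8_t were ever an extended, non-character type.
    The function name is invented for illustration:]

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Sum an object's bytes through an unsigned char pointer.
       Character-type access to any object's representation is
       explicitly permitted by the aliasing rules (C99 6.5p7). */
    static unsigned sum_bytes(const void *obj, size_t n)
    {
        const unsigned char *p = obj;
        unsigned sum = 0;
        size_t i;
        for (i = 0; i < n; ++i)
            sum += p[i];
        return sum;
    }
    ```

    For a uint32_t holding 0x01020304 the result is 1+2+3+4 == 10 on
    either byte order, since uint32_t is required to have exactly 32
    bits and no padding.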
    Tim Rentsch, Apr 15, 2013
    #20
