Copying an array to a similar struct

Discussion in 'C Programming' started by Noob, Apr 13, 2011.

  1. Noob

    Noob Guest

    Hello,

    I suppose the following code has undefined behavior?

    #include <string.h>
    #include <assert.h>
    typedef unsigned long ULONG;
    struct msg
    {
    ULONG event;
    ULONG args[4];
    };
    void dispatch_msg(ULONG *argv, int argc)
    {
    struct msg foob;
    assert(0 <= argc && argc <= 5);
    memcpy(&foob, argv, argc * sizeof *argv); /*** UB here ?? ***/
    /* send foob to the handler */
    }

    I imagine a (nasty ;-) compiler implementation could insert
    padding between event and args in struct msg, making

    sizeof(struct msg) > sizeof(ULONG[5])

    thereby giving the memcpy undefined behavior?

    Regards.
     
    Noob, Apr 13, 2011
    #1
    1. Advertising

  2. Noob

    James Kuyper Guest

    On 04/13/2011 05:21 AM, Noob wrote:
    > Hello,
    >
    > I suppose the following code has undefined behavior?
    >
    > #include<string.h>
    > #include<assert.h>
    > typedef unsigned long ULONG;


    C gives you a lot of freedom in the naming of identifiers, but it's
    generally a good idea to accept well-established naming conventions, to
    improve communication with other programmers. One of the most widely
    used naming conventions is to reserve names that are all CAPITAL letters
    for macros. I'd recommend 'ulong' for this typedef.

    > struct msg
    > {
    > ULONG event;
    > ULONG args[4];
    > };
    > void dispatch_msg(ULONG *argv, int argc)
    > {
    > struct msg foob;
    > assert(0<= argc&& argc<= 5);
    > memcpy(&foob, argv, argc * sizeof *argv); /*** UB here ?? ***/


    No, this is not undefined behavior. Whether or not there's any padding
    between event and args is merely unspecified. It's perfectly permissible
    to copy only a portion of an object, and argc*sizeof *argv is guaranteed
    to be less than sizeof foob. However, the contents of foob.args after
    the memcpy() call may, in principle, be indeterminate; among other
    possibilities, they may contain trap representations. Even if they have
    non-trap representations, if there's any padding, they won't contain the
    values you expect them to. But that's not undefined behavior, just
    unspecified.

    > /* send foob to the handler */


    Whether or not this code has undefined behavior depends upon what the
    handler does with it. Realistically, the results are unlikely to be good
    if there's any padding.

    > }
    >
    > I imagine a (nasty ;-) compiler implementation could insert
    > padding between event and args in struct msg, making


    True, though that's rather unlikely. Reasonable implementations seldom
    (never?) insert padding for any reason other than alignment issues, and
    that can't be a relevant issue in this case.
    --
    James Kuyper
     
    James Kuyper, Apr 13, 2011
    #2
    1. Advertising

  3. Noob <root@127.0.0.1> writes:

    > I suppose the following code has undefined behavior?
    >
    > #include <string.h>
    > #include <assert.h>
    > typedef unsigned long ULONG;
    > struct msg
    > {
    > ULONG event;
    > ULONG args[4];
    > };
    > void dispatch_msg(ULONG *argv, int argc)
    > {
    > struct msg foob;
    > assert(0 <= argc && argc <= 5);
    > memcpy(&foob, argv, argc * sizeof *argv); /*** UB here ?? ***/
    > /* send foob to the handler */
    > }
    >
    > I imagine a (nasty ;-) compiler implementation could insert
    > padding between event and args in struct msg, making
    >
    > sizeof(struct msg) > sizeof(ULONG[5])
    >
    > thereby giving the memcpy undefined behavior?


    You'd have problems, but there would be no undefined behaviour in the
    copy. Certain subsequent uses of foob.args could be undefined, but do
    the technicalities matter? The code is not portable because of the
    assumption that it makes. However, you could test that there is no
    padding so as to identify problem implementation at the offset, though
    I'd be very surprised if you ever find an implementation that wants to
    pad that structure anywhere but at the end (and even that seems
    unlikely).

    --
    Ben.
     
    Ben Bacarisse, Apr 13, 2011
    #3
  4. Noob

    Eric Sosman Guest

    On 4/13/2011 5:21 AM, Noob wrote:
    > Hello,
    >
    > I suppose the following code has undefined behavior?
    >
    > #include<string.h>
    > #include<assert.h>
    > typedef unsigned long ULONG;
    > struct msg
    > {
    > ULONG event;
    > ULONG args[4];
    > };
    > void dispatch_msg(ULONG *argv, int argc)
    > {
    > struct msg foob;
    > assert(0<= argc&& argc<= 5);
    > memcpy(&foob, argv, argc * sizeof *argv); /*** UB here ?? ***/


    I see nothing wrong here ...

    > /* send foob to the handler */


    ... but very likley here. Not only is there the issue of
    possible padding, but the case argc == 0 leaves you with a
    completely uninitialized struct for the handler to handle.

    > }
    >
    > I imagine a (nasty ;-) compiler implementation could insert
    > padding between event and args in struct msg, making
    >
    > sizeof(struct msg)> sizeof(ULONG[5])
    >
    > thereby giving the memcpy undefined behavior?


    The memcpy() is fine, assuming argv does in fact point to
    a region of at least argc ULONG's. In that case, the number
    of bytes copied does not exceed the size of the source or of
    the destination, and the source and destination do not overlap.

    It's *after* the memcpy() that the fun begins ...

    --
    Eric Sosman
    d
     
    Eric Sosman, Apr 13, 2011
    #4
  5. Noob

    Noob Guest

    Eric Sosman wrote:

    > Noob wrote:
    >
    >> I suppose the following code has undefined behavior?
    >>
    >> #include<string.h>
    >> #include<assert.h>
    >> typedef unsigned long ULONG;
    >> struct msg
    >> {
    >> ULONG event;
    >> ULONG args[4];
    >> };
    >> void dispatch_msg(ULONG *argv, int argc)
    >> {
    >> struct msg foob;
    >> assert(0<= argc&& argc<= 5);
    >> memcpy(&foob, argv, argc * sizeof *argv); /*** UB here ?? ***/

    >
    > I see nothing wrong here ...
    >
    >> /* send foob to the handler */

    >
    > ... but very likley here. Not only is there the issue of
    > possible padding, but the case argc == 0 leaves you with a
    > completely uninitialized struct for the handler to handle.
    >
    >> }
    >>
    >> I imagine a (nasty ;-) compiler implementation could insert
    >> padding between event and args in struct msg, making
    >>
    >> sizeof(struct msg)> sizeof(ULONG[5])
    >>
    >> thereby giving the memcpy undefined behavior?

    >
    > The memcpy() is fine, assuming argv does in fact point to
    > a region of at least argc ULONG's. In that case, the number
    > of bytes copied does not exceed the size of the source or of
    > the destination, and the source and destination do not overlap.
    >
    > It's *after* the memcpy() that the fun begins ...


    I think the following changes fix all potential cases of UB
    upon reading the "foob" message. Is this correct?

    #include <string.h>
    #include <assert.h>
    typedef unsigned long ULONG;
    struct msg
    {
    ULONG event;
    ULONG args[4];
    };
    void dispatch_msg(ULONG *argv, int argc)
    {
    struct msg foob;
    assert(0 < argc && argc <= 5);
    foob.event = argv[0];
    memcpy(foob.args, argv+1, (argc-1) * sizeof *argv);
    /* send foob to the handler */
    }
     
    Noob, Apr 13, 2011
    #5
  6. Noob

    Eric Sosman Guest

    On 4/13/2011 8:06 AM, Noob wrote:
    > [...]
    > I think the following changes fix all potential cases of UB
    > upon reading the "foob" message. Is this correct?
    >
    > #include<string.h>
    > #include<assert.h>
    > typedef unsigned long ULONG;
    > struct msg
    > {
    > ULONG event;
    > ULONG args[4];
    > };
    > void dispatch_msg(ULONG *argv, int argc)
    > {
    > struct msg foob;
    > assert(0< argc&& argc<= 5);
    > foob.event = argv[0];
    > memcpy(foob.args, argv+1, (argc-1) * sizeof *argv);
    > /* send foob to the handler */
    > }


    Looks okay, assuming the handler does nothing weird. I'd
    mention that assert() can be useful in debugging but is not a
    reliable guard in "production code," because somebody somewhere
    is likely to toss in an NDEBUG for the sake of optimization.

    I'd also wonder why the handler can't just use the argv
    data directly, instead of needing the same information shrink-
    wrapped in a struct. Copying information from one place to
    another doesn't advance the state of a computation much; if
    you've learned that a collection contains 42 widgets you don't
    gain a lot by saying it holds 42, 42, 42, widgets. But that's
    a design question, not a correctness issue.

    --
    Eric Sosman
    d
     
    Eric Sosman, Apr 13, 2011
    #6
  7. Noob

    Noob Guest

    Eric Sosman wrote:
    > On 4/13/2011 8:06 AM, Noob wrote:
    >> [...]
    >> I think the following changes fix all potential cases of UB
    >> upon reading the "foob" message. Is this correct?
    >>
    >> #include<string.h>
    >> #include<assert.h>
    >> typedef unsigned long ULONG;
    >> struct msg
    >> {
    >> ULONG event;
    >> ULONG args[4];
    >> };
    >> void dispatch_msg(ULONG *argv, int argc)
    >> {
    >> struct msg foob;
    >> assert(0< argc&& argc<= 5);


    Did you notice how your client messed this line up?
    I've disabled flowed-format in my client.

    >> foob.event = argv[0];
    >> memcpy(foob.args, argv+1, (argc-1) * sizeof *argv);
    >> /* send foob to the handler */
    >> }

    >
    > Looks okay, assuming the handler does nothing weird. I'd
    > mention that assert() can be useful in debugging but is not a
    > reliable guard in "production code," because somebody somewhere
    > is likely to toss in an NDEBUG for the sake of optimization.


    Right. Then again, as strlen(NULL) has UB,
    so does dispatch_msg(NULL, n) (which I didn't guard against)
    or dispatch(argv, 0) (which I asserted against).

    If the user of the API wants to ignore the function's
    description, and optimize for speed, instead of correct
    operation, then I suppose they deserve what they get.

    > I'd also wonder why the handler can't just use the argv
    > data directly, instead of needing the same information shrink-
    > wrapped in a struct. Copying information from one place to
    > another doesn't advance the state of a computation much; if
    > you've learned that a collection contains 42 widgets you don't
    > gain a lot by saying it holds 42, 42, 42, widgets. But that's
    > a design question, not a correctness issue.


    The message is handled in a different thread of execution,
    so as not to block the thread that sent the message.
    (Hence the "send foob to the handler" comment.)

    Regards.
     
    Noob, Apr 13, 2011
    #7
  8. Noob

    James Kuyper Guest

    On 04/13/2011 08:49 AM, Noob wrote:
    > Eric Sosman wrote:
    >> On 4/13/2011 8:06 AM, Noob wrote:
    >>> [...]
    >>> I think the following changes fix all potential cases of UB
    >>> upon reading the "foob" message. Is this correct?
    >>>
    >>> #include<string.h>
    >>> #include<assert.h>
    >>> typedef unsigned long ULONG;
    >>> struct msg
    >>> {
    >>> ULONG event;
    >>> ULONG args[4];
    >>> };
    >>> void dispatch_msg(ULONG *argv, int argc)
    >>> {
    >>> struct msg foob;
    >>> assert(0< argc&& argc<= 5);

    ....
    >> Looks okay, assuming the handler does nothing weird. I'd
    >> mention that assert() can be useful in debugging but is not a
    >> reliable guard in "production code," because somebody somewhere
    >> is likely to toss in an NDEBUG for the sake of optimization.

    >
    > Right. Then again, as strlen(NULL) has UB,
    > so does dispatch_msg(NULL, n) (which I didn't guard against)
    > or dispatch(argv, 0) (which I asserted against).
    >
    > If the user of the API wants to ignore the function's
    > description, and optimize for speed, instead of correct
    > operation, then I suppose they deserve what they get.


    The fact that assert() can be turned off if NDEBUG is #defined is one of
    the most important features of assert(). Code which uses assert(), yet
    also requires NDEBUG to be undefined in order to fail properly in the
    face of unexpected inputs, isn't using assert() correctly, or at least
    isn't ready for delivery. Most likely, such code is using assert() where
    some other approach, such as if(error_condition) {/* error handling
    */}, would be more appropriate.

    --
    James Kuyper
     
    James Kuyper, Apr 13, 2011
    #8
  9. James Kuyper <> writes:
    > On 04/13/2011 05:21 AM, Noob wrote:
    >> I suppose the following code has undefined behavior?
    >>
    >> #include<string.h>
    >> #include<assert.h>


    Spaces between "include" and "<" would make this easier to read.

    >> typedef unsigned long ULONG;

    >
    > C gives you a lot of freedom in the naming of identifiers, but it's
    > generally a good idea to accept well-established naming conventions, to
    > improve communication with other programmers. One of the most widely
    > used naming conventions is to reserve names that are all CAPITAL letters
    > for macros. I'd recommend 'ulong' for this typedef.


    I'd recommend dropping the typedef altogether and referring to unsigned
    long as unsigned long.

    Is ULONG ever going to be defined as something other than unsigned long?
    If so, ULONG is a misleading name. If not, what's the point of defining
    an alias for something that already has a perfectly good name? (If your
    answer is "it saves typing", that's not a good reason. Your code will
    be read many more times than it will be written.)

    [...]

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Apr 13, 2011
    #9
  10. Noob

    Noob Guest

    Keith Thompson wrote:
    > James Kuyper <> writes:
    >> On 04/13/2011 05:21 AM, Noob wrote:
    >>> I suppose the following code has undefined behavior?
    >>>
    >>> #include<string.h>
    >>> #include<assert.h>

    >
    > Spaces between "include" and "<" would make this easier to read.


    NB: it was James's client that gobbled these spaces.
    cf. https://bugzilla.mozilla.org/show_bug.cgi?id=625961
    (Personally, I've disabled flowed-format in about:config)

    >>> typedef unsigned long ULONG;

    >>
    >> C gives you a lot of freedom in the naming of identifiers, but it's
    >> generally a good idea to accept well-established naming conventions, to
    >> improve communication with other programmers. One of the most widely
    >> used naming conventions is to reserve names that are all CAPITAL letters
    >> for macros. I'd recommend 'ulong' for this typedef.

    >
    > I'd recommend dropping the typedef altogether and referring to unsigned
    > long as unsigned long.
    >
    > Is ULONG ever going to be defined as something other than unsigned long?
    > If so, ULONG is a misleading name. If not, what's the point of defining
    > an alias for something that already has a perfectly good name? (If your
    > answer is "it saves typing", that's not a good reason. Your code will
    > be read many more times than it will be written.)


    The library that defines ULONG[1] is not under my control.

    [1] along with all sorts of other "useful" stuff such as:

    typedef unsigned short int USHORT;
    typedef short int SHORT;
    typedef USHORT* PUSHORT;
    typedef unsigned char UCHAR;
    typedef UCHAR* PUCHAR;
    typedef long int LONG;
    typedef unsigned long ULONG;
    typedef ULONG* PULONG;
    typedef unsigned int UINT;
    typedef void* PVOID;
    typedef USHORT WCHAR;

    Oh boy! :)
     
    Noob, Apr 13, 2011
    #10
  11. Noob <root@127.0.0.1> writes:
    > Keith Thompson wrote:
    >> James Kuyper <> writes:
    >>> On 04/13/2011 05:21 AM, Noob wrote:

    [...]
    >>>> typedef unsigned long ULONG;
    >>>
    >>> C gives you a lot of freedom in the naming of identifiers, but it's
    >>> generally a good idea to accept well-established naming conventions, to
    >>> improve communication with other programmers. One of the most widely
    >>> used naming conventions is to reserve names that are all CAPITAL letters
    >>> for macros. I'd recommend 'ulong' for this typedef.

    >>
    >> I'd recommend dropping the typedef altogether and referring to unsigned
    >> long as unsigned long.
    >>
    >> Is ULONG ever going to be defined as something other than unsigned long?
    >> If so, ULONG is a misleading name. If not, what's the point of defining
    >> an alias for something that already has a perfectly good name? (If your
    >> answer is "it saves typing", that's not a good reason. Your code will
    >> be read many more times than it will be written.)

    >
    > The library that defines ULONG[1] is not under my control.
    >
    > [1] along with all sorts of other "useful" stuff such as:
    >
    > typedef unsigned short int USHORT;
    > typedef short int SHORT;
    > typedef USHORT* PUSHORT;
    > typedef unsigned char UCHAR;
    > typedef UCHAR* PUCHAR;
    > typedef long int LONG;
    > typedef unsigned long ULONG;
    > typedef ULONG* PULONG;
    > typedef unsigned int UINT;
    > typedef void* PVOID;
    > typedef USHORT WCHAR;
    >
    > Oh boy! :)


    What, no "typedef int INT;"? And I take it the author assumed
    that wchar_t is unsigned short (which may be a valid assumption
    for system-specific code, but ... ick).

    Well, you don't *have* to use it. But if you're working on existing
    code that does use it, it's better to maintain a consistent style.

    (At least they're typedefs and not macros!)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Apr 13, 2011
    #11
  12. Noob

    James Kuyper Guest

    On 04/13/2011 11:51 AM, Noob wrote:
    > Keith Thompson wrote:
    >> James Kuyper<> writes:
    >>> On 04/13/2011 05:21 AM, Noob wrote:
    >>>> I suppose the following code has undefined behavior?
    >>>>
    >>>> #include<string.h>
    >>>> #include<assert.h>

    >>
    >> Spaces between "include" and "<" would make this easier to read.

    >
    > NB: it was James's client that gobbled these spaces.
    > cf. https://bugzilla.mozilla.org/show_bug.cgi?id=625961
    > (Personally, I've disabled flowed-format in about:config)


    Thanks for the information. I've made the recommended change; let me
    know if whether or not it worked.

    --
    James Kuyper
     
    James Kuyper, Apr 14, 2011
    #12
  13. Noob

    Noob Guest

    [OT] flowed-format

    James Kuyper wrote:
    > On 04/13/2011 11:51 AM, Noob wrote:
    >> Keith Thompson wrote:
    >>> James Kuyper<> writes:
    >>>> On 04/13/2011 05:21 AM, Noob wrote:
    >>>>> I suppose the following code has undefined behavior?
    >>>>>
    >>>>> #include<string.h>
    >>>>> #include<assert.h>
    >>>
    >>> Spaces between "include" and "<" would make this easier to read.

    >>
    >> NB: it was James's client that gobbled these spaces.
    >> cf. https://bugzilla.mozilla.org/show_bug.cgi?id=625961
    >> (Personally, I've disabled flowed-format in about:config)

    >
    > Thanks for the information. I've made the recommended change; let me
    > know if whether or not it worked.


    Did you want to disable flowed-format? You still have it enabled:

    Content-Type: text/plain; charset=UTF-8; format=flowed

    I suppose you've set
    mailnews.display.disable_format_flowed_support = true

    But, more importantly, you have to set
    mailnews.send_plaintext_flowed = false

    Regards.
     
    Noob, Apr 14, 2011
    #13
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. David Rasmussen

    Copying struct with array

    David Rasmussen, May 11, 2004, in forum: C++
    Replies:
    10
    Views:
    6,704
    Andrey Tarasevich
    May 12, 2004
  2. Chris Fogelklou
    Replies:
    36
    Views:
    1,433
    Chris Fogelklou
    Apr 20, 2004
  3. Replies:
    21
    Views:
    669
    Chris Croughton
    Jul 30, 2005
  4. Daniel Rudy
    Replies:
    7
    Views:
    464
    Daniel Rudy
    Mar 31, 2006
  5. Tuan  Bui
    Replies:
    14
    Views:
    507
    it_says_BALLS_on_your forehead
    Jul 29, 2005
Loading...

Share This Page