Copying an array to a similar struct

N

Noob

Hello,

I suppose the following code has undefined behavior?

#include <string.h>
#include <assert.h>
typedef unsigned long ULONG;
struct msg
{
ULONG event;
ULONG args[4];
};
void dispatch_msg(ULONG *argv, int argc)
{
struct msg foob;
assert(0 <= argc && argc <= 5);
memcpy(&foob, argv, argc * sizeof *argv); /*** UB here ?? ***/
/* send foob to the handler */
}

I imagine a (nasty ;-) compiler implementation could insert
padding between event and args in struct msg, making

sizeof(struct msg) > sizeof(ULONG[5])

thereby giving the memcpy undefined behavior?

Regards.
 
J

James Kuyper

Hello,

I suppose the following code has undefined behavior?

#include<string.h>
#include<assert.h>
typedef unsigned long ULONG;

C gives you a lot of freedom in the naming of identifiers, but it's
generally a good idea to accept well-established naming conventions, to
improve communication with other programmers. One of the most widely
used naming conventions is to reserve names that are all CAPITAL letters
for macros. I'd recommend 'ulong' for this typedef.
struct msg
{
ULONG event;
ULONG args[4];
};
void dispatch_msg(ULONG *argv, int argc)
{
struct msg foob;
assert(0<= argc&& argc<= 5);
memcpy(&foob, argv, argc * sizeof *argv); /*** UB here ?? ***/

No, this is not undefined behavior. Whether or not there's any padding
between event and args is merely unspecified. It's perfectly permissible
to copy only a portion of an object, and argc*sizeof *argv is guaranteed
to be less than sizeof foob. However, the contents of foob.args after
the memcpy() call may, in principle, be indeterminate; among other
possibilities, they may contain trap representations. Even if they have
non-trap representations, if there's any padding, they won't contain the
values you expect them to. But that's not undefined behavior, just
unspecified.
/* send foob to the handler */

Whether or not this code has undefined behavior depends upon what the
handler does with it. Realistically, the results are unlikely to be good
if there's any padding.
}

I imagine a (nasty ;-) compiler implementation could insert
padding between event and args in struct msg, making

True, though that's rather unlikely. Reasonable implementations seldom
(never?) insert padding for any reason other than alignment issues, and
that can't be a relevant issue in this case.
 
B

Ben Bacarisse

Noob said:
I suppose the following code has undefined behavior?

#include <string.h>
#include <assert.h>
typedef unsigned long ULONG;
struct msg
{
ULONG event;
ULONG args[4];
};
void dispatch_msg(ULONG *argv, int argc)
{
struct msg foob;
assert(0 <= argc && argc <= 5);
memcpy(&foob, argv, argc * sizeof *argv); /*** UB here ?? ***/
/* send foob to the handler */
}

I imagine a (nasty ;-) compiler implementation could insert
padding between event and args in struct msg, making

sizeof(struct msg) > sizeof(ULONG[5])

thereby giving the memcpy undefined behavior?

You'd have problems, but there would be no undefined behaviour in the
copy. Certain subsequent uses of foob.args could be undefined, but do
the technicalities matter? The code is not portable because of the
assumption that it makes. However, you could test that there is no
padding so as to identify problem implementation at the offset, though
I'd be very surprised if you ever find an implementation that wants to
pad that structure anywhere but at the end (and even that seems
unlikely).
 
E

Eric Sosman

Hello,

I suppose the following code has undefined behavior?

#include<string.h>
#include<assert.h>
typedef unsigned long ULONG;
struct msg
{
ULONG event;
ULONG args[4];
};
void dispatch_msg(ULONG *argv, int argc)
{
struct msg foob;
assert(0<= argc&& argc<= 5);
memcpy(&foob, argv, argc * sizeof *argv); /*** UB here ?? ***/

I see nothing wrong here ...
/* send foob to the handler */

... but very likley here. Not only is there the issue of
possible padding, but the case argc == 0 leaves you with a
completely uninitialized struct for the handler to handle.
}

I imagine a (nasty ;-) compiler implementation could insert
padding between event and args in struct msg, making

sizeof(struct msg)> sizeof(ULONG[5])

thereby giving the memcpy undefined behavior?

The memcpy() is fine, assuming argv does in fact point to
a region of at least argc ULONG's. In that case, the number
of bytes copied does not exceed the size of the source or of
the destination, and the source and destination do not overlap.

It's *after* the memcpy() that the fun begins ...
 
N

Noob

Eric said:
Noob said:
I suppose the following code has undefined behavior?

#include<string.h>
#include<assert.h>
typedef unsigned long ULONG;
struct msg
{
ULONG event;
ULONG args[4];
};
void dispatch_msg(ULONG *argv, int argc)
{
struct msg foob;
assert(0<= argc&& argc<= 5);
memcpy(&foob, argv, argc * sizeof *argv); /*** UB here ?? ***/

I see nothing wrong here ...
/* send foob to the handler */

... but very likley here. Not only is there the issue of
possible padding, but the case argc == 0 leaves you with a
completely uninitialized struct for the handler to handle.
}

I imagine a (nasty ;-) compiler implementation could insert
padding between event and args in struct msg, making

sizeof(struct msg)> sizeof(ULONG[5])

thereby giving the memcpy undefined behavior?

The memcpy() is fine, assuming argv does in fact point to
a region of at least argc ULONG's. In that case, the number
of bytes copied does not exceed the size of the source or of
the destination, and the source and destination do not overlap.

It's *after* the memcpy() that the fun begins ...

I think the following changes fix all potential cases of UB
upon reading the "foob" message. Is this correct?

#include <string.h>
#include <assert.h>
typedef unsigned long ULONG;
struct msg
{
ULONG event;
ULONG args[4];
};
void dispatch_msg(ULONG *argv, int argc)
{
struct msg foob;
assert(0 < argc && argc <= 5);
foob.event = argv[0];
memcpy(foob.args, argv+1, (argc-1) * sizeof *argv);
/* send foob to the handler */
}
 
E

Eric Sosman

[...]
I think the following changes fix all potential cases of UB
upon reading the "foob" message. Is this correct?

#include<string.h>
#include<assert.h>
typedef unsigned long ULONG;
struct msg
{
ULONG event;
ULONG args[4];
};
void dispatch_msg(ULONG *argv, int argc)
{
struct msg foob;
assert(0< argc&& argc<= 5);
foob.event = argv[0];
memcpy(foob.args, argv+1, (argc-1) * sizeof *argv);
/* send foob to the handler */
}

Looks okay, assuming the handler does nothing weird. I'd
mention that assert() can be useful in debugging but is not a
reliable guard in "production code," because somebody somewhere
is likely to toss in an NDEBUG for the sake of optimization.

I'd also wonder why the handler can't just use the argv
data directly, instead of needing the same information shrink-
wrapped in a struct. Copying information from one place to
another doesn't advance the state of a computation much; if
you've learned that a collection contains 42 widgets you don't
gain a lot by saying it holds 42, 42, 42, widgets. But that's
a design question, not a correctness issue.
 
N

Noob

Eric said:
[...]
I think the following changes fix all potential cases of UB
upon reading the "foob" message. Is this correct?

#include<string.h>
#include<assert.h>
typedef unsigned long ULONG;
struct msg
{
ULONG event;
ULONG args[4];
};
void dispatch_msg(ULONG *argv, int argc)
{
struct msg foob;
assert(0< argc&& argc<= 5);

Did you notice how your client messed this line up?
I've disabled flowed-format in my client.
foob.event = argv[0];
memcpy(foob.args, argv+1, (argc-1) * sizeof *argv);
/* send foob to the handler */
}

Looks okay, assuming the handler does nothing weird. I'd
mention that assert() can be useful in debugging but is not a
reliable guard in "production code," because somebody somewhere
is likely to toss in an NDEBUG for the sake of optimization.

Right. Then again, as strlen(NULL) has UB,
so does dispatch_msg(NULL, n) (which I didn't guard against)
or dispatch(argv, 0) (which I asserted against).

If the user of the API wants to ignore the function's
description, and optimize for speed, instead of correct
operation, then I suppose they deserve what they get.
I'd also wonder why the handler can't just use the argv
data directly, instead of needing the same information shrink-
wrapped in a struct. Copying information from one place to
another doesn't advance the state of a computation much; if
you've learned that a collection contains 42 widgets you don't
gain a lot by saying it holds 42, 42, 42, widgets. But that's
a design question, not a correctness issue.

The message is handled in a different thread of execution,
so as not to block the thread that sent the message.
(Hence the "send foob to the handler" comment.)

Regards.
 
J

James Kuyper

Eric said:
[...]
I think the following changes fix all potential cases of UB
upon reading the "foob" message. Is this correct?

#include<string.h>
#include<assert.h>
typedef unsigned long ULONG;
struct msg
{
ULONG event;
ULONG args[4];
};
void dispatch_msg(ULONG *argv, int argc)
{
struct msg foob;
assert(0< argc&& argc<= 5);
....
Looks okay, assuming the handler does nothing weird. I'd
mention that assert() can be useful in debugging but is not a
reliable guard in "production code," because somebody somewhere
is likely to toss in an NDEBUG for the sake of optimization.

Right. Then again, as strlen(NULL) has UB,
so does dispatch_msg(NULL, n) (which I didn't guard against)
or dispatch(argv, 0) (which I asserted against).

If the user of the API wants to ignore the function's
description, and optimize for speed, instead of correct
operation, then I suppose they deserve what they get.

The fact that assert() can be turned off if NDEBUG is #defined is one of
the most important features of assert(). Code which uses assert(), yet
also requires NDEBUG to be undefined in order to fail properly in the
face of unexpected inputs, isn't using assert() correctly, or at least
isn't ready for delivery. Most likely, such code is using assert() where
some other approach, such as if(error_condition) {/* error handling
*/}, would be more appropriate.
 
K

Keith Thompson

Spaces between "include" and "<" would make this easier to read.
C gives you a lot of freedom in the naming of identifiers, but it's
generally a good idea to accept well-established naming conventions, to
improve communication with other programmers. One of the most widely
used naming conventions is to reserve names that are all CAPITAL letters
for macros. I'd recommend 'ulong' for this typedef.

I'd recommend dropping the typedef altogether and referring to unsigned
long as unsigned long.

Is ULONG ever going to be defined as something other than unsigned long?
If so, ULONG is a misleading name. If not, what's the point of defining
an alias for something that already has a perfectly good name? (If your
answer is "it saves typing", that's not a good reason. Your code will
be read many more times than it will be written.)

[...]
 
N

Noob

Keith said:
Spaces between "include" and "<" would make this easier to read.

NB: it was James's client that gobbled these spaces.
cf. https://bugzilla.mozilla.org/show_bug.cgi?id=625961
(Personally, I've disabled flowed-format in about:config)
I'd recommend dropping the typedef altogether and referring to unsigned
long as unsigned long.

Is ULONG ever going to be defined as something other than unsigned long?
If so, ULONG is a misleading name. If not, what's the point of defining
an alias for something that already has a perfectly good name? (If your
answer is "it saves typing", that's not a good reason. Your code will
be read many more times than it will be written.)

The library that defines ULONG[1] is not under my control.

[1] along with all sorts of other "useful" stuff such as:

typedef unsigned short int USHORT;
typedef short int SHORT;
typedef USHORT* PUSHORT;
typedef unsigned char UCHAR;
typedef UCHAR* PUCHAR;
typedef long int LONG;
typedef unsigned long ULONG;
typedef ULONG* PULONG;
typedef unsigned int UINT;
typedef void* PVOID;
typedef USHORT WCHAR;

Oh boy! :)
 
K

Keith Thompson

Noob said:
Keith said:
James Kuyper said:
On 04/13/2011 05:21 AM, Noob wrote: [...]
typedef unsigned long ULONG;

C gives you a lot of freedom in the naming of identifiers, but it's
generally a good idea to accept well-established naming conventions, to
improve communication with other programmers. One of the most widely
used naming conventions is to reserve names that are all CAPITAL letters
for macros. I'd recommend 'ulong' for this typedef.

I'd recommend dropping the typedef altogether and referring to unsigned
long as unsigned long.

Is ULONG ever going to be defined as something other than unsigned long?
If so, ULONG is a misleading name. If not, what's the point of defining
an alias for something that already has a perfectly good name? (If your
answer is "it saves typing", that's not a good reason. Your code will
be read many more times than it will be written.)

The library that defines ULONG[1] is not under my control.

[1] along with all sorts of other "useful" stuff such as:

typedef unsigned short int USHORT;
typedef short int SHORT;
typedef USHORT* PUSHORT;
typedef unsigned char UCHAR;
typedef UCHAR* PUCHAR;
typedef long int LONG;
typedef unsigned long ULONG;
typedef ULONG* PULONG;
typedef unsigned int UINT;
typedef void* PVOID;
typedef USHORT WCHAR;

Oh boy! :)

What, no "typedef int INT;"? And I take it the author assumed
that wchar_t is unsigned short (which may be a valid assumption
for system-specific code, but ... ick).

Well, you don't *have* to use it. But if you're working on existing
code that does use it, it's better to maintain a consistent style.

(At least they're typedefs and not macros!)
 
N

Noob

James said:
Thanks for the information. I've made the recommended change; let me
know if whether or not it worked.

Did you want to disable flowed-format? You still have it enabled:

Content-Type: text/plain; charset=UTF-8; format=flowed

I suppose you've set
mailnews.display.disable_format_flowed_support = true

But, more importantly, you have to set
mailnews.send_plaintext_flowed = false

Regards.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,902
Latest member
Elena68X5

Latest Threads

Top