Checking memset

jacob navia · Dec 19, 2005

Many compilers check printf for errors, lcc-win32 too. But there are
other functions that would be worth to check, specially memset.

Memset is used mainly to clear a memory zone, receiving a pointer to
the start, the value (most of the time zero) and the size of the
memory array to clear.

Problems appear when the size given is not the size of the object
given as its first argument. For instance
void fn(void)
{
int array[128];

memset(array,0,128);
}

This will fail to clear the array entirely.

The compiler can check this kind of things when the size of the object
is known.

When the pointer given points to a structure/union/or array with known
size (i.e. its size is available at compilation time) I have added code
to check for this. If the size given is bigger than the size of the
object the compiler will now issue a warning. If the size is less, and
the value to set is zero, the compiler will issue a warning too, saying
that memset fails to clear completely the array.

Borderline cases:

What to do with unions?

You can use memset to clear a member of the union, without clearing the
whole union. I have choosen not to complain in this case. What do you think?

Other interesting cases arise with clearing of an array:

struct foo *p;
p = malloc(sizeof(struct foo)*123);
if (p) {
memset(p,0,123);
}
Supposing sizeof(struct foo) is 64, the given size is not
a multiple of the size of the structure.
Would it be possible to issue a warning here? or would it lead to
many "false positives" (unwarranted warnings) ?

Of course when the memory points to a primitive type (int/double, etc)
there is no way the compiler can check that the size given is
correct, specially of course in the case of chars, where sizeof is 1.

In other cases it should be a multiple of the element size, isn't it?
I.e. the size when clearing a int * should be a multiple of sizeof(int)
with a double * a multiple of sizeof double, etc.

What do you think?

Input appreciated.

Thanks for your time.

jacob

Eric Sosman · Dec 19, 2005

jacob navia wrote On 12/19/05 17:01,:

Many compilers check printf for errors, lcc-win32 too. But there are
other functions that would be worth to check, specially memset.

Memset is used mainly to clear a memory zone, receiving a pointer to
the start, the value (most of the time zero) and the size of the
memory array to clear.

Problems appear when the size given is not the size of the object
given as its first argument. For instance
void fn(void)
{
int array[128];

memset(array,0,128);
}

This will fail to clear the array entirely.

The compiler can check this kind of things when the size of the object
is known.

When the pointer given points to a structure/union/or array with known
size (i.e. its size is available at compilation time) I have added code
to check for this. If the size given is bigger than the size of the
object the compiler will now issue a warning. If the size is less, and
the value to set is zero, the compiler will issue a warning too, saying
that memset fails to clear completely the array.

YMMV, but I use memset() rather rarely and have a hard
time understanding why it's "specially" worthy of a check.
memcpy() and memmove() would seem more deserving candidates,
and the same machinery could probably be used.

Pursuing the size-checking theme a little further might
be helpful. qsort(), fread(), ... offer opportunities to
sanity-check as many as two arguments. It might be even
more beneficial to sanity-check the argument of malloc() and
the second arguments of calloc() and realloc(). This would
be harder because some of the information is "outside" the
function call itself and would need to be derived from the
wider context: `malloc(3)' in isolation could be right or
wrong, but `char **p = malloc(3)' is almost certainly wrong.

Borderline cases:

What to do with unions?

You can use memset to clear a member of the union, without clearing the
whole union. I have choosen not to complain in this case. What do you think?

To fill one element of a struct or union, one would normally
write memset(&s.x, 42, sizeof s.x). To fill the whole thing,
you'd typically write memset(&s, 42, sizeof s). If you're going
to issue a warning for memset(&v, 42, unequal_to_sizeof_v), it
seems to me you could treat s or s.x the same way.

Other interesting cases arise with clearing of an array:

struct foo *p;
p = malloc(sizeof(struct foo)*123);
if (p) {
memset(p,0,123);
}
Supposing sizeof(struct foo) is 64, the given size is not
a multiple of the size of the structure.
Would it be possible to issue a warning here? or would it lead to
many "false positives" (unwarranted warnings) ?

You'd get false positives on the struct hack, even in its
C99-approved form. Too many? I have no idea, except that I
quite frequently write the related form

struct foo *p = malloc(sizeof *p + strlen(s) + 1);
if (p == NULL) die();
p->string = (char*)(p + 1);
strcpy(p->string, s);

.... and would not like to be scolded for it. (You might not
do so anyhow, being unable to determine strlen(s) at compile
time.)

Hmmm: Have you considered checking for malloc(strlen(s))?

Of course when the memory points to a primitive type (int/double, etc)
there is no way the compiler can check that the size given is
correct, specially of course in the case of chars, where sizeof is 1.

In other cases it should be a multiple of the element size, isn't it?
I.e. the size when clearing a int * should be a multiple of sizeof(int)
with a double * a multiple of sizeof double, etc.

What do you think?

What I think is the usual load of baloney ;-) How about
implementing your checks and running a few million lines of
source through it, counting up the false positives and the
actual hitherto-undetected errors that are caught, and then
making a judgement?

Gordon Burditt · Dec 19, 2005

Memset is used mainly to clear a memory zone, receiving a pointer to

the start, the value (most of the time zero) and the size of the
memory array to clear.

Problems appear when the size given is not the size of the object
given as its first argument.

Careful with the wording here, I often give a pointer as the first
argument and the size is not the size *of the pointer*. The compiler
might still be able to tell where the pointer points, though.

For instance
void fn(void)
{
int array[128];

memset(array,0,128);
}

This will fail to clear the array entirely.

It is not wrong to fail to clear an array entirely.

The compiler can check this kind of things when the size of the object
is known.

When the pointer given points to a structure/union/or array with known
size (i.e. its size is available at compilation time) I have added code
to check for this. If the size given is bigger than the size of the
object the compiler will now issue a warning.

Yes, this makes a lot of sense.

If the size is less, and
the value to set is zero, the compiler will issue a warning too, saying
that memset fails to clear completely the array.

I'm not so sure about this one. It sounds like it could be more
of a nuisance than it's worth. It isn't that uncommon to declare
one large buffer and re-use it multiple times where using a smaller
buffer might be more appropriate, but would take more space.

Borderline cases:

What to do with unions?

You can use memset to clear a member of the union, without clearing the
whole union. I have choosen not to complain in this case. What do you think?

Other interesting cases arise with clearing of an array:

struct foo *p;
p = malloc(sizeof(struct foo)*123);
if (p) {
memset(p,0,123);
}
Supposing sizeof(struct foo) is 64, the given size is not
a multiple of the size of the structure.
Would it be possible to issue a warning here? or would it lead to
many "false positives" (unwarranted warnings) ?

Of course when the memory points to a primitive type (int/double, etc)
there is no way the compiler can check that the size given is
correct, specially of course in the case of chars, where sizeof is 1.

If the pointer is known to point to a single variable (not an array)
of primitive type, you can be fairly safe warning if the size
is larger than the size of that variable.

In other cases it should be a multiple of the element size, isn't it?
I.e. the size when clearing a int * should be a multiple of sizeof(int)
with a double * a multiple of sizeof double, etc.

What do you think?

If you're going to check the destination pointer on memset(), you
might as well check the destination pointer on memcpy(), strncpy(),
memmove(), and similar functions also.

Gordon L. Burditt

Barry · Dec 19, 2005

jacob navia said:
Many compilers check printf for errors, lcc-win32 too. But there are
other functions that would be worth to check, specially memset.

Memset is used mainly to clear a memory zone, receiving a pointer to
the start, the value (most of the time zero) and the size of the
memory array to clear.

Problems appear when the size given is not the size of the object
given as its first argument. For instance
void fn(void)
{
int array[128];

memset(array,0,128);
}

This will fail to clear the array entirely.

The compiler can check this kind of things when the size of the object
is known.

When the pointer given points to a structure/union/or array with known
size (i.e. its size is available at compilation time) I have added code
to check for this. If the size given is bigger than the size of the
object the compiler will now issue a warning. If the size is less, and
the value to set is zero, the compiler will issue a warning too, saying
that memset fails to clear completely the array.

If I were going to warn on the "size is less" condition I do not see any
reason to restrict it to the value being zero. In any case, and this
applies
to your questions about false positives below, I would require this to be
an option to be turned on, rather than default. I am not familiar with your
compiler (I am aware of it but have not used it), so these comments are
purely based on how I would like the compilers I use or have used to behave.

jacob navia · Dec 19, 2005

Gordon said:
Careful with the wording here, I often give a pointer as the first
argument and the size is not the size *of the pointer*. The compiler
might still be able to tell where the pointer points, though.

Yes, the compiler dereferences the pointer; In any case you *have* to
give a pointer, according to memset's prototype!

For instance
void fn(void)
{
int array[128];

memset(array,0,128);
}

This will fail to clear the array entirely.

Click to expand...

It is not wrong to fail to clear an array entirely.

The compiler can check this kind of things when the size of the object
is known.

When the pointer given points to a structure/union/or array with known
size (i.e. its size is available at compilation time) I have added code
to check for this. If the size given is bigger than the size of the
object the compiler will now issue a warning.

Click to expand...

Yes, this makes a lot of sense.

If the size is less, and
the value to set is zero, the compiler will issue a warning too, saying
that memset fails to clear completely the array.

Click to expand...

I'm not so sure about this one. It sounds like it could be more
of a nuisance than it's worth. It isn't that uncommon to declare
one large buffer and re-use it multiple times where using a smaller
buffer might be more appropriate, but would take more space.

You clear it partially?
That would be surprising...
I caught one bug like that in the IDE of lcc-win32, failing to
clear a memory buffer completely. Those errors are really hard to
find... and the symptoms (using not cleared memory) are random
crashes, very difficult to follow to the original source of
the problem.

If the pointer is known to point to a single variable (not an array)
of primitive type, you can be fairly safe warning if the size
is larger than the size of that variable.

If you're going to check the destination pointer on memset(), you
might as well check the destination pointer on memcpy(), strncpy(),
memmove(), and similar functions also.

Memcpy: error if size bigger than source or bigger than destination.
error if surce overlaps destination.
Other cases (copying less) are quite common I suppose.
strncpy: error when size > sizeof destination
memmove: same as memcpy but not error if src overlaps dst

All this supposes the compiler has this information at compile
time.

Gordon Burditt · Dec 19, 2005

If the size is less, and

You clear it partially?
That would be surprising...

Why?

I have a field of, say, 50 characters (well, really, its sizeof()
some structure element) which is going to be stored on disk. It's
a string, but for security and regression testing and general
cleanliness, I want to store strings nulled out to the end of the
field, not one null followed by a bunch of junk that will miscompare
if I compare two files.

If I need a temporary copy of such a field, I might use a handy
large buffer (say, 20k) which is guaranteed to be large enough and
can be re-used, but the code is still written in terms of the size
of the *field*, not the size of the temporary buffer it's in.
And why memset() a huge buffer when you know you're only using
a small portion of it?

Gordon L. Burditt

Arthur J. O'Dwyer · Dec 19, 2005

You clear it partially?
That would be surprising...

Uncommon, definitely, but not surprising to me in the rare cases when
it does happen. For example:

char buf[101];

/* Initialize the buffer */
memset(buf, '?', 100); /* <--- this line */
buf[100] = '\0';

for (i=0; i < nconstraints; ++i)
buf[constraints.idx] = constraints.val;
try_match(buf, input);

I suggest that an extremely useful diagnostic would be anytime
the programmer uses 'malloc', 'realloc', 'memset', or 'memcpy' without
using the keyword 'sizeof' in the appropriate place. E.g.,

cap += 10;
dict = realloc(dict, cap); /* <-- Oops! */

If you wanted to be really fancy, you'd attach a tag to each variable
telling whether its current value had been derived from a 'sizeof'
expression, and if so, 'sizeof' which object. Arithmetic on those
variables would update the tags and produce diagnostics in the obvious
ways:

int capsize = cap * sizeof buf;
int recover = 16;
int marypoppins = capsize + recover; /* Warning: Huh? */
int exsize = sizeof ex;
int starship = capsize * exsize; /* Warning: Huh? */

Care would have to be taken to "balance" the usefulness of these warnings
against their potential for producing bogus diagnostics. The user should
rarely, if ever, have to read warnings that don't apply to his code.

my $.02,
-Arthur

jacob navia · Dec 19, 2005

Arthur said:
I suggest that an extremely useful diagnostic would be anytime
the programmer uses 'malloc', 'realloc', 'memset', or 'memcpy' without
using the keyword 'sizeof' in the appropriate place. E.g.,

cap += 10;
dict = realloc(dict, cap); /* <-- Oops! */

-Arthur

I thought about that but I think it would generate
too much warnings, in principle justified but it could
be annoying, and distract from the real important warnings.

In general you are right of course. It should be *always* sizeof.

It is a pity that it is up to the programmer to write that,
since the compiler could do it very easily:

struct foo m;
Memclear(m);

In principle Memclear is
#define Memclear(m) memset(&m,0,sizeof(m));

Since there is no standard, each one of use has some
"memclear" macro.

Richard Bos · Dec 20, 2005

Arthur J. O'Dwyer said:
I suggest that an extremely useful diagnostic would be anytime
the programmer uses 'malloc', 'realloc', 'memset', or 'memcpy' without
using the keyword 'sizeof' in the appropriate place. E.g.,

cap += 10;
dict = realloc(dict, cap); /* <-- Oops! */

Yes?

/* Maximum street name length for official postal addresses */
#define STREETNAMELEN 28

void setdefaultstreet(char *street)
{
memset(street, '?', STREETNAMELEN);
street[STREETNAMELEN]='\0';
}

...

char customer_street[STREETNAMELEN+1];
setdefaultstreet(customer_street);

Not a sizeof in sight, but not, I venture, a silly application of
memset().

OTOH:

int arr[20];

memset(arr, 0, 20*sizeof int);

Not a correct application of memset, and not, in fact, a wise
application of sizeof - good code would use sizeof arr - but it does use
sizeof.

Richard

Simon Biber · Dec 21, 2005

Richard said:
OTOH:

int arr[20];

memset(arr, 0, 20*sizeof int);

Not a correct application of memset, and not, in fact, a wise
application of sizeof - good code would use sizeof arr - but it does use
sizeof.

It's not just an unwise application of sizeof, but actually a constraint
violation. When using sizeof with a type, you must surround the type
with parentheses.

The use of memset to set ints to zero is not absolutely guaranteed to
work as expected, but it does on every C implementation I've ever used.

In the worst case, this should be OK, though I don't know why anyone
would want to do it:
size_t i;
int arr[20];
unsigned char *p;
memset(arr, 0, sizeof arr);
p = (unsigned char *)arr;
for(i = 0; i < sizeof arr; i++)
{
assert(p == 0);
}

Richard Heathfield · Dec 21, 2005

Simon Biber said:

The use of memset to set ints to zero is not absolutely guaranteed to
work as expected, but it does on every C implementation I've ever used.

I think you'll find the committee finally sorted that out.

Keith Thompson · Dec 21, 2005

Richard Heathfield said:
Simon Biber said:

I think you'll find the committee finally sorted that out.

Yup.

The Defect Report #263
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_263.htm>
says:

Append to 6.2.6.2#5:

For any integer type, the object representation where all the
bits are zero shall be a representation of the value zero in
that type.

This was published in Technical Corrigendum 2, which is incorporated
into N1124.

Tim Rentsch · Dec 28, 2005

(e-mail address removed) (Richard Bos) writes:

[snip]

OTOH:

int arr[20];

memset(arr, 0, 20*sizeof int);

Not a correct application of memset, and not, in fact, a wise
application of sizeof - good code would use sizeof arr - but it does use
sizeof.

(Presumably 'sizeof(int)' was intended.)

I agree about using 'sizeof arr' rather than using sizeof on
a type. If one wants to use the type form, however, then

int arr[20];

memset( arr, 0, sizeof(int[20]) );

seems a little nicer.

Keith Thompson · Dec 28, 2005

Tim Rentsch said:
I agree about using 'sizeof arr' rather than using sizeof on
a type. If one wants to use the type form, however, then

int arr[20];

memset( arr, 0, sizeof(int[20]) );

seems a little nicer.

You have the magic number 20 in two places (possibly more than that if
it's used elsewhere), which means you'll have problems if you later
decide to change the size of the array. You'll also have problems
if you change the element type. Using

memset(arr, 0, sizeof arr);

avoids both of these problems. (It could still cause problems if arr
is changed to a pointer rather than an array.)

Tim Rentsch · Dec 28, 2005

Keith Thompson said:
Tim Rentsch said:

I agree about using 'sizeof arr' rather than using sizeof on
a type. If one wants to use the type form, however, then

int arr[20];

memset( arr, 0, sizeof(int[20]) );

seems a little nicer.

Click to expand...

You have the magic number 20 in two places (possibly more than that if
it's used elsewhere), which means you'll have problems if you later
decide to change the size of the array. You'll also have problems
if you change the element type. Using

memset(arr, 0, sizeof arr);

avoids both of these problems. (It could still cause problems if arr
is changed to a pointer rather than an array.)

I totally agree. My remark was intended only in the context
of comparing '20*sizeof(int)' and 'sizeof(int[20])'. It does
say that, yes?

Keith Thompson · Dec 28, 2005

Tim Rentsch said:
Keith Thompson said:

Tim Rentsch said:

I agree about using 'sizeof arr' rather than using sizeof on
a type. If one wants to use the type form, however, then

int arr[20];

memset( arr, 0, sizeof(int[20]) );

seems a little nicer.

Click to expand...

You have the magic number 20 in two places (possibly more than that if
it's used elsewhere), which means you'll have problems if you later
decide to change the size of the array. You'll also have problems
if you change the element type. Using

memset(arr, 0, sizeof arr);

avoids both of these problems. (It could still cause problems if arr
is changed to a pointer rather than an array.)

Click to expand...

I totally agree. My remark was intended only in the context
of comparing '20*sizeof(int)' and 'sizeof(int[20])'. It does
say that, yes?

Yes, you did; sometimes I pay more attention to code than to English.

On the other hand, the phrase "If one wants to use the type form"
should arguably be followed by ", you shouldn't".

Old Wolf · Dec 28, 2005

Keith said:
Tim Rentsch said:

int arr[20];

memset( arr, 0, sizeof(int[20]) );

seems a little nicer.

Click to expand...

memset(arr, 0, sizeof arr);

avoids both of these problems. (It could still cause problems if arr
is changed to a pointer rather than an array.)

This can be avoided by using the 'canonical form' (my choice
of words):

memset( &arr, 0, sizeof arr );

which works for any object type (AFAIK). Or for any pointer:

memset( arr, 0, sizeof *arr );

(and will fail to compile if arr isn't a pointer to complete type).

Richard Bos · Dec 29, 2005

Old Wolf said:
Keith said:

Tim Rentsch said:

int arr[20];

memset( arr, 0, sizeof(int[20]) );

seems a little nicer.

Click to expand...

memset(arr, 0, sizeof arr);

avoids both of these problems. (It could still cause problems if arr
is changed to a pointer rather than an array.)

Click to expand...

This can be avoided by using the 'canonical form' (my choice
of words):

memset( &arr, 0, sizeof arr );

Yes, but that only sets the pointer _itself_ to all-bits-zero. Since
that is not guaranteed to be a valid pointer, not even a null pointer,
it's a useless operation. The intent was to set all members of the array
- in the case of the pointer, all ints that the pointer points at - to
zero.

which works for any object type (AFAIK). Or for any pointer:

memset( arr, 0, sizeof *arr );

Yes, but that only sets a single member to all bits zero. If you have a
pointer, not an array, memset() cannot find out for itself how many
members there are; therefore, you must tell it in that case.

Richard

Keith Thompson · Dec 29, 2005

Old Wolf said:
Old Wolf said:

Keith said:

int arr[20];

memset( arr, 0, sizeof(int[20]) );

seems a little nicer.

memset(arr, 0, sizeof arr);

avoids both of these problems. (It could still cause problems if arr
is changed to a pointer rather than an array.)

Click to expand...

This can be avoided by using the 'canonical form' (my choice
of words):

memset( &arr, 0, sizeof arr );

Click to expand...

Yes, but that only sets the pointer _itself_ to all-bits-zero. Since
that is not guaranteed to be a valid pointer, not even a null pointer,
it's a useless operation. The intent was to set all members of the array
- in the case of the pointer, all ints that the pointer points at - to
zero.

I think he intended that to be used when arr is declared as an array.

Yes, but that only sets a single member to all bits zero. If you have a
pointer, not an array, memset() cannot find out for itself how many
members there are; therefore, you must tell it in that case.

For any solution, you have to modify the memset() call depending on
whether you have a pointer or an array -- because you're doing two
different things. For an array, you usually want to zero the object
itself; for a pointer, you usually want to zero what it points to.
C confusingly mixes pointers and arrays, but it doesn't (at least in
this case) provide an abstraction that lets you usefully handle both
without knowing which one you're dealing with.

Old Wolf · Dec 29, 2005

Richard said:
Old Wolf said:

int arr[20];

Click to expand...

memset( &arr, 0, sizeof arr );

Click to expand...

Yes, but that only sets the pointer _itself_ to all-bits-zero. Since
that is not guaranteed to be a valid pointer, not even a null pointer,
it's a useless operation.

What pointer? arr is an array, and this memset sets all of
its members to 0 (all-bits-0 for ints must be the value 0).

Yes, but that only sets a single member to all bits zero.

It sets the entire object (*arr) to all-bits-zero. Examples:

struct S { int a[20]; };
S s, *p = &s;
int (*q)[20] = &s.a;

memset(p, 0, sizeof *p); // good
memset(q, 0, sizeof *q); // also good

If you have a pointer, not an array, memset() cannot find out for itself
how many members there are

If you have a pointer to the object, then you can zero it.
Otherwise, you can't. (A pointer to part of the object doesn't count).

I'm sure you know this -- perhaps there was some misunderstanding
about the actual code that was being discussed..

memset	24	Apr 15, 2014
a = b or memset/cpy?	17	Feb 7, 2012
memset	21	Jan 26, 2008
Adding adressing of IPv6 to program	1	Feb 16, 2023
memset() on a struct or union	21	Feb 2, 2010
Filter sober in c++ don't pass test	0	Dec 2, 2023
JavaScript String Syntax Checking	5	Jun 29, 2022
memset + free	17	Jun 10, 2007

Checking memset

jacob navia

Eric Sosman

Gordon Burditt

Barry

jacob navia

Gordon Burditt

Arthur J. O'Dwyer

jacob navia

Richard Bos

Simon Biber

Richard Heathfield

Keith Thompson

Tim Rentsch

Keith Thompson

Tim Rentsch

Keith Thompson

Old Wolf

Richard Bos

Keith Thompson

Old Wolf

Ask a Question

Similar Threads

Staff online

Members online

Forum statistics

Latest Threads