Better free(invalid pointer) behavior?

A

anyone.anon

Let p=malloc(N) for some N>0. As far as I understand it, free(p+k) for
0<k<N causes undefined behavior, since only a pointer returned by (m|
re|c)alloc() can validly be passed to free().

This seems pretty silly. Wouldn't a better behavior of free() be to
assume that if it receives q, where q lies in some malloc()ated-and-
not-yet-free()d block starting at p, then it should interpret this as
free(p)?

Even better, the *alloc() routines could leave a spare byte between
the blocks they return so that free(p+N) would also be equivalent to
free(p).

This would let one move p around within the allocated block, without
always having to keep track of the start location of the block - a
tedious business that seems only to obfuscate one's code.
 
I

Ian Malone

Let p=malloc(N) for some N>0. As far as I understand it, free(p+k) for
0<k<N causes undefined behavior, since only a pointer returned by (m|
re|c)alloc() can validly be passed to free().

This seems pretty silly. Wouldn't a better behavior of free() be to
assume that if it receives q, where q lies in some malloc()ated-and-
not-yet-free()d block starting at p, then it should interpret this as
free(p)?

To me it seems fairly logical; you get an object from the
the interface, then pass it back to be destroyed. It's
a common design, the confusion that can afflict beginners
is that in the case of a returned pointer from malloc the
object is both the object to be sent back to the interface
and a pointer which can be used to access a memory region.
Even better, the *alloc() routines could leave a spare byte between
the blocks they return so that free(p+N) would also be equivalent to
free(p).

This would let one move p around within the allocated block, without
always having to keep track of the start location of the block - a
tedious business that seems only to obfuscate one's code.

It's an intriguing idea, though it sounds as if it would slow
down and complicate malloc implementation (necessitating a
search for the allocated block. When do you find that keeping
the object you asked for causes obfuscation?
 
C

Chris Dollin

Let p=malloc(N) for some N>0. As far as I understand it, free(p+k) for
0<k<N causes undefined behavior, since only a pointer returned by (m|
re|c)alloc() can validly be passed to free().

`free(p+k)` is undefined for any k > 0, in fact: your restriction
`<N` is misleading.
This seems pretty silly.

It seems completely reasonable to /me/. Give back what you took
out.
Wouldn't a better behavior of free() be to
assume that if it receives q, where q lies in some malloc()ated-and-
not-yet-free()d block starting at p, then it should interpret this as
free(p)?

No. Because now `free(p)` has to work out from the value of `p`
where the block you're freeing starts, and it has to do so quickly
and compactly. That's a significant constraint on the implementation.
Even better, the *alloc() routines could leave a spare byte between
the blocks they return so that free(p+N) would also be equivalent to
free(p).

I don't see why this is "better".
This would let one move p around within the allocated block, without
always having to keep track of the start location of the block - a
tedious business that seems only to obfuscate one's code.

I've never found it tedious, and I've never found it obfuscating. It's
at most one pointer with a sensible name, isn't it?

Perhaps you should exhibit an example.
 
S

santosh

Let p=malloc(N) for some N>0. As far as I understand it, free(p+k) for
0<k<N causes undefined behavior, since only a pointer returned by (m|
re|c)alloc() can validly be passed to free().

A pointer value previously returned by *alloc, yes.
This seems pretty silly. Wouldn't a better behavior of free() be to
assume that if it receives q, where q lies in some malloc()ated-and-
not-yet-free()d block starting at p, then it should interpret this as
free(p)?

Even better, the *alloc() routines could leave a spare byte between
the blocks they return so that free(p+N) would also be equivalent to
free(p).

This would let one move p around within the allocated block, without
always having to keep track of the start location of the block - a
tedious business that seems only to obfuscate one's code.

IMHO, it's cleaner to pass to free what you got from *alloc, not some
arbitrary value. Almost always you need to keep track of the start of
the block anyway, so it's not as if it's difficult.

To refrain from breaking existing implementations, the standard has
mandated this rule. It's not an inconvinient rule, and it encourages
good programming practise of preserving pointer values, before
modifying them.
 
G

Gordon Burditt

Let p=malloc(N) for some N>0. As far as I understand it, free(p+k) for
0<k<N causes undefined behavior, since only a pointer returned by (m|
re|c)alloc() can validly be passed to free().

This seems pretty silly. Wouldn't a better behavior of free() be to
assume that if it receives q, where q lies in some malloc()ated-and-
not-yet-free()d block starting at p, then it should interpret this as
free(p)?

That's assuming the implementation keeps enough information around to
TELL if this is the case. Some don't. It may also take what's considered
by some to be too much CPU time to do that.
Even better, the *alloc() routines could leave a spare byte between
the blocks they return so that free(p+N) would also be equivalent to
free(p).

This would let one move p around within the allocated block, without
always having to keep track of the start location of the block - a
tedious business that seems only to obfuscate one's code.

Why not ask for full garbage collection? That's more useful in more places.
Very difficult to implement efficiently, though, especially with terabyte
files open.

If you can't (or won't) keep track of where the beginning of the
block is, how do you avoid going off the beginning or the end of
it? That's a common vulnerability in software.
 
C

CBFalconer

Chris said:
`free(p+k)` is undefined for any k > 0, in fact: your restriction
`<N` is misleading.


It seems completely reasonable to /me/. Give back what you took
out.

Even worse, a faulty free of some wild pointer could quietly
destroy badly needed data. Convenient does not mean safe.
 
R

Richard Tobin

CBFalconer said:
Even worse, a faulty free of some wild pointer could quietly
destroy badly needed data. Convenient does not mean safe.

That can perfectly well happen with the existing specification. If
the implementation does not validate the value passed to free a bogus
value could be misinterpreted as a large block with the result that
arbitrary data could be overwritten.

It's never likely to be part of the specification of free() that it
will not destroy "badly needed data" if you pass it a "wild pointer".

-- Richard
 
A

anyone.anon

Chris said:
No. Because now `free(p)` has to work out from the value of `p`
where the block you're freeing starts, and it has to do so quickly
and compactly. That's a significant constraint on the implementation.

I don't think it's much of a constraint. The allocator has to keep a
list of the pointers it's returned and the number of bytes they point
to, so it would just be a case of free() testing something like
"p < pall + bytesall" instead of "p == pall" if it's passed
p.
I've never found it tedious, and I've never found it obfuscating. It's
at most one pointer with a sensible name, isn't it?

Perhaps you should exhibit an example.

How about a snippet like this?

char *p=malloc(100);
fuzzbar(p,100, otherstuff); /* does something; fuzzbar promises to
null-terminate the buffer it's passed
*/
while(*p++) {
/* stuff */
}
free(p);

would work for the implementation I suggest, with no need to keep a
copy
of the pointer returned by malloc() hanging around.
 
R

Richard Tobin

No. Because now `free(p)` has to work out from the value of `p`
where the block you're freeing starts, and it has to do so quickly
and compactly. That's a significant constraint on the implementation.
[/QUOTE]
I don't think it's much of a constraint. The allocator has to keep a
list of the pointers it's returned and the number of bytes they point
to

No, that would not be a realistic way to implement malloc(). A common
method is to store the size immediately before the block, so that the
size can be found by something like ((size_t *)block)[-1].

Another technique is to use different areas of memory for different
size blocks so that the size can be derived from the high-order
address bits, something like size_table[(intptr_t)block >> NBITS].
This one could be adapted straightforwardly to work with free()
applied to an internal pointer; the size computation would still work
and the start of the block could be found by taking the address mod
the discovered block size (I believe such implementations generally
only use a small number of block sizes).
How about a snippet like this?

char *p=malloc(100);
fuzzbar(p,100, otherstuff); /* does something; fuzzbar promises to
null-terminate the buffer it's passed
*/
while(*p++) {
/* stuff */
}

At this point p might be block+100, which might be the start of another
malloc()ed block. But in any case, I don't see that it's worth it.


-- Richard
 
A

anyone.anon

Richard said:
At this point p might be block+100, which might be the start of
another malloc()ed block. But in any case, I don't see that it's
worth it.

You misread my original post:
 
E

Eric Sosman

Chris said:
No. Because now `free(p)` has to work out from the value of `p`
where the block you're freeing starts, and it has to do so quickly
and compactly. That's a significant constraint on the implementation.

I don't think it's much of a constraint. The allocator has to keep a
list of the pointers it's returned and the number of bytes they point
to, so it would just be a case of free() testing something like
"p < pall + bytesall" instead of "p == pall" if it's passed
p.


For "the allocator has to" read "the allocator might."
On the other hand, the allocator might not. One common
technique (mentioned elsethread) is for the allocator to
store its metadata at a fixed distance from the allocated
block, usually just before it. Another might be for the
allocator to store the metadata in a hash table, keyed by
the bit pattern of the returned pointer. Either strategy
requires that the free() argument be a perfect match for
the value returned by *alloc().

For my own use in debugging I've written a memory package
that does in fact keep lists of allocated and free areas, trying
to keep the metadata "remote" from the areas themselves to avoid
damage from simple off-by-one and overrun errors. (I use a skip
list keyed on the address values; as written this is not portable
because it relies on being able to use < and > on pointers that
are not necessarily "related.") Given such lists it would be
easy to do what you want -- but the lists chew up a significant
amount of memory and their maintenance takes significant time.
If you don't mind spending time and memory to get the behavior
you want (and if you can stomach the abuse of < and >), you are
of course at liberty to write my_malloc() and my_free() to do
this for yourself. But inflicting the overhead on every user
of malloc() seems like too many bucks for too little bang.
 
C

CBFalconer

Richard said:
That can perfectly well happen with the existing specification.
If the implementation does not validate the value passed to free
a bogus value could be misinterpreted as a large block with the
result that arbitrary data could be overwritten.

It's never likely to be part of the specification of free() that
it will not destroy "badly needed data" if you pass it a "wild
pointer".

Certainly, but the resistance to a wild free is a QOI factor. The
nmalloc package performs some elementary checks (not unbeatable) to
avoid any such. Continuing would require the user to catch
SIGABRT, which is possible because the memory arena is not yet
fouled.
 
C

CBFalconer

Chris said:
No. Because now `free(p)` has to work out from the value of `p`
where the block you're freeing starts, and it has to do so quickly
and compactly. That's a significant constraint on the implementation.

I don't think it's much of a constraint. The allocator has to keep a
list of the pointers it's returned and the number of bytes they point
to, so it would just be a case of free() testing something like
"p < pall + bytesall" instead of "p == pall" if it's passed
p.


Which involves searching a list of possibly millions of allocated
memory chunks.
How about a snippet like this?

char *p=malloc(100);
fuzzbar(p,100, otherstuff); /* does something; fuzzbar promises to
null-terminate the buffer it's passed
*/
while(*p++) {
/* stuff */
}
free(p);

would work for the implementation I suggest, with no need to keep a
copy of the pointer returned by malloc() hanging around.

Except, at best, free becomes an O(n) operation, where n is the
number of mallocations made. This can be a killer. Just such a
fault was the impetus for creating nmalloc.
 
K

Keith Thompson

CBFalconer said:
(e-mail address removed) wrote: [...]
How about a snippet like this?

char *p=malloc(100);
fuzzbar(p,100, otherstuff); /* does something; fuzzbar promises to
null-terminate the buffer it's passed
*/
while(*p++) {
/* stuff */
}
free(p);

would work for the implementation I suggest, with no need to keep a
copy of the pointer returned by malloc() hanging around.

Except, at best, free becomes an O(n) operation, where n is the
number of mallocations made. This can be a killer. Just such a
fault was the impetus for creating nmalloc.

If the malloc subsystem keeps track of every current allocation,
there's no reason to assume that it can only search linearly. It
could easily use a binary tree, for example; finding the base address
of a block given an arbitrary address within the block could be
O(n log n). You could probably use a hash table, making it
potentially O(1), but the fact that you're searching for a range
rather than a specific value makes it more difficult; I'm too lazy to
work out the details.

But the overhead, even if it's O(1) would still be significant. It's
perfectly reasonable, I think, to place the burden on the programmer
to pass to free() the exact same pointer value was returned by
malloc().
 
C

CBFalconer

Eric said:
.... snip ...

For my own use in debugging I've written a memory package
that does in fact keep lists of allocated and free areas, trying
to keep the metadata "remote" from the areas themselves to avoid
damage from simple off-by-one and overrun errors. (I use a skip
list keyed on the address values; as written this is not portable
because it relies on being able to use < and > on pointers that
are not necessarily "related.") Given such lists it would be
easy to do what you want -- but the lists chew up a significant
amount of memory and their maintenance takes significant time.
If you don't mind spending time and memory to get the behavior
you want (and if you can stomach the abuse of < and >), you are
of course at liberty to write my_malloc() and my_free() to do
this for yourself. But inflicting the overhead on every user
of malloc() seems like too many bucks for too little bang.

Take a look at the way nmalloc, combined with malldbg, handles this
(at least on suitable systems, which should include most Linae).
See nmalloc.zip, at:

<http://cbfalconer.home.att.net/download/>

There is no (or very little) overhead imposed on users of malloc,
free, realloc. Yet the entire arena can be (and is, via malldbg)
scanned for usage and self-consistency. Since the basic storage
provided by sbrk can be discontinuous, a table of bases is
maintained. Memory leaks can be quickly detected by scans before
and after program runs (since malloc and friends may be used by the
initialization code, outside the program proper). If there is no
leak there will only be free blocks, and all should have been
combined, apart from the storage claimed and not released by the
initialization. Some systems will not use malloc during
initialization, in which case nothing should be allocated in either
scan.
 
R

Richard Tobin

Keith Thompson said:
If the malloc subsystem keeps track of every current allocation,
there's no reason to assume that it can only search linearly. It
could easily use a binary tree, for example; finding the base address
of a block given an arbitrary address within the block could be
O(n log n). You could probably use a hash table, making it
potentially O(1), but the fact that you're searching for a range
rather than a specific value makes it more difficult; I'm too lazy to
work out the details.

In another posting I described a BIBOP ("big bag of pages") style
allocator for which finding the base would be O(1) and quite simple:
the high order bits of the address determine the size, and finding
the base would just be a mod operation.

-- Richard
 
C

Chris Dollin

Chris Dollin wrote:

How about a snippet like this?

char *p=malloc(100);
fuzzbar(p,100, otherstuff); /* does something; fuzzbar promises to
null-terminate the buffer it's passed
*/
while(*p++) {
/* stuff */
}
free(p);

would work for the implementation I suggest, with no need to keep a
copy
of the pointer returned by malloc() hanging around.

Perhaps - but adding that one pointer doesn't seem to me to be
either tedious or obfuscatory, so it's not the example I was
asking for.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,042
Latest member
icassiem

Latest Threads

Top