Better free(invalid pointer) behavior?

Discussion in 'C Programming' started by anyone.anon@googlemail.com, Feb 23, 2007.

  1. Guest

    Let p=malloc(N) for some N>0. As far as I understand it, free(p+k) for
    0<k<N causes undefined behavior, since only a pointer returned by (m|
    re|c)alloc() can validly be passed to free().

    This seems pretty silly. Wouldn't a better behavior of free() be to
    assume that if it receives q, where q lies in some malloc()ated-and-
    not-yet-free()d block starting at p, then it should interpret this as
    free(p)?

    Even better, the *alloc() routines could leave a spare byte between
    the blocks they return so that free(p+N) would also be equivalent to
    free(p).

    This would let one move p around within the allocated block, without
    always having to keep track of the start location of the block - a
    tedious business that seems only to obfuscate one's code.
    , Feb 23, 2007
    #1
    1. Advertising

  2. Ian Malone Guest

    wrote:
    > Let p=malloc(N) for some N>0. As far as I understand it, free(p+k) for
    > 0<k<N causes undefined behavior, since only a pointer returned by (m|
    > re|c)alloc() can validly be passed to free().
    >
    > This seems pretty silly. Wouldn't a better behavior of free() be to
    > assume that if it receives q, where q lies in some malloc()ated-and-
    > not-yet-free()d block starting at p, then it should interpret this as
    > free(p)?
    >


    To me it seems fairly logical; you get an object from the
    the interface, then pass it back to be destroyed. It's
    a common design, the confusion that can afflict beginners
    is that in the case of a returned pointer from malloc the
    object is both the object to be sent back to the interface
    and a pointer which can be used to access a memory region.

    > Even better, the *alloc() routines could leave a spare byte between
    > the blocks they return so that free(p+N) would also be equivalent to
    > free(p).
    >
    > This would let one move p around within the allocated block, without
    > always having to keep track of the start location of the block - a
    > tedious business that seems only to obfuscate one's code.
    >


    It's an intriguing idea, though it sounds as if it would slow
    down and complicate malloc implementation (necessitating a
    search for the allocated block. When do you find that keeping
    the object you asked for causes obfuscation?

    --
    imalone
    Ian Malone, Feb 23, 2007
    #2
    1. Advertising

  3. Chris Dollin Guest

    wrote:

    > Let p=malloc(N) for some N>0. As far as I understand it, free(p+k) for
    > 0<k<N causes undefined behavior, since only a pointer returned by (m|
    > re|c)alloc() can validly be passed to free().


    `free(p+k)` is undefined for any k > 0, in fact: your restriction
    `<N` is misleading.

    > This seems pretty silly.


    It seems completely reasonable to /me/. Give back what you took
    out.

    > Wouldn't a better behavior of free() be to
    > assume that if it receives q, where q lies in some malloc()ated-and-
    > not-yet-free()d block starting at p, then it should interpret this as
    > free(p)?


    No. Because now `free(p)` has to work out from the value of `p`
    where the block you're freeing starts, and it has to do so quickly
    and compactly. That's a significant constraint on the implementation.

    > Even better, the *alloc() routines could leave a spare byte between
    > the blocks they return so that free(p+N) would also be equivalent to
    > free(p).


    I don't see why this is "better".

    > This would let one move p around within the allocated block, without
    > always having to keep track of the start location of the block - a
    > tedious business that seems only to obfuscate one's code.


    I've never found it tedious, and I've never found it obfuscating. It's
    at most one pointer with a sensible name, isn't it?

    Perhaps you should exhibit an example.

    --
    Chris "electric hedgehog" Dollin
    "Life is full of mysteries. Consider this one of them." Sinclair, /Babylon 5/
    Chris Dollin, Feb 23, 2007
    #3
  4. santosh Guest

    wrote:
    > Let p=malloc(N) for some N>0. As far as I understand it, free(p+k) for
    > 0<k<N causes undefined behavior, since only a pointer returned by (m|
    > re|c)alloc() can validly be passed to free().


    A pointer value previously returned by *alloc, yes.

    > This seems pretty silly. Wouldn't a better behavior of free() be to
    > assume that if it receives q, where q lies in some malloc()ated-and-
    > not-yet-free()d block starting at p, then it should interpret this as
    > free(p)?
    >
    > Even better, the *alloc() routines could leave a spare byte between
    > the blocks they return so that free(p+N) would also be equivalent to
    > free(p).
    >
    > This would let one move p around within the allocated block, without
    > always having to keep track of the start location of the block - a
    > tedious business that seems only to obfuscate one's code.


    IMHO, it's cleaner to pass to free what you got from *alloc, not some
    arbitrary value. Almost always you need to keep track of the start of
    the block anyway, so it's not as if it's difficult.

    To refrain from breaking existing implementations, the standard has
    mandated this rule. It's not an inconvinient rule, and it encourages
    good programming practise of preserving pointer values, before
    modifying them.
    santosh, Feb 23, 2007
    #4
  5. >Let p=malloc(N) for some N>0. As far as I understand it, free(p+k) for
    >0<k<N causes undefined behavior, since only a pointer returned by (m|
    >re|c)alloc() can validly be passed to free().
    >
    >This seems pretty silly. Wouldn't a better behavior of free() be to
    >assume that if it receives q, where q lies in some malloc()ated-and-
    >not-yet-free()d block starting at p, then it should interpret this as
    >free(p)?


    That's assuming the implementation keeps enough information around to
    TELL if this is the case. Some don't. It may also take what's considered
    by some to be too much CPU time to do that.

    >Even better, the *alloc() routines could leave a spare byte between
    >the blocks they return so that free(p+N) would also be equivalent to
    >free(p).
    >
    >This would let one move p around within the allocated block, without
    >always having to keep track of the start location of the block - a
    >tedious business that seems only to obfuscate one's code.


    Why not ask for full garbage collection? That's more useful in more places.
    Very difficult to implement efficiently, though, especially with terabyte
    files open.

    If you can't (or won't) keep track of where the beginning of the
    block is, how do you avoid going off the beginning or the end of
    it? That's a common vulnerability in software.
    Gordon Burditt, Feb 23, 2007
    #5
  6. CBFalconer Guest

    Chris Dollin wrote:
    > wrote:
    >
    >> Let p=malloc(N) for some N>0. As far as I understand it, free(p+k)
    >> for 0<k<N causes undefined behavior, since only a pointer returned
    >> by (m| re|c)alloc() can validly be passed to free().

    >
    > `free(p+k)` is undefined for any k > 0, in fact: your restriction
    > `<N` is misleading.
    >
    >> This seems pretty silly.

    >
    > It seems completely reasonable to /me/. Give back what you took
    > out.


    Even worse, a faulty free of some wild pointer could quietly
    destroy badly needed data. Convenient does not mean safe.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>
    CBFalconer, Feb 23, 2007
    #6
  7. In article <>,
    CBFalconer <> wrote:

    >Even worse, a faulty free of some wild pointer could quietly
    >destroy badly needed data. Convenient does not mean safe.


    That can perfectly well happen with the existing specification. If
    the implementation does not validate the value passed to free a bogus
    value could be misinterpreted as a large block with the result that
    arbitrary data could be overwritten.

    It's never likely to be part of the specification of free() that it
    will not destroy "badly needed data" if you pass it a "wild pointer".

    -- Richard
    --
    "Consideration shall be given to the need for as many as 32 characters
    in some alphabets" - X3.4, 1963.
    Richard Tobin, Feb 23, 2007
    #7
  8. Guest

    Chris Dollin wrote:
    > > Wouldn't a better behavior of free() be to
    > > assume that if it receives q, where q lies in some malloc()ated-and-
    > > not-yet-free()d block starting at p, then it should interpret this as
    > > free(p)?

    >
    > No. Because now `free(p)` has to work out from the value of `p`
    > where the block you're freeing starts, and it has to do so quickly
    > and compactly. That's a significant constraint on the implementation.


    I don't think it's much of a constraint. The allocator has to keep a
    list of the pointers it's returned and the number of bytes they point
    to, so it would just be a case of free() testing something like
    "p < pall + bytesall" instead of "p == pall" if it's passed
    p.

    > I've never found it tedious, and I've never found it obfuscating. It's
    > at most one pointer with a sensible name, isn't it?
    >
    > Perhaps you should exhibit an example.


    How about a snippet like this?

    char *p=malloc(100);
    fuzzbar(p,100, otherstuff); /* does something; fuzzbar promises to
    null-terminate the buffer it's passed
    */
    while(*p++) {
    /* stuff */
    }
    free(p);

    would work for the implementation I suggest, with no need to keep a
    copy
    of the pointer returned by malloc() hanging around.
    , Feb 23, 2007
    #8
  9. In article <>,
    <> wrote:

    >> No. Because now `free(p)` has to work out from the value of `p`
    >> where the block you're freeing starts, and it has to do so quickly
    >> and compactly. That's a significant constraint on the implementation.


    >I don't think it's much of a constraint. The allocator has to keep a
    >list of the pointers it's returned and the number of bytes they point
    >to


    No, that would not be a realistic way to implement malloc(). A common
    method is to store the size immediately before the block, so that the
    size can be found by something like ((size_t *)block)[-1].

    Another technique is to use different areas of memory for different
    size blocks so that the size can be derived from the high-order
    address bits, something like size_table[(intptr_t)block >> NBITS].
    This one could be adapted straightforwardly to work with free()
    applied to an internal pointer; the size computation would still work
    and the start of the block could be found by taking the address mod
    the discovered block size (I believe such implementations generally
    only use a small number of block sizes).

    >How about a snippet like this?
    >
    >char *p=malloc(100);
    >fuzzbar(p,100, otherstuff); /* does something; fuzzbar promises to
    > null-terminate the buffer it's passed
    >*/
    >while(*p++) {
    > /* stuff */
    >}


    At this point p might be block+100, which might be the start of another
    malloc()ed block. But in any case, I don't see that it's worth it.


    -- Richard

    --
    "Consideration shall be given to the need for as many as 32 characters
    in some alphabets" - X3.4, 1963.
    Richard Tobin, Feb 23, 2007
    #9
  10. Guest

    Richard Tobin wrote:
    > At this point p might be block+100, which might be the start of
    > another malloc()ed block. But in any case, I don't see that it's
    > worth it.


    You misread my original post:

    > Even better, the *alloc() routines could leave a spare byte between
    > the blocks they return so that free(p+N) would also be equivalent to
    > free(p).
    , Feb 23, 2007
    #10
  11. Eric Sosman Guest

    wrote:
    > Chris Dollin wrote:
    >>> Wouldn't a better behavior of free() be to
    >>> assume that if it receives q, where q lies in some malloc()ated-and-
    >>> not-yet-free()d block starting at p, then it should interpret this as
    >>> free(p)?

    >> No. Because now `free(p)` has to work out from the value of `p`
    >> where the block you're freeing starts, and it has to do so quickly
    >> and compactly. That's a significant constraint on the implementation.

    >
    > I don't think it's much of a constraint. The allocator has to keep a
    > list of the pointers it's returned and the number of bytes they point
    > to, so it would just be a case of free() testing something like
    > "p < pall + bytesall" instead of "p == pall" if it's passed
    > p.


    For "the allocator has to" read "the allocator might."
    On the other hand, the allocator might not. One common
    technique (mentioned elsethread) is for the allocator to
    store its metadata at a fixed distance from the allocated
    block, usually just before it. Another might be for the
    allocator to store the metadata in a hash table, keyed by
    the bit pattern of the returned pointer. Either strategy
    requires that the free() argument be a perfect match for
    the value returned by *alloc().

    For my own use in debugging I've written a memory package
    that does in fact keep lists of allocated and free areas, trying
    to keep the metadata "remote" from the areas themselves to avoid
    damage from simple off-by-one and overrun errors. (I use a skip
    list keyed on the address values; as written this is not portable
    because it relies on being able to use < and > on pointers that
    are not necessarily "related.") Given such lists it would be
    easy to do what you want -- but the lists chew up a significant
    amount of memory and their maintenance takes significant time.
    If you don't mind spending time and memory to get the behavior
    you want (and if you can stomach the abuse of < and >), you are
    of course at liberty to write my_malloc() and my_free() to do
    this for yourself. But inflicting the overhead on every user
    of malloc() seems like too many bucks for too little bang.

    --
    Eric Sosman
    lid
    Eric Sosman, Feb 23, 2007
    #11
  12. CBFalconer Guest

    Richard Tobin wrote:
    > CBFalconer <> wrote:
    >
    >> Even worse, a faulty free of some wild pointer could quietly
    >> destroy badly needed data. Convenient does not mean safe.

    >
    > That can perfectly well happen with the existing specification.
    > If the implementation does not validate the value passed to free
    > a bogus value could be misinterpreted as a large block with the
    > result that arbitrary data could be overwritten.
    >
    > It's never likely to be part of the specification of free() that
    > it will not destroy "badly needed data" if you pass it a "wild
    > pointer".


    Certainly, but the resistance to a wild free is a QOI factor. The
    nmalloc package performs some elementary checks (not unbeatable) to
    avoid any such. Continuing would require the user to catch
    SIGABRT, which is possible because the memory arena is not yet
    fouled.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>
    CBFalconer, Feb 23, 2007
    #12
  13. CBFalconer Guest

    wrote:
    > Chris Dollin wrote:
    >>
    >>> Wouldn't a better behavior of free() be to assume that if it
    >>> receives q, where q lies in some malloc()ated-and-not-yet-free()d
    >>> block starting at p, then it should interpret this as free(p)?

    >>
    >> No. Because now `free(p)` has to work out from the value of `p`
    >> where the block you're freeing starts, and it has to do so quickly
    >> and compactly. That's a significant constraint on the implementation.

    >
    > I don't think it's much of a constraint. The allocator has to keep a
    > list of the pointers it's returned and the number of bytes they point
    > to, so it would just be a case of free() testing something like
    > "p < pall + bytesall" instead of "p == pall" if it's passed
    > p.


    Which involves searching a list of possibly millions of allocated
    memory chunks.

    >
    >> I've never found it tedious, and I've never found it obfuscating.
    >> It's at most one pointer with a sensible name, isn't it?
    >>
    >> Perhaps you should exhibit an example.

    >
    > How about a snippet like this?
    >
    > char *p=malloc(100);
    > fuzzbar(p,100, otherstuff); /* does something; fuzzbar promises to
    > null-terminate the buffer it's passed
    > */
    > while(*p++) {
    > /* stuff */
    > }
    > free(p);
    >
    > would work for the implementation I suggest, with no need to keep a
    > copy of the pointer returned by malloc() hanging around.


    Except, at best, free becomes an O(n) operation, where n is the
    number of mallocations made. This can be a killer. Just such a
    fault was the impetus for creating nmalloc.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>
    CBFalconer, Feb 23, 2007
    #13
  14. In article <>,
    <> wrote:
    >You misread my original post:


    s/misread/forgot/

    -- Richard
    --
    "Consideration shall be given to the need for as many as 32 characters
    in some alphabets" - X3.4, 1963.
    Richard Tobin, Feb 23, 2007
    #14
  15. CBFalconer <> writes:
    > wrote:

    [...]
    >> How about a snippet like this?
    >>
    >> char *p=malloc(100);
    >> fuzzbar(p,100, otherstuff); /* does something; fuzzbar promises to
    >> null-terminate the buffer it's passed
    >> */
    >> while(*p++) {
    >> /* stuff */
    >> }
    >> free(p);
    >>
    >> would work for the implementation I suggest, with no need to keep a
    >> copy of the pointer returned by malloc() hanging around.

    >
    > Except, at best, free becomes an O(n) operation, where n is the
    > number of mallocations made. This can be a killer. Just such a
    > fault was the impetus for creating nmalloc.


    If the malloc subsystem keeps track of every current allocation,
    there's no reason to assume that it can only search linearly. It
    could easily use a binary tree, for example; finding the base address
    of a block given an arbitrary address within the block could be
    O(n log n). You could probably use a hash table, making it
    potentially O(1), but the fact that you're searching for a range
    rather than a specific value makes it more difficult; I'm too lazy to
    work out the details.

    But the overhead, even if it's O(1) would still be significant. It's
    perfectly reasonable, I think, to place the burden on the programmer
    to pass to free() the exact same pointer value was returned by
    malloc().

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Feb 23, 2007
    #15
  16. CBFalconer Guest

    Eric Sosman wrote:
    >

    .... snip ...
    >
    > For my own use in debugging I've written a memory package
    > that does in fact keep lists of allocated and free areas, trying
    > to keep the metadata "remote" from the areas themselves to avoid
    > damage from simple off-by-one and overrun errors. (I use a skip
    > list keyed on the address values; as written this is not portable
    > because it relies on being able to use < and > on pointers that
    > are not necessarily "related.") Given such lists it would be
    > easy to do what you want -- but the lists chew up a significant
    > amount of memory and their maintenance takes significant time.
    > If you don't mind spending time and memory to get the behavior
    > you want (and if you can stomach the abuse of < and >), you are
    > of course at liberty to write my_malloc() and my_free() to do
    > this for yourself. But inflicting the overhead on every user
    > of malloc() seems like too many bucks for too little bang.


    Take a look at the way nmalloc, combined with malldbg, handles this
    (at least on suitable systems, which should include most Linae).
    See nmalloc.zip, at:

    <http://cbfalconer.home.att.net/download/>

    There is no (or very little) overhead imposed on users of malloc,
    free, realloc. Yet the entire arena can be (and is, via malldbg)
    scanned for usage and self-consistency. Since the basic storage
    provided by sbrk can be discontinuous, a table of bases is
    maintained. Memory leaks can be quickly detected by scans before
    and after program runs (since malloc and friends may be used by the
    initialization code, outside the program proper). If there is no
    leak there will only be free blocks, and all should have been
    combined, apart from the storage claimed and not released by the
    initialization. Some systems will not use malloc during
    initialization, in which case nothing should be allocated in either
    scan.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>
    CBFalconer, Feb 23, 2007
    #16
  17. In article <>,
    Keith Thompson <> wrote:

    >If the malloc subsystem keeps track of every current allocation,
    >there's no reason to assume that it can only search linearly. It
    >could easily use a binary tree, for example; finding the base address
    >of a block given an arbitrary address within the block could be
    >O(n log n). You could probably use a hash table, making it
    >potentially O(1), but the fact that you're searching for a range
    >rather than a specific value makes it more difficult; I'm too lazy to
    >work out the details.


    In another posting I described a BIBOP ("big bag of pages") style
    allocator for which finding the base would be O(1) and quite simple:
    the high order bits of the address determine the size, and finding
    the base would just be a mod operation.

    -- Richard
    --
    "Consideration shall be given to the need for as many as 32 characters
    in some alphabets" - X3.4, 1963.
    Richard Tobin, Feb 23, 2007
    #17
  18. Chris Dollin Guest

    wrote:

    > Chris Dollin wrote:


    >> I've never found it tedious, and I've never found it obfuscating. It's
    >> at most one pointer with a sensible name, isn't it?
    >>
    >> Perhaps you should exhibit an example.

    >
    > How about a snippet like this?
    >
    > char *p=malloc(100);
    > fuzzbar(p,100, otherstuff); /* does something; fuzzbar promises to
    > null-terminate the buffer it's passed
    > */
    > while(*p++) {
    > /* stuff */
    > }
    > free(p);
    >
    > would work for the implementation I suggest, with no need to keep a
    > copy
    > of the pointer returned by malloc() hanging around.


    Perhaps - but adding that one pointer doesn't seem to me to be
    either tedious or obfuscatory, so it's not the example I was
    asking for.

    --
    Chris "electric hedgehog" Dollin
    The "good old days" used to be much better.
    Chris Dollin, Feb 26, 2007
    #18
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Lucas Machado

    pointer-to-pointer (invalid lvalue in unary `&)

    Lucas Machado, Apr 3, 2004, in forum: C Programming
    Replies:
    19
    Views:
    11,004
    Irrwahn Grausewitz
    Apr 15, 2004
  2. c language

    free(): invalid pointer

    c language, Jun 14, 2006, in forum: C Programming
    Replies:
    4
    Views:
    562
    Michael Mair
    Jun 14, 2006
  3. Cyron

    free(): invalid pointer:

    Cyron, Sep 30, 2007, in forum: C Programming
    Replies:
    3
    Views:
    411
    Richard Tobin
    Sep 30, 2007
  4. george
    Replies:
    0
    Views:
    1,084
    george
    Aug 29, 2008
  5. mohammed_a_o
    Replies:
    0
    Views:
    259
    mohammed_a_o
    Nov 30, 2010
Loading...

Share This Page