Book-keeping metadata for allocated memory

pinkfloydhomer · Mar 15, 2006

Please read the example below. I'm sorry I couldn't make it smaller,
but it is pretty simple.

When client code is calling newThingy(), it is similar to malloc: the
client gets a pointer to a chunk of memory that it can use for it's own
purposes.

But on the "library" side, I want to be able to do some book-keeping
for each memory block that I hand out. I want some "hidden" meta-data
(that I keep in struct Container) to be associated with each memory
block that I return from newThingy().

The _problem_ lies in findContainer(). It runs in linear time. I would
like constant time. I would like not having to search for the Container
containing the allocated memory. What do I do?

#include <stdio.h>
#include <stddef.h>
#include <malloc.h>
#include <assert.h>

/* "library" code */

typedef struct Container_
{
int foo;
int bar;
void* thingy;
struct Container_* next;

} Container;

Container* list = NULL;

void* newThingy(size_t size)
{
void* thingy;
Container* container;

thingy = malloc(size);
assert(thingy);

container = (Container*) malloc(sizeof(Container));
assert(container);

container->foo = 42;
container->bar = 3;
container->thingy = thingy;

container->next = list;
list = container;

return thingy;
}

Container* findContainer(void* thingy)
{
Container* container;

for (container = list; container; container = container->next)
{
if (container->thingy == thingy)
return container;
}

return NULL;
}

void doSomething(void* thingy)
{
Container* container = findContainer(thingy);
assert(container);

/* do something */
printf("%d\n", container->foo);
}

/* "client" code */

typedef struct
{
int whatever;
float something;
} ClientThingy1;

typedef struct
{
char dontask;
int* whocares;
} ClientThingy2;

int main(void)
{
ClientThingy1* thingy1 = newThingy(sizeof(ClientThingy1));
ClientThingy2* thingy2 = newThingy(sizeof(ClientThingy2));

doSomething(thingy1);
doSomething(thingy2);

return 0;
}

I was considering something along the lines of allocating the memory
for Container and thingy in one go, like:

Container* container = malloc(sizeof(Container) + size);

and then return something like

return (container + sizeof(Container));

and then findContainer() would become constant time, something like:

Container* findContainer(void* thingy)
{
return thingy - sizeof(Container);
}

But is that portable and well-defined? Or are there other solutions?

/David

Rod Pemberton · Mar 15, 2006

Please read the example below. I'm sorry I couldn't make it smaller,
but it is pretty simple.

When client code is calling newThingy(), it is similar to malloc: the
client gets a pointer to a chunk of memory that it can use for it's own
purposes.

But on the "library" side, I want to be able to do some book-keeping
for each memory block that I hand out. I want some "hidden" meta-data
(that I keep in struct Container) to be associated with each memory
block that I return from newThingy().

The _problem_ lies in findContainer(). It runs in linear time. I would
like constant time. I would like not having to search for the Container
containing the allocated memory. What do I do?

But is that portable and well-defined? Or are there other solutions?

What you need is a method to map thingy's void* to your metadata's
Container*. I'd probably setup an array and a numeric hash function. The
hash function would convert a thingy void* to a unique integer which could
be used as an index into an array. The array would be of Container* 's. As
you allocate Container's, you call the hash function with thingy's void* and
store the Container* in the array. When you need to deallocate a Container,
you call the hash function with thingy's void * and retreive Container* from
the array. The implementation of the hash function may or may not be easy.

Rod Pemberton
PS. Learned what linear time and constant time are today...

pinkfloydhomer · Mar 15, 2006

What you need is a method to map thingy's void* to your metadata's
Container*.

True. I considered the hashing myself, but I was hoping that there was
a simpler solution such as the one I presented. I just don't know if
that solution is portable and well-defined.

/David

Ben C · Mar 15, 2006

A pointer returned by malloc is properly aligned for any use. But you
are returning an offset into the malloc'd block. I'm not sure how
you could guarantee that the offset is correctly aligned for your
needs, unless, as I read it, the things being allocated are always
structures. The standard says in section 6.2.5#25 "All pointers to
structure types shall have the same representation and alignment
requirements as each other.". I gather that means that offsetting
into a malloced block by the size of a structure will result in
pointer to memory correctly aligned for that structure, and hence for
other structures? If I'm wrong here, I'll no doubt be swiftly
corrected

It does sound a bit surprising. I tried this:

struct thing
{
char c;
};

int main(int argc, char **argv)
{
struct thing *t = malloc(100 * sizeof (struct thing));
printf("%p, %p, %d, %d\n", t, t+1, t+1 - t, sizeof (struct thing));
return 0;
}

and it printed out: 0x804a008, 0x804a009, 1, 1

So t+1 is not aligned on my system.

But there are various ways to guarantee the alignment for a particular
struct, as I suggested in an earlier post.

If the offset pointer needs to have the same alignment as a pointer
originally returned from malloc (i.e. maximum you ever might need), you
have to know what that is.

Practically speaking you probably just define a macro for it, but that's
not ideal, I wonder if there is a better way of determining your maximum
required alignment?

Eric Sosman · Mar 15, 2006

David said:
A pointer returned by malloc is properly aligned for any use.
But you are returning an offset into the malloc'd block.
I'm not sure how you could guarantee that the offset
is correctly aligned for your needs, unless, as I read it, the
things being allocated are always structures. The
standard says in section 6.2.5#25 "All pointers to structure
types shall have the same representation and alignment
requirements as each other.". I gather that means that
offsetting into a malloced block by the size of a structure
will result in pointer to memory correctly aligned for that
structure, and hence for other structures? If I'm wrong here,
I'll no doubt be swiftly corrected

"Take this offender to the deepest dungeon," said Tom,
condescendingly. (That's a Tom Swifty correction ;-)

Although all struct pointers have the same alignment
requirement, the different structs themselves need not.

struct small { char x; } *smallp;
struct large { long double y; } *largep;

6.2.5/25 says that `smallp' and `largep' themselves have
the same alignment requirement. However, the alignment
for a `struct small' may be different from the alignment
of a `struct large', and both may be different from the
alignment of the pointer variables.

David Resnick · Mar 15, 2006

"Take this offender to the deepest dungeon," said Tom,
condescendingly. (That's a Tom Swifty correction ;-)

Although all struct pointers have the same alignment
requirement, the different structs themselves need not.

struct small { char x; } *smallp;
struct large { long double y; } *largep;

6.2.5/25 says that `smallp' and `largep' themselves have
the same alignment requirement. However, the alignment
for a `struct small' may be different from the alignment
of a `struct large', and both may be different from the
alignment of the pointer variables.

OK, I was tired. Thanks for the correction. So my point
was correct, that an offset into the malloc'd block might well
not be aligned suitably for whatever use he has for it.
I had a feeling I was misreading something there...

-David

Chris Torek · Mar 15, 2006

If the offset pointer needs to have the same alignment as a pointer
originally returned from malloc (i.e. maximum you ever might need), you
have to know what that is.

Indeed -- the problem is isomorphic to that of writing malloc() in
the first place.

Practically speaking you probably just define a macro for it, but that's
not ideal, I wonder if there is a better way of determining your maximum
required alignment?

The answer seems to be "no".

For 4.xBSD we put a pair of macros into <machine/param.h>, ALIGNBYTES
and ALIGN. ALIGNBYTES is the maximum number of bytes that will be
used by the ALIGN macro, while ALIGN(p) returns p plus "enough
bytes to be maximally aligned". Given:

unsigned char *p1, *p2;
... set p1 to some value ...
p2 = ALIGN(p1);

you are guaranteed that p2-p1 <= ALIGNBYTES. Hence ALIGNBYTES tells
you how many "extra" bytes to allocate, and you can build on malloc()
by doing something like this:

void *layer_on_malloc(size_t size) {
size_t need;
struct Container *p;

need = size + ALIGNBYTES + sizeof *p;
p = malloc(need);
if (p == NULL) ... handle error ...
p->... = ...; /* set up our various fields */
return ALIGN((char *)(p + 1));
}

Computing the orignal value of p is a little trickier:

void layer_on_free(void *mem) {
char *p2;
struct Container *p;

if (mem == NULL)
return;

/* this is the address we gave out: */
p2 = mem;

/* this is where the Container would be, if there were no alignment */
p2 -= sizeof *p;

/*
* ... but p2 might be too far forward, because ALIGN() might have
* added something. Now we get sneaky, and depend on the fact that
* ALIGN always adds 0..ALIGNBYTES and winds up on an aligned
* boundary, and we depend on no bad things happening with the
* undefined behavior of underflow.
*/
p2 -= ALIGNBYTES;
p2 = ALIGN(p2);
p = (struct Container *)p2;
...
free(p);
}

Another way to compute p2 (without undefined behavior, but depending
on ALIGNBYTES to be a power of 2):

p2 = mem;
p2 -= sizeof *p;
p2 -= (ALIGNBYTES - ALIGN(p2)) & ~ALIGNBYTES;

(I think this works but have not tested it).

(<machine/param.h> is of course not a Standard C header. The idea
here is that <machine/foo.h>, for any name "foo", defines various
machine-dependent values and structures and so on for whatever
machine you are targeting with your compile. That is, if you are
running the compiler to produde x86 code, this gets you <x86/param.h>,
while if you are running the compiler to produce SPARC code, this
gets you <sparc/param.h>, and so on. Each machine-specific header
has to be written for each machine, and not every header defines
everything, but if something makes sense on a given machine, we
create some user-defined abstraction that enables writing code that
works on all machine that have similar features. Hence, for machines
that have stack frames in the first place, <machine/frame.h> defines
the stack frame format(s). Of course, not every machine has a
separate stack and frame pointer, so simply using the frame.h header
does not instantly make your debugger portable.)

CBFalconer · Mar 15, 2006

Eric said:
There are some difficulties in coming up with a portable
hash function. You could try converting the pointer to an
integer (uintptr_t, if you've got it) and hashing the integer,
but the conversion is implementation-defined and need not be
useful. You could inspect the individual bytes of a pointer's
representation and derive a hash value from them, but this
requires (1) that all the pointer's bits be "value bits" or
that the implementation gives any "padding bits" consistent
values, and (2) that each pointer have just one representation
or that the implementation consistently uses just one of the
possible representations as "canonical."

Another non-portable possibility would be to use a comparison-
based data structure like a skip list or AVL tree, using the
pointer value itself as the search key. The non-portability in
this approach is that the relational operators <, <=, >=, > are
not defined on arbitrary pointers, but only on pointers to
elements of a single array. Nonetheless, many systems actually
do give meaningful results from comparing "random" pointers, and
if you're willing to restrict yourself to such systems -- not too
great a restriction, perhaps -- this may be an acceptable approach.
Its advantage over hashing is that you can easily find not only
the item you're searching for, but also its neighbors (if any),
an operation one often needs in memory management packages.

Whatever you choose, be aware that although the method is
likely to work on many systems it is not actually guaranteed by
the Standard and there may be systems where it will fail. If you
use a hashing method, for example, be prepared to plug in different
hashes for different systems to accommodate local peculiarities.
If at all possible, write a sanity-checking program that tries to
test the non-portable assumptions you rely on; this program might
even suggest which of several alternative hashing methods you
should use.

I think he could do quite well using the pointer to integer cast
for hashing purposes, and feeding the result to my hashlib
package. He would have to devise a second hash for the rehash,
which could be quite simple, such as the bit complement of the
cast. The point is that with a good hashtable system the only
penalty for a bad hash is speed - the data integrity is not
harmed. The system lets him easily modify the hash functions and
gather statistics on the resultant performance.

<http://cbfalconer.home.att.net/download/hashlib.zip>

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

Jack Klein · Mar 16, 2006

Please read the example below. I'm sorry I couldn't make it smaller,
but it is pretty simple.

When client code is calling newThingy(), it is similar to malloc: the
client gets a pointer to a chunk of memory that it can use for it's own
purposes.

But on the "library" side, I want to be able to do some book-keeping
for each memory block that I hand out. I want some "hidden" meta-data
(that I keep in struct Container) to be associated with each memory
block that I return from newThingy().

The _problem_ lies in findContainer(). It runs in linear time. I would
like constant time. I would like not having to search for the Container
containing the allocated memory. What do I do?

Others have addressed your questions directly, but:

#include <stdio.h>
#include <stddef.h>
#include <malloc.h>

There is header named malloc.h in standard C. The memory allocation
functions, malloc(), calloc(), realloc(), and free() are prototyped in

#include <assert.h>

/* "library" code */

typedef struct Container_
{
int foo;
int bar;
void* thingy;
struct Container_* next;

} Container;

Personal observation: I was once a big fan of using typedef to define
aliases for structure types, but I have come around to being against
it, except for the very few cases where certain self-referential or
recursive declarations can't be done without it.

But it you must, or just prefer to, define an alias for a structure
type with typedef, and you prefer to or must provide a structure tag,
there is no reason to make them different. They are in different
namespaces.

A declaration like:

typedef struct Container
{
/* members */
} Container;

....is just fine.

Container* list = NULL;

void* newThingy(size_t size)
{
void* thingy;
Container* container;

thingy = malloc(size);
assert(thingy);

container = (Container*) malloc(sizeof(Container));

The line above is the biggie. Never cast the return value of malloc()
and siblings in C, especially not to shut up a compiler message about
type mismatch. That only hides the error of failing to include
<stdlib.h> and failing to have a prototype in scope.

Secondly, never use sizeof(type) in calls to memory allocation
functions when you have a pointer to the proper type. To allocate
space for 'n' objects of type T and assign the result to a pointer to
T, do this:

T *t_pointer;

t_pointer = malloc(n * sizeof *t_pointer);

The reason? If you ever chose to use a different type, merely
changing the type of the pointer automatically makes the proper change
to the sizeof expression.

Keith Thompson · Mar 16, 2006

Jack Klein said:
Personal observation: I was once a big fan of using typedef to define
aliases for structure types, but I have come around to being against
it, except for the very few cases where certain self-referential or
recursive declarations can't be done without it.

[...]

What cases are those? Usually it's *easier* to declare a
self-referential type without a typedef (just a struct tag).
For example:

struct node {
int data;
struct node *next;
};

The alternative with just a typedef doesn't work:

typedef struct {
int data;
node *next; /* illegal, the name "node" isn't visible yet */
} node;

and the form with both a typedef and a struct tag:

typedef struct node {
int data;
struct node *next;
} node;

requires you to use "struct node" within the definition, but allows
either "struct node" or "node" once the definition is completed.

Or you can use a typedef for an incomplete type:

typedef struct node node;
struct node {
int data;
node *next;
};

Of all the alternatives, I find the one with no typedef the simplest.

I'd use a typedef for a struct only when I want to hide the fact that
the type is a struct; for example, the standard's use of "FILE" rather
than "struct file" makes it clear that client code isn't supposed to
use its members.

Ben Pfaff · Mar 16, 2006

CBFalconer said:
I think he could do quite well using the pointer to integer
cast for hashing purposes, and feeding the result to my hashlib
package.

That won't work on systems where a given pointer has multiple
representations, e.g. large model real-mode x86.

CBFalconer · Mar 16, 2006

Ben said:
That won't work on systems where a given pointer has multiple
representations, e.g. large model real-mode x86.

In that case the hash and comparison routines should take care to
first normalize the input pointer. It's just overhead, but system
dependant overhead.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

Ben Pfaff · Mar 16, 2006

CBFalconer said:
Ben said:

That won't work on systems where a given pointer has multiple
representations, e.g. large model real-mode x86.

Click to expand...

In that case the hash and comparison routines should take care to
first normalize the input pointer. [...]

I was assuming the OP wanted a portable solution.

CBFalconer · Mar 16, 2006

Ben said:
CBFalconer said:

Ben said:

I think he could do quite well using the pointer to integer
cast for hashing purposes, and feeding the result to my hashlib
package.

That won't work on systems where a given pointer has multiple
representations, e.g. large model real-mode x86.

Click to expand...

In that case the hash and comparison routines should take care to
first normalize the input pointer. [...]

Click to expand...

I was assuming the OP wanted a portable solution.

That seems hard to provide considering that the only testable
relationship between arbitrary pointers is equality. So you have
to document the assumptions. Luckily there are, in practice, only
a finite number of ways to implement pointers. Also, the
guaranteed testable relationship is enough to implement a hashtable
mechanism, provided you can find a way of creating a hash. Note
that the normalization above is only for the purposes of the hash
function. The C system should already be taking care of it for the
comparison function.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

Ben Pfaff · Mar 16, 2006

CBFalconer said:
Ben said:

CBFalconer said:

Ben Pfaff wrote:

I think he could do quite well using the pointer to integer
cast for hashing purposes, and feeding the result to my hashlib
package.

That won't work on systems where a given pointer has multiple
representations, e.g. large model real-mode x86.

In that case the hash and comparison routines should take care to
first normalize the input pointer. [...]

Click to expand...

I was assuming the OP wanted a portable solution.

Click to expand...

That seems hard to provide considering that the only testable
relationship between arbitrary pointers is equality. [...]

I would say impossible, not just hard.

It seems like a bad idea to propose a solution on clc without
pointing out that it is non-portable. I've pointed it out now,
so that's fine.

Dave Thompson · Mar 27, 2006

There is header named malloc.h in standard C. The memory allocation
functions, malloc(), calloc(), realloc(), and free() are prototyped in
<stdlib.h>. Replace the non-standard malloc.h with <stdlib.h>

Obviously you meant is _no_ (or is not a) standard malloc.h.

(I can't believe no one else got this before laggard me.)

- David.Thompson1 at worldnet.att.net

zero up memory	44	Jan 23, 2012
C program: memory leak/ segmentation fault/ memory limit exceeded	0	Nov 12, 2022
why 50% of allocated memory is not freed?	32	Feb 24, 2009
Adding adressing of IPv6 to program	1	Feb 16, 2023
Memory allocated during run time	24	Aug 31, 2009
Engineering a List container Part 2: Implementations	20	Dec 8, 2013
Engineering a list container. Part 1.	71	Dec 7, 2013
Can one get away with an under-allocated union?	5	Dec 25, 2010

Book-keeping metadata for allocated memory

pinkfloydhomer

Rod Pemberton

pinkfloydhomer

Ben C

Eric Sosman

David Resnick

Chris Torek

CBFalconer

Jack Klein

Keith Thompson

Ben Pfaff

CBFalconer

Ben Pfaff

CBFalconer

Ben Pfaff

Dave Thompson

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads