Reference counting

L

luser- -droog

Or you can put your accounting object at/near the end of the allocated
memory, as mentioned else-thread.  Would anyone else mind mentioning
this or does nobody really do that?  I thought it was fairly common
practice for some I/O buffers.  Filtering... *sigh*

With a suitable convention established for finding the end, sure.
But the simplest way to find the end is to put a size field or
pointer at the beginning, and we're back to the same alignment
question. So it's hard to explore that option without making the
entire issue more complicated.
 
S

Shao Miller

With a suitable convention established for finding the end, sure.
But the simplest way to find the end is to put a size field or
pointer at the beginning, and we're back to the same alignment
question. So it's hard to explore that option without making the
entire issue more complicated.

I agree that a disadvantage to accounting at the end is the convention
for finding the end.

The example code offered elsethread has a function that takes a 'size_t'
which must match what was passed to the allocator, and passing this
around is certainly overhead.

For a simple string-accounting struct which keeps track of the size, one
can pass 'sizeof *my_string + my_string->max_length' (or whatever), but
you're right; that seems like an awkward thing to pass to a
reference-releasing function.

For statically-sized objects, a macro makes things a bit easier (note
that 'ptr' is only evaluated once, for whatever that's worth):

#define release(ptr) (release_((ptr), sizeof *(ptr)))

Here's a code example, below. It ought to compile as one file. If
anyone has time to review it, what're some advantages and disadvantages
for the approach?:

#define ALL_IN_ONE 1


/**** ref_cnt.h */

/* For 'size_t' */
#include <stddef.h>

/*** Macros */
#define GetRefForFixedSizeObj(obj) \
(GetRefFromPtrAndSize((obj), sizeof *(obj)))

/*** Object types */
typedef void ** ref_counted_obj;

/*** Function types */
typedef void f_ref_cnt_free(ref_counted_obj);

/*** Functions */
extern ref_counted_obj GetRefFromPtrAndSize(void *, size_t);
extern ref_counted_obj NewRefCountedObj(size_t, f_ref_cnt_free *);
extern ref_counted_obj GetRef(ref_counted_obj);
extern void PutRef(ref_counted_obj);


/**** ref_cnt.c */

#include <stdlib.h>
#if !ALL_IN_ONE
#include <stddef.h>
#include "ref_cnt.h"
#endif

/*** Types */
struct ref_counted_obj_ {
void * obj;
unsigned int ref_count;
f_ref_cnt_free * dealloc;
};

/*** Functions */

static ptrdiff_t ref_at(size_t size) {
size_t result;

result = size;
result += sizeof (struct ref_counted_obj_);
/* Check for wrap-around */
if (result < size)
return 0;
--result;
result /= sizeof (struct ref_counted_obj_);
return (ptrdiff_t) result;
}

ref_counted_obj GetRefFromPtrAndSize(
void * ptr,
size_t size
) {
ptrdiff_t ref_index;
struct ref_counted_obj_ * ref;

ref_index = ref_at(size);
if (!ptr || !ref_index)
return NULL;
/* Point to the accounting info */
ref = ptr;
ref += ref_index;
/* Increment ref count and check for wrap-around */
if (!++ref->ref_count) {
--ref->ref_count;
return NULL;
}
return &ref->obj;
}

ref_counted_obj NewRefCountedObj(
size_t size,
f_ref_cnt_free * dealloc
) {
ptrdiff_t ref_index;
size_t new_size;
struct ref_counted_obj_ * ref;

ref_index = ref_at(size);
if (!ref_index)
return NULL;
/* Calculate required size */
new_size = (ref_index + 1) * sizeof *ref;
/* Check for wrap-around */
if (new_size < size)
return NULL;
/* Allocate and check for failure */
ref = malloc(new_size);
if (!ref)
return NULL;
/* Populate accounting info */
ref[ref_index].obj = ref;
ref += ref_index;
ref->ref_count = 1;
ref->dealloc = dealloc;
return &ref->obj;
}

ref_counted_obj GetRef(ref_counted_obj ref) {
struct ref_counted_obj_ * ref_;

if (!ref)
return NULL;
ref_ = (void *) ref;
/* Increment ref count and check for wrap-around */
if (++ref_->ref_count) {
--ref_->ref_count;
return NULL;
}
return ref;
}

void PutRef(ref_counted_obj ref) {
struct ref_counted_obj_ * ref_;

if (!ref)
return;
ref_ = (void *) ref;
/* Decrement ref count and check for zero */
if (!--ref_->ref_count) {
ref_->dealloc(ref);
free(ref_->obj);
}
return;
}


/**** Test program */

#include <stdio.h>
#include <string.h>
#if !ALL_IN_ONE
#include <stdlib.h>
#include "ref_cnt.h"
#endif

/*** Macros */
#define GetFooRef(foo) \
(GetRefFromPtrAndSize((foo), sizeof *(foo) + (foo)->len))

/*** Types */
struct foo {
double bobble;
size_t len;
char bear[1];
};

/*** Functions */
f_ref_cnt_free foo_dealloc;
void foo_dealloc(ref_counted_obj ptr) {
struct foo * bar;

bar = *ptr;
printf(
"foo_dealloc(%p): {%f, %d, \"%s\"}\n",
*ptr,
bar->bobble,
(int)bar->len,
bar->bear
);
return;
}

int main(void) {
ref_counted_obj bar_ref;
struct foo * bar;
char laughsalot[] = "Laughs-a-lot";

bar_ref = NewRefCountedObj(
sizeof *bar + sizeof laughsalot,
foo_dealloc
);
if (!bar_ref) {
puts("NewRefCountedObj() failed!");
return EXIT_FAILURE;
}

bar = *bar_ref;
bar->bobble = 3.14159;
bar->len = sizeof laughsalot;
memcpy(bar->bear, laughsalot, sizeof laughsalot);

/* Test re-getting the accounting info */
bar_ref = NULL;
bar_ref = GetFooRef(bar);
if (!bar_ref) {
puts("GetRefFromPtrAndSize() failed!");
free(bar); /* Yeah */
return EXIT_FAILURE;
}

/* We have two references */
PutRef(bar_ref);
PutRef(bar_ref);
return EXIT_SUCCESS;
}
 
I

ImpalerCore

Or you can put your accounting object at/near the end of the allocated
memory, as mentioned else-thread.  Would anyone else mind mentioning
this or does nobody really do that?  I thought it was fairly common
practice for some I/O buffers.  Filtering... *sigh*

I just find the interface with your footer method to be more verbose
without any tangible benefit. You add these four functions, yet you
haven't tried to use them to come up with the reference counted
allocation, copy, and release functions.

size_t footer_at(size_t size);
size_t size_with_footer(size_t size);
footer_t * get_footer(void * mem, size_t size);
void * malloc_with_footer(size_t size);

My opinion is that the interface is cluttered, and doesn't even do
what was asked, to reference count. When you come to the release
step, are you going to require everyone to pass in the size of the
allocated object?

void rc_release( void* p, size_t
I_need_to_know_the_object_size_to_find_the_footer );

Raw pointers don't carry the size of the allocated memory with them.
I don't have access to what malloc uses internally. If you have a
malloced character string, are you going to require the user to pass
in the original allocated size of each allocated string so you can
find the footer when you go to decrement its reference count? Placing
the bookkeeping before the object pointer allows one to know where the
bookkeeping object is without knowing the size or contents of the
allocated object.

Write your version of rc_malloc, rc_copy, and rc_release in terms of
your footer interface and make a pros vs cons list. I just see more
cons to this technique without any perceived pros.

Best regards,
John D.
 
I

ImpalerCore

Chris M. Thomasson said:
Unfortunately I do not know how
to manually find a platform's ALIGN_MAX.  I've seen some incantations
with unions and offsetof, and autoconf has their AC_CHECK_ALIGNOF, but
I don't know how to use it.
read all of these:
[...]



the `ALIGN_OF()' function macro should always give you a "sufficient"
alignment for a given type.

What about some crazy hacked shi% like this crap:

http://codepad.org/A65Noc37

lol...

Thanks for the sample and thread references, I will give it some study
in my free time.

Best regards,
John D.
 
I

ImpalerCore

Thanks.  I posted more code and thoughts about half a day before your
response.  Perhaps your news-reader has filtered it? :)

Is keeping the accounting info before the object and working out some
way to get the alignment right really much more common than keeping it
after the object?  (Obviously this is only a poll of anyone here who
responds.)

It depends, if you're a debugging allocator, you'll likely want to put
markers at each end to detect buffer overruns and the like. The
dmalloc and SQLite allocators are examples that place information at
both sides.

The dmalloc allocator simply guesses the alignment by defining
'#define ALLOCATION_ALIGNMENT N' in a conf.h file that just guesses
the alignment from 'ac_cv_sizeof_long'. I haven't investigated how
SQLite goes about it.

For myself, I've never implemented a reference counted pointer scheme,
so what you saw is just conjecture about how I'd attempt it.
Have a pleasant day.

Same to you.
John D.
 
S

Shao Miller

[...]
/*** Macros */
#define GetFooRef(foo) \
(GetRefFromPtrAndSize((foo), sizeof *(foo) + (foo)->len))

[...]

Using macros... Humm, well how about something simple like this:

http://codepad.org/CzUX4xDB

Cool. :)
_____________________________________________________
typedef void (ref_count_func_dtor) (void*);


struct ref_count
{
unsigned count;
ref_count_func_dtor* fp_dtor;
};


#define REF_COUNT_STRUCT(mp_type) \
struct \
{ \
struct ref_count rcount; \
mp_type type; \
}


void* ref_count_sys_create(
size_t size,
ref_count_func_dtor* fp_dtor
){
struct ref_count* const self = malloc(size);
if (! self) return NULL;
self->count = 1;
self->fp_dtor = fp_dtor;
return self + 1;
}

How do you know that '(self + 1) == ((char *) self +
offsetof(REF_COUNT_STRUCT(mp_type), type))'? It might be unlikely that
there is padding there, but is no padding guaranteed?
#define REF_COUNT_CREATE(mp_type, mp_fp_dtor) \
ref_count_sys_create( \
sizeof(REF_COUNT_STRUCT(mp_type)), \
(mp_fp_dtor) \
)

This interface appears to restrict callers to types whose sizes are
known and are integer constant expressions... Not that there's anything
wrong with that. :)
 
S

Shao Miller

It depends, if you're a debugging allocator, you'll likely want to put
markers at each end to detect buffer overruns and the like. The
dmalloc and SQLite allocators are examples that place information at
both sides.

The dmalloc allocator simply guesses the alignment by defining
'#define ALLOCATION_ALIGNMENT N' in a conf.h file that just guesses
the alignment from 'ac_cv_sizeof_long'. I haven't investigated how
SQLite goes about it.

For myself, I've never implemented a reference counted pointer scheme,
so what you saw is just conjecture about how I'd attempt it.

Well, two people in another C-devoted forum suggested another concern
with keeping the accounting information at/near the end of the allocated
storage: Someone sloppy can overwrite it. By accounting near the
beginning, that might be less likely.

I'm just not sure how one gets the alignment at the beginning to be
guaranteed as correct without making an extra burden to callers, for C
implementations conforming to < C1X.

Given an adaptation of Mr. Chris M. Thomasson's macro:

#define ALIGNOF(type) (offsetof(struct{char c; type t;}, t))

(which, by the way, I just noticed doesn't appear to be guaranteed as
giving the same result every time due to a lack of tag, though that
might be highly unlikely)

We could have a 'malloc' wrapper:

void * malloc_aligned(size_t size, size_t alignment);

But then callers would need to:

some * p = malloc_aligned(
sizeof *p + optional_variable_length,
ALIGNOF(some)
);

where we'd forfeit the convenience of having 'some' appear just once.

Any pad-to-nice-alignment-at-the-beginning strategy for < C1X is just
guessing, isn't it? And the more conservative we are, the greater the
risk of wasted space?

If one only requires a 'ptrdiff_t' or 'size_t' at the beginning, this
might be minimal and could specify where additional accounting info at
the end is, but that does seem complicated.

But even with the overhead of tracking the size of the allocated data
and of requiring that paired with the pointer in order to find the
accounting info, are there scenarios where a caller _shouldn't_ know the
apparent size of the allocated data? Isn't that asking for trouble?
Sentinels are great for unknown lengths, but a call to 'malloc' requires
a known length... Shouldn't this be tracked somewhere as a good
programming practice in order to avoid out-of-bounds accesses?
 
L

luser- -droog

An external table would avoid all the alignment-guessing for pre-C1X.
I'm too lazy to finish coding this, but what about a simple open-
addressed
hash-table?

#define MAX 100

typedef struct ref_ {
void *ptr;
int count;
void (*final)(void *ptr);
} ref_;

ref_ ref_tab_[2*MAX+1];

void *ref_malloc(size_t sz) { }

void ref_inc(void *ptr) { }

void ref_dec(void *ptr) { }

void ref_set_final(void *ptr, void (*final)(void *ptr)) { }
 
A

Angel

For hashing, is the object representation of a 'void *' guaranteed to be
the same any time a 'void *' points to the same object? Is it likely?
(I think so.) :)

The standard only says pointers are represented in an
implementation-defined manner. But as long as the pointer value doesn't
change, I doubt the representation will. On most implementations, a
pointer is really just a 32- or 64-bit unsigned number, indicating a
certain spot in virtual memory.
 
S

Shao Miller

An external table would avoid all the alignment-guessing for pre-C1X.
I'm too lazy to finish coding this, but what about a simple open-
addressed
hash-table?

#define MAX 100

typedef struct ref_ {
void *ptr;
int count;
void (*final)(void *ptr);
} ref_;

ref_ ref_tab_[2*MAX+1];

void *ref_malloc(size_t sz) { }

void ref_inc(void *ptr) { }

void ref_dec(void *ptr) { }

void ref_set_final(void *ptr, void (*final)(void *ptr)) { }

For hashing, is the object representation of a 'void *' guaranteed to be
the same any time a 'void *' points to the same object? Is it likely?
(I think so.) :)
 
S

Shao Miller

io_x said:
struct String* CostruttoreString(char* what)
{struct String *a;
char *b;
unsigned c, i;

a=malloc(sizeof(String));
[...more code...]

I think you meant 'sizeof (struct String)' there. :)

I don't completely understand how your code accomplishes reference
counting. Your code appears to keep track of how many strings have been
allocated. But I think that the "reference counting" subject of this
thread is more along the lines of:
- You allocate an object and assume that there is one user of that object.
- Another user can subscribe for use of the object (increment the
reference count of that object).
- A user can unsubscribe ("release") from use of the object (decrement
the reference count of the object).
- When there are no more users of the object, the object can be freed.
 
S

Shao Miller

The standard only says pointers are represented in an
implementation-defined manner. But as long as the pointer value doesn't
change, I doubt the representation will. On most implementations, a
pointer is really just a 32- or 64-bit unsigned number, indicating a
certain spot in virtual memory.

The strategy would then seem to be hopefully portable just as
alignment-guessing is also hopefully portable.

What are the odds that a given pointer to a reference-counted object
would have the same, simple XOR "hash" of all of its bytes in common
with another such pointer?

unsigned char ptr_hash(void * ptr) {
unsigned char
* pos = &ptr,
* end = pos + sizeof ptr,
result = 0;
while (pos < end)
result ^= *pos++;
return result;
}

/* pete says UB when 'sizeof (int) == 1' */
static xxx ptr_hash_table[1 << CHAR_BIT];

Maybe 'xxx' is a pointer type to a hopefully short list of pointers
which can be tested with '=='. Such a short list could be 'realloc'd.
But then 'ptr_hash_table' is 'sizeof (xxx) * (1 << CHAR_BIT)' bytes large.

Maybe 'xxx' is an integer type for an index into another array which is
itself 'realloc'd.

(Insert more hashing strategies here.)

Interesting. :)
 
K

Keith Thompson

Angel said:
The standard only says pointers are represented in an
implementation-defined manner. But as long as the pointer value doesn't
change, I doubt the representation will. On most implementations, a
pointer is really just a 32- or 64-bit unsigned number, indicating a
certain spot in virtual memory.

Quibble: Pointer representation is unspecified, not
implementation-defined. The difference is that an implementation
aren't required to document its pointer representation(s).
 
S

Seebs

The standard only says pointers are represented in an
implementation-defined manner. But as long as the pointer value doesn't
change, I doubt the representation will. On most implementations, a
pointer is really just a 32- or 64-bit unsigned number, indicating a
certain spot in virtual memory.

On segmented architectures, it is possible for segments to overlap, meaning
that there are multiple pointer representations for the same address; on
such implementations, comparisons have to take that into account.

-s
 
L

luser- -droog

Quibble: Pointer representation is unspecified, not
implementation-defined.  The difference is that an implementation
aren't required to document its pointer representation(s).

Does that make it even more seat-of-the-pants than guessing alignment?
 
S

Shao Miller

Does that make it even more seat-of-the-pants than guessing alignment?

Well alignment-guessing seems straight-forward, at least. N1570
has[6.2.8p4] that "... Every valid alignment value shall be a
nonnegative integral power of two." Which 4 KiB satisfies, for
example... Plenty of room for a reference count, a 'size_t' for the
object, a pointer to a handler function for OO "messages," a signature
for the original allocator, a UUID, a doubly-linked-list entry, a
'jmp_buf', some opcode bytes, a checksum of the header...
 
I

ImpalerCore

Well, two people in another C-devoted forum suggested another concern
with keeping the accounting information at/near the end of the allocated
storage: Someone sloppy can overwrite it.  By accounting near the
beginning, that might be less likely.

Seems reasonable. For example, how much do you trust users of
sprintf?
I'm just not sure how one gets the alignment at the beginning to be
guaranteed as correct without making an extra burden to callers, for C
implementations conforming to < C1X.

Given an adaptation of Mr. Chris M. Thomasson's macro:

   #define ALIGNOF(type) (offsetof(struct{char c; type t;}, t))

(which, by the way, I just noticed doesn't appear to be guaranteed as
giving the same result every time due to a lack of tag, though that
might be highly unlikely)

Can you elaborate?
We could have a 'malloc' wrapper:

   void * malloc_aligned(size_t size, size_t alignment);

But then callers would need to:

   some * p = malloc_aligned(
       sizeof *p + optional_variable_length,
       ALIGNOF(some)
     );

where we'd forfeit the convenience of having 'some' appear just once.

Any pad-to-nice-alignment-at-the-beginning strategy for < C1X is just
guessing, isn't it?  And the more conservative we are, the greater the
risk of wasted space?

It's a cost benefit analysis. We're not necessarily limited to using
one single method. A simpler interface can use a conservative
alignment valid for any type to make it easier for the user (assuming
one can determine ALIGN_MAX with enough confidence through platform
and compiler documentation, autoconf, or some C offsetof
incantation). If more performance needs to be squeaked out, a more
complex interface may be used. Personally, I just haven't been in a
situation that's motivated me to think of using reference counting.
If one only requires a 'ptrdiff_t' or 'size_t' at the beginning, this
might be minimal and could specify where additional accounting info at
the end is, but that does seem complicated.

But even with the overhead of tracking the size of the allocated data
and of requiring that paired with the pointer in order to find the
accounting info, are there scenarios where a caller _shouldn't_ know the
apparent size of the allocated data?

The design of malloc is predicated on the user not needing to track
the size of the allocation. Otherwise C would be designed with
'free( void* p, size_t sz )'. Of course, malloc is responsible to
track allocation size internally, but it's in the implementation
space, not user space. For better or worse, there is no standard
method to take a raw pointer and obtain its allocation size. There
are times when knowing the allocation space is desirable: designing a
custom allocator, managing an auto-resizing container like a string or
array, when its the destination buffer of a snprintf function, etc,
but certainly not universal.
 Isn't that asking for trouble?
Sentinels are great for unknown lengths, but a call to 'malloc' requires
a known length...  Shouldn't this be tracked somewhere as a good
programming practice in order to avoid out-of-bounds accesses?

That is in the realm of debugging allocators, which is good practice
to use. Delegating this responsibility directly to users is in my
opinion a bad idea.

Most debugging allocators use the preprocessor to replace instances of
'malloc' with their own special call to allocator. By providing your
own malloc wrapper, one can create an interface to allocate a
reference counted pointer through something like 'rc_malloc', but
still tap into dmalloc or valgrind to verify that your implementation
doesn't trash the address space, i.e.

---dmalloc.h---

#define malloc(size) \
dmalloc_malloc(__FILE__, __LINE__, (size), DMALLOC_FUNC_MALLOC, 0,
0)

Again, this is just one path from several viable methods to implement
reference counting. I'm not arguing for superiority of one method
over another simply because I have no real-use experience with
reference counting, just curiosity.

Best regards,
John D.
 
T

Tim Rentsch

James Kuyper said:
On 06/10/2011 02:30 PM, ImpalerCore wrote:
...
One option is to abstract the reference count out of the structure and
piggyback it onto a raw allocated pointer. The same trick used to
make debugging allocators could apply to a reference counted pointer.
I've never done it before, but it's something to ponder. Here's my
first pass.

\code snippet
#define PTR_HEADER_SIZE (sizeof (int))

void* rc_malloc( size_t size )
{
void* mem = NULL;
void* p = NULL;

p = malloc( size + PTR_HEADER_SIZE );

if ( p )
{
/* Set the reference count to 1. */
*((int*)p) = 1;

mem = (unsigned char*)p + PTR_HEADER_SIZE;
}

return mem;
}

The value stored in 'p' is guaranteed to be correctly aligned for all
types. However, the only thing portably guaranteed about alignment off
the value stored in 'mem' is that it is suitable for 'int', it need not
be suitably aligned for any type with a size greater than 1 that is
different from that of 'int'. [snip]

Unless one considers the type 'unsigned int' to be a type
different from that of 'int'.
 
T

Tim Rentsch

Keith Thompson said:
Angel said:
The standard only says pointers are represented in an
implementation-defined manner. But as long as the pointer value doesn't
change, I doubt the representation will. On most implementations, a
pointer is really just a 32- or 64-bit unsigned number, indicating a
certain spot in virtual memory.

Quibble: Pointer representation is unspecified, not
implementation-defined. [snip]

How do you reconcile this assertion with 6.2.6.1p2? Certainly
it looks like the representation of any pointer value stored in
an object would be (at least) implementation-defined.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,777
Messages
2,569,604
Members
45,227
Latest member
Daniella65

Latest Threads

Top