How do you create an efficient _and_ scalable multi-threaded allocator?


Chris Thomasson

[comp.lang.c++ added because it's going to standardize fine-grained, very
low-level multi-threading primitives; refer to the cpp-threads mailing list...]


I decoupled the allocation logic from the synchronization mechanism such
that single-threaded allocators can now be used in full-blown multi-threaded
environments. IMHO, it's a really neat feature. You could even set things up
where thread A uses a different allocator than thread B, yet they are
still able to allocate, pass around, and free blocks between themselves. It's
an extremely flexible framework.

This can be an important invention. Here is the basic sales pitch:

Question: How do you create a multi-threaded allocator, without creating a
multi-threaded allocator?

Answer: Create a single-threaded allocator and tell the vZOOM library to
multi-thread it for you...

Wow! :^)





Here is the "basic" vZOOM allocation scheme:

________________________
- create a thread-local instance of a user-defined single-threaded
allocator in every thread (e.g., the MS heap w/ HEAP_NO_SERIALIZE).

- allocation requests are forwarded to this thread-local user allocator
directly.

- if a free request comes from the thread that allocated the block (i.e., the
origin thread), then the request is forwarded to this thread-local user
allocator.

- if a free request comes from another thread, then the block is accumulated
in a per-thread stack-based freelist "belonging to the origin thread", using
a single atomic CAS.

- blocks from this freelist are actually reused/freed in batches using a
single atomic SWAP when the owning thread allocates/deallocates another
block. For instance, a thread that fails to allocate from its thread-local
heap will do a SWAP on the freelist and try to fulfill the allocation
request from there (see the sketch after this list).
________________________
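
To make the handoff concrete, here is a minimal sketch of such a per-thread
lock-free freelist. The names (vz_block, vz_freelist, vz_remote_free,
vz_drain) are placeholders of mine, and C11 atomics stand in for whatever
platform primitives the library actually uses:

__________________________________________
#include <stdatomic.h>
#include <stddef.h>

/* One freed block; a real allocator would carry more header info. */
typedef struct vz_block {
    struct vz_block* next;
} vz_block;

/* Per-thread stack-based freelist holding blocks freed by *other* threads. */
typedef struct vz_freelist {
    _Atomic(vz_block*) head;
} vz_freelist;

/* Non-origin thread: push one block with a single CAS loop. */
static void vz_remote_free(vz_freelist* fl, vz_block* blk) {
    vz_block* old = atomic_load_explicit(&fl->head, memory_order_relaxed);
    do {
        blk->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &fl->head, &old, blk,
                 memory_order_release, memory_order_relaxed));
}

/* Origin thread: detach the whole list with one atomic SWAP (exchange). */
static vz_block* vz_drain(vz_freelist* fl) {
    return atomic_exchange_explicit(&fl->head, NULL, memory_order_acquire);
}
__________________________________________

The single SWAP means the origin thread touches the shared word only once
per batch; its own heap is never accessed by any other thread.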



Any thoughts/comments/suggestions/rants/raves?

;^)
 

Bill Todd

Chris said:
[comp.lang.c++ added because it's going to standardize fine-grained, very
low-level multi-threading primitives; refer to the cpp-threads mailing list...]


I decoupled the allocation logic from the synchronization mechanism such
that single-threaded allocators can now be used in full-blown multi-threaded
environments. [...]
Here is the "basic" vZOOM allocation scheme:

________________________ [...]
________________________



Any thoughts/comments/suggestions/rants/raves?

Describing how blocks eventually get deallocated from the free lists
back to the originating thread's heap (without requiring interlocks that
defeat your goal of not having them on such heaps) so that those heaps
don't become so fragmented that they become unusable might be nice. So
might a simulation to demonstrate that the free lists themselves don't
create increasingly unusable (effectively, fragmented) storage for their
temporary owners (surely you're not assuming that all block requests are
equal in size).

- bill
 

Chris Thomasson

Bill Todd said:
Chris said:
[comp.lang.c++ added because it's going to standardize fine-grained, very
low-level multi-threading primitives; refer to the cpp-threads mailing list...]


I decoupled the allocation logic from the synchronization mechanism such
that single-threaded allocators can now be used in full-blown
multi-threaded
environments. [...]
Here is the "basic" vZOOM allocation scheme:

________________________ [...]
________________________



Any thoughts/comments/suggestions/rants/raves?

Describing how blocks eventually get deallocated from the free lists back
to the originating thread's heap (without requiring interlocks that defeat
your goal of not having them on such heaps) so that those heaps don't
become so fragmented that they become unusable might be nice. So might a
simulation to demonstrate that the free lists themselves don't create
increasingly unusable (effectively, fragmented) storage for their
temporary owners (surely you're not assuming that all block requests are
equal in size).

I use the per-thread user-provided allocator to allocate a simple
per-thread segregated array of slabs. Something like:


thread::slab[0].sz = sizeof(void*);
thread::slab[1].sz = thread::slab[0].sz * 2;
thread::slab[2].sz = thread::slab[1].sz * 2;
thread::slab[3].sz = thread::slab[2].sz * 2;
thread::slab[4].sz = thread::slab[3].sz * 2;


Each slab is aligned on a common boundary value such that you can round the
address of one of its blocks down to that boundary in order to get at its
anchor/header. Each slab anchor has a freelist, a thread-id, and the blocks
of memory.
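
As an illustration of the round-down trick only (the 64 KiB boundary, the
struct layout, and the names VZ_SLAB_ALIGN / vz_slab / vz_slab_of are
assumptions of mine, not the library's actual layout):

__________________________________________
#include <stdint.h>
#include <stddef.h>
#include <pthread.h>

/* Common boundary: every slab is assumed to start on a 64 KiB boundary. */
#define VZ_SLAB_ALIGN ((uintptr_t)65536)

/* Slab anchor/header at the start of the aligned region; the block
   storage for this size class follows it within the slab. */
typedef struct vz_slab {
    size_t    sz;        /* block size served by this slab             */
    pthread_t owner;     /* thread-id of the origin thread             */
    void*     freelist;  /* blocks freed by non-origin threads (CAS'd) */
} vz_slab;

/* Round a block's address down to the boundary to recover its anchor. */
static vz_slab* vz_slab_of(void* block) {
    return (vz_slab*)((uintptr_t)block & ~(VZ_SLAB_ALIGN - 1));
}
__________________________________________

Allocating each slab on that boundary (e.g., with posix_memalign, or
_aligned_malloc on Windows) is what keeps the mask computation valid.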


- When a thread deallocates a block, it rounds the address down to get at
the slab anchor, compares the slab's thread-id with its own, and, if they do
not match, pushes the block onto the slab's freelist via CAS.


- A thread allocates by locating the right slab based on the requested size
and trying to get a block from that slab; if the slab is empty, it does a
SWAP on the slab's freelist and tries to get a block from there. If that
fails, it then tries to allocate directly from the user-defined allocator
(see the sketch below).
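
The size-to-slab mapping might be as simple as the following (the cutoff
policy and the helper name vz_slab_index are my guesses; the five doubling
size classes come straight from the array above):

__________________________________________
#include <stddef.h>

/* Map a requested size to a slab index: slab[i] serves blocks of
   sizeof(void*) << i bytes. Returns -1 for "too big; go straight
   to the user-defined allocator". */
static int vz_slab_index(size_t sz) {
    size_t block = sizeof(void*);   /* slab[0].sz */
    for (int i = 0; i < 5; ++i) {
        if (sz <= block)
            return i;
        block *= 2;
    }
    return -1;
}
__________________________________________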
 

Chris Thomasson

Chris Thomasson said:
I use the per-thread user-provided allocator to allocate a simple
per-thread segregated array of slabs. Something like:
[...]

The user can request that vZOOM bypass the segregated slabs and forward to
the user-defined allocator directly.


In that case, blocks from the free-list are deallocated straight back to the
user-defined allocator. Something like:

__________________________________________
void* malloc(size_t sz) {
    /* grab this thread's per-thread state (key elided) */
    per_thread* const _this = pthread_getspecific(...);

    /* fast path: the thread-local user-defined allocator */
    void* buf = _this->fp_user_defined_malloc(sz);
    if (! buf) {
        /* slow path: drain the remote-free list with a single atomic SWAP
           and hand every block straight back to the user-defined allocator */
        block* blk = SWAP(&_this->freelist, 0);
        while (blk) {
            block* const next = blk->next;
            _this->fp_user_defined_free(blk);
            blk = next;
        }

        /* retry now that the freed memory is back in the user heap */
        buf = _this->fp_user_defined_malloc(sz);
    }
    return buf;
}
__________________________________________
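
For completeness, the matching free path might look roughly like this, in
the same shorthand as the malloc() above (SWAP/CAS stand for atomic
exchange and compare-and-swap, the pthread key is elided, and
block_get_origin() is a hypothetical helper that recovers the allocating
thread's state, e.g. from a block header or a slab anchor):

__________________________________________
void free(void* ptr) {
    per_thread* const _this = pthread_getspecific(...);

    /* hypothetical: find the per-thread state of the allocating thread */
    per_thread* const origin = block_get_origin(ptr);

    if (origin == _this) {
        /* origin thread: hand the block straight back to the
           thread-local user-defined allocator */
        _this->fp_user_defined_free(ptr);
    } else {
        /* foreign thread: push onto the origin's freelist with one CAS */
        block* const blk = (block*)ptr;
        block* head;
        do {
            head = origin->freelist;
            blk->next = head;
        } while (! CAS(&origin->freelist, head, blk));
    }
}
__________________________________________

The only cross-thread traffic is that single CAS on the origin's freelist
word; the origin thread's heap itself is never touched by a foreign thread.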



Does that answer your question?
 
