Is this standard c++...


Chris Thomasson

kwikius said:
First I don't know enough about threads to make any constructive
comments.

<an "on the soap box" like statement>
I know that you know more than you think you do.... The steps that are
necessary in order to realize a constructive use of concurrency in general
eventually end up being a strict lesson in common sense... Well, of course
you, and "virtually" all of us, are already "blessed" with some form of
ingrained common sense. So, do you have what it takes to make interesting
and good use of multithreading techniques? IMHO, of course you do!

Never sell yourself short wrt any aspect of what you can and cannot
learn!... :O
</an "on the soap box" like statement>

;^)

OTOH give me access to the system timer and program counter
and ability to disable interrupts and allow me to write interrupt
service routines, in fact control of the system event mechanisms and I
would be happy.

Yup. I remember the good old days when I was implementing the early stages
of operating system loading processes. Kind of gives you a GOD complex
though. I mean, you get to fill in all of the interrupt vectors with your
stuff, and you have access to those wonderful hardware control words. Good
times!

OTOH I guess I should head over to the threads
newsgroup and try to understand them better.

We would be happy to help you. That group needs some more traffic anyway!
Seems like it's sort of "dead" at times... So, if you end up hanging out some
more on comp.programming.threads, go ahead and tell some of your friends
about it! BTW, David Butenhof hangs around there sometimes... Indeed you can
have some of the world's best threading gurus give a little introspection on
any question you may have. Okay, enough with the "salesman" crap! :O Sorry.

Anyway here are some
thoughts (in contradiction to the above) though I haven't studied your
code in any depth:
Okay.


Firstly I don't understand from that the way you want to use the
device, but I would guess it would be restricted to specialised use,
however a few (confused) thoughts spring to mind.

First of all, let me provide a "quick introduction" on all of my subsequent
comments:

"I want the device to be able to provide the end user with the exact same
functionality as the following ANSI C functions do:

extern "C" {
extern void* malloc(size_t);
extern void free(void*);
}

with the following exception: malloc and free are going to be 100%
thread-safe. The low-level synchronization scheme is going to exist in
per-thread data-structures. Any allocations that cannot be safely and
efficiently addressed by the per-thread scheme (e.g., the "local heap") will
be subjected to an allocation from the "global heap". In other words, if
allocations are sufficiently small in size (e.g., sizeof(void*) to 256 bytes
are fine), then all aspects of the device will be completely utilized. The
device's "global heap" will be invoked for everything else.

The device is able to accommodate highly efficient threading designs. For
instance, if your threads keep their allocations local, then the device will
not use any multiprocessor synchronization techniques at all. It will be
equivalent to a single-threaded allocation. The device WILL make use of an
interlocked RMW instruction and a memory barrier (e.g., CAS and SPARC's
membar #StoreStore) ONLY IF your threads can "pass allocations around
amongst themselves", AND, "the thread that allocated an object 'A' winds up
not being the thread that eventually frees object 'A'"
"
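
To make the dispatch policy above concrete, here is a minimal sketch. This is not my actual implementation: `local_heap`, `local_alloc`, `my_malloc`, and the 8 KB bump region are made-up names for illustration, and a plain bump pointer stands in for the real per-thread free-list machinery.

```cpp
#include <cstdlib>
#include <cstddef>

// Hypothetical threshold from the description above: allocations from
// sizeof(void*) up to 256 bytes are handled by the per-thread heap.
enum { LOCAL_MAX_SZ = 256 };

// Per-thread region (sketch only; real code would reach this through
// pthread_getspecific or compiler-supported TLS).
struct local_heap {
    unsigned char buf[8192];
    size_t offset;
};

// Bump-allocate from the local region; fail when exhausted.
inline void* local_alloc(local_heap& h, size_t sz) {
    if (h.offset + sz > sizeof(h.buf)) { return 0; }
    void* p = h.buf + h.offset;
    h.offset += sz;
    return p;
}

// The dispatch the text describes: small requests try the local heap
// first, everything else falls through to the global heap (malloc).
inline void* my_malloc(local_heap& h, size_t sz) {
    if (sz >= sizeof(void*) && sz <= LOCAL_MAX_SZ) {
        if (void* p = local_alloc(h, sz)) { return p; }
    }
    return std::malloc(sz);
}
```

When the request stays local, no synchronization at all is needed; only the malloc fallback touches shared state.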

The first is that in
my environment the stack is actually quite a scarce resource, default
around 1 Mb (In VC7.1), after which you get a (non C++) stack overflow
in VC7.1. I presume you can modify this though.

Yes. Most threading abstractions allow one to set up a thread's individual
stack size. Actually, IMHO of course, a "lot of programs" don't
really end up making "complete" use of that 1MB stack... Also, let's try to
keep in mind that virtually any so-called "competent" system that makes use
of some form of recursion has to have a method for "ensuring that there is
enough space on the stack to accommodate the complete depth of the recursive
operation".
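
For example, with POSIX threads the per-thread stack size can be requested up front. A sketch, assuming a POSIX platform; `spawn_with_stack` and `worker` are made-up names:

```cpp
#include <pthread.h>
#include <limits.h>
#include <cstddef>

static void* worker(void*) {
    // recursion-depth-sensitive work would go here
    return 0;
}

// Request a smaller per-thread stack than the platform default
// (the 1 MB VC7.1 figure mentioned above is the same knob).
int spawn_with_stack(size_t stack_sz) {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    // PTHREAD_STACK_MIN is the portable lower bound.
    if (stack_sz < (size_t)PTHREAD_STACK_MIN) {
        stack_sz = PTHREAD_STACK_MIN;
    }
    pthread_attr_setstacksize(&attr, stack_sz);
    pthread_t tid;
    int rc = pthread_create(&tid, &attr, worker, 0);
    pthread_attr_destroy(&attr);
    if (rc == 0) { pthread_join(tid, 0); }
    return rc;
}
```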

The heap on the other hand can be looked on as a(almost) infinite
resource, even if you run out of physical memory the system will start
swapping memory to/from disk. Very system specific though I guess..

Indeed it is system specific. However, you raise a critical point. I address
this scenario by funneling all allocations that the per-thread part of my
allocator implementation cannot address through an abstraction of the OS
heap. The abstraction can be as simple as a call to the OS provided standard
C library malloc function. My stuff essentially sits on top of the threads'
stacks and uses malloc so that it does not fall over when it runs out of
memory during the "high-stress" situations that a multi-threaded user
application sometimes can, and usually will, end up generating.

In that sense I would guess that use of the stack is by nature not as
scalable as using the heap.

Which is exactly why I am forced to use the heap as a slow-path in my
current implementation of the allocator algorithm.

So from that point of view it is interesting to try to come up with
scenarios where you would use the device.

Thanks! IMHO, the allocator works perfectly when you try to make any
deallocations of a memory block 'X' occur in the same thread that originally
allocated block 'X' in the first place. IMHO, I can sort of "get away" with
using the phrase "works perfectly" simply because the allocator endures no
more overhead than a single-threaded allocator would when an application
tries real hard to keep things "thread local". My invention will use an
interlocked RMW instruction and a #StoreStore memory barrier to "penalize
any thread that tries to deallocate a memory block that it did not allocate
itself!". Everything is lock-free, however, as we all should know,
interlocked RMW and/or memory barrier instructions are fairly expensive for
basically any modern processor to execute. They tend to have the unfortunate
side-effect of blowing any cache the CPU has accumulated and devastating
part of any pipelined operations that were pending. This type of behavior
can gravely wound a number of aspects that are involved with scalability and
throughput in general.
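
Here is a rough sketch of the kind of remote-free path I am describing, with C++11 atomics standing in for the raw CAS and the #StoreStore barrier (`memory_order_release` plays that role here). The names `block`, `local_heap`, `remote_free_push`, and `remote_free_flush` are made up for illustration:

```cpp
#include <atomic>

// Header prepended to each memory block (sketch).
struct block {
    block* next;
};

// Per-thread heap with a lock-free "remote free" stack. A thread
// freeing a block it did not allocate pays for one CAS plus a
// release barrier; the owning thread later reclaims the whole
// list with a single atomic exchange.
struct local_heap {
    std::atomic<block*> remote_free;
};

// The "penalized" path: CAS-push onto the owner's remote list.
inline void remote_free_push(local_heap& h, block* b) {
    block* head = h.remote_free.load(std::memory_order_relaxed);
    do {
        b->next = head;
    } while (!h.remote_free.compare_exchange_weak(
                 head, b,
                 std::memory_order_release,
                 std::memory_order_relaxed));
}

// Owner-side reclaim: grab the entire list in one atomic swap.
inline block* remote_free_flush(local_heap& h) {
    return h.remote_free.exchange(0, std::memory_order_acquire);
}
```

Note that the owning thread only touches the shared word when it decides to flush; its ordinary allocate/free fast path stays completely synchronization-free.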

to know how much you are going to allocate beforehand, but if the
stack is a scarce resource, it's not viable to treat it in the same
carefree way as the heap proper and just allocate huge. Of course it
may be possible to use some assembler to write your own functions
grabbing runtime amounts of stack, which would maybe make the device
more versatile.

Perhaps. I am leaning toward keeping things local, and using a thin API
layer around the OS heap as a sort of "last resort".

For use as an allocator, the alternative is of course to use malloc
for your one time start up allocation for your own allocator, and then
after the start up cost whether you allocated on the heap or stack I
would guess the cost of getting memory from the allocator is going to
be the same regardless where it is.

Great point. Perhaps I am being a bit impractical wrt my line of thought
that most programs don't make use of the fairly large default stack size the
OS ends up handing out to a process's threads. Anyway, I kind
of like the idea of not having to make use of malloc when you have perfectly
good, and most likely unused, stack space sitting around on your
applications threads. The synchronization scheme I created allows for a
thread 'A' to allocate a memory block 'X' and subsequently pass it around to
threads 'B' through 'Z'. So, thread 'B' can use block 'X' even though the
memory that makes up block 'X' resides on thread 'A's stack. Humm... Your
solution to the problem (e.g., use malloc to provide the allocators
per-thread data-structures) is well grounded in common sense. Like I said,
perhaps I am being a bit impractical here... Well, screw it! I think it's
neat that an allocator can use a plurality of threads' stacks for most of
its allocations.

;^)

[...]
This does somehow
bring to mind use in embedded systems where there is no heap as such
so the scheme could be used as a heap for systems without a heap as it
were.

Ahh yes. That sure seems like it could possibly come in handy in that type
of situation! Indeed. Thanks for your comments!

:^)

You would then presumably need to keep passing a reference to
the allocator in to child functions or put it in a global variable.

My allocator's system data-structures rely on a given platform to provide
each of its threads with a stack space large enough to hold at least 4 to 8
kilobytes, a function that is analogous to pthread_get/setspecific(...)
and/or pthread_self(), an interlocked RMW (e.g., CAS or even LL/SC), and of
course (e.g., assuming you looked at the pseudo-code implementation), a
#StoreStore memory barrier instruction. Luckily for me, CAS and LL/SC are
fairly common, and if they're not there, well, they can certainly be emulated
with a hashed locking pattern. If the platform has no interlocked RMW
instructions, or mutexes, then the multi-threaded aspect of my allocator
cannot be realized.
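
A sketch of that hashed locking fallback; the names are made up, and `std::mutex` stands in for whatever lock primitive the platform actually provides:

```cpp
#include <mutex>
#include <cstddef>

// A small fixed table of locks; each address hashes to one of them.
enum { LOCK_TBL_SZ = 64 };
static std::mutex g_lock_tbl[LOCK_TBL_SZ];

// Pick the guard mutex for a given address. Shifting off the low
// bits spreads nearby (aligned) addresses across the table.
inline std::mutex& lock_for(void* addr) {
    size_t h = reinterpret_cast<size_t>(addr);
    return g_lock_tbl[(h >> 3) % LOCK_TBL_SZ];
}

// Emulated compare-and-swap on a pointer-sized word: atomic only
// with respect to other accesses that go through the same table.
inline bool emulated_cas(void** target, void* expect, void* desired) {
    std::lock_guard<std::mutex> guard(lock_for(target));
    if (*target != expect) { return false; }
    *target = desired;
    return true;
}
```

The usual caveat applies: every access to the shared word has to go through the table, and two hot words can false-share a lock, which is why this is strictly a fallback.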

Overall then its difficult to know where the advantages outweigh the
difficulties.

Hopefully, I cleared some things up. What do ya think Andy?

:^)
 

Chris Thomasson

Okay... Here is a snippet of some compilable example code:


<code>

#include <cstdio>
#include <cstddef>
#include <cassert>
#include <new>


template<size_t T_basesz, size_t T_metasz, size_t T_basealign>
class lmem {
    unsigned char m_basebuf[T_basesz + T_metasz + T_basealign - 1];
    unsigned char *m_alignbuf;

private:
    static unsigned char* alignptr(unsigned char *buf, size_t alignsz) {
        ptrdiff_t base = buf - static_cast<unsigned char*>(0);
        ptrdiff_t offset = base % alignsz;
        ptrdiff_t result = (! offset) ? base : base + alignsz - offset;
        assert(! (result % alignsz));
        return static_cast<unsigned char*>(0) + result;
    }

public:
    lmem() : m_alignbuf(alignptr(m_basebuf, T_basealign)) {
        printf("(%p)lmem::lmem()\n - buffer size: %u\n - m_basebuf(%p)\n - m_alignbuf(%p)\n\n",
               (void*)this,
               (unsigned)(T_basesz + T_metasz + T_basealign - 1),
               (void*)m_basebuf,
               (void*)m_alignbuf);
    }

    template<typename T>
    void* loadptr() const {
        assert(T_basesz >= (sizeof(T) * 2) - 1);
        printf("(%p)lmem::loadptr() - buffer size: %u\n",
               (void*)this,
               (unsigned)((sizeof(T) * 2) - 1));
        return alignptr(m_alignbuf, sizeof(T));
    }

    void* loadmetaptr() const {
        return m_alignbuf + T_basesz;
    }
};


namespace detail {
namespace os {
namespace cfg {
    enum config_e {
        PAGE_SZ = 8192
    };
}}

namespace arch {
namespace cfg {
    enum config_e {
        L2_CACHELINE_SZ = 128
    };
}}

namespace lheap {
namespace cfg {
    enum config_e {
        BUF_SZ = os::cfg::PAGE_SZ * 2,
        BUF_ALIGN_SZ = arch::cfg::L2_CACHELINE_SZ,
        BUF_METADATA_SZ = sizeof(void*)
    };
}}
}


template<typename T>
class autoptr_calldtor {
    T *m_ptr;
public:
    autoptr_calldtor(T *ptr) : m_ptr(ptr) {}
    ~autoptr_calldtor() {
        if (m_ptr) { m_ptr->~T(); }
    }
    T* loadptr() const {
        return m_ptr;
    }
};


namespace lheap {
    using namespace detail::lheap;
}


class foo1 {
    // mess with the stack
    int m_1;
    char m_2[73];
    short m_3;
    char m_4[18];
public:
    foo1() { printf("(%p)foo1::foo1()\n", (void*)this); }
    ~foo1() { printf("(%p)foo1::~foo1()\n\n", (void*)this); }
};


class foo2 {
    // mess with the stack
    int m_1;
    char m_2[111];
    short m_3;
    char m_4[222];
public:
    foo2() { printf("(%p)foo2::foo2()\n", (void*)this); }
    ~foo2() { printf("(%p)foo2::~foo2()\n\n", (void*)this); }
};


int main() {
    // mess with the stack
    int m_1;
    char m_2[73];
    short m_3;
    char m_4[18];

    {
        // setup this thread's allocator
        lmem<lheap::cfg::BUF_SZ,
             lheap::cfg::BUF_METADATA_SZ,
             lheap::cfg::BUF_ALIGN_SZ> foomem;

        {
            // mess with the stack
            int m_1;
            char m_2[142];
            short m_3;
            char m_4[188];

            autoptr_calldtor<foo1> f(new (foomem.loadptr<foo1>()) foo1);
        }

        {
            // mess with the stack
            int m_1;
            char m_2[1];
            short m_3;
            char m_4[3];

            autoptr_calldtor<foo2> f(new (foomem.loadptr<foo2>()) foo2);
        }
    }

    printf("\n_________\npress any key to exit...\n");
    getchar();
    return 0;
}

</code>



Can anybody run this without tripping an assertion? It seems to be running
fine on my systems...

;^)


P.S.

This should compile fine with the following options:

-Wall -pedantic -ansi

If you have any problems, let me know.
 

kwikius

Hopefully, I cleared some things up. What do ya think Andy?

I think I'd better pull back out of discussions re threads and leave
it to the experts :)

regards
Andy Little
 
