Is this standard c++...


Chris Thomasson

I am thinking about using this technique for all the "local" memory pools in
a particular multi-threaded allocator algorithm I invented. Some more info on
that can be found here:

http://groups.google.com/group/comp.arch/browse_frm/thread/24c40d42a04ee855

Anyway, here is the code snippet:

#include <cstdio>
#include <cstddef>
#include <new>


template<size_t T_sz>
class lmem {
unsigned char m_buf[T_sz];
public:
void* loadptr() {
return m_buf;
}
};


class foo {
public:
foo() { printf("(%p)foo::~foo()", (void*)this); }
~foo() { printf("(%p)foo::~foo()", (void*)this); }
};


int main(void) {
foo *f;
lmem<sizeof(*f)> foomem;
f = new (foomem.loadptr()) foo;
f->~foo();
return 0;
}
 

Victor Bazarov

Chris said:
I am thinking about using this technique for all the "local" memory
pools in a particular multi-threaded allocator algorithm I invented.
Some more info on that can be found here:

http://groups.google.com/group/comp.arch/browse_frm/thread/24c40d42a04ee855

Anyway, here is the code snippet:

#include <cstdio>
#include <cstddef>
#include <new>


template<size_t T_sz>
class lmem {
unsigned char m_buf[T_sz];
public:
void* loadptr() {
return m_buf;
}
};


class foo {
public:
foo() { printf("(%p)foo::~foo()", (void*)this); }

The string seems to be incorrect.
~foo() { printf("(%p)foo::~foo()", (void*)this); }
};


int main(void) {
foo *f;
lmem<sizeof(*f)> foomem;
f = new (foomem.loadptr()) foo;

It's not "incorrect", but I believe you may have a problem
with alignment. Only a memory block sized 'sizeof(foo)'
obtained from free store is aligned correctly to have a 'foo'
constructed in it like that.
f->~foo();
return 0;
}

V
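
The alignment concern above can be illustrated with a minimal sketch. It assumes a C++11 or later compiler (which postdates this thread), where `alignas` and `alignof` are available; the names `local_storage`, `sample` and `storage_is_aligned` are hypothetical and not from the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical replacement for the original lmem: the byte array is given
// the alignment that T requires, so placement new into it is well aligned.
template <typename T>
class local_storage {
    alignas(T) unsigned char m_buf[sizeof(T)];
public:
    void* loadptr() { return m_buf; }
};

// Stand-in for any type with a stricter alignment requirement than char.
struct sample {
    double d;
    int i;
};

// Constructs a T in local storage and reports whether its address is a
// multiple of alignof(T).
template <typename T>
bool storage_is_aligned() {
    local_storage<T> mem;
    T* p = new (mem.loadptr()) T();   // placement new into the local buffer
    bool ok = reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
    p->~T();                          // explicit destructor call, as in the original
    return ok;
}
```

Calling `storage_is_aligned<sample>()` from `main` should return true on a conforming compiler, because the wrapper class inherits the alignment of its `alignas` member.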
 

Ian Collins

Chris said:
I am thinking about using this technique for all the "local" memory pools in
a particular multi-threaded allocator algorithm I invented. Some more info on
that can be found here:

http://groups.google.com/group/comp.arch/browse_frm/thread/24c40d42a04ee855

Anyway, here is the code snippet:

#include <cstdio>
#include <cstddef>
#include <new>


template<size_t T_sz>
class lmem {
unsigned char m_buf[T_sz];

You may run into alignment problems by using an array of unsigned char.
You should follow the same system-specific alignment rules as malloc.
public:
void* loadptr() {
return m_buf;
}
};


class foo {
public:
foo() { printf("(%p)foo::~foo()", (void*)this); }
~foo() { printf("(%p)foo::~foo()", (void*)this); }
};
C++ purists avoid printf. Never mix it with iostreams.
int main(void) {

Just int main() is the norm in C++.
 

kwikius

Chris said:
I am thinking about using this technique for all the "local" memory
pools in a particular multi-threaded allocator algorithm I invented.
Some more info on that can be found here:

Anyway, here is the code snippet:
#include <cstdio>
#include <cstddef>
#include <new>
template<size_t T_sz>
class lmem {
unsigned char m_buf[T_sz];
public:
void* loadptr() {
return m_buf;
}
};
class foo {
public:
foo() { printf("(%p)foo::~foo()", (void*)this); }

The string seems to be incorrect.
~foo() { printf("(%p)foo::~foo()", (void*)this); }
};
int main(void) {
foo *f;
lmem<sizeof(*f)> foomem;
f = new (foomem.loadptr()) foo;

It's not "incorrect", but I believe you may have a problem
with alignment. Only a memory block sized 'sizeof(foo)'
obtained from free store is aligned correctly to have a 'foo'
constructed in it like that.

IMO, because the array is wrapped in a class there shouldn't be a
problem. IOW the class alignment will take care of the issue. However,
it should be possible to check. (Not absolutely sure if this is the
correct solution, but whatever...)

regards
Andy Little

#include <stdexcept>

// aka std::tr1::
template <typename T>
struct alignment_of{
#if defined _MSC_VER
static const unsigned int value = __alignof(T);
#elif defined __GNUC__
static const unsigned int value = __alignof__(T);
#else
#error need to define system dependent align_of
#endif
};


template<typename T, size_t T_sz>
class lmem {

unsigned char m_buf[T_sz];
public:
void* loadptr() {
// check that it's aligned correctly
ptrdiff_t dummy = m_buf - static_cast<unsigned char*>(0);
ptrdiff_t offset = dummy % alignment_of<T>::value;
if(!offset)
return m_buf;
throw std::logic_error("lmem memory doesn't satisfy alignment");
}
};


int main()
{
typedef double type;
lmem<type,sizeof(type)> x;
}
 

kwikius

IMO, because the array is wrapped in a class there shouldn't be a
problem. IOW the class alignment will take care of the issue.

Well I was wrong about that :.
(Back to the drawing board I guess)

But at least the code seems to detect the problem :

#include <stdexcept>

// aka std::tr1::
template <typename T>
struct alignment_of{
#if defined _MSC_VER
static const unsigned int value = __alignof (T);
#elif defined __GNUC__
static const unsigned int value = __alignof__(T);
#else
#error need to define system dependent align_of
#endif
};


template<typename T, size_t T_sz>
class lmem {

unsigned char m_buf[T_sz];
public:
void* loadptr() {
// check that it's aligned correctly
ptrdiff_t dummy = m_buf - static_cast<unsigned char*>(0);
ptrdiff_t offset = dummy % alignment_of<T>::value;
if(!offset)
return m_buf;
throw std::logic_error("lmem memory doesn't satisfy alignment");
}
};

#include <iostream>
int main()
{
try{
typedef double type;
char c;
lmem<type,sizeof(type)> x;
void * p = x.loadptr() ;
}
catch(std::exception & e)
{
std::cout << e.what() <<'\n';
}
}
/* output:
lmem memory doesn't satisfy alignment

*/
regards
Andy Little
 

Ian Collins

kwikius said:
IMO, because the array is wrapped in a class there shouldn't be a
problem. IOW the class alignment will take care of the issue. However,
it should be possible to check. (Not absolutely sure if this is the
correct solution, but whatever...)
No, the class may be aligned according to the alignment requirements of
its members, in this case unsigned char.
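
The point can be checked directly. The following is a small sketch, again assuming a C++11 or later compiler for `alignof`; `byte_box` is a hypothetical stand-in that mirrors the original `lmem`. The standard does not promise that the wrapper's alignment is exactly 1, but that is the result on common ABIs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper mirroring the original lmem: its only member is an
// array of unsigned char, so nothing forces a stricter alignment on it.
template <std::size_t N>
struct byte_box {
    unsigned char buf[N];
};

// Alignment the compiler gives the wrapper when sized for a double.
inline std::size_t box_alignment() {
    return alignof(byte_box<sizeof(double)>);
}

// Alignment a double itself requires, for comparison.
inline std::size_t double_alignment() {
    return alignof(double);
}
```

Comparing `box_alignment()` with `double_alignment()` shows why wrapping the array in a class does not help: the class is only as aligned as its most demanding member.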
 

kwikius

No, the class may be aligned according to the alignment requirements of
its members, in this case unsigned char.

Yep. I figured it out eventually, I think.
It seems to be possible, but at the expense of always allocating your
char array oversize by alignment_of<T> - 1. It's not possible to know
where on the stack the lmem object will go.

Anyway the following seems to work :


template<typename T, size_t T_sz>
class lmem {
// Don't think there is a way to avoid
// allocating extra stack space ...
unsigned char m_buf[T_sz + alignment_of<T>::value -1];
public:
void* loadptr() {
// align memory to T
ptrdiff_t dummy = m_buf - static_cast<unsigned char*>(0);
ptrdiff_t offset = dummy % alignment_of<T>::value;
ptrdiff_t result = offset == 0
? dummy
: dummy + alignment_of<T>::value - offset;
// check this works
assert( result % alignment_of<T>::value == 0);
return static_cast<unsigned char*>(0) + result;
}
};

struct my{
int x, y;
double z;
my(int xx, int yy, double zz): x(xx),y(yy),z(zz){}
};

#include <iostream>
int main()
{

// muck about with stack offsets
char c = '\n';
short n = 1;
lmem<double,sizeof(double)> x;
double * pd = new (x.loadptr()) double(1234.56789);
std::cout << *pd <<'\n';

lmem<my, sizeof(my)> y;
my * pc = new (y.loadptr()) my(1,2,3);

std::cout << pc->x << ' ' << pc->y << ' ' << pc->z <<'\n';

}

regards
Andy Little
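
The over-allocate-and-round-up idea can also be written without arithmetic on a null pointer, which the code above relies on and which is not guaranteed to behave. This is a sketch only, assuming a C++11 compiler that provides `std::uintptr_t`; `align_up` and `padded_lmem` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounds p up to the next address that is a multiple of align.
// Assumes align is a power of two and that the caller's buffer has at
// least align - 1 spare bytes after p.
inline unsigned char* align_up(unsigned char* p, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::size_t pad = static_cast<std::size_t>((align - addr % align) % align);
    return p + pad;   // stays inside the caller's array
}

// Same layout as the lmem above: sizeof(T) plus worst-case padding.
template <typename T>
class padded_lmem {
    unsigned char m_buf[sizeof(T) + alignof(T) - 1];
public:
    void* loadptr() { return align_up(m_buf, alignof(T)); }
};
```

The padding is computed from the integer value of the address, and the result is formed by offsetting the original pointer, so the returned pointer always points into `m_buf`.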
 

Ian Collins

kwikius said:
No, the class may be aligned according to the alignment requirements of
its members, in this case unsigned char.


Yep. I figured it out eventually, I think.
It seems to be possible, but at the expense of always allocating your
char array oversize by alignment_of<T> - 1. It's not possible to know
where on the stack the lmem object will go.

Anyway the following seems to work :


template<typename T, size_t T_sz>
class lmem {
// Don't think there is a way to avoid
// allocating extra stack space ...
unsigned char m_buf[T_sz + alignment_of<T>::value -1];
public:
void* loadptr() {
// align memory to T
ptrdiff_t dummy = m_buf - static_cast<unsigned char*>(0);
ptrdiff_t offset = dummy % alignment_of<T>::value;
ptrdiff_t result = offset == 0
? dummy
: dummy + alignment_of<T>::value - offset;
// check this works
assert( result % alignment_of<T>::value == 0);
return static_cast<unsigned char*>(0) + result;
}
};
Or even:

template <typename T>
class lmem {
T t;
public:
void* loadptr() {
return &t;
}
};

:)
 

kwikius

Yep. I figured it out eventually, I think.
It seems to be possible, but at the expense of always allocating your
char array oversize by alignment_of<T> - 1. It's not possible to know
where on the stack the lmem object will go.
Anyway the following seems to work :
template<typename T, size_t T_sz>
class lmem {
// Don't think there is a way to avoid
// allocating extra stack space ...
unsigned char m_buf[T_sz + alignment_of<T>::value -1];
public:
void* loadptr() {
// align memory to T
ptrdiff_t dummy = m_buf - static_cast<unsigned char*>(0);
ptrdiff_t offset = dummy % alignment_of<T>::value;
ptrdiff_t result = offset == 0
? dummy
: dummy + alignment_of<T>::value - offset;
// check this works
assert( result % alignment_of<T>::value == 0);
return static_cast<unsigned char*>(0) + result;
}
};

Or even:

template <typename T>
class lmem {
T t;
public:
void* loadptr() {
return &t;
}

};

But consider a variant, and you are smokin', treating stack like heap
with no alloc overhead...

This may be the purpose behind the device ...

template< size_t T_sz>
class lmem {

unsigned char m_buf[T_sz];
public:
template <typename T>
void* loadptr() {
assert( T_sz >= sizeof(T) + alignment_of<T>::value -1);
// align memory to T
ptrdiff_t dummy = m_buf - static_cast<unsigned char*>(0);
ptrdiff_t offset = dummy % alignment_of<T>::value;
ptrdiff_t result = offset == 0
? dummy
: dummy + alignment_of<T>::value - offset;
// check this works
assert( result % alignment_of<T>::value == 0);
return static_cast<unsigned char*>(0) + result;
}
};

struct my{
int x, y;
double z;
my(int xx, int yy, double zz): x(xx),y(yy),z(zz){}
};

#include <iostream>
int main()
{

// muck about with stack offsets
char c = '\n';
short n = 1;
lmem<1000> x;
double * pd = new (x.loadptr<double>()) double(1234.56789);
std::cout << *pd <<'\n';
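// add some destroy function which discriminates PODs etc.
// note: loadptr does not track what has been handed out, so this
// object overlaps the storage of the double created above
my * pc = new (x.loadptr<my>()) my(1,2,3);
std::cout << pc->x << ' ' << pc->y << ' ' << pc->z <<'\n';
// use some destroy() for my dtor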
}

regards
Andy Little
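
To treat the buffer like a small heap, each request needs its own bytes, which means remembering how much has already been handed out. Below is a minimal bump-allocator sketch of that idea, assuming a C++11 compiler; `bump_arena`, `allocate`, `used` and `arena_demo` are hypothetical names and this is not the allocator discussed in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

template <std::size_t N>
class bump_arena {
    unsigned char m_buf[N];
    std::size_t m_used;   // bytes handed out so far, including padding
public:
    bump_arena() : m_used(0) {}

    // Returns aligned storage for one T, or a null pointer if it does not fit.
    template <typename T>
    void* allocate() {
        std::uintptr_t cur = reinterpret_cast<std::uintptr_t>(m_buf) + m_used;
        std::size_t pad =
            static_cast<std::size_t>((alignof(T) - cur % alignof(T)) % alignof(T));
        if (m_used + pad + sizeof(T) > N) return nullptr;
        void* p = m_buf + m_used + pad;
        m_used += pad + sizeof(T);
        return p;
    }

    std::size_t used() const { return m_used; }
};

struct my {
    int x, y;
    double z;
    my(int xx, int yy, double zz) : x(xx), y(yy), z(zz) {}
};

// Places two objects in one arena and checks that neither clobbers the other.
// The arena is large enough here, so allocate() is known not to return null.
inline bool arena_demo() {
    bump_arena<1000> x;
    double* pd = new (x.allocate<double>()) double(1234.56789);
    my* pc = new (x.allocate<my>()) my(1, 2, 3);
    bool distinct = static_cast<void*>(pd) != static_cast<void*>(pc);
    bool intact = *pd == 1234.56789 && pc->x == 1 && pc->y == 2 && pc->z == 3.0;
    return distinct && intact;
}
```

Padding is only paid when the current position is misaligned for the requested type, rather than reserving `alignment - 1` spare bytes per object up front. Real code would check the result of `allocate` before using it with placement new.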
 

Default User

Ian said:
Care to cite an example?

Any time you need a compact statement for formatting output from
multiple variables. All that screwing around with setw and setprecision
and whatnot is ridiculous.

Just because dumbasses sometimes fail to use printf() in a correct
manner doesn't mean that those of us who aren't dumbasses should not
use it.

Yeah, yeah, I know Boost has some sort of formatted output thing, but
some of us can't use it. I have no idea what its status is vis-a-vis
the standard.





Brian
 

Ian Collins

Default said:
Any time you need a compact statement for formatting output from
multiple variables. All that screwing around with setw and setprecision
and whatnot is ridiculous.
Then use sprintf.

I've seen real problems where printf and iostreams were used on the same
stream (stdout).
Just because dumbasses sometimes fail to use printf() in a correct
manner doesn't mean that those of us who aren't dumbasses should not
use it.
I wasn't saying don't use it, I was saying don't mix.
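
The "format with sprintf, then write through one stream" suggestion might look like the sketch below. It uses `std::snprintf` (C++11) rather than `sprintf` so the buffer cannot overflow; `format_point` and its format string are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats three values printf-style into a local buffer and returns the
// result as a std::string, so the caller can send it to an iostream.
inline std::string format_point(int x, int y, double z) {
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, "%4d %4d %8.3f", x, y, z);
    if (n < 0) return std::string();          // encoding error
    // snprintf reports the length it wanted; clamp to what actually fit.
    std::size_t len = static_cast<std::size_t>(n) < sizeof buf
                          ? static_cast<std::size_t>(n)
                          : sizeof buf - 1;
    return std::string(buf, len);
}
```

A caller would then write `std::cout << format_point(1, 2, 3.0) << '\n';`, keeping the compact printf-style format while all output goes through `std::cout`.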
 

Pete Becker

Ian said:
I've seen real problems where printf and iostreams were used on the same
stream (stdout).

There shouldn't be any problems with a standard-conforming library,
unless the program does something stupid. cout and stdout are
synchronized, so output comes out just the way you'd expect.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
 

Ian Collins

Pete said:
There shouldn't be any problems with a standard-conforming library,
unless the program does something stupid. cout and stdout are
synchronized, so output comes out just the way you'd expect.
The problems I had were with threaded code where the access guards for
the streams got in a mess (IIRC there were different guards for
iostreams and stdio streams). Anyway, this was in pre-standard days.
 

Chris Thomasson

[...]
[...]

But consider a variant, and you are smokin', treating stack like heap
with no alloc overhead...

This may be the purpose behind the device ...
[...]

Indeed it is. I believe that I could make use of the following function
'ac_malloc_aligned':

http://appcore.home.comcast.net/appcore/src/appcore_c.html
(2nd to last function in the file...)

to get the alignment correct. You're correct in that I have to endure the
penalty of an "over-allocation" in order to get the alignment correct. I was
hoping to size the "main buffer" to a multiple of the architecture-specific
L2 cache-line size, then align it on an L2 cache-line boundary.
I could then start to allocate the individual buffered chunks from there...
Well, I will post some more source code in a day or two... So far, it should
still be in the realm of standard c++... However, once I get this first
phase out of the way... well, the code is going to get HIGHLY architecture
specific, as the so-called "critical" parts of my memory allocator algorithm
wrt the interlocked RMW instructions and memory barriers are packed away
into ia32 and SPARC assembly language.
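
As a rough illustration of a buffer that starts on a cache-line boundary and is carved into line-sized chunks, here is a sketch that assumes a C++11 compiler with `alignas` and an assumed line size of 64 bytes. `kCacheLine`, `line_aligned_buffer`, `chunk_at` and `on_line_boundary` are hypothetical names, and the real line size is architecture specific.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; the real value depends on the target CPU.
const std::size_t kCacheLine = 64;

// The compiler places the array on a cache-line boundary, so no manual
// over-allocation is needed for the main buffer.
template <std::size_t N>
struct line_aligned_buffer {
    alignas(kCacheLine) unsigned char bytes[N];
};

// Returns the start of the index-th line-sized chunk, or a null pointer
// if that chunk would run past the end of the buffer.
template <std::size_t N>
unsigned char* chunk_at(line_aligned_buffer<N>& buf, std::size_t index) {
    std::size_t offset = index * kCacheLine;
    return offset + kCacheLine <= N ? buf.bytes + offset : nullptr;
}

// True when p is a multiple of the assumed cache-line size.
inline bool on_line_boundary(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kCacheLine == 0;
}
```

Because the buffer itself is aligned, every chunk at a multiple of `kCacheLine` is aligned too, which keeps per-thread chunks from sharing a line.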
 

Chris Thomasson

[...]
Indeed it is. I believe that I could make use of the following function
'ac_malloc_aligned':

Minus the explicit heap allocation of course!

;^)
 

kwikius

I have been lazily following your thread related stuff. One day I may
be able to make use of it, though currently it all looks pretty opaque
I'm afraid.

On this subject I had hoped that I could use boost::shared_ptr for a
GUI smart_ptr class, but unfortunately it doesn't work in that
situation, so I have been forced to roll my own.

Here is some (slightly confused) discussion about it:

http://tinyurl.com/3debeh

I have been sticking to a single-threaded version and probably will for
the time being, but if I get onto anything more substantial it will
need to work with some form of concurrency, so at that point (if I get
to it) I will certainly be interested...

regards
Andy Little
 

Chris Thomasson

Chris Thomasson said:
[...]
Yep. I figured it out eventually, I think.
It seems to be possible, but at the expense of always allocating your
char array oversize by alignment_of<T> - 1. It's not possible to know
where on the stack the lmem object will go.
[...]

But consider a variant, and you are smokin', treating stack like heap
with no alloc overhead...

This may be the purpose behind the device ...
[...]

Indeed it is. I believe that I could make use of the following function
'ac_malloc_aligned':

http://appcore.home.comcast.net/appcore/src/appcore_c.html
(2nd to last function in the file...)

Okay. I was thinking of something kind of like:


<pseudo-code/sketch>
---------

#include <cstdio>
#include <cstddef>
#include <cassert>
#include <new>


template<size_t T_basesz, size_t T_metasz, size_t T_basealign>
class lmem {
unsigned char m_basebuf[T_basesz + T_metasz + T_basealign - 1];
unsigned char *m_alignbuf;

private:
static unsigned char* alignptr(unsigned char *buf, size_t alignsz) {
ptrdiff_t base = buf - static_cast<unsigned char*>(0);
ptrdiff_t offset = base % alignsz;
ptrdiff_t result = (! offset) ? base : base + alignsz - offset;
assert(! (result % alignsz));
return static_cast<unsigned char*>(0) + result;
}

public:
lmem() : m_alignbuf(alignptr(m_basebuf, T_basealign)) {}
template<typename T>
void* loadptr() const {
assert(T_basesz >= (sizeof(T) * 2) - 1);
return alignptr(m_alignbuf, sizeof(T));
}
void* loadmetaptr() const {
return m_alignbuf + T_basesz;
}
};


namespace detail {
namespace os {
namespace cfg {
enum config_e {
PAGE_SZ = 8192
};
}}

namespace arch {
namespace cfg {
enum config_e {
L2_CACHELINE_SZ = 64
};
}}

namespace lheap {
namespace cfg {
enum config_e {
BUF_SZ = os::cfg::PAGE_SZ * 2,
BUF_ALIGN_SZ = arch::cfg::L2_CACHELINE_SZ,
BUF_METADATA_SZ = sizeof(void*)
};
}}
}


template<typename T>
class autoptr_calldtor {
T *m_ptr;
public:
autoptr_calldtor(T *ptr) : m_ptr(ptr) {}
~autoptr_calldtor() {
if (m_ptr) { m_ptr->~T(); }
}
T* loadptr() const {
return m_ptr;
}
};

namespace lheap {
using namespace detail::lheap;
}

class foo {
public:
foo() { printf("(%p)foo::foo()", (void*)this); }
~foo() { printf("(%p)foo::~foo()", (void*)this); }
};

int main() {
{
lmem<lheap::cfg::BUF_SZ,
lheap::cfg::BUF_METADATA_SZ,
lheap::cfg::BUF_ALIGN_SZ> foomem;

autoptr_calldtor<foo> f(new (foomem.loadptr<foo>()) foo);
}

printf("\npress any key to exit...\n"); getchar();
return 0;
}
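
The destructor-only guard in the sketch above (autoptr_calldtor) can also be expressed with `std::unique_ptr` and a custom deleter, assuming a C++11 or later standard library. The names `destroy_only`, `counted` and `live_inside_scope` are hypothetical and exist only to make the example checkable.

```cpp
#include <cassert>
#include <memory>
#include <new>

// Deleter that runs the destructor but frees nothing, for objects that
// were built with placement new in storage owned by someone else.
struct destroy_only {
    template <typename T>
    void operator()(T* p) const { p->~T(); }
};

// Small type that counts live instances so destruction can be observed.
struct counted {
    static int live;
    counted() { ++live; }
    ~counted() { --live; }
};
int counted::live = 0;

// Builds a counted object in local storage and returns the live count
// while the guard is still in scope.
inline int live_inside_scope() {
    alignas(counted) unsigned char storage[sizeof(counted)];
    std::unique_ptr<counted, destroy_only> p(new (storage) counted);
    return counted::live;   // the guard runs ~counted() after this value is read
}
```

One design difference worth noting: `std::unique_ptr` is move-only, whereas a hand-written guard with an implicitly generated copy constructor can be copied by accident and would then run the destructor twice.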





The lmem object is meant to be a barebones low-level buffer object in the
"system-code" part of the c++ memory allocator library I am currently
developing. Basically, I am going for a fairly thin wrapper over the
allocator pseudo-code I posted; you can follow the link to the invention to
look at it. Humm... As you can probably clearly see by now, I am a hard core
C programmer and I must admit that my c++ coding skills could be improved
upon... So, any ideas for interface designs, or even system level design,
are welcome...

I was thinking about using a single lmem object per-thread and then
subsequently using it to allocate all of the per-thread data-structures my
allocator algorithm relies upon. So, essentially, every single
data-structure that makes up my multi-threaded allocator design can be based
entirely in the stacks of a plurality of threads. Wow, this has the
potential to have simply excellent scalability and performance
characteristics; anyway... ;^)

So, since lmem is all I "really" need and I don't want to post any of the
implementation details wrt lock-free algorithms, etc... what else can I
discuss here that's on topic... I am going to need to finally decide on
exactly how I will be laying out the per-thread structures in the buffer
managed by lmem...

Then I need to think about how I am going to ensure that a thread's stack
doesn't go away while any of its allocator structures are in use by any other
thread. The following technique currently works fine:

<pseudo-code>

template

class mylib_thread {
// ...
public:
~mylib_thread() {
/*
special atomic-decrement-and-wait function;
off-topic, not shown here...
we can discuss the lock-free aspects of my algorithm
over on comp.programming.threads...
*/
}
};

user_thread_entry(mylib_thread &_this) {
// user application code
}

libsys_thread_entry(...) {
// library system code

lmem<lheap::cfg::BUF_SZ,
lheap::cfg::BUF_METADATA_SZ,
lheap::cfg::BUF_ALIGN_SZ> _thismem;

autoptr_calldtor<mylib_thread> _this(new (_thismem.loadptr<mylib_thread>())
mylib_thread);

user_thread_entry(*_this.loadptr());
}




Any thoughts?
 

kwikius

Any thoughts?

First, I don't know enough about threads to make any constructive
comments. OTOH give me access to the system timer and program counter,
the ability to disable interrupts and to write interrupt
service routines, in fact control of the system event mechanisms, and I
would be happy. OTOH I guess I should head over to the threads
newsgroup and try to understand them better. Anyway, here are some
thoughts (in contradiction to the above), though I haven't studied your
code in any depth:

Firstly, I don't understand from that the way you want to use the
device, but I would guess it would be restricted to specialised use.
However, a few (confused) thoughts spring to mind. The first is that in
my environment the stack is actually quite a scarce resource, by default
around 1 MB (in VC7.1), after which you get a (non-C++) stack overflow.
I presume you can modify this though.

The heap, on the other hand, can be looked on as an (almost) infinite
resource: even if you run out of physical memory the system will start
swapping memory to and from disk. Very system specific though, I guess...

In that sense I would guess that use of the stack is by nature not as
scalable as using the heap.

So from that point of view it is interesting to try to come up with
scenarios where you would use the device.
From the viewpoint of allocating on the stack, essentially you have
to know how much you are going to allocate beforehand, but if the
stack is a scarce resource, it's not viable to treat it in the same
carefree way as the heap proper and just allocate huge. Of course it
may be possible to use some assembler to write your own functions
grabbing runtime amounts of stack, which would maybe make the device
more versatile.

For use as an allocator, the alternative is of course to use malloc
for your one time start up allocation for your own allocator, and then
after the start up cost whether you allocated on the heap or stack I
would guess the cost of getting memory from the allocator is going to
be the same regardless where it is.

Therefore the main potential use of such a device would be where you
know the size of a fixed-size allocation, but don't know what types you
want to put in it, and where you are doing your start-up allocation
frequently. This may be in some situation where you actually don't want
to use the function call mechanism for some reason, IOW using the
scheme as a sort of scratch space where you are modifying the types in
the scratch space dependent on what you are doing. This does somehow
bring to mind use in embedded systems where there is no heap as such,
so the scheme could be used as a heap for systems without a heap, as it
were. You would then presumably need to keep passing a reference to
the allocator in to child functions or put it in a global variable.

Overall, then, it's difficult to know where the advantages outweigh the
difficulties.

regards
Andy Little
 
