An idea for heap allocation at near stack allocation speed

Bjarke Hammersholt Roune · Feb 13, 2011

I want to allocate dynamically on the stack since it is faster, but
that requires platform dependent code and the stack can overflow. So
instead I want to write a memory allocator that works as stack
allocation does including in speed yet is backed by memory from the
heap. I want to be able to replace fast (but terrible!) code like

void foo(const size_t size) {
if (size > HugeConstant)
throw no_can_do_exception;
int hugeArray[HugeConstant];
HugeStruct hugeStruct;
// do computations on hugeArray and hugeStruct
}

or fast (but terrible!) interfaces like

void foo(const size_t size, int* scratchBuffer) {
// do computations using scratchBuffer as temporary memory
}

with something like:

void foo(const size_t size) {
ScopeArrayAlloc<int> hugeArray(size);
ScopeAlloc<HugeStruct> hugeStruct;
// do computations on hugeArray and hugeStruct
}

ScopeAlloc should have these properties:

1) Be nearly as fast as stack allocation.
2) Be type-safe and unable to cause memory leaks or double deletes
3) Live on the heap and take up only around sizeof(void*) bytes on
the stack
4) Allocate memory dynamically and allocate no more memory then
needed
6) Require no platform dependent code
7) Work with any class that could be allocated on the stack
8) Be able to allocate any amount of memory up to what is available
on the machine

Using new/delete inside ScopeAlloc could support all of these except
1. I'll now go into how I want to implement ScopeAlloc, but what I'd
really like is for there to be a Free software library already
available that does all of this. Do you know anything like that?

** Implementing ScopeAlloc

Here's how I'll implement ScopeAlloc. I have some questions about how
I might improve this plan.

All ScopeAlloc's are supported by a static arena allocator - it is the
same allocator for all types T. An arena allocator has a memory area
it allocates out of that is structured like this:

[ -- memory in use -- p -- memory not in use -- ]

Here p is a char* that points into the memory area, and an allocation
of X bytes simply returns p and increments p by X. It may be necessary
to allocate another memory block, so that has to be checked. Here's a
way to do that that doesn't work:

if (p + X < endOfMemoryBlock) ...

It doesn't work because p + X could be more than 1 element past the
end of the memory block which makes it undefined behavior to even
calculate p + X (?). Worse, even if it was defined behavior, there
could be an overflow which would make < do the wrong thing. Here's
something else that doesn't work:

if (bytesAllocated + X < memoryBlockCapacity) ...

Here bytesAllocated + X can overflow. What I'm thinking of now is to
have ScopeAlloc internally use a function like this on a global
allocator to get its memory:

inline char* alloc(const size_t X) {
if (X < remainingCapacity)
allocateMoreMemory(X)
remainingCapacity -= X;
char* const buffer = p;
p += X;
return p;
}

Here allocateMoreMemory would allocate a new block of
2*max(lastBlockSize, X) bytes which would have the effect of making
the condition X < remainingCapacity almost always false. So the cost
here is one very predictable branch and two additions (counting a
subtraction as an addition).

This ignores alignment. ScopeAlloc would have to adjust X to preserve
alignment by rounding up to the proper alignment. To align to
alignment Y, that is to round up to the nearest multiple of Y, I would
calculate (notice no branches)

X = Y * ((X - 1) / Y + 1)

For single objects this would all be compile-time constant, so the
compiler would probably optimize it all away, so no problem. If X is a
dynamic size, then at least Y would be a compile-time constant that is
a power of two, so I would expect the compiler to replace the
multiplication and division by shifts. So the cost would be 4
additions (counting shifts as additions). Maybe a compile-time
evaluatable branch could be used to skip the alignment adjustment at
compile-time if the size of the object that we are making an array of
is already a multiple of Y.

** Question: Is there a better way to do the alignment calculation?

Now for deallocation of a buffer of size X. We can't just do "p -= X;"
since maybe the object being deallocated was allocated on a previous
block of memory to the one that p is pointing into. That has to be
detected, so there must be a branch (grumble grumble). One way to deal
with that would be to check this:

if (p == startOfMemoryblock) {
// Adjust member variables to now allocate
// out of the previous block
}

The condition works since allocations and deallocations are required
to be in reverse order so the only way that we could have gotten to
deallocating into the previous block is if p is at the start of the
current block. The big problem with this approach is that if the
memory use of the program hovers around the end of a block then we are
going to be adjusting lots of fields for lots of allocations and
deallocations to move allocation from one block to the next and then
back. Even worse, the branches that detect when to move to another
block will become poorly predictable which greatly increases the cost
of those branches. Also, memory use will become split across two or
more blocks decreasing locality of reference.

Another approach is to never move allocation into a previous block.
Instead we can let p stay where it is in that case. So the
deallocation function becomes just:

inline free(const size_t X) {
if (p != startOfMemoryBlock) {
p -= X;
remainingCapacity += X;
}
}

If we don't want to waste the memory of the previous blocks, we could
do something like this:

inline free(const size_t X) {
if (p != startOfMemoryBlock) {
p -= X;
remainingCapacity += X;
} else {
// Look into previous block and delete that
// block if it has become completely freed.
}
}

A nice thing here is that the else branch doesn't have to be that
efficient since it can't be taken that many times.

The last point is destruction. ScopeAlloc<T> knows the type of T, so
it can just call the destructor of T directly in the destructor of
ScopeAlloc. A ScopeArrayAlloc<T> could store the size of the array and
then run through a loop to destruct each element. However lots of
classes have do-nothing destructors and when using stack allocation or
even delete[] I'm sure that the compiler is aware of that so it emits
no code to call the destructors. I would like my code to have zero
overhead in that case as well.

Question: If I have a loop that calls do-nothing destructors, can I
generally trust compilers to optimize the loop away?

So the total cost for an allocate/deallocate pair is 2 well-predicted
branches and 4 additions. Add 4 additions for arrays of classes whose
size is not a multiple of the alignment. I'm guessing that dynamic
stack allocation has an overhead of 2 additions per allocate/
deallocate pair, has better cache behavior and dirties fewer
registers. Still, this is much better than new/delete in terms of
performance and it should have better cache behavior than object
pools. Also, unlike arenas the interface for ScopeAlloc has no
pitfalls that is not shared by normal stack allocation.

Any suggestions for improvements or links to similar ideas?

Cheers
Bjarke Hammersholt Roune
www.broune.com

Bjarke Hammersholt Roune · Feb 14, 2011

[...]
This ignores alignment. ScopeAlloc would have to adjust X to preserve
alignment by rounding up to the proper alignment. To align to
alignment Y, that is to round up to the nearest multiple of Y, I would
calculate (notice no branches)

X = Y * ((X - 1) / Y + 1)

I just realized that the situation is much worse than this. To show
the problem, let T be some type, and suppose we want to allocate an
array of T's. Define

N = number of elements in the array
S = sizeof(T)
A = required alignment
C = remaining capacity

Define align(x) as x rounded up to the nearest multiple of A. Then all
I considered above is that mathematically

align(x) = A * ((x - 1) / A + 1)

Unfortunately the actual condition that we need to check is

align(N*S) <= C

In the discussion above I did not account for the possibility of
overflow of N*S or of the align operation, but they can very
definitely overflow, which needs to be detected. In general overflow
detection involving multiplications is a lot of work, but fortunately
this is unsigned multiplication and we know that align(x) and x differ
by at most A - 1. Because of this it is enough to check that

N <= (((size_t)-1) - (A - 1)) / S

to ensure that align(N*S) does not overflow. Here the right hand side
is all compile-time constant so this is just a comparison against a
constant. So the cost for detecting overflow is a single well
predicted branch. Is there a better way?

Cheers
Bjarke Hammersholt Roune

Finbar Slingshot · Feb 14, 2011

Bjarke Hammersholt Roune said:
I want to allocate dynamically on the stack since it is faster,

Have you actually profilled to see if it is worthwhile?

IME when the size of the struct/array gets too big for the stack then any
processing you're doing with it will completely dwarf any overhead you get
from using heap allocation.

itaj sherman · Feb 14, 2011

I want to allocate dynamically on the stack since it is faster, but
that requires platform dependent code and the stack can overflow. So
instead I want to write a memory allocator that works as stack
allocation does including in speed yet is backed by memory from the
heap. I want to be able to replace fast (but terrible!) code like

Cheers
Bjarke Hammersholt Rounewww.broune.com

I also have thought about this issue before.
I think what stands behind that, is that different structure types
have different memory allocation requirements:
1) all instances always have the same size.
2) each instance has a constant size for its lifetime.
3) an instance may change size through its lifetime.

These requirements are orders from most strict to least, and the
restrictions in each level could be applied to improve performance.
What makes this separation nice, is that each of these groups is
recursively closed with respect to aggregation:
1) a structure that is built of a set of structures (fixed at compile
time) from this group, also belongs to this group
2) a structure that is built of a set of structures (fixed at
instanciation) from this group, also belongs to this group
3) a structure that is build of a set of structures (changeable) from
this group, also belongs to this group

In the c++ language structures with allocation scheme 1, can be
defined with class. The way classes are defined allows implementations
to allocate all instances of a class to always have the same size.
Implementations do take advantage of this in the way they implement
the memory allocation for classes - They know in compile time what
would be the total size of an instance. So if it is dynamically
storage, they can do that with 1 chunck, and if stack storage even
better, they can usually sum all local objects in a function and
allocate them all with one machine operation.
A good example for a structure from group 1 is std::array<T,N> where T
is also from group 1.

Allocation scheme 3 is done in c++ using pointers and dynamic storage.
An instance of structure of this type, can never be allocated in a
signle chunk, because it may change size through its lifetime.
An example would be std::vector<T> (is doesn't matter which group T
is).
* c++ types themselves are all from group 1. A structure from group 3
(like std::vector<T>) is composed of instances of many structures from
group 1 that hold pointers to connect them.

What c++ doesn't have a general clear way to take advantage of
structures with allocation scheme 2. When you have such a structure,
then you can calulate the size of an intance when its created, and
know that size will stay for its lifetime. But there is no pre-
designed way in the language to take advantage of that restriction and
make performance out of it.
The only structures in this group that c++ is aware of are dynamic
arrays of type T, where T is of group 1. But that doesn't take
advantage of the recursive aggregation property of group 2.

For example: a structure that is an aggregation of a few dynamic
arrays is of group 2, but there is no way to express such a structure
in c++. We are forced to define such a structure as an aggregation of
pointers to those arrays, and it belongs to group 3.

Another example: I could imagine a template class
std::fixed_vector<T>, which would be much like std::vector, except
that the number of elements would be given to the constructor and
stays constant for that instance's lifetime. If T is a type of group
2, such std::fixed_vector<T> would also be of group 2 (std::vector<T>
wouldn't). And notice now that there isn't any strcuture like
hypotetic std::fixed_vector<T> in c++. There's no pre-designed way to
define such a template class in c++.

Using templates and placement new, I think its possible to build a
library that will be able to take advantage of the restrictions of
group 2, but probabely not to the full extent that could be taken by
the language. But is has to be very sensible in deciding how such
types should be defined, and particularly how to mimic the implicit
instanciations of c++ on them (implicit copy contruction, move
construction, conversions) in definitions of variables / data
members / return values / parameters.

In light of what I said above, I think your API is lacking the
recursive feature. You only allow to allocate primitive array of type
T, where T is of group 1. The practical problem is that no one uses
arrays directly anymore. We would certainly want to be able to define
and use such fixed_vector<T> with full container properties (which
simple arrays don't have), and use them to with such allocations. Very
soon we would also want to aggregate a few such structures in bigger
structures and give them type names and used them automatically.

Appart from that, I have a few small suggestions:
* Many c++ implementations have special support for such stack
operations. In these cases you want to use the implementation's
mechanism to allocate that memory, and don't need your own.
* Keep in mind that you'll need such a stack for every thread in the
program.
* The allocators cannot keep the no-throw exception guaranty, but
destructors of local allocators must not throw exceptions, so you
don't affect the weaker exception safety guaraties.

itaj

itaj sherman · Feb 14, 2011

Have you actually profilled to see if it is worthwhile?

IME when the size of the struct/array gets too big for the stack then any
processing you're doing with it will completely dwarf any overhead you get
from using heap allocation.

The point is that the size is not known at compile time.
For example: a function may be called many times, in most the sizes
are small, but sometimes are large. You can't use regular stack
allocation. If you use regular dynamic allocation, then in most
invocations the processing (the size is small) would not dwarf the
allocation.

itaj

itaj sherman · Feb 14, 2011

On Feb 13, 11:41 pm, Bjarke Hammersholt Roune <[email protected]>
wrote:
[snip]

Using templates and placement new, I think its possible to build a
library that will be able to take advantage of the restrictions of
group 2, but probabely not to the full extent that could be taken by
the language. But is has to be very sensible in deciding how such
types should be defined, and particularly how to mimic the implicit
instanciations of c++ on them (implicit copy contruction, move
construction, conversions) in definitions of variables / data
members / return values / parameters.

Some details of how I think such a library can be structured:

Defining some class fixed_size_allocator. It should have a default
constructor, but doesn't have to be copyable. An intance of this class
can manage the memory allocation for 1 structure of group 2.

Define a concept for classes that belong to group 2. Lets call it
fixed_size concept.
All constructors should accept as the first argument an object which
is a fixed_size_allocator. This object will be used to allocate memory
for the instance. A constructor will pass the allocator to the
contructor of members and bases which are also of fixed_size concept.

Each class T which is fixed_size, will have a specialization of
fixed_size_traits<T>.
For each constructor of T

T::T( Par1 par1, Par2 par2 ... );

There will also be a static function member:

size_t fixed_size_traits<T>::calculate_size( Par1 par1, Par2
par2 ... );

which should return the size of the instance that is to be built with
these given arguments.
This function can invoke fixed_size_traits<X>::calculate_size for all
X memebers and bases classes of T.

also need template classes:
template< typename T > stack_fixed_size_object;
template< typename T > dynamic_fixed_size_object;

for use when declaring a local variable of type T, or dynamically
allocating one.
The constructors of these can call
fixed_size_traits<T>::calculate_size, and allocate the needed memory
in one chunck (on a stack or dynamic) - this is where the performance
advantage comes. It is not only that local variables of group 2 are
allocated on a stack memory manager (rather than less efficient heap),
is also that in both cases the memory is allocated in one chunck for
all the parts of the instance.

That means that unlike
std::vector< std::vector< T > >{ { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8,
9 } }
which needs 3 separate heap allocations at least.

dynamic_fixed_size_object< fixed_vector< fixed_vector< T > > >{ { 1,
2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } }
is 1 heap allocation.

stack_fixed_size_object< fixed_vector< fixed_vector< T > > >{ { 1, 2,
3 }, { 4, 5, 6 }, { 7, 8, 9 } }
is 1 stack allocation (your special stack, not the regular, but still
very fast ).

Now need to think real hard what to do for implicit copy and move
sematics. I don't even know where to begin.

itaj

Bjarke Hammersholt Roune · Feb 14, 2011

Have you actually profilled to see if it is worthwhile?

IME when the size of the struct/array gets too big for the stack then any
processing you're doing with it will completely dwarf any overhead you get
from using heap allocation.

Yep, all my sub-algorithm-choice optimization is 100% profile driven.
The thing is that the size of what I'm allocating on the stack is
dynamic, so for most things it wouldn't be a problem but nothing
prevents a user from giving my program input that would make it a
problem. Also my problems tend to be NP-hard so I end up using little
memory but the processing time is high and I get lots of allocations/
deallocations for intermediate computations. The amount of memory
needed for these computations is unpredictable even if it is usually
small. I use various techniques to avoid frequent allocations now, but
they all have drawbacks that I'd like to avoid and it seems to me that
this is a way to cleanly and completely fix this mess. I believe that
this solution would make my coding easier, improve my design and
interfaces, open up more possibilities for algorithms and make my code
a little faster at the same time. So to me it's well worth it to make
this happen.

These situations also turn up in the standard library. For example
std::stable_sort is usually implemented to be in-place using heapsort
instead of merge sort to avoid allocating temporary buffers. Merge
sort would be considerably faster. If the standard library had a
facility like what I'm proposing then mergesort would become a more
attractive choice than it is now.

Cheers
Bjarke

Bjarke Hammersholt Roune · Feb 14, 2011

I think what stands behind that, is that different structure types
have different memory allocation requirements:
1) all instances always have the same size.
2) each instance has a constant size for its lifetime.
3) an instance may change size through its lifetime.

That's a nice way of thinking about the issue. You are precisely right
that my design is based on assuming that objects have dynamic size but
that their size will never increase after allocation. Though I have
toyed with the idea of allowing the very top-most element on the stack
to change its size. I'm not sure if its a good idea or not.

In light of what I said above, I think your API is lacking the
recursive feature. You only allow to allocate primitive array of type
T, where T is of group 1. The practical problem is that no one uses
arrays directly anymore. We would certainly want to be able to define
and use such fixed_vector<T> with full container properties (which
simple arrays don't have), and use them to with such allocations. Very
soon we would also want to aggregate a few such structures in bigger
structures and give them type names and used them automatically.

You are completely right. For now I just want to replace stack
allocation, and my design can do everything you can do with stack
allocation. So far so good. Yet as you point out it would be even
better to be able to expand stack allocation to classes that allocate
their own memory internally. This requires some way to tell an object
how it is being allocated. The preliminary idea I have on that is
template specialization. There would be some function template<T>
construct(...), and if some T wants to do something special when
allocated via StackAlloc<T> then it would specialize construct to do
the right thing by allocating all its dynamic size (type 2) members
using the stack allocator. E.g. a mechanism like that is necessary for
convenient use of classes using the trick of placing a one-element
array at the end that can actually be more than 1 element by putting
the class T in a buffer that is larger than sizeof(T). It would be

awesome to have "StackAlloc said:
Appart from that, I have a few small suggestions:
* Many c++ implementations have special support for such stack
operations. In these cases you want to use the implementation's
mechanism to allocate that memory, and don't need your own.

I'd love an example of this. Don't all stack-based memory suffer from
the issue of stack overflow?

Thank you for your thoughtful response.

Cheers
Bjarke

Bjarke Hammersholt Roune · Feb 14, 2011

[...]
Now need to think real hard what to do for implicit copy and move
sematics. I don't even know where to begin.

There is always an outer owner of the whole memory block that would be
acting like a smart pointer. It could allow move semantics by handing
over its exclusive pointer. For copy the outer owner could allocate a
memory area of the same size as that being copied and then call the
owned element's copy constructor with placement new. Then it's up to
the owned element to either have a reasonable implementation of
copying or to make its copy constructor inaccessible. What happens on
copy could also be a template function that individual owned elements
could specialize if they don't want the copy to necessarily have the
same size as that which is being copied.

Calculating the required size is complicated and error prone. There
would really have to be a convenient way to write the code to
calculate the size of a class. Even better if the same piece of code
could be instantiated to calculate the size and then instantiated
another way to construct the object or part of it. I'm not sure if
that is possible.

Some classes might want to support both fixed size and variable size
allocation. There should be some standard way to support that, perhaps
a smart pointer that only deletes the memory when it was allocated by
new. The copy operation would also have to be aware of this.

Cheers
Bjarke

itaj sherman · Feb 14, 2011

It was something I came across at work once, that made me think about
what I described. I can't remember what it was. But it wasn't crutial
from performance POV back then, and I never got into more details
about this idea.
Actually I never talked about it with anyone, so that's quite nice I
saw your question here and got reminded about that. It's good to know
that it makes sense to someone else but me.
Maybe I'll think more about it. I'll tell you if I get something
interesting.

You are completely right. For now I just want to replace stack
allocation, and my design can do everything you can do with stack
allocation. So far so good. Yet as you point out it would be even
better to be able to expand stack allocation to classes that allocate
their own memory internally. This requires some way to tell an object
how it is being allocated. The preliminary idea I have on that is
template specialization. There would be some function template<T>
construct(...), and if some T wants to do something special when
allocated via StackAlloc<T> then it would specialize construct to do
the right thing by allocating all its dynamic size (type 2) members
using the stack allocator. E.g. a mechanism like that is necessary for
convenient use of classes using the trick of placing a one-element
array at the end that can actually be more than 1 element by putting
the class T in a buffer that is larger than sizeof(T). It would be
awesome to have "StackAlloc<T> myObj(size);" just work for such T's.

I think it can be any argument there, not just size. The type of the
instance that is being initialized should know how to use the
parameter to calculate the needed size.
These are all just general ideas, I never really thought about these
details carefully.

For example:
{
int size = ...;
StackAlloc< fixed_vector< X > > b( size );
}
The initialization of

{
std::vector< std::vector< X > > a;
//put values into a...
StackAlloc< fixed_vector< fixed_vector< X> > > b( a ); //this should
be done with 1 special-stack allocation, no dynamic.
}

fixed_vector initialization should be implemented to check the size of
the argument container to calculate how much memory to ask from the
allocator. If going by what I said in the other reply, it should

probabely be inside fixed_size_traits said:
::calculate_size. In this case T = X when initializing each element

of type fixed_vector<X>, and T = fixed_vector<X> for the main object.
More specifically, it would be in a certain overload:

temlate said:
::calculate_size( std::vector< U > const& r )

{
return ... + ( r.size() * sizeof(T) ) + ...;
}

I'd love an example of this. Don't all stack-based memory suffer from
the issue of stack overflow?

No, I was wrong. I was just thinking about the alloca( size_t )
function that Windows and some linux have.
It allocates on the real stack. But I didn't think about it carefuly.
It's actually not very useful in this case, because deallocation is
automatic at the end of the calling function. Especially not helpful
if you're going for an unlimited stack size (to avoid overflow).
I was thinking more about how to even allow stack allocations with run-
time size determination at all, rather than further thinking about
allowing unlimited amounts of data.
I suppose you will have to implement the stack structure, or find some
third party data structure that can provide these requirements.

itaj

Bjarke Hammersholt Roune · Feb 15, 2011

This ignores alignment. ScopeAlloc would have to adjust X to preserve
alignment by rounding up to the proper alignment. To align to
alignment Y, that is to round up to the nearest multiple of Y, I would
calculate (notice no branches)

X = Y * ((X - 1) / Y + 1)

Rounding up with just 2 instructions:

X = (X + (Y - 1)) & (~(Y - 1))

This works because Y is a power of 2 and compile-time constant.

Cheers
Bjarke

Bjarke Hammersholt Roune · Feb 16, 2011

Exceptions make this more complicated that I had thought. If I'm
allocating an array I have to call constructors, and then I have to
handle the case of one of the constructors throwing an exception. I
think I know how that should work. However, what do I do when I'm
deallocating the array, and one of the destructors throws an
exception. Do I still call the remaining destructors? What if I get
two exceptions that way? I recall reading about what should happen in
those cases, but now I can't find it. Anyone knows? I know its bad
style for destructors to throw exceptions and mine don't but I can't
know that that will be true for everyone who might use my code at some
point. My arrays should work the same as usual arrays including with
respect to exceptions.

Btw. you'd think I could just call placement new[] and placement
delete[]. Unfortunately as far as I can tell from discussions on the
internet those functions are completely useless because
implementations are allowed to store information about an array in the
memory for that array and placement new[] and placement delete[] are
allowed to contain the code to actually do this. So to make that work
I'd have to know how much extra space the compiler I'm on wants me to
insert so I can allocate enough space and take the right offset after
having called placement new[] to find the actual beginning of the
array where the first element is stored. There is no way to obtain
that information, indeed I read that the right offset is allowed to
change from one call to placement new[] to another. That's the
stupidest thing I've heard in quite a while and I'd love for someone
to tell me that it's wrong.

gwowen · Feb 16, 2011

However, what do I do when I'm
deallocating the array, and one of the destructors throws an
exception.

In general, destructors should not throw exceptions. Nearly every
piece of good-practice advice suggests that destructors *must* not
throw exceptions. The most compelling reason -- although not the only
one -- is that exception propagation up through the call-stack
destroys any automatic objects as part of the unwinding, and if those
destructors throw, this will terminate() your program. (Also, you
cannot store such objects in standard containers).

You *can* avoid these problems through careful use of objects whose
destructors can throw (essentially managing their lifetimes yourself),
but extreme care is needed.

itaj sherman · Feb 16, 2011

Exceptions make this more complicated that I had thought. If I'm
allocating an array I have to call constructors, and then I have to
handle the case of one of the constructors throwing an exception. I
think I know how that should work. However, what do I do when I'm
deallocating the array, and one of the destructors throws an
exception. Do I still call the remaining destructors? What if I get
two exceptions that way? I recall reading about what should happen in
those cases, but now I can't find it. Anyone knows? I know its bad
style for destructors to throw exceptions and mine don't but I can't
know that that will be true for everyone who might use my code at some
point. My arrays should work the same as usual arrays including with
respect to exceptions.

I don't think you need to support elements with throwing destructors.
All standard containers for example don't, for the same reason.
You should require the elements to have a no-throw destructor. I mean,
in POV of the module that allocates, its contract only allows to be
used with no-throw destructors.

It is generally accepted that all classes should have no-throw
destructor (unless it is some very special case, but then has to give
up using such class with almost any core libraries such std
containers).

If one constructor throws you have to destruct all the other elements
that already constructed in reverse order, and let that exception
continue throw. Like primitive arrays, and call std containers do.
Because you know elements have no-throw destructors, another exception
should not be thrown in that process.

itaj

Bjarke Hammersholt Roune · Mar 6, 2011

As an update, it turns out that the GNU libc comes with obstacks which
does something very close to what I'm suggesting here

http://www.gnu.org/s/libc/manual/html_node/Obstacks.html

However, the implementation is not so good for my purpose. The code is
obfuscated by being part of a library (so weird variable names) and by
being written almost entirely as macroes. However, as far as I can
tell, an allocate/deallocate pair takes 9 ops (similar to add) and 3
branches in the best case plus a significant number of heap reads and
writes. This is when you specify the size in bytes. The scheme
described here takes 7 ops and 2 branches in the best case to do the
same thing and only accesses the heap to update the remaining capacity
member.

It gets a lot worse, however, because obstack allocates in fixed-size
chunks and it frees a chunk immediately when it is empty. As the
chunks are fixed-size the number of chunks that have to be allocated
is linear in the size of the total amount of space that is allocated.
Even worse, if the current chunk is full and you allocate 1 byte, a
new chunk will be allocated. When you deallocate that 1 byte, that new
chunk will be deallocated. So potentially every single allocation and
deallocation will cause a call to the backing memory allocator, at
which point performance will be worse than if the backing memory
allocator had been used directly.

From comments in the code obstacks seems to have been written with a
very specific use in mind: storage of symbols while parsing source
code. This focus makes obstacks unsuitable as a general purpose
replacement for stack allocation. So I'll still need to make my own.

Cheers
Bjarke

function implementation with stack vs heap allocation	4	Apr 30, 2009
block allocation	11	Sep 8, 2011
Stack is slow than heap?	23	Nov 7, 2011
I have an idea for a universal programming language	90	Jun 15, 2022
Solo game dev idea	0	Sep 26, 2021
data member allocation (local vs heap)	4	Jun 22, 2010
vector, but without the overhead of dynamic memory allocation	5	Mar 9, 2011
Memory Allocation Question	3	Nov 22, 2011

An idea for heap allocation at near stack allocation speed

Bjarke Hammersholt Roune

Bjarke Hammersholt Roune

Finbar Slingshot

itaj sherman

itaj sherman

itaj sherman

Bjarke Hammersholt Roune

Bjarke Hammersholt Roune

Bjarke Hammersholt Roune

itaj sherman

Bjarke Hammersholt Roune

Bjarke Hammersholt Roune

gwowen

itaj sherman

Bjarke Hammersholt Roune

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads