An idea for heap allocation at near stack allocation speed

Discussion in 'C++' started by Bjarke Hammersholt Roune, Feb 13, 2011.

  1. I want to allocate dynamically on the stack since it is faster, but
    that requires platform dependent code and the stack can overflow. So
    instead I want to write a memory allocator that works the way stack
    allocation does, including in speed, yet is backed by memory from the
    heap. I want to be able to replace fast (but terrible!) code like

    void foo(const size_t size) {
      if (size > HugeConstant)
        throw no_can_do_exception();
      int hugeArray[HugeConstant];
      HugeStruct hugeStruct;
      // do computations on hugeArray and hugeStruct
    }

    or fast (but terrible!) interfaces like

    void foo(const size_t size, int* scratchBuffer) {
      // do computations using scratchBuffer as temporary memory
    }

    with something like:

    void foo(const size_t size) {
      ScopeArrayAlloc<int> hugeArray(size);
      ScopeAlloc<HugeStruct> hugeStruct;
      // do computations on hugeArray and hugeStruct
    }

    ScopeAlloc should have these properties:

    1) Be nearly as fast as stack allocation.
    2) Be type-safe and unable to cause memory leaks or double deletes.
    3) Live on the heap and take up only around sizeof(void*) bytes on
    the stack.
    4) Allocate memory dynamically and allocate no more memory than
    needed.
    5) Require no platform dependent code.
    6) Work with any class that could be allocated on the stack.
    7) Be able to allocate any amount of memory up to what is available
    on the machine.

    Using new/delete inside ScopeAlloc could support all of these except
    1). I'll now go into how I want to implement ScopeAlloc, but what I'd
    really like is for there to be a Free software library already
    available that does all of this. Do you know of anything like that?





    ** Implementing ScopeAlloc

    Here's how I'll implement ScopeAlloc. I have some questions about how
    I might improve this plan.

    All ScopeAlloc's are supported by a static arena allocator - it is the
    same allocator for all types T. An arena allocator has a memory area
    it allocates out of that is structured like this:

    [ -- memory in use -- p -- memory not in use -- ]

    Here p is a char* that points into the memory area, and an allocation
    of X bytes simply returns p and increments p by X. It may be necessary
    to allocate another memory block, so that has to be checked. Here's a
    way to do that that doesn't work:

    if (p + X < endOfMemoryBlock) ...

    It doesn't work because p + X could be more than 1 element past the
    end of the memory block which makes it undefined behavior to even
    calculate p + X (?). Worse, even if it was defined behavior, there
    could be an overflow which would make < do the wrong thing. Here's
    something else that doesn't work:

    if (bytesAllocated + X < memoryBlockCapacity) ...

    Here bytesAllocated + X can overflow. What I'm thinking of now is to
    have ScopeAlloc internally use a function like this on a global
    allocator to get its memory:

    inline char* alloc(const size_t X) {
      if (X > remainingCapacity)
        allocateMoreMemory(X);
      remainingCapacity -= X;
      char* const buffer = p;
      p += X;
      return buffer;
    }

    Here allocateMoreMemory would allocate a new block of
    2*max(lastBlockSize, X) bytes, which would have the effect of making
    the condition X > remainingCapacity almost always false. So the cost
    here is one very predictable branch and two additions (counting a
    subtraction as an addition).
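    Putting the pieces together, a minimal self-contained sketch of such
    an arena might look as follows. The class and member names and the
    growth policy are illustrative, and the bookkeeping needed to free
    old blocks later is omitted:

```cpp
#include <cstddef>
#include <new>
#include <algorithm>

// Minimal bump-pointer arena following the scheme above. Doubling the
// block size on growth makes the "need more memory" branch almost
// always false. Freeing old blocks is omitted for brevity.
class Arena {
public:
    Arena() : p(0), remainingCapacity(0), lastBlockSize(0) {}

    char* alloc(const std::size_t x) {
        if (x > remainingCapacity)  // rarely taken
            allocateMoreMemory(x);
        remainingCapacity -= x;
        char* const buffer = p;
        p += x;
        return buffer;
    }

private:
    void allocateMoreMemory(const std::size_t x) {
        const std::size_t blockSize = 2 * std::max(lastBlockSize, x);
        p = static_cast<char*>(::operator new(blockSize));
        remainingCapacity = blockSize;
        lastBlockSize = blockSize;
        // A real implementation would remember the old block so it
        // can be freed later.
    }

    char* p;
    std::size_t remainingCapacity;
    std::size_t lastBlockSize;
};
```

    Two consecutive allocations from the same block come back adjacent,
    which is where the locality advantage over new/delete comes from.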

    This ignores alignment. ScopeAlloc would have to adjust X to preserve
    alignment by rounding up to the proper alignment. To align to
    alignment Y, that is to round up to the nearest multiple of Y, I would
    calculate (notice no branches)

    X = Y * ((X - 1) / Y + 1)

    For single objects this would all be compile-time constant, so the
    compiler would probably optimize it all away, so no problem. If X is a
    dynamic size, then at least Y would be a compile-time constant that is
    a power of two, so I would expect the compiler to replace the
    multiplication and division by shifts. So the cost would be 4
    additions (counting shifts as additions). Maybe a compile-time
    evaluatable branch could be used to skip the alignment adjustment at
    compile-time if the size of the object that we are making an array of
    is already a multiple of Y.
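    A sketch of the rounding calculation, assuming the alignment Y is a
    compile-time power of two:

```cpp
#include <cstddef>

// Round x up to the nearest multiple of Y, where Y is a compile-time
// power of two. The division and multiplication compile to shifts and
// there are no branches. For x = 0 the unsigned wrap-around of x - 1
// still yields 0.
template<std::size_t Y>
inline std::size_t roundUp(const std::size_t x) {
    return Y * ((x - 1) / Y + 1);
}
```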

    ** Question: Is there a better way to do the alignment calculation?



    Now for deallocation of a buffer of size X. We can't just do "p -= X;"
    since the object being deallocated may have been allocated in an
    earlier memory block than the one that p is pointing into. That has to
    be detected, so there must be a branch (grumble grumble). One way to
    deal with that would be to check this:

    if (p == startOfMemoryBlock) {
      // Adjust member variables to now allocate
      // out of the previous block
    }

    The condition works since allocations and deallocations are required
    to be in reverse order so the only way that we could have gotten to
    deallocating into the previous block is if p is at the start of the
    current block. The big problem with this approach is that if the
    memory use of the program hovers around the end of a block then we are
    going to be adjusting lots of fields for lots of allocations and
    deallocations to move allocation from one block to the next and then
    back. Even worse, the branches that detect when to move to another
    block will become poorly predictable which greatly increases the cost
    of those branches. Also, memory use will become split across two or
    more blocks decreasing locality of reference.

    Another approach is to never move allocation into a previous block.
    Instead we can let p stay where it is in that case. So the
    deallocation function becomes just:

    inline void free(const size_t X) {
      if (p != startOfMemoryBlock) {
        p -= X;
        remainingCapacity += X;
      }
    }

    If we don't want to waste the memory of the previous blocks, we could
    do something like this:

    inline void free(const size_t X) {
      if (p != startOfMemoryBlock) {
        p -= X;
        remainingCapacity += X;
      } else {
        // Look into previous block and delete that
        // block if it has become completely freed.
      }
    }

    A nice thing here is that the else branch doesn't have to be that
    efficient since it can't be taken that many times.

    The last point is destruction. ScopeAlloc<T> knows the type of T, so
    it can just call the destructor of T directly in the destructor of
    ScopeAlloc. A ScopeArrayAlloc<T> could store the size of the array and
    then run through a loop to destruct each element. However lots of
    classes have do-nothing destructors and when using stack allocation or
    even delete[] I'm sure that the compiler is aware of that so it emits
    no code to call the destructors. I would like my code to have zero
    overhead in that case as well.

    Question: If I have a loop that calls do-nothing destructors, can I
    generally trust compilers to optimize the loop away?
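    One way to avoid depending on the optimizer here is to skip the
    destruction loop explicitly with a type trait. A sketch, assuming a
    C++0x-style trait is available (boost::has_trivial_destructor could
    substitute on older compilers):

```cpp
#include <cstddef>
#include <type_traits>

// Destroy count elements in reverse order of construction. For types
// with do-nothing destructors the trait is a compile-time constant,
// so the whole loop is dead code and gets removed entirely.
template<typename T>
void destroyArray(T* array, std::size_t count) {
    if (!std::is_trivially_destructible<T>::value) {
        while (count > 0)
            array[--count].~T();
    }
}
```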



    So the total cost for an allocate/deallocate pair is 2 well-predicted
    branches and 4 additions. Add 4 additions for arrays of classes whose
    size is not a multiple of the alignment. I'm guessing that dynamic
    stack allocation has an overhead of 2 additions per allocate/
    deallocate pair, has better cache behavior and dirties fewer
    registers. Still, this is much better than new/delete in terms of
    performance and it should have better cache behavior than object
    pools. Also, unlike arenas, the interface for ScopeAlloc has no
    pitfalls that are not shared by normal stack allocation.

    Any suggestions for improvements or links to similar ideas?

    Cheers
    Bjarke Hammersholt Roune
    www.broune.com
     
    Bjarke Hammersholt Roune, Feb 13, 2011
    #1

  2. On Feb 13, 4:41 pm, Bjarke Hammersholt Roune <>
    wrote:
    > [...]
    > This ignores alignment. ScopeAlloc would have to adjust X to preserve
    > alignment by rounding up to the proper alignment. To align to
    > alignment Y, that is to round up to the nearest multiple of Y, I would
    > calculate (notice no branches)
    >
    >   X = Y * ((X - 1) / Y + 1)
    >

    I just realized that the situation is much worse than this. To show
    the problem, let T be some type, and suppose we want to allocate an
    array of T's. Define

    N = number of elements in the array
    S = sizeof(T)
    A = required alignment
    C = remaining capacity

    Define align(x) as x rounded up to the nearest multiple of A. All I
    accounted for above is that, mathematically,

    align(x) = A * ((x - 1) / A + 1)

    Unfortunately the actual condition that we need to check is

    align(N*S) <= C

    In the discussion above I did not account for the possibility of
    overflow of N*S or of the align operation, but they can very
    definitely overflow, which needs to be detected. In general overflow
    detection involving multiplications is a lot of work, but fortunately
    this is unsigned multiplication and we know that align(x) and x differ
    by at most A - 1. Because of this it is enough to check that

    N <= (((size_t)-1) - (A - 1)) / S

    to ensure that align(N*S) does not overflow. Here the right hand side
    is all compile-time constant so this is just a comparison against a
    constant. So the cost for detecting overflow is a single well
    predicted branch. Is there a better way?
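    A sketch of that check, with S and A as compile-time constants so the
    right-hand side folds to a single constant:

```cpp
#include <cstddef>

// True if an array of n elements of size S can have its total size
// rounded up to alignment A without overflowing size_t. The right-hand
// side is a compile-time constant, so this is one comparison against a
// constant.
template<std::size_t S, std::size_t A>
inline bool alignedSizeFits(const std::size_t n) {
    return n <= (static_cast<std::size_t>(-1) - (A - 1)) / S;
}
```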

    Cheers
    Bjarke Hammersholt Roune
     
    Bjarke Hammersholt Roune, Feb 14, 2011
    #2

  3. "Bjarke Hammersholt Roune" <> wrote in message
    news:...
    >I want to allocate dynamically on the stack since it is faster,


    Have you actually profiled to see if it is worthwhile?

    IME when the size of the struct/array gets too big for the stack then any
    processing you're doing with it will completely dwarf any overhead you get
    from using heap allocation.
     
    Finbar Slingshot, Feb 14, 2011
    #3
  4. Bjarke Hammersholt Roune

    itaj sherman Guest

    On Feb 13, 11:41 pm, Bjarke Hammersholt Roune <>
    wrote:
    > I want to allocate dynamically on the stack since it is faster, but
    > that requires platform dependent code and the stack can overflow. So
    > instead I want to write a memory allocator that works as stack
    > allocation does including in speed yet is backed by memory from the
    > heap. I want to be able to replace fast (but terrible!) code like
    >
    > Cheers
    > Bjarke Hammersholt Roune
    > www.broune.com



    I also have thought about this issue before.
    I think what stands behind that, is that different structure types
    have different memory allocation requirements:
    1) all instances always have the same size.
    2) each instance has a constant size for its lifetime.
    3) an instance may change size through its lifetime.

    These requirements are ordered from most strict to least, and the
    restrictions in each level could be applied to improve performance.
    What makes this separation nice, is that each of these groups is
    recursively closed with respect to aggregation:
    1) a structure that is built of a set of structures (fixed at compile
    time) from this group, also belongs to this group
    2) a structure that is built of a set of structures (fixed at
    instantiation) from this group, also belongs to this group
    3) a structure that is built of a set of structures (changeable) from
    this group, also belongs to this group

    In the c++ language, structures with allocation scheme 1 can be
    defined with class. The way classes are defined allows implementations
    to give all instances of a class the same size.
    Implementations do take advantage of this in the way they implement
    the memory allocation for classes - they know at compile time what
    the total size of an instance will be. So for dynamic storage they
    can do it with 1 chunk, and for stack storage, even better, they can
    usually sum all local objects in a function and allocate them all
    with one machine operation.
    A good example for a structure from group 1 is std::array<T,N> where T
    is also from group 1.

    Allocation scheme 3 is done in c++ using pointers and dynamic storage.
    An instance of a structure of this type can never be allocated in a
    single chunk, because it may change size through its lifetime.
    An example would be std::vector<T> (it doesn't matter which group T
    is in).
    * c++ types themselves are all from group 1. A structure from group 3
    (like std::vector<T>) is composed of instances of many structures from
    group 1 that hold pointers to connect them.

    What c++ doesn't have is a general, clear way to take advantage of
    structures with allocation scheme 2. When you have such a structure,
    you can calculate the size of an instance when it's created, and
    know that size will stay for its lifetime. But there is no pre-
    designed way in the language to take advantage of that restriction and
    make performance out of it.
    The only structures in this group that c++ is aware of are dynamic
    arrays of type T, where T is of group 1. But that doesn't take
    advantage of the recursive aggregation property of group 2.

    For example: a structure that is an aggregation of a few dynamic
    arrays is of group 2, but there is no way to express such a structure
    in c++. We are forced to define such a structure as an aggregation of
    pointers to those arrays, and it belongs to group 3.

    Another example: I could imagine a template class
    std::fixed_vector<T>, which would be much like std::vector, except
    that the number of elements would be given to the constructor and
    stays constant for that instance's lifetime. If T is a type of group
    2, such std::fixed_vector<T> would also be of group 2 (std::vector<T>
    wouldn't). And notice now that there isn't any structure like the
    hypothetical std::fixed_vector<T> in c++. There's no pre-designed way
    to define such a template class in c++.

    Using templates and placement new, I think it's possible to build a
    library that will be able to take advantage of the restrictions of
    group 2, but probably not to the full extent that could be taken by
    the language. But it has to be very sensible in deciding how such
    types should be defined, and particularly how to mimic the implicit
    instantiations of c++ on them (implicit copy construction, move
    construction, conversions) in definitions of variables / data
    members / return values / parameters.

    In light of what I said above, I think your API is lacking the
    recursive feature. You only allow allocating a primitive array of
    type T, where T is of group 1. The practical problem is that no one
    uses arrays directly anymore. We would certainly want to be able to
    define and use such a fixed_vector<T> with full container properties
    (which simple arrays don't have), and use them with such allocations.
    Very soon we would also want to aggregate a few such structures in
    bigger structures, give them type names and use them automatically.

    Apart from that, I have a few small suggestions:
    * Many c++ implementations have special support for such stack
    operations. In these cases you want to use the implementation's
    mechanism to allocate that memory, and don't need your own.
    * Keep in mind that you'll need such a stack for every thread in the
    program.
    * The allocators cannot keep the no-throw exception guarantee, but
    destructors of local allocators must not throw exceptions, so you
    don't affect the weaker exception safety guarantees.

    itaj
     
    itaj sherman, Feb 14, 2011
    #4
  5. Bjarke Hammersholt Roune

    itaj sherman Guest

    On Feb 14, 6:36 pm, "Finbar Slingshot" <> wrote:
    > "Bjarke Hammersholt Roune" <> wrote in message news:...
    >
    > >I want to allocate dynamically on the stack since it is faster,

    >
    > Have you actually profiled to see if it is worthwhile?
    >
    > IME when the size of the struct/array gets too big for the stack then any
    > processing you're doing with it will completely dwarf any overhead you get
    > from using heap allocation.


    The point is that the size is not known at compile time.
    For example: a function may be called many times, in most the sizes
    are small, but sometimes are large. You can't use regular stack
    allocation. If you use regular dynamic allocation, then in most
    invocations the processing (the size is small) would not dwarf the
    allocation.

    itaj
     
    itaj sherman, Feb 14, 2011
    #5
  6. Bjarke Hammersholt Roune

    itaj sherman Guest

    On Feb 14, 7:20 pm, itaj sherman <> wrote:
    > On Feb 13, 11:41 pm, Bjarke Hammersholt Roune <>
    > wrote:
    >

    [snip]

    > Using templates and placement new, I think it's possible to build a
    > library that will be able to take advantage of the restrictions of
    > group 2, but probably not to the full extent that could be taken by
    > the language. But it has to be very sensible in deciding how such
    > types should be defined, and particularly how to mimic the implicit
    > instantiations of c++ on them (implicit copy construction, move
    > construction, conversions) in definitions of variables / data
    > members / return values / parameters.
    >


    Some details of how I think such a library can be structured:

    Define a class fixed_size_allocator. It should have a default
    constructor, but it doesn't have to be copyable. An instance of this
    class can manage the memory allocation for 1 structure of group 2.

    Define a concept for classes that belong to group 2. Let's call it
    the fixed_size concept.
    All constructors should accept as the first argument an object which
    is a fixed_size_allocator. This object will be used to allocate memory
    for the instance. A constructor will pass the allocator to the
    constructors of members and bases which are also of the fixed_size
    concept.

    Each class T which is fixed_size, will have a specialization of
    fixed_size_traits<T>.
    For each constructor of T

    T::T( Par1 par1, Par2 par2 ... );

    There will also be a static function member:

    size_t fixed_size_traits<T>::calculate_size( Par1 par1, Par2
    par2 ... );

    which should return the size of the instance that is to be built with
    these given arguments.
    This function can invoke fixed_size_traits<X>::calculate_size for all
    members and base classes X of T.

    We also need template classes:

    template< typename T > stack_fixed_size_object;
    template< typename T > dynamic_fixed_size_object;

    for use when declaring a local variable of type T, or dynamically
    allocating one.
    The constructors of these can call
    fixed_size_traits<T>::calculate_size and allocate the needed memory
    in one chunk (on a stack or dynamically) - this is where the
    performance advantage comes from. It is not only that local variables
    of group 2 are allocated on a stack memory manager (rather than the
    less efficient heap), it is also that in both cases the memory is
    allocated in one chunk for all the parts of the instance.

    That means that unlike

    std::vector< std::vector< T > >{ { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } }

    which needs at least 3 separate heap allocations,

    dynamic_fixed_size_object< fixed_vector< fixed_vector< T > > >{ { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } }

    is 1 heap allocation, and

    stack_fixed_size_object< fixed_vector< fixed_vector< T > > >{ { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } }

    is 1 stack allocation (your special stack, not the regular one, but
    still very fast).

    Now I need to think real hard about what to do for implicit copy and
    move semantics. I don't even know where to begin.

    itaj
     
    itaj sherman, Feb 14, 2011
    #6
  7. On Feb 14, 11:36 am, "Finbar Slingshot" <> wrote:
    > "Bjarke Hammersholt Roune" <> wrote in message news:...
    >
    > >I want to allocate dynamically on the stack since it is faster,

    >
    > Have you actually profiled to see if it is worthwhile?
    >
    > IME when the size of the struct/array gets too big for the stack then any
    > processing you're doing with it will completely dwarf any overhead you get
    > from using heap allocation.
    >

    Yep, all my sub-algorithm-choice optimization is 100% profile driven.
    The thing is that the size of what I'm allocating on the stack is
    dynamic, so for most things it wouldn't be a problem but nothing
    prevents a user from giving my program input that would make it a
    problem. Also my problems tend to be NP-hard so I end up using little
    memory but the processing time is high and I get lots of allocations/
    deallocations for intermediate computations. The amount of memory
    needed for these computations is unpredictable even if it is usually
    small. I use various techniques to avoid frequent allocations now, but
    they all have drawbacks that I'd like to avoid and it seems to me that
    this is a way to cleanly and completely fix this mess. I believe that
    this solution would make my coding easier, improve my design and
    interfaces, open up more possibilities for algorithms and make my code
    a little faster at the same time. So to me it's well worth it to make
    this happen.

    These situations also turn up in the standard library. For example
    std::stable_sort is usually implemented to be in-place using heapsort
    instead of merge sort to avoid allocating temporary buffers. Merge
    sort would be considerably faster. If the standard library had a
    facility like what I'm proposing then mergesort would become a more
    attractive choice than it is now.

    Cheers
    Bjarke
     
    Bjarke Hammersholt Roune, Feb 14, 2011
    #7
  8. On Feb 14, 12:20 pm, itaj sherman <> wrote:
    >
    > I think what stands behind that, is that different structure types
    > have different memory allocation requirements:
    > 1) all instances always have the same size.
    > 2) each instance has a constant size for its lifetime.
    > 3) an instance may change size through its lifetime.
    >

    That's a nice way of thinking about the issue. You are precisely right
    that my design is based on assuming that objects have dynamic size but
    that their size will never increase after allocation. Though I have
    toyed with the idea of allowing the very top-most element on the stack
    to change its size. I'm not sure if it's a good idea or not.

    > In light of what I said above, I think your API is lacking the
    > recursive feature. You only allow allocating a primitive array of
    > type T, where T is of group 1. The practical problem is that no one
    > uses arrays directly anymore. We would certainly want to be able to
    > define and use such a fixed_vector<T> with full container properties
    > (which simple arrays don't have), and use them with such allocations.
    > Very soon we would also want to aggregate a few such structures in
    > bigger structures, give them type names and use them automatically.
    >

    You are completely right. For now I just want to replace stack
    allocation, and my design can do everything you can do with stack
    allocation. So far so good. Yet as you point out it would be even
    better to be able to expand stack allocation to classes that allocate
    their own memory internally. This requires some way to tell an object
    how it is being allocated. The preliminary idea I have on that is
    template specialization. There would be some function template<T>
    construct(...), and if some T wants to do something special when
    allocated via StackAlloc<T> then it would specialize construct to do
    the right thing by allocating all its dynamic size (type 2) members
    using the stack allocator. E.g. a mechanism like that is necessary for
    convenient use of classes using the trick of placing a one-element
    array at the end that can actually be more than 1 element by putting
    the class T in a buffer that is larger than sizeof(T). It would be
    awesome to have "StackAlloc<T> myObj(size);" just work for such T's.
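    For reference, here is a minimal sketch of the one-element-array trick
    mentioned above, using plain operator new rather than the hypothetical
    StackAlloc (the names VarArray and makeVarArray are just illustrative):

```cpp
#include <cstddef>
#include <new>

// The one-element-array trick: the struct ends in a one-element array
// but is placed in a buffer larger than sizeof(VarArray), so the array
// can actually hold n elements. A construct() specialization for such
// a type would need to request this extra space from the allocator.
struct VarArray {
    std::size_t size;
    int data[1];  // really holds size elements
};

VarArray* makeVarArray(const std::size_t n) {
    // room for the struct plus the extra n - 1 elements
    void* buffer = ::operator new(sizeof(VarArray) + (n - 1) * sizeof(int));
    VarArray* v = new (buffer) VarArray();
    v->size = n;
    return v;
}
```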

    > Apart from that, I have a few small suggestions:
    > * Many c++ implementations have special support for such stack
    > operations. In these cases you want to use the implementation's
    > mechanism to allocate that memory, and don't need your own.
    >

    I'd love an example of this. Doesn't all stack-based memory suffer
    from the issue of stack overflow?

    Thank you for your thoughtful response.

    Cheers
    Bjarke
     
    Bjarke Hammersholt Roune, Feb 14, 2011
    #8
  9. On Feb 14, 1:02 pm, itaj sherman <> wrote:
    > [...]
    > Now need to think real hard what to do for implicit copy and move
    > semantics. I don't even know where to begin.
    >

    There is always an outer owner of the whole memory block that would be
    acting like a smart pointer. It could allow move semantics by handing
    over its exclusive pointer. For copy the outer owner could allocate a
    memory area of the same size as that being copied and then call the
    owned element's copy constructor with placement new. Then it's up to
    the owned element to either have a reasonable implementation of
    copying or to make its copy constructor inaccessible. What happens on
    copy could also be a template function that individual owned elements
    could specialize if they don't want the copy to necessarily have the
    same size as that which is being copied.

    Calculating the required size is complicated and error prone. There
    would really have to be a convenient way to write the code to
    calculate the size of a class. Even better if the same piece of code
    could be instantiated to calculate the size and then instantiated
    another way to construct the object or part of it. I'm not sure if
    that is possible.

    Some classes might want to support both fixed size and variable size
    allocation. There should be some standard way to support that, perhaps
    a smart pointer that only deletes the memory when it was allocated by
    new. The copy operation would also have to be aware of this.

    Cheers
    Bjarke
     
    Bjarke Hammersholt Roune, Feb 14, 2011
    #9
  10. Bjarke Hammersholt Roune

    itaj sherman Guest

    On Feb 14, 8:57 pm, Bjarke Hammersholt Roune <>
    wrote:
    > On Feb 14, 12:20 pm, itaj sherman <> wrote:
    >


    It was something I came across at work once that made me think about
    what I described. I can't remember what it was. But it wasn't crucial
    from a performance POV back then, and I never got into more detail
    about this idea.
    Actually I never talked about it with anyone, so it's quite nice that
    I saw your question here and got reminded about it. It's good to know
    that it makes sense to someone other than me.
    Maybe I'll think more about it. I'll tell you if I get something
    interesting.

    >
    > You are completely right. For now I just want to replace stack
    > allocation, and my design can do everything you can do with stack
    > allocation. So far so good. Yet as you point out it would be even
    > better to be able to expand stack allocation to classes that allocate
    > their own memory internally. This requires some way to tell an object
    > how it is being allocated. The preliminary idea I have on that is
    > template specialization. There would be some function template<T>
    > construct(...), and if some T wants to do something special when
    > allocated via StackAlloc<T> then it would specialize construct to do
    > the right thing by allocating all its dynamic size (type 2) members
    > using the stack allocator. E.g. a mechanism like that is necessary for
    > convenient use of classes using the trick of placing a one-element
    > array at the end that can actually be more than 1 element by putting
    > the class T in a buffer that is larger than sizeof(T). It would be
    > awesome to have "StackAlloc<T> myObj(size);" just work for such T's.


    I think it can be any argument there, not just a size. The type of the
    instance that is being initialized should know how to use the
    parameter to calculate the needed size.
    These are all just general ideas; I never really thought about these
    details carefully.

    For example:

    {
      int size = ...;
      StackAlloc< fixed_vector< X > > b( size );
    }

    or the initialization of

    {
      std::vector< std::vector< X > > a;
      // put values into a...
      StackAlloc< fixed_vector< fixed_vector< X > > > b( a );
      // this should be done with 1 special-stack allocation, no dynamic.
    }

    fixed_vector initialization should be implemented to check the size of
    the argument container to calculate how much memory to ask from the
    allocator. Going by what I said in the other reply, it should
    probably be inside fixed_size_traits< fixed_vector< T > >::calculate_size.
    In this case T = X when initializing each element of type
    fixed_vector<X>, and T = fixed_vector<X> for the main object.
    More specifically, it would be in a certain overload:

    template< typename T, typename U >
    size_t fixed_size_traits< fixed_vector< T > >::calculate_size( std::vector< U > const& r )
    {
      return ... + ( r.size() * sizeof(T) ) + ...;
    }

    >
    > > Appart from that, I have a few small suggestions:
    > > * Many c++ implementations have special support for such stack
    > > operations. In these cases you want to use the implementation's
    > > mechanism to allocate that memory, and don't need your own.

    >
    > I'd love an example of this. Don't all stack-based memory suffer from
    > the issue of stack overflow?
    >


    No, I was wrong. I was just thinking about the alloca( size_t )
    function that Windows and some Linux systems have.
    It allocates on the real stack. But I didn't think about it carefully.
    It's actually not very useful in this case, because deallocation is
    automatic at the end of the calling function. It is especially not
    helpful if you're going for an unlimited stack size (to avoid
    overflow).
    I was thinking more about how to even allow stack allocations with
    run-time size determination at all, rather than further thinking
    about allowing unlimited amounts of data.
    I suppose you will have to implement the stack structure, or find
    some third party data structure that can provide these requirements.

    itaj
     
    itaj sherman, Feb 14, 2011
    #10
  11. On Feb 13, 4:41 pm, Bjarke Hammersholt Roune <>
    wrote:
    > This ignores alignment. ScopeAlloc would have to adjust X to preserve
    > alignment by rounding up to the proper alignment. To align to
    > alignment Y, that is to round up to the nearest multiple of Y, I would
    > calculate (notice no branches)
    >
    >   X = Y * ((X - 1) / Y + 1)
    >

    Rounding up with just 2 instructions:

    X = (X + (Y - 1)) & (~(Y - 1))

    This works because Y is a power of 2 and compile-time constant.
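    A sketch, with Y as a runtime parameter to show that the formula needs
    only an add and a mask:

```cpp
#include <cstddef>

// Branch-free round-up with a mask; y must be a power of two. This is
// equivalent to the earlier y * ((x - 1) / y + 1), but compiles to an
// add and an and even when y is not a compile-time constant.
inline std::size_t roundUpMask(const std::size_t x, const std::size_t y) {
    return (x + (y - 1)) & ~(y - 1);
}
```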

    Cheers
    Bjarke
     
    Bjarke Hammersholt Roune, Feb 15, 2011
    #11
  12. Exceptions make this more complicated than I had thought. If I'm
    allocating an array I have to call constructors, and then I have to
    handle the case of one of the constructors throwing an exception. I
    think I know how that should work. However, what do I do when I'm
    deallocating the array, and one of the destructors throws an
    exception. Do I still call the remaining destructors? What if I get
    two exceptions that way? I recall reading about what should happen in
    those cases, but now I can't find it. Anyone knows? I know its bad
    style for destructors to throw exceptions and mine don't but I can't
    know that that will be true for everyone who might use my code at some
    point. My arrays should work the same as usual arrays including with
    respect to exceptions.

    Btw. you'd think I could just call placement new[] and the matching
    placement delete[]. Unfortunately, as far as I can tell from
    discussions on the internet, those functions are useless for this,
    because implementations are allowed to store bookkeeping information
    about the array inside the memory for that array, and placement new[]
    and placement delete[] are allowed to contain the code that actually
    does this. So to make that work I'd have to know how much extra space
    the compiler wants me to provide, so that I could allocate enough
    room and apply the right offset after calling placement new[] to find
    the actual beginning of the array, where the first element is stored.
    There is no way to obtain that information; indeed, I read that the
    offset is allowed to change from one call to placement new[] to
    another. That's the stupidest thing I've heard in quite a while and
    I'd love for someone to tell me that it's wrong.
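    The workaround I'm leaning towards (just a sketch, names made up) is
    to skip placement new[] entirely: get the raw bytes myself and
    construct each element with single-object placement new, which stores
    no array cookie, so the first element sits exactly at the start of
    the buffer:

```cpp
#include <cstddef>
#include <new>

// Construct n value-initialized objects of type T in raw storage.
// (No rollback here if a constructor throws; that is the separate
// exception problem described above.)
template<typename T>
T* constructArray(void* buffer, std::size_t n) {
    T* const elems = static_cast<T*>(buffer);
    for (std::size_t i = 0; i < n; ++i)
        new (elems + i) T();    // single-object placement new: no cookie
    return elems;
}

// Destroy the n elements in reverse order, as built-in arrays do.
template<typename T>
void destroyArray(T* elems, std::size_t n) {
    while (n > 0)
        elems[--n].~T();
}
```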
     
    Bjarke Hammersholt Roune, Feb 16, 2011
    #12
  13. Bjarke Hammersholt Roune

    gwowen Guest

    On Feb 16, 5:55 am, Bjarke Hammersholt Roune <>
    wrote:
    > However, what do I do when I'm
    > deallocating the array, and one of the destructors throws an
    > exception.


    In general, destructors should not throw exceptions; nearly every
    piece of good-practice advice says that destructors *must* not throw.
    The most compelling reason -- although not the only one -- is that
    exception propagation up the call stack destroys automatic objects as
    part of the unwinding, and if one of those destructors throws, the
    runtime calls terminate() and your program dies. (You also cannot
    store such objects in standard containers.)

    You *can* avoid these problems through careful use of objects whose
    destructors can throw (essentially managing their lifetimes yourself),
    but extreme care is needed.
     
    gwowen, Feb 16, 2011
    #13
  14. Bjarke Hammersholt Roune

    itaj sherman Guest

    On Feb 16, 7:55 am, Bjarke Hammersholt Roune <>
    wrote:
    > Exceptions make this more complicated that I had thought. If I'm
    > allocating an array I have to call constructors, and then I have to
    > handle the case of one of the constructors throwing an exception. I
    > think I know how that should work. However, what do I do when I'm
    > deallocating the array, and one of the destructors throws an
    > exception. Do I still call the remaining destructors? What if I get
    > two exceptions that way? I recall reading about what should happen in
    > those cases, but now I can't find it. Anyone knows? I know its bad
    > style for destructors to throw exceptions and mine don't but I can't
    > know that that will be true for everyone who might use my code at some
    > point. My arrays should work the same as usual arrays including with
    > respect to exceptions.
    >


    I don't think you need to support elements with throwing destructors.
    None of the standard containers do, for example, for the same reason.
    You should require the elements to have a no-throw destructor: from
    the point of view of the allocating module, its contract only allows
    it to be used with types whose destructors don't throw.

    It is generally accepted that every class should have a no-throw
    destructor (unless it is some very special case, and such a class
    then gives up working with almost any core library, such as the std
    containers).

    If one constructor throws, you have to destruct all the elements that
    were already constructed, in reverse order, and let that exception
    propagate, as built-in arrays and the std containers do. Because you
    know the elements have no-throw destructors, no second exception can
    be thrown in that process.
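    In code, the rollback I mean looks roughly like this (a sketch with
    made-up names, assuming value-initialization and, per the contract
    above, no-throw destructors):

```cpp
#include <cstddef>
#include <new>

// Construct n elements of T in raw storage. If a constructor throws,
// destroy the already-constructed prefix in reverse order and let the
// exception propagate, as built-in arrays and the std containers do.
template<typename T>
T* constructWithRollback(void* buffer, std::size_t n) {
    T* const elems = static_cast<T*>(buffer);
    std::size_t i = 0;
    try {
        for (; i < n; ++i)
            new (elems + i) T();
    } catch (...) {
        while (i > 0)           // elems[i] itself was never constructed
            elems[--i].~T();
        throw;                  // rethrow the constructor's exception
    }
    return elems;
}
```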

    itaj
     
    itaj sherman, Feb 16, 2011
    #14
  15. As an update, it turns out that the GNU libc comes with obstacks,
    which do something very close to what I'm suggesting here:

    http://www.gnu.org/s/libc/manual/html_node/Obstacks.html
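    For reference, using the documented API looks roughly like this (a
    sketch; the two macros tell obstack which backing allocator to get
    its chunks from):

```cpp
#include <obstack.h>   // glibc-specific header
#include <cstddef>
#include <cstdlib>

// obstack takes its chunks from whatever these two macros name.
#define obstack_chunk_alloc std::malloc
#define obstack_chunk_free  std::free

int demo(std::size_t n) {
    struct obstack ob;
    obstack_init(&ob);                 // sets up the first chunk
    int* a = static_cast<int*>(obstack_alloc(&ob, n * sizeof(int)));
    a[0] = 7;                          // use the scratch memory
    int result = a[0];
    obstack_free(&ob, NULL);           // frees everything in the obstack
    return result;
}
```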

    However, the implementation is not so good for my purpose. The code is
    obfuscated by being part of a library (so weird variable names) and by
    being written almost entirely as macros. Still, as far as I can tell,
    an allocate/deallocate pair takes 9 ops (of add-like cost) and 3
    branches in the best case, plus a significant number of heap reads and
    writes. This is when you specify the size in bytes. The scheme
    described here takes 7 ops and 2 branches in the best case to do the
    same thing, and it only touches the heap to update the
    remaining-capacity member.

    It gets a lot worse, however, because obstack allocates in fixed-size
    chunks and frees a chunk immediately when it becomes empty. As the
    chunks are fixed-size, the number of chunks that have to be allocated
    is linear in the total amount of space allocated. Even worse, if the
    current chunk is full and you allocate 1 byte, a new chunk will be
    allocated; when you deallocate that 1 byte, that new chunk will be
    deallocated. So potentially every single allocation and deallocation
    causes a call to the backing memory allocator, at which point
    performance is worse than if the backing allocator had been used
    directly.

    From comments in the code, obstacks seem to have been written with a
    very specific use in mind: storing symbols while parsing source code.
    That focus makes obstacks unsuitable as a general-purpose replacement
    for stack allocation, so I'll still need to make my own.

    Cheers
    Bjarke
     
    Bjarke Hammersholt Roune, Mar 6, 2011
    #15