Pattern for allocating array objects with embedded header?

Discussion in 'C++' started by Marcel Müller, Dec 26, 2012.

  1. From time to time I could use objects that consist of a header and an
    array of items of immutable size. Doing this with two allocations is
    straight forward, but has the drawback of another indirection on each
    array member access and, of course, the additional allocation.

    So the basic idea is to allocate the storage at once. Nothing special to
    a C programmer. But wrapping this in a C++ class seems a little bit
    tricky to me.

    I use a simple string as an example. The string length is the header in
    this case, the characters are the items. But I have other similar use
    cases with more complex headers and items, like the nodes o a B-tree.

    #include<stdlib.h>
    #include<string.h>
    #include<stdio.h>

    class String
    {private:
    size_t Len; // Logically const, but since we initialize it
    // before constructor it has to be non-const.
    private: // pure reference type => non-copyable
    String(const String&);
    void operator=(const String&);
    private: // Avoid array creation and ordinary new operator!
    static void* operator new(size_t);
    static void* operator new[](size_t);
    static void operator delete[](void*);
    private: // allocators
    static void* operator new(size_t s, size_t l)
    { String* that = (String*)new char[s+l+1];
    that->Len = l; // Dirty early construction
    return that;
    }
    public:
    static void operator delete(void* p)
    { delete[] (char*)p; }
    private:
    String(const char* str)
    { memcpy(this+1, str, Len);
    ((char*)(this+1))[Len] = 0;
    }
    public:
    static String* create(const char* str, size_t len)
    { return new (len) String(str); }

    /// Return the current strings length.
    size_t length() const
    { return Len; }
    /// Return the current strings content.
    const char* str() const
    { return (char*)(this+1); }
    };

    int main(int argc, char** argv)
    {
    String* mystr = String::create(argv[0], strlen(argv[0]));
    printf("Len = %u, String = %s\n", mystr->length(), mystr->str());
    delete mystr;
    return 0;
    }

    The basic requirement is that operator new and constructor is now one
    logical action. Are the common patterns to implement this?
    From my point of view the access to members from operator new is really
    dirty. And I am unsure whether the result of operator new is necessarily
    equivalent to this.

    Any better ideas?


    Marcel
    Marcel Müller, Dec 26, 2012
    #1
    1. Advertising

  2. On 26.12.2012 14:16, Paavo Helde wrote:
    > Marcel Müller <> wrote in news:50dacf41$0
    > $6637$-online.net:
    >
    >> From time to time I could use objects that consist of a header and an
    >> array of items of immutable size. Doing this with two allocations is
    >> straight forward, but has the drawback of another indirection on each
    >> array member access and, of course, the additional allocation.

    >
    > Let's see:

    [...]
    > b) separate header and data (a la std::string), allocate header on stack
    > (presumably header is small) and the data portion dynamically on the
    > heap:
    >
    > std::string x("...");
    >
    > Accessing the data:
    >
    > x.c_str();

    [...]
    > So it seems to me that b) is generally not worse than a) and with small
    > string optimization b) can do much better than a) in some circumstances.
    > So I do not see any actual reason to prefer a) over b) here.


    In fact you say: use value types, they perform better.
    Maybe I should not have used a simple string as an example.
    Think of delivery documents with a reasonable large header. You do not
    want to copy this over and over.

    Furthermore think of the case of optional references.
    In my case I have in the order of 10,000 objects. Each of them has about
    30 references to other objects. All of them are optional. They are used
    sparse (~20%) and they often refer to the same object.
    If each reference take let's say 10 machine size words instead of one
    for the reference only, we are talking about ~50 MB additional memory.
    The total memory of the application is restricted by the platform to
    about 250 MB. So this is not a good idea.
    For this reason I do not like the small string optimization very much.
    At least not for immutable strings. It also prevents deduplication.

    In another case I have a B-tree with nodes. Each node consist of a
    header and two dozen entries. Each entry is either a reference to some
    external object or a reference to a sub node. Putting the header of the
    sub nodes into the parent (together with the reference to it's entries)
    would increase each entry and with them the size of the B tree
    structures by about a factor 5.

    Compared to these two examples the overhead of an additional allocation
    for the header is low. But even better is one allocation (especially
    with respect to cache efficiency).


    Marcel
    Marcel Müller, Dec 27, 2012
    #2
    1. Advertising

  3. Marcel Müller

    Öö Tiib Guest

    On Thursday, 27 December 2012 14:27:41 UTC+2, Marcel Müller wrote:
    > On 26.12.2012 14:16, Paavo Helde wrote:
    > > Marcel Müller <> wrote in news:50dacf41$0
    > > $6637$-online.net:
    > >
    > >> From time to time I could use objects that consist of a header and an
    > >> array of items of immutable size. Doing this with two allocations is
    > >> straight forward, but has the drawback of another indirection on each
    > >> array member access and, of course, the additional allocation.

    > >
    > > Let's see:

    > [...]
    > > b) separate header and data (a la std::string), allocate header on stack
    > > (presumably header is small) and the data portion dynamically on the
    > > heap:
    > >
    > > std::string x("...");
    > >
    > > Accessing the data:
    > >
    > > x.c_str();

    > [...]
    > > So it seems to me that b) is generally not worse than a) and with small
    > > string optimization b) can do much better than a) in some circumstances..
    > > So I do not see any actual reason to prefer a) over b) here.

    >
    > In fact you say: use value types, they perform better.


    std::string is not value type. It is a dynamic container of characters.

    > Maybe I should not have used a simple string as an example.


    Certainly. Look at our faces. Do you see anyone who is not already bored
    of all those QStrings CStrings wxStrings BSTRs and the like? Reinventing
    strings is sure way how to attune enthusiasm down.

    Notice the more likely usage example of your String class:

    String* mystr = NULL;
    try
    {
    mystr = String::create("bla bla", strlen("bla bla"));
    // ...
    // whatever C++ usage of mystr
    }
    catch(...)
    {
    delete mystr;
    throw;
    }
    delete mystr;

    That is to achieve *minimal* exception safety. std::string has
    thankfully removed the need for that crap at least.

    > Think of delivery documents with a reasonable large header. You do not
    > want to copy this over and over.


    There are basically 3 ways to pass data ahead: to copy, to pass a
    pointer/reference or to move.

    > Furthermore think of the case of optional references.
    > In my case I have in the order of 10,000 objects. Each of them has about
    > 30 references to other objects. All of them are optional. They are used
    > sparse (~20%) and they often refer to the same object.


    What are the "optional references"? A reference in C++ can not
    be optional. You mean some textual reference like in scientific
    publications ... or some sort of hyperlink ... or?

    > If each reference take let's say 10 machine size words instead of one
    > for the reference only, we are talking about ~50 MB additional memory.


    Uhh ... 50 MB / (10 * 30 * 10 000) ... ~ 16 bytes. You got a platform
    where "machine size word" is 16 bytes? sizeof(std::string) with short
    string optimization is anyway nowhere 160. It is typically 32 or less.

    > The total memory of the application is restricted by the platform to
    > about 250 MB. So this is not a good idea.


    So ... 128 bit platform ... with 250 MB memory? Very interesting.

    > For this reason I do not like the small string optimization very much.
    > At least not for immutable strings. It also prevents deduplication.


    Do not use strings then for your "references". Use pointers. Or
    smart pointers. Or ... if the references are very sparse then decouple
    them from your objects into separate container and form a graph.
    So "documents" of yours are vertexes and the "references" are edges.

    > In another case I have a B-tree with nodes. Each node consist of a
    > header and two dozen entries. Each entry is either a reference to some
    > external object or a reference to a sub node. Putting the header of the
    > sub nodes into the parent (together with the reference to it's entries)
    > would increase each entry and with them the size of the B tree
    > structures by about a factor 5.


    I calculate what I calculate but can no way get factor of 5, sorry.
    Maybe you use the term "reference" in too several different meanings
    here or I misunderstand something obvious.

    > Compared to these two examples the overhead of an additional allocation
    > for the header is low. But even better is one allocation (especially
    > with respect to cache efficiency).


    10 000 allocations or 20 000? On typical platforms we are talking
    about difference around 5 milliseconds. Not sure about your particular
    cases but in general the allocations start to hurt when we have millions
    of those.
    Öö Tiib, Dec 27, 2012
    #3
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Chris Lambacher
    Replies:
    0
    Views:
    386
    Chris Lambacher
    Jun 8, 2005
  2. Chris Lambacher
    Replies:
    0
    Views:
    670
    Chris Lambacher
    Jun 8, 2005
  3. Rakesh Kumar
    Replies:
    5
    Views:
    681
    James Kanze
    Dec 21, 2007
  4. yatko
    Replies:
    3
    Views:
    2,515
    yatko
    Dec 28, 2008
  5. mlt
    Replies:
    2
    Views:
    833
    Jean-Marc Bourguet
    Jan 31, 2009
Loading...

Share This Page