Pattern for allocating array objects with embedded header?

M

Marcel Müller

From time to time I could use objects that consist of a header and an
array of items of immutable size. Doing this with two allocations is
straight forward, but has the drawback of another indirection on each
array member access and, of course, the additional allocation.

So the basic idea is to allocate the storage at once. Nothing special to
a C programmer. But wrapping this in a C++ class seems a little bit
tricky to me.

I use a simple string as an example. The string length is the header in
this case, the characters are the items. But I have other similar use
cases with more complex headers and items, like the nodes o a B-tree.

#include<stdlib.h>
#include<string.h>
#include<stdio.h>

class String
{private:
size_t Len; // Logically const, but since we initialize it
// before constructor it has to be non-const.
private: // pure reference type => non-copyable
String(const String&);
void operator=(const String&);
private: // Avoid array creation and ordinary new operator!
static void* operator new(size_t);
static void* operator new[](size_t);
static void operator delete[](void*);
private: // allocators
static void* operator new(size_t s, size_t l)
{ String* that = (String*)new char[s+l+1];
that->Len = l; // Dirty early construction
return that;
}
public:
static void operator delete(void* p)
{ delete[] (char*)p; }
private:
String(const char* str)
{ memcpy(this+1, str, Len);
((char*)(this+1))[Len] = 0;
}
public:
static String* create(const char* str, size_t len)
{ return new (len) String(str); }

/// Return the current strings length.
size_t length() const
{ return Len; }
/// Return the current strings content.
const char* str() const
{ return (char*)(this+1); }
};

int main(int argc, char** argv)
{
String* mystr = String::create(argv[0], strlen(argv[0]));
printf("Len = %u, String = %s\n", mystr->length(), mystr->str());
delete mystr;
return 0;
}

The basic requirement is that operator new and constructor is now one
logical action. Are the common patterns to implement this?
From my point of view the access to members from operator new is really
dirty. And I am unsure whether the result of operator new is necessarily
equivalent to this.

Any better ideas?


Marcel
 
M

Marcel Müller

From time to time I could use objects that consist of a header and an
array of items of immutable size. Doing this with two allocations is
straight forward, but has the drawback of another indirection on each
array member access and, of course, the additional allocation.

Let's see: [...]
b) separate header and data (a la std::string), allocate header on stack
(presumably header is small) and the data portion dynamically on the
heap:

std::string x("...");

Accessing the data:

x.c_str(); [...]
So it seems to me that b) is generally not worse than a) and with small
string optimization b) can do much better than a) in some circumstances.
So I do not see any actual reason to prefer a) over b) here.

In fact you say: use value types, they perform better.
Maybe I should not have used a simple string as an example.
Think of delivery documents with a reasonable large header. You do not
want to copy this over and over.

Furthermore think of the case of optional references.
In my case I have in the order of 10,000 objects. Each of them has about
30 references to other objects. All of them are optional. They are used
sparse (~20%) and they often refer to the same object.
If each reference take let's say 10 machine size words instead of one
for the reference only, we are talking about ~50 MB additional memory.
The total memory of the application is restricted by the platform to
about 250 MB. So this is not a good idea.
For this reason I do not like the small string optimization very much.
At least not for immutable strings. It also prevents deduplication.

In another case I have a B-tree with nodes. Each node consist of a
header and two dozen entries. Each entry is either a reference to some
external object or a reference to a sub node. Putting the header of the
sub nodes into the parent (together with the reference to it's entries)
would increase each entry and with them the size of the B tree
structures by about a factor 5.

Compared to these two examples the overhead of an additional allocation
for the header is low. But even better is one allocation (especially
with respect to cache efficiency).


Marcel
 
Ö

Öö Tiib

From time to time I could use objects that consist of a header and an
array of items of immutable size. Doing this with two allocations is
straight forward, but has the drawback of another indirection on each
array member access and, of course, the additional allocation.

Let's see: [...]
b) separate header and data (a la std::string), allocate header on stack
(presumably header is small) and the data portion dynamically on the
heap:

std::string x("...");

Accessing the data:

x.c_str(); [...]
So it seems to me that b) is generally not worse than a) and with small
string optimization b) can do much better than a) in some circumstances..
So I do not see any actual reason to prefer a) over b) here.

In fact you say: use value types, they perform better.

std::string is not value type. It is a dynamic container of characters.
Maybe I should not have used a simple string as an example.

Certainly. Look at our faces. Do you see anyone who is not already bored
of all those QStrings CStrings wxStrings BSTRs and the like? Reinventing
strings is sure way how to attune enthusiasm down.

Notice the more likely usage example of your String class:

String* mystr = NULL;
try
{
mystr = String::create("bla bla", strlen("bla bla"));
// ...
// whatever C++ usage of mystr
}
catch(...)
{
delete mystr;
throw;
}
delete mystr;

That is to achieve *minimal* exception safety. std::string has
thankfully removed the need for that crap at least.
Think of delivery documents with a reasonable large header. You do not
want to copy this over and over.

There are basically 3 ways to pass data ahead: to copy, to pass a
pointer/reference or to move.
Furthermore think of the case of optional references.
In my case I have in the order of 10,000 objects. Each of them has about
30 references to other objects. All of them are optional. They are used
sparse (~20%) and they often refer to the same object.

What are the "optional references"? A reference in C++ can not
be optional. You mean some textual reference like in scientific
publications ... or some sort of hyperlink ... or?
If each reference take let's say 10 machine size words instead of one
for the reference only, we are talking about ~50 MB additional memory.

Uhh ... 50 MB / (10 * 30 * 10 000) ... ~ 16 bytes. You got a platform
where "machine size word" is 16 bytes? sizeof(std::string) with short
string optimization is anyway nowhere 160. It is typically 32 or less.
The total memory of the application is restricted by the platform to
about 250 MB. So this is not a good idea.

So ... 128 bit platform ... with 250 MB memory? Very interesting.
For this reason I do not like the small string optimization very much.
At least not for immutable strings. It also prevents deduplication.

Do not use strings then for your "references". Use pointers. Or
smart pointers. Or ... if the references are very sparse then decouple
them from your objects into separate container and form a graph.
So "documents" of yours are vertexes and the "references" are edges.
In another case I have a B-tree with nodes. Each node consist of a
header and two dozen entries. Each entry is either a reference to some
external object or a reference to a sub node. Putting the header of the
sub nodes into the parent (together with the reference to it's entries)
would increase each entry and with them the size of the B tree
structures by about a factor 5.

I calculate what I calculate but can no way get factor of 5, sorry.
Maybe you use the term "reference" in too several different meanings
here or I misunderstand something obvious.
Compared to these two examples the overhead of an additional allocation
for the header is low. But even better is one allocation (especially
with respect to cache efficiency).

10 000 allocations or 20 000? On typical platforms we are talking
about difference around 5 milliseconds. Not sure about your particular
cases but in general the allocations start to hurt when we have millions
of those.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,007
Latest member
obedient dusk

Latest Threads

Top