Improving memory consumption in the container library

jacob navia

The first version of the container library was only that, a first version,
where the main objective was to show how it could be implemented.

One thing that immediately rang alarm bells in my mind was the fact that
the list module (even if on my machine it was as fast as the C++ version)
was noticeably slower on other (probably slower) machines like Solaris.
Besides, C++ used only about 700K, but my container used 1200K.
Definitely too much.

The first thing I discovered is that under C99

struct list_element {
    struct list_element *Next;
    char Data[];
};

sizeof(struct list_element) --> sizeof(void *)

Since I wanted to run under C89, I changed this to

struct list_element {
    struct list_element *Next;
    char Data[1];
};

without realizing that now sizeof(struct list_element)
was 2*sizeof(void *). This alone produced a memory overhead of 33%
when the data was just an integer (4 bytes on a 32-bit system).
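The difference can be shown directly (a small sketch; the struct names are mine, and the C99 version needs a C99 compiler):

```c
#include <stddef.h>

/* C99 version: the flexible array member contributes nothing
   to sizeof, so the struct is just one pointer wide. */
struct list_element_c99 {
    struct list_element_c99 *Next;
    char Data[];
};

/* C89 "struct hack" version: Data[1] is a real member, and the
   struct is padded out, typically to 2 * sizeof(void *). */
struct list_element_c89 {
    struct list_element_c89 *Next;
    char Data[1];
};
```

On a typical 32-bit implementation, sizeof gives 4 for the first struct and 8 for the second.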

The second thing I realized is that I was allocating each piece of
the list, each list_element, with its own call to malloc, to minimize
memory usage. The problem is that malloc's per-allocation overhead
added enormously to the total memory consumption. Obviously I forgot
the first lesson of malloc usage:

DO NOT USE MALLOC VERY OFTEN!

(And this also includes GC_malloc. When I turn on the GC,
performance is significantly slower than with malloc, probably
because GC_malloc is more expensive than the standard malloc.)

I then decided to build a simple heap manager for container allocation.
I allocated 1000 pointers to blocks of data that could each contain up to
1000 list elements.

This DECREASED memory consumption from 785 000 bytes using malloc to
408 000 bytes. Almost a 50% saving!
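A minimal sketch of such a scheme (the names and the int payload are illustrative, not the library's actual code): one malloc serves up to 1000 elements, so the per-element malloc overhead all but disappears.

```c
#include <stdlib.h>

#define ELEMS_PER_BLOCK 1000
#define MAX_BLOCKS      1000

struct elem {
    struct elem *Next;
    int Data;                         /* fixed-size payload for this sketch */
};

struct elem_pool {
    struct elem *blocks[MAX_BLOCKS];  /* the table of block pointers */
    int nblocks;                      /* blocks allocated so far     */
    int used;                         /* elements used in last block */
};

/* Hand out one element; malloc is called once per 1000 requests. */
struct elem *pool_alloc(struct elem_pool *p)
{
    if (p->nblocks == 0 || p->used == ELEMS_PER_BLOCK) {
        if (p->nblocks == MAX_BLOCKS)
            return NULL;              /* pool exhausted */
        p->blocks[p->nblocks] =
            malloc(ELEMS_PER_BLOCK * sizeof(struct elem));
        if (p->blocks[p->nblocks] == NULL)
            return NULL;
        p->nblocks++;
        p->used = 0;
    }
    return &p->blocks[p->nblocks - 1][p->used++];
}
```

Allocating 2500 elements this way costs 3 calls to malloc instead of 2500.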

The same program in C++ (using optimized or not optimized settings)
takes 1 177 172 bytes (using Visual C++ 2008). I suppose that other
C++ implementations take less but not significantly less.

Obviously this simple scheme gets more complicated when you
consider what happens when you free list elements. C++ takes a
significant time to shut down, probably because the destructors
are called for each list element. In C, I have to figure
out a good way to free things, maintain a free list, etc. Even
if the overhead will never reach the C++ heights, it will surely
slow down the software a bit.

This primitive heap manager can also be expanded for a mark/release
style of programming, where you allocate things from a list and,
when you destroy the list, all the memory used by the list is
freed in a few calls to free.
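A sketch of that mark/release idea under the same assumptions (the function names are invented): because the pool owns a handful of big blocks, destroying the whole list is a short loop of free() calls, with no per-element traversal.

```c
#include <stdlib.h>

#define MAX_BLOCKS 1000

/* The pool remembers every big block it handed out pieces of. */
struct block_pool {
    void *blocks[MAX_BLOCKS];
    int nblocks;
};

/* Grab one big block from malloc and record it in the pool. */
void *pool_grab(struct block_pool *p, size_t block_size)
{
    void *b;
    if (p->nblocks == MAX_BLOCKS)
        return NULL;
    b = malloc(block_size);
    if (b != NULL)
        p->blocks[p->nblocks++] = b;
    return b;
}

/* Release everything: a few free() calls, no matter how many
   elements were carved out of the blocks. */
void pool_release(struct block_pool *p)
{
    int i;
    for (i = 0; i < p->nblocks; i++)
        free(p->blocks[i]);
    p->nblocks = 0;
}
```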
 
Keith Thompson

jacob navia said:
> The first version of the container library was only that, a first version,
> where the main objective was to show how it could be implemented.
>
> One thing that immediately rang alarm bells in my mind was the fact that
> the list module (even if on my machine it was as fast as the C++ version)
> was noticeably slower on other (probably slower) machines like Solaris.
> Besides, C++ used only about 700K, but my container used 1200K.
> Definitely too much.
>
> The first thing I discovered is that under C99
>
> struct list_element {
>     struct list_element *Next;
>     char Data[];
> };
>
> sizeof(struct list_element) --> sizeof(void *)

A quibble: I think you mean sizeof(struct list_element*),
not sizeof(void*).
> Since I wanted to run under C89, I changed this to
>
> struct list_element {
>     struct list_element *Next;
>     char Data[1];
> };
>
> without realizing that now sizeof(struct list_element)
> was 2*sizeof(void *). This alone produced a memory overhead of 33%
> when the data was just an integer (4 bytes on a 32-bit system).

Presumably you're using the struct hack, as described in question 2.6
of the C FAQ.

The size of struct list_element will be at least
sizeof(struct list_element*) + 1
not necessarily 2*sizeof(void*).

You appear to be assuming an implementation where struct pointers are
the same size as void*, and pointers are aligned to their own size.

But for typical use of the struct hack, I wouldn't expect that you'd
ever actually want to allocate sizeof(struct list_element) bytes.
The correct number of bytes to allocate should normally be

offsetof(struct list_element, Data) + count;

where count is the number of additional bytes for the data.
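In code, that allocation pattern looks like this (a sketch; the helper name new_element is invented for illustration):

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct list_element {
    struct list_element *Next;
    char Data[1];                 /* C89 struct hack */
};

/* Allocate a node with room for count data bytes, sizing it with
   offsetof so any padding after Data is not paid for. */
struct list_element *new_element(const void *data, size_t count)
{
    struct list_element *e =
        malloc(offsetof(struct list_element, Data) + count);
    if (e != NULL) {
        e->Next = NULL;
        memcpy(e->Data, data, count);
    }
    return e;
}
```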

When using the struct hack, it's usual to store the number of
bytes in the struct itself. Presumably you keep track of the size
of the data somewhere else; either that, or you're presenting an
incomplete example.

[snip]
 
bartc

jacob navia said:

> DO NOT USE MALLOC VERY OFTEN!

I don't think this is going to be news to anybody here. Those of us who
aren't implementers have no idea how efficient or not a particular malloc
is. Best then to avoid it as much as possible if it might be an issue.

(I use an allocator that uses its own heap for blocks up to 1KB, above
which it will call malloc(). Each allocation call is about a dozen machine
instructions plus the call overhead.

However, over-allocation is used, which can be tested in application code in
some half dozen instructions (inline), so that when a block needs to grow,
most of the time this is the only overhead, even when the block is above
1KB. I expect realloc() overheads to be more than that.)
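One way to read that (my sketch of the idea, not bartc's actual allocator): keep a capacity alongside the length, so growth is usually just a bounds check, and realloc() runs only when the slack is used up.

```c
#include <stdlib.h>

/* A growable buffer that over-allocates: cap holds slack beyond len,
   so most growth requests need only the comparison on the first line. */
struct buf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Make room for extra more bytes; realloc() runs only when the
   over-allocation is exhausted. */
int buf_reserve(struct buf *b, size_t extra)
{
    size_t newcap;
    char *p;

    if (b->len + extra <= b->cap)     /* the common, cheap path */
        return 1;
    newcap = b->cap ? 2 * b->cap : 16;
    while (newcap < b->len + extra)
        newcap *= 2;
    p = realloc(b->data, newcap);
    if (p == NULL)
        return 0;
    b->data = p;
    b->cap  = newcap;
    return 1;
}
```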

> (And this also includes GC_malloc. When I turn on the GC,
> performance is significantly slower than with malloc, probably
> because GC_malloc is more expensive than the standard malloc.)

> I then decided to build a simple heap manager for container allocation.
> I allocated 1000 pointers to blocks of data that could each contain up to
> 1000 list elements.
>
> This DECREASED memory consumption from 785 000 bytes using malloc to
> 408 000 bytes. Almost a 50% saving!

What was the saving due to? The allocator I mentioned does not store block
sizes, which I would imagine means significant savings compared to using
malloc on large numbers of small blocks (but it means the application code
must supply a size when freeing; this works well though).

> This primitive heap manager can also be expanded for a mark/release
> style of programming, where you allocate things from a list and,
> when you destroy the list, all the memory used by the list is
> freed in a few calls to free.

Again, this is the sort of thing that people must be used to doing for
themselves, given only malloc/realloc/free to work with. Any complexity of
data structure can be freed without full traversal, providing some way of
storing/linking the pointers in a simpler sequence has been used (or, if
using a private heap initially obtained from malloc(), just free that one
block!).
 
jacob navia

bartc wrote:
> What was the saving due to?

Less overhead for malloc. For 1000 list elements I use 1 malloc,
so the malloc overhead is 1/1000, not very much!

> The allocator I mentioned does not store
> block sizes, which I would imagine means significant savings compared to
> using malloc on large numbers of small blocks (but it means the
> application code must supply a size when freeing; this works well though).

Since the lists my container library uses are made of elements of the
same type (hence of the same size), the length is implicit.

> Again, this is the sort of thing that people must be used to doing for
> themselves, given only malloc/realloc/free to work with. Any complexity
> of data structure can be freed without full traversal, providing some
> way of storing/linking the pointers in a simpler sequence has been used
> (or, if using a private heap initially obtained from malloc(), just free
> that one block!).

To free everything you just free each of the 1000 blocks containing
1000 elements (and that is if the list has 1 million elements, which is
highly unlikely).

For smaller lists, 1 or 2 calls to free will be enough.
 
bartc

jacob navia said:
> bartc wrote:

> Less overhead for malloc. For 1000 list elements I use 1 malloc,
> so the malloc overhead is 1/1000, not very much!

> Since the lists my container library uses are made of elements of the
> same type (hence of the same size), the length is implicit.

This is exactly why, in my scheme, having to effectively call free(p,n)
instead of free(p) has never been much of a problem.

And why I try to stay clear of malloc() which, I understand, might use 16 or
32 bytes of extra overhead per allocation.

For large numbers of allocations, all of the same size, I have used a
private heap which can have a very simple and fast allocator/deallocator
(ie. one or more largish malloced blocks plus a couple of alloc/free
functions).

> To free everything you just free each of the 1000 blocks containing
> 1000 elements (and that is if the list has 1 million elements, which is
> highly unlikely).
>
> For smaller lists, 1 or 2 calls to free will be enough.

OK, so this is the same sort of idea: creating faster/more efficient
allocators on top of malloc(), when the application scenario is well
understood. My point was that I expected many to be already doing this.
 
jacob navia

bartc wrote:
> OK, so this is the same sort of idea: creating faster/more efficient
> allocators on top of malloc(), when the application scenario is well
> understood. My point was that I expected many to be already doing this.

Me too; I am saying nothing extraordinarily new. But it is interesting
(in my opinion) to emphasize these kinds of "well known" facts yet another
time, in yet another context.
 
