code using more memory than required

S

Sourin

Hi all,
I am trying to write code for my experiments. My work involves huge
datasets, and as such my code needs to be memory efficient. I did some
hand calculations regarding the amount of dynamic memory required for
my datastructures ( used sizeof() to get the size of the structures ).
But mallinfo() function show that approximately double that amount of
memory is being allocated. I am working with gcc 3.2.2 on a redhat 9
machine. I come from electrical engineering background and dont know
much about memory allocations or overheads. Can anyone please
enlighten me.
Thanks in advance.
 
D

Derk Gwen

(e-mail address removed) (Sourin) wrote:
# Hi all,
# I am trying to write code for my experiments. My work involves huge
# datasets, and as such my code needs to be memory efficient. I did some
# hand calculations regarding the amount of dynamic memory required for
# my datastructures ( used sizeof() to get the size of the structures ).
# But mallinfo() function show that approximately double that amount of
# memory is being allocated. I am working with gcc 3.2.2 on a redhat 9
# machine. I come from electrical engineering background and dont know
# much about memory allocations or overheads. Can anyone please
# enlighten me.

There's no one storage allocator for unix. You get one free with libc, but different
libc implementations can have different allocation strategies. Some implementations
will round up the allocated block size to conveniently alignable size, say 8 bytes.
Others allocate a page at a time where all blocks on the page are the same size;
a different page (perhaps 4096 bytes at a time) will be allocated for every different
block size. The buddy system rounds up the block size to the next power of two.
Some allocators need a link list overhead hidden from the caller; some provide
overflow and underflow zones around each block.

With virtual memory, you can often afford less efficient use of the address space
to get faster code. If you're only going to use unix, you can bypass all other
allocation libraries and write your own with the kernel functions break()
and sbrk().
 
N

Nils Petter Vaskinn

[snip: program using more memory than expected]
Thanks in advance.

Ok this can happen with any compiler not just gcc:

The compiler may decide to optimize. Say you have a struct of size 9. Then
you create an array of that struct. Now the comiler may know that the
processor works faster if it accesses data at every 4th byte. So instead
of packing your structs end to end you get 3 bytes unused between them.

Basically it's a speed / memory tradeoff.

Read your compilers documentation for ways to force it to pack the data
together without padding with unuesd bytes.

(for gcc look for the __packed__ attribute)

Be aware that this may make the memory access slower making your program
run slower. Maybe you can rethink your design to avoid the problem: do you
really need to have all the data in memory or can you proces it piece by
piece?

hth
NPV
 
E

Eric Sosman

Sourin said:
Hi all,
I am trying to write code for my experiments. My work involves huge
datasets, and as such my code needs to be memory efficient. I did some
hand calculations regarding the amount of dynamic memory required for
my datastructures ( used sizeof() to get the size of the structures ).
But mallinfo() function show that approximately double that amount of
memory is being allocated. I am working with gcc 3.2.2 on a redhat 9
machine. I come from electrical engineering background and dont know
much about memory allocations or overheads. Can anyone please
enlighten me.

Others have already mentioned the possibility that some
of your struct types contain padding, but nobody so far has
suggested what you might do about it. One way to get rid of
the padding is to use non-Standard language extensions to tell
the compiler not to pad certain struct types; many compilers
support such extensions, although the details differ from one
compiler to the next. Unfortunately, this usually results in
slower and/or larger code; the padding existed for a reason,
and eliminating it exacts a penalty.

Another approach, if you've got large arrays of these
padded structs, is break the structs apart and "factor" the
data into parallel arrays. For example, instead of

struct {
short shrift;
/* padding likely here */
double trouble;
} array[10000];

.... you could write

short shrift[10000];
double trouble[10000];

.... and the ten thousand padding slots simply vanish. Of
course, you've sacrificed some notational and mnemonic
convenience -- it's obvious that array[42].shrift and
array[42].trouble are somehow related, but in the revised
program the relationship between shrift[42] and trouble[42]
is less apparent. Functions like qsort() won't work with
this scheme, for example, because they don't know about
the relationship between these parallel arrays.

On other possible contributor to greater-than-expected
memory consumption is the overhead of keeping track of each
allocated area. malloc() implementations differ, but most
will add somewhere between four and sixteen bytes to each
allocation, in addition to rounding the total up to a nice
multiple of (usually) four, eight, or sixteen bytes. In a
malloc(1) call, the overhead due to bookkeeping and rounding
will probably be much greater than the one-byte "payload,"
and if you make a hundred thousand such calls, most of your
program's memory will be tied up in this useless (to you)
overhead. malloc(1) is an extreme case, but the effect can
be significant even for moderate-sized structs -- an eight-
byte struct may incur eight bytes of overhead, for only
fifty percent "efficiency."

If you are dealing with many such smallish structs, a
way to reduce the bookkeeping and rounding overhead is to
allocate a whole block of them at once and dole them out
individually, thus incurring only one chunk of overhead
for the whole batch instead of one for each struct:

struct thing *new_thing(void) {
static struct thing *avail;
static int howmany = 0;
if (howmany == 0) {
howmany = 1000;
avail = malloc(howmany * sizeof *avail);
if (avail == NULL)
return NULL; /* out of memory */
}
return &array[--howmany];
}

The disadvantage of this approach is that you can no longer
free() or realloc() the individual structs; you can only do
so on an entire batch at once. Depending on the application,
this may or may not be a problem.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,535
Members
45,007
Latest member
obedient dusk

Latest Threads

Top