help with memory management strategy?


A

Ark Khasin

My pet project is a command-line utility (preprocessor), so it runs and
terminates. It uses lots of memory allocations; most of them are quite
small. What to do with the allocated objects when they are no longer
used? I consider the following "pure" strategies:
- Meticulously free anything ever malloc'ed. This involves
inefficiencies of calling free, but worse yet, in certain cases object
duplication appears necessary to ensure uniqueness of custodial pointers.
- Forget about no-longer-used objects and let them rot in the heap. This
is the fastest and most compact but may lead to out-of-memory failures
on a really complex processing job where method 1 would succeed. It
would sound like negligence then, even though I've never observed this
happen in practice.
- Use a different allocator endowed with garbage collector. I am not
concerned about non-deterministic performance since for my utility what
counts is the total execution time. However, I am a little scared by
disclaimers that a GC may be tricked into reclaiming memory which is
still in use. [Does anyone know of a bullet-proof GC?]

I'd like to ask for advice on which strategy (or a mix of them) is
considered the best for this sort of tasks?
 
Ad

Advertisements

D

dj3vande

My pet project is a command-line utility (preprocessor), so it runs and
terminates. It uses lots of memory allocations; most of them are quite
small. What to do with the allocated objects when they are no longer
used? I consider the following "pure" strategies:
- Meticulously free anything ever malloc'ed. This involves
inefficiencies of calling free, but worse yet, in certain cases object
duplication appears necessary to ensure uniqueness of custodial pointers.

Don't worry about the inefficiencies of calling free; if they're
noticeable at all, the inefficiencies of leaking a bunch of memory will
be even worse.
- Forget about no-longer-used objects and let them rot in the heap. This
is the fastest and most compact but may lead to out-of-memory failures
on a really complex processing job where method 1 would succeed. It
would sound like negligence then, even though I've never observed this
happen in practice.

If you do this, make sure you document that the code doesn't clean up
its allocated memory and is unsuitable for any re-use as part of a
larger project.
Better yet, just don't do it.
- Use a different allocator endowed with garbage collector. I am not
concerned about non-deterministic performance since for my utility what
counts is the total execution time. However, I am a little scared by
disclaimers that a GC may be tricked into reclaiming memory which is
still in use. [Does anyone know of a bullet-proof GC?]

(4) As (1), but use reference-counting to make your life easier.
For objects that you'd want to have pointers to from widely different
places, also allocate a reference count. When you're done with it in
one place, decrement the reference count and if it's gone to zero
deallocate it.
This will give you a lot of the benefits of GC with very little of the
hassle, but watch out for circular data structures.

(5) Use pool allocators
When you're doing a chunk of work that allocates memory that won't be
needed after that chunk of work is done, store a pointer to the
allocated memory somewhere that's easy to get at. When you're done
with the memory, just forget about it; when you're done with the chunk
of work, go through your list of allocated pointers and free all of
them.

(6) Re-think your design
If you're having trouble keeping track of what's going where, that may
indicate that your design is flawed. There may be a better design that
makes it easier to keep track of things and make sure they all get
deallocated when you're done with them.

(7) Use different tools
If you're suffering from the lack of a particular high-level
abstraction, it may be easier to find a language that has it than to
try to build it in C.
(If you really do have a problem that lends itself well to GC, it might
also be better to use a language with GC built in than to try to bolt
it onto C.)

I'd like to ask for advice on which strategy (or a mix of them) is
considered the best for this sort of tasks?

I'd try (1), (6), (4), (5), and (7), in that order, and if I fell off
the end of that list I'd give up and get a job mowing lawns instead.


dave
 
T

Tor Rustad

Ark said:
My pet project is a command-line utility (preprocessor), so it runs and
terminates. It uses lots of memory allocations; most of them are quite
small. What to do with the allocated objects when they are no longer
used? I consider the following "pure" strategies:
- Meticulously free anything ever malloc'ed. This involves
inefficiencies of calling free, but worse yet, in certain cases object
duplication appears necessary to ensure uniqueness of custodial pointers.

Well, I would rather expect malloc() to be the heavy-weight call here,
not free(). This method, is the traditional one, which works in all cases.
- Forget about no-longer-used objects and let them rot in the heap. This
is the fastest and most compact but may lead to out-of-memory failures
on a really complex processing job where method 1 would succeed. It

Right, if being lazy, bad things can happen. If doing a lot of malloc's,
I wouldn't recommend being lazy. :)
would sound like negligence then, even though I've never observed this
happen in practice.
- Use a different allocator endowed with garbage collector. I am not
concerned about non-deterministic performance since for my utility what
counts is the total execution time. However, I am a little scared by
disclaimers that a GC may be tricked into reclaiming memory which is
still in use. [Does anyone know of a bullet-proof GC?]

There are issues with e.g. Boehm GC, but such cases appears to be rather
constructed. For a non-safety related command line utility, I don't see
the problem with using Boehm GC.
 
A

Ark Khasin

Tor said:
Ark said:
- Use a different allocator endowed with garbage collector. I am not
concerned about non-deterministic performance since for my utility
what counts is the total execution time. However, I am a little scared
by disclaimers that a GC may be tricked into reclaiming memory which
is still in use. [Does anyone know of a bullet-proof GC?]

There are issues with e.g. Boehm GC, but such cases appears to be rather
constructed. For a non-safety related command line utility, I don't see
the problem with using Boehm GC.
The trouble is, the current version of the utility (Unimal) is used in
the build process of a safety-critical product, so IMO it would not be
wise to take chances. Currently, it uses a mix of thoughtful cleanup and
deliberate leaks but I just thought there is a better way...
 
C

CBFalconer

Ark said:
My pet project is a command-line utility (preprocessor), so it
runs and terminates. It uses lots of memory allocations; most of
them are quite small. What to do with the allocated objects when
they are no longer used? I consider the following "pure"
strategies:
- Meticulously free anything ever malloc'ed. This involves
inefficiencies of calling free, but worse yet, in certain cases
object duplication appears necessary to ensure uniqueness of
custodial pointers.
- Forget about no-longer-used objects and let them rot in the heap.
This is the fastest and most compact but may lead to out-of-
memory failures on a really complex processing job where method
1 would succeed. It would sound like negligence then, even
though I've never observed this happen in practice.
- Use a different allocator endowed with garbage collector. I am
not concerned about non-deterministic performance since for my
utility what counts is the total execution time. However, I am
a little scared by disclaimers that a GC may be tricked into
reclaiming memory which is still in use. [Does anyone know of
a bullet-proof GC?]

I'd like to ask for advice on which strategy (or a mix of them)
is considered the best for this sort of tasks?

You omitted 'Use a malloc package that avoid the long sequences on
free". I developed this because of the problem, and all operations
are O(1). It is semi-portable, requiring being able to treat the
memory as a big char array (or arrays), and needs gcc, because of
gcc style variadic macros. It also depends on the presence of the
sbrk() system call. It is available at:

<http://cbfalconer.home.att.net/download/nmalloc.zip>
 
C

CBFalconer

Don't worry about the inefficiencies of calling free; if they're
noticeable at all, the inefficiencies of leaking a bunch of memory
will be even worse.

To the contrary, the usual malloc package has O(n) performance for
free, where n is the number of allocated blocks. The makes the
operation of freeing a large segment of those blocks O(n * n).
Elsethread I have given references to my malloc package, which is
O(1) and makes the large segment of blocks free operation O(n).

When you have n in the 10,000 to 20,000 range you can easily see
the difference between O(n) and O(n * n) free() operation. My
hashlib testing package has special provisions for this.
 
Ad

Advertisements

T

Tor Rustad

Ark said:
Tor said:
Ark said:
- Use a different allocator endowed with garbage collector. I am not
concerned about non-deterministic performance since for my utility
what counts is the total execution time. However, I am a little
scared by disclaimers that a GC may be tricked into reclaiming memory
which is still in use. [Does anyone know of a bullet-proof GC?]

There are issues with e.g. Boehm GC, but such cases appears to be
rather constructed. For a non-safety related command line utility, I
don't see the problem with using Boehm GC.
The trouble is, the current version of the utility (Unimal) is used in
the build process of a safety-critical product, so IMO it would not be
wise to take chances. Currently, it uses a mix of thoughtful cleanup and
deliberate leaks but I just thought there is a better way...

If building safety-critical program, I want tools which do their job
well. During the build process, correctness of the output is critical,
not really avoiding a failure to build. Extra memory can be added, so
can swap space.

The Boehm GC does indeed do some very low-level stuff, and using it will
increase the intrinsic complexity of your tool, which may introduce some
nasty bugs. An alternative, is to first use the GC as a *leak detector*,
and also pull in a test-case where you compare the output, when GC is
used and when it isn't. Those outputs should always be identical.

If being in your shoes, I guess my choice would be going for explicit
free() calls, but you are in a better position to judge this. Using GC
as a leak-detector in debug builds for a while, will give you even
better know how.
 
Ad

Advertisements

A

Ark Khasin

Tor said:
Ark said:
Tor said:
Ark Khasin wrote:
- Use a different allocator endowed with garbage collector. I am not
concerned about non-deterministic performance since for my utility
what counts is the total execution time. However, I am a little
scared by disclaimers that a GC may be tricked into reclaiming
memory which is still in use. [Does anyone know of a bullet-proof GC?]

There are issues with e.g. Boehm GC, but such cases appears to be
rather constructed. For a non-safety related command line utility, I
don't see the problem with using Boehm GC.
The trouble is, the current version of the utility (Unimal) is used in
the build process of a safety-critical product, so IMO it would not be
wise to take chances. Currently, it uses a mix of thoughtful cleanup
and deliberate leaks but I just thought there is a better way...

If building safety-critical program, I want tools which do their job
well. During the build process, correctness of the output is critical,
not really avoiding a failure to build.
Thank you indeed for this observation. It put me in the justifiably
conservative mindset.
The Boehm GC does indeed do some very low-level stuff, and using it will
increase the intrinsic complexity of your tool, which may introduce some
nasty bugs. An alternative, is to first use the GC as a *leak detector*,
and also pull in a test-case where you compare the output, when GC is
used and when it isn't. Those outputs should always be identical.
This of course can uncover an error but cannot prove its absence. As a
leak detector... I'll have to study this further because e.g. some leaks
may remain intentional.
If being in your shoes, I guess my choice would be going for explicit
free() calls, but you are in a better position to judge this. Using GC
as a leak-detector in debug builds for a while, will give you even
better know how.
Thank you
 

Top