CBFalconer
Well more importantly, you can choose the ordering in which a sequence
of allocations is freed, and therefore can always create whatever kind
of fragmentation you like. So immunity to fragmentation is clearly
not a property a C-style allocator can have.
If we could move the storage (see next para) this can be handled.
In ISO Pascal we know there are no extant pointers other than those
assigned by new, so indirection and repacking are possible.
[...] Any such
would require indirect memory access for everything, which would be
a monstrous performance hit when enforced, together with
specialized compilers (or interpreters). However it is definitely
possible to have O(1) performance for all operations and to always
recover memory.
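The indirection idea can be sketched with a handle table: clients hold
an index rather than a raw pointer, so the allocator is free to move
and repack storage behind their backs. A minimal sketch assuming a
fixed-size table (all names here are hypothetical, not from nmalloc):

```c
#include <stdlib.h>

/* Clients hold a handle (an index into a table) instead of a raw
   pointer, so the allocator may move the underlying storage at will.
   Every access goes through h_deref() -- the extra memory access that
   makes this a performance hit when enforced everywhere. */
#define NHANDLES 64
static void *table[NHANDLES];

static int h_alloc(size_t n)            /* returns a handle, or -1 */
{
    for (int h = 0; h < NHANDLES; h++)
        if (table[h] == NULL) {
            table[h] = malloc(n);
            return table[h] ? h : -1;
        }
    return -1;
}

static void *h_deref(int h) { return table[h]; }

static void h_move(int h, size_t n)     /* allocator repacks storage */
{
    table[h] = realloc(table[h], n);    /* pointer may change; handle doesn't */
}

static void h_free(int h) { free(table[h]); table[h] = NULL; }
```

The handle stays valid across h_move() even though the underlying
pointer changes, which is exactly what raw C pointers rule out.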
Ok, but what about also always being able to successfully allocate?
[...] It doesn't even require buddy systems (which have
their own penalties). I have published such a system, which only
requires care in writing.
See <http://cbfalconer.home.att.net/download/nmalloc.zip>
I noticed you don't supply any separate documentation for it. Can I
assume the comments in the code are accurate? From what I can peruse:
As far as I know they are accurate. Not the memalign portion,
which is nonsense as published (and so labelled).
1) You seem to be neglecting at least one merge scenario. When you
split off the excess from an allocation, you are not merging it with
any possible adjacent free entries. In a sense this causes unnecessary
fragmentation of the remaining free entries. Remember with your
strategy, you want your free entries to be as large as possible. Try
writing a benchmark with a lot of reallocs in it, and you will see the
difference this makes.
Not so. The split off portion is always merged if possible. If
too small to be of use it is included in the assigned memory. One
of us is wrong. Just such a test exists in the testing code.
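The split-or-absorb rule described here can be sketched as follows;
MINSAVE (the smallest remnant worth keeping as a separate free block)
and the function name are hypothetical, and nmalloc's real threshold
and layout differ:

```c
/* How many bytes of a 'blocksize'-byte free block to hand out for a
   request of 'need' bytes: if the excess is too small to live as a
   free block, absorb it into the allocation; otherwise split it off
   so it can be merged with adjacent free space. */
#define MINSAVE 16UL   /* hypothetical minimum viable free block */

static unsigned long grant_size(unsigned long need, unsigned long blocksize)
{
    unsigned long excess = blocksize - need;

    return (excess < MINSAVE) ? blocksize : need;
}
```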
2) Your function size2bucket() is a linear search (though, of course,
its count is bounded above by sizeof(ulong)*CHAR_BIT). Think of it as
a search over bit positions, separate from the mathematical
properties, for a second. You can re-implement it as a binary search,
which on a real system gives an effective loop count of 5 or 6.
For a search of roughly 16 or 20 items (at most), this is not
useful. Notice that the search begins at the first guaranteed
suitable bucket, and there is no need to search the list of bucket
items. This makes it approximately a first fit algorithm.
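For reference, the binary search suggested above could be sketched
like this. The bucket numbering (index of the highest set bit of the
size) is an assumption for illustration, not nmalloc's actual scheme:

```c
#include <limits.h>

/* Binary-search the bit positions of the size rather than stepping
   through buckets linearly.  For a 32-bit unsigned long this is
   exactly 5 steps, for 64 bits it is 6 -- the "effective loop count
   of 5 or 6" mentioned above. */
static unsigned size2bucket_binary(unsigned long sz)
{
    unsigned bucket = 0;
    unsigned shift = sizeof sz * CHAR_BIT / 2;   /* half the word width */

    while (shift > 0) {
        if (sz >> (bucket + shift))   /* anything set in the high half? */
            bucket += shift;
        shift /= 2;
    }
    return bucket;   /* index of the highest set bit, 0-based */
}
```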
3) You are also giving up as soon as your algorithm is unable to find
memory using its "fast method". If you run out of memory, you are
typically in "screwedzville" anyways. You can simply go back to your
free list and check the bucket one lower, and start scanning linearly
for some fixed amount before truly bailing. This is important for
certain scenarios where you are allocating near the system limit in
terms of amount of memory, but nearly all your allocations are the
exact same size (this scenario is very typical). With your strategy as
coded, you can "run out of memory" even though you actually have plenty
of memory available -- you just aren't willing to scan for it. It's
very difficult to implement something which works robustly and all the
time here, but you can capture typical cases without too much effort.
It won't be 'plenty'. It may exist in the lower bucket, but that
would involve searching a list of unknown length, requiring unknown
time. The system hasn't aborted, it just returns NULL, so the user
has an opportunity to take corrective action. If the system has
reached that point, total inability is near anyhow, but the
capability of making smaller assignments MAY still be present, to
assist in recovery.
Notice that when that mechanism fails the opportunity to grab more
memory from the OS still exists, via sbrk.
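The bounded fallback scan suggested above could look like this,
assuming a singly linked free list per bucket; the node layout, the
function name, and SCAN_LIMIT are all hypothetical:

```c
#include <stddef.h>

/* Hypothetical free-list node; nmalloc's real layout differs. */
struct fnode { struct fnode *next; unsigned long size; };

/* Before reporting failure, scan at most SCAN_LIMIT entries of the
   next-lower bucket's list for a block that happens to be large
   enough.  The fixed bound keeps the worst case O(1) while rescuing
   the common "all allocations the same size" scenario. */
#define SCAN_LIMIT 32

static struct fnode *scan_lower_bucket(struct fnode *list, unsigned long need)
{
    int tries = SCAN_LIMIT;

    for (; list != NULL && tries-- > 0; list = list->next)
        if (list->size >= need)
            return list;
    return NULL;            /* caller falls back to returning NULL */
}
```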
In any event, it is as I said. You can implement something with O(1),
but it ends up not always being able to allocate all available memory
successfully.
And, as I said, doing such requires moving previously assigned
memory and repacking. The fundamentals of C make this
impracticable.
The whole thing should port easily to any system where alignment
depends on the low order bits of the addresses and arithmetic on
such addresses is valid. The other dependency is on the sbrk
routine. Thus it should work for most *IX systems. The variadic
development debuggery macros require gcc to avoid bombing even when
they are switched off.
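The low-order-bit alignment assumption can be illustrated with a small
sketch; ALIGN and the function names are hypothetical, not nmalloc's:

```c
#include <stdint.h>

/* The portability assumption stated above: alignment is visible in
   the low order bits of an address, and arithmetic on those bits is
   valid.  ALIGN must be a power of two. */
#define ALIGN ((uintptr_t)16)

/* Round an address up to the next ALIGN boundary. */
static uintptr_t align_up(uintptr_t a)
{
    return (a + (ALIGN - 1)) & ~(ALIGN - 1);
}

/* An aligned address has its low bits clear. */
static int is_aligned(uintptr_t a)
{
    return (a & (ALIGN - 1)) == 0;
}
```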