Malloc Query

Eric Sosman

BartC said:
[...]
I've just done a test where allocating and freeing large numbers of
4-byte blocks took twenty times longer using malloc/free (and probably
took 2-4 times as much space) than using a special allocator.

How significant this is depends on the application, but I don't think
you can just dismiss such an allocator.

Bartc: Are you sure of this? What I really wanted to ask: is it
reasonable to plan on saving 50% of memory allocation time by writing
special functions for it? If the speedup can be 20X, that is amazing.

It gets back to what I mentioned before: If you know something
important about your allocations, and you can think of a way to
exploit that knowledge, then you may be able to write code that is
better specialized for your allocation patterns than general-purpose
malloc().

But in your original post you wrote "I am talking about general
purpose applications," which can be read as "I have no exploitable
knowledge about the memory usage patterns." In that case, the issue
collapses to "Can I, myself, write a better general-purpose malloc()
package than the vendors' experts?" As I also pointed out, this is
sometimes possible when the vendor has given the job to tyros and not
to experts. Nowadays I suspect such lame vendors are not exactly
thick upon the ground; in fact, the one I know of has disappeared.
 
Uncle Steve

Yes, similar but even simpler.



I am guessing you don't have a background in the sciences!

Not really. Science fiction doesn't really count. Real science is
time-consuming, and I just don't have enough free leisure hours in the
day.
I am glad that's in 'scare quotes'. I don't really want to get into a
discussion of whether a special-case algorithm is better than a general
one. It depends on your metrics, no doubt. What I'd really like is
some data, and even better than that would be some code I could try to
see how general the benefit is (across systems) for some specific
special-purpose allocator.

Hey, the code I wrote there is almost trivial to test. I don't know
why you're so keen to get numbers on it, part of my reason for not
having those numbers is that it's more meaningful to measure the
performance of real code that's using the allocator. Besides, what
I've done here has probably been studied to death some time way back
in the sixties or seventies. I don't have convenient access to a
university library account, so an abstract search is problematic for
me. You'll probably get buried in citations if you try this route.

Anyways, as I'm not an actual working scientist, measurement of every little
thing in my programs isn't much of a priority. If it is for you,
actually implementing the arena_open() and arena_close() will take
less time than setting up your test bench.
(a) You have to write it and debug it. The code you posted had bugs,
but that's probably because it was pulled out of context to illustrate
the idea. (BTW, I've been around the block long enough that you could
just have said that it's a free list of fixed size objects).

Probably, but people with less experience might not have appreciated the
simplicity of the algorithm if it was full of object padding
calculations, and all the little bits that go along with it. Of
course I edited for brevity. In this newsgroup, when someone posts a
few screens full of code it takes time to read it. I don't think it
would have been better if I had posted the entire source file in this
instance.
(b) You might also have to port it (see the problem that BartC's code
had on my machine for example).

(c) Other people might have to learn about the interface (and maybe
persuade themselves of the implementation) if they come to maintain code
that uses it. Every C programmer knows malloc(...). Your code, for
example, does not allocate and free using pointers -- the functions take
and return an int. This is perfectly logical, but it will take some
getting used to for the maintainer.

Oh, absolutely. Using integers as a sort of offset from a base is
the kind of thing more common to assembly language programmers, but
even in C it's a common idiom to use an array index; this is instead a
base + offset arrangement. Stated that way, it is easy to understand
what is going on in the code. Not much of an intellectual burden.
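[Editorial aside: stated as code, the base + offset arrangement described here might look like the following sketch. All names are invented for illustration; this is not the poster's actual implementation.]

```c
#include <assert.h>
#include <string.h>

/* Minimal index-based free list in the base + offset style: each free
   slot stores the index of the next free slot in its first sizeof(int)
   bytes, and -1 marks the end of the list. */
enum { SLOT_SIZE = 16, NSLOTS = 64 };

static unsigned char pool[NSLOTS * SLOT_SIZE];
static int free_head = -1;

static void pool_init(void)
{
    for (int i = 0; i < NSLOTS; i++) {
        int next = (i < NSLOTS - 1) ? i + 1 : -1;
        memcpy(&pool[i * SLOT_SIZE], &next, sizeof next);
    }
    free_head = 0;
}

static void *pool_addr(int n)        /* base + offset -> usable pointer */
{
    return &pool[n * SLOT_SIZE];
}

static int pool_alloc(void)          /* returns a slot index, -1 if full */
{
    int n = free_head;
    if (n != -1)
        memcpy(&free_head, pool_addr(n), sizeof free_head);
    return n;
}

static void pool_free(int n)         /* push the slot back on the list */
{
    memcpy(pool_addr(n), &free_head, sizeof free_head);
    free_head = n;
}
```

The allocator hands out integer indices rather than pointers, which is exactly the property discussed below.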
These all boil down to "more code". The question remains, though, about
the benefits. It would be very helpful to have the speed benefit
quantified.

I guess. If your question bugs me enough I'll think about setting
aside a couple of hours on the weekend.



Regards,

Uncle Steve
 
Eric Sosman

What's so content-free about that? At least the part you quote here
is clearly stated, and the answer is of course "no, not unless the
standard malloc implementation is really bad, which is unlikely".

From the original post, I quote

That's content-free, because both "considerable" and "overhead"
are left completely open to individual interpretation. One
respondent might say "I've measured the overhead, and it's only
twenty-six milliquivers per kilofrag," while another might say
"YES! I've measured the HORRIBLE overhead, and it's a full TWENTY-
SIX MILLIQUIVERS PER KILOFRAG, OMG!!!!" And both could be right.

A question for which both Yes and No are correct answers is,
in my book, content-free.
 
James Kuyper

Not really. Science fiction doesn't really count. Real science is
time-consuming, and I just don't have enough free leisure hours in the
day.

If you don't have sufficient spare time to test whether or not your
custom allocator actually significantly improves the speed of your
programs, you don't have enough spare time to justify wasting it
producing untested custom allocators.

....
Hey, the code I wrote there is almost trivial to test. I don't know
why you're so keen to get numbers on it,

He's not - the key point is that you SHOULD be. If it's indeed trivial
to test, you should have done so already. Until you have those numbers,
you can't be sure whether or not your custom allocator was a complete
waste of time.

....
course I edited for brevity. In this newsgroup, when someone posts a
few screens full of code it takes time to read it. I don't think it
would have been better if I had posted the entire source file in this
instance.

Example code that has, at the very least, been compiled, and preferably
tested, wastes a lot less of a reviewer's time than code with a variety
of miscellaneous uncaught bugs. The reviewer has to waste time trying to
figure out whether those are actually bugs, or features.

....
Oh, absolutely. Using integers as a sort of offset from a base is
the kind of thing more common to assembly language programmers, but
even in C it's a common idiom to use an array index,

It's not as common as you seem to think.
 
Ben Bacarisse

BartC said:
These numbers will vary if the malloc() implementation changes,
or a different program is interpreted. But my point is that these custom
allocators should be taken seriously and not just dismissed.

I am trying to do just that (not dismiss them). Can you post something
I can test and time?
 
Ben Bacarisse

Uncle Steve said:
Not really. Science fiction doesn't really count. Real science is
time-consuming, and I just don't have enough free leisure hours in the
day.


Hey, the code I wrote there is almost trivial to test. I don't know
why you're so keen to get numbers on it, part of my reason for not
having those numbers is that it's more meaningful to measure the
performance of real code that's using the allocator. Besides, what
I've done here has probably been studied to death some time way back
in the sixties or seventies. I don't have convenient access to a
university library account, so an abstract search is problematic for
me. You'll probably get buried in citations if you try this route.

I think this sort of thing has to be done every generation. I doubt
that conclusions drawn on a PDP-11 or an IBM 360 relate to my multi-core
multi-level cache machine. I've seen the situation change. At one
point I took out a freelist allocator like yours from a functional
language implementation of mine because it added very little to the
performance. Maybe I would be changing that ifdef macro now. Only
measurements can tell, and I can't test that code any more -- I seem to
have lost it into the big bit bucket in the sky.

Even if the general rules are the same (specific-case allocation being
cheaper than general-case) the thing that prompted my reply was a
number: 20 times. I'd like to check that out on my system.

Probably, but people with less experience might not have appreciated the
simplicity of the algorithm if it was full of object padding
calculations, and all the little bits that go along with it. Of
course I edited for brevity.

You edited in some bugs too:

| #define arena_obj_addr(x, n) ((void *) *(&p->arena[n * x->obj_size]))

'p' can't be right here. It must be 'x', yes? The result of the
expression is a char converted to void *. That can't be right either.
Presumably the '*' in front of the '(&' should not be there. (And I'd
add parentheses round all the macro arguments in the body.)

Oh, absolutely. Using integers as a sort of offset from a base is
the kind of thing more common to assembly language programmers, but
even in C it's a common idiom to use an array index; this is instead a
base + offset arrangement. Stated that way, it is easy to understand
what is going on in the code. Not much of an intellectual burden.

I don't think it's a common idiom at all and I've seen a lot of C over
the years. The last time I saw it was in Gregory Chaitin's tiny Lisp
interpreter that he's used in his books and courses. It's not exactly
idiomatic C, but then he has bigger fish to fry.

<snip>
 
Tim Rentsch

Eric Sosman said:
Hi all,

Many users claim that malloc() and free() provided by the language
library results in considerable
amount of overhead in terms of calling the same.

Ah, yes, that well-known and reputable authority "many users."
Almost as reliable as "I've heard" and "some say," nearly as
infallible as "it seems." Well, I've heard that some of these
many users say it seems the Moon is made of green cheese, but even
so I'm not planning to head skyward with my Roquefort dressing
recipe. [snip]

IMO this response is inappropriate. The statement
starting "Many users claim..." wasn't offered for
its truth, but just to establish a context for asking
the question. And there's nothing wrong with that.
 
Ian Collins

Eric Sosman said:
Hi all,

Many users claim that malloc() and free() provided by the language
library results in considerable
amount of overhead in terms of calling the same.

Ah, yes, that well-known and reputable authority "many users."
Almost as reliable as "I've heard" and "some say," nearly as
infallible as "it seems." Well, I've heard that some of these
many users say it seems the Moon is made of green cheese, but even
so I'm not planning to head skyward with my Roquefort dressing
recipe. [snip]

IMO this response is inappropriate.

Yes, it is. Roquefort is a blue cheese.
 
BartC

Ben Bacarisse said:
I am trying to do just that (not dismiss them). Can you post something
I can test and time?

I tried a slightly more substantial test, allocating 16-byte blocks in a
random fashion. The dedicated code was 3-6 times as fast as using malloc().

I'm not going to post code again because these things are very difficult to
evaluate. Sometimes it is not just this scale factor that is significant;
it might be an advantage that all the small blocks live in the same
region (for example, the entire lot can be deallocated in one call to
free()); or that an ultra-simple allocator lends itself to being
incorporated directly in a piece of inline assembly.

It can just be part of optimisation.
 
Ben Bacarisse

BartC said:
I tried a slightly more substantial test, allocating 16-byte blocks in
a random fashion. The dedicated code was 3-6 times as fast as using
malloc().

I'm not going to post code again because these things are very
difficult to evaluate.

!!! You are protecting me from something you think I'd have trouble
evaluating? Thanks, but I'll take the risk.
Sometimes it is not just this scale factor that is
significant; it might be an advantage that all the small blocks live in
the same region (for example, the entire lot can be deallocated in one
call to free()); or that an ultra-simple allocator lends itself to
being incorporated directly in a piece of inline assembly.

Yes, and sometimes there's no need to free at all; and if your
allocations are single bytes and catastrophic failure is permitted, the
allocator can be just 'return p++'. In that case you'd be comparing
malloc(1) with an inline addition. It's clear that at one end of the
spectrum the advantage over malloc can be made very large indeed. At
the other end, you'll lose because it turns out malloc is better at its
job than you imagine. If we actually had some timing code, we'd be able
to see what the range of ratios is over different systems and for
different special-case allocation patterns. Your code, run on several
systems, could provide some of this data.
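[Editorial aside: the 'return p++' extreme mentioned above can be written out as a bump allocator. This is a deliberately minimal sketch with invented names: allocation is a bounds check plus a pointer increment, per-object free is a no-op, and the whole region is released at once.]

```c
#include <assert.h>
#include <stddef.h>

/* A bump allocator: the far end of the spectrum where a special-case
   allocator beats malloc() by the widest margin. */
enum { BUMP_CAP = 4096 };

static unsigned char bump_buf[BUMP_CAP];
static size_t bump_used = 0;

static void *bump_alloc(size_t n)
{
    if (bump_used + n > BUMP_CAP)
        return NULL;                 /* out of space */
    void *p = &bump_buf[bump_used];
    bump_used += n;
    return p;
}

static void bump_reset(void)         /* "free" the entire lot in one go */
{
    bump_used = 0;
}
```

Comparing bump_alloc() against malloc() is the contrived end of the comparison: a few instructions versus a general-purpose allocator call.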

<snip>
 
BartC

Ben Bacarisse said:
Yes, and sometimes there's no need to free at all; and if your
allocations are single bytes and catastrophic failure is permitted, the
allocator can be just 'return p++'. In that case you'd be comparing
malloc(1) with an inline addition. It's clear that at one end of the
spectrum the advantage over malloc can be made very large indeed.

Yes, that's why it might be worth exploiting. And also why a simplistic test
may not be meaningful other than showing the advantage *can* be large.
At
the other end, you'll lose because it turns out malloc is better at its
job than you imagine.

That's agreed. And when the blocks are large enough the allocation overheads
get less significant.
 
David Resnick

Hi all,

Many users claim that malloc() and free() provided by the language
library results in considerable
amount of overhead in terms of calling the same. Can we really
eliminate this overhead by using user-defined functions/a user-defined
memory manager? Is there any significant advantage in providing
user-defined malloc() and free()? I want to exclude realtime systems
or embedded systems which are memory constrained here.
I am talking about general purpose applications. With a user-defined
memory manager, can it handle memory allocation and de-allocation in a
much better way than the OS would have done? Any inputs/comments
on the same are highly appreciated.

Lots of suggestions in the thread that suggest that for general use
you are unlikely to procure a better allocator than the standard one,
which I'd agree with. Getting malloc and free exactly right (avoiding
fragmentation, making it work and be reasonable in a multi-threaded
environment if you care about that aspect, etc) is best left to the
library absent any really compelling reason to the contrary.

I've done some work on a system where profilers showed malloc/free
were a very serious issue (receiving and processing 50000 packets per
second, any dynamic allocations tended to show up rather
conspicuously). However, rather than using a custom allocator,
it turned out object (memory) reuse was a better choice, a common way to
attack this problem. As in, if a profiler says that malloc is
actually causing you woe, doing less of it may well be a better
solution than doing it faster.
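[Editorial aside: the object-reuse approach described above can be sketched as a free list layered over malloc(). The struct and function names here are invented for illustration; this is not the code from that system.]

```c
#include <assert.h>
#include <stdlib.h>

/* Object reuse on top of malloc(): freed objects go on a free list and
   are handed out again before malloc() is called, so steady-state
   operation does less allocation rather than faster allocation. */
struct packet {
    struct packet *next;             /* link, used only while free */
    char payload[64];
};

static struct packet *pkt_free_list = NULL;

static struct packet *pkt_get(void)
{
    if (pkt_free_list != NULL) {
        struct packet *p = pkt_free_list;
        pkt_free_list = p->next;
        return p;                    /* recycled: no malloc() call */
    }
    return malloc(sizeof(struct packet));
}

static void pkt_put(struct packet *p)
{
    p->next = pkt_free_list;         /* recycle rather than free() */
    pkt_free_list = p;
}
```

Once the working set has been allocated, a receive loop using pkt_get()/pkt_put() stops touching malloc() entirely.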

-David
 
Joe Pfeiffer

David Resnick said:
Lots of suggestions in the thread that suggest that for general use
you are unlikely to procure a better allocator than the standard one,
which I'd agree with. Getting malloc and free exactly right (avoiding
fragmentation, making it work and be reasonable in a multi-threaded
environment if you care about that aspect, etc) is best left to the
library absent any really compelling reason to the contrary.

I've done some work on a system where profilers showed malloc/free
were a very serious issue (receiving and processing 50000 packets per
second, any dynamic allocations tended to show up rather
conspicuously). However, rather than using a custom allocator,
it turned out object (memory) reuse was a better choice, a common way to
attack this problem. As in, if a profiler says that malloc is
actually causing you woe, doing less of it may well be a better
solution than doing it faster.

My earlier anecdote is a case in point -- and one might say that in
general it's very likely that the best way to do a "custom allocator" is
really to do object reuse on top of malloc(). And when I say "very
likely" I mean "there might be some pathological case where malloc() is
available but this isn't the case, but I can't think of one off-hand".
 
Uncle Steve

If you don't have sufficient spare time to test whether or not your
custom allocator actually significantly improves the speed of your
programs, you don't have enough spare time to justify wasting it
producing untested custom allocators.

That's a little bit presumptuous. Are you suddenly in charge of my
development schedule? I must have missed the memo. Anyways, I think
it's valid to say that some things are understandable by 'inspection',
as I think they say in some math texts.
...

He's not - the key point is that you SHOULD be. If it's indeed trivial
to test, you should have done so already. Until you have those numbers,
you can't be sure whether or not your custom allocator was a complete
waste of time.

The one previous to that may have been, but I didn't test it either.
Instead, I wrote the version I'm using now. BTW, I do all my code
development in a very limited amount of free time. You seem to be
saying that as I am talking about the algorithm in a public newsgroup,
that I should have also proved it in the lab and in the field, not to
mention publishing all the testing data and methodologies.

As I mentioned previously I'm not a working scientist, even if I know
some of what they do and how they go about it.
...

Example code that has, at the very least, been compiled, and preferably
tested, wastes a lot less of a reviewer's time than code with a variety
of miscellaneous uncaught bugs. The reviewer has to waste time trying to
figure out whether those are actually bugs, or features.

Are we 'reviewing' code in this newsgroup? I thought we were
discussing it.
...

It's not as common as you seem to think.

I don't really know how common it is, so you're already ahead of me on
that score.



Regards,

Uncle Steve
 
Uncle Steve

Uncle Steve said:
Not really. Science fiction doesn't really count. Real science is
time-consuming, and I just don't have enough free leisure hours in the
day.


Hey, the code I wrote there is almost trivial to test. I don't know
why you're so keen to get numbers on it, part of my reason for not
having those numbers is that it's more meaningful to measure the
performance of real code that's using the allocator. Besides, what
I've done here has probably been studied to death some time way back
in the sixties or seventies. I don't have convenient access to a
university library account, so an abstract search is problematic for
me. You'll probably get buried in citations if you try this route.

I think this sort of thing has to be done every generation. I doubt
that conclusions drawn on a PDP-11 or an IBM 360 relate to my multi-core
multi-level cache machine. I've seen the situation change. At one
point I took out a freelist allocator like yours from a functional
language implementation of mine because it added very little to the
performance. Maybe I would be changing that ifdef macro now. Only
measurements can tell, and I can't test that code any more -- I seem to
have lost it into the big bit bucket in the sky.

Even if the general rules are the same (specific-case allocation being
cheaper than general-case) the thing that prompted my reply was a
number: 20 times. I'd like to check that out on my system.

Probably, but people with less experience might not have appreciated the
simplicity of the algorithm if it was full of object padding
calculations, and all the little bits that go along with it. Of
course I edited for brevity.

You edited in some bugs too:

| #define arena_obj_addr(x, n) ((void *) *(&p->arena[n * x->obj_size]))

'p' can't be right here. It must be 'x', yes? The result of the
expression is a char converted to void *. That can't be right either.
Presumably the '*' in front of the '(&' should not be there. (And I'd
add parentheses round all the macro arguments in the body.)

You're correct. The original function definitions have extra
convenience macros, and the macro variables are protected from
side-effects, etc. Of course it should be:

| #define arena_obj_addr(x, n) ((void *) *(&x->arena[n * x->obj_size]))
I don't think it's a common idiom at all and I've seen a lot of C over
the years. The last time I saw it was in Gregory Chaitin's tiny Lisp
interpreter that he's used in his books and courses. It's not exactly
idiomatic C, but then he has bigger fish to fry.

I take it you're referring to hiding the integer in the body of the
arena object for the free list via a slightly clever cast? One of the
strengths of C, although it is a weakness as well since the conversion
of pointer types introduces complexities that can easily trip you up.



Regards,

Uncle Steve
 
Uncle Steve

On Thu, May 19, 2011 at 03:22:18AM +0100, Ben Bacarisse wrote:
[snip]


Ok, here's a quick and dirty hack that measures a highly contrived
scenario. This code first allocates two arenas of different size, and
then ping-pongs the allocations between the two arenas for a defined
number of iterations. Then, I do more or less the same thing with
malloc.

The preliminary results show that my special-purpose allocator is 2-3
times faster than glibc malloc. Not quite an order-of-magnitude
difference in performance (unless you think in base-2), but very
acceptable. In some programs there may be a larger difference in
performance because of reduced memory fragmentation or reduced
overall memory use. I do not plan to isolate those factors at this
time to measure their effect.

Code is compiled with "gcc -O0 -o arena_test arena_test.c -lrt"

The test platform here is an Intel Atom netbook running at 1.333GHz,
1G RAM, OpenSuSE 11.4, libc-2.11.3, gcc 4.5.1 20101208.

Typical result:

[31/22:32] nb:pts/10 stevet ~/stuff/src/libs/tools/test: ./arena_test
Arena: Iterations: 100000; elapsed CPU time (usec): 39181
Malloc: Iterations: 100000; elapsed CPU time (usec): 107265


Code follows:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>


#define TEST_SIZE (1024 * 1024)
#define ITERATIONS 100000

#define DIFFERENCE 5


struct arena_head_s {
    int obj_size;
    int arena_size;
    unsigned char *arena;
    int free;
    int free_list;
};

typedef struct arena_head_s arena_head;


#define arena_obj_addr(x, n) ((void *) &(x)->arena[(n) * (x)->obj_size])


int arena_qa(arena_head *x)
{
    int n;

    n = x->free_list;

    if (n != -1) {
        x->free_list = *((int *) arena_obj_addr(x, n));
        x->free--;
    }

    return n;
}

void arena_free(arena_head *p, int n)
{
    *((int *) arena_obj_addr(p, n)) = p->free_list;
    p->free_list = n;
    p->free++;
}

arena_head *arena_creat(size_t s, int n)
{
    arena_head *h;
    int i;

    h = malloc(sizeof(arena_head));
    if (h == NULL) {
        perror("malloc()");
        exit(1);
    }

    h->obj_size = s;
    h->arena_size = n;
    h->free = n;

    h->arena = malloc(s * n);
    if (h->arena == NULL) {
        perror("malloc()");
        exit(1);
    }

    h->free_list = 0;

    for (i = 0; i < n - 1; i++)
        *((int *) arena_obj_addr(h, i)) = i + 1;

    /* the last slot, not one past the end, terminates the free list */
    *((int *) arena_obj_addr(h, n - 1)) = -1;

    return h;
}


typedef unsigned long long int T64;

#define processcputime(p) \
{ \
    T64 *t = p; \
    struct timespec T; \
\
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &T) == -1) { \
        perror("clock_gettime()"); \
        exit(1); \
    } \
\
    *t = ((T64) T.tv_sec) * 1000000LL + ((T64) T.tv_nsec / 1000); \
}


struct test_load_s {
    int ref;
    int data;
};

typedef struct test_load_s test_load;


void work(test_load *p)
{
    if (!p) {
        perror("null ptr");
        exit(1);
    }

    p->ref++;
    p->data = (int) ((size_t) p % 4) * rand();
}


int main(int argc, char **argv)
{
    arena_head *h1, *h2;
    T64 start, end;
    int i, n;
    test_load *p;

    /* arena set-up is deliberately left outside the timed region */
    h1 = arena_creat(sizeof(test_load), TEST_SIZE);
    h2 = arena_creat(sizeof(test_load) + DIFFERENCE, TEST_SIZE);

    if (!h1 || !h2) {
        perror("arenas");
        exit(1);
    }

    processcputime(&start);

    for (i = 0; i < ITERATIONS; i++) {
        n = arena_qa(h1);
        if (n == -1) {
            perror("Ooops.");
            exit(1);
        }
        work(arena_obj_addr(h1, n));

        n = arena_qa(h2);
        if (n == -1) {
            perror("Ooops 2.");
            exit(1);
        }
        work(arena_obj_addr(h2, n));
    }

    processcputime(&end);

    fprintf(stdout, "Arena: Iterations: %d; elapsed CPU time (usec): %llu\n",
            ITERATIONS, end - start);

    processcputime(&start);

    for (i = 0; i < ITERATIONS; i++) {
        p = malloc(sizeof(test_load));
        if (!p) {
            perror("malloc()");
            exit(1);
        }
        work(p);

        p = malloc(sizeof(test_load) + DIFFERENCE);
        if (!p) {
            perror("malloc2()");
            exit(1);
        }
        work(p);
    }

    processcputime(&end);

    fprintf(stdout, "Malloc: Iterations: %d; elapsed CPU time (usec): %llu\n",
            ITERATIONS, end - start);

    exit(0);
}




Regards,

Uncle Steve
 
Ben Bacarisse

Uncle Steve said:
You edited in some bugs too:

| #define arena_obj_addr(x, n) ((void *) *(&p->arena[n * x->obj_size]))

'p' can't be right here. It must be 'x', yes? The result of the
expression is a char converted to void *. That can't be right either.
Presumably the '*' in front of the '(&' should not be there. (And I'd
add parentheses round all the macro arguments in the body.)

You're correct. The original function definitions have extra
convenience macros, and the macro variables are protected from
side-effects, etc. Of course it should be:

| #define arena_obj_addr(x, n) ((void *) *(&x->arena[n * x->obj_size]))

x->arena is a char *. x->arena[...] is a char. &x->arena[...] is a
char * again. *(&x->arena[...]) is back to type char. You can't want
to convert that to a void * and return it. I think the middle * should
not be there.
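[Editorial aside: with the stray '*' removed and the arguments parenthesised as suggested above, the macro type-checks as intended. The struct here is a cut-down version of the one in the posted code, for illustration only.]

```c
#include <assert.h>

/* Cut-down arena header: enough to exercise the corrected macro. */
struct arena_head {
    int obj_size;
    unsigned char *arena;
};

/* &(x)->arena[...] is already a char * into the arena, so no extra
   dereference is needed before the cast to void *. */
#define arena_obj_addr(x, n) ((void *) &(x)->arena[(n) * (x)->obj_size])
```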

I take it you're refering to hiding the integer in the body of the
arena object for the free list via a slightly clever cast? One of the
strengths of C, although it is a weakness as well since the conversion
of pointer types introduces complexities that can easily trip you up.

I don't think that's what I meant. I was referring to the more dramatic
difference: your free and allocate functions take and return ints not
pointers.
 
Uncle Steve

Uncle Steve said:
You edited in some bugs too:

| #define arena_obj_addr(x, n) ((void *) *(&p->arena[n * x->obj_size]))

'p' can't be right here. It must be 'x', yes? The result of the
expression is a char converted to void *. That can't be right either.
Presumably the '*' in front of the '(&' should not be there. (And I'd
add parentheses round all the macro arguments in the body.)

You're correct. The original function definitions have extra
convenience macros, and the macro variables are protected from
side-effects, etc. Of course it should be:

| #define arena_obj_addr(x, n) ((void *) *(&x->arena[n * x->obj_size]))

x->arena is a char *. x->arena[...] is a char. &x->arena[...] is a
char * again. *(&x->arena[...]) is back to type char. You can't want
to convert that to a void * and return it. I think the middle * should
not be there.

Yeah, I must have mistyped something when I originally wrote it. In
your favor is a request for compilable code, which I have finally
supplied in another message. Maybe that would be a better basis on
which to make comments.
I don't think that's what I meant. I was referring to the more dramatic
difference: your free and allocate functions take and return ints not
pointers.

So? In context they are pointers.



Regards,

Uncle Steve
 
Ben Bacarisse

Uncle Steve said:
On Thu, May 19, 2011 at 03:22:18AM +0100, Ben Bacarisse wrote:
[snip]


Ok, here's a quick and dirty hack that measures a highly contrived
scenario. This code first allocates two arenas of different size, and
then ping-pongs the allocations between the two arenas for a defined
number of iterations. Then, I do more or less the same thing with
malloc.

The preliminary results show that my special-purpose allocator is 2-3
times faster than glibc malloc. Not quite an order-of-magnitude
difference in performance (unless you think in base-2), but very
acceptable. In some programs there may be a larger difference in
performance because of reduced memory fragmentation or reduced
overall memory use. I do not plan to isolate those factors at this
time to measure their effect.

You don't include the set-up time for the arenas. That's fair if the
pattern you expect to see is a very large number of allocations and a
lot of re-use. If, however, new arenas are needed in a long running
program (because not all the memory is re-used) you'd have to include
this cost.

It makes a huge difference with the numbers you've picked: it halves the
benefit on my system (3x vs 1.5x speed). I can reduce the effect of
course just by increasing the number of timed iterations. For very
small objects, with lots of optimisation and excluding the start-up
costs I can get just over your order of magnitude speedup. That's a
rather contrived scenario though. In particular, it is likely that the
allocation code will not be able to be inlined (as I think it can be
here).

Thank you for providing something that can give real data, albeit for
rather specific situations. FWIW I am using an Intel(R) Core(TM)2
Duo CPU P8700 @ 2.53GHz.

If I have time (looks unlikely) I'll plug it into a small binary tree
program to see what happens when there is a little work being done too.

<snip code>
 
James Kuyper

That's a little bit presumptuous. Are you suddenly in charge of my
development schedule? I must have missed the memo.

You're right - I should have expressed that differently: if you have
enough spare time to write your allocator, you're going to receive a
serious lack of sympathy when you complain about not having enough time
to bother testing it. You are, of course, free to ignore the ridicule
you deserve for prioritizing your time that way.

....
development in a very limited amount of free time. You seem to be
saying that as I am talking about the algorithm in a public newsgroup,
that I should have also proved it in the lab and in the field, not to
mention publishing all the testing data and methodologies.

What I said has nothing to do with the fact that this is a public
newsgroup. Even if you're just doing it for yourself, it's a serious
waste of your time (which you are, of course, completely free to decide
to waste) to go to the trouble of building a custom allocator without
bothering to test whether you gained any benefit from doing so.

....
Are we 'reviewing' code in this newsgroup? I thought we were
discussing it.

I don't see a distinction. I don't see how you could discuss code in any
meaningful sense without first reviewing it - otherwise, how would you
know what it was that you were discussing? Perhaps you thought I was
using the word "review" strictly in some more formal sense?
I don't really know how common it is, so you're already ahead of me on
that score.

C is notorious for the ubiquity of pointers in contexts where other
languages would use integer indices.
 
