Simple Memory Allocator Challenge

Michael B Allen · Oct 22, 2003

Hi,

I've tried to write the *simplest* memory allocator possible. I think it
would be useful in many cases such as allocating memory on stack as a
poor man's garbage collection perhaps.

I was hoping the clc crowd had some ideas for making it even simpler!

In-lined below the full program (132 lines). It's a circular singular
linked list of "cells" inspired by the one in Plauger's text
"The Standard C Library".

I look forward to hearing your ideas,

Mike

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

#define C2P(c) ((char *)(c) + 4)
#define P2C(p) (struct cell *)((char *)(p) - 4)
#define ISADJ(c1,c2) ((struct cell *)(C2P(c1) + (c1)->size) == (struct cell *)(c2))

struct cell {
size_t size;
struct cell *next;
};
struct simple {
struct cell *tail;
};

void
simple_init(struct simple *s, void *mem, size_t size)
{
s->tail = mem;
s->tail->size = size - 4;
s->tail->next = s->tail;
}
void *
simple_alloc(struct simple *s, size_t size, int zero)
{
struct cell *c1, *c2, *c3;
zero = 0;

if (size < 4) {
size = 4; /* must accomodate next pointer */
}

c1 = s->tail;
while (c1->next->size < size) {
if (c1->next == s->tail) {
errno = ENOMEM;
return NULL;
}
c1 = c1->next;
}
c2 = c1->next;
if (c2->size > (4 + size)) { /* split new cell */
c3 = (struct cell *)(C2P(c2) + size);
c3->size = c2->size - (size + 4);
c3->next = c2->next;
c2->size = size;
c1->next = c3;
} else { /* use the entire cell */
c1->next = c2->next;
if (c2 == s->tail) {
s->tail = c1;
}
}

return C2P(c2);
}
int
simple_free(struct simple *s, void *ptr)
{
struct cell *c1, *c2, *c3;
int j1, j2;

/* splice the cell back into the list */
c1 = s->tail;
c2 = P2C(ptr);

if (c2 > c1) { /* append to end of list */
if (ISADJ(c1,c2)) { /* join with last cell */
c1->size += 4 + c2->size;
return 0;
}
c2->next = c1->next;
c1->next = c2;
s->tail = c2;
return 0;
}

while (c1->next < c2) { /* find insertion point */
c1 = c1->next;
}
c3 = c1->next;

j1 = ISADJ(c1,c2); /* c1 and c2 need to be joined */
j2 = ISADJ(c2,c3); /* c2 and c3 need to be joined */

if (j1) {
if (j2) { /* splice all three cells together */
c1->next = c3->next;
c1->size += 4 + c3->size;
}
c1->size += 4 + c2->size;
} else {
c1->next = c2;
if (j2) {
c2->next = c3->next;
c2->size += 4 + c3->size;
} else {
c2->next = c3;
}
}

return 0;
}

int
main(void)
{
char mem[0xFFFFF];
struct simple s;
void *ptr;
int i;
size_t size;

errno = 0;

simple_init(&s, mem, 0xFFFFF);

for (i = 0; i < 10000; i++) {
if (i % 2) {
simple_free(&s, ptr);
}
size = (3 << (i & 3)) + (i & 0xA8); /* generate quantized sizes */
printf("size=%d\n", size);
if ((ptr = simple_alloc(&s, size, 0)) == NULL) {
perror("simple_alloc");
break;
}
}

return EXIT_SUCCESS;
}

Eric Sosman · Oct 22, 2003

Michael said:
Hi,

I've tried to write the *simplest* memory allocator possible. I think it
would be useful in many cases such as allocating memory on stack as a
poor man's garbage collection perhaps.

I was hoping the clc crowd had some ideas for making it even simpler!

In-lined below the full program (132 lines). It's a circular singular
linked list of "cells" inspired by the one in Plauger's text
"The Standard C Library".

I look forward to hearing your ideas,
[code snipped]

Idea #1: Get rid of the magic number `4' lest you come
to grief on a system whose pointers are not four bytes wide.

Idea #2: Think about data alignment, lest you come to
grief on systems that need it.

Idea #3: Reconsider that `ENOMEM' error code, lest you
come to grief on systems that don't define it.

Idea #4: Re-examine the use of the argument `zero' in
the simple_alloc() function, lest people speculate about
what you've been smoking and demand a toke or two.

Idea #5: Speculate on possible uses for the value
returned by simple_free().

I might come up with more Ideas, but I'm already weary
of this code -- and I haven't even *begun* to examine it
for correctness!

Michael B Allen · Oct 23, 2003

Idea #1: Get rid of the magic number `4' lest you come
to grief on a system whose pointers are not four bytes wide.

Idea #2: Think about data alignment, lest you come to
grief on systems that need it.

1 and 2 done. I meant to fix this after I completed the algorithm. I just
forgot.

Idea #3: Reconsider that `ENOMEM' error code, lest you
come to grief on systems that don't define it.

Not sure what to do about this.

Idea #4: Re-examine the use of the argument `zero' in
the simple_alloc() function, lest people speculate about what you've
been smoking and demand a toke or two.

Idea #5: Speculate on possible uses for the value
returned by simple_free().

4 and 5 done (removed). I started out working on a generic "allocator"
interface and got side-tracked. However, if by chance you realized the
zero parameter was meant to indicate that the memory should be set to
zero (like calloc) and still believe such an interface would be offesive,
then please explain why.

Update in-lined below. Interesting it is indeed simpler as it is now only
131 lines! Acutually it's 103 if you don't count the driver program.

Mike

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

#define ALIGNMASK 1U
#define ALIGN(s) (((s) + ALIGNMASK) & ~ALIGNMASK)
#define POFF ALIGN(sizeof(size_t))
#define MINSIZ ALIGN(sizeof(struct cell *))
#define C2P(c) ((char *)(c) + POFF)
#define P2C(p) ((struct cell *)((char *)(p) - POFF))
#define ISADJ(c1,c2) ((struct cell *)(C2P(c1) + (c1)->size) == (struct cell *)(c2))

struct cell {
size_t size;
struct cell *next;
};
struct simple {
struct cell *tail;
};

void
simple_init(struct simple *s, void *mem, size_t size)
{
s->tail = mem;
s->tail->size = size - POFF;
s->tail->next = s->tail;
}
void *
simple_alloc(struct simple *s, size_t size)
{
struct cell *c1, *c2, *c3;

size = size < MINSIZ ? MINSIZ : ALIGN(size);

c1 = s->tail;
while (c1->next->size < size) {
if (c1->next == s->tail) {
errno = ENOMEM;
return NULL;
}
c1 = c1->next;
}
c2 = c1->next;
if (c2->size > (POFF + size)) { /* split new cell */
c3 = (struct cell *)(C2P(c2) + size);
c3->size = c2->size - (size + POFF);
c3->next = c2->next;
c2->size = size;
c1->next = c3;
} else { /* use the entire cell */
c1->next = c2->next;
if (c2 == s->tail) {
s->tail = c1;
}
}

return C2P(c2);
}
void
simple_free(struct simple *s, void *ptr)
{
struct cell *c1, *c2, *c3;
int j1, j2;

/* splice the cell back into the list */
c1 = s->tail;
c2 = P2C(ptr);

if (c2 > c1) { /* append to end of list */
if (ISADJ(c1,c2)) { /* join with last cell */
c1->size += POFF + c2->size;
return;
}
c2->next = c1->next;
c1->next = c2;
s->tail = c2;
return;
}

while (c1->next < c2) { /* find insertion point */
c1 = c1->next;
}
c3 = c1->next;

j1 = ISADJ(c1,c2); /* c1 and c2 need to be joined */
j2 = ISADJ(c2,c3); /* c2 and c3 need to be joined */

if (j1) {
if (j2) { /* splice all three cells together */
c1->next = c3->next;
c1->size += POFF + c3->size;
}
c1->size += POFF + c2->size;
} else {
c1->next = c2;
if (j2) {
c2->next = c3->next;
c2->size += POFF + c3->size;
} else {
c2->next = c3;
}
}
}

int
main(void)
{
char mem[0xFFFFF];
struct simple s;
void *ptr;
int i;
size_t size;

errno = 0;

simple_init(&s, mem, 0xFFFFF);

for (i = 0; i < 10000; i++) {
if (i % 2) {
simple_free(&s, ptr);
}
size = (3 << (i & 3)) + (i & 0xA8); /* generate quantized sizes */
printf("size=%d\n", size);
if ((ptr = simple_alloc(&s, size)) == NULL) {
perror("simple_alloc");
break;
}
}

return EXIT_SUCCESS;
}

Eric Sosman · Oct 23, 2003

Michael said:
]

Idea #3: Reconsider that `ENOMEM' error code, lest you
come to grief on systems that don't define it.

Click to expand...

Not sure what to do about this.

One possibility would be to test whether the C implementation
provides an `ENOMEM' error code:

#ifdef ENOMEM
errno = ENOMEM;
#endif

Of course, even there you're guessing about what the code
actually signifies. strerror(ENOMEM) might conceivably
return "Error: meditational syllable surrounded by spaces"
or some such gibberish. Still, it seems pretty likely that
an error code named `ENOMEM' has something to do with a
shortage of memory.

Another possibility would be to forego setting `errno'
altogether. The ordinary malloc() and friends are not
required to do so, and sometimes don't: On at least one
implementation I use, strerror(errno) after a failed malloc()
returns "no error". Is it necessary for "the *simplest*
memory allocator possible" to do more?

4 and 5 done (removed). I started out working on a generic "allocator"
interface and got side-tracked. However, if by chance you realized the
zero parameter was meant to indicate that the memory should be set to
zero (like calloc) and still believe such an interface would be offesive,
then please explain why.

That was my guess as to the intent of the `zero' argument,
but in fact the code made no effective use of it at all.

A zeroing allocator doesn't seem offensive to me, but neither
does it seem particularly useful. Yes, I do occasionally find a
reason to use calloc() -- but only occasionally, and I would
shed no tears if it suddenly vanished. A simple memset() after
an ordinary malloc() would do just as well, even if it might be
very slightly less efficient.

Michael B Allen · Oct 24, 2003

That was my guess as to the intent of the `zero' argument,
but in fact the code made no effective use of it at all.

A zeroing allocator doesn't seem offensive to me, but neither
does it seem particularly useful. Yes, I do occasionally find a reason
to use calloc() -- but only occasionally, and I would shed no tears if
it suddenly vanished. A simple memset() after an ordinary malloc()
would do just as well, even if it might be very slightly less efficient.

Actually I tested the difference between calloc and malloc w/ memset on
Linux w/ glibc 2.2 and found that an increase in size had little or no
effect on it's performance whereas malloc w/ memset took consideribly
longer. With a large enough allocation (I beleive it was 100MB) calloc
was virtually instantanious when malloc w/ memset locked up my machine for 10
minutes before hitting the power button. Obviously there's some VM tricks
being used there.

Mike

Christian Bau · Oct 24, 2003

Michael B Allen said:
Actually I tested the difference between calloc and malloc w/ memset on
Linux w/ glibc 2.2 and found that an increase in size had little or no
effect on it's performance whereas malloc w/ memset took consideribly
longer. With a large enough allocation (I beleive it was 100MB) calloc
was virtually instantanious when malloc w/ memset locked up my machine for 10
minutes before hitting the power button. Obviously there's some VM tricks
being used there.

linux seems to have this horrible property that it lets you allocate
memory that turns out to be unusable. Seems to have happened here,
although if 100 MB is a problem then this is most likely a relatively
old machine.

If you call free (calloc (...)) then this probably happens quite
quickly, but is pointless anyway. Try calloc'ing 100 MB, then _use_ that
memory and see how long it takes.

Michael B Allen · Oct 24, 2003

linux seems to have this horrible property that it lets you allocate
memory that turns out to be unusable. Seems to have happened here,
although if 100 MB is a problem then this is most likely a relatively
old machine.

Actually I think I was mistaken. When I locked up my machine it was
because I specified a rediculously huge value by accedent.

Although Linux does stuggle a little when stressed for memory. 2.2 would
thrash. 2.4 has an elevator problem. It will be interesting to see how
2.6 performs *when it reaches the stability of 2.2/2.4*.

If you call free (calloc (...)) then this probably happens quite
quickly, but is pointless anyway. Try calloc'ing 100 MB, then _use_ that
memory and see how long it takes.

Ok, let's do this a little better. On my 1.8 GHz Pentium IV mobile
w/ 256MB RAM the below program generates 250000 for malloc w/ memset
and 230000 for calloc. So I guess calloc isn't that useful.

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <time.h>

/* 100MB */
#define N 10000

int
main(void)
{
char *buf;
int i;
clock_t t0 = clock();

for (i = 0; i < N; i++) {

buf = malloc(N);
memset(buf, 0, N);

if ((i % (N / 100)) == 0) {
buf = 'x';
}
}

printf("t=%ld\n", clock() - t0);

return EXIT_SUCCESS;
}
int
main0(void)
{
char *buf;
int i;
clock_t t0 = clock();

for (i = 0; i < N; i++) {

buf = calloc(1, N);

if ((i % (N / 100)) == 0) {
buf = 'x';
}
}

printf("t=%ld\n", clock() - t0);

return EXIT_SUCCESS;
}

Daniel Haude · Oct 24, 2003

On Fri, 24 Oct 2003 08:27:38 +0100,

in Msg. said:
linux seems to have this horrible property that it lets you allocate
memory that turns out to be unusable.

It has been pointed out in this group before that this is a property most
implementations share, and for good reasons too. I don't want to embark on
a memory management discussion here because it's off-topic and I know
virtually nothing about the mechanics of MM, I just wanted to mention that
this is not a Linux-specific thing. I also forgot what the good reasons
were.

--Daniel

xarax · Oct 24, 2003

/snip/

calloc() has an advantage over malloc()+memset(). On better-designed
implementations, the operating system doesn't actually assign
backing-store to virtual memory until the first reference. The
granularity is usually the native page size. For example, assume a
page size of 4096 bytes. If you allocate 100MB using calloc(), that's
25600 pages. calloc() and malloc() will return the pointer. The
performance penalty happens when memset() is used to touch all 25600
pages to set them to zero. In the calloc() case, explicit zeroing
is not needed, because the operating system will set each page to
zeroes on first reference. You can save time by not calling memset(),
which would touch each page, causing a page fault and subsequent
OS activity to set-up paging control blocks.

Some implementations may specify these semantics for malloc(), but
that makes the program non-portable. Relying on malloc() to yield
"first reference" page-aligned memory is a bad idea. Only calloc()
can guarantee that every byte will be zero by the time your program
needs to reference the byte.

If you run a benchmark with calloc()+memset() versus malloc()+memset(),
I would expect the timings to be very nearly identical with most of the
time being spent in memset(), and calloc() versus malloc() identical.
This, of course, is for huge page-aligned allocations of new virtual
memory (not memory recycled through free()).

Some implementations of free() may also take advantage of native
operating system API for resetting virtual memory to its "first
reference" state, so that calloc() knows that the recycled memory
that it is allocating will already be zeroes.

Bottom line: Use calloc() for zeroed memory, rather than malloc()+memset().
Let calloc() decide how best to zero the memory. It will likely do a
better job than you can, and you have more important problems to solve.

--
----------------------------------------------
Jeffrey D. Smith
Farsight Systems Corporation
24 BURLINGTON DRIVE
LONGMONT, CO 80501-6906
303-774-9381
http://www.farsight-systems.com
z/Debug debugs your Systems/C programs running on IBM z/OS!

Eric Sosman · Oct 24, 2003

xarax said:
calloc() has an advantage over malloc()+memset(). On better-designed
implementations, the operating system doesn't actually assign
backing-store to virtual memory until the first reference. [...]
[Remainder of the usual argument snipped.]

First, it is debatable whether such an implementation
is conforming (c.f. Question 7.14 in the FAQ). There's no
point in re-opening the debate for the umpteenth time, but
let's just note that reasonable people differ about calling
such implementations "better-designed."

Second, the "automatically zeroed by the O/S" advantage
obtains only on the first time the memory is allocated. If
it is then freed and re-calloc()ed the advantage vanishes.

Third, the example [snipped for brevity; sorry] of a
100MB calloc()ation is valid. But who needs a hundred meg
of zeroes? Remember, those zeroes are not necessarily valid
as any data type at all other than `unsigned char'. I'd
suggest that if your program needs 100MB of pre-zeroed
`unsigned char', the design is probably improvable.

Finally, the strategy of postponing the actual allocation
until the first reference doesn't so much eliminate cost as
shift it. If you eventually reference the memory you eventually
incur the cost of getting a page from somewhere, assigning or
reserving backing store, and so on. If you're not going to use
the memory, calloc() may well be faster than malloc() plus
memset(). But if you're not going to use the memory, the code
snippet

;

is faster than either.

James Antill · Oct 24, 2003

linux seems to have this horrible property that it lets you allocate
memory that turns out to be unusable. Seems to have happened here,

s/horrible property/horrible default property, IMO,/

If you want to turn memory overcommit off, that's fine. Just do...

echo 3 > /proc/sys/vm/overcommit_memory

....or use sysctl.conf etc.

If you call free (calloc (...)) then this probably happens quite
quickly, but is pointless anyway. Try calloc'ing 100 MB, then _use_ that
memory and see how long it takes.

Although even with overcommit checks, things are not paged into the
application until needed (it's just guaranteed that those pages can be
paged in).

xarax · Oct 24, 2003

Eric Sosman said:
xarax said:

calloc() has an advantage over malloc()+memset(). On better-designed
implementations, the operating system doesn't actually assign
backing-store to virtual memory until the first reference. [...]
[Remainder of the usual argument snipped.]

Click to expand...

First, it is debatable whether such an implementation
is conforming (c.f. Question 7.14 in the FAQ). There's no
point in re-opening the debate for the umpteenth time, but
let's just note that reasonable people differ about calling
such implementations "better-designed."

I was describing an implementation detail, rather than
anything that could be considered conforming. My later
remarks were leaning towards "do it the right way rather
than what you're implementation prefers" for portability.
If you need zeroed memory, then calloc() is always better
(or at least it's never worse) than malloc()+memset().

Second, the "automatically zeroed by the O/S" advantage
obtains only on the first time the memory is allocated. If
it is then freed and re-calloc()ed the advantage vanishes.

I also included remarks about free() that can recognize
when it can ask the OS to mark the memory "first reference"
again, thus discarding its contents from backing storage and
page I/O slots. A subsequent calloc() or malloc() that reuses
such storage will again see "first reference" behavior.

Third, the example [snipped for brevity; sorry] of a
100MB calloc()ation is valid. But who needs a hundred meg
of zeroes? Remember, those zeroes are not necessarily valid
as any data type at all other than `unsigned char'. I'd
suggest that if your program needs 100MB of pre-zeroed
`unsigned char', the design is probably improvable.

I only mention 100MB, because it was mentioned elsewhere. Such
allocations can and do happen in large scale applications (like
data base stuff). The mainframe world can see gigabyte allocations
quite often, so understanding the gotcha's of memory management
are crucial.

Finally, the strategy of postponing the actual allocation
until the first reference doesn't so much eliminate cost as
shift it. If you eventually reference the memory you eventually
incur the cost of getting a page from somewhere, assigning or
reserving backing store, and so on.

In rare circumstances, like when you know for sure that all of
the huge allocation will be touched very soon, there's probably
not much difference. However, smart operating systems can
recognize reference patterns and reasonably predict future
references and "pre-fault" the anticipated pages. For general
rule of thumb, it's usually a good idea to not peek/poke memory
until it's necessary.

If you're not going to use
the memory, calloc() may well be faster than malloc() plus
memset(). But if you're not going to use the memory, the code
snippet

;

is faster than either.

LOL.

It's more commonly a case of not knowing how much of it you
need, so you swag a gob of memory to reduce fragmentation
and off-load the burden to the OS. I am not saying this is
good design, because each case is unique. Sometimes, simple
incremental allocations with linking are better. Other times,
a swag for a huge contiguous chunk may be better. There are
times when you cannot predict exactly how much memory you
need, but can calculate a reasonable guess.

My bottom line still holds. If you happen to have a good
implementation of calloc()+free(), then you are just that
much better off compared to having a naive implementation.

--
----------------------------------------------
Jeffrey D. Smith
Farsight Systems Corporation
24 BURLINGTON DRIVE
LONGMONT, CO 80501-6906
http://www.farsight-systems.com
z/Debug debugs your Systems/C programs running on IBM z/OS!

Eric Sosman · Oct 24, 2003

xarax said:
Eric Sosman said:

Third, the example [snipped for brevity; sorry] of a
100MB calloc()ation is valid. But who needs a hundred meg
of zeroes? Remember, those zeroes are not necessarily valid
as any data type at all other than `unsigned char'. I'd
suggest that if your program needs 100MB of pre-zeroed
`unsigned char', the design is probably improvable.

Click to expand...

I only mention 100MB, because it was mentioned elsewhere. Such
allocations can and do happen in large scale applications (like
data base stuff). The mainframe world can see gigabyte allocations
quite often, so understanding the gotcha's of memory management
are crucial.

I do not deny the need for large allocations, but the
need for large zero-filled allocations. A hash table is
the only example that comes to (my) mind where zero-filling
is potentially useful, but even there the benefit is dubious:

- As soon as you start populating the table you'll be
touching most of the slots, and in an essentially
"random" pattern (assuming your hash function is a
good one). Most of the page-faulting saved by a
zero-on-reference implementation is going to occur
almost immediately afterwards anyhow.

- It's still a non-portable technique, because the hash
table elements are probably something other than plain
`unsigned char'. (Of course, some platform-dependency
may be acceptable; after all, 100MB allocations are
already ntoo big to be portable!)

In any event, the OP's "*simplest* memory allocator possible"
is unable to take advantage of any such trickery. In fact, in
his intended use the "arena" from which the memory is taken will
be an `auto' array whose prior content is unreliable! So whatever
our opinions on the merits of calloc() -- and I guess we'll have
to agree to disagree -- there seems to be not much reason other
than "work-alike-ness" for him to offer a zeroing allocator.

Michael B Allen · Oct 25, 2003

On Thu, 23 Oct 2003 04:23:17 -0400, Michael B Allen wrote:
<snip all code>

Natrually there were a few bugs in this code. Feel free to contact me if
you acutally care to see the update. Or google libmba which is where it
will ultimately reside.

Mike

Problem with static data members in templated class	11	Oct 4, 2010
Endian Independence	25	Jul 27, 2008
simple region allocator...	2	Nov 24, 2008
stream_cast fails with case insensitive string	0	May 6, 2011
help with small example of file operations	9	Jun 1, 2011
can't stream cast from a case insensitive string	0	Oct 9, 2011
Bubble Sort in C	48	Sep 14, 2009
Adding adressing of IPv6 to program	1	Feb 16, 2023

Simple Memory Allocator Challenge

Michael B Allen

Eric Sosman

Michael B Allen

Eric Sosman

Michael B Allen

Christian Bau

Michael B Allen

Daniel Haude

xarax

Eric Sosman

James Antill

xarax

Eric Sosman

Michael B Allen

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads