A debugging implementation of malloc


Ian Collins

Keith said:
Really? long double has been standard since ANSI C89; any C compiler
that doesn't provide it is non-conforming. (It can have the same
characteristics as double, so it's just a matter of recognizing the
syntax.)
OK, scratch "didn't have" (I didn't check) and emphasise that we didn't
use them; neither target had an FPU, so FP operations were strongly
discouraged.
 

Ian Collins

Malcolm said:
And so gibberish types multiply.
Why not just go back to malloc() taking an int?

It wouldn't be ideal on a 64-bit system; long would be a better bet. But
there still isn't a guarantee it would be the same size as size_t.
 

Keith Thompson

jacob navia said:
Thanks for the answers.
1) True, I do not consider alignment problems, and that could
be a problem in machines where integers are aligned at some
power of 2 boundary. A work-around would be to just
memcpy (or equivalent) and memcmp (or an equivalent loop)
instead of accessing the signature at the end of the
block as an integer.

If I remember and understand the code correctly, you allocate two
extra ints at the start of the block, and one at the end, and return
(malloc() + 2 * sizeof(int)). (That's not a valid expression, but you
get the idea.)

If the two ints cause the rest of the block to be misaligned, using
memcpy() to access the int you put at the end of the block will help
only for that int; the user's data will still be potentially
misaligned (depending on what's stored in it).

[...]
3) A problem arises when size_t is much bigger than
sizeof(int). On 64-bit Windows, for instance, this
code would truncate a size_t when the user attempts
to allocate more than 2GB of memory in a single block.

So don't truncate it. Store it as a size_t.
A problem then arises here. What should this routine do with
an allocation request of that size? It could be a legal
request, but it could also be the result of a negative
number being passed to malloc...
Microsoft proposed in TR 24731 a new type, rsize_t, that
would be the maximum value of a legal request for an
object; limited to a signed int, that's 2GB on 32-bit
systems. I think the solution here would be along those lines.

Perhaps, but since rsize_t isn't in the language, there's no portable
way to define it or determine its upper bound.

My advice:

Store the size as a size_t. Don't even think about truncating it. If
the code is intended to be portable, you can't assume that
malloc((size_t)INT_MAX + 1)
or
malloc((size_t)INT_MAX * 2)
is invalid.

Define a constant for the maximum alignment in bytes, and use it
when you decide how much to allocate. For example:
#define MAX_ALIGN (sizeof(long long))

The offset of the user's data in the malloc()ed block would then be
max(MAX_ALIGN, 2 * sizeof(size_t)).

If you want to check for accidental negative values, define a constant
and do an explicit check in your code; treat any attempt to allocate
more than that as an error. For example:
#define MAX_ALLOC (SIZE_MAX / 2)

For systems where some type requires stricter alignment than long
long, or where the maximum sensible allocation is something other than
SIZE_MAX / 2, the constants are easily configured by editing a line or
two in the source (or using whatever configuration system you might
use).
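
As a rough sketch of this advice (dbg_malloc and USER_OFFSET are invented
names; SIZE_MAX requires C99, so a C90 build would use ((size_t)-1)):

#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define MAX_ALIGN (sizeof(long long))
#define MAX_ALLOC (SIZE_MAX / 2)

/* Offset of the user's data: max(MAX_ALIGN, 2 * sizeof(size_t)). */
#define USER_OFFSET \
    (MAX_ALIGN > 2 * sizeof(size_t) ? MAX_ALIGN : 2 * sizeof(size_t))

void *dbg_malloc(size_t size)
{
    unsigned char *raw;

    if (size > MAX_ALLOC)            /* probably a negative value */
        return NULL;

    raw = malloc(USER_OFFSET + size);
    if (raw == NULL)
        return NULL;

    memcpy(raw, &size, sizeof size); /* store the size, untruncated */
    return raw + USER_OFFSET;
}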
 

Skarmander

Barry Schwarz wrote:
It is legal for the user to malloc 0 bytes. I could not find a
description of what memset does if the third argument is 0. Let's
hope it checks first.
What do you mean? Why would it have to check anything? It's just as legal
for the user to memset() 0 bytes.

My copy of the standard says "void *memset(void *s, int c, size_t n); [..]
The memset function copies the value of c (converted to an unsigned char)
into each of the first n characters of the object pointed to by s." This
statement is well-defined if n == 0; memset() must do nothing.

Sometimes edge cases are special; it's a good idea if they're not.

S.
 

Eric Sosman

Ben said:
This is a good idea. If you know a little about the target
platform, you can even choose particularly good garbage. In
"debug" allocators I have designed, I have chosen 0xcc, because
on x86 that is a "debug trap" instruction when executed and it's
unlikely to be a valid pointer value for user programs. (I think
I originally got this idea from a book.)

Fill patterns that resemble small odd negative values make
good garbage:

- negative, because negative values will be "obviously
wrong" in many contexts

- small, because if interpreted as unsigned values on a
two's complement machine the values will be "huge," and
stand a good chance of being detected as "ridiculous"

- odd, because if interpreted as a pointer there's a
decent chance of provoking a data-alignment trap

0x9F was used in one commercial product I worked on, and
0xBADD was seen a lot in the 16-bit days. 0xDEADBEEF is a
popular favorite. I'm personally rather fond of 0xDECEA5ED,
but that's mostly for its amusement value.
 

Keith Thompson

Barry Schwarz said:
This doesn't solve the problem that you are returning an address
*only* two ints beyond the allocated address.

Two possibilities come to mind:

On the very first call to your function, allocate a small
block (sizeof(long long)+sizeof(long double)). Convert the address to
unsigned long. Determine N, the number of low order zeros in the
result. Instead of using sizeof(int) as your delta, use pow(2,N).
Free the memory and get on with what the user requested.

I've worked on systems where the number of low-order zeros in the
representation of an address did *not* correspond directly to the
alignment in bytes of the address. (Cray vector systems, where
hardware addresses denote 64-bit words, and byte pointers
(CHAR_BIT==8) are implemented in software by storing an offset in the
high-order 3 bits of the pointer.)
On the very first call, compute max(sizeof(short),
sizeof(int), sizeof(size_t), sizeof(ptrdiff_t), sizeof(long long),
sizeof(long double), ...), betting on the assumption that the longest
object imposes the strictest alignment.

You can also create a union of all those types, then wrap it in a
struct:

struct foo {
    char c;
    union everything u;
};

If you've included enough stuff in "union everything", then
offsetof(struct foo, u) will probably be the strictest alignment,
suitable for use by malloc(). If long long is 8 bytes and requires
only 4-byte alignment, this method will give you 4 rather than 8,
which could save some space.
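
A sketch of that probe; the member list of "union everything" is
illustrative, and GUESSED_ALIGN is an invented name:

#include <stddef.h>

union everything {
    short s;
    int i;
    long l;
    long long ll;
    size_t sz;
    ptrdiff_t pd;
    void *vp;
    void (*fp)(void);
    float f;
    double d;
    long double ld;
};

struct foo {
    char c;
    union everything u;
};

/* Probably the strictest alignment in use on the implementation. */
#define GUESSED_ALIGN (offsetof(struct foo, u))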
 

Keith Thompson

Malcolm said:
And so gibberish types multiply.
Why not just go back to malloc() taking an int?

Because it won't always work. There are good reasons for int to be 32
bits even on 64-bit systems. (with 8-bit char, making int 64 bits
would leave a hole in the type system; short would be either 16 or 32
bits). But there's no reason to forbid malloc()ing more than INT_MAX
bytes.
 

websnarf

Eric said:
jacob said:
In the C tutorial for lcc-win32, I have a small chapter
about a debugging implementation of malloc.
[...]

Others have commented on the alignment issues, and on
the wisdom of initializing newly-allocated memory to a
"useful" value like all zeroes. I've got a few further
suggestions:

When an allocation is released, consider filling it
with garbage.

MSVC's debug-mode heap fills free entries with a predictable pattern
that is checked when the entry is reused for a subsequent allocation.
If you are willing to concede this much of a performance hit, then I
would endorse MSVC's strategy, as it allows you to detect double-frees
and other heap corruption in general.
[...] Consider keeping the "metadata" -- addresses, sizes,
and so on -- separate from the allocated data. The idea is to
make the metadata less vulnerable to common off-by-one errors.

Right, but then you are also less likely to detect some kinds of
errors. For example:

ptr1 = malloc(100);
ptr2 = malloc(100);
memcpy (ptr-16, ptr-16, 132);

The memcpy is erroneous, but if you are using a constant signature,
then there will appear to be no detectable error. Mixing the meta-data
in with the real data payloads makes it more "fragile", but this
fragility is precisely what you want in order to detect errors
more exhaustively.
(I've used a skip-list with the block address as a key for
this purpose; that's not portable because C doesn't define
comparison between pointers to "unrelated" objects, but it
works for mainstream C implementations.) Maintaining the
metadata separately involves more overhead, but this is, after
all, a debugging package.

In implementing my own debugging heaps, I've come to the conclusion
that you can do very good corruption detection with very little
performance hit. This is an important point, since it makes the option
of shipping the release version with the debug heap enabled
realistically available to you.
Consider adding a sanity-checker that visits all the
allocated areas and checks their signatures for signs of
damage. Making the checker callable by the user can help in
"wolf fence" debugging. If sufficiently desperate, sanity-
check at every transaction.

More generally, you should just make a heap-walker, like WATCOM C/C++'s
Clib includes. This would allow you to walk all allocated heap
entries, and sanity check them one by one. The programmer could then
instrument their own code with such checks as necessary.

My own debug heap includes a full heap implementation as well, so I
track and walk the free entries too.
I found it helpful to keep a sequence number in the
metadata for each block, and to let the user program query
the current value. This is useful when tracking down leaks:

open files, initialize libraries, ...
epoch = getCurrentSeqno();
do something that should be memory-neutral ...
listSurvivingAllocationsSince(epoch);

... where the final call just traverses all the allocations
that haven't been freed, reporting any whose sequence numbers
are greater than `epoch'.

This sounds interesting. But I would also add __FILE__ and __LINE__.
In fact, in my debug heaps, I typically have:

void *dbgMalloc_internal(size_t size, const char *setto__FILE__,
                         int setto__LINE__);
#define dbgMalloc(s) dbgMalloc_internal((s), __FILE__, __LINE__)
/* ... etc. ... */

If you need function pointers, you can just make wrapper functions.
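
For example, a sketch, with dbgMalloc_fn as an invented name and the
internal function's signature taken from the declaration above:

#include <stddef.h>

void *dbgMalloc_internal(size_t size, const char *file, int line);

/* A real function whose address can be taken and passed around;
   the file/line information just becomes less precise. */
void *dbgMalloc_fn(size_t size)
{
    return dbgMalloc_internal(size, "(via function pointer)", 0);
}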

There are two other very useful ideas to put into a debug heap manager.
As you malloc memory, convert the pointer to an intptr_t, and keep
track of the minimum and maximum values that are generated by the
allocator. These values usually correspond to a fairly narrow range of
integer addresses. Then the debug free function would first check that
the (intptr_t) cast value of its parameter was within the range of all
possible pointers that were returned by the debug malloc, calloc and
realloc functions. This allows you to (with very high probability)
detect attempts to free memory that didn't come from the heap in the
first place.
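
A sketch of that range check; every name here is invented, and intptr_t
comparisons, while implementation-defined, behave on typical
flat-address-space machines:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static intptr_t lo_ptr = INTPTR_MAX;  /* running min of returned pointers */
static intptr_t hi_ptr = INTPTR_MIN;  /* running max of returned pointers */

/* Called by the debug malloc/calloc/realloc on every pointer returned. */
static void track_pointer(void *p)
{
    intptr_t v = (intptr_t)p;
    if (v < lo_ptr) lo_ptr = v;
    if (v > hi_ptr) hi_ptr = v;
}

/* Called by the debug free before anything else. */
static void check_pointer(void *p)
{
    intptr_t v = (intptr_t)p;
    if (v < lo_ptr || v > hi_ptr) {
        fprintf(stderr, "free(%p): not from this allocator?\n", p);
        abort();
    }
}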

The second is how I do the signatures. Using a random constant (like
0x1234567) on the head and foot works fine, but won't help you detect
an over-large memcpy such as I've shown in my first example above. So
for the signature, I usually pick the value:

((intptr_t) ptr) ^ MAGIC_CONST

Where ptr is exactly equal to the address of the position where each
signature is stored. So every signature is address specific (and thus
not movable) while still retaining the same bit entropy as your
MAGIC_CONST.
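
In code the idea might look like this (make_sig and check_sig are
invented names):

#include <stdint.h>

#define MAGIC_CONST ((intptr_t)0x1234567)

/* The signature stored at p depends on p itself, so a block image
   copied to another address no longer carries valid signatures. */
static intptr_t make_sig(const void *p)
{
    return (intptr_t)p ^ MAGIC_CONST;
}

static int check_sig(const void *p, intptr_t stored)
{
    return stored == make_sig(p);
}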
 

Richard Tobin

Ian Collins said:
I've used max(sizeof(void*), sizeof(long double)) as a 'fairly safe'
value for alignment in similar situations.

Some implementations do not require doubles (for example) to have
alignment "as big as" their size. If you wanted to take advantage
of this you could declare structs such as

struct {char a; double b;}

and use offsetof() to get the alignment.
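
A sketch of that probe; struct probe is an invented name, and %zu
requires C99:

#include <stddef.h>
#include <stdio.h>

int main(void)
{
    /* If double requires only 4-byte alignment, this prints 4
       even where sizeof(double) is 8. */
    struct probe { char a; double b; };
    printf("double alignment: %zu\n", offsetof(struct probe, b));
    return 0;
}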

-- Richard
 

Eric Sosman

Eric said:
jacob navia wrote:

[...] Consider keeping the "metadata" -- addresses, sizes,
and so on -- separate from the allocated data. The idea is to
make the metadata less vulnerable to common off-by-one errors.

Right, but then you are also less likely to detect some kinds of
errors. For example:

ptr1 = malloc(100);
ptr2 = malloc(100);
memcpy (ptr-16, ptr-16, 132);

(I guess the two appearances of ptr are supposed to
be ptr1 and ptr2. Not too sure where the 132 came from;
are you assuming that the two allocations are contiguous?
Hmmm -- maybe not. Moot point, though, because the problem
is not with the separate metadata but with the constant
signature, and as it turns out neither of us uses a constant
signature.)
The memcpy is erroneous, but if you are using a constant signature,
then there will appear to be no detectable error. Mixing the meta-data
in with the real data payloads makes it more "fragile", but this
fragility is precisely what you want in order to detect errors
more exhaustively.

Yah. Instead of a constant signature, I've used a block-
specific signature formed by hashing the block address with
the block length. I plant a copy of the signature before and
after the "payload" of each block, and check it on free() and
realloc(). (Reading further along in your reply, I see you've
done something fairly similar.)

The separately-allocated metadata provides a further validity
check. Instead of just trusting the pointer given to free() or
realloc(), use that pointer as a key and search for it in the
metadata. Not found? Bad argument! Found? That's nice, now
check the block signatures for signs of damage.
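
A toy version of that lookup; Eric used a skip-list, but a flat array
keeps the sketch short, and every name here is invented:

#include <stddef.h>

struct block_meta {
    void *addr;
    size_t len;
};

static struct block_meta table[1024];   /* live allocations */
static size_t nlive;

/* Return the metadata for p, or NULL if p was never handed out --
   in which case the free()/realloc() argument is simply bad. */
static struct block_meta *find_block(const void *p)
{
    size_t i;
    for (i = 0; i < nlive; i++)
        if (table[i].addr == p)
            return &table[i];
    return NULL;
}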

No one scheme seems likely to detect every error, so there's
a sort of guessing game about what sorts of error are both likely
and resistant to detection by other means. Then you put together
a data structure that helps detect the likely errors, but is not
particularly likely to be damaged by them. Nothing's perfect, of
course; there's really no defense against

for (;;)
    *(*argv + rand()) = 42;
 

Keith Thompson

Eric Sosman said:
jacob said:
In the C tutorial for lcc-win32, I have a small chapter
about a debugging implementation of malloc.
[...]

Others have commented on the alignment issues, and on
the wisdom of initializing newly-allocated memory to a
"useful" value like all zeroes. I've got a few further
suggestions:

When an allocation is released, consider filling it
with garbage. This may help catch erroneous uses of an
area after it's been freed, as in the famously incorrect
code for freeing a linked list:

for (ptr = head; ptr != NULL; ptr = ptr->next)
    free(ptr);
[...]

Better yet: when allocating memory, fill it with one kind of garbage
(to detect, though not reliably, the error of accessing uninitialized
memory), and when releasing it, fill it with another kind of garbage
(to detect, though not reliably, the distinct error of attempting to
access freed memory). It could be a substantial performance hit, but
it might be worth it if it detects errors.
 

Eric Sosman

Keith said:
Eric Sosman said:
jacob said:
In the C tutorial for lcc-win32, I have a small chapter
about a debugging implementation of malloc.
[...]

Others have commented on the alignment issues, and on
the wisdom of initializing newly-allocated memory to a
"useful" value like all zeroes. I've got a few further
suggestions:

When an allocation is released, consider filling it
with garbage. This may help catch erroneous uses of an
area after it's been freed, as in the famously incorrect
code for freeing a linked list:

for (ptr = head; ptr != NULL; ptr = ptr->next)
    free(ptr);

[...]

Better yet: when allocating memory, fill it with one kind of garbage
(to detect, though not reliably, the error of accessing uninitialized
memory), and when releasing it, fill it with another kind of garbage
(to detect, though not reliably, the distinct error of attempting to
access freed memory). It could be a substantial performance hit, but
it might be worth it if it detects errors.

I fill newly-allocated blocks with 0xFEEDC0DE, and fill
released blocks with 0xDECEA5ED. (0xFEEDC0DE violates one of
the suggested guidelines I posted elsethread, but I've been
unable to change it due to an arm injury suffered while trying
to pat myself on the back over my own cleverness. ;-)
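
A sketch of the two fills using the byte sequences of those constants;
the helper names are invented:

#include <stddef.h>

static const unsigned char ALLOC_FILL[4] = { 0xFE, 0xED, 0xC0, 0xDE };
static const unsigned char FREE_FILL[4]  = { 0xDE, 0xCE, 0xA5, 0xED };

/* Repeat a 4-byte pattern across a block: fill(p, n, ALLOC_FILL)
   on allocation, fill(p, n, FREE_FILL) on release. */
static void fill(void *block, size_t len, const unsigned char pat[4])
{
    unsigned char *p = block;
    size_t i;
    for (i = 0; i < len; i++)
        p[i] = pat[i % 4];
}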
 

Skarmander

Keith said:
Because it won't always work. There are good reasons for int to be 32
bits even on 64-bit systems. (with 8-bit char, making int 64 bits
would leave a hole in the type system; short would be either 16 or 32
bits). But there's no reason to forbid malloc()ing more than INT_MAX
bytes.
Isn't "int" supposed to be the "natural" integer type for a platform? The
"hole" argument is compelling up to a point, but an implementation can
always provide int_least32_t and int_fast32_t for code that needs such types
(which, for ease of portability, will typically hide these behind typedefs
anyway).

Today's systems properly make the tradeoff, but I can imagine 64-bit
architectures where having a 32-bit int would be silly. ('short' would
indeed be likely to be 32 bits there, and there might be a separate 16-bit
type not covered by the regular types).

S.
 

Keith Thompson

Skarmander said:
Isn't "int" supposed to be the "natural" integer type for a platform?

Sure, but "natural" is a loose enough concept that the requirement is
unenforceable.
The "hole" argument is compelling up to a point, but an implementation
can always provide int_least32_t and int_fast32_t for code that needs
such types (which, for ease of portability, will typically hide these
behind typedefs anyway).

Sure, for C99. I don't think C99 has yet influenced the choice
of sizes of predefined types. A lot of C code is written
to be portable to C90 implementations; such code cannot use
<stdint.h>. (Unless it uses Doug Gwyn's q8 implementation,
<http://www.lysator.liu.se/c/q8/index.html>.)
Today's systems properly make the tradeoff, but I can imagine 64-bit
architectures where having a 32-bit int would be silly. ('short' would
indeed be likely to be 32 bits there, and there might be a separate
16-bit type not covered by the regular types).

I don't have to imagine them. I've used systems with:

char 8 bits
short 32 bits
int 64 bits

and others with

char 8 bits
short 64 bits
int 64 bits

The lack of a 32-bit integer type on the latter was inconvenient at
times. (The implementation was C90 only.)

(If you're curious, the systems were a Cray T3E (based on the Alpha
processor) and a Cray T90 (vector system), respectively.)
 

Frederick Gotham

Keith Thompson posted:

and others with

char 8 bits
short 64 bits
int 64 bits

The lack of a 32-bit integer type on the latter was inconvenient at
times. (The implementation was C90 only.)


I'm curious, why is the lack of a 32-bit integer type inconvenient? It
would seem that the 64-bit integer types would provide all the
"functionality" you need... unless you're depending on 0 rolling backwards
into 4294967295?
 

Skarmander

Keith said:
Sure, but "natural" is a loose enough concept that the requirement is
unenforceable.


Sure, for C99. I don't think C99 has yet influenced the choice
of sizes of predefined types. A lot of C code is written
to be portable to C90 implementations; such code cannot use
<stdint.h>. (Unless it uses Doug Gwyn's q8 implementation,
<http://www.lysator.liu.se/c/q8/index.html> (link currently down).)
To clarify: I meant that implementations are free to provide
nonstandard integer types as an extension. (Like, say, __int32.) Portable
code that needs exactly-sized integer types will usually use its own typedef
sequestered in a header; customizing this is easy.

If implementations offer additional integer types through the C99 standard
types, that's even better; I was just using the C99 types to unambiguously describe
the kind of types I'm talking about.
I don't have to imagine them. I've used systems with:

char 8 bits
short 32 bits
int 64 bits

and others with

char 8 bits
short 64 bits
int 64 bits

The lack of a 32-bit integer type on the latter was inconvenient at
times. (The implementation was C90 only.)

(If you're curious, the systems were a Cray T3E (based on the Alpha
processor) and a Cray T90 (vector system), respectively.)
Thanks for the information. I've only been familiar with the recent x86-64
platforms, which for obvious reasons make very sure to keep 32-bit types around.

S.
 

Keith Thompson

Frederick Gotham said:
Keith Thompson posted:
[...]

I'm curious, why is the lack of a 32-bit integer type inconvenient? It
would seem that the 64-bit integer types would provide all the
"functionality" you need... unless you're depending on 0 rolling backwards
into 4294967295?

The particular case I'm thinking of involved an externally defined
data format. The code, which was in a system header file, used a
32-bit bit field where most other implementations used an ordinary
struct member. This caused problems for code that took that member's
address.

A workaround was possible, but it was, as I said, inconvenient.
 

ena8t8si

Eric said:
jacob said:
In the C tutorial for lcc-win32, I have a small chapter
about a debugging implementation of malloc.
[...]

Others have commented on the alignment issues, and on
the wisdom of initializing newly-allocated memory to a
"useful" value like all zeroes. I've got a few further
suggestions:

When an allocation is released, consider filling it
with garbage. This may help catch erroneous uses of an
area after it's been freed, as in the famously incorrect
code for freeing a linked list:

for (ptr = head; ptr != NULL; ptr = ptr->next)
    free(ptr);

release() leaks the memory that's passed to it, so the
package isn't suitable for debugging programs that churn
through a lot of allocations and releases and expect the
released memory to be recycled. Consider passing released
areas to free(), perhaps not immediately (so the garbage
mentioned above has time to cause damage). You might make
a FIFO list of released areas, and let it grow to a few
thousand entries before handing the oldest to free(). You
might even postpone free() until malloc() returns NULL.

Consider keeping the "metadata" -- addresses, sizes,
and so on -- separate from the allocated data. The idea is to
make the metadata less vulnerable to common off-by-one errors.
(I've used a skip-list with the block address as a key for
this purpose; that's not portable because C doesn't define
comparison between pointers to "unrelated" objects, but it
works for mainstream C implementations.) Maintaining the
metadata separately involves more overhead, but this is, after
all, a debugging package.

Consider adding a sanity-checker that visits all the
allocated areas and checks their signatures for signs of
damage. Making the checker callable by the user can help in
"wolf fence" debugging. If sufficiently desperate, sanity-
check at every transaction.

I found it helpful to keep a sequence number in the
metadata for each block, and to let the user program query
the current value. This is useful when tracking down leaks:

open files, initialize libraries, ...
epoch = getCurrentSeqno();
do something that should be memory-neutral ...
listSurvivingAllocationsSince(epoch);

... where the final call just traverses all the allocations
that haven't been freed, reporting any whose sequence numbers
are greater than `epoch'.

An excellent collection of suggestions. *applause*
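
A sketch of the sequence-number bookkeeping from the quoted post;
getCurrentSeqno and listSurvivingAllocationsSince are the names used
there, while the record layout is invented:

#include <stdio.h>

struct alloc_rec {
    void *addr;
    size_t len;
    unsigned long seqno;        /* stamped at allocation time */
    struct alloc_rec *next;     /* linked list of live allocations */
};

static struct alloc_rec *live;
static unsigned long current_seqno;

unsigned long getCurrentSeqno(void)
{
    return current_seqno;
}

void listSurvivingAllocationsSince(unsigned long epoch)
{
    const struct alloc_rec *r;
    for (r = live; r != NULL; r = r->next)
        if (r->seqno > epoch)
            fprintf(stderr, "possible leak: %p (%lu bytes), seq %lu\n",
                    r->addr, (unsigned long)r->len, r->seqno);
}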
 

ena8t8si

Chris said:
Barry said:
At this point [where r is the result of a malloc() call] r contains
the address of a block of memory properly aligned for any possible
object.

Is there a portable way of finding out what this alignment is?

No. I consider this a flaw in the C standards.

If there *were* a portable way to find this, you could write portable
"augmented malloc and free" routines along the lines of those Mr Navia
has provided (with more work to deal with alignment, of course). ...

I think you mean to add a qualifier that pointers
can be turned into integers and operated on sensibly.
Otherwise knowing the alignment doesn't help.
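
For instance, even with a known alignment, rounding a pointer up to it
needs something like the following, which assumes pointer/integer
conversions behave sensibly; ALIGN is a hypothetical value and
uintptr_t requires C99:

#include <stdint.h>

#define ALIGN ((uintptr_t)16)   /* hypothetical; must be a power of two */

/* Round p up to the next multiple of ALIGN. */
static void *align_up(void *p)
{
    uintptr_t v = (uintptr_t)p;
    v = (v + ALIGN - 1) & ~(ALIGN - 1);
    return (void *)v;
}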
 
