Debugging memory leaks

F

fmassei

Hello!
I made a short piece of code that I find very useful for debugging,
and I wanted to ask you if it is correct, somehow acceptable or if I
simply reinvented the wheel.
To deal with some bad bugs caused by memory leaks I ended up with this
simple solution: I made one header file that, when included, replaces
the malloc/calloc/realloc/free functions with some other functions
that do the actual job and insert (or remove) the pointers to the
allocated memory in a list. This way I can catch many errors as double
frees, allocations of zero size, or missing calls to free.
Obviously it works only when a given #define is set (NDEBUG, in my
case).
What do you think about it?
Thank you in advance!


(If you want to see the code I wrote a little page here:
http://technicalinsanity.org/out/simplegc/index.html)
 
B

Ben Pfaff

I made a short piece of code that I find very useful for debugging,
and I wanted to ask you if it is correct, somehow acceptable or if I
simply reinvented the wheel.
To deal with some bad bugs caused by memory leaks I ended up with this
simple solution: I made one header file that, when included, replaces
the malloc/calloc/realloc/free functions with some other functions
that do the actual job and insert (or remove) the pointers to the
allocated memory in a list. This way I can catch many errors as double
frees, allocations of zero size, or missing calls to free.
Obviously it works only when a given #define is set (NDEBUG, in my
case).
What do you think about it?

You have reinvented the wheel. Many programmers have done
similarly to debug memory leaks. (That doesn't make it any less
useful to do so.)
 
D

dj3vande

Hello!
I made a short piece of code that I find very useful for debugging,
and I wanted to ask you if it is correct, somehow acceptable or if I
simply reinvented the wheel.
To deal with some bad bugs caused by memory leaks I ended up with this
simple solution: I made one header file that, when included, replaces
the malloc/calloc/realloc/free functions with some other functions
that do the actual job and insert (or remove) the pointers to the
allocated memory in a list. This way I can catch many errors as double
frees, allocations of zero size, or missing calls to free.
Obviously it works only when a given #define is set (NDEBUG, in my
case).
What do you think about it?

NDEBUG is usually #define'd to indicate that you DON'T want debugging
features turned on in the code. So if I'm understanding you correctly
and you're using it to turn on your debugging malloc family
replacement, that's probably Not Such A Good Idea.

Other than that, it sounds like you've reinvented a particularly useful
wheel; if you don't already have a wheel of that size and load-bearing-
ness, your version will probably do fine.

<OT>
The standard library malloc on my Mac supports most of the checking
features you describe, and allows them to be turned on by setting
appropriate environment variables before running the program. I
strongly suspect that this was inherited from FreeBSD, and would be
unsurprised if other BSDs have similar debugging support.

If you're using Linux/x86, you should take a look at Valgrind, which
can do all of that and more.
</OT>

Note that by replacing malloc and friends, you're breaking the contract
that the definition of the C language specifies between you and the
implementor of your compiler; it's possible that you'll break something
by doing this.
(For debugging memory problems, this is probably acceptable. The worst
that can happen is that you break things in a way that interferes with
your debugging, and even in that case reverting to the non-malloc-
debugging build will leave you back where you started, and all you lose
is the getting a little bit farther ahead that you were hoping for.)

If you haven't already implemented them (I didn't look at your code),
you might also find a few wrapper macros useful:
--------
#undef malloc
#define malloc(sz) my_malloc(sz,__FILE__,__LINE__)
#undef free
#define free(ptr) my_free(ptr,__FILE__,__LINE__)
#undef realloc
#define realloc(ptr,sz) my_realloc(ptr,sz,__FILE__,__LINE__)
#undef calloc
#define calloc(num,sz) my_calloc(num,sz,__FILE__,__LINE__)
--------
This allows your debugging versions (which will need to take filename
and line arguments, of course) to track where they were called from
and report that when they detect problems.


dave
 
J

jameskuyper

Hello!
I made a short piece of code that I find very useful for debugging,
and I wanted to ask you if it is correct, somehow acceptable or if I
simply reinvented the wheel.
To deal with some bad bugs caused by memory leaks I ended up with this
simple solution: I made one header file that, when included, replaces
the malloc/calloc/realloc/free functions with some other functions
that do the actual job and insert (or remove) the pointers to the
allocated memory in a list. This way I can catch many errors as double
frees, allocations of zero size, or missing calls to free.
Obviously it works only when a given #define is set (NDEBUG, in my
case).
What do you think about it?

By convention, macros such as gc_need_real should be all upper-case.
Of course, this is only a convention, but a useful one. If it is
#defined, you should keep in mind that there may already be standard
library macros defined with the same names you use. You should #undef
any such definitions before replacing them with your own. In
principle, if gc_need_real is #defined, the behavior of any program
which calls the normal malloc() familiy is undefined. In practice,
replacement of standard library functions often works, and is a
popular method of performing memory leak checks.

Keep in mind that your approach is completely useless for
investigating memory leaks that are due to direct calls to the
malloc() family from libraries that were built without using your
header files.

You are quite right to worry about whether you are "reinventing the
wheel". You should check to see whether a debugging version of the
malloc() family is provided by your implementation. If one is
provided, it's probably safer to use than your own replacement. Also,
this same technique is used by a number of memory leak testing
packages, some of them much more sophisticated than anything you could
easily write. In both cases, the replacement library functions
actually replace the normal standard library functions, rather than
wrapping them. As a result, you can even detect memory leaks that
occur in libraries that were compiled without using your header file.
 
P

Peter Nilsson

If you haven't already implemented them (I didn't look
at your code), you might also find a few wrapper macros
useful:
--------
#undef malloc
#define malloc(sz) my_malloc(sz,__FILE__,__LINE__)
#undef free
#define free(ptr) my_free(ptr,__FILE__,__LINE__)
#undef realloc
#define realloc(ptr,sz) my_realloc(ptr,sz,__FILE__,__LINE__)
#undef calloc
#define calloc(num,sz) my_calloc(num,sz,__FILE__,__LINE__)
--------

These potentially cancel the idempotence of the <stdlib.h>
header. Better to supply (and use) genuine wrappers...

#ifndef NDEBUG
#define wrap_malloc(sz) my_malloc(sz, __FILE__, __LINE__)
#else
#define wrap_malloc(sz) malloc(sz)
#endif
 
P

Peter Nilsson

(If you want to see the code I wrote a little page
here:http://technicalinsanity.org/out/simplegc/index.html)

Some obvious things from a quick glance... some are style
issues, many are correctness issues...

Your include guards use identifiers reserved for the
implementation. Suggest you replace __SIMPLEGC_H__
with H_SIMPLEGC_H.

You might want to swap the comments...

#include <stdlib.h> /* needed: printf */
#include <stdio.h> /* needed: malloc/calloc/realloc/free */

Better still, delete them.

You incorrectly print size_t values with %d and pointers
with %x.

You should print debugging comments to stderr, not stdout.

You should use const char *, instead of char * when
declaring parameters that don't change the pointed to
string...

void *gc_malloc(size_t size, char *fname, size_t fline);

Also, I'd go with long, not size_t for fline. I've used
implementations where size_t's range is 0..65535, but
__LINE__ could exceed that. Of course, the same problem
exists if there are LONG_MAX lines, but I was much less
concerned about that. ;)

Note: NDEBUG means 'NO DEBUG'.

In any case, I suggest that you _not_ make your utility
functions subject to NDEBUG. Simply define the functions
you need. If they aren't used, they aren't used. Smart
linkers will remove them, but even if they don't, suppose
you're linking two modules where you want the debugging
on one, but not the other.

Two things regarding...

if ((p->fname = malloc(strlen(fname)+1))==NULL) {

Supressing the macro with (malloc)(...) is simpler than
the gv_need_real kludge. Also, rather than copying the
string, I'd just assign it and put a condition on your
function that they must constant static duration strings,
e.g. string literals. [I doubt this will worry anyone
who uses your function.]
 
N

Nate Eldredge

Peter Nilsson said:
Also, I'd go with long, not size_t for fline. I've used
implementations where size_t's range is 0..65535, but
__LINE__ could exceed that. Of course, the same problem
exists if there are LONG_MAX lines, but I was much less
concerned about that. ;)

Hmm. On such implementations, assuming `int' was also 16 bits, did an
occurence of __LINE__ on line 65537 expand to `65537L'? It seems like
it would be a bug if it didn't.

This is an interesting issue, because ordinarily the preprocessor
wouldn't know about things like the ranges for types. Also, people use
C preprocessors for a lot of non-C languages, and I bet they don't
expect that behavior.
 
K

Keith Thompson

Nate Eldredge said:
Hmm. On such implementations, assuming `int' was also 16 bits, did an
occurence of __LINE__ on line 65537 expand to `65537L'? It seems like
it would be a bug if it didn't.

The L suffix isn't necessary. If int is 16 bits, then the unadorned
constant 65537 is of type long.

[...]
 
N

Nate Eldredge

Keith Thompson said:
The L suffix isn't necessary. If int is 16 bits, then the unadorned
constant 65537 is of type long.

Aha. 6.4.4.1 (5). Thanks. I think I've been using the L suffix
unnecessarily all along.
 
F

fmassei

Thank you all, you gave me a lot of inputs and good suggestions! I'm
going to modify the code asap.
Regarding the use of implementation dependent tools of course you are
right, but I'd like to have an implementation independent tool (even
if very very simple) just to be free to debug the applications in all
the target environments. That is the main reason because I prefer not
to use very advanced softwares like valgrind, even if sometimes they
can actually save your day.
 
C

Chris Ahlstrom

After takin' a swig o' grog, (e-mail address removed) belched out
this bit o' wisdom:
Hello!
I made a short piece of code that I find very useful for debugging,
and I wanted to ask you if it is correct, somehow acceptable or if I
simply reinvented the wheel.
To deal with some bad bugs caused by memory leaks I ended up with this
simple solution: I made one header file that, when included, replaces
the malloc/calloc/realloc/free functions with some other functions
that do the actual job and insert (or remove) the pointers to the
allocated memory in a list. This way I can catch many errors as double
frees, allocations of zero size, or missing calls to free.
Obviously it works only when a given #define is set (NDEBUG, in my
case).

(If you want to see the code I wrote a little page here:
http://technicalinsanity.org/out/simplegc/index.html)

If you're on a platform that supports it, you can use valgrind
to check for leaks and other issues. It's pretty cool:

http://valgrind.org/

Also Electric Fence:

http://perens.com/works/software/
 
P

Paul Hsieh

Hello!
I made a short piece of code that I find very useful for debugging,
and I wanted to ask you if it is correct, somehow acceptable or if I
simply reinvented the wheel.

Yes, you are reinventing the wheel. However, the C standard
practically *demands* that you re-invent the wheel, because modern
systems require that you make numerous complicated decisions in
implementing this that the C standard does not address. (I.e., the
right answer is that the C standard should simply *provide* greater
dynamic memory functionality which makes this either easier to
implement or fall trivially out of the C standard. Force the compiler
vendors to make the platform more portable, not developers.)
To deal with some bad bugs caused by memory leaks I ended up with this
simple solution: I made one header file that, when included, replaces
the malloc/calloc/realloc/free functions with some other functions
that do the actual job and insert (or remove) the pointers to the
allocated memory in a list.

Well ... if all you want to do is see if you are leaking, you could
just keep a count of the size of each allocation in a hidden header,
and just add and subtract as necessary and then print the last total
in an atexit() routine or something like that.

This is the real problem with the C standard, is that you can do a
*LOT* more than that, that is really useful. You can use special
patterns guards to detect bounds corruptions and also keep minimum and
maximum address bounds (on architectures where that makes sense) to
quickly detect garbage pointers being freed or realloced. The C
standard could expose functions like

int isLiveAllocation (const void *ptr);
size_t totalAllocatedMem (void);
size_t memSize (const void * ptr);
int walkEachAllocation (int (* walker) (const void * ptr, size_t
sz, void * ctx), void * ctx);

which are all fairly easily implementable with any serious dynamic
memory architecture. The value of such functions is so obvious,
especially in light of what you are trying to do.

There are also special problems that you need to deal with on some
platforms. strdup() makes direct system calls to malloc. So if you
try to free it with an overloaded free() macro, then you may encounter
problems. Basically, you need to redefine and make your own strdup as
well. You don't need to do this with functions like fopen, of course,
since they provide their own clean up (i.e., fclose).
[...] This way I can catch many errors as double
frees, allocations of zero size, or missing calls to free.
Obviously it works only when a given #define is set (NDEBUG, in my
case).
What do you think about it?
Thank you in advance!

(If you want to see the code I wrote a little page here:http://technicalinsanity.org/out/simplegc/index.html)

Its fine as a first pass kind of thing.

The main problem, of course, is the free is horrendously slow for
large numbers of outstanding allocations. There is a common trick of
making the header appear at the address: ((char*)ptr) - sizeof
(header) which allows you to work around this performance problem.
The idea is that a pointer range check, alignment and header signature
check are very highly probabilistically good enough to detect a purely
bogus pointer. Its important to support programs that have a massive
number of memory allocations outstanding because that's where you will
get primary value from such a debugging mechanism.

And of course, this is pretty useless in multithreaded environments.
You need to make some sort of abstraction for a mutex or lock for your
memory (if you care about portability, otherwise you can just go ahead
and use the platform's specific mutexes). You can be clever and use a
hash on the value of ptr to create an array of striped locks to
increase parallelism (since malloc itself will be hit with a higher
degree of parallelism than you would otherwise encounter if you are
using a single lock), but that might be overkill.
 
U

user923005

C

CBFalconer

Paul said:
.... big snip ...

The main problem, of course, is the free is horrendously slow for
large numbers of outstanding allocations.

This is a common problem, because the free system has to search for
adjacent free chunks and combine them. The problem can be avoided
with some extra links in the malloc system. For a method that is
always O(1) for all malloc, realloc, and free calls, and is almost
pure standard C coding, see:

<http://cbfalconer.home.att.net/download/nmalloc.zip>

Most people never notice the problem, because they probably don't
allocate many blocks and the effective free of all at program exit
doesn't have to do the search.
 
K

Keith Thompson

Keith Thompson said:
Nate Eldredge said:
Also, I'd go with long, not size_t for fline. I've used
implementations where size_t's range is 0..65535, but
__LINE__ could exceed that. Of course, the same problem
exists if there are LONG_MAX lines, but I was much less
concerned about that. ;)

Hmm. On such implementations, assuming `int' was also 16 bits, did an
occurence of __LINE__ on line 65537 expand to `65537L'? It seems like
it would be a bug if it didn't.

The L suffix isn't necessary. If int is 16 bits, then the unadorned
constant 65537 is of type long.

[...]

If int is 16 bits, wouldn't the constant 32768 also be of type long int
(and 32767 of type int)? 3.1.3.2 in C89, 6.4.4.1 in C99.

Yes.
 
D

dj3vande

These potentially cancel the idempotence of the <stdlib.h>
header.

That's an unavoidable consequence of their intended use.
Better to supply (and use) genuine wrappers...

#ifndef NDEBUG
#define wrap_malloc(sz) my_malloc(sz, __FILE__, __LINE__)
#else
#define wrap_malloc(sz) malloc(sz)
#endif

I have a bunch of malloc'ing code that I suspect has a memory-
management bug in it somewhere. Wouldn't it be nice if I could just
insert a #include at the top that pulls in some debugging wrappers
without having to change anything else in the code?


dave
 
F

fmassei

Hello. Even if no one cares, I wanted to let the people who gave me
positive inputs know that I changed my code according to their good
suggestions.
I was also wondering why the few times I asked something in clc (about
reentrant/thread-safe functions) everybody told me that it was not a
*standard c* related question, while this time quite everybody told me
to use *implementation defined* solutions. Funny! Probably I should
swap my questions somehow, next time.
 
I

Ian Collins

I was also wondering why the few times I asked something in clc (about
reentrant/thread-safe functions) everybody told me that it was not a
*standard c* related question, while this time quite everybody told me
to use *implementation defined* solutions. Funny! Probably I should
swap my questions somehow, next time.

I don't think anyone (even Chuck) would spit the dummy if you asked
about reentrant functions.

Threading is another matter. There is an excellent group
(comp.programming.threads) where threading questions can be posted.
There is a wealth of experience covering most platforms to be found there.

The C standard does not cover threading and there isn't a common
threading standard. The most common are windows and Posix and they
differ in many ways.
 
A

Antoninus Twink

I was also wondering why the few times I asked something in clc (about
reentrant/thread-safe functions) everybody told me that it was not a
*standard c* related question, while this time quite everybody told me
to use *implementation defined* solutions. Funny!

Of course all these questions are perfectly on-topic in clc.

However, many people here are pursuing a political agenda - they want to
turn the group into a narrow discussion of ISO C, rather than all C. For
this reason, they will refuse to answer your questions, complain
vociferously, and quite possibly insult you into the bargain.

As they won't provide you with any help, the best thing to do is just to
ignore them.
 
J

jameskuyper

Hello. Even if no one cares, I wanted to let the people who gave me
positive inputs know that I changed my code according to their good
suggestions.
I was also wondering why the few times I asked something in clc (about
reentrant/thread-safe functions) everybody told me that it was not a
*standard c* related question, while this time quite everybody told me
to use *implementation defined* solutions. Funny! Probably I should
swap my questions somehow, next time.

The only useful C answer to the thread-safe issue was to re-direct you
to forums more appropriate forum for such discussions, since the C
standard doesn't say anthing about threads.

The standard says a lot of things that are relevant to malloc()/
free(), and the feasibility of either wrapping them or replacing them.
Wrapping them is perfectly feasible, but replacing them necessarily
involves undefined behavior. Even explaining the fact that the
behavior is undefined, is more on-topic here than anything to do with
threads, because the C standard doesn't even explicitly say that the
behavior of threaded programs is undefined - the issue isn't discussed
at all.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,770
Messages
2,569,584
Members
45,077
Latest member
SangMoor21

Latest Threads

Top