Malloc Query


pushpakulkar

Hi all,

Many users claim that the malloc() and free() provided by the language
library incur a considerable amount of call overhead. Can we really
eliminate that overhead by using user-defined functions/a user-defined
memory manager? Is there any significant advantage in providing
user-defined malloc() and free()? I want to exclude realtime systems
and memory-constrained embedded systems here; I am talking about
general-purpose applications. Can a user-defined memory manager handle
memory allocation and deallocation much better than the OS would have
done? Any inputs/comments on the same are highly appreciated.

Thanks and Regards,
Pushpa
 

Nate Eldredge

> Many users claim that the malloc() and free() provided by the language
> library incur a considerable amount of call overhead. Can we really
> eliminate that overhead by using user-defined functions/a user-defined
> memory manager? Is there any significant advantage in providing
> user-defined malloc() and free()? I want to exclude realtime systems
> and memory-constrained embedded systems here; I am talking about
> general-purpose applications. Can a user-defined memory manager handle
> memory allocation and deallocation much better than the OS would have
> done?

Certainly in principle you could write a better malloc() than the one
your system provides. However, the authors of your system's standard
library (hopefully) put a lot of thought and effort into writing a good
general-purpose allocator, so you might have a hard time doing better.

Now, the phrase "general-purpose" gives a caveat. There are a lot of
possible factors to consider in writing an allocator: speed, efficient
usage of memory, robustness in the face of user error, avoiding
fragmentation, returning memory to the OS, etc. These involve
tradeoffs. Your library's malloc() will have chosen some balance
between these, and it's possible that you want a different balance, in
which case your own implementation could do "better". Another factor is
that if you know something about how you are going to use your allocator
(e.g. you will be mainly allocating 73-byte chunks and rarely freeing
them), you can optimize appropriately, which the system's malloc() can't
do.

If you are considering this, I'd suggest first looking at some of the
many freely available memory allocator implementations out there. Some
of them are quite advanced. Then, once you have a few possible
candidates, benchmark them and see which is the best for your
application.
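
A minimal timing harness can be as simple as this sketch (the workload
below is invented; substitute your application's real allocation
pattern, and link in each candidate allocator in turn):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 1000000

static void *p[N];

int main(void)
{
    clock_t t0, t1;
    long i;

    t0 = clock();
    for (i = 0; i < N; i++)      /* allocate a mix of small sizes */
        p[i] = malloc((size_t)(rand() % 256 + 1));
    for (i = 0; i < N; i++)
        free(p[i]);
    t1 = clock();

    printf("%.3f seconds\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    return 0;
}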
 

James Kuyper

> Many users claim that the malloc() and free() provided by the language
> library incur a considerable amount of call overhead. Can we really
> eliminate that overhead by using user-defined functions/a user-defined
> memory manager?

You need to understand that the overhead of malloc() and free() is an
intrinsic consequence of the job that they're required to perform.
You'll find it difficult, though not necessarily impossible, to create a
faster version of malloc() than the one provided by the standard library.

The only way you can reasonably hope for a significantly faster memory
manager than malloc()/free() is to define it with more limited
capabilities than malloc()/free(). For example, a memory manager
that always returns allocations of exactly the same size can be
significantly faster than malloc(). A memory manager that never actually
needs to release memory can also be significantly faster.
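
For instance, a fixed-size allocator can be little more than a free
list over a static pool; a minimal sketch (names and sizes invented):

#include <stddef.h>

#define OBJ_SIZE  64     /* every allocation is exactly this big */
#define POOL_OBJS 1024

union node {
    union node *next;        /* link, valid while on the free list */
    char payload[OBJ_SIZE];  /* union keeps the node big enough    */
};

static union node pool[POOL_OBJS];
static union node *free_list;
static size_t used;

void *fixed_alloc(void)
{
    if (free_list) {              /* reuse a released node */
        union node *n = free_list;
        free_list = n->next;
        return n;
    }
    if (used < POOL_OBJS)         /* otherwise carve a fresh one */
        return &pool[used++];
    return NULL;                  /* pool exhausted */
}

void fixed_free(void *p)
{
    union node *n = p;
    n->next = free_list;          /* push back onto the free list */
    free_list = n;
}

Allocation and release are each a couple of pointer moves; there is no
searching and no coalescing, which is where the speed comes from.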
 

user923005

> Many users claim that the malloc() and free() provided by the language
> library incur a considerable amount of call overhead.

Some applications may be bottlenecked by rapid allocation and
releasing of resources. If you measure this as a problem in your
application via profiling, then it is worth investigating alternatives.
> Can we really eliminate that overhead by using user-defined
> functions/a user-defined memory manager?

No. We might reduce it.
> Is there any significant advantage in providing user-defined malloc()
> and free()?

It depends. The BSD memory allocator is famous for its excellence,
and so it is unlikely that you are going to write a better one. Even
at that, it may be worthwhile to create an alternative for special
situations.
> I want to exclude realtime systems and memory-constrained embedded
> systems here; I am talking about general-purpose applications. Can a
> user-defined memory manager handle memory allocation and deallocation
> much better than the OS would have done?

Imagine that you are a compiler vendor. If you want to compete with
the other compiler vendors, you will have to write a good memory
manager. If other teams of people write better memory managers, you
are probably going to look at them and use their good ideas (unless
they are patented -- and even then you might just pay the royalty if
the benefit is large enough). The compiler's memory manager will have
to have intimate knowledge of the operating system's memory manager if
it is to perform well. I have written memory managers (most often
sub-allocators that manage a large memory pool for rapid alloc/free
cycles). However, this is rarely the bottleneck in a large
application, and it is a total waste of time to do this sort of thing
for applications that are not bound by this limit.
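
One common form of sub-allocator is an arena: grab one large block up
front, hand out pieces by bumping a pointer, and release everything in
a single call when the phase of work is over. A minimal sketch (names
invented):

#include <stdlib.h>

struct arena {
    char *base, *next, *end;
};

int arena_init(struct arena *a, size_t size)
{
    a->base = malloc(size);       /* one big request to the system */
    if (!a->base)
        return -1;
    a->next = a->base;
    a->end  = a->base + size;
    return 0;
}

void *arena_alloc(struct arena *a, size_t n)
{
    void *p;
    n = (n + 15) & ~(size_t)15;   /* round up for alignment */
    if ((size_t)(a->end - a->next) < n)
        return NULL;              /* arena exhausted */
    p = a->next;
    a->next += n;                 /* "allocation" is one pointer bump */
    return p;
}

void arena_release(struct arena *a)  /* frees every allocation at once */
{
    free(a->base);
    a->base = a->next = a->end = NULL;
}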

Did you hear about Dr. Finklebean? He invented a cure for a disease
that didn't exist. But then he caught the disease. Worst of all, the
cure did not work.
So don't be like Dr. Finklebean unless you have already caught the
disease and you need to find a solution for it.

If you want your code to be fast, write efficient algorithms.
Everything else pales in comparison.
IMO-YMMV.
 

Default User

> Many users claim that the malloc() and free() provided by the language
> library incur a considerable amount of call overhead. Can we really
> eliminate that overhead by using user-defined functions/a user-defined
> memory manager?

Chances are, if you have to ask the question, then the answer is "no".





Brian
 

Paul Hsieh

> Certainly in principle you could write a better malloc() than the one
> your system provides. However, the authors of your system's standard
> library (hopefully) put a lot of thought and effort into writing a good
> general-purpose allocator, so you might have a hard time doing better.

You would think so ... but I was able to write a malloc/free that
greatly outperformed the System V allocator on my first try, without
using any really advanced tricks (just a header at a -sizeof(header)
offset from the pointer that contains the size, a footer, a linked
list in a power-of-two array of size-bounded free entries, and active
coalescence).
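
For reference, one plausible reading of that layout, with invented
field names (a sketch of the scheme described, not the actual code):

#include <stddef.h>

/* Each block carries a header just below the user pointer and a
   footer at its end, so free() can locate both neighbours for
   coalescing; free blocks are threaded onto one of an array of
   lists indexed by the power of two bounding their size. */
struct header {
    size_t size;                           /* low bit: "in use" flag */
    struct header *prev_free, *next_free;  /* valid only when free   */
};

struct footer {
    size_t size;    /* copy of the size, so the next block over
                       can step backwards to this block's header */
};

#define NBINS (sizeof(size_t) * 8)
static struct header *bins[NBINS];  /* bins[k]: free blocks with
                                       2^k <= size < 2^(k+1) */

/* the header sits at a -sizeof(struct header) offset from the
   pointer handed to the user */
#define HDR(p)  ((struct header *)(p) - 1)
#define MEM(h)  ((void *)((struct header *)(h) + 1))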
> Now, the phrase "general-purpose" gives a caveat. There are a lot of
> possible factors to consider in writing an allocator: speed, efficient
> usage of memory,

After a little while of thinking about this, you should realize that
these are all actually the same thing.
> [...] robustness in the face of user error,

Ah. Now that's a different problem. However, it turns out that the
cost of this is relatively low in the grand scheme of things.
> [...] avoiding fragmentation,

I've never understood why people obsess over this. If instead you
just try to make your allocations have the highest probability of
staying in your cache, then you have a far better heuristic that makes
fragmentation obsession a moot point.
> [...] returning memory to the OS, etc.

Yeah, I dunno how important this is. The more you give the memory
back, the more you end up negotiating for your memory, which again
slows you down.
> [...] These involve tradeoffs.

Many of them do not, and there are some obvious prioritizations that
can be made.
> [...] Your library's malloc() will have chosen some balance
> between these,

More likely they were tuned to some arbitrary benchmark. My own
investigation of this specific problem indicates this has not led to
the right answer most of the time.
> [...] and it's possible that you want a different balance, in
> which case your own implementation could do "better". Another factor is
> that if you know something about how you are going to use your allocator
> (e.g. you will be mainly allocating 73-byte chunks and rarely freeing
> them), you can optimize appropriately, which the system's malloc() can't
> do.

> If you are considering this, I'd suggest first looking at some of the
> many freely available memory allocator implementations out there. Some
> of them are quite advanced. Then, once you have a few possible
> candidates, benchmark them and see which is the best for your
> application.

Actually, the only thing interesting in any of these is how they deal
with multi-threading. Otherwise, it's just really hard to imagine any
truly interesting strategizing that doesn't cause as many problems as
it tries to solve.
 

jameskuyper

Paul said:
> After a little while of thinking about this, you should realize that
> these are all actually the same thing.

It doesn't seem so to me. Efficient use of memory means being able to
deliver to the user as large a percentage of the available memory as
possible. That's a very different criterion from delivering the memory
as fast as possible. Many of the things I'm aware of that speed up
memory allocation involve rendering some of that memory unavailable.
For example, consider the popular approach of rounding allocation
requests up to the nearest power of 2, and allocating different powers
of 2 from different blocks of memory. The difference between the
rounded size of the request and the size actually requested is memory
that is unavailable for any useful task. It reduces efficient use of
memory, but makes allocation faster.
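
Concretely (a minimal sketch; the function name is invented):

#include <stddef.h>

size_t round_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;       /* double until we cover the request */
    return p;          /* round_pow2(65) == 128: 63 bytes wasted */
}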
>> [...] avoiding fragmentation,
>
> I've never understood why people obsess over this. If instead you
> just try to make your allocations have the highest probability of
> staying in your cache, then you have a far better heuristic that makes
> fragmentation obsession a moot point.

My understanding is that the key reason for avoiding fragmentation is
that if a request comes in for a single block of memory larger than
the largest fragment currently available, then it cannot be satisfied,
even though the total amount of memory available might be orders of
magnitude larger than the size requested. I don't see how the approach
you describe makes that possibility something you don't need to worry
about.
>> [...] returning memory to the OS, etc.
>
> Yeah, I dunno how important this is. The more you give the memory
> back, the more you end up negotiating for your memory, which again
> slows you down.

Yes, but the memory you don't give back slows other programs down; in
many contexts, that's a serious issue.
 

Lew Pitcher

>>> Many users claim that the malloc() and free() provided by the
>>> language library incur a considerable amount of call overhead.
[snip]
>> Now, the phrase "general-purpose" gives a caveat. There are a lot of
>> possible factors to consider in writing an allocator: speed, efficient
>> usage of memory,
>
> After a little while of thinking about this, you should realize that
> these are all actually the same thing.

I beg to differ. Speed is not the same as efficient use of memory, which is
not the same as "reduction in fragmentation".

In general, a "first fit" allocator will be faster than a "best fit"
allocator, and a "best fit" allocator will make better use of memory than
a "first fit" allocator. However, a "best fit" allocator "tends to increase
the number of very small blocks, and proliferation of small blocks is
usually undesirable." (Knuth TAOCP Vol 1, Chapter 2.5).
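
In code, the two policies look something like this (a sketch; the
free-list structure is invented):

#include <stddef.h>

struct fblock {
    size_t size;
    struct fblock *next;
};

/* first fit: stop at the first block that is big enough */
struct fblock *first_fit(struct fblock *list, size_t want)
{
    for (; list; list = list->next)
        if (list->size >= want)
            return list;
    return NULL;
}

/* best fit: scan the whole list for the tightest block; slower,
   and the tight fits leave many tiny remainders behind */
struct fblock *best_fit(struct fblock *list, size_t want)
{
    struct fblock *best = NULL;
    for (; list; list = list->next)
        if (list->size >= want && (!best || list->size < best->size))
            best = list;
    return best;
}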


[snip]
--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
 

Bartc

> Many users claim that the malloc() and free() provided by the language
> library incur a considerable amount of call overhead. Can we really
> eliminate that overhead by using user-defined functions/a user-defined
> memory manager? Is there any significant advantage in providing
> user-defined malloc() and free()? I want to exclude realtime systems
> and memory-constrained embedded systems here.

I use a special allocator, for sizes up to 1KB, built /on top/ of malloc()
and free().

This outperforms the malloc() associated with my gcc for x86 by about 3
to 1 (with other libraries this figure can vary). But I apply a few
extra restrictions (for example, freed small allocations are returned
to a local pool and never submitted to free()).

With more restrictions and for even more specialised applications, more
speedup can be achieved: you can get allocation and deallocation down to a
few machine instructions each.
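
A sketch in that spirit (invented names, not the actual code; note
that the caller must pass the size to the free routine, which is one
such extra restriction):

#include <stdlib.h>

#define NCLASS 7   /* size classes: 16, 32, 64, ..., 1024 bytes */

static void *pool[NCLASS];     /* heads of the per-class free lists */

static int class_of(size_t n)  /* smallest class that fits n */
{
    int c = 0;
    size_t sz = 16;
    while (sz < n) { sz <<= 1; c++; }
    return c;
}

void *small_alloc(size_t n)
{
    if (n > 1024)
        return malloc(n);            /* big requests pass through */
    {
        int c = class_of(n);
        if (pool[c]) {               /* fast path: pop from the pool */
            void *p = pool[c];
            pool[c] = *(void **)p;   /* next pointer stored in block */
            return p;
        }
        return malloc((size_t)16 << c);  /* slow path: grow the pool */
    }
}

void small_free(void *p, size_t n)   /* never calls free() for small n */
{
    if (n > 1024) { free(p); return; }
    {
        int c = class_of(n);
        *(void **)p = pool[c];       /* push back onto the class list */
        pool[c] = p;
    }
}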

You should profile your use of malloc/free to see whether they are an
actual bottleneck.
 
