[Slightly OT] Memory management and custom allocator

Discussion in 'C Programming' started by FtM, Dec 31, 2011.

  1. FtM

    FtM Guest

    [xpost comp.lang.c, comp.unix.programmer, comp.os.ms-
    windows.programmer.win32]
    Hello NG,
    I have a very specific question, but first let me describe what I'm
    facing, as I'm open to alternative general solutions.
    I'm refactoring some old C code that deals with very basic data
    structures (lists, priority queues, hashtables and so on) and that
    aims to be very portable. Regarding memory management, I have an
    internal wrapper that, on *nix and Windows systems, maps the malloc/
    calloc/realloc/free functions one-to-one to the default ones, but
    it's possible to switch to a custom allocator that works pretty much
    like a standard one (a big chunk of memory split and joined, intended
    for embedded systems).
    I would like to implement my custom allocator for the *nix and
    Windows systems too (of course losing some portability for a
    performance gain), and maybe implement something like the C++
    allocators to improve the memory management of the different
    algorithms (rare large allocations for vectors, frequent small
    allocations for lists), but I have some questions:
    - On these systems, to get decent performance, I was thinking of
    allocating page-sized (and page-aligned) chunks so that small
    allocations land in the same page. How can I request memory with
    these characteristics? Will it help the OS memory manager?
    - Does all of this make any sense? :)
    Thanks for any suggestion!
    Ciao!

    P.S. I don't really need a "performance boost", but I'm dealing with
    a general, low-level, already-used library, and as long as I'm
    refactoring the code I'd like to do it the best possible way ;-)
    FtM, Dec 31, 2011
    #1

  2. Eric Sosman

    Eric Sosman Guest

    On 12/31/2011 9:56 AM, FtM wrote:
    > [xpost comp.lang.c, comp.unix.programmer, comp.os.ms-
    > windows.programmer.win32]
    > Hello NG,
    > I have a very specific question, but first let me describe what I'm
    > facing, as I'm open to alternative general solutions.
    > I'm refactoring some old C code that deals with very basic data
    > structures (lists, priority queues, hashtables and so on) and that
    > aims to be very portable. Regarding memory management, I have an
    > internal wrapper that, on *nix and Windows systems, maps the malloc/
    > calloc/realloc/free functions one-to-one to the default ones, but
    > it's possible to switch to a custom allocator that works pretty much
    > like a standard one (a big chunk of memory split and joined,
    > intended for embedded systems).


    This is frequently done, usually either to use an alternate memory
    manager with extra debugging features or to substitute a manager whose
    performance characteristics better match the needs of the program at
    hand. However, the C language does not guarantee that it's possible to
    do such substitution; "the library" is regarded as monolithic, and not
    as something that can be replaced piecemeal. This allows the library
    implementor to use internal and unadvertised features, "back doors" if
    you like. (For example, exit() needs a "back door" to find and close
    all the still-open FILE* streams.) It is at least possible that some
    implementation's internals involve "back doors" to the memory manager,
    so if you substituted a memory manager that implemented only the public
    operations you'd break the library in strange ways.

    That's mostly a "theoretical" consideration, though, perhaps since
    replacing the memory manager is such an incredibly useful thing to want
    to do. Still, keep in mind that it's not guaranteed to work.

    > I would like to implement my custom allocator for the *nix and
    > Windows systems too (of course losing some portability for a
    > performance gain), and maybe implement something like the C++
    > allocators to improve the memory management of the different
    > algorithms (rare large allocations for vectors, frequent small
    > allocations for lists), but I have some questions:
    > - On these systems, to get decent performance, I was thinking of
    > allocating page-sized (and page-aligned) chunks so that small
    > allocations land in the same page. How can I request memory with
    > these characteristics? Will it help the OS memory manager?
    > - Does all of this make any sense? :)


    Not much, unless you are fond of re-inventing well-rounded wheels
    for your own amusement. The things you mention are already common
    practice in many Unixoid allocators; I have no specific knowledge of
    Windows' allocators but there's no reason to think Redmond is unaware
    of the state(s) of the art(s).

    > P.S. I don't really need a "performance boost", but I'm dealing
    > with a general, low-level, already-used library, and as long as I'm
    > refactoring the code I'd like to do it the best possible way ;-)


    For a "general low-level" library I think you'll be hard-pressed
    just to match the performance of the O/S' own allocators, much less
    improve on them. Things would be different for a library tuned to
    the specific needs of a particular program and making use of program-
    specific knowledge: Which allocations will be short- or long-lived,
    which will be used together and would benefit from sharing a single
    page, which data items are allocated and freed so frequently that they
    merit their own cache, ... Large gains can (sometimes) be had by
    exploiting knowledge of this kind, but a "general low-level" library
    really can't have that kind of knowledge.

    If you stay on the "general low-level" plane, you're just matching
    wits with people who are paid to devote a lot of thought to aspects of
    this particular problem -- which is why I say you're unlikely even to
    do as well as they've already done. More engineer-decades have gone
    into the existing allocators than you're likely to be able to devote
    to replacing them. There are probably better targets for your effort.

    --
    Eric Sosman
    Eric Sosman, Dec 31, 2011
    #2

  3. FtM

    FtM Guest

    Re: Memory management and custom allocator

    On Dec 31, 5:08 pm, Eric Sosman <> wrote:
    > On 12/31/2011 9:56 AM, FtM wrote:
    >
    > > [xpost comp.lang.c, comp.unix.programmer, comp.os.ms-
    > > windows.programmer.win32]
    > > Hello NG,
    > > I have a very specific question, but first let me describe what
    > > I'm facing, as I'm open to alternative general solutions.
    > > I'm refactoring some old C code that deals with very basic data
    > > structures (lists, priority queues, hashtables and so on) and
    > > that aims to be very portable. Regarding memory management, I
    > > have an internal wrapper that, on *nix and Windows systems, maps
    > > the malloc/calloc/realloc/free functions one-to-one to the
    > > default ones, but it's possible to switch to a custom allocator
    > > that works pretty much like a standard one (a big chunk of memory
    > > split and joined, intended for embedded systems).

    >
    >      This is frequently done, usually either to use an alternate memory
    > manager with extra debugging features or to substitute a manager whose
    > performance characteristics better match the needs of the program at
    > hand.  However, the C language does not guarantee that it's possible to
    > do such substitution; "the library" is regarded as monolithic, and not
    > as something that can be replaced piecemeal.  This allows the library
    > implementor to use internal and unadvertised features, "back doors" if
    > you like.  (For example, exit() needs a "back door" to find and close
    > all the still-open FILE* streams.)  It is at least possible that some
    > implementation's internals involve "back doors" to the memory manager,
    > so if you substituted a memory manager that implemented only the public
    > operations you'd break the library in strange ways.
    >
    >      That's mostly a "theoretical" consideration, though, perhaps since
    > replacing the memory manager is such an incredibly useful thing to want
    > to do.  Still, keep in mind that it's not guaranteed to work.
    >


    You are perfectly right; I optimistically hope that if something in
    the memory allocation functions breaks I'll notice it quickly and
    fix it (by replacing my allocator with the wrapper) :) Anyway, the
    "custom" allocator is written for those simple systems that don't
    have a dynamic memory allocator at all (and they are probably
    disappearing from the market).

    > > I would like to implement my custom allocator for the *nix and
    > > Windows systems too (of course losing some portability for a
    > > performance gain), and maybe implement something like the C++
    > > allocators to improve the memory management of the different
    > > algorithms (rare large allocations for vectors, frequent small
    > > allocations for lists), but I have some questions:
    > > - On these systems, to get decent performance, I was thinking of
    > > allocating page-sized (and page-aligned) chunks so that small
    > > allocations land in the same page. How can I request memory with
    > > these characteristics? Will it help the OS memory manager?
    > > - Does all of this make any sense? :)

    >
    >      Not much, unless you are fond of re-inventing well-rounded wheels
    > for your own amusement.  The things you mention are already common
    > practice in many Unixoid allocators; I have no specific knowledge of
    > Windows' allocators but there's no reason to think Redmond is unaware
    > of the state(s) of the art(s).
    >
    > > P.S. I don't really need a "performance boost", but I'm dealing
    > > with a general, low-level, already-used library, and as long as
    > > I'm refactoring the code I'd like to do it the best possible
    > > way ;-)

    >
    >      For a "general low-level" library I think you'll be hard-pressed
    > just to match the performance of the O/S' own allocators, much less
    > improve on them.  Things would be different for a library tuned to
    > the specific needs of a particular program and making use of program-
    > specific knowledge: Which allocations will be short- or long-lived,
    > which will be used together and would benefit from sharing a single
    > page, which data items are allocated and freed so frequently that they
    > merit their own cache, ...  Large gains can (sometimes) be had by
    > exploiting knowledge of this kind, but a "general low-level" library
    > really can't have that kind of knowledge.
    >
    >      If you stay on the "general low-level" plane, you're just matching
    > wits with people who are paid to devote a lot of thought to aspects of
    > this particular problem -- which is why I say you're unlikely even to
    > do as well as they've already done.  More engineer-decades have gone
    > into the existing allocators than you're likely to be able to devote
    > to replacing them.  There are probably better targets for your effort.
    >


    You exactly identified the origin of my doubts: if I go the way I
    described, I'd like to give the library user the possibility to
    switch between (or reuse) more than one allocator, just as you can
    switch allocators in C++. I know that I can't beat the general
    standard one, but using an optimized (maybe third-party?) allocator
    only on some occasions and the system's one otherwise seems
    reasonable to me.. doesn't it?
    I'm asking here because I know it's a common problem that many of
    you have faced and solved ;-)
    Ciao!
    FtM, Dec 31, 2011
    #3
  4. FtM wrote:
    > [xpost comp.lang.c, comp.unix.programmer, comp.os.ms-
    > windows.programmer.win32]
    > Hello NG,
    > I have a very specific question, but first let me describe what I'm
    > facing, as I'm open to alternative general solutions.
    > I'm refactoring some old C code that deals with very basic data
    > structures (lists, priority queues, hashtables and so on) and that
    > aims to be very portable. Regarding memory management, I have an
    > internal wrapper that, on *nix and Windows systems, maps the malloc/
    > calloc/realloc/free functions one-to-one to the default ones, but
    > it's possible to switch to a custom allocator that works pretty much
    > like a standard one (a big chunk of memory split and joined,
    > intended for embedded systems).
    > I would like to implement my custom allocator for the *nix and
    > Windows systems too (of course losing some portability for a
    > performance gain), and maybe implement something like the C++
    > allocators to improve the memory management of the different
    > algorithms (rare large allocations for vectors, frequent small
    > allocations for lists), but I have some questions:


    AFAIK the allocator of GNU/Linux libc already does this.
    man mallopt will give you some information about its behavior.
    Or read the source code.
    No need to reinvent the wheel.

    > - On these systems, to get decent performance, I was thinking of
    > allocating page-sized (and page-aligned) chunks so that small
    > allocations land in the same page. How can I request memory with
    > these characteristics? Will it help the OS memory manager?


    linux libc uses sbrk and mmap... and possibly other syscalls

    regards,
    JK
    Johann Klammer, Dec 31, 2011
    #4
  5. FtM

    FtM Guest

    Re: Memory management and custom allocator

    On Dec 31, 5:39 pm, Johann Klammer <> wrote:
    > FtM wrote:
    > > I would like to implement my custom allocator for the *nix and
    > > Windows systems too (of course losing some portability for a
    > > performance gain), and maybe implement something like the C++
    > > allocators to improve the memory management of the different
    > > algorithms (rare large allocations for vectors, frequent small
    > > allocations for lists), but I have some questions:

    >
    > AFAIK the allocator of GNU/Linux libc already does this.
    > man mallopt will give you some information about it's behavior.


    I've already looked into mallopt but it doesn't seem fine-grained
    enough for my needs... Do you know of some code or a project that
    uses it, so I can take a look?

    > Or read the source code.
    > No need to reinvent the wheel.
    >


    That's the reason for asking before coding ;)

    > > - On these systems, to get decent performance, I was thinking of
    > > allocating page-sized (and page-aligned) chunks so that small
    > > allocations land in the same page. How can I request memory with
    > > these characteristics? Will it help the OS memory manager?

    >
    > linux libc uses sbrk and mmap... and possibly other syscalls
    >


    Sure, sbrk() is the way to go for a low-level allocator, but the
    question of how I can get a page-aligned segment remains: you can't
    be sure of the OS's VA masking mechanism (or can you?). Anyway, I
    could proceed by reading the code of a specific kernel, but that
    would be useless if there isn't a sufficiently shared "philosophy"
    underneath (read as: "a specific syscall").
    I already use mmap() for persistent b-trees, but does it make sense
    to use it without an underlying physical file?
    Ciao!
    FtM, Dec 31, 2011
    #5
  6. Kaz Kylheku

    Kaz Kylheku Guest

    On 2011-12-31, FtM <> wrote:
    > I would like to implement my custom allocator for the *nix and
    > Windows systems too (of course losing some portability for a
    > performance gain), and maybe implement something like the C++
    > allocators to improve the memory management of the different
    > algorithms (rare large allocations for vectors, frequent small
    > allocations for lists), but


    What makes you think that your malloc doesn't already handle
    such cases?

    Say, have you profiled the code to see what fraction of the time is actually
    spent executing the allocator code?

    Can you answer the question: how much faster will the program be if
    a magic allocator is substituted which requires zero cycles to execute?

    > I have some questions:
    > - On these systems, to get decent performance, I was thinking of
    > allocating page-sized (and page-aligned) chunks so that small
    > allocations land in the same page. How can I request memory with
    > these characteristics? Will it help the OS memory manager?


    Not knowing these basics is an indicator that you need to study more if you
    want to write memory allocators that beat malloc.
    Kaz Kylheku, Dec 31, 2011
    #6
  7. FtM

    FtM Guest

    Re: Memory management and custom allocator

    On Dec 31, 6:24 pm, Kaz Kylheku <> wrote:
    > On 2011-12-31, FtM <> wrote:
    >
    > > I would like to implement my custom allocator for the *nix and
    > > Windows systems too (of course losing some portability for a
    > > performance gain), and maybe implement something like the C++
    > > allocators to improve the memory management of the different
    > > algorithms (rare large allocations for vectors, frequent small
    > > allocations for lists), but

    >
    > What makes you think that your malloc doesn't already handle
    > such cases?
    >


    Simply because it can't know whether I'm going to do 200 little
    allocations or 1 big one. Even if it has the neatest heuristic ever,
    the libc (or the kernel) exposing a simple specific function/syscall
    would do the job far better. Part of the question was about this
    hypothetical function.

    > Say, have you profiled the code to see what fraction of the time is actually
    > spent executing the allocator code?
    >
    > Can you answer the question: how much faster will the program be if
    > a magic allocator is substituted which requires zero cycles to execute?
    >


    Let's see... So why do the C++ STLs (or boost, for another example) do
    exactly what I'm asking here? :)

    > > I have some questions:
    > > - On these systems, to get decent performance, I was thinking of
    > > allocating page-sized (and page-aligned) chunks so that small
    > > allocations land in the same page. How can I request memory with
    > > these characteristics? Will it help the OS memory manager?

    >
    > Not knowing these basics is an indicator that you need to study more if you
    > want to write memory allocators that beat malloc.


    I already said that I don't want to "beat malloc", just to find a
    (probably platform-specific) way to facilitate its job.
    Ciao!

    P.S. I'm sorry if I look so.. childish(?).. to you but, believe me,
    I've written physical/virtual memory allocators for many simpler
    systems, I know how systems work behind the scenes, and I really
    don't want to start writing something like that: I'm just asking how
    to deal with the two specific environments above while masking the
    differences between the two and maintaining a standard interface. I
    know that it's hard and that someone already did it: I'm asking
    *how* he did that (and maybe where!) :)
    FtM, Dec 31, 2011
    #7
  8. Re: Memory management and custom allocator

    FtM wrote:
    ....
    > I already use mmap() for persistent b-trees, but does it make sense to
    > use it without an underlying physical file?

    ....
    RTFM: MAP_ANONYMOUS

    However, the number of mmaps per process is limited on linux.
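
    A minimal sketch of that MAP_ANONYMOUS route (assuming Linux or
    another *nix; chunk_alloc/chunk_free are illustrative names, not
    library calls):

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Anonymous mapping: page-aligned memory straight from the kernel,
 * with no backing file. mmap() returns page-aligned addresses, so
 * this also covers the alignment question upthread. */
static void *chunk_alloc(size_t n)
{
    void *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

static void chunk_free(void *p, size_t n)
{
    if (p != NULL)
        munmap(p, n);   /* length must cover the region to release */
}
```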

    JK
    Johann Klammer, Dec 31, 2011
    #8
  9. Kaz Kylheku

    Kaz Kylheku Guest

    Re: Memory management and custom allocator

    On 2011-12-31, Johann Klammer <> wrote:
    > FtM wrote:
    > ...
    >> I already use mmap() for persistent b-trees, but does it make sense to
    >> use it without an underlying physical file?

    > ...
    > RTFM: MAP_ANONYMOUS
    >
    > However, the number of mmaps per process is limited on linux.


    That is true, but now for some qualifying cluesticks:

    - malloc uses mmap when sbrk runs into a collision, and also for
    thread arenas and large allocations. So you don't avoid mmap by using
    malloc (though you avoid making lots of little mmap requests).

    - brk/sbrk are based on mmap! so there is no escaping from mmap that
    way. So if you make lots of little page-sized brk increments, can
    you hit the limit? It would seem so, but ...

    - ... luckily, mmap will coalesce adjacent anonymous mappings which have
    compatible flags so you will only run into the limit if you make a large
    number of maps which do not touch or that do touch but have incompatible
    properties.

    - The default limit is only 65536 but in the light of the above, it's
    maybe not so terrible.

    - If you still have a problem, you can easily tweak the maximum
    count: become root and then:

    sysctl -w vm.max_map_count=<your-desired-number>

    better yet, record it in /etc/sysctl.conf and run sysctl -p.
    Kaz Kylheku, Dec 31, 2011
    #9
  10. Eric Sosman

    Eric Sosman Guest

    Re: Memory management and custom allocator

    On 12/31/2011 12:52 PM, FtM wrote:
    > On Dec 31, 6:24 pm, Kaz Kylheku<> wrote:
    >> On 2011-12-31, FtM<> wrote:
    >>
    >>> I would like to implement my custom allocator for the *nix and
    >>> Windows systems too (of course losing some portability for a
    >>> performance gain), and maybe implement something like the C++
    >>> allocators to improve the memory management of the different
    >>> algorithms (rare large allocations for vectors, frequent small
    >>> allocations for lists), but

    >>
    >> What makes you think that your malloc doesn't already handle
    >> such cases?
    >>

    >
    > Simply because it can't know whether I'm going to do 200 little
    > allocations or 1 big one. Even if it has the neatest heuristic
    > ever, the libc (or the kernel) exposing a simple specific
    > function/syscall would do the job far better. Part of the question
    > was about this hypothetical function.


    Why do you think the kernel predicts your program's future behavior
    better than malloc() does?

    > I already said that I don't want to "beat malloc", just to find a
    > (probably platform-specific) way to facilitate its job.


    Both the kernel and malloc() have a hard time predicting your
    program's behavior, but I'll risk a prediction for the New Year:

    You will not invent a "general low-level" replacement that
    outperforms malloc() et al.

    > [...] I'm just
    > asking how to deal with the two specific environments above while
    > masking the differences between the two and maintaining a standard
    > interface.


    Easy: Use malloc() et al., right out of the box. Platform
    specifics are already masked, system-specific dodges and tricks are
    already employed, code is already tested and debugged to a degree
    that you cannot possibly replicate.

    There! Wasn't that easy?

    --
    Eric Sosman
    Eric Sosman, Dec 31, 2011
    #10
  11. Re: Memory management and custom allocator

    On 31.12.2011 18:52, FtM wrote:
    > On Dec 31, 6:24 pm, Kaz Kylheku <> wrote:
    >> On 2011-12-31, FtM <> wrote:
    >>
    >>> I would like to implement my custom allocator for the *nix and
    >>> Windows systems too (of course losing some portability for a
    >>> performance gain), and maybe implement something like the C++
    >>> allocators to improve the memory management of the different
    >>> algorithms (rare large allocations for vectors, frequent small
    >>> allocations for lists), but

    >>
    >> What makes you think that your malloc doesn't already handle
    >> such cases?
    >>

    >
    > Simply because it can't know whether I'm going to do 200 little
    > allocations or 1 big one. Even if it has the neatest heuristic
    > ever, the libc (or the kernel) exposing a simple specific
    > function/syscall would do the job far better. Part of the question
    > was about this hypothetical function.
    >


    malloc() is already optimized for exactly those cases, as they are
    by far the most common ones: either you want a storage area whose
    size is determined at run time, in which case you will probably make
    one big allocation, or you are using fancy data structures other
    than dynamic arrays (that includes, but is not limited to, linked
    lists, trees and hashes), in which case chances are you are
    allocating each node separately, so you get your n+1 little
    allocations.

    Having seen the glibc, eglibc, dietlibc and uclibc allocators, I already
    saw that those are the preferred modes of operation. The dietlibc
    allocator is especially easy to describe:

    For one, it _never_ uses brk(). That syscall is just plain old and does
    nothing mmap() couldn't do better. Then it has linked lists prepared for
    objects of sizes of powers of two, starting at 16 and stopping at 2048.
    It rounds the sum of the object header size and the object size to the
    next power of two (using some nifty eye-watering bit-mangling) and puts
    it in such a linked list. If the list is too small, it is expanded using
    mmap() or mremap().

    If the object is bigger than 2048 bytes it gets its own page(s)!
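
    The size-class rounding described above can be sketched like this
    (my reconstruction of the scheme, not dietlibc's actual code):

```c
#include <stddef.h>

/* Round a request up to the next power-of-two size class in the
 * 16..2048 range described above; anything larger falls through to
 * the page-granular path, signalled here by returning 0. */
static size_t size_class(size_t n)
{
    size_t c = 16;

    if (n > 2048)
        return 0;           /* gets its own page(s) instead */
    while (c < n)
        c <<= 1;
    return c;
}
```

    The real allocator uses bit tricks instead of a loop, but the
    mapping is the same: 17 bytes land in the 32-byte list, and so on.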

    >> Say, have you profiled the code to see what fraction of the time is actually
    >> spent executing the allocator code?
    >>
    >> Can you answer the question: how much faster will the program be if
    >> a magic allocator is substituted which requires zero cycles to execute?
    >>

    >
    > Let's see... So why do the C++ STLs (or boost, for another example) do
    > exactly what I'm asking here? :)
    >


    Hmmm... I just had a look at the libstdc++ deployed with g++ 4.5.0 and I
    saw that at least there, operator new calls malloc(). I also looked at
    STLport and saw the same.

    Also, you didn't answer the questions. Those are actually pretty good ones!

    >>> I have some questions:
    >>> - On these systems, to get decent performance, I was thinking of
    >>> allocating page-sized (and page-aligned) chunks so that small
    >>> allocations land in the same page. How can I request memory with
    >>> these characteristics? Will it help the OS memory manager?

    >>
    >> Not knowing these basics is an indicator that you need to study more if you
    >> want to write memory allocators that beat malloc.

    >
    > I already said that I don't want to "beat malloc", just to find a
    > (probably platform-specific) way to facilitate its job.
    > Ciao!
    >


    You are not making sense. You don't want to beat malloc but you still
    want to write a malloc replacement? You'd willingly choose the inferior
    implementation?

    > P.S. I'm sorry if I look so.. childish(?).. to you but, believe
    > me, I've written physical/virtual memory allocators for many
    > simpler systems,


    That's an entirely different can of worms. Believe me, the memory
    management that just uses an OS is far easier than the one the OS has to
    maintain.

    > I know how systems work behind the scenes, and I
    > really don't want to start writing something like that: I'm just
    > asking how to deal with the two specific environments above while
    > masking the differences between the two and maintaining a standard
    > interface. I know that it's hard and that someone already did it: I'm
    > asking *how* he did that (and maybe where!) :)


    You could look at the libc source codes. glibc, eglibc, dietlibc and
    uclibc are a good place to start for Linux. For Windows I'd recommend
    cygwin.

    If you are interested in the syscalls that allocate memory: For *nix
    that's mmap() with MAP_ANONYMOUS or, failing that, /dev/zero. Failing
    _that_, you could try to map a newly created and truncated file. Try to
    stay away from brk(), it's marked obsolete.
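
    For the page-aligned chunks asked about at the top of the thread,
    the standard interface on such systems is posix_memalign(). A
    minimal sketch (assuming POSIX; alloc_page is an illustrative name,
    not a library call):

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <unistd.h>

/* Request one page-sized, page-aligned chunk: posix_memalign()
 * guarantees the alignment, sysconf() reports the page size at run
 * time. Returns NULL on failure; release the result with free(). */
static void *alloc_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = NULL;

    if (page <= 0)
        return NULL;
    if (posix_memalign(&p, (size_t)page, (size_t)page) != 0)
        return NULL;
    return p;
}
```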

    For Windows, have a look at HeapAlloc(), VirtualAlloc() and
    LocalAlloc(). BTW: The MSVC implementation somehow manages to make
    access to _one_ byte after the allocated buffer fail with a crash. Good
    job, as it makes finding off-by-one errors easier.

    But I still don't see your point. malloc() is meant to be portable!
    The only thing you might want to guard against is malloc(0), as
    that's implementation-defined. Several allocators return NULL for
    that; several others do not: they just present you with an object
    you can't even access the first byte of.

    HTH,
    Markus
    Markus Wichmann, Jan 1, 2012
    #11
  12. BartC

    BartC Guest

    Re: Memory management and custom allocator

    "Markus Wichmann" <> wrote in message
    news:...
    > On 31.12.2011 18:52, FtM wrote:


    >> Simply because it can't know if I'm going to do 200 little allocations
    >> or 1 big one.


    > For one, it _never_ uses brk(). That syscall is just plain old and does
    > nothing mmap() couldn't do better. Then it has linked lists prepared for
    > objects of sizes of powers of two, starting at 16 and stopping at 2048.


    That's exactly what I do in my own allocator! Except I stop at 1024.
    Although I probably manage them very differently; for example, once
    a small block size is allocated, say 64 bytes, it always stays that
    size.

    And one thing about malloc() I don't like is the overhead of having
    to remember the block sizes; if I allocate 16 bytes, malloc() seems
    to reserve either 20 or, more often, 24 bytes on the few compilers
    I've tried. That's a

    Also if you're being sensible and trying to keep to power-of-two
    allocations, this 4- or 8-byte overhead is going to screw that up.

    My allocators have never stored a block size, and it has never
    really been a problem. Either you already know the size (a struct,
    for example), or you need to keep track of it yourself, in which
    case it is very handy, when you have an allocation that can grow,
    never to have to keep running back to

    As an illustration, take this simple example of a string growing from one to
    ten million characters:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *p;
        int i, len;

        p = malloc(1);
        *p = 0;
        len = 0;

        for (i = 1; i <= 10000000; ++i) {
            ++len;
            p = realloc(p, len + 1);
            p[len - 1] = '*';
            p[len] = 0;
            /* puts(p); */
        }
        printf("Length= %d\n", (int)strlen(p)); /* strlen() returns size_t */
        free(p);
        return 0;
    }

    On three C compilers, this took about 12 seconds (on DMC, any length of
    string resulted in a runtime of 0.4 seconds; a bit odd).

    Using my strategy, of avoiding any calls to malloc or realloc unless
    absolutely necessary, and using that inside an interpreter, this bit of
    script took about 0.1 seconds:

    string a

    a:=""
    to 10 million do
    a+:="*"
    od
    println "Length=",a.len

    Of course, real C code wouldn't use such a naive strategy; it would also
    seek to minimise malloc()/realloc() calls, but that's also an admission that
    these calls have some overheads which it would be desirable to avoid. Hence
    I can understand the OP's attempt to experiment with his own versions.
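
    For comparison, the usual C counterpart of that script grows the
    buffer geometrically, so realloc() runs O(log n) times instead of
    once per character. A sketch; struct buf and buf_push are
    illustrative names, not from any library:

```c
#include <stdlib.h>

/* Grow-by-doubling string buffer: capacity doubles when exhausted,
 * so appending n characters performs O(log n) realloc() calls
 * instead of n of them. */
struct buf { char *s; size_t len, cap; };

static int buf_push(struct buf *b, char ch)
{
    if (b->len + 2 > b->cap) {              /* room for ch plus NUL */
        size_t ncap = b->cap ? b->cap * 2 : 16;
        char *ns = realloc(b->s, ncap);

        if (ns == NULL)
            return -1;
        b->s = ns;
        b->cap = ncap;
    }
    b->s[b->len++] = ch;
    b->s[b->len] = '\0';
    return 0;
}
```

    Appending ten million characters this way needs about twenty
    realloc() calls rather than ten million.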

    --
    Bartc
    BartC, Jan 1, 2012
    #12
  13. Eric Sosman

    Eric Sosman Guest

    Re: Memory management and custom allocator

    On 1/1/2012 7:54 AM, BartC wrote:
    > "Markus Wichmann" <> wrote in message
    > news:...
    >> On 31.12.2011 18:52, FtM wrote:

    >
    >>> Simply because it can't know if I'm going to do 200 little allocations
    >>> or 1 big one.

    >
    >> For one, it _never_ uses brk(). That syscall is just plain old and does
    >> nothing mmap() couldn't do better. Then it has linked lists prepared for
    >> objects of sizes of powers of two, starting at 16 and stopping at 2048.

    >
    > That exactly what I do in my own allocator! Except I stop at 1024.
    > Although I probably manage them very differently, for example once a
    > small block size is allocated, say 64 bytes, it always stays that size.


    Sounds like a recipe for internal fragmentation, but YMMV.

    > And one thing about malloc() I don't like, are the overheads in having
    > to remember the block sizes; if I allocate 16 bytes, malloc() seems to
    > reserve either 20 or, more often, 24 bytes on the few compilers I've
    > tried. That's a 50% overhead!


    It's only significant if you have many such small allocations.
    Even if so, a malloc() implementation can find other ways to remember
    the size than by storing it explicitly. For example, malloc() might
    set aside regions of memory dedicated to "small" allocations: If a
    pointer aims at an address anywhere in [A1,A1+N1), [A2,A2+N2), ... it
    could simply be "known" to point at a 32-byte block. (The test can't
    be done efficiently in portable C, but a malloc() implementation need
    not be portable, nor written in C.)

    > Also if you're being sensible and trying to keep to power-of-two
    > allocations, this 4- or 8-byte overhead is going to screw that up.


    What's "sensible" about allocating in powers of two? Why should
    you allocate 128 bytes to hold a line of input, rather than 100 or
    80 or 250? Programmers are numerologically superstitious.

    > My allocators have never stored a block size, and it has never really
    > been a problem. Either you will know the size (a struct for example), or
    > you need to keep track of it yourself, in which case it is very handy
    > when you have an allocation that can grow in never having to keep
    > running back to realloc().


    If your allocators never store block sizes, it follows that free()
    cannot recycle a released block for re-use by a subsequent malloc(),
    because it does not know whether the freed block is big enough to
    satisfy the malloc() request. (Re-computing the size by examining the
    addresses of nearby busy and free blocks counts as "storing the size"
    in my book, just encoding it deviously.)

    An allocator with a different API -- one whose free()-substitute
    takes the block size as a parameter -- could recycle blocks. But now
    you're no longer talking about the O.P.'s drop-in replacement for the
    API as it stands.

    > As an illustration, take this simple example of a string growing from
    > one to ten million characters:
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    > #include <string.h>
    >
    > int main(void)
    > {
    > char *p;
    > int i,len;
    >
    > p=malloc(1);
    > *p=0;
    > len=0;
    >
    > for (i=1; i<=10000000; ++i) {
    > ++len;
    > p=realloc(p,len+1);


    Side note: Never call realloc() this way in real code, because if
    it returns NULL you'll have lost your only pointer to the original
    block; you won't even be able to free() it any more.

    > p[len-1]='*';
    > p[len]=0;
    > // puts(p);
    > }


    With an allocator that does not store the block size, this will
    necessarily take a long time, for several reasons:

    - Since realloc() does not know how long the block at `p' is, it
    cannot tell whether there's any slop at the end that the block
    could grow into without moving. Therefore, every realloc() call
    must allocate a new block and copy the old contents into it.

    - Since realloc() does not know how long the old block is, it
    must copy `len+1' bytes from the old block to the new, even
    though that's more than the old block holds. (We'll assume a
    non-portable implementation can get away with copying the non-
    existent bytes.) Total amount copied: 2+...+10000001 bytes;
    that's fifty terabytes.

    - Since realloc() does not know how long the old block was, it
    cannot re-use that memory to satisfy subsequent requests. Thus,
    every call allocates a brand-new block and makes the old block
    unavailable. Total memory allocated: 1+2+...+10000001 bytes.
    50TB to store a 10MB string is an overhead of fifty million
    percent -- and you groused about a mere fifty???!

    --
    Eric Sosman
    Eric Sosman, Jan 1, 2012
    #13
  15. FtM

    BartC Guest

    Re: Memory management and custom allocator

    "Eric Sosman" <> wrote in message
    news:jdpp3r$8h3$...
    > On 1/1/2012 7:54 AM, BartC wrote:


    >> That exactly what I do in my own allocator! Except I stop at 1024.
    >> Although I probably manage them very differently, for example once a
    >> small block size is allocated, say 64 bytes, it always stays that size.

    >
    > Sounds like a recipe for internal fragmentation, but YMMV.


    That's what I once thought. I've used a similar scheme for years (blocks
    could get smaller but not bigger), and originally prepared a defragmentation
    algorithm, but I never needed it. (This was for managing many small
    allocations in a script language.) It was self-sustaining.

    >> Also if you're being sensible and trying to keep to power-of-two
    >> allocations, this 4- or 8-byte overhead is going to screw that up.

    >
    > What's "sensible" about allocating in powers of two? Why should
    > you allocate 128 bytes to hold a line of input, rather than 100 or
    > 80 or 250? Programmers are numerologically superstitious.


    More like habit. But, if you allocate 4096 bytes thinking it will fit
    neatly into one page of virtual memory, you will really need 4100 or 4104;
    not quite as tidy. (This is an issue with my allocator, which does request
    power-of-two sizes and is implemented on top of malloc().)

    >> My allocators have never stored a block size, and it has never really
    >> been a problem.


    > An allocator with a different API -- one whose free()-substitute
    > takes the block size as a parameter -- could recycle blocks. But now
    > you're no longer talking about the O.P.'s drop-in replacement for the
    > API as it stands.


    Yes, that's what I used; free() takes an extra parameter.

    >> p=realloc(p,len+1);


    > With an allocator that does not store the block size, this will
    > necessarily take a long time, for several reasons:


    > - Since realloc() does not know how long the block at `p' is, it

    <snip>

    You're now talking about a hypothetical realloc() that really has no idea
    what the original block size was?

    That wouldn't work anyway, as it would have no idea whether the block is
    expanding or contracting and there would be other issues such as those you
    mention.

    Such a function *has* to be able to determine the original size, whether by
    storing it as malloc/realloc presumably does, by deducing it, or by simply
    being told. (Actually my scheme doesn't even have a realloc()-type function;
    but with block sizes known, it would be trivial to write.)

    --
    Bartc
    BartC, Jan 1, 2012
    #15
  16. FtM

    FtM Guest

    Re: Memory management and custom allocator

    On Jan 2, 1:54 pm, "BartC" <> wrote:
    > "Markus Wichmann" <> wrote in message
    >
    > news:...
    >
    > > On 31.12.2011 18:52, FtM wrote:
    > >> Simply because it can't know if I'm going to do 200 little allocations
    > >> or 1 big one.

    > > For one, it _never_ uses brk(). That syscall is just plain old and does
    > > nothing mmap() couldn't do better. Then it has linked lists prepared for
    > > objects of sizes of powers of two, starting at 16 and stopping at 2048.

    >
    > That exactly what I do in my own allocator! Except I stop at 1024. Although
    > I probably manage them very differently, for example once a small block size
    > is allocated, say 64 bytes, it always stays that size.
    >


    Ok, thank you all for the suggestions (which basically are "use malloc
    and don't bother"). I know that the allocation functions are great as
    they are: I'm using them right now and I have no problems at all! I
    was only trying to extend the code I already have for the embedded
    part, that's all. What's more, I already have some handlers that help
    me debug the algorithms..

    Anyway, regarding the allocator descriptions that Markus and BartC
    gave, that's what I was thinking and partially already doing, let's
    see if this makes more sense to you:
    - first of all, I'm dealing with memory allocation only *inside* my
    library, so I don't need to build a general-purpose allocator.
    - inside, I'll have three different "objects": the first one is for
    small allocations, the second one for the rest, and the third one for
    the medium/big allocations that will possibly need to realloc(). The
    third allocator follows the "normal" strategy (and could simply use
    the system calls for that matter), while the first and the second one
    could simply have some preallocated blocks and a vector marking whether
    each is used or not, skipping the bookkeeping overhead of a general-
    purpose allocator.

    What do you think about this approach (despite the fact that in the
    end it probably gives no noticeable advantages)?
    Ciao!
    FtM, Jan 1, 2012
    #16
  17. FtM

    Joe keane Guest

    In article <>,
    FtM <> wrote:
    >- Does all of this make any sense? :)


    IMHE this comes up when

    a) you are not pleased with the default functions' time or space
    overhead
    b) you like to control things
    c) you want the 'memory' in memory-mapped files
    d) you want to use shared memory
    Joe keane, Jan 1, 2012
    #17
  18. FtM

    Joe keane Guest

    Re: Memory management and custom allocator

    In article <jdpp3r$8h3$>,
    Eric Sosman <> wrote:
    >What's "sensible" about allocating in powers of two?


    A) VM pages are powers of two, so if you have non-two objects you waste
    space or take more VM pages for each object than you would like

    B) cache lines are powers of two, so if you have non-two objects you
    waste space or take more cache lines for each object than you would like

    >Programmers are numerologically superstitious.


    i am numerologically superstitious, but you can also file this into the
    'maybe someone has figured out something that we don't understand'
    category
    Joe keane, Jan 1, 2012
    #18
  19. FtM

    BartC Guest

    Re: Memory management and custom allocator

    "BartC" <> wrote in message
    news:LPYLq.355$2...

    [growing a string from empty, to 10 million chars]
    > char *p;
    > int i,len;
    >
    > p=malloc(1);
    > *p=0;
    > len=0;
    >
    > for (i=1; i<=10000000; ++i) {
    > ++len;
    > p=realloc(p,len+1);
    > p[len-1]='*';
    > p[len]=0;
    > }


    > On three C compilers, this took about 12 seconds (on DMC, any length of
    > string resulted in a runtime of 0.4 seconds; a bit odd).


    Measuring the number of times the value of p changes, with the slow
    compilers, this was respectively 2335, 2336, and 2337. With DMC, it was just
    46.

    Clearly DMC overallocates more than the others (growing allocations by
    sqrt(2) apparently).

    (My own code, which just uses doubling, will call a memory allocator only
    about 20 times, so doesn't have the overheads of calling a function like
    realloc() ten million times, hence it was quite quick even though it was
    interpreted (but heavily optimised too..))

    So with an application with certain memory usage patterns, one
    implementation can dramatically outperform another. Another reason, even if
    not writing your own allocators, at least to not just use what's provided
    without question.

    --
    Bartc
    BartC, Jan 1, 2012
    #19
  20. FtM

    BartC Guest

    Re: Memory management and custom allocator

    "FtM" <> wrote in message
    news:...

    > - inside, I'll have three different "objects": the first one is for
    > little allocations, the second one for the others, the third one for
    > the medium/big allocations that possibly will need to realloc(). The
    > third allocator follows the "normal" strategy (and could simply use
    > the system calls for that matter), while the first and the second one
    > could simply have some preallocated blocks and a vector to mark if
    > they are used or not, skipping the natural overhead needed by the
    > general purpose allocator.
    >
    > What do you think about this approach (despite the fact that in the
    > end it probably gives no noticeable advantages)?


    I don't think there's enough information here; what sort of sizes are the
    three classes of allocator handling?

    Will any of them always be a fixed size? Will they need deallocating? (As
    sometimes this isn't necessary, or the pool of memory from which they're
    drawn will be freed as one block.) What strategy will be used to grow the
    heap from which the blocks will come; in fact does this heap need to
    grow? Where does the memory come from in the first place? Etc..

    --
    Bartc
    BartC, Jan 1, 2012
    #20
