How to choose buffer size?

Discussion in 'C Programming' started by kj, Jun 4, 2009.

  1. kj

    kj Guest

    In this program I'm writing I want to pre-allocate space for a but
    an indeterminate (but potentially large) number of instances of a
    given structure. And if this amount of memory proves insufficient,
    I'll add another large chunk with realloc. Etc.

    So somewhere I my code I'll have an initial pre-allocation that
    looks like (very schematically):

    size_t size = INITIAL_SIZE;
    foo *pile_o_foo;
    int next_slot = 0;

    pile_o_foo = (foo *)malloc(size);
    /* error check */

    /* ... */

    while (size <= next_slot * sizeof foo) {
    size *= 2;
    pile_o_foo = (foo *)realloc(pile_o_foo, size);
    /* error check */
    }

    /* ... */

    Is there a way to choose a good value for INITIAL_SIZE? Should I
    just pick a number out of a hat? 101? 1001? 5091? Or should I
    pick a "nice power of 2" out of a hat, like 1024 or 4096? Or is
    there a more clueful way to go about this?

    TIA!

    kynnjo
    kj, Jun 4, 2009
    #1
    1. Advertising

  2. Eric Sosman <> writes:
    [...]
    > Counts that produce power-of-two sizes should probably be
    > avoided (unless you have information that the "indeterminate"
    > number is likely to be such a value). The reasoning goes
    >
    > - Many malloc() implementations inflate each allocation by
    > a few bytes, and store some of their own metadata in the
    > extra space, and
    >
    > - Some malloc() implementations use fundamental "block sizes"
    > that are themselves powers of two.
    >
    > If your malloc() has both of these characteristics, a power-of-two
    > request is very nearly the worst thing you can ask for.

    [...]

    Yes, but do any real malloc implementations behave that way?

    I just tried an experiment, calling malloc four times with a size of
    1048576 bytes (1 megabyte, or 1 mebibyte if you want to be picky). On
    three different systems, the resulting addresses differed consistently
    by 2**20 plus a relatively small overhead; on three different systems,
    the overheads were 4096 bytes, 65536 bytes, and 8 bytes. (The systems
    were Ubuntu, Cygwin under Windows XP, and Solaris.)

    And yes, the code that determined the differences was non-portable.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Jun 4, 2009
    #2
    1. Advertising

  3. On 4 Jun 2009 at 18:11, Eric Sosman wrote:
    > Thing *ptr;
    > size_t count = (1024 * 1024 - 16) / sizeof *ptr;
    > ptr = malloc(count * sizeof *ptr);


    16 is a magic number, Eric. How did you choose it?
    Antoninus Twink, Jun 4, 2009
    #3
  4. kj

    cr88192 Guest

    "Keith Thompson" <> wrote in message
    news:...
    > Eric Sosman <> writes:
    > [...]
    >> Counts that produce power-of-two sizes should probably be
    >> avoided (unless you have information that the "indeterminate"
    >> number is likely to be such a value). The reasoning goes
    >>
    >> - Many malloc() implementations inflate each allocation by
    >> a few bytes, and store some of their own metadata in the
    >> extra space, and
    >>
    >> - Some malloc() implementations use fundamental "block sizes"
    >> that are themselves powers of two.
    >>
    >> If your malloc() has both of these characteristics, a power-of-two
    >> request is very nearly the worst thing you can ask for.

    > [...]
    >
    > Yes, but do any real malloc implementations behave that way?
    >
    > I just tried an experiment, calling malloc four times with a size of
    > 1048576 bytes (1 megabyte, or 1 mebibyte if you want to be picky). On
    > three different systems, the resulting addresses differed consistently
    > by 2**20 plus a relatively small overhead; on three different systems,
    > the overheads were 4096 bytes, 65536 bytes, and 8 bytes. (The systems
    > were Ubuntu, Cygwin under Windows XP, and Solaris.)
    >
    > And yes, the code that determined the differences was non-portable.
    >


    but, alas, this is c.l.c where it does not matter so much whether something
    is real so much as it matters people imagining that it is real...

    so, nevermind that most practical implementations would realize that a
    buddy-allocator would suck as a malloc implementation, people can still
    imagine the possibility of a buddy allocator being used as the main
    malloc...


    it is like non-8-bit bytes and amazon-woman style cultures...


    > --
    > Keith Thompson (The_Other_Keith)
    > <http://www.ghoti.net/~kst>
    > Nokia
    > "We must do something. This is something. Therefore, we must do this."
    > -- Antony Jay and Jonathan Lynn, "Yes Minister"
    cr88192, Jun 4, 2009
    #4
  5. kj

    cr88192 Guest

    "Antoninus Twink" <> wrote in message
    news:...
    > On 4 Jun 2009 at 18:11, Eric Sosman wrote:
    >> Thing *ptr;
    >> size_t count = (1024 * 1024 - 16) / sizeof *ptr;
    >> ptr = malloc(count * sizeof *ptr);

    >
    > 16 is a magic number, Eric. How did you choose it?
    >


    well, it doesn't matter, because it is wrong...

    infact, the correct calculation goes more like this:
    42, this is the number that is the root of everything;
    10, this number means completeness.

    42*10=420

    so, yes, 420 is the answer, yep, seems about right...
    cr88192, Jun 4, 2009
    #5
  6. kj

    kj Guest

    In <> Keith Thompson <> writes:

    >I just tried an experiment, calling malloc four times with a size of
    >1048576 bytes (1 megabyte, or 1 mebibyte if you want to be picky). On
    >three different systems, the resulting addresses differed consistently
    >by 2**20 plus a relatively small overhead; on three different systems,
    >the overheads were 4096 bytes, 65536 bytes, and 8 bytes. (The systems
    >were Ubuntu, Cygwin under Windows XP, and Solaris.)


    >And yes, the code that determined the differences was non-portable.


    I'd very much appreciate it if you'd share this code.

    kynn
    --
    kj, Jun 5, 2009
    #6
  7. kj

    kj Guest

    In <1244139050.745227@news1nwk> Eric Sosman <> writes:

    >kj wrote:
    >> In this program I'm writing I want to pre-allocate space for a but
    >> an indeterminate (but potentially large) number of instances of a
    >> given structure. And if this amount of memory proves insufficient,
    >> I'll add another large chunk with realloc. Etc.
    >>
    >> [... malloc(), double, realloc(), lather, rinse, repeat ...]
    >>
    >> Is there a way to choose a good value for INITIAL_SIZE?


    > Not without information about the "indeterminate" number.
    >If you have some idea about the likely distribution of that
    >number, you could pick an initial size that's somewhere on
    >the high side of the median, the idea being that "most of the
    >time" you'll avoid the expense of reallocating and copying.
    >But if the number is really, truly unknown ...


    >> Should I
    >> just pick a number out of a hat? 101? 1001? 5091? Or should I
    >> pick a "nice power of 2" out of a hat, like 1024 or 4096? Or is
    >> there a more clueful way to go about this?


    > Counts that produce power-of-two sizes should probably be
    >avoided (unless you have information that the "indeterminate"
    >number is likely to be such a value). The reasoning goes


    > - Many malloc() implementations inflate each allocation by
    > a few bytes, and store some of their own metadata in the
    > extra space, and


    > - Some malloc() implementations use fundamental "block sizes"
    > that are themselves powers of two.


    >If your malloc() has both of these characteristics, a power-of-two
    >request is very nearly the worst thing you can ask for. Consider
    >what happens if you ask for exactly one megabyte. malloc() starts
    >by inflating the request to one megabyte plus a smidgen, and then
    >looks for a block that will hold the total. How big is that block,
    >if only power-of-two blocks are in use? Two megabytes.


    I think that's true only if the "fundamental block size" (call it
    FBS) that this malloc uses happens to be exactly 1 megabyte. But
    if instead FBS were, for example, 4096 bytes and that 1 smidgen <=
    FBS. Then in the example you cite, the block that malloc would
    use would be only 1MB + 4096 bytes, which is significantly less
    than 2MB.

    kynn

    --
    kj, Jun 5, 2009
    #7
  8. kj

    Eric Sosman Guest

    kj wrote:
    > In <1244139050.745227@news1nwk> Eric Sosman <> writes:
    >> [...]
    >> - Some malloc() implementations use fundamental "block sizes"
    >> that are themselves powers of two.

    >
    >> If your malloc() has both of these characteristics, a power-of-two
    >> request is very nearly the worst thing you can ask for. Consider
    >> what happens if you ask for exactly one megabyte. malloc() starts
    >> by inflating the request to one megabyte plus a smidgen, and then
    >> looks for a block that will hold the total. How big is that block,
    >> if only power-of-two blocks are in use? Two megabytes.

    >
    > I think that's true only if the "fundamental block size" (call it
    > FBS) that this malloc uses happens to be exactly 1 megabyte. But
    > if instead FBS were, for example, 4096 bytes and that 1 smidgen <=
    > FBS. Then in the example you cite, the block that malloc would
    > use would be only 1MB + 4096 bytes, which is significantly less
    > than 2MB.


    .... and is also not a power of two. A "pure" buddy system
    allocator can only manage power-of-two-sized blocks, and can
    subdivide them only by halving into smaller powers of two.
    Such an allocator cannot cope with a 1MB+4KB block, nor with
    its 1MB-4KB "buddy," since the whole point of the buddy system
    is that (block address) XOR (block size) = (buddy address).

    How many pure buddy system allocators are there? Beats
    me. For short-ish blocks, the buddy system has the advantage
    that the neighbor of a freed block can be found without a
    search. But for large-ish blocks, the vulnerability to
    internal fragmentation (2MB for 1MB+4KB) is troublesome. I
    suspect (without empirical evidence) that buddy system
    allocators in practical use probably use a mixed strategy,
    using pure buddy system for block sizes below some threshold
    and a different method above that threshold.

    Still, one never knows what's actually going on in the
    malloc() of a system one hasn't seen. I think it prudent to
    avoid what might -- *might* -- be a pathological case, just
    in, well, just in pathological case.

    (Research topic: Does anyone know of any "Fibonacci buddy"
    allocators in actual use? See TAOCP exercise 2.5-31.)

    --
    Eric Sosman
    lid
    Eric Sosman, Jun 5, 2009
    #8
  9. kj

    kj Guest

    In <h09rcv$aov$-september.org> Eric Sosman <> writes:

    >kj wrote:
    >> In <1244139050.745227@news1nwk> Eric Sosman <> writes:
    >>> [...]
    >>> - Some malloc() implementations use fundamental "block sizes"
    >>> that are themselves powers of two.

    >>
    >>> If your malloc() has both of these characteristics, a power-of-two
    >>> request is very nearly the worst thing you can ask for. Consider
    >>> what happens if you ask for exactly one megabyte. malloc() starts
    >>> by inflating the request to one megabyte plus a smidgen, and then
    >>> looks for a block that will hold the total. How big is that block,
    >>> if only power-of-two blocks are in use? Two megabytes.

    >>
    >> I think that's true only if the "fundamental block size" (call it
    >> FBS) that this malloc uses happens to be exactly 1 megabyte. But
    >> if instead FBS were, for example, 4096 bytes and that 1 smidgen <=
    >> FBS. Then in the example you cite, the block that malloc would
    >> use would be only 1MB + 4096 bytes, which is significantly less
    >> than 2MB.


    >... and is also not a power of two.


    Aah, I misunderstood what you meant by "fundamental block size".
    I see what you mean.

    kynn
    --
    kj, Jun 5, 2009
    #9
  10. kj <> writes:
    > In <> Keith Thompson <> writes:
    >>I just tried an experiment, calling malloc four times with a size of
    >>1048576 bytes (1 megabyte, or 1 mebibyte if you want to be picky). On
    >>three different systems, the resulting addresses differed consistently
    >>by 2**20 plus a relatively small overhead; on three different systems,
    >>the overheads were 4096 bytes, 65536 bytes, and 8 bytes. (The systems
    >>were Ubuntu, Cygwin under Windows XP, and Solaris.)

    >
    >>And yes, the code that determined the differences was non-portable.

    >
    > I'd very much appreciate it if you'd share this code.


    Actually, the C program itself didn't compute the differences; it just
    printed the addresses using "%p", and I computed the differences
    manually using a separate calculator program. (I'd had an earlier
    version that did the computation, but decided it was easier to do it
    myself.)

    #include <stdio.h>
    #include <stddef.h>
    #include <stdlib.h>
    #define MEGABYTE (1U<<20)
    int main(void)
    {
    char *p0 = malloc(MEGABYTE);
    char *p1 = malloc(MEGABYTE);
    char *p2 = malloc(MEGABYTE);
    char *p3 = malloc(MEGABYTE);
    printf("%p\n%p\n%p\n%p\n", p0, p1, p2, p3);
    return 0;
    }

    There are several sloppy things that I would normally have fixed
    before posting the code. In particular, it doesn't use <stddef.h>,
    (1U<<20) can invoke undefined behavior if unsigned int is narrower
    than 21 bits, and I didn't bother checking whether malloc succeeded.
    The latter doesn't really matter, since "%p" will still work. And
    "%p" expects void* rather than char*, but the standard (almost)
    guarantees that it will still work.

    Here's a more elaborate version that does the computations itself and
    avoids UB except in one place where it's unavoidable:

    #include <stdio.h>
    #include <stddef.h>
    #include <stdlib.h>
    #define MEGABYTE ((size_t)1 << 20)
    #define COUNT 4
    int main(void)
    {
    char *p[COUNT];
    int i;

    for (i = 0; i < COUNT; i ++) {
    p = malloc(MEGABYTE);
    if (p == NULL) {
    fprintf(stderr, "Allocation failure\n");
    exit(EXIT_FAILURE);
    }
    }

    for (i = 1; i < COUNT; i ++) {
    ptrdiff_t delta = p - p[i-1]; /* UNDEFINED BEHAVIOR! */
    if (delta < 0) delta = -delta;
    printf("delta = %ld, overhead = %ld\n",
    (unsigned long)delta,
    (unsigned long)(delta - MEGABYTE));
    }

    return 0;
    }

    Note that the program indirectly assumes that the chunks of memory
    will be allocated sequentially. It won't fail if the assumption is
    violated, but the output won't make much sense.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Jun 5, 2009
    #10
  11. kj

    Guest

    On 4 June, 18:46, kj <> wrote:

    > In this program I'm writing I want to pre-allocate space for a [lot?] but
    > an indeterminate (but potentially large) number of instances of a
    > given structure.  And if this amount of memory proves insufficient,
    > I'll add another large chunk with realloc.  Etc.
    >
    > So somewhere I my code I'll have an initial pre-allocation that
    > looks like (very schematically):


    <snip>

    >   pile_o_foo = (foo *)malloc(size);


    the cast is unnecessary and may hide an error (forgetting to include
    stdlib)

    <snip>

    >   while (size <= next_slot * sizeof foo) {
    >     size *= 2;
    >     pile_o_foo = (foo *)realloc(pile_o_foo, size);
    >     /* error check */


    again, no need for the cast. If realloc() fails then pile_o_foo
    has lost the old block of memory. Use a temporary

    foo *tmp;
    if ((tmp = realloc (pile_o_foo, size)) == NULL)
    /* handle error */
    else
    pile_o_foo = tmp;

    > Is there a way to choose a good value for INITIAL_SIZE?  Should I
    > just pick a number out of a hat?  101?  1001?  5091?  Or should I
    > pick a "nice power of 2" out of a hat, like 1024 or 4096?  Or is
    > there a more clueful way to go about this?


    We'd need to know more about foo and what your application
    does with it. It might be installation dependent (number
    of users might vary from a literal handful to thousands).
    One possibility is read the system size from some external
    source (ini file, data base, environmental variable etc)

    --
    Nick Keighley

    Server rooms should be unfriendly! They should look dangerous,
    they should be uncomfortably cold, they should be inexplicably noisy.
    If you can arrange for the smell of burning transistors, that's good
    too.
    If anything, I'd rather see rackmounted servers designed in dark
    foreboding
    colors with lots of metal spikes sticking out of them.
    Mike Sphar (on news//alt.sysadmin.recovery)
    , Jun 5, 2009
    #11
  12. kj

    James Kuyper Guest

    Dominik Zaczkowski wrote:
    > Keith Thompson <> writes:
    >
    > <snip>
    >> for (i = 1; i < COUNT; i ++) {
    >> ptrdiff_t delta = p - p[i-1]; /* UNDEFINED BEHAVIOR! */
    >> if (delta < 0) delta = -delta;
    >> printf("delta = %ld, overhead = %ld\n",
    >> (unsigned long)delta,
    >> (unsigned long)(delta - MEGABYTE));
    >> }
    >>
    >> return 0;
    >> }

    > <snip>
    >
    > Hi,
    > Could you explain me why is that UB? Isnt ptrdiff_t made to keep
    > "distances" between pointers? Or its valid only when difference is taken
    > from the same arrary/chunk of memory?


    "When two pointers are subtracted, both shall point to elements of the
    same array object, or one past the last element of the array object ..."
    (6.5.6p9).
    James Kuyper, Jun 5, 2009
    #12
  13. kj

    kj Guest

    In <> Keith Thompson <> writes:

    >#include <stdio.h>
    >#include <stddef.h>
    >#include <stdlib.h>
    >#define MEGABYTE ((size_t)1 << 20)
    >#define COUNT 4
    >int main(void)
    >{
    > char *p[COUNT];
    > int i;


    > for (i = 0; i < COUNT; i ++) {
    > p = malloc(MEGABYTE);
    > if (p == NULL) {
    > fprintf(stderr, "Allocation failure\n");
    > exit(EXIT_FAILURE);
    > }
    > }


    > for (i = 1; i < COUNT; i ++) {
    > ptrdiff_t delta = p - p[i-1]; /* UNDEFINED BEHAVIOR! */
    > if (delta < 0) delta = -delta;
    > printf("delta = %ld, overhead = %ld\n",
    > (unsigned long)delta,
    > (unsigned long)(delta - MEGABYTE));
    > }


    > return 0;
    >}


    >Note that the program indirectly assumes that the chunks of memory
    >will be allocated sequentially. It won't fail if the assumption is
    >violated, but the output won't make much sense.


    Thanks for posting this. I found it instructive.

    kynn
    kj, Jun 5, 2009
    #13
  14. kj

    Richard Bos Guest

    "Malcolm McLean" <> wrote:

    > "Eric Sosman" <> wrote in message
    > > Counts that produce power-of-two sizes should probably be
    > > avoided (unless you have information that the "indeterminate"
    > > number is likely to be such a value). The reasoning goes
    > >

    > A power of two suggests to your reader that "this is a decent buffer size".
    > POwers of ten do the same thing, but not so much in C, since it is a low
    > level language.
    >
    > Magic numbers like 1008 (1024-16) leave the reader wondering why the buffer
    > needs to be 1008 places long.


    That's what manifest constants are for. _Commented_ manifest constants.

    Richard
    Richard Bos, Jun 8, 2009
    #14
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Skyhorse
    Replies:
    4
    Views:
    4,222
    EventHelix.com
    Apr 16, 2004
  2. Raja
    Replies:
    12
    Views:
    24,343
    John Harrison
    Jun 21, 2004
  3. Replies:
    2
    Views:
    587
    sergejusz
    Mar 26, 2007
  4. sandeep

    How to choose buffer size?

    sandeep, Jun 5, 2010, in forum: C Programming
    Replies:
    4
    Views:
    571
    Seebs
    Jun 6, 2010
  5. Raj Pashwar

    How to choose buffer size?

    Raj Pashwar, Apr 20, 2012, in forum: C Programming
    Replies:
    7
    Views:
    524
    Malcolm McLean
    Apr 21, 2012
Loading...

Share This Page