How to choose buffer size?

Discussion in 'C Programming' started by Raj Pashwar, Apr 20, 2012.

  1. Raj Pashwar

    Raj Pashwar Guest

    Hello friends:

    In this program I'm writing I want to pre-allocate space for a but
    an indeterminate (but potentially large) number of instances of a
    given structure. And if this amount of memory proves insufficient,
    I'll add another large chunk with realloc. Etc.

    So somewhere I my code I'll have an initial pre-allocation that
    looks like:

    size_t size = INITIAL_SIZE;
    foo *pile_o_foo;
    int next_slot = 0;

    pile_o_foo = (foo *)malloc(size);
    /* error check */

    /* ... */

    while (size <= next_slot * sizeof foo) {
    size *= 2;
    pile_o_foo = (foo *)realloc(pile_o_foo, size);
    /* error check */
    }

    /* ... */

    Is there a way to choose a good value for INITIAL_SIZE? Should I
    just pick a random number? 101? 1001? 5091? Or should I
    pick a nice power of 2, like 1024 or 4096? Or is
    there a more intelligent way to go about this?

    Best regards
    Raj Pashwar, Apr 20, 2012
    #1
    1. Advertising

  2. Raj Pashwar

    James Kuyper Guest

    On 04/20/2012 06:05 PM, Raj Pashwar wrote:
    > Hello friends:
    >
    > In this program I'm writing I want to pre-allocate space for a but
    > an indeterminate (but potentially large) number of instances of a
    > given structure. And if this amount of memory proves insufficient,
    > I'll add another large chunk with realloc. Etc.
    >
    > So somewhere I my code I'll have an initial pre-allocation that
    > looks like:
    >
    > size_t size = INITIAL_SIZE;
    > foo *pile_o_foo;
    > int next_slot = 0;
    >
    > pile_o_foo = (foo *)malloc(size);


    The cast is unnecessary: it specifies explicitly precisely the same
    conversion that would have occurred implicitly if there had been no
    cast. That is reason enough to remove it. Necessary casts are almost
    always dangerous, while conversions that can happen implicitly are
    generally much safer. Therefore, experienced coders look at all casts as
    potential problems; removing unnecessary casts creates fewer
    distractions, allowing you to concentrate on the truly dangerous ones.

    In addition, if you use C90 and forget to #include <stdlib.h>, this cast
    will usually prevent you from getting an error message due to the fact
    that there is no prototype for malloc() in scope. The behavior of such
    code is undefined - it might work with any given implementation of C,
    but when you port it to a different implementation there's a good chance
    it will fail catastrophically.

    > /* error check */
    >
    > /* ... */
    >
    > while (size <= next_slot * sizeof foo) {
    > size *= 2;
    > pile_o_foo = (foo *)realloc(pile_o_foo, size);
    > /* error check */
    > }
    >
    > /* ... */
    >
    > Is there a way to choose a good value for INITIAL_SIZE? Should I
    > just pick a random number? 101? 1001? 5091? Or should I
    > pick a nice power of 2, like 1024 or 4096? Or is
    > there a more intelligent way to go about this?


    First of all, INITIAL_SIZE should be an exact multiple of sizeof(foo).
    The easiest way to do this is to keep track of the number of instances
    of foo, rather than the number of bytes. Then you can write:

    pile_o_foo = malloc(num_instances * sizeof *pile_o_foo);

    Secondly, while it's true that there are many contexts where a size that
    is a power of 2 has efficiency advantages, that is, paradoxically, a
    reason why the size that you use here should NOT be a power of 2. Why?
    because many implementations of malloc() add a small amount to your
    allocation, to store information to be used by the malloc() family of
    functions about the allocation. They then round up the allocation to the
    next multiple of a fixed block size. That size is usually a power of 2,
    or a small integer multiple of one. Therefore, by making your allocation
    exactly a power of 2, or even a small multiple of a power of two,
    there's a good chance that malloc will allocate slightly more than a
    power of 2 bytes, thereby wasting an entire memory block to hold just a
    few more bytes of information.

    If you knew the size that was added, and the block size, you could
    choose your sizes to be an exact multiple of a block size AFTER the
    addition; at the cost of tying your code to a particular implementation.
    You're better off not worrying about such things.

    For the same reason, multiplying by 2 every time you need more memory
    will sooner or later cause exactly the same problem. I've seen analyses
    that ignore this issue, and end up concluding that sqrt(2) it the ideal
    growth factor, but if you use that value exactly, you'll have the same
    problem on every other growth step (You can get an average growth factor
    of exactly sqrt(2) by alternating between 3/2 and 4/3). If you round
    sqrt(2) up to 1.5, then you'll avoid the problem entirely. To avoid
    unnecessary conversions to floating point types, use

    num_instances = (num_instances*3)/2;

    As China Blue said: the exact value of INITIAL_SIZE is not very
    important. It would only be important if it's MUCH smaller (by several
    orders of magnitude) than the typical size that will eventually be needed.
    James Kuyper, Apr 21, 2012
    #2
    1. Advertising

  3. Raj Pashwar

    Eric Sosman Guest

    On 4/20/2012 6:05 PM, Raj Pashwar wrote:
    > Hello friends:
    >
    > In this program I'm writing I want to pre-allocate space for a but
    > an indeterminate (but potentially large) number of instances of a
    > given structure. And if this amount of memory proves insufficient,
    > I'll add another large chunk with realloc. Etc.
    >
    > So somewhere I my code I'll have an initial pre-allocation that
    > looks like:
    >
    > size_t size = INITIAL_SIZE;
    > foo *pile_o_foo;
    > int next_slot = 0;
    >
    > pile_o_foo = (foo *)malloc(size);
    > /* error check */


    As an aside: It would probably be less error-prone to keep
    track of the size of the allocation in terms of the `foo' things
    it contains, rather than in terms of the number of bytes they
    occupy. You'd then have something like

    size_t count = INITIAL_COUNT;
    ...
    pile_o_foo = malloc(count * sizeof *pile_o_foo);

    Two notes about that final line: First, there's no need to cast
    the result of malloc(), although the cast is harmless. Second, it's
    safer to write `sizeof *pile_o_foo' than `sizeof(foo)', since this
    way you cannot possibly write `sizeof(foobar)' by mistake.

    >
    > /* ... */
    >
    > while (size<= next_slot * sizeof foo) {


    Aside: You're missing some parentheses.

    > size *= 2;
    > pile_o_foo = (foo *)realloc(pile_o_foo, size);
    > /* error check */


    Yet another aside: Don't do `p = realloc(p,...)', because if
    realloc() returns NULL you'll have overwritten your pointer to the
    original area. If "error check" is just going to terminate the
    program abruptly that may be tolerable, but if not it may have
    leaked the old memory: You can no longer even free() it since the
    pointer to it is gone.

    > }
    >
    > /* ... */
    >
    > Is there a way to choose a good value for INITIAL_SIZE? Should I
    > just pick a random number? 101? 1001? 5091? Or should I
    > pick a nice power of 2, like 1024 or 4096? Or is
    > there a more intelligent way to go about this?


    INITIAL_SIZE (or INITIAL_COUNT) depends on "domain-specific
    knowledge," and there's no single prescription that works for all
    domains. If each `foo' holds the quantum numbers for a single
    electron and you're reading one atom's worth, you have prior knowledge
    that ten dozen will suffice for all elements yet discovered. If the
    atoms are all involved in organic chemistry, electron counts will be
    almost always less than two dozen. But if each `foo' represents one
    IP packet on an undersea fibre, it's hard to put a useful ceiling on
    how large the array might grow.

    So: First, apply domain-specific knowledge to the extent that you
    have it. If `foo' is a maid of honor at your former roommate's best
    friend's second cousin's wedding, INITIAL_COUNT might well be four.
    Or even eight: The penalty for overestimating is modest.

    Second, if the only domain-specific knowledge you can summon is
    "It's probably not HUGE" (for some unstated value of HUGE), you might
    as well start INITIAL_COUNT at forty-two and chance the consequences.
    You're balancing space wastage against copying time, and for non-HUGE
    populations neither will be HUGE.

    Third, if all you know is "It might be almost anything, including
    REALLY REALLY HUGE," consider other data structures. A linked list of
    "buckets" of a thousand `foo' instances each, for example: Yes, it's
    harder to traverse (but not *that* much harder), but you avoid needing
    to copy and re-copy and re-re-copy all those larger and larger arrays
    as you're accumulating the data.

    One other tiny practicality: When allocating a big blob of memory,
    it is probably best to ask for slightly less than a power of two. This
    is because nearly all malloc() implementations inflate your request by
    a few bytes to add their own metadata, and some implementations round
    the sum up to the next power of two. So if you ask for 1024 bytes and
    malloc() adds eight, you'll use as much memory as if you'd asked for
    2040 bytes. (But note the word "tiny" above: Concerns of this kind
    only come into play if you're allocating VERY MANY such blocks, or
    VERY LARGE ones. Also, the internals of malloc() implementations vary
    a great deal from platform to platform, and too much pandering to the
    peculiarities of a few is vanity and vexation of spirit.)

    --
    Eric Sosman
    d
    Eric Sosman, Apr 21, 2012
    #3
  4. Raj Pashwar

    Ike Naar Guest

    On 2012-04-20, James Kuyper <> wrote:
    > [...] To avoid
    > unnecessary conversions to floating point types, use
    >
    > num_instances = (num_instances*3)/2;


    Caveat: this will not increase num_instances if it equals 1.
    Ike Naar, Apr 21, 2012
    #4
  5. Raj Pashwar

    James Kuyper Guest

    On 04/21/2012 04:27 AM, Ike Naar wrote:
    > On 2012-04-20, James Kuyper <> wrote:
    >> [...] To avoid
    >> unnecessary conversions to floating point types, use
    >>
    >> num_instances = (num_instances*3)/2;

    >
    > Caveat: this will not increase num_instances if it equals 1.


    If it makes sense to have an initial value of 1 for num_instances, you
    can either have special case handling for num_instances == 1, or use
    (num_instances*3+1)/2. If it makes sense to have an initial value of 0
    for num_instances, you could use special handling or ((num_instances+1)*3)/2
    --
    James Kuyper
    James Kuyper, Apr 21, 2012
    #5
  6. Raj Pashwar

    Paul N Guest

    On Apr 20, 11:05 pm, Raj Pashwar <> wrote:
    >   while (size <= next_slot * sizeof foo) {
    >     size *= 2;
    >     pile_o_foo = (foo *)realloc(pile_o_foo, size);
    >     /* error check */
    >   }


    This looks wrong to me - you know how much you need but (if next_slot
    * sizeof foo is more than twice size) you are realloc'ing a smaller
    amount first. There seems no point to this. Try something like:

    if (size <= next_slot * sizeof foo) {
    while (size <= next_slot * sizeof foo) size *= 2;
    new_pile_o_foo = (foo *)realloc(pile_o_foo, size);
    /* error check */
    }
    Paul N, Apr 21, 2012
    #6
  7. Raj Pashwar

    Eric Sosman Guest

    On 4/21/2012 4:27 AM, Ike Naar wrote:
    > On 2012-04-20, James Kuyper<> wrote:
    >> [...] To avoid
    >> unnecessary conversions to floating point types, use
    >>
    >> num_instances = (num_instances*3)/2;

    >
    > Caveat: this will not increase num_instances if it equals 1.


    Because of that, and to make the calculation very slightly less
    vulnerable to overflow, I use

    num_instances += num_instances / 2 + 10;

    (for suitable values of 2 and 10). Works even if num_instances is
    initially zero.

    --
    Eric Sosman
    d
    Eric Sosman, Apr 21, 2012
    #7
  8. בת×ריך ×™×•× ×©×™×©×™, 20 ב×פריל 2012 23:05:23 UTC+1, מ×ת Raj Pashwar:
    > Is there a way to choose a good value for INITIAL_SIZE? Should I
    > just pick a random number? 101? 1001? 5091? Or should I
    > pick a nice power of 2, like 1024 or 4096? Or is
    > there a more intelligent way to go about this?
    >

    When allocating a buffer, choose an initial size which is a) a trivial amount of memory and b) likely to fulfil trivial requests b) isn't always possible to define, you might not know the distribution of likely requests. But what's technically known as the log-normal distribution is very common in real situations. You have some companies with one or two employees, and large number with ten to twenty, then a long right tail with a small number of individual companies with very high employee counts. So the initial buffer should be about 20.

    Then to grow, I normally use add a percentage plus a small constant. This means that you avoid excessive reallocations at the beginning, but don't need to ask for too much when the buffer gets large. 10% is a good choice if memory is precious, 50% for the more common case where it's quite cheap but not of negligible value.

    --
    Basic Algorithms - a valuable C programmign resource
    http://www.malcolmmclean.site11.com/www
    Malcolm McLean, Apr 21, 2012
    #8
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Skyhorse
    Replies:
    4
    Views:
    4,233
    EventHelix.com
    Apr 16, 2004
  2. Raja
    Replies:
    12
    Views:
    24,394
    John Harrison
    Jun 21, 2004
  3. Replies:
    2
    Views:
    603
    sergejusz
    Mar 26, 2007
  4. kj

    How to choose buffer size?

    kj, Jun 4, 2009, in forum: C Programming
    Replies:
    13
    Views:
    670
    Richard Bos
    Jun 8, 2009
  5. sandeep

    How to choose buffer size?

    sandeep, Jun 5, 2010, in forum: C Programming
    Replies:
    4
    Views:
    583
    Seebs
    Jun 6, 2010
Loading...

Share This Page