How to choose buffer size?

Discussion in 'C Programming' started by sandeep, Jun 5, 2010.

  1. sandeep

    sandeep Guest

    In this program I'm writing I want to pre-allocate space for a
    indeterminate (but potentially large) number of instances of a
    given structure. And if this amount of memory proves insufficient,
    I'll add another large chunk with realloc. Etc.

    So somewhere in my code I'll have an initial pre-allocation that
    looks like

    size_t _siz = INITIAL_SIZE;
    foo* foos;
    int next_slot = 0;

    foos = (foo *)malloc(_siz);
    /* error check */

    /* ... */

    while (size <= next_slot * sizeof foo) {
    size *= 2;
    foos = (foo *)realloc(foos, size);
    /* error check */
    }

    /* ... */

    Is there a way to choose a good value for INITIAL_SIZE? Should I
    just pick a number out of a hat? 101? 1001? 5091? Or should I
    pick a "nice power of 2" out of a hat, like 1024 or 4096? Or is
    there a better way?
     
    sandeep, Jun 5, 2010
    #1
    1. Advertising

  2. sandeep

    Seebs Guest

    On 2010-06-05, Richard Heathfield <> wrote:
    > It's best to avoid identifiers that use leading underscores. There is a
    > rule against it. Although it's not an all-embracing rule, in practice it
    > turns out to be easiest to pretend that it /is/ all-embracing.


    Strong agreement here. There is simply no POINT in using such identifiers,
    and if you never use one, you can be TOTALLY confident that you will not
    mistakenly trip the rule.

    Now if only the str*, is*, to*, and E* rules were as easy to remember.

    > Depends on your data. One way is to pick a value that will be big enough
    > to cover, say, 25% of your typical average requirement. That way, most
    > of the time you'll probably only have to realloc once or twice, and
    > you're unlikely ever to be wasting too much unrequired memory.


    There is a similar tradeoff in how much to grow at a time. I tend to favor
    *1.5 over *2, myself, but I'm not sure whether this has any justification
    past "it seemed like a good idea at the time".

    -s
    --
    Copyright 2010, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
     
    Seebs, Jun 5, 2010
    #2
    1. Advertising

  3. On Sat, 5 Jun 2010 17:24:57 +0000 (UTC), sandeep <>
    wrote:

    >In this program I'm writing I want to pre-allocate space for a
    >indeterminate (but potentially large) number of instances of a
    >given structure. And if this amount of memory proves insufficient,
    >I'll add another large chunk with realloc. Etc.
    >
    >So somewhere in my code I'll have an initial pre-allocation that
    >looks like
    >
    > size_t _siz = INITIAL_SIZE;


    Why are you using an identifier whose name violates 7.1.3?

    > foo* foos;
    > int next_slot = 0;
    >
    > foos = (foo *)malloc(_siz);


    Why are you using a meaningless cast?

    While not incorrect, the more common idiom is to keep track of the
    number of objects, not the total size they consume.

    > /* error check */
    >
    > /* ... */
    >
    > while (size <= next_slot * sizeof foo) {


    foo is a type. When the operand of sizeof is a type, the parentheses
    are required.

    If the left operand equals the right operand, you will overflow your
    allocated memory. You probably want <, not <=.

    > size *= 2;


    Did you mean _siz instead of size?

    What will happen if you execute this block multiple times (think of
    the wheat and chessboard fable)? You might want to consider
    size += INITIAL_SIZE;
    which will have the same effect on the first execution but result in a
    significantly smaller number on subsequent executions.

    > foos = (foo *)realloc(foos, size);


    If realloc fails, you have created a memory leak and cannot access any
    of the previously created objects.

    > /* error check */
    > }
    >
    > /* ... */
    >
    >Is there a way to choose a good value for INITIAL_SIZE? Should I
    >just pick a number out of a hat? 101? 1001? 5091? Or should I
    >pick a "nice power of 2" out of a hat, like 1024 or 4096? Or is
    >there a better way?


    Analysis is almost always superior to guessing.

    Even if the exact number is unknown, do you know anything about the
    number of objects you are likely to need? If you run the program
    repeatedly, do you know anything about the distribution of this value?
    Does you program spend most of its time processing the data (so that
    minimizing calls to malloc might be useful) or is it I/O bound (in
    which case it wouldn't matter if your call to realloc increased the
    space for only one more object)?

    Since INITIAL_SIZE is not the quantity of objects but the amount of
    space they consume, it needs to be a multiple of sizeof(foo). If you
    change the initialization of _siz to
    INITIAL_SIZE * sizeof(foo)
    (or change the argument to malloc) then your numbers above might be
    reasonable.

    Most systems today use a virtual memory model. Virtual memory is
    allocated in terms of pages. Each page has a fixed size determined by
    the operating system and hardware. There may be some benefit to
    allocating memory in quantities that match page boundaries.

    How sophisticated would you like your growth algorithm to be?
    Increasing by a constant factor (such as your doubling) or by a
    constant quantity (such as my suggestion) are dirt simple approaches.
    There are others, such as increasing by 100%, 75%, 50%, and 25% for
    the first four changes and then if more increases are needed start
    raising the percentage again. If you are obtaining the data from a
    file, you could use a system dependent method of getting the file size
    and computing an initial estimate from that.

    --
    Remove del for email
     
    Barry Schwarz, Jun 5, 2010
    #3
  4. sandeep

    Gene Guest

    On Jun 5, 1:24 pm, sandeep <> wrote:
    > In this program I'm writing I want to pre-allocate space for a
    > indeterminate (but potentially large) number of instances of a
    > given structure.  And if this amount of memory proves insufficient,
    > I'll add another large chunk with realloc.  Etc.
    >
    > So somewhere in my code I'll have an initial pre-allocation that
    > looks like
    >
    >   size_t _siz = INITIAL_SIZE;
    >   foo* foos;
    >   int next_slot = 0;
    >
    >   foos = (foo *)malloc(_siz);
    >   /* error check */
    >
    >   /* ... */
    >
    >   while (size <= next_slot * sizeof foo) {
    >     size *= 2;
    >     foos = (foo *)realloc(foos, size);
    >     /* error check */
    >   }
    >
    >   /* ... */
    >
    > Is there a way to choose a good value for INITIAL_SIZE?  Should I
    > just pick a number out of a hat?  101?  1001?  5091?  Or should I
    > pick a "nice power of 2" out of a hat, like 1024 or 4096?  Or is
    > there a better way?


    I suggest implementing an internal memory allocation interface that
    the rest of your system uses rather than directly calling malloc/
    calloc/realloc/strdup/free. Start by implementing the interface with
    a malloc() call once per object. Profile. Then only if memory
    allocation is a bottleneck or you have fragmentation problems, etc.
    implement your block allocation scheme behind the interface. If your
    program has a memory utilization profile consisting of many
    allocations (the ones you'd use the blocks for) followed by
    deallocation, modern malloc/frees are going to be a few 10s of
    instructions, so multiple allocation won't buy much.
     
    Gene, Jun 6, 2010
    #4
  5. sandeep

    Seebs Guest

    On 2010-06-05, Richard Heathfield <> wrote:
    > Seebs wrote:
    >> There is a similar tradeoff in how much to grow at a time. I tend to favor
    >> *1.5 over *2, myself, but I'm not sure whether this has any justification
    >> past "it seemed like a good idea at the time".


    > That's an interesting empirical reach toward the theoretical ideal of
    > 1.618033988... (i.e. (1.0 + sqrt(5.0)) / 2).


    Yes, it is. :)

    > I have an interesting proof of this, but... well, Usenet's so *small*,
    > isn't it?


    It's sufficient to observe that it's "probably close enough", with perhaps
    the advantage that "*1.5" is SOOO much faster to calculate that, in a run
    where you expand from an initial pool of a few thousand objects to a pool
    of several million, clearly you will save NANOSECONDS by using something that
    can be clearly and expressively written as:
    i = (i < 2) ? (i + (i | 1)) : (i + (i >> 1));

    .... Sorry, long day. :)

    -s
    --
    Copyright 2010, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
     
    Seebs, Jun 6, 2010
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Skyhorse
    Replies:
    4
    Views:
    4,264
    EventHelix.com
    Apr 16, 2004
  2. Raja
    Replies:
    12
    Views:
    24,724
    John Harrison
    Jun 21, 2004
  3. Replies:
    2
    Views:
    639
    sergejusz
    Mar 26, 2007
  4. kj

    How to choose buffer size?

    kj, Jun 4, 2009, in forum: C Programming
    Replies:
    13
    Views:
    691
    Richard Bos
    Jun 8, 2009
  5. Raj Pashwar

    How to choose buffer size?

    Raj Pashwar, Apr 20, 2012, in forum: C Programming
    Replies:
    7
    Views:
    553
    Malcolm McLean
    Apr 21, 2012
Loading...

Share This Page