Hello friends:
In this program I'm writing I want to pre-allocate space for an
indeterminate (but potentially large) number of instances of a
given structure. And if this amount of memory proves insufficient,
I'll add another large chunk with realloc. Etc.
So somewhere in my code I'll have an initial pre-allocation that
looks like:
size_t size = INITIAL_SIZE;
foo *pile_o_foo;
int next_slot = 0;
pile_o_foo = (foo *)malloc(size);
/* error check */
As an aside: It would probably be less error-prone to keep
track of the size of the allocation in terms of the `foo' things
it contains, rather than in terms of the number of bytes they
occupy. You'd then have something like
size_t count = INITIAL_COUNT;
...
pile_o_foo = malloc(count * sizeof *pile_o_foo);
Two notes about that final line: First, there's no need to cast
the result of malloc(); the cast is mostly harmless, though under
older (pre-C99) compilers it can hide a forgotten #include <stdlib.h>.
Second, it's
safer to write `sizeof *pile_o_foo' than `sizeof(foo)', since this
way you cannot possibly write `sizeof(foobar)' by mistake.
/* ... */
while (size <= next_slot * sizeof foo) {
Aside: You're missing some parentheses.
size *= 2;
pile_o_foo = (foo *)realloc(pile_o_foo, size);
/* error check */
Yet another aside: Don't do `p = realloc(p, ...)', because if
realloc() returns NULL you'll have overwritten your only pointer to
the original area. If "error check" is just going to terminate the
program abruptly that may be tolerable, but if not you've leaked the
old memory: You can no longer even free() it, since the pointer to it
is gone. (A safer pattern is sketched just after this snippet.)
}
/* ... */
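    To make that realloc() point concrete, here's one way the growth
step might look, with the result going to a temporary pointer so the
original block survives a failure. This is only a sketch: `grow_pile'
is a made-up helper name, the `foo' typedef is a stand-in for your
real structure, and the doubling policy is simply the one from your
snippet.

#include <stdlib.h>

typedef struct { int dummy; } foo;  /* stand-in for the real structure */

/* Grow the array to hold at least `needed' elements.  On success,
 * *pile and *count are updated and 0 is returned.  On failure, -1 is
 * returned and *pile still points at the old, intact block, so the
 * caller can free() it -- or keep using what it already has.
 * Assumes *count >= 1; overflow of the multiplication is unchecked. */
static int grow_pile(foo **pile, size_t *count, size_t needed)
{
    size_t new_count = *count;
    foo *tmp;

    while (new_count < needed)
        new_count *= 2;                /* the doubling policy from above */

    tmp = realloc(*pile, new_count * sizeof *tmp);
    if (tmp == NULL)
        return -1;                     /* old block survives untouched */

    *pile = tmp;
    *count = new_count;
    return 0;
}

Note that the assignment goes to `tmp' first; only after the NULL
check does it overwrite the caller's pointer.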
Is there a way to choose a good value for INITIAL_SIZE? Should I
just pick a random number? 101? 1001? 5091? Or should I
pick a nice power of 2, like 1024 or 4096? Or is
there a more intelligent way to go about this?
INITIAL_SIZE (or INITIAL_COUNT) depends on "domain-specific
knowledge," and there's no single prescription that works for all
domains. If each `foo' holds the quantum numbers for a single
electron and you're reading one atom's worth, you have prior knowledge
that ten dozen will suffice for all elements yet discovered. If the
atoms are all involved in organic chemistry, electron counts will
almost always be less than two dozen. But if each `foo' represents one
IP packet on an undersea fibre, it's hard to put a useful ceiling on
how large the array might grow.
So: First, apply domain-specific knowledge to the extent that you
have it. If `foo' is a maid of honor at your former roommate's best
friend's second cousin's wedding, INITIAL_COUNT might well be four.
Or even eight: The penalty for overestimating is modest.
Second, if the only domain-specific knowledge you can summon is
"It's probably not HUGE" (for some unstated value of HUGE), you might
as well start INITIAL_COUNT at forty-two and chance the consequences.
You're balancing space wastage against copying time, and for non-HUGE
populations neither will be HUGE.
Third, if all you know is "It might be almost anything, including
REALLY REALLY HUGE," consider other data structures. A linked list of
"buckets" of a thousand `foo' instances each, for example: Yes, it's
harder to traverse (but not *that* much harder), but you avoid needing
to copy and re-copy and re-re-copy all those larger and larger arrays
as you're accumulating the data.
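    Here's a minimal sketch of that bucket idea, just to show the
shape of it. Everything in it is illustrative: `BUCKET_CAP', `struct
bucket', `struct pile', and `pile_append' are names I've made up, and
the `foo' typedef again stands in for the real structure.

#include <stdlib.h>

#define BUCKET_CAP 1000               /* a thousand `foo' per bucket */

typedef struct { int dummy; } foo;  /* stand-in for the real structure */

struct bucket {
    struct bucket *next;
    size_t used;                      /* occupied slots in items[] */
    foo items[BUCKET_CAP];
};

struct pile {
    struct bucket *head, *tail;       /* both NULL in an empty pile */
};

/* Append a copy of *f, starting a fresh bucket when the tail bucket
 * is full.  Returns a pointer to the stored copy, or NULL if the
 * allocation fails.  Nothing already stored is ever moved or copied,
 * so earlier pointers into the pile stay valid. */
static foo *pile_append(struct pile *p, const foo *f)
{
    struct bucket *tail = p->tail;
    foo *slot;

    if (tail == NULL || tail->used == BUCKET_CAP) {
        struct bucket *b = malloc(sizeof *b);
        if (b == NULL)
            return NULL;
        b->next = NULL;
        b->used = 0;
        if (tail != NULL)
            tail->next = b;
        else
            p->head = b;
        p->tail = tail = b;
    }

    slot = &tail->items[tail->used++];
    *slot = *f;
    return slot;
}

Starting from `struct pile p = { NULL, NULL };', traversal is a
two-level loop -- walk the buckets, then the `used' slots within
each -- which is the "not *that* much harder" part.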
One other tiny practicality: When allocating a big blob of memory,
it is probably best to ask for slightly less than a power of two. This
is because nearly all malloc() implementations inflate your request by
a few bytes to add their own metadata, and some implementations round
the sum up to the next power of two. So if you ask for 1024 bytes and
malloc() adds eight, you'll use as much memory as if you'd asked for
2040 bytes. (But note the word "tiny" above: Concerns of this kind
only come into play if you're allocating VERY MANY such blocks, or
VERY LARGE ones. Also, the internals of malloc() implementations vary
a great deal from platform to platform, and too much pandering to the
peculiarities of a few is vanity and vexation of spirit.)
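    If you ever do decide to indulge that rounding-up behavior, the
arithmetic is simple enough. A sketch, with OVERHEAD as pure
guesswork (real allocators differ, as noted above):

#include <stddef.h>

#define OVERHEAD 16  /* assumed allocator metadata, in bytes; a guess */

/* Largest element count such that count * elem_size + OVERHEAD still
 * fits in the smallest power of two holding at least min_count
 * elements.  Purely illustrative; no overflow checking. */
static size_t count_under_pow2(size_t elem_size, size_t min_count)
{
    size_t pow2 = 32;                      /* start above OVERHEAD */

    while ((pow2 - OVERHEAD) / elem_size < min_count)
        pow2 *= 2;
    return (pow2 - OVERHEAD) / elem_size;
}

You'd then allocate with, say,
`malloc(count_under_pow2(sizeof(foo), INITIAL_COUNT) * sizeof(foo))'.
But again: for most programs this is pandering, not engineering.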