Matt said:
But I was wondering if realloc() had the same overhead as malloc().
Like, if realloc() returns a block that's bigger than what you
requested and you call it again, it could just update the node or
whatever without having to make a system call. I don't know, I guess
it's implementation-dependent, but it seems like realloc() would
almost have to work like that out of necessity.
Most implementations of malloc() and friends will round
the requested size up to a multiple of some convenient "atom"
size, for alignment if for no better reason. Eight bytes is
a typical "atom" size, but of course implementations vary.
(Some -- "buddy system" allocators, for example -- use several
different "atom" sizes. But for what follows, let's just
pretend that the Universe contains nothing but hydrogen.)
For the sake of argument, let's take eight bytes as the
typical "atom" and assume a four-byte `int'. Then if you
call realloc() for every new `int', half of the requests
can be satisfied by simply recognizing the previously unused
four bytes at the end of the existing area. Of course, the
realloc() function needs to do a certain amount of work to
figure out the actual as opposed to nominal region size, but
that's usually not too hard: let's be optimistic and guess
that it's only as hard as storing five `int' values.
But what about the other half of the realloc() calls, those
that find the area already full? The function must now work
harder ...
Perhaps it can discover that the full region is just below
an unallocated area, making it possible to take one "atom" from
the upper area and attach it to the existing one. Discovering
this fact usually involves a search of some kind, plus some
fiddling with bookkeeping information: two area sizes and a
pointer, at least, would need to be changed. Let's again be
optimistic and suppose that the cost of this lengthier operation
is only fifteen times that of storing the `int'.
Of course, realloc() might also find that the growing region
abuts an allocated area, so stealing an "atom" isn't possible.
In this case, realloc() needs to find and allocate an entirely
new memory region, copy the existing data from the old region
to the new one, and deallocate the old region. This requires
more searching, about twice as much bookkeeping as in the above
scenario, and the cost of copying all the data. But let's not
complicate matters: chances are that if you're just sucking in
data from a single source and stashing it in a growing array,
this "full Monty" realloc() will happen fairly rarely.
So, assuming that half the realloc() calls are of the first
type and the rest are of the second, the cost to swallow N `int'
values comes to
(N/2) * 5 + (N/2) * 15 + N = 11 * N
91% of the cost is in the realloc() function; the other 9% is
"payload."
But what would happen if you grew the array by sixteen
`int's worth each time, instead of just by one? The downside
is that all the realloc() calls are now of the costlier second
variety, but the upside is that there are only one-sixteenth
as many calls to begin with. The total cost becomes
(N/16) * 15 + N ~= 1.94 * N
See what just happened? This section of the program is about
5.7 times faster than it used to be. Grow the array by 100
elements at a whack and the cost drops to
(N/100) * 15 + N = 1.15 * N
... and the program runs at more than nine times the speed of
the original.
Of course, all this is based on guesstimations; the Standard
is silent on the cost of realloc(). The actual cost may be quite
different from the illustrative computations given here. But since
the realloc() operations themselves are "overhead" as opposed to
"payload" as far as the overall progress of the program is
concerned, it makes sense to avoid excessive realloc() calls if
N is large. (If N is small, of course, it makes no sense at
all to bother optimizing this insignificant piece of program.)