Buffer growing strategy?

Jef Driesen · Sep 17, 2009

Hi,

I'm implementing a buffer that grows automatically when appending or
prepending data. To improve performance, I'm looking into a buffer
growing strategy to double the memory (or more generally multiply with a
FACTOR > 1). It seems there are a number of variations in use:

1. Try to increase the current size once, and use the requested size 'n'
if that is not enough.

newsize = oldsize * FACTOR;
if (n > newsize)
newsize = n;

2. Increase the current size until the new size is large enough. (I'm
aware that I have to take extra care in case oldsize is zero, or with
multiplications with a non-integer FACTOR.)

newsize = oldsize;
while (n > newsize)
newsize *= FACTOR;

3. Similar to #2, but always using a fixed blocksize instead of the
current size. (Typically BLOCKSIZE=1 and FACTOR=2, such that the
allocated buffer is always a power of two.)

newsize = BLOCKSIZE;
while (n > newsize)
newsize *= FACTOR;

Which of these methods is preferred, and for what reason? Or is that
just a matter of personal preference?

Thanks,

Jef

jacob navia · Sep 17, 2009

Richard Harter a écrit :

[snip discussion about reallocation strategies]

The real trouble is that the allocator has no way of
knowing that you have a resizable array.

And why not?

Why do we have a malloc that can't receive a set of flags that
would indicate it what use will be done for the allocated
memory?

If we could tell the malloc/realloc/free system

#define ALLOC_FLAG_RESIZABLE 1 // Can be resized
#define ALLOC_FLAG_STATIC 2 // Never resized
#define ALLOC_FLAG_NOALIGN 4 // No alignment requirement
#define ALLOC_FLAG_FILLZERO 8
#define ALLOC_FLAG_PERMANENT 16 // Never freed

and many others. That would allow the allocator to function
much more efficiently than now, where it has to be good for
all purposes!

This is what I mean with an obsolete standard library.

Seebs · Sep 17, 2009

Why do we have a malloc that can't receive a set of flags that
would indicate it what use will be done for the allocated
memory?

Don't see much benefit.

If we could tell the malloc/realloc/free system

#define ALLOC_FLAG_RESIZABLE 1 // Can be resized
#define ALLOC_FLAG_STATIC 2 // Never resized

This is silly. Pick one, make it a flag. The other is the
absence of the flag.

Anyway, what do you mean "resized"? Resized how much? Are we
going to end up with

ALLOC_FLAG_RESIZABLE_NO_MORE_THAN_ONE_MEGABYTE
ALLOC_FLAG_RESIZABLE_I_DUNNO_MAYBE_TWENTY_K_OR_SO
ALLOC_FLAG_RESIZABLE_FACTOR_OF_THREE_PROBABLY

?

#define ALLOC_FLAG_NOALIGN 4 // No alignment requirement

I don't see any measurable benefit.

#define ALLOC_FLAG_FILLZERO 8

This one I think has actual potential.

#define ALLOC_FLAG_PERMANENT 16 // Never freed
Maybe.

and many others. That would allow the allocator to function
much more efficiently than now, where it has to be good for
all purposes!

This is what I mean with an obsolete standard library.

I'm not at all convinced. I've seen a couple of allocators with flags,
and in general, it didn't work out particularly well. (Sometimes merely
due to poor choices about documentation.) I don't necessarily buy
the argument that there would be a significant performance increase,
but there'd certainly be a ton of potential for mistakes (resizing
a no-resize region, etc.).

-s

Keith Thompson · Sep 17, 2009

Seebs said:
Maybe.

[...]

So what happens if you try to free() a pointer that was allocated with
ALLOC_FLAG_PERMANENT?

unsigned char *perm = new_malloc(1000, ALLOC_FLAG_PERMANENT);
free(perm);

I can think of several possible answers, considering that free() has
no mechanism for reporting errors.

1. It's undefined behavior.

2. It silently does nothing.

3. There's a new free-like function to go along with the new
malloc-like function (I think it's sufficiently obvious that we
can't add a flags argument to malloc). free(perm) either is UB or
does nothing; new_free(perm) either does nothing or reports an
error.

jacob navia · Sep 17, 2009

Seebs a écrit :

Don't see much benefit.

This is silly. Pick one, make it a flag. The other is the
absence of the flag.

Surely not, since there are allocations where you just
do not know anything about the future use yet!

Anyway, what do you mean "resized"? Resized how much? Are we
going to end up with

ALLOC_FLAG_RESIZABLE_NO_MORE_THAN_ONE_MEGABYTE
ALLOC_FLAG_RESIZABLE_I_DUNNO_MAYBE_TWENTY_K_OR_SO
ALLOC_FLAG_RESIZABLE_FACTOR_OF_THREE_PROBABLY

?

No, the size should go in the extra arguments

I don't see any measurable benefit.

Less wasted space between the allocated data

This one I think has actual potential.

I'm not at all convinced. I've seen a couple of allocators with flags,
and in general, it didn't work out particularly well. (Sometimes merely
due to poor choices about documentation.) I don't necessarily buy
the argument that there would be a significant performance increase,
but there'd certainly be a ton of potential for mistakes (resizing
a no-resize region, etc.).

Flags with the allocator exist in windows since windows 3.1 (1992 if I
remember correctly) and they were and are useful.

malloc/free needs functionality like

size_t GetBlockSize(void *);

that allows you to know how big this malloced block is. If you get
(size_t)-1 it means that is not a block allocated with malloc.

Jef Driesen · Sep 17, 2009

Richard said:
Let me add number 4:

if (n > size) {
size = n * FACTOR;
realloc(buffer,size);
}

It mostly is a matter of personal preference since the costs are
nominal in all cases. I see no advantage to 3 since 2 will do
the same thing more cheaply. There are potential disadvantages
in using power of two sizes. However if you want powers of two
sizes use #2.

Otherwise use 1 or 4. In my opinion, for whatever that may be
worth.

Can you explain why #2 is cheaper than #3? And what is the potential
disadvantage of powers of two? I assumed powers of two might be an
advantage because the underlying allocator probably uses such block
sizes internally, but I'm not really sure about that.

My typical use case is that the data blocks that needs to be
appended/prepended to the buffer usually have a fixed size. I believe a
FACTOR=2 is the best choice in that case, because such a block will
always fit in the free space (if there is free space of course).

A very simple solution is to just call realloc () for the new size
whenever the size changes, and hope that the realloc implementation is
efficient. Which it usually is. I just checked that calling realloc
for the same pointer with all sizes from 1 byte to 1 billion bytes
took just 55 nanoseconds per realloc on a three year old MacBook. In
this case, anything more complex than calling realloc seems a waste of
programmers time.

Click to expand...

That's simple enough, but it is more expensive than you appear to
think. The problem is not with the cost of an individual realloc
(though that will increase with buffer size); rather it is with
the number of calls.

[... nice explanation about allocators ...]

In my particular case, realloc is not always optimal because my buffer
not only needs to support appending data at the end, but also prepending
data at the front. And realloc is not ideal when the data needs to be
moved anyway. (See my other thread "Increase memory buffer with realloc
or malloc?")

I have implemented my buffer in such a way that the free space is either
located at the end (for efficient appending) or at the front (for
efficient prepending). Whenever data is appended and the free space is
at the wrong end it is moved (and if necessary the size increased as
well). That way, future append operations are as efficient as possible.
The same for prepending data. Only interleaving appends and prepends is
inefficient, but that does not happen in my code, so that's fine.

William Hughes · Sep 17, 2009

(I think it's sufficiently obvious that we
can't add a flags argument to malloc).

Certainly Jacob Navia would not agree.

Seems to me the best approach is to add
optional arguments and default values to C.
This would not break backward compatibility if
we require that a library function called without
optional arguments must have identical behaviour
as before.

We could then add an optional flag argument to
malloc and an optional error return argument
to free with default value null, eg,

free(ptr,&free_erno)
free(ptr) equivalent to free(ptr,null)

If ptr is an ALLOC_FLAG_PERMANENT ptr then
free(ptr) could silently do nothing while
free(ptr,&free_erno) would return an appropriate
error.

I am not sure if the gain in worth it in this
specific case. On the
other hand I think optional arguments are
a great idea.

- William Hughes

Keith Thompson · Sep 17, 2009

jacob navia said:
Seebs a Ã©crit :

Surely not, since there are allocations where you just
do not know anything about the future use yet!

I don't see the difference between "Let me resize this later" and
"I don't know whether I'll need to resize this later". Well, maybe
the former could do something to make resizing a trifle faster,
but I'm skeptical that that's a distinction worth making.

What happens if I specify ALLOC_FLAG_RESIZABLE|ALLOC_FLAG_STATIC?

No, the size should go in the extra arguments

Um, what extra arguments? I don't think you've shown us the
function(s) that take these flags.

Less wasted space between the allocated data

Hmm. Depending on the implementation, there might be little or no
benefit to this; for example, there might be housekeeping information
stored in a freed block that needs to be aligned anyway.

But if you're going to provide alignment control, IMHO it makes more
sense to allow the actual alignment to be specified. Document, or
even provide a way to query, what alignemnts are supported, including
a constant for the maximum required alignment (what malloc() uses).

With the usual caveat that all-bits-zero isn't necessarily 0 for
pointers or FP values. (That's not to say it wouldn't be useful
in spite of that; after all, we already have calloc().)

[...]

malloc/free needs functionality like

size_t GetBlockSize(void *);

that allows you to know how big this malloced block is. If you get
(size_t)-1 it means that is not a block allocated with malloc.

If I do this:
void *ptr = malloc(999);
does GetBlockSize(ptr) give me 999 or the actual number of bytes
allocated?

I can see this being useful. I can also see it imposing a significant
burden on some implementations.

Seebs · Sep 17, 2009

Seebs a écrit :

Surely not, since there are allocations where you just
do not know anything about the future use yet!

So what?

If you don't know at the time of allocation whether or not it is possible
to resize, then it's resizable.

"Never resized" means "I can prove that it can never be resized".

No, the size should go in the extra arguments

But what size? The largest possible future size? Merely the information
that there may be a future size but we don't know what it is?

Less wasted space between the allocated data

Wasted space between allocated data is probably very close to zero
functionally, because you still need alignment for the information used
to describe the region...

Flags with the allocator exist in windows since windows 3.1 (1992 if I
remember correctly) and they were and are useful.

Maybe. I'm not convinced that they've paid off.

Which systems have had more problems with memory allocation systems failing
catastrophically: Windows, or systems which just provided malloc/free?

Answer: Windows.

malloc/free needs functionality like

size_t GetBlockSize(void *);

that allows you to know how big this malloced block is. If you get
(size_t)-1 it means that is not a block allocated with malloc.

Why does it need this functionality? I have never needed it. If I want to
track how much space I've requested, I do so -- and the rest of the time I
don't pay the (substantial) overhead.

-s

Seebs · Sep 17, 2009

Seems to me the best approach is to add
optional arguments and default values to C.

I think this would be pretty horrible.

This would not break backward compatibility if
we require that a library function called without
optional arguments must have identical behaviour
as before.

How would we know it had been called without the optional arguments?

In the existing world, "optional arguments" have always been marked by
a previous argument -- you have to pass in an argument which tells you
that there are more arguments coming.

You can't do that to free, though.

I am not sure if the gain in worth it in this
specific case. On the
other hand I think optional arguments are
a great idea.

They might be, except a large number of ABIs make no provision for a way
to tell whether or not they've been passed. The best I can think of is
that you'd have to always pass the "default" value implicitly, but that's
a pretty big change and imposes substantial costs too. And honestly,
I'm not sure I think it's necessarily an improvement. I've found that
"default arguments" seem to more often hide errors than be useful to me.

-s

Seebs · Sep 17, 2009

Sure, I remember that malloc wa&s completely screwed back then but

seriously...

do you REALLY think it was because of the extra arguments?

Well, let's see. We have a simple design and a complex design. The
complex design fails more often.

Hmmmmmmm.

To avoid buffer overflows it is nice to know the size of your buffer
isn't it?

If you don't know the size of your buffer already, checking the allocated
space is the wrong way to find out.

-s

jacob navia · Sep 17, 2009

William Hughes a écrit :

Certainly Jacob Navia would not agree.

Seems to me the best approach is to add
optional arguments and default values to C.
This would not break backward compatibility if
we require that a library function called without
optional arguments must have identical behaviour
as before.

We could then add an optional flag argument to
malloc and an optional error return argument
to free with default value null, eg,

free(ptr,&free_erno)
free(ptr) equivalent to free(ptr,null)

If ptr is an ALLOC_FLAG_PERMANENT ptr then
free(ptr) could silently do nothing while
free(ptr,&free_erno) would return an appropriate
error.

I am not sure if the gain in worth it in this
specific case. On the
other hand I think optional arguments are
a great idea.

- William Hughes

I know of a C compiler that implements optional arguments.

Aren't they GREAT?

But I think a new set of entry points could be designed, and
it would be OK too.

Microsoft always appended Ex when it needed a new interface,
like CreateWindowEx that added some args to CreateWindow.

We could add some suffix (malloc_extended, for instance).
Yes, there will be no other solution as to use namespace
since optional arguments would provoke a tempest of protests

jacob navia · Sep 17, 2009

Seebs a écrit :

Maybe. I'm not convinced that they've paid off.

Which systems have had more problems with memory allocation systems failing
catastrophically: Windows, or systems which just provided malloc/free?

Answer: Windows.

Sure, I remember that malloc wa&s completely screwed back then but

seriously...

do you REALLY think it was because of the extra arguments?

Now, come on...

Why does it need this functionality? I have never needed it. If I want to
track how much space I've requested, I do so -- and the rest of the time I
don't pay the (substantial) overhead.

To avoid buffer overflows it is nice to know the size of your buffer
isn't it?

Keith Thompson · Sep 17, 2009

Seebs said:
I think this would be pretty horrible.

How would we know it had been called without the optional arguments?

In the existing world, "optional arguments" have always been marked by
a previous argument -- you have to pass in an argument which tells you
that there are more arguments coming.

You can't do that to free, though.

[...]

I don't think it would be unreasonable to provide arguments with
default values. If you don't pass the argument, the parameter gets
the default value.

For example, this could potentially replace putc and putchar:

int new_putc(int c, FILE *stream = stdout);

so that new_putc('x') is exactly equivalent to new_putc('x', stdout).

They might be, except a large number of ABIs make no provision for a way
to tell whether or not they've been passed. The best I can think of is
that you'd have to always pass the "default" value implicitly, but that's
a pretty big change and imposes substantial costs too. And honestly,
I'm not sure I think it's necessarily an improvement. I've found that
"default arguments" seem to more often hide errors than be useful to me.

I don't see what "substantial costs" it would impose, other than that
the code always passes the extra arguments.

Seebs · Sep 18, 2009

I don't see what "substantial costs" it would impose, other than that
the code always passes the extra arguments.

Hmm.

Well, let's see. I guess the compiler would just translate
new_putc(x) into new_putc(x, stdout)... That seems like it's asking
for trouble, but I agree that it's not immediately obvious.

Although...

What are the allowable forms of the "default"? Constants? Well, stdout isn't
a constant. Expressions?

int why_do_you_torment_me_so(int *a,
int b = a[0],
int c = (int) sqrt((double) a[1]),
int d = (a && a[0]) ? array_sum(a + 1, a[0]) : -23);

-s

websnarf · Sep 18, 2009

Indeed, this retains the guarantee that the number of reallocs is O(log
(FACTOR, MAXLENGTH)) and its a finite cost to calculate the size of
the new size itself, so long as you never *shrink* the memory size of
the buffer. This strategy may save your maximum memory usage if your
largest allocation comes before any allocations at least half its
size.

In practice this does just as well as above, its just that a few cases
where you do a few too many multiplies. It also doesn't deal with the
start up case. But its all window dressing. This strategy will tend
to have fewer overall calls to realloc if your memory allocations tend
to all be roughly the same size.

Oh -- this performs shrinks, because it doesn't retain the oldsize as
a lower bound. The problem is that depending on your implementation
of realloc() this can either cause your system to thrash (see exercise
below) or it will behave as well as the two good solutions you pose
above. The number of real data moves depends on whether or not the
size oscillates between values that are in different FACTOR power
ranges. The solutions above actually guarantee that the number of
data moves is finite and uses no more than twice (assuming FACTOR = 2)
your memory bandwidth for the largest allocation.

Between the first two is a matter of personal preferences. You may be
able to see a slight advantage of one over the other depending on the
characteristics of your application, but both are suitable no matter
what. The third strategy is an example of what not to do.

A very simple solution is to just call realloc () for the new size
whenever the size changes, and hope that the realloc implementation is
efficient. Which it usually is. I just checked that calling realloc
for the same pointer with all sizes from 1 byte to 1 billion bytes
took just 55 nanoseconds per realloc on a three year old MacBook. In
this case, anything more complex than calling realloc seems a waste of
programmers time.

Exercise to the reader: a) Show that this solution is can be as bad as
O(MAXLENGTH**2) in terms of memory copy performance. b) Why would a
system vendor implement a tight and "chop-and-reliquish" strategy for
realloc()? Is it possible for a compiler vendor to implement a realloc
() that guarantees a O(log(MAXLENGTH)) performance for this sort of
problem while making efficient use of memory?

Keith Thompson · Sep 18, 2009

Seebs said:
Hmm.

Well, let's see. I guess the compiler would just translate
new_putc(x) into new_putc(x, stdout)... That seems like it's asking
for trouble, but I agree that it's not immediately obvious.

There's certainly plenty of precedent for it in other languages.

Although...

What are the allowable forms of the "default"? Constants? Well,
stdout isn't a constant. Expressions?

int why_do_you_torment_me_so(int *a,
int b = a[0],
int c = (int) sqrt((double) a[1]),
int d = (a && a[0]) ? array_sum(a + 1, a[0]) : -23);

In C++, the default argument "is type checked at the time of
the function declaration and evaluated at the time of the call"
(quoting Stroustrup, The C++ Programming Language, 3rd edition).
That seems to me like a reasonable rule (and there's *some* virtue
in being compatible with C++). Default argument expressions may
not refer to other parameters.

William Hughes · Sep 18, 2009

I think this would be pretty horrible.

How would we know it had been called without the optional arguments?

By count.

Possible rules.
Optional arguments have to come last and must have default
values assigned. Optional arguments and "..." cannot be used
in the same function. A function for which you cannot tell
from the prototype how many non optional arguments
there are cannot have optional arguments.
Non-optional arguments must be passed.
Optional arguments are detected by count and any optional
argument that is not passed gets the default value.

In the existing world, "optional arguments" have always been marked by
a previous argument -- you have to pass in an argument which tells you
that there are more arguments coming.

False. See e.g. Python.

You can't do that to free, though.

They might be, except a large number of ABIs make no provision for a way
to tell whether or not they've been passed. The best I can think of is
that you'd have to always pass the "default" value implicitly, but that's
a pretty big change and imposes substantial costs too.

I don't see this. No deep change is made, a function with
optional arguments acts just like a function with a fixed
number (non-optional plus optional) of arguments.
If values for the optional arguments are not provided
in the code then the default values are passed.
(On the other hand, *any* change imposes substantial costs).

And honestly,
I'm not sure I think it's necessarily an improvement. I've found that
"default arguments" seem to more often hide errors than be useful to me.

My experience is exactly opposite. I find optional arguments
very useful in modifying and extending existing code and I do not
find them any more error prone than non-optional arguments.

- William Hughes

Alan Curry · Sep 18, 2009

Possible rules.
Optional arguments have to come last and must have default
values assigned. Optional arguments and "..." cannot be used
in the same function. A function for which you cannot tell
from the prototype how many non optional arguments
there are cannot have optional arguments.
Non-optional arguments must be passed.
Optional arguments are detected by count and any optional
argument that is not passed gets the default value.

void foo(int x, int y, int z=0) { ... }
void bar(int x, int y, int z=1) { ... }

typedef void (*func_taking_3_ints)(int, int, int);

func_taking_3_ints p1 = rand()%1 ? foo : bar;
(*p1)(0, 0);

What happens? Does the caller have to figure out which function is being
called, and provide the default value for the missing optional argument? Or
do you just make optional arguments mandatory when calling a function through
a pointer? And how would you do that, since even a plain function name
evaluates as a pointer? There's no such thing as a non-pointer function call,
so you'll have to invent one.

Or does the default value figure into the function's type, making it illegal
to assign foo or bar to p1? How could I declare a pointer to a function with
default argument? Could be something like this I suppose:

void (*p1)(int, int, int=0) = foo; /* OK */
p1 = bar; /* Mismatch! */

Seebs · Sep 18, 2009

False. See e.g. Python.

I was talking specifically about C. There are several examples of
functions in C which take variable numbers of arguments, but in every
case, you can tell what arguments are coming by looking at existing
arguments.

I don't see this. No deep change is made, a function with
optional arguments acts just like a function with a fixed
number (non-optional plus optional) of arguments.
If values for the optional arguments are not provided
in the code then the default values are passed.

Hmm.

It's an interesting thought, I guess it just seems like a pretty big
change -- in particular, a ton of simple functions might suddenly start
having additional overhead, having to change calling sequences, and so
on.

At the very least, you would be pretty much forced to recompile everything.
You couldn't be compatible with existing binaries which called free() without
serious hassle.

My experience is exactly opposite. I find optional arguments
very useful in modifying and extending existing code and I do not
find them any more error prone than non-optional arguments.

I guess I just like prototypes which catch common errors, such as forgetting
to specify something...

-s

reading the source of calling a singly-linked list	4	Feb 12, 2010
type-punning?	15	May 6, 2008
What is "the instance structure must be manually initialized"?	2	Jan 15, 2014
Weird Behavior with Rays in C and OpenGL	4	Feb 13, 2024
Increase memory buffer with realloc or malloc?	7	Sep 7, 2009
Lexical Analysis on C++	1	Oct 31, 2023
unit test strategy	9	Sep 15, 2012
SENTINEL CONTROL LOOP WHEN DEALING WITH TWO ARRAYS	1	Oct 26, 2023

Buffer growing strategy?

Jef Driesen

jacob navia

Seebs

Keith Thompson

jacob navia

Jef Driesen

William Hughes

Keith Thompson

Seebs

Seebs

Seebs

jacob navia

jacob navia

Keith Thompson

Seebs

websnarf

Keith Thompson

William Hughes

Alan Curry

Seebs

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads