Dynamically resizing a buffer

P

Philip Potter

Hello clc,

I have a buffer in a program which I write to. The buffer has
write-only, unsigned-char-at-a-time access, and the amount of space
required isn't known a priori. Therefore I want the buffer to
dynamically grow using realloc().

A comment by Richard Heathfield in a thread here suggested that a good
algorithm for this is to use realloc() to double the size of the buffer,
but if realloc() fails request smaller size increments until realloc()
succeeds or until realloc() has failed to increase the buffer by even
one byte.

The basic idea is below. The key function is MyBuffer_writebyte(), which
expects the incoming MyBuffer object to be in a consistent state.

Are there any improvements I could make to this code? To me it feels
clumsy, especially with the break in the 3-line while loop.

struct mybuffer_t {
unsigned char *data;
size_t size; /* size of buffer allocated */
size_t index; /* index of first unwritten member of data */
};

typedef struct mybuffer_t MyBuffer;

void MyBuffer_writebyte(MyBuffer *buf, unsigned char byte) {
if(buf->size == buf->index) {
/* need to allocate more space */
size_t inc = buf->size;
unsigned char *tmp;
while(inc>0) {
tmp = realloc(buf->data, buf->size + inc);
if (tmp!=NULL) break; /* break to preserve the size of inc*/
inc/=2;
}
if(tmp==NULL) {
/* couldn't allocate any more space, print error and exit */
exit(EXIT_FAILURE);
}
buf->size += inc;
}
buf->data[buf->index++] = byte;
}
 
R

Richard

Philip Potter said:
Hello clc,

I have a buffer in a program which I write to. The buffer has
write-only, unsigned-char-at-a-time access, and the amount of space
required isn't known a priori. Therefore I want the buffer to
dynamically grow using realloc().

A comment by Richard Heathfield in a thread here suggested that a good
algorithm for this is to use realloc() to double the size of the
buffer, but if realloc() fails request smaller size increments until
realloc() succeeds or until realloc() has failed to increase the
buffer by even one byte.

Double the size of the buffer? Never IMO. Not if you "designed" the buffer
to be "about right" in the first place. Suppose it doesn't fail but does
exhaust memory? Well, bang goes your next malloc.

Keep it simple. Have a delta increase and use that. I would suggest
something like 10-20% of the initial size.

Clearly if you KNOW that the buffer is going to double/quadruple etc
then malloc it to start with.
The basic idea is below. The key function is MyBuffer_writebyte(),
which expects the incoming MyBuffer object to be in a consistent
state.

Are there any improvements I could make to this code? To me it feels
clumsy, especially with the break in the 3-line while loop.

struct mybuffer_t {
unsigned char *data;
size_t size; /* size of buffer allocated */
size_t index; /* index of first unwritten member of data */
};

typedef struct mybuffer_t MyBuffer;

void MyBuffer_writebyte(MyBuffer *buf, unsigned char byte) {
if(buf->size == buf->index) {
/* need to allocate more space */
size_t inc = buf->size;
unsigned char *tmp;
while(inc>0) {
tmp = realloc(buf->data, buf->size + inc);
if (tmp!=NULL) break; /* break to preserve the size of inc*/
inc/=2;
}
if(tmp==NULL) {
/* couldn't allocate any more space, print error and exit */
exit(EXIT_FAILURE);
}
buf->size += inc;
}
buf->data[buf->index++] = byte;
}

--
 
M

Mark Bluemel

Philip said:
struct mybuffer_t {
unsigned char *data;
size_t size; /* size of buffer allocated */
size_t index; /* index of first unwritten member of data */
};

typedef struct mybuffer_t MyBuffer;

void MyBuffer_writebyte(MyBuffer *buf, unsigned char byte) {
if(buf->size == buf->index) {
/* need to allocate more space */
size_t inc = buf->size;
unsigned char *tmp;
while(inc>0) {
tmp = realloc(buf->data, buf->size + inc);
if (tmp!=NULL) break; /* break to preserve the size of inc*/
inc/=2;
}
if(tmp==NULL) {
/* couldn't allocate any more space, print error and exit */
exit(EXIT_FAILURE);
}

You never replace buf->data... It will continue to point to the original
space, which may have been freed ...
buf->size += inc;
}
buf->data[buf->index++] = byte;
}

I'd split out the resizing to a separate function, as it's a
generalizable technique.

You could combine the two forms of loop control (size of increment,
success/failure of allocation) in the while.

This (untested) is my rough hack :-

void MyBuffer_writebyte(MyBuffer *buf, unsigned char byte) {
if(buf->size == buf->index) {
if(!resizeBuffer(buf)){
exit(EXIT_FAILURE); /* or some better error handling */
}
}
buf->data[buf->index++] = byte;
}

/* returns new size */
int resizeBuffer(MyBuffer *buf) {
size_t inc = buf->size;
unsigned char *tmp = NULL;
while(inc>0 &&
(tmp = realloc(buf->data, buf->size + inc)) == NULL) {
inc/=2;
}
if(tmp!=NULL) {
buf->data = tmp;
buf->size += inc;
return buf->size;
} else {
return 0;
}

}
 
P

Philip Potter

Richard said:
Double the size of the buffer? Never IMO. Not if you "designed" the buffer
to be "about right" in the first place. Suppose it doesn't fail but does
exhaust memory? Well, bang goes your next malloc.
>
Keep it simple. Have a delta increase and use that. I would suggest
something like 10-20% of the initial size.
>
Clearly if you KNOW that the buffer is going to double/quadruple etc
then malloc it to start with.

I've already stated that the final amount of space required isn't known
a priori. It could be thousands of bytes, it could be millions. A
linearly-growing buffer is not appropriate for this situation, which is
why I chose an exponentially-growing buffer.

Phil
 
P

Philip Potter

Mark said:
Philip Potter wrote:
>
You never replace buf->data... It will continue to point to the original
space, which may have been freed ...

Whoops! That bug was also in my project code...
buf->size += inc;
}
buf->data[buf->index++] = byte;
}

I'd split out the resizing to a separate function, as it's a
generalizable technique.
>
You could combine the two forms of loop control (size of increment,
success/failure of allocation) in the while.

I think this is clumsy too, but less so :)
This (untested) is my rough hack :-

void MyBuffer_writebyte(MyBuffer *buf, unsigned char byte) {
if(buf->size == buf->index) {
if(!resizeBuffer(buf)){
exit(EXIT_FAILURE); /* or some better error handling */
}
}
buf->data[buf->index++] = byte;
}

/* returns new size */
int resizeBuffer(MyBuffer *buf) {
size_t inc = buf->size;
unsigned char *tmp = NULL;
while(inc>0 &&
(tmp = realloc(buf->data, buf->size + inc)) == NULL) {
inc/=2;
}
if(tmp!=NULL) {
buf->data = tmp;
buf->size += inc;
return buf->size;
} else {
return 0;
}

}

Surely resizeBuffer() should return size_t?
 
C

cr88192

Philip Potter said:
Hello clc,

I have a buffer in a program which I write to. The buffer has write-only,
unsigned-char-at-a-time access, and the amount of space required isn't
known a priori. Therefore I want the buffer to dynamically grow using
realloc().

A comment by Richard Heathfield in a thread here suggested that a good
algorithm for this is to use realloc() to double the size of the buffer,
but if realloc() fails request smaller size increments until realloc()
succeeds or until realloc() has failed to increase the buffer by even one
byte.

doubling is probably too steep IMO.

I usually use 50% each time ('size2=size+(size>>1);').
25% may also be sane ('size2=size+(size>>2);').
33% may also be good.

33% is ugly though:
size2=size+(size>>2)+(size>>4)+(size>>6); //bulky
size2=size+size*33/100; //critical buffer size limit

but is a nice 4/3 ratio...

<snip>
 
M

Mark Bluemel

Philip said:
A comment by Richard Heathfield in a thread here suggested that a good
algorithm for this is to use realloc() to double the size of the buffer,
but if realloc() fails request smaller size increments until realloc()
succeeds or until realloc() has failed to increase the buffer by even
one byte.

The basic idea is below. The key function is MyBuffer_writebyte(), which
expects the incoming MyBuffer object to be in a consistent state.

Are there any improvements I could make to this code? To me it feels
clumsy, especially with the break in the 3-line while loop.

struct mybuffer_t {
unsigned char *data;
size_t size; /* size of buffer allocated */
size_t index; /* index of first unwritten member of data */
};

typedef struct mybuffer_t MyBuffer;

void MyBuffer_writebyte(MyBuffer *buf, unsigned char byte) {
if(buf->size == buf->index) {
/* need to allocate more space */
size_t inc = buf->size;
unsigned char *tmp;
while(inc>0) {
tmp = realloc(buf->data, buf->size + inc);
if (tmp!=NULL) break; /* break to preserve the size of inc*/
inc/=2;
}
if(tmp==NULL) {
/* couldn't allocate any more space, print error and exit */
exit(EXIT_FAILURE);
}
buf->size += inc;
}
buf->data[buf->index++] = byte;
}

Here's a simple alternative :-

void MyBuffer_writebyte(MyBuffer *buf, unsigned char byte) {
if(buf->size == buf->index) {
/* need to allocate more space */
size_t inc = buf->size;
unsigned char *tmp;
while(inc>0) {
tmp = realloc(buf->data, buf->size + inc);
if (tmp!=NULL) {
buf->data = tmp;
buf->size += inc;
inc = 0; /* force exit */
} else {
inc /= 2;
}
}
if(tmp==NULL) {
/* couldn't allocate any more space, print error and exit */
exit(EXIT_FAILURE);
}
}
buf->data[buf->index++] = byte;
}
 
B

Ben Bacarisse

Richard said:
Double the size of the buffer? Never IMO. Not if you "designed" the buffer
to be "about right" in the first place. Suppose it doesn't fail but does
exhaust memory? Well, bang goes your next malloc.

Keep it simple. Have a delta increase and use that. I would suggest
something like 10-20% of the initial size.

"of the initial size" means a linear growth pattern. An exponential
growth pattern gives better performance over a range of input
patterns. If memory is tight, then one can use a factor of the
current size that is < 2 but still > 1 to get O(log n) allocations
rather than O(n).

Initially, this is what I thought you meant (20% of the current size
is factor of 1.2) but I see now you wrote initial. Of course, there
are cases where one knows this to be a better strategy, but I think
they are rare.

To the OP:
Another strategy is to switch to linear growth (maybe even only adding
1!) when the doubling fails. The rationale is that we are now in a low
memory situation so it may be better (in order to proceed at all) to
allocate the minimum rather than that maximum. We also know that we
enter this pattern only once in the growth of any buffer, so the cost is
never that large (in big O terms).
 
E

Eric Sosman

Philip said:
[...]

Are there any improvements I could make to this code? To me it feels
clumsy, especially with the break in the 3-line while loop.
[...]
while(inc>0) {
tmp = realloc(buf->data, buf->size + inc);
if (tmp!=NULL) break; /* break to preserve the size of inc*/
inc/=2;
}
if(tmp==NULL) {
/* couldn't allocate any more space, print error and exit */
exit(EXIT_FAILURE);
}
buf->size += inc;

The loop has two termination conditions -- realloc()
succeeds, or increment goes to zero -- so I don't see how
you can avoid two tests. But as written there's an after-
the-loop test to (re-)discover why the loop ended. That,
at least, is easy to get rid of:

while ((tmp = realloc(buf->data, buf->size + inc)) == NULL) {
if ((inc /= 2) == 0)
exit(EXIT_FAILURE);
}
buf->data = tmp;
buf->size += inc;
 
R

Richard Tobin

Double the size of the buffer? Never IMO. Not if you "designed" the buffer
to be "about right" in the first place. Suppose it doesn't fail but does
exhaust memory? Well, bang goes your next malloc.

Keep it simple. Have a delta increase and use that. I would suggest
something like 10-20% of the initial size.

If you don't know the likely size (e.g. when reading some textual
data) it makes more sense to increase by a proportion of the current
size: if you started at 10 bytes are have now reached 100 it makes no
sense to go on incrementing by one or two.

Of course that's what the original proposal did: increase by 100% of
the current size. If you start from one, this means you only allocate
power-of-two sized blocks.

A problem with any approach is that it might interact badly with the
malloc() implementation. Imagine an implementation that always
allocates powers of two but uses sizeof(void *) bytes of it to record
the size - if you always allocated powers of two you would end up
allocating nearly four times as much as you need.

I recommend separating out the increment algorithm so that it can
easily be changed if it proves to be bad on some platform.

-- Richard
 
R

Richard Heathfield

Philip Potter said:
I've already stated that the final amount of space required isn't
known a priori. It could be thousands of bytes, it could be millions.
A linearly-growing buffer is not appropriate for this situation, which
is why I chose an exponentially-growing buffer.

Right. A linear increase is most unwise.

One of the problems with killfiling trolls is that one doesn't get to
see (and thus have the opportunity to correct) the ludicrous "advice"
they dish out, unless it happens to be quoted in a reply.

Incidentally, if memory is tight (relative to the length of the line
you're reading), memory exhaustion /is/ a risk, but one that is easily
dealt with. Once you know that the line has been read completely, you
can "shrink" the allocation to be an exact fit, thus taking up no more
memory than you actually need.
 
R

Richard Heathfield

Eric Sosman said:

while ((tmp = realloc(buf->data, buf->size + inc)) == NULL) {
if ((inc /= 2) == 0)
exit(EXIT_FAILURE);

Do you really think that's a good idea? We've had this whole discussion
recently, I know, but it bears reiterating nonetheless that it is not a
library function's job to decide whether to terminate a program.
 
R

Richard

Richard Heathfield said:
Philip Potter said:

I missed that point. I also caveated my reply in the second
sentence. Apologies if you thought my reply was misleading.

"Not if you "designed" the buffer to be "about right" in the first place."
Right. A linear increase is most unwise.

One of the problems with killfiling trolls is that one doesn't get to
see (and thus have the opportunity to correct) the ludicrous "advice"
they dish out, unless it happens to be quoted in a reply.

Hilarious advice from someone who frequently posts incorrect advice.
Incidentally, if memory is tight (relative to the length of the line
you're reading), memory exhaustion /is/ a risk, but one that is easily
dealt with. Once you know that the line has been read completely, you
can "shrink" the allocation to be an exact fit, thus taking up no more
memory than you actually need.

And in the next episode, RH explains why "using less memory is better
for the system performance".
 
M

Mark Bluemel

Richard said:
Eric Sosman said:



Do you really think that's a good idea? We've had this whole discussion
recently, I know, but it bears reiterating nonetheless that it is not a
library function's job to decide whether to terminate a program.

In Eric's (and my:) defence, I think he was just making the minimum
change to the OP's code to address the specific issue the OP had raised
(about the use of "break").

As I remarked in my earlier, less than elegant reply, I think the
resizing should be split out to a separate function which returns some
value which indicates success or failure.

Equally the "MyBuffer_writebyte" routine might also be better returning
a success/failure result rather than terminating the program on failure.
 
P

Philip Potter

Richard said:
I missed that point. I also caveated my reply in the second
sentence. Apologies if you thought my reply was misleading.

"Not if you "designed" the buffer to be "about right" in the first place."

And if you knew the right size, why would you need a dynamic buffer in
the first place? The only situation in which a linear expansion is
useful is when you know an upper bound on the size requirements; in
which case why don't you just allocate that upper bound?
Hilarious advice from someone who frequently posts incorrect advice.

On the contrary, I find RH's advice to be some of the best here -- along
with a few others. He can be quite blunt with it sometimes, but we can't
have everything :)
And in the next episode, RH explains why "using less memory is better
for the system performance".

....and his advice continues to be correct.

Phil
 
R

Richard Tobin

Once you know that the line has been read completely, you
can "shrink" the allocation to be an exact fit, thus taking up no more
memory than you actually need.

Possibly. realloc() may well be a no-op for size decreases.

-- Richard
 
R

Richard Heathfield

Richard Tobin said:
Possibly. realloc() may well be a no-op for size decreases.

True enough, but then malloc() may well allocate in chunks of 16MB. You
can't fight a lousy implementation, except by ditching it in favour of
a better one.
 
E

Eric Sosman

Richard Heathfield wrote On 08/22/07 10:05,:
Eric Sosman said:




Do you really think that's a good idea? We've had this whole discussion
recently, I know, but it bears reiterating nonetheless that it is not a
library function's job to decide whether to terminate a program.

If you'll review the discussion you mention, you'll
find that I was firmly on the "report, don't crash" side.

The purpose of my rewrite of the O.P.'s code was to
implement his choices less clumsily, not to confuse the
issue by inflicting my own choices on him. That's why
(for example) I didn't mention the alternative design
possibility of a linked list of smaller buffers instead
of one ever-growing array: it would only have distracted
attention from the main issue.

At one point I even had a /* your choice */ comment on
the exit() call, but decided it was too distracting and
removed it for fear it would provoke responses on side-
issues ... Damned if I do, damned if I don't, I guess.
 
R

Richard Heathfield

Eric Sosman said:
Richard Heathfield wrote On 08/22/07 10:05,:

If you'll review the discussion you mention, you'll
find that I was firmly on the "report, don't crash" side.

The purpose of my rewrite of the O.P.'s code was to
implement his choices less clumsily, not to confuse the
issue by inflicting my own choices on him.

Fair enough. I do the same myself sometimes (and get the same kind of
stick that I've just given you!).

Damned if I do, damned if I don't, I guess.

I know. To make it up to you, we'll double your fee on this occasion.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,904
Latest member
HealthyVisionsCBDPrice

Latest Threads

Top