Buffer or Realloc?



Is it generally better to set up a buffer (a fixed-size array) and read
and write to that buffer, even if it is larger than what is being
written to it? Or is it better to allocate memory and realloc it to the
size of what is being written each time? In other words, what is the
deciding factor between using a fixed-size buffer and allocating memory
and reallocing it?

I don't think the code below is optimal, since the size of the "buffer"
becomes extremely small when I realloc it. I also spend time
reallocing it. And taking this to the next step, if I wrote to a linked
list or a red-black tree, I would have to allocate memory. In that case,
how do I determine the best size?

Note: The code below contains some POSIX items. Please ignore those
items. I have also left out some error checking. The code below was
written just to learn the regex API.

main(void) {

    regex_t preg;
    char *string = strdup("hello1hello4hello2hello3 are you hello");
    char *match;
    regmatch_t pmatch[1];
    size_t nmatch = 1;
    size_t len = 0;
    size_t offset = 0;
    int test;

    match = calloc(BUFSIZ, sizeof(char));

    if ( regcomp(&preg, "hello[41]", REG_BASIC) != 0)
        perror("regcomp failed");

    while (string[offset] != '\0') {
        test = regexec((const regex_t *)&preg, &string[offset],
            nmatch, pmatch, 0);
        if (test == REG_NOMATCH || test != 0) {
            break;
        }
        len = pmatch[0].rm_eo - pmatch[0].rm_so;
        /* is this optimal or should I use a fixed sized buffer? */
        match = realloc(match, len+1);
        strlcpy(match, &string[offset], len+1);
        (void)printf("matched string: %s\n", match);
        offset += pmatch[0].rm_eo;
    }

    (void)printf("original string: %s\n", string);
}

Michael Mair

Is it generally better to set up a buffer (a fixed-size array) and read
and write to that buffer, even if it is larger than what is being
written to it? Or is it better to allocate memory and realloc it to the
size of what is being written each time? In other words, what is the
deciding factor between using a fixed-size buffer and allocating memory
and reallocing it?

If it is conceivable that your "large enough" buffer can be
too small and if it is detrimental for your programme because
you caught this case but cannot handle whatever ought to be
in the buffer because of its lack of size, then consider
allocating and reallocating memory.

There are systems where you only have freestanding implementations,
i.e. lack <stdlib.h>. There, you obviously have to make do with
fixed size buffers.

I don't think the code below is optimal, since the size of the "buffer"
becomes extremely small when I realloc it. I also spend time
reallocing it. And taking this to the next step, if I wrote to a linked
list or a red-black tree, I would have to allocate memory. In that case,
how do I determine the best size?

Use a "reasonable" start size and choose a "reasonable"
minimum size.
"Reasonable" depends on your problem domain and the resources
you have at your beck and call.
Ideally, the allocated buffer is never resized at all but does
not claim a substantial amount of your memory.
Do not reallocate all the time. If you run into malloc()
failure somewhere else in your programme, then you could try,
as a fallback, to shrink allocated buffers to their minimum size
and retry the malloc().

One thing to keep in mind: Try out your realloc() fallback with
very small initial buffer sizes, and your error handling strategies
with very large initial buffer sizes (e.g. try to
malloc((size_t)-1)). Without tests, the code is worth no more than
a show of goodwill...

Note: The code below contains some POSIX items. Please ignore those
items. I have also left out some error checking. The code below was
written just to learn the regex API.

You could have left them out; providing compiling C code is your
best bet for getting good answers.

You forgot to
#include <stdlib.h>
#include ...

main(void) {

    regex_t preg;
    char *string = strdup("hello1hello4hello2hello3 are you hello");
    char *match;
    regmatch_t pmatch[1];
    size_t nmatch = 1;
    size_t len = 0;
    size_t offset = 0;
    int test;

    match = calloc(BUFSIZ, sizeof(char));

As you do not know the value of BUFSIZ (the standard only guarantees
that it is at least 256), you might allocate anything from a few
hundred bytes to a megabyte.
Go with a fixed initial value of your own.
sizeof (char) always equals 1. If you want to base the allocation
on the type of match, then use
    match = calloc(YOUR_INIT_BUFFER_SIZE, sizeof *match);

You also forgot to check whether calloc() succeeded. This is
outright stupid. No, I do not want to imply that you are stupid.
But it costs you nothing to "prove" that you may go on from here,
and failing to do so may lead to a segmentation fault at best and,
at the other end of the scale, an ungraceful programme failure while
an important customer is standing right beside you together with
your boss.

    if ( regcomp(&preg, "hello[41]", REG_BASIC) != 0)
        perror("regcomp failed");

    while (string[offset] != '\0') {
        test = regexec((const regex_t *)&preg, &string[offset],
            nmatch, pmatch, 0);
        if (test == REG_NOMATCH || test != 0) {
            break;
        }
        len = pmatch[0].rm_eo - pmatch[0].rm_so;
        /* is this optimal or should I use a fixed sized buffer? */
        match = realloc(match, len+1);

Once again, you discount the possibility of allocation failure.
The right way is
    char *tmp;
    tmp = realloc(match, len+1);
    if (tmp == 0) {
        /* Your error handling here; match is still available */
    }
    match = tmp;

You believe that there is no bug in the regex functions.
This is very trustful and certainly honours the writers of that
library -- however, if len is assigned a negative value
(which becomes a large positive value), then you may run
out of memory or inadvertently free(match) if len+1 == 0.

This is the part I meant above: Write a fallback and test it.
This is not paranoia but
- good practice for you; recovering gracefully from a malloc()
failure is hard
- may help you catch hard-to-catch bugs
- makes other people trust _your_ code -- if I evaluate code
and find it to be of the "malloc() always succeeds variety",
I am less than impressed.

        strlcpy(match, &string[offset], len+1);
        (void)printf("matched string: %s\n", match);
        offset += pmatch[0].rm_eo;
    }

    (void)printf("original string: %s\n", string);


Bill Pursell

Is it generally better to set up a buffer (a fixed-size array) and read
and write to that buffer, even if it is larger than what is being
written to it? Or is it better to allocate memory and realloc it to the
size of what is being written each time? In other words, what is the
deciding factor between using a fixed-size buffer and allocating memory
and reallocing it?

I generally try to avoid fixed buffers larger than BUFSIZ, my
thinking being that stack limitations will generally be more
restrictive than anything else. (That's not a language issue,
however.) For me, it's more of an aesthetic thing: if 100s
of variable declarations in the same scope are bad, then
declaring big arrays is also bad.

One thing that is odd in your code is to downsize with
realloc. Generally, programs are greedy: once they've
acquired the memory, it's easiest to hold onto it. Malloc
a buffer that you think should handle the general case
and only realloc if it turns out to be too small. Realloc
by doubling the size and then hold onto that memory
forever. If that's feasible, of course. If you just consumed
98% of the machine's memory to do an operation that
is a 1 in a 10^276 case in a long-running program, it
makes sense to release the memory. It depends on
the situation.


Thanks for all the great suggestions. I still need to think through
how I want to handle the buffering. I can do bounds checking and use an
initial buffer of 256 or 512 bytes. If the length of the string is
larger, then I can just double the size (or more). I can do the check
before the reallocation; if the string is not larger, I can skip the
reallocation.

My final re-write will just be moving this into my http server as a
function or two. I want to parse and search http requests and send
responses. I am buffering the sockets with a fixed buffer of 8196. I
want to dump the finds into a structure that will either be part of a
linked list or a red black tree. I have started to use threads, so
that might change as well. The biggest challenge will be buffering and
writing regex against URIs.

Here's the code cleaned up with some error checking (buffering will be
redone over the next day or two):

#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>

main(void) {

    regex_t preg;
    char *string = strdup("hello1hello4hello2hello3 are you hello");
    char *match;
    char *new_match;
    regmatch_t pmatch;
    size_t nmatch = 1;
    size_t len = 0;
    size_t offset = 0;
    int test;

    match = calloc(BUFSIZ, sizeof(char));
    if (match == NULL)
        errx(-1, "calloc() could not allocate space!\n");

    if (regcomp(&preg, "hello[41]", REG_BASIC) != 0)
        puts("regcomp() failed to compile\n");

    while (string[offset] != '\0') {
        test = regexec((const regex_t *)&preg, &string[offset],
            nmatch, &pmatch, 0);
        if (test != 0) {
            break;
        }
        len = pmatch.rm_eo - pmatch.rm_so;
        if ( (new_match = realloc(match, len+1)) == NULL) {
            errx(-1, "realloc() could not allocate space!\n");
        }
        match = new_match;
        strlcpy(match, &string[offset], len+1);
        (void)printf("matched string: %s\n", match);
        offset += pmatch.rm_eo;
    }
    (void)printf("original string: %s\n", string);
    free(match);
    match = NULL;
}

Eric Sosman

Is it generally better to set up a buffer (a fixed-size array) and read
and write to that buffer, even if it is larger than what is being
written to it? Or is it better to allocate memory and realloc it to the
size of what is being written each time? In other words, what is the
deciding factor between using a fixed-size buffer and allocating memory
and reallocing it?

There's no hard-and-fast rule, just as there's no
hard-and-fast rule for choosing between arrays and trees.
Each has its virtues and its drawbacks.

Some considerations for allocating "buffers," where
the somewhat vague term is understood to mean "places to
store varying amounts of stuff:"

- If you know an upper bound on the size of the stuff
(or if you're willing to report failure if the stuff
is larger), a fixed-size buffer is attractive because
of its simplicity.

- If the upper bound is large, it may be a good idea to
avoid declaring the buffer as an `auto' variable.
Many systems have less `auto' memory than dynamic or
static memory.

- A statically-allocated buffer may well be the ultimate
in simplicity, but it has some disadvantages. First,
it exists and occupies memory even when it's not being
used; for a large buffer this may be too wasteful.
Second (like any static object), it creates difficulties
if you want to write functions that are re-entrant or
can be called recursively.

- How many instances of "stuff" must the program keep track
of simultaneously? If there will be many of them, it is
often a good idea to put each in a dynamic allocation of
the minimum size, possibly by using realloc() on a buffer
obtained from malloc(), or possibly by copying from the
initial buffer to a new buffer specially allocated for
each instance.

- If an instance of "stuff" can be very large, consider
breaking it up into smaller pieces and storing those
pieces in another data structure. It may be easier to
manage a linked list of a thousand 1-megabyte chunks
than to find storage for a gigabyte array.

- If you decide to malloc() a buffer and then expand it as
needed with realloc(), it is usually better to expand by
a "magnification factor" than by a fixed amount. That is,
it is usually better to increase the buffer size by 50%
or 100% than to add 100 or 1000 bytes at every expansion.

- A popular technique used by many malloc() implementations
(but not by all) is to add about 8 bytes of bookkeeping
data to each allocation, and to round the total up to a
multiple of 8. (Or perhaps the magic number is 4 bytes,
or maybe 16 -- it depends.) Consequence: If you make a
large number of small allocations, a sizeable fraction of
the total memory will be expended in "overhead" rather
than in "payload." Corollary: If you've got a 200-byte
buffer that holds only 199 bytes of "stuff," shrinking
it with realloc() probably won't do much.

- Some malloc() implementations add their 8 bytes (or so)
of overhead but then round up to the next power of two.
If you request 1024 bytes, you'll wind up reserving 2048
for an "efficiency" of only 50%. To guard against this
effect, avoid power-of-two requests; if it makes sense
in terms of the rest of the program, aim for allocations
that are (nominally) a little less than a power of two
in size.


Is it generally better to set up a buffer (a fixed-size array) and read
and write to that buffer, even if it is larger than what is being
written to it? Or is it better to allocate memory and realloc it to the
size of what is being written each time? In other words, what is the
deciding factor between using a fixed-size buffer and allocating memory
and reallocing it?

Fixed size arrays are good for things whose contents *CANNOT*
exceed the size of that array. A string is usually not that kind of
thing.

So usually the decision is quite simple: if you are dealing with
mutable strings *NEVER* use a fixed size array (this is a definitive
mark of an amateur and is usually accompanied by either buffer overflow
errors or ruthless truncation), but instead use the reallocing buffer
strategy. There is a common tendency in C programmers to conflate char
arrays and strings -- the two are different things conceptually and
this has implications for what you are actually doing.

If you just need a block buffer or something like that, or you are
transforming your algorithm to run through chunks of a stream at a
time, then obviously you can just use a fixed sized buffer.

I don't think the code below is optimal, since the size of the "buffer"
becomes extremely small when I realloc it. I also spend time
reallocing it. And taking this to the next step, if I wrote to a linked
list or a red-black tree, I would have to allocate memory. In that case,
how do I determine the best size?

Note: The code below contains some POSIX items. Please ignore those
items. I have also left out some error checking. The code below was
written just to learn the regex API.

main(void) {

    regex_t preg;
    char *string = strdup("hello1hello4hello2hello3 are you hello");
    char *match;
    regmatch_t pmatch[1];
    size_t nmatch = 1;
    size_t len = 0;
    size_t offset = 0;
    int test;

    match = calloc(BUFSIZ, sizeof(char));

    if ( regcomp(&preg, "hello[41]", REG_BASIC) != 0)
        perror("regcomp failed");

    while (string[offset] != '\0') {
        test = regexec((const regex_t *)&preg, &string[offset],
            nmatch, pmatch, 0);
        if (test == REG_NOMATCH || test != 0) {
            break;
        }
        len = pmatch[0].rm_eo - pmatch[0].rm_so;
        /* is this optimal or should I use a fixed sized buffer? */
        match = realloc(match, len+1);

This is wrong is what it is. realloc may return NULL. That means
that the line above will leak the original contents of match any time
realloc happens to fail. This call is also unnecessary if it decreases
the size of the allocation in match.

        strlcpy(match, &string[offset], len+1);

Obviously this will fail any time match is NULL. Also you should use
memcpy(), since it's generally faster.

        (void)printf("matched string: %s\n", match);
        offset += pmatch[0].rm_eo;
    }

    (void)printf("original string: %s\n", string);

malloc, calloc and realloc are fairly slow function calls. So one
thing you should generally do is try to minimize the number of times
you call such functions. When reusing a buffer like you are doing
above, you only want to realloc more space if you need it.

There is a simple sizing strategy that keeps the number of
reallocations to a minimum while wasting at most a constant factor in
space (less than 100% extra), and it is usually an optimal trade-off
between speed and space: always realloc to the least power of 2 greater
than the size that you need, and never decrease the size of a buffer
with realloc. On a 32-bit machine this means that you never call
realloc on the same buffer more than 32 times, and you will never use
more than an extra 100% over the largest size that you would ever have
needed.

So I would recode your snippet something like this:

    size_t mlen = 0;

    match = NULL;

    if ( regcomp(&preg, "hello[41]", REG_BASIC) != 0)
        perror("regcomp failed");

    while (string[offset] != '\0') {
        test = regexec((const regex_t *)&preg, &string[offset],
            nmatch, pmatch, 0);
        if (test == REG_NOMATCH || test != 0) {
            break;
        }
        len = pmatch[0].rm_eo - pmatch[0].rm_so;
        if (len >= mlen) {      /* need len+1 bytes, including the '\0' */
            char *tmpmatch;
            while (mlen <= len) mlen = (mlen > 0) ? 2*mlen : 1;
            tmpmatch = realloc (match, mlen);
            if (NULL == tmpmatch) break; /* Out of memory? */
            match = tmpmatch;
        }
        /* copy from where the match starts, not from offset */
        memcpy (match, &string[offset + pmatch[0].rm_so], len);
        match[len] = '\0';
        [etc ...]

Once you get used to this idea, you will find that it gets kind of
repetitive and tiresome to implement this same idea over and over again.
You will probably want to throw the concept into a library somewhere
and wonder why someone hasn't done this before you.

Well someone has: http://bstring.sf.net/

Flash Gordon

Fixed size arrays are good for things whose contents *CANNOT*
exceed the size of that array. A string is usually not that kind of
thing.

So usually the decision is quite simple: if you are dealing with
mutable strings *NEVER* use a fixed size array (this is a definitive
mark of an amateur and is usually accompanied by either buffer overflow
errors or ruthless truncation),

OK, so for the string I've got to prepare as part of a message to the UK
Government gateway, where the specification says the string has a maximum
length of 10 characters, I should not use a fixed size buffer but a
reallocating buffer? Presumably, so that I don't immediately allocate the
maximum size, I should start at 1 byte and increase the size 1 byte at a
time to avoid wasting memory?

You should not say to *NEVER* do something as invariably someone can
come up with a *real* example where that rule does not apply, such as
the one I just mentioned. I've got plenty of other interfaces where I am
either preparing a string (and yes, I do want a C string for simplicity
of manipulation such as using sprintf to create it) which has either a
*fixed* final length or a small but definite upper bound.
but instead use the reallocing buffer
strategy. There is a common tendency in C programmers to conflate char
arrays and strings -- the two are different things conceptually and
this has implications for what you are actually doing.

True. Life would be much easier if C had a true string type that
automatically resized, but it doesn't, and when you know a reasonable
and definite upper bound on size a fixed length array is perfectly
adequate to the job.
If you just need a block buffer or something like that, or you are
transforming your algorithm to run through chunks of a stream at a
time, then obviously you can just use a fixed sized buffer.


malloc, calloc and realloc are fairly slow function calls. So one
thing you should generally do is try to minimize the number of times
you call such functions. When reusing a buffer like you are doing
above, you only want to realloc more space if you need it.

In addition you can hit memory fragmentation issues.

Michael Mair

Flash said:
OK, so for the string I've got to prepare as part of a message to the UK
Government gateway, where the specification says the string has a maximum
length of 10 characters, I should not use a fixed size buffer but a
reallocating buffer? Presumably, so that I don't immediately allocate the
maximum size, I should start at 1 byte and increase the size 1 byte at a
time to avoid wasting memory?

You should not say to *NEVER* do something as invariably someone can
come up with a *real* example where that rule does not apply, such as
the one I just mentioned. I've got plenty of other interfaces where I am
either preparing a string (and yes, I do want a C string for simplicity
of manipulation such as using sprintf to create it) which has either a
*fixed* final length or a small but definite upper bound.

In such cases, there is often some sort of in-between:
Think of a piece of code that generates C identifiers which can be
restricted to a certain length, say 6 or 31 characters.
I'd generate them full-length, with the realloc() approach -- obviously
starting with a sufficiently large initial buffer -- and then shorten
them by a well-defined algorithm to the required length while retaining
uniqueness within a set of "visible" identifiers.
It is even possible that the final, _immutable_ version is copied to
a fixed-length array.
BTW: The "1 byte" remark IMO is insulting the intelligence of the
involved participants and was rather unnecessary -- even though
"*NEVER*" and "*ALWAYS*" make my teeth ache...

True. Life would be much easier if C had a true string type that
automatically resized, but it doesn't, and when you know a reasonable
and definite upper bound on size a fixed length array is perfectly
adequate to the job.

Which is different how from what Paul wrote?
If there is a definite upper bound, you typically do not do
"anything wild" with the arrays/strings.

In addition you can hit memory fragmentation issues.

Indeed. A "reasonably" large starting buffer with the option for
reallocation often is one of the better ways to go. Determining
"reasonable" of course may be a hard problem subject to changes
throughout the programme's lifetime :)



Michael said:
Indeed. A "reasonably" large starting buffer with the option for
reallocation often is one of the better ways to go. Determining
"reasonable" of course may be a hard problem subject to changes
throughout the programme's lifetime :)

That is where I am struggling a bit. I know my system uses 512-byte
blocks, so I am considering allocating 512 bytes from the outset and
increasing it by multiples of 2 if it needs to be larger. However, BUFSIZ
is defined on my system to be 1024. But starting at 1024 might be overkill.

So how do you balance buffer size and speed with portability? If I were
to move to another system with a different block size, any code written
to improve performance wouldn't have an impact. Should I start with
BUFSIZ, since it's in the standard library, check for boundaries on
that number, and multiply it up by a factor of 2?

The problem is that I doubt anything in this example program would
exceed 80 characters. And since I am on a system with 512-byte blocks,
does it cost more to write allocated memory whose size isn't divisible
by the block size?

<code correction>
Before someone goes off on a tangent, the regcomp part of the code
should exit on failure:

if (regcomp(&preg, "hello[41]", REG_BASIC) != 0) {
    puts("regcomp() failed to compile\n");
    exit(EXIT_FAILURE);
}

</code correction>

Flash Gordon

Michael said:
In such cases, there is often some sort of in-between:
Think of a piece of code that generates C identifiers which can be
restricted to a certain length, say 6 or 31 characters.
I'd generate them full-length, with the realloc() approach -- obviously
starting with a sufficiently large initial buffer -- and then shorten
them by a well-defined algorithm to the required length while retaining
uniqueness within a set of "visible" identifiers.
It is even possible that the final, _immutable_ version is copied to
a fixed-length array.

Since malloced blocks have a certain amount of overhead in terms of
<OT>heap management structures</OT> and overhead in terms of calling
malloc/free plus the extra work (however small) of ensuring you don't
leak memory and handle malloc failures I find it far better to use a
fixed length buffer when there is something putting a definite and not
unreasonable limit on the size. Also your suggestion can still lead to
memory fragmentation.
BTW: The "1 byte" remark IMO is insulting the intelligence of the
involved participants and was rather unnecessary -- even though
"*NEVER*" and "*ALWAYS*" make my teeth ache...

Possibly I should have started at two bytes so you could fit more than
an empty string ;-)
Which is different how from what Paul wrote?
If there is a definite upper bound, you typically do not do
"anything wild" with the arrays/strings.

In part I was agreeing with Paul (it can happen), but also pointing out
that he was ignoring the fact that in many situations there are definite
and reasonably small upper bounds, in which case a fixed sized buffer is
the easiest thing to get right.
Indeed. A "reasonably" large starting buffer with the option for
reallocation often is one of the better ways to go. Determining
"reasonable" of course may be a hard problem subject to changes
throughout the programme's lifetime :)

This depends very much on the likely pattern of memory usage and the
lifetime of the program, but it is definitely the right way to go in
some cases.


The code below compiles and passes lint, but that doesn't mean my
approach is good or that I did not overlook something. The style could
use some work.

Please dig in and tear it apart.

Here's my first attempt at a resize buffering function:

/* len is the length of the string */
void *
adj_buffer(char **buffer, size_t len) {

    char *f_buffer;
    char *new_buffer;
    size_t num = 0;

    new_buffer = NULL;
    f_buffer = *buffer;

    if (len > BUFSIZ) {
        if (len / BUFSIZ == 0) {
            num = 1;
        } else if ( len % BUFSIZ == 0 ){
            num = len / BUFSIZ;
        } else {
            num = len / BUFSIZ;
            num = (size_t)pow(2, num - 1);
        }
    } else
        return f_buffer;

    if (num && SIZE_MAX / num < BUFSIZ) {
        errno = ENOMEM;
        err(1, "overflow");
    }
    if ( (new_buffer = realloc(f_buffer, num * BUFSIZ)) == NULL) {
        buffer = NULL;
        return (NULL);
    }
    f_buffer = new_buffer;
    bzero(f_buffer, sizeof(f_buffer));
    return f_buffer;
}

Ian Collins

The code below compiles and passes lint, but that doesn't mean my
approach is good or that I did not overlook something. The style could
use some work.

Please dig in and tear it apart.

Here's my first attempt at a resize buffering function:

/* len is the length of the string */
void *
adj_buffer(char **buffer, size_t len) {

Any reason to return void* when this works with char*?

    char *f_buffer;
    char *new_buffer;
    size_t num = 0;

    new_buffer = NULL;
    f_buffer = *buffer;

    if (len > BUFSIZ) {
        if (len / BUFSIZ == 0) {
            num = 1;

You can't get here if len > BUFSIZ.

        } else if ( len % BUFSIZ == 0 ){
            num = len / BUFSIZ;
        } else {
            num = len / BUFSIZ;
            num = (size_t)pow(2, num - 1);

What is this bit doing?

Assuming you are allocating chunks of BUFSIZ bytes, why not just num+1?

        }
    } else
        return f_buffer;

    if (num && SIZE_MAX / num < BUFSIZ) {
        errno = ENOMEM;
        err(1, "overflow");

Why not test against len at the top?

    }
    if ( (new_buffer = realloc(f_buffer, num * BUFSIZ)) == NULL) {
        buffer = NULL;
        return (NULL);
    }
    f_buffer = new_buffer;
    bzero(f_buffer, sizeof(f_buffer));

Do you want to do this? I'd have thought you would want to preserve the
contents of the original buffer.


/* len is the length of the string */
char *
adj_buffer(char **buffer, size_t len) {

    char *f_buffer;
    char *new_buffer;
    size_t num = 0;

    new_buffer = NULL;
    f_buffer = *buffer;

    /* test for a multiple of the buffer
     * This is where I am not sure on the best approach. I know that
     * BUFSIZ is 1024, but the system's block size is 512. Should I just
     * multiply by 512? But what if I port to a system with a different
     * block size? */

    /* if len is less than BUFSIZ just return the already allocated buffer */

    if (len > BUFSIZ) {
        if (len / BUFSIZ == 0) {
            num = 1;
        } else if ( len % BUFSIZ == 0 ){
            num = len / BUFSIZ;
        } else {
            num = len / BUFSIZ;

            /* corrected for when num truncates to 1; any advantage to using a
               multiple of 2? */

            num = (size_t)pow(2, (num = 1) ? 2 : (num - 1));
        }
    } else
        return f_buffer;

    /* test to avoid overflow after determining num */

    if (num && SIZE_MAX / num < BUFSIZ) {
        errno = ENOMEM;
        err(1, "overflow");
    }
    if ( (new_buffer = realloc(f_buffer, num * BUFSIZ)) == NULL) {
        buffer = NULL;
        return (NULL);
    }
    f_buffer = new_buffer;
    bzero(f_buffer, sizeof(f_buffer));
    return f_buffer;
}


Made a typo in the above. Here's a more correct version:

/* len is the length of the string */
char *
adj_buffer(char **buffer, size_t len) {

    char *f_buffer;
    char *new_buffer;
    size_t num = 0;

    new_buffer = NULL;
    f_buffer = *buffer;

    if (len > BUFSIZ) {
        if (len / BUFSIZ == 0) {
            num = 1;
        } else if ( len % BUFSIZ == 0 ){
            num = len / BUFSIZ;
        } else {
            num = len / BUFSIZ;
            num = (size_t)pow(2, (num == 1) ? 2 : (num - 1));
        }
    } else
        return f_buffer;

    if (num && SIZE_MAX / num < BUFSIZ) {
        errno = ENOMEM;
        err(1, "overflow");
    }
    if ( (new_buffer = realloc(f_buffer, num * BUFSIZ)) == NULL) {
        buffer = NULL;
        return (NULL);
    }
    f_buffer = new_buffer;
    bzero(f_buffer, sizeof(f_buffer));
    return f_buffer;
}

SM Ryan

# Is it generally better to set-up a buffer (fixed sized array) and read
# and write to that

It's generally better not to overrun your buffer. Otherwise, if
you've written your code to expect all the input in one string at
once, then you alloc/realloc the buffer to be big enough. If
you've written your code to stream through the file, then you
use a fixed size buffer.

Michael Mair

That is where I am struggling a bit. I know my system uses 512-byte
blocks, so I am considering allocating 512 bytes from the outset and
increasing it by multiples of 2 if it needs to be larger. However, BUFSIZ
is defined on my system to be 1024. But starting at 1024 might be overkill.

Some of the things you should think about
- overall memory of the system
- average memory for your programme
- standard use cases: The stuff users do 90% of the time
- corner cases: Do you wish to support a constellation which
means heavy abuse of the original intent of your programme?
If no: Fixed size buffer and dying gracefully may be an option
if your standard use cases permit that.
Example: You write a programme for text files and someone
gives you a stream of 1e10 random bytes.
If yes: Split the stuff into non-standard but reasonable use
cases and conceivable yet very improbable extreme cases.
If everything else permits, go for a reasonable upper limit
for the reasonable use cases.

As for starting at a power of 2, reread Eric Sosman's reply.

So how do you balance buffer size and speed with portability? If I were
to move to another system with a different block size, any code written
to improve performance wouldn't have an impact. Should I start with
BUFSIZ, since it's in the standard library, check for boundaries on
that number, and multiply it up by a factor of 2?

I would not start with BUFSIZ -- it is a value entirely outside
of your control.
If you think that you will deal only with 80 character lines then
think of the users -- if one of them uses text files for, say,
LaTeX input, then they may write whole paragraphs in one line.
If the first two limits above permit it, then it is perfectly
permissible to start at 500 or 1000 elements in your buffer.
If your system gives you only 32K of memory to play with, then
finding out the malloc() overhead and allocating 128-OVERHEAD
and (2*last+OVERHEAD) when increasing may be much more reasonable.
The problem is that I doubt anything in this example program would
exceed 80 characters. And since I am on a system with 512-byte blocks,
does it cost more to write allocated memory whose size isn't divisible
by the block size?

If this slows down your programme, you can prove that by measuring,
and you really need your programme to run faster, then start at block
size - OVERHEAD. Usually, you do not need the additional bit of
speed, and measuring wastes more time than you and your users can
ever gain by optimized buffer sizes.
This does not apply if your application is something used millions
of times by the system and there are many copies of the system out
there.


The code below compiles and passes lint,

That's because lint is bad at math.

[...] but that doesn't mean my approach
is good or that I did not overlook something. The style could use
some work.

Style is far less relevant than functionality, and in this case that's
especially true.

Please dig in and tear it apart.

Here's my first attempt at a resize buffering function:

/* len is the length of the string */
void *
adj_buffer(char **buffer, size_t len) {

Ok, so you get the old buffer and the new length you want, but you
don't have any indication of what the old length used to be. So you
are not going to be able to determine the case of when the buffer is
already big enough (and you should do nothing) from inside this
routine. This pushes partial memory management issues upward into your
calling code rather than having it all done at the lower levels. I
would prefer a design where all memory management issues were solved at
once in a single function.
char *f_buffer;
char *new_buffer;
size_t num = 0;

new_buffer = NULL;
f_buffer = *buffer;

if (len > BUFSIZ) {
if (len / BUFSIZ == 0) {
num = 1;

This condition can never happen: inside the len > BUFSIZ branch,
len / BUFSIZ is always at least 1. You may as well omit it.
else if ( len % BUFSIZ == 0 ){
num = len / BUFSIZ;

This is not necessarily a power of 2. It looks like it's taking an
exact division whenever you are aligned or something. You should drop
this case.
else {
num = len / BUFSIZ;
num = (size_t)pow(2, num - 1);

Ok, this is raising 2 to the power num - 1. It's not finding the
smallest power of 2 greater than or equal to the number, which is
different.

The formula you want is:

num = (size_t) pow (2.0, ceil (log (len / (double) BUFSIZ) / log (2.0)));

However, this is horrendously expensive to calculate. Instead do this:

for (num=BUFSIZ; num < len; num += num) {}
num /= BUFSIZ;

This should be comparatively faster.
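The integer loop suggested here can be wrapped in a small function and checked directly. blocks_needed() is a hypothetical name. The overflow guard is an addition: without it, num += num can wrap around to 0 for a huge len, and the loop then never terminates. For comparison, the pow() version in adj_buffer() yields 16 blocks for len = 5 * BUFSIZ + 1, while the smallest sufficient power of 2 is 8:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>   /* BUFSIZ */

/* Number of BUFSIZ-sized blocks such that blocks * BUFSIZ is the
   smallest power-of-two multiple of BUFSIZ that is >= len.
   Integer arithmetic only. Returns 0 if the size would overflow. */
size_t blocks_needed(size_t len)
{
    size_t total = BUFSIZ;

    while (total < len) {
        if (total > SIZE_MAX / 2)
            return 0;            /* total + total would overflow */
        total += total;
    }
    return total / BUFSIZ;
}
```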
} else
return f_buffer;

if (num && SIZE_MAX / num < BUFSIZ) {
errno = ENOMEM;
err(1, "overflow");
if ( (new_buffer = realloc(f_buffer, num * BUFSIZ)) == NULL) {
buffer = NULL;

Lint didn't call you on this? This is an assignment with no
discernible effect. And given what you are doing you probably mean
*buffer = NULL;
return (NULL);
f_buffer = new_buffer;
bzero(f_buffer, sizeof(f_buffer));

bzero is not ANSI.
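One more detail is worth noting. When realloc() fails it returns NULL but leaves the original block allocated. Clearing the caller's pointer without freeing the block therefore leaks it. Below is a minimal sketch of a resize helper that follows the *buffer = NULL suggestion but frees the old block first. resize_buffer() is a hypothetical name, and keeping the old block on failure would be an equally valid design:

```c
#include <stdlib.h>

/* Resize *buffer to new_size bytes. On success, *buffer is updated and
   returned. On failure, the old block is freed, *buffer is set to NULL,
   and NULL is returned, so nothing leaks. */
char *resize_buffer(char **buffer, size_t new_size)
{
    char *tmp;

    if (new_size == 0)
        new_size = 1;        /* realloc(p, 0) is implementation-defined */

    tmp = realloc(*buffer, new_size);   /* never assign straight back */
    if (tmp == NULL) {
        free(*buffer);       /* realloc() did not free it for us */
        *buffer = NULL;
        return NULL;
    }
    *buffer = tmp;
    return tmp;
}
```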


(e-mail address removed) wrote:
Ok, so you get the old buffer and the new length you want, but you
don't have any indication of what the old length used to be. So you
are not going to be able to determine the case of when the buffer is
already big enough (and you should do nothing) from inside this
routine. This pushes partial memory management issues upward into your
calling code rather than having it all done at the lower levels. I
would prefer a design where all memory management issues were solved at
once in a single function.

What is the best way to keep track of the current size of the buffer? On
the first pass through here I know what the initial buffer size was, but
the second time through I do not, so I agree with your assessment
above. What is the best way to find out the size
of a dynamically allocated buffer? I assume I should do it as part of this
function, so it's self-contained.
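Standard C has no portable way to ask how large a malloc()ed block is. The usual approach is to keep the capacity next to the pointer and pass both around together. The following is a sketch under that assumption. struct buffer, buffer_reserve(), buffer_append() and the starting capacity of 64 are all hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* The pointer travels together with its sizes. */
struct buffer {
    char  *data;
    size_t len;   /* bytes in use, not counting the terminating '\0' */
    size_t cap;   /* bytes allocated */
};

/* Ensure room for at least `need` bytes, doubling the capacity.
   Returns 0 on success, -1 on overflow or allocation failure;
   on failure the old data is left untouched and still valid. */
int buffer_reserve(struct buffer *b, size_t need)
{
    size_t cap = b->cap ? b->cap : 64;   /* 64 is an arbitrary start */
    char *tmp;

    if (need <= b->cap)
        return 0;                         /* already big enough */
    while (cap < need) {
        if (cap > SIZE_MAX / 2)
            return -1;
        cap *= 2;
    }
    tmp = realloc(b->data, cap);
    if (tmp == NULL)
        return -1;
    b->data = tmp;
    b->cap = cap;
    return 0;
}

/* Append n bytes from s and keep the contents NUL-terminated. */
int buffer_append(struct buffer *b, const char *s, size_t n)
{
    if (n > SIZE_MAX - b->len - 1)
        return -1;
    if (buffer_reserve(b, b->len + n + 1) != 0)
        return -1;
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}
```

Because the "already big enough" case is handled inside buffer_reserve(), the caller no longer has to decide when to grow the buffer.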
Ok, this is raising 2 to the power num - 1. It's not finding the
smallest power of 2 greater than or equal to the number, which is
different.

The formula you want is:

num = (size_t) pow (2.0, ceil (log (len / (double) BUFSIZ) / log (2.0)));

However, this is horrendously expensive to calculate. Instead do this:

for (num=BUFSIZ; num < len; num += num) {}
num /= BUFSIZ;

This should be comparatively faster.

I really like that approach. I think I went too complicated at the
Lint didn't call you on this? This is an assignment with no
discernible effect. And given what you are doing you probably mean
*buffer = NULL;

I meant (same difference):

f_buffer = NULL;

Maybe I am using lint incorrectly. What flags would have caught that?
bzero is not ANSI.

Would memset (which is ANSI) be a better alternative or should I skip
this step?

memset(f_buffer, 0, sizeof(f_buffer));
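Note that sizeof(f_buffer) is the size of a char pointer (typically 4 or 8 bytes), not the size of the block it points to. That memset() call would therefore clear only the first few bytes. If zeroing really is needed, one option is to clear just the newly added region, using sizes you track yourself. grow_zeroed() is a hypothetical helper. On failure it returns NULL, and the caller still owns buf:

```c
#include <stdlib.h>
#include <string.h>

/* Grow buf from old_size to new_size bytes and zero only the new bytes.
   Returns the (possibly moved) block, or NULL if realloc() fails,
   in which case buf is still valid and must be freed by the caller. */
char *grow_zeroed(char *buf, size_t old_size, size_t new_size)
{
    char *tmp;

    if (new_size <= old_size)
        return buf;                        /* nothing to grow */
    tmp = realloc(buf, new_size);
    if (tmp == NULL)
        return NULL;
    /* The size comes from the caller, not from sizeof(pointer). */
    memset(tmp + old_size, 0, new_size - old_size);
    return tmp;
}
```

When building NUL-terminated strings, zeroing is usually unnecessary. Writing the terminating '\0' after each copy is enough.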




Flash Gordon

Made a typo in the above. Here's a more correct version:

/* len is the length of the string */
char *
adj_buffer(char **buffer, size_t len) {

char *f_buffer;
char *new_buffer;
size_t num = 0;

new_buffer = NULL;
f_buffer = *buffer;

Why not initialise new_buffer and f_buffer on definition?
if (len > BUFSIZ) {
if (len / BUFSIZ == 0) {
num = 1;

It can still never reach this line. Just as Ian said before.
else if ( len % BUFSIZ == 0 ){
num = len / BUFSIZ;
else {
num = len / BUFSIZ;
num = (size_t)pow(2, (num == 1) ? 2 : (num - 1));

Why would anyone use a floating point maths function to calculate how
much to increase buffer size by? That is just asking for things to slow down.
} else
return f_buffer;

if (num && SIZE_MAX / num < BUFSIZ) {
errno = ENOMEM;
err(1, "overflow");
if ( (new_buffer = realloc(f_buffer, num * BUFSIZ)) == NULL) {
buffer = NULL;
return (NULL);
f_buffer = new_buffer;
bzero(f_buffer, sizeof(f_buffer));

Why use bzero, which is not standard C, when there is a perfectly standard
memset function? In any case, like Ian, I doubt that this is what you
want to do.
