Expanding buffer in C

J

James Harris

Do you remember a previous discussion on this newsgroup as at

http://groups.google.com/group/comp.lang.c/browse_frm/thread/cb502ce8414cea06/bb738c4f91113b4e

It concerned code to manage a buffer, expanding it as needed semi
automatically. I've incorporated suggestions people made and published
the code at

http://codewiki.wikispaces.com/xbuf.c

(For the impatient check the last two links in the table of contents
for the code. The rest is documentation.)

The code is intended to be fast. It's uses include reading an
arbitrary-length line from an input stream - a safe gets() replacement
if you like - but it is more flexible than line reading functions and
is intended to be useful in more cases than just line reading.

--> NB. Code and documentation are on a wiki. If you see things that
should be changed feel free to change them.

Apart from the code itself what do you think of the concept of
publishing it on a wiki...?

James
 
G

Gene

Do you remember a previous discussion on this newsgroup as at

 http://groups.google.com/group/comp.lang.c/browse_frm/thread/cb502ce8....

It concerned code to manage a buffer, expanding it as needed semi
automatically. I've incorporated suggestions people made and published
the code at

 http://codewiki.wikispaces.com/xbuf.c

(For the impatient check the last two links in the table of contents
for the code. The rest is documentation.)

The code is intended to be fast. It's uses include reading an
arbitrary-length line from an input stream - a safe gets() replacement
if you like - but it is more flexible than line reading functions and
is intended to be useful in more cases than just line reading.

--> NB. Code and documentation are on a wiki. If you see things that
should be changed feel free to change them.

Apart from the code itself what do you think of the concept of
publishing it on a wiki...?

James

Nice presentation. You'll probably get less heap fragmentation for
programs that use lots of buffers if all of them are at set standard
sizes. So your new_size computation would become something like

new_size = current_size;
while (new_size < requested_size)
new_size = new_size * FACTOR;

NB. Your choice of default 3 / 2 for FACTOR is interesting. I've
been using this for many years because the standard 2 seemed
extravagant.
 
J

James Harris

On Nov 15, 8:12 pm, James Harris <[email protected]>
wrote:
....


....

Nice presentation.

That's good to hear. One aim of the wiki is to show code clearly in
order that it can be read online without downloading. Thanks for the
feedback.
You'll probably get less heap fragmentation for
programs that use lots of buffers if all of them are at set standard
sizes. So your new_size computation would become something like

new_size = current_size;
while (new_size < requested_size)
new_size = new_size * FACTOR;

I had something like this originally but I changed it for speed. With
code like this if the user set the factor to something near 1 this
loop could take a long time. If set to exactly 1 it would not
terminate at all. Also the user could start the buffer off with size
zero so the size could never increase. Having said all that I do see
your point and it is a good one: where multiple buffers are used
standard sizes would be better. I'm not sure what's best here. Will
give it some thought.
NB. Your choice of default 3 / 2 for FACTOR is interesting. I've
been using this for many years because the standard 2 seemed
extravagant.

I too (sic) thought a factor of 2 would be too much - and I wanted to
demonstrate that the code would allow a factor between 1 and 2 while
still using integer arithmetic for performance.

I'm not happy with this yet, though. The highest allowable original
size or offset for (3 / 2) is 1/3rd of the address space. At that
level and above a factor of (3 / 2) will fail. Maybe it's better to
define FACTOR as

/ 2 * 3

and include it in the function as

(offset FACTOR)

rather than

(offset * FACTOR)

as it is at the moment. I've not seen anyone else's C code that does
this though - i.e. rely on the textual substitution of the
preprocessor. How does it look to other C programmers?

Is there a better way to increase the buffer size? Especially as the
address space fills up should the increase factor be reduced, and how?

Anyone been down this road?

James
 
G

Gene

That's good to hear. One aim of the wiki is to show code clearly in
order that it can be read online without downloading. Thanks for the
feedback.



I had something like this originally but I changed it for speed. With
code like this if the user set the factor to something near 1 this
loop could take a long time. If set to exactly 1 it would not
terminate at all. Also the user could start the buffer off with size
zero so the size could never increase. Having said all that I do see
your point and it is a good one: where multiple buffers are used
standard sizes would be better. I'm not sure what's best here. Will
give it some thought.

Speed is almost certainly not relevant. If you are allocating a
buffer, the time needed to process each character in the buffer is
probably bigger than each of these loop iterations. And there are
only log N loop iterations for N characters. Optimizing before
profiling is almost always a waste.

For the reason you cite about the trouble with small initial sizes, I
actually use this:

while (new_size < requested_size)
new_size = 1 + new_size * FACTOR;

If you are really worried about it, you can maintain a global array of
standard sizes and store the index of the current size in a field of
the buffer struct. Then advancing to the next size is just an
increment and an array reference.
I too (sic) thought a factor of 2 would be too much - and I wanted to
demonstrate that the code would allow a factor between 1 and 2 while
still using integer arithmetic for performance.

I'm not happy with this yet, though. The highest allowable original
size or offset for (3 / 2) is 1/3rd of the address space. At that
level and above a factor of (3 / 2) will fail. Maybe it's better to
define FACTOR as

  / 2 * 3

and include it in the function as

  (offset FACTOR)

rather than

  (offset * FACTOR)

as it is at the moment. I've not seen anyone else's C code that does
this though - i.e. rely on the textual substitution of the
preprocessor. How does it look to other C programmers?

It's ugly. You should realy define NUM and DEN and then say (offset *
NUM) / DEN. When DEN happens to be one, the division will be
optimized away.
Is there a better way to increase the buffer size? Especially as the
address space fills up should the increase factor be reduced, and how?

Anyone been down this road?

Garbage collectors deal with this when they need to increase the pool
size when not much VM is left. There isn't a single best policy
because behavior depends heavily on the OS.
 
J

James Harris

....
Garbage collectors deal with this when they need to increase the pool
size when not much VM is left. There isn't a single best policy
because behavior depends heavily on the OS.

Agreed. The more I think about this the less I think there is any kind
of one-size-fits-all solution. One advantage of distributing as source
code is that people can vary the code to suit their situation. I've
kept the original code as it was but have documented some alternatives
as comments within the code.

http://codewiki.wikispaces.com/xbuf.c

The commented alternatives are purely illustrative and are untested. I
hope I've got the C syntax etc correct. Either way they should be
enough to illustrate the options to anyone using the code. If you,
dear reader, see an error feel free to fix it or let me know what I've
got wrong and I'll fix it.

James
 
C

CBFalconer

James said:
.... snip ...


Agreed. The more I think about this the less I think there is any
kind of one-size-fits-all solution. One advantage of distributing
as source code is that people can vary the code to suit their
situation. I've kept the original code as it was but have
documented some alternatives as comments within the code.

In general you should adjust the mechanism to fit the expected
use. As an example, my ggets routine expands the buffer in
constant increments (of 128), because its primary use is expected
to be interactive input, and those strings are not normally
unlimited. However my hashlib expands its table by a factor of two
each time, resulting in constant average overhead, and basically
eliminating consideration of table size. In each case there are
possible penalties.
 
J

James Harris

... snip ...



In general you should adjust the mechanism to fit the expected
use. As an example, my ggets routine expands the buffer in
constant increments (of 128), because its primary use is expected
to be interactive input, and those strings are not normally
unlimited. However my hashlib expands its table by a factor of two
each time, resulting in constant average overhead, and basically
eliminating consideration of table size. In each case there are
possible penalties.

Valid comment. I'll leave the code with comments on various options.
Whoever uses it can use it as it stands or, as you point out, adjust
the increase to suit the use.

James
 
T

thomas.mertes

Do you remember a previous discussion on this newsgroup as at

 http://groups.google.com/group/comp.lang.c/browse_frm/thread/cb502ce8....

It concerned code to manage a buffer, expanding it as needed semi
automatically. I've incorporated suggestions people made and published
the code at

 http://codewiki.wikispaces.com/xbuf.c

(For the impatient check the last two links in the table of contents
for the code. The rest is documentation.)

The code is intended to be fast. It's uses include reading an
arbitrary-length line from an input stream - a safe gets() replacement
if you like - but it is more flexible than line reading functions and
is intended to be useful in more cases than just line reading.

Your idea of an interface like xbuf_ok() and xbuf_trim()
is simple and generally usable. IMHO there are some
points. In most cases the size of the buffer may be
already okay (specially when it has been resized short
ago). Therefore I think that macros for xbuf_ok and
xbuf_trim, which call a resize function just when
necessary, would help to increase the performance.

The Seed7 runtime library uses a similar technic.
Strings are managed in a 'stristruct' which can be
resized:

typedef struct stristruct {
memsizetype size;
memsizetype capacity;
strelemtype mem[1];
} strirecord;

The strelemtype is a 32 Bit unsigned type. But
besides this the resizing logic is the same.
Resizing a string is done with the following macros:

#define GROW_STRI(v1,v2,l1,l2) ((l2)>(v2)->capacity?(v1=growStri
(v2,l2)):(v1=(v2)))
#define SHRINK_STRI(v1,v2,l1,l2) ((l2)<(v2)->capacity>>2?
(v1=shrinkStri(v2,l2)):(v1=(v2)))

The four parameters of this macros are:
v1 ... Stristruct variable which gets the resized string.
v2 ... Stristruct pointer with the old string.
l1 ... size of the old string (not used).
l2 ... requested size for the resized string.

The functions 'growStri' and 'shrinkStri' use
realloc and assign also a new value to the capacity.
The resizing macros are used when strings are
concatenated.

Functions which read a string from a file use a
different resizing logic, which is encapsulated
in the function. In the line read functions the
string starts with a size of 256 (which is probably
bigger that the usual text line length) and resizes
in jumps of 2048. At the end of the function the
string is realloced down to the actual size. When
the line read function is left the size and the
capacity are identical.

Greetings Thomas Mertes

Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.
 
J

James Harris

On 23 Nov, 18:19, (e-mail address removed) wrote:
....
Your idea of an interface like xbuf_ok() and xbuf_trim()
is simple and generally usable. IMHO there are some
points. In most cases the size of the buffer may be
already okay (specially when it has been resized short
ago). Therefore I think that macros for xbuf_ok and
xbuf_trim, which call a resize function just when
necessary, would help to increase the performance.


The ideal calling sequence for xbuf_ok() is

if (offset >= buf1_size && xbuf_ok(...)) {
/* handle out of space condition */
}

This only calls the routine if the buffer needs to be enlarged and
would be very fast. Rough x86 assembler for that could be


cmp eax, buf1_size ;Where eax is the desired offset
jl ok

call xbuf_ok
/* test return code and handle any error */

ok:


The initial cmp and jl instructions would execute paired in one cycle
on a Pentium I and would probably consume no cycles at all on more
modern CPU due to branch prediction. There is a note in this in the
xbuf_ok function comments.

Maybe the above call can be made into a macro...? Since xbuf_trim
would normally be called only once after the buffer has been filled I
can't see any value in a macro for its call.

I'm not sure I know how to write a C macro. Based on your examples
below would something like the following do?

#define XBUF_OK(buf, buf_size, offset) \
((offset) >= (buf_size) && xbuf_ok((&buf), (&buf_size), (offset)))

I'm not sure about the placement of parens round (&buf) etc. Should
this be &(buf)? Given the above the code which uses it would be

if XBUF_OK(buf1, buf1_size, offset) {
/* Handle lack of space */
}


If this is OK where should be macro be distributed - with the .c file
or in the header?


The Seed7 runtime library uses a similar technic.
Strings are managed in a 'stristruct' which can be
resized:

typedef struct stristruct {
memsizetype size;
memsizetype capacity;
strelemtype mem[1];
} strirecord;

The strelemtype is a 32 Bit unsigned type. But
besides this the resizing logic is the same.
Resizing a string is done with the following macros:

#define GROW_STRI(v1,v2,l1,l2) ((l2)>(v2)->capacity?(v1=growStri
(v2,l2)):(v1=(v2)))
#define SHRINK_STRI(v1,v2,l1,l2) ((l2)<(v2)->capacity>>2?
(v1=shrinkStri(v2,l2)):(v1=(v2)))

The four parameters of this macros are:
v1 ... Stristruct variable which gets the resized string.
v2 ... Stristruct pointer with the old string.
l1 ... size of the old string (not used).
l2 ... requested size for the resized string.

The functions 'growStri' and 'shrinkStri' use
realloc and assign also a new value to the capacity.
The resizing macros are used when strings are
concatenated.


Do these macros handle depletion of space?

Functions which read a string from a file use a
different resizing logic, which is encapsulated
in the function. In the line read functions the
string starts with a size of 256 (which is probably
bigger that the usual text line length) and resizes
in jumps of 2048. At the end of the function the
string is realloced down to the actual size. When
the line read function is left the size and the
capacity are identical.

It's always good to see code take sensible approaches to achieving
performance.

James
 
B

Ben Bacarisse

#define XBUF_OK(buf, buf_size, offset) \
((offset) >= (buf_size) && xbuf_ok((&buf), (&buf_size), (offset)))

I'm not sure about the placement of parens round (&buf) etc. Should
this be &(buf)?

Yes. If someone were to risk "passing" and expression with an
operator of lower precedence than & then (&x) will go wrong by &(x)
won't.
 
J

James Harris

Yes. If someone were to risk "passing" and expression with an
operator of lower precedence than & then (&x) will go wrong by &(x)
won't.

OK have set it up that way and added use of the macro to the
documentation.

I've put the macro in the header on the assumption that that's the
right place. Presumably it's right to bring in the macro along with
the function templates.

James
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,770
Messages
2,569,588
Members
45,094
Latest member
PollyBlau4

Latest Threads

Top