String Buffers?

J

Jamie

Hello Newsgroup:

I'm not a C programmer, though I've dabbled on and off through the years. (wish
I could justify doing more in C/C++ because it is enjoyable, just highly time
consuming compared to other languages)

Mostly when I use 'C' it's just to fork a process set-ID or send a signal to
another process, etc... real basic stuff. I'm not seasoned in it and still
find the syntax of structs and arrays confusing.

Anyhow, I was surprised my books don't discuss the area of basic, mundane
string/append buffers.

Classic example being something that accumulates text (like an expat
handler in perl might need to do)

I wrote a linked list / string buffer as a pet project, mainly because
I would like to get better at C, in case I ever have the opportunity
to use C.

It's linked list is in "blocks", tracks the TAIL of the list rather then the
head to make appends faster, and it doesn't need to call strlen() on the whole
string each time an append is required (it only needs strlen() to get the length
of the new part)

For review/manglement/flames and entanglements:

http://geniegate.com/other/slbuf/

These compile on a UNIX machine. Using the 'time' command, under *ideal*
conditions, (larger strings with many many appends and a decent sized block
size) a combination linked list with a stringbuffer APPEARS to run about 10
times faster than realloc + strcat. (but, in less then ideal conditions, it can
end up running slower, low block sizes really slow it down)

The "sample program" is a utility that slurps stdin into a stringbuffer translating
[<>&"] into their respective HTML entities and spits out a crude HTML document
of the text. Not exactly a useful program (would of course be best to spit it
out in chunks) but it does illustrate using the stringbuffer in a "real" program.
(don't run it on a HUGE file, it slurps all of stdin)

On an old box, 100,000 iterations of 50+ appends takes about 2-3 seconds -vs-
around 13 seconds for the realloc + strcat approach.

I'm surprised my books don't talk about stringbuffers (Or, perhaps
I wasn't looking in the right places?) Seems like a really common operation.

Am I missing something obvious?

In practice, are linked lists better at dealing with memory fragmentation,
or is realloc best? (how do you know when there is fragmentation?)

Jamie
 
F

Frank Silvermann

Jamie said:
Hello Newsgroup:

I'm not a C programmer, though I've dabbled on and off through the years. (wish
I could justify doing more in C/C++ because it is enjoyable, just highly time
consuming compared to other languages)

Mostly when I use 'C' it's just to fork a process set-ID or send a signal to
another process, etc... real basic stuff. I'm not seasoned in it and still
find the syntax of structs and arrays confusing.

Anyhow, I was surprised my books don't discuss the area of basic, mundane
string/append buffers.

Classic example being something that accumulates text (like an expat
handler in perl might need to do)

I wrote a linked list / string buffer as a pet project, mainly because
I would like to get better at C, in case I ever have the opportunity
to use C.

It's linked list is in "blocks", tracks the TAIL of the list rather then the
head to make appends faster, and it doesn't need to call strlen() on the whole
string each time an append is required (it only needs strlen() to get the length
of the new part)

For review/manglement/flames and entanglements:

http://geniegate.com/other/slbuf/

These compile on a UNIX machine. Using the 'time' command, under *ideal*
conditions, (larger strings with many many appends and a decent sized block
size) a combination linked list with a stringbuffer APPEARS to run about 10
times faster than realloc + strcat. (but, in less then ideal conditions, it can
end up running slower, low block sizes really slow it down)

The "sample program" is a utility that slurps stdin into a stringbuffer translating
[<>&"] into their respective HTML entities and spits out a crude HTML document
of the text. Not exactly a useful program (would of course be best to spit it
out in chunks) but it does illustrate using the stringbuffer in a "real" program.
(don't run it on a HUGE file, it slurps all of stdin)

On an old box, 100,000 iterations of 50+ appends takes about 2-3 seconds -vs-
around 13 seconds for the realloc + strcat approach.

I'm surprised my books don't talk about stringbuffers (Or, perhaps
I wasn't looking in the right places?) Seems like a really common operation.

Am I missing something obvious?

In practice, are linked lists better at dealing with memory fragmentation,
or is realloc best? (how do you know when there is fragmentation?)
I would start by reading the thread "Critic of the Given Code", started
yesterday. People in the know are always going back and forth about
realloc around here. frank
 
S

SM Ryan

(e-mail address removed) (Jamie) wrote:

# size) a combination linked list with a stringbuffer APPEARS to run about 10
# times faster than realloc + strcat. (but, in less then ideal conditions, it can

Na•ve use of strcat can increase runtime by an order of magnitude. Also you
want to reallocate the string but a constant factor instead of making it
just long enough each time. Something like

int m = 0,n = 0; char *s = 0;
char *appendstring(char *t) {
int l = strlen(t);
if (n+l+1>==m) {m = 2*(n+l+1); s = realloc(s,m);
strcpy(s+n,t); n += l;
return s;
}
 
J

Jamie

In said:
# size) a combination linked list with a stringbuffer APPEARS to run about 10
# times faster than realloc + strcat. (but, in less then ideal conditions, it can

Na•ve use of strcat can increase runtime by an order of magnitude. Also you
want to reallocate the string but a constant factor instead of making it
just long enough each time. Something like

int m = 0,n = 0; char *s = 0;
char *appendstring(char *t) {
int l = strlen(t);
if (n+l+1>==m) {m = 2*(n+l+1); s = realloc(s,m);
strcpy(s+n,t); n += l;
return s;
}

Hmm... I like your way better. :) Totally eliminates the linked list.

Only thing is, it appears to need global variables.

Wonder how a combo would work, something like:

StringBuf *sb = mythical_create_sbuf(size_t bsz, const char *initial_value);
mythical_append(sb,"String to append");

Where sb is a pointer to a struct with the state information, bsz is a
setting used for realloc and initial size.

Also, your approach provides ready access to the string itself, no need for
a "pack" operation or a separate copy. Very nice!

Next time I poke around with it, I think I'll try yours and check to see which
are faster.

I do wonder which is better for memory fragmentation. Is it best to malloc
several small(er) chunks and string together or realloc. Your approach takes
less space. (in both data size AND code size)


Jamie
 
S

SM Ryan

(e-mail address removed) (Jamie) wrote:

# Only thing is, it appears to need global variables.
#
# Wonder how a combo would work, something like:
#
# StringBuf *sb = mythical_create_sbuf(size_t bsz, const char *initial_value);
# mythical_append(sb,"String to append");

It's been implemented that way by many different people.

# I do wonder which is better for memory fragmentation. Is it best to malloc
# several small(er) chunks and string together or realloc. Your approach takes
# less space. (in both data size AND code size)

The one problem is that it can overallocate space by a very large number
and end up wasting lots of space. This is less of problem when you have
virtual memory around, but on smaller systems it could blow out the heap
for pretty modest strings.

The point of doubling the size each time is you get exponential increase
in size (and cost of copying the string on realloc), but the number of
reallocations drops off logarithmically. The overall effect is linear
time and space, and a linear waste of space. Chained string segments
is also linear, but then you have the reassembly problem for libraries
that expect the string as a contiguous array.
 
M

Malcolm

Jamie said:
Also, your approach provides ready access to the string itself, no need
for
a "pack" operation or a separate copy. Very nice!


I do wonder which is better for memory fragmentation. Is it best to malloc
several small(er) chunks and string together or realloc. Your approach
takes
less space. (in both data size AND code size)
If you've got to worry about memory usage, use the small chunk approach.
If you are running on a modern PC and the string is less than a million
characters, or a short novel, then the realloc() approach with a single
block will be fine.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,770
Messages
2,569,583
Members
45,074
Latest member
StanleyFra

Latest Threads

Top