Concatenating strings

B

Big Brother

I've been thinking about the seemingly simple task of writing a
va_arg-type function to concatenate arbitarily many strings.

My first thoughts violated the following principles:
1) never calculate the length of a string more than once;
2) never invoke realloc() multiple times if you can make do with a
single malloc().

As a result, I came up with the code below. This avoids the two pitfalls
above, but at the expense of needing three passes through the argument
list, which seems a bit clumsy.

Does anyone have any thoughts on the rights and wrongs of this?


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>

/* Returns the concatenation of the strings supplied as arguments. The
* final argument must be NULL.
* Caller should free() the returned pointer when done.
* In case of allocation failure, NULL is returned.
*/
char *concat(char *s, ...)
{
va_list ap;
unsigned n=1;
size_t *len, total=0;
char *t, *r;

/* pass 1: count arguments */
for(va_start(ap, s); va_arg(ap,char *); n++)
;

/* pass 2: get lengths */
if(!(len=malloc(n * sizeof *len)))
return NULL;
va_start(ap, s);
len[n=0]=strlen(s);
while(t=va_arg(ap, char *))
total+=(len[++n]=strlen(t));
va_end(ap);

/* pass 3: copy strings */
if(r=malloc(total+1)) {
memcpy(r, s, total = *len);
n=1;
for(va_start(ap, s) ; t=va_arg(ap, char *) ; total+=len[n++])
memcpy(r+total, t, len[n]);
va_end(ap);
}
free(len);
return r;
}

int main()
{
char *s;
s=concat("hello ", "world", " and ", "everyone", " in it!", NULL);
puts(s);
free(s);
}
 
E

Eric Sosman

Big said:
I've been thinking about the seemingly simple task of writing a
va_arg-type function to concatenate arbitarily many strings.

My first thoughts violated the following principles:
1) never calculate the length of a string more than once;
2) never invoke realloc() multiple times if you can make do with a
single malloc().

As a result, I came up with the code below. This avoids the two pitfalls
above, but at the expense of needing three passes through the argument
list, which seems a bit clumsy.

Does anyone have any thoughts on the rights and wrongs of this?

My preference would be to give up on #1, saving one pass
over the arguments and eliminating the extra malloc()/free().
(You could do it in two passes without the extra space by
building the output string with repeated strcat(), but that
obeys #1 in name only.)

Other comments in-line.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>

/* Returns the concatenation of the strings supplied as arguments. The
* final argument must be NULL.

Since you retrieve that argument as a char*, it must be
supplied as (char*)NULL -- an unadorned NULL is incorrect.
* Caller should free() the returned pointer when done.
* In case of allocation failure, NULL is returned.
*/
char *concat(char *s, ...)

It would be nice to const-qualify the named argument.
{
va_list ap;
unsigned n=1;
size_t *len, total=0;

Stylistic observation: The code will be easier to read
(and harder to botch) if you keep all the components of a
loop close together. Don't initialize these variables at
the very start of the function; initialize them just before
you use them.
char *t, *r;

/* pass 1: count arguments */
for(va_start(ap, s); va_arg(ap,char *); n++)
;

You need a va_end(ap) here. Also, you've not taken
care of the boundary case where s == NULL.
/* pass 2: get lengths */
if(!(len=malloc(n * sizeof *len)))
return NULL;
va_start(ap, s);
len[n=0]=strlen(s);
while(t=va_arg(ap, char *))
total+=(len[++n]=strlen(t));
va_end(ap);

Stylistic observation: Brevity is the soul of wit, but
parsimony is the enemy of clarity. There are no medals given
for packing the greatest amount of activity into the fewest
number of characters. Relax, spread yourself out a little,
let your code do one thing at a time and do it clearly --
/* pass 3: copy strings */
if(r=malloc(total+1)) {
memcpy(r, s, total = *len);
n=1;
for(va_start(ap, s) ; t=va_arg(ap, char *) ; total+=len[n++])
memcpy(r+total, t, len[n]);
va_end(ap);
}
free(len);
return r;
}

int main()
{
char *s;
s=concat("hello ", "world", " and ", "everyone", " in it!", NULL);

... (char*)NULL); as mentioned above.

Undefined behavior if concat() returns NULL, which its
documentation says it might do.
free(s);
}

Falling off the end of main() without returning a value
is an abomination. The fact that the abomination has received
the blessing of C99 doesn't make it less abominable.
 
F

Francine.Neary

Other comments in-line.

Couple more bugs...
Since you retrieve that argument as a char*, it must be
supplied as (char*)NULL -- an unadorned NULL is incorrect.


It would be nice to const-qualify the named argument.


Stylistic observation: The code will be easier to read
(and harder to botch) if you keep all the components of a
loop close together. Don't initialize these variables at
the very start of the function; initialize them just before
you use them.

And in fact, there's a bug exactly at this point - the initialization
of total is wrong: ...
char *t, *r;
/* pass 1: count arguments */
for(va_start(ap, s); va_arg(ap,char *); n++)
;

You need a va_end(ap) here. Also, you've not taken
care of the boundary case where s == NULL.
/* pass 2: get lengths */
if(!(len=malloc(n * sizeof *len)))
return NULL;
va_start(ap, s);
len[n=0]=strlen(s);

....replace this with total=len[n=0]=strlen(s);
while(t=va_arg(ap, char *))
total+=(len[++n]=strlen(t));
va_end(ap);

Stylistic observation: Brevity is the soul of wit, but
parsimony is the enemy of clarity. There are no medals given
for packing the greatest amount of activity into the fewest
number of characters. Relax, spread yourself out a little,
let your code do one thing at a time and do it clearly --
you might even <gasp!> go so far as to introduce a couple
of additional variables.


/* pass 3: copy strings */
if(r=malloc(total+1)) {
memcpy(r, s, total = *len);
n=1;
for(va_start(ap, s) ; t=va_arg(ap, char *) ; total+=len[n++])
memcpy(r+total, t, len[n]);
va_end(ap);
}
free(len);
return r;

Oops! You forgot to null-terminate r.
 
M

Malcolm McLean

Big Brother said:
I've been thinking about the seemingly simple task of writing a
va_arg-type function to concatenate arbitarily many strings.

My first thoughts violated the following principles:
1) never calculate the length of a string more than once;
2) never invoke realloc() multiple times if you can make do with a
single malloc().

As a result, I came up with the code below. This avoids the two pitfalls
above, but at the expense of needing three passes through the argument
list, which seems a bit clumsy.

Does anyone have any thoughts on the rights and wrongs of this?


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>

/* Returns the concatenation of the strings supplied as arguments. The
* final argument must be NULL.
* Caller should free() the returned pointer when done.
* In case of allocation failure, NULL is returned.
*/
char *concat(char *s, ...)
{
va_list ap;
unsigned n=1;
size_t *len, total=0;
char *t, *r;

/* pass 1: count arguments */
for(va_start(ap, s); va_arg(ap,char *); n++)
;

/* pass 2: get lengths */
if(!(len=malloc(n * sizeof *len)))
return NULL;
va_start(ap, s);
len[n=0]=strlen(s);
while(t=va_arg(ap, char *))
total+=(len[++n]=strlen(t));
va_end(ap);
Here you seem to have forgotten the first string in calculating total.
This was hard to spot, because of all the fancy assignments in expressions.
/* pass 3: copy strings */
if(r=malloc(total+1)) {
memcpy(r, s, total = *len);
n=1;
for(va_start(ap, s) ; t=va_arg(ap, char *) ; total+=len[n++])
memcpy(r+total, t, len[n]);
va_end(ap);
}
No terminating nul. Also, aren't you copying string t on the first pass?
That's set to NULL at this point.
 
S

SM Ryan

# I've been thinking about the seemingly simple task of writing a
# va_arg-type function to concatenate arbitarily many strings.
#
# My first thoughts violated the following principles:
# 1) never calculate the length of a string more than once;
# 2) never invoke realloc() multiple times if you can make do with a
# single malloc().

You're free to do whatever you want, but as far as practical code,
if you realloc by a constant factor (say double the string length
every realloc) instead of by a constant amount, the running time
is linear in the the length of the strings for any malloc you're
likely to encounter.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,048
Latest member
verona

Latest Threads

Top