Concatenating strings

Discussion in 'C Programming' started by Big Brother, Sep 2, 2007.

  1. Big Brother

    Big Brother Guest

    I've been thinking about the seemingly simple task of writing a
    va_arg-type function to concatenate arbitarily many strings.

    My first thoughts violated the following principles:
    1) never calculate the length of a string more than once;
    2) never invoke realloc() multiple times if you can make do with a
    single malloc().

    As a result, I came up with the code below. This avoids the two pitfalls
    above, but at the expense of needing three passes through the argument
    list, which seems a bit clumsy.

    Does anyone have any thoughts on the rights and wrongs of this?


    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdarg.h>

    /* Returns the concatenation of the strings supplied as arguments. The
    * final argument must be NULL.
    * Caller should free() the returned pointer when done.
    * In case of allocation failure, NULL is returned.
    */
    char *concat(char *s, ...)
    {
    va_list ap;
    unsigned n=1;
    size_t *len, total=0;
    char *t, *r;

    /* pass 1: count arguments */
    for(va_start(ap, s); va_arg(ap,char *); n++)
    ;

    /* pass 2: get lengths */
    if(!(len=malloc(n * sizeof *len)))
    return NULL;
    va_start(ap, s);
    len[n=0]=strlen(s);
    while(t=va_arg(ap, char *))
    total+=(len[++n]=strlen(t));
    va_end(ap);

    /* pass 3: copy strings */
    if(r=malloc(total+1)) {
    memcpy(r, s, total = *len);
    n=1;
    for(va_start(ap, s) ; t=va_arg(ap, char *) ; total+=len[n++])
    memcpy(r+total, t, len[n]);
    va_end(ap);
    }
    free(len);
    return r;
    }

    int main()
    {
    char *s;
    s=concat("hello ", "world", " and ", "everyone", " in it!", NULL);
    puts(s);
    free(s);
    }
    Big Brother, Sep 2, 2007
    #1
    1. Advertising

  2. Big Brother

    Big Brother Guest

    On 2 Sep 2007 at 15:22, Big Brother wrote:
    ....

    > /* pass 1: count arguments */
    > for(va_start(ap, s); va_arg(ap,char *); n++)
    > ;


    Oops,should be a va_end(ap); here, of course.

    >
    > /* pass 2: get lengths */


    ....
    Big Brother, Sep 2, 2007
    #2
    1. Advertising

  3. Big Brother

    Eric Sosman Guest

    Big Brother wrote:
    > I've been thinking about the seemingly simple task of writing a
    > va_arg-type function to concatenate arbitarily many strings.
    >
    > My first thoughts violated the following principles:
    > 1) never calculate the length of a string more than once;
    > 2) never invoke realloc() multiple times if you can make do with a
    > single malloc().
    >
    > As a result, I came up with the code below. This avoids the two pitfalls
    > above, but at the expense of needing three passes through the argument
    > list, which seems a bit clumsy.
    >
    > Does anyone have any thoughts on the rights and wrongs of this?


    My preference would be to give up on #1, saving one pass
    over the arguments and eliminating the extra malloc()/free().
    (You could do it in two passes without the extra space by
    building the output string with repeated strcat(), but that
    obeys #1 in name only.)

    Other comments in-line.

    > #include <stdio.h>
    > #include <stdlib.h>
    > #include <string.h>
    > #include <stdarg.h>
    >
    > /* Returns the concatenation of the strings supplied as arguments. The
    > * final argument must be NULL.


    Since you retrieve that argument as a char*, it must be
    supplied as (char*)NULL -- an unadorned NULL is incorrect.

    > * Caller should free() the returned pointer when done.
    > * In case of allocation failure, NULL is returned.
    > */
    > char *concat(char *s, ...)


    It would be nice to const-qualify the named argument.

    > {
    > va_list ap;
    > unsigned n=1;
    > size_t *len, total=0;


    Stylistic observation: The code will be easier to read
    (and harder to botch) if you keep all the components of a
    loop close together. Don't initialize these variables at
    the very start of the function; initialize them just before
    you use them.

    > char *t, *r;
    >
    > /* pass 1: count arguments */
    > for(va_start(ap, s); va_arg(ap,char *); n++)
    > ;


    You need a va_end(ap) here. Also, you've not taken
    care of the boundary case where s == NULL.

    > /* pass 2: get lengths */
    > if(!(len=malloc(n * sizeof *len)))
    > return NULL;
    > va_start(ap, s);
    > len[n=0]=strlen(s);
    > while(t=va_arg(ap, char *))
    > total+=(len[++n]=strlen(t));
    > va_end(ap);


    Stylistic observation: Brevity is the soul of wit, but
    parsimony is the enemy of clarity. There are no medals given
    for packing the greatest amount of activity into the fewest
    number of characters. Relax, spread yourself out a little,
    let your code do one thing at a time and do it clearly --
    you might even <gasp!> go so far as to introduce a couple
    of additional variables.

    > /* pass 3: copy strings */
    > if(r=malloc(total+1)) {
    > memcpy(r, s, total = *len);
    > n=1;
    > for(va_start(ap, s) ; t=va_arg(ap, char *) ; total+=len[n++])
    > memcpy(r+total, t, len[n]);
    > va_end(ap);
    > }
    > free(len);
    > return r;
    > }
    >
    > int main()
    > {
    > char *s;
    > s=concat("hello ", "world", " and ", "everyone", " in it!", NULL);


    ... (char*)NULL); as mentioned above.

    > puts(s);


    Undefined behavior if concat() returns NULL, which its
    documentation says it might do.

    > free(s);
    > }


    Falling off the end of main() without returning a value
    is an abomination. The fact that the abomination has received
    the blessing of C99 doesn't make it less abominable.

    --
    Eric Sosman
    lid
    Eric Sosman, Sep 2, 2007
    #3
  4. Big Brother

    Guest

    On Sep 2, 4:57 pm, Eric Sosman <> wrote:
    > Other comments in-line.


    Couple more bugs...

    >
    > > #include <stdio.h>
    > > #include <stdlib.h>
    > > #include <string.h>
    > > #include <stdarg.h>

    >
    > > /* Returns the concatenation of the strings supplied as arguments. The
    > > * final argument must be NULL.

    >
    > Since you retrieve that argument as a char*, it must be
    > supplied as (char*)NULL -- an unadorned NULL is incorrect.
    >
    > > * Caller should free() the returned pointer when done.
    > > * In case of allocation failure, NULL is returned.
    > > */
    > > char *concat(char *s, ...)

    >
    > It would be nice to const-qualify the named argument.
    >
    > > {
    > > va_list ap;
    > > unsigned n=1;
    > > size_t *len, total=0;

    >
    > Stylistic observation: The code will be easier to read
    > (and harder to botch) if you keep all the components of a
    > loop close together. Don't initialize these variables at
    > the very start of the function; initialize them just before
    > you use them.


    And in fact, there's a bug exactly at this point - the initialization
    of total is wrong: ...

    >
    > > char *t, *r;

    >
    > > /* pass 1: count arguments */
    > > for(va_start(ap, s); va_arg(ap,char *); n++)
    > > ;

    >
    > You need a va_end(ap) here. Also, you've not taken
    > care of the boundary case where s == NULL.
    >
    > > /* pass 2: get lengths */
    > > if(!(len=malloc(n * sizeof *len)))
    > > return NULL;
    > > va_start(ap, s);
    > > len[n=0]=strlen(s);


    ....replace this with total=len[n=0]=strlen(s);

    > > while(t=va_arg(ap, char *))
    > > total+=(len[++n]=strlen(t));
    > > va_end(ap);

    >
    > Stylistic observation: Brevity is the soul of wit, but
    > parsimony is the enemy of clarity. There are no medals given
    > for packing the greatest amount of activity into the fewest
    > number of characters. Relax, spread yourself out a little,
    > let your code do one thing at a time and do it clearly --
    > you might even <gasp!> go so far as to introduce a couple
    > of additional variables.
    >
    >
    >
    > > /* pass 3: copy strings */
    > > if(r=malloc(total+1)) {
    > > memcpy(r, s, total = *len);
    > > n=1;
    > > for(va_start(ap, s) ; t=va_arg(ap, char *) ; total+=len[n++])
    > > memcpy(r+total, t, len[n]);
    > > va_end(ap);
    > > }
    > > free(len);
    > > return r;


    Oops! You forgot to null-terminate r.

    > > }

    >
    > > int main()
    > > {
    > > char *s;
    > > s=concat("hello ", "world", " and ", "everyone", " in it!", NULL);

    >
    > ... (char*)NULL); as mentioned above.
    >
    > > puts(s);

    >
    > Undefined behavior if concat() returns NULL, which its
    > documentation says it might do.
    >
    > > free(s);
    > > }

    >
    > Falling off the end of main() without returning a value
    > is an abomination. The fact that the abomination has received
    > the blessing of C99 doesn't make it less abominable.
    >
    > --
    > Eric Sosman
    >
    , Sep 2, 2007
    #4
  5. "Big Brother" <> wrote in message
    news:...
    > I've been thinking about the seemingly simple task of writing a
    > va_arg-type function to concatenate arbitarily many strings.
    >
    > My first thoughts violated the following principles:
    > 1) never calculate the length of a string more than once;
    > 2) never invoke realloc() multiple times if you can make do with a
    > single malloc().
    >
    > As a result, I came up with the code below. This avoids the two pitfalls
    > above, but at the expense of needing three passes through the argument
    > list, which seems a bit clumsy.
    >
    > Does anyone have any thoughts on the rights and wrongs of this?
    >
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    > #include <string.h>
    > #include <stdarg.h>
    >
    > /* Returns the concatenation of the strings supplied as arguments. The
    > * final argument must be NULL.
    > * Caller should free() the returned pointer when done.
    > * In case of allocation failure, NULL is returned.
    > */
    > char *concat(char *s, ...)
    > {
    > va_list ap;
    > unsigned n=1;
    > size_t *len, total=0;
    > char *t, *r;
    >
    > /* pass 1: count arguments */
    > for(va_start(ap, s); va_arg(ap,char *); n++)
    > ;
    >
    > /* pass 2: get lengths */
    > if(!(len=malloc(n * sizeof *len)))
    > return NULL;
    > va_start(ap, s);
    > len[n=0]=strlen(s);
    > while(t=va_arg(ap, char *))
    > total+=(len[++n]=strlen(t));
    > va_end(ap);
    >

    Here you seem to have forgotten the first string in calculating total.
    This was hard to spot, because of all the fancy assignments in expressions.

    > /* pass 3: copy strings */
    > if(r=malloc(total+1)) {
    > memcpy(r, s, total = *len);
    > n=1;
    > for(va_start(ap, s) ; t=va_arg(ap, char *) ; total+=len[n++])
    > memcpy(r+total, t, len[n]);
    > va_end(ap);
    > }
    >

    No terminating nul. Also, aren't you copying string t on the first pass?
    That's set to NULL at this point.
    >
    > free(len);
    > return r;
    > }
    >
    > int main()
    > {
    > char *s;
    > s=concat("hello ", "world", " and ", "everyone", " in it!", NULL);
    > puts(s);
    > free(s);
    > }
    >
    Malcolm McLean, Sep 2, 2007
    #5
  6. Big Brother

    SM Ryan Guest

    Big Brother <> wrote:
    # I've been thinking about the seemingly simple task of writing a
    # va_arg-type function to concatenate arbitarily many strings.
    #
    # My first thoughts violated the following principles:
    # 1) never calculate the length of a string more than once;
    # 2) never invoke realloc() multiple times if you can make do with a
    # single malloc().

    You're free to do whatever you want, but as far as practical code,
    if you realloc by a constant factor (say double the string length
    every realloc) instead of by a constant amount, the running time
    is linear in the the length of the strings for any malloc you're
    likely to encounter.

    --
    SM Ryan http://www.rawbw.com/~wyrmwif/
    Death is the worry of the living. The dead, like myself,
    only worry about decay and necrophiliacs.
    SM Ryan, Sep 3, 2007
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Stan Horwitz
    Replies:
    2
    Views:
    2,738
    Stan Horwitz
    Feb 15, 2006
  2. Replies:
    11
    Views:
    624
    Karl Heinz Buchegger
    Apr 8, 2005
  3. John Henry

    Concatenating strings

    John Henry, Jul 1, 2006, in forum: Python
    Replies:
    1
    Views:
    306
    Steven Bethard
    Jul 1, 2006
  4. John Henry

    Concatenating strings

    John Henry, Jul 1, 2006, in forum: Python
    Replies:
    1
    Views:
    330
    Robert Kern
    Jul 1, 2006
  5. EHC

    concatenating strings

    EHC, Dec 15, 2006, in forum: Python
    Replies:
    3
    Views:
    322
    Caleb Hattingh
    Dec 16, 2006
Loading...

Share This Page