copy a string into a 2d array of chars

Discussion in 'C Programming' started by Simon Schaap, Apr 7, 2004.

  1. Simon Schaap

    Simon Schaap Guest

    Hello,
    I have encountered a strange problem and I hope you can help me to
    understand it. What I want to do is to pass an array of chars to a
    function that will split it up (on every location where a * occurs in
    the string). This split function should allocate a 2D array of chars
    and put the split results in different rows. The listing below shows
    how I started to work on this. To keep the program simple and focused,
    the string is not actually split. The split function
    in this case just allocates a 2D array of size 1 by the length of the
    passed string and copies the entire input string into this newly
    allocated array. A pointer to this array is then passed to the caller
    function. By the way, these so-called 2D arrays are allocated
    by a function that I adapted from the book "C Unleashed", R Heath, L
    Kirby et al.

    Unfortunately, even this simple program escapes my comprehension. When
    the caller function prints the chars in the 2D array it just received
    from the split function, the first char turns out to be the '\0'
    character! I am completely at a loss here: where does this '\0'
    character come from? I hope someone will find the time to enlighten me.

    Sincerely,
    Simon

    BEGIN OF LISTING:


    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>


    char** allocate_2d_array_of_chars(size_t m, size_t n);
    char** split_string(char *instring);

    int main(void)
    {
        char **a=NULL;
        char b[14]="*(10)*5*(1)*1";
        int i;

        a=split_string(b);

        if (a!=NULL) {
            for (i=0;i<14;i++) {
                printf("%d %c\n", i, a[0][i]);
            }
            /* to show the strange behavior : */
            if (a[0][0]=='\0') {
                printf("a[0][0] equals '\\0' \n Strange..., not?\n");
            }
            free(a);
        }
        return 0;
    }

    char **split_string(char *instr)
    {
        int instrlen;
        int i;
        char **retarr;

        instrlen = strlen(instr);

        retarr = allocate_2d_array_of_chars(1,instrlen);
        if (retarr==NULL) {
            printf("could not allocate retarr\n");
            return NULL;
        }

        /* copy the string */
        for (i=0;i<instrlen;i++) {
            retarr[0][i]=instr[i];
        }

        return retarr;
    }

    char** allocate_2d_array_of_chars(size_t m, size_t n)
    {
        /* adapted from "C unleashed", R Heath, L Kirby et al.*/
        /* allocates a 2D array of one contiguous chunk of memory */

        typedef char T;

        T **a;
        T *p;
        size_t Row;

        a=malloc(m * n * sizeof **a + m * sizeof *a);
        if (a != NULL) {
            for (Row = 0, p = (T *)a + m; Row < m; Row++, p+=n) {
                a[Row] = p;
            }
        }
        return a;
    }

    END OF LISTING
     
    Simon Schaap, Apr 7, 2004
    #1

  2. Simon Schaap

    Mark Henning Guest

    "Simon Schaap" wrote:
    <snip>


    I have no idea if this is the 'correct' thing to do, but when dealing with
    2D arrays, I normally allocate an array of pointers, each pointing to an
    array.

    Something like:

    char **allocate_2d_array_of_chars(size_t m, size_t n)
    {
        size_t i;
        char **a = malloc(m * sizeof(char *));

        for(i = 0; i < m; i++)
        {
            a[i] = malloc(n * sizeof(char));
        }

        return a;
    }

    Note that this is untested code and contains no error checking.

    If you do this, you need to ensure that you iterate through the 'base'
    array and free() each individual element before freeing the array as a
    whole.
     
    Mark Henning, Apr 7, 2004
    #2

  3. On 7 Apr 2004 02:43:42 -0700, (Simon Schaap)
    wrote:

    >Hello,
    >I have encountered a strange problem and I hope you can help me to
    >understand it. What I want to do is to pass an array of chars to a
    >function that will split it up (on every location where a * occurs in
    >the string). This split function should allocate a 2D array of chars
    >and put the split results in different rows. The listing below shows
    >how I started to work on this. To keep the program simple and help
    >focus the program the string is not actually split. The split function
    >in this case just allocates a 2D array of size 1 by the length of the
    >passed string and copies the entire input string into this newly
    >allocated array. A pointer to this array is then passed to the caller


    No, it doesn't copy the entire string. Your looping and space
    allocation are controlled by the value returned from strlen, and
    strlen does not count the terminating '\0', which is a part of the
    string. What you would end up with (except for a problem to be
    discussed later) is an array of char containing the original contents
    of the string except for the '\0'. If you want the results to be
    strings, you should compute strlen()+1 and use that for loop control
    and allocation.

    >function. By the way, allocating of these so called 2D arrays is done
    >by a funtion that I adapted from the book "C unleashed", R Heath, L
    >Kirby et al.


    I don't have the book so I don't know if you copied it wrong or the
    authors made the mistake described later.

    >
    >Unfortunately, even this simple program escapes my comprehension. When


    Before getting to the error, let's discuss the basic intent of the
    function. You want an array of strings. Since a string is an array
    of char, you want something that looks like an array of m strings
    which really means an array of m arrays of n char.

    If the original string has a total length (including the '\0') of n,
    then n is also the maximum for each resulting string and m*n is
    guaranteed to be large enough to hold all m strings.

    But you don't want to have to do address arithmetic every time you
    want to reference string i. The answer is to allocate space for m
    pointers. The i-th pointer will contain the starting address of the
    i-th string. Since everything about the strings is variable, the only
    thing you know for sure is the starting address of the allocated
    memory (a in your code). If the pointers are placed at the start of
    this memory, they can be referred to with normal subscript notation
    (a[i]).

    So, you need to allocate space for m*n characters and m pointers. In
    your code, m * sizeof *a computes the space needed for m pointers and
    m *n * sizeof **a computes the space for the m strings of length n.
    Since the pointers come first, the first string will follow the last
    pointer. (While this is relatively safe for arrays of char, see
    comments below about potential alignment problems for arrays of long
    or double.)

    In the for statement, the first clause initializes Row as the index of
    the first pointer (a[0]) and attempts to initialize p as the address
    of the first string. (This is where the error occurs which I will get
    to later.) The second clause terminates the loop after processing m
    pointers. The third clause increments Row to be the index of the next
    pointer and increments p to point to the start of the next string.
    And then of course, the address is stored in the pointer.

    When it is all done, the allocated area of memory would look like

    |ptr 0|ptr 1|...|ptr m-1|string 0|string 1|...|string m-1|

    where each pointer occupies sizeof(char *) bytes, each string space
    occupies n chars, and the i-th pointer contains the starting address
    of the space for the i-th string.

    >the caller function prints the chars in the 2D array it just received
    >from the split function, the first char turns out to be the '\0'
    >character! I am completely at loss here, where does this '\0'
    >character come from? I hope someone will find the time to enlight me.


    Due to the error in the code explained below, the value stored in a[0]
    is only one byte beyond the value in a. (On my system, a is set to
    0x00780eb0 upon return from malloc and a[0] is set to 0x00780eb1.)

    When your allocate function returns to your split function, the for
    loop tries to copy the characters from where instr points to where
    retarr[0] points. As noted above, retarr[0] actually points to one of
    the bytes in itself. (On a big-endian machine, it would point to the
    0x78; on a little-endian one it would point to the 0x0e.) On the
    first iteration through the for loop, this byte is replaced by the
    first character in instr ('*'). This has the effect of changing
    retarr[0] so that it points somewhere else. This invokes undefined
    behavior and anything can happen.

    >
    >Sincerely,
    >Simon
    >
    >BEGIN OF LISTING:
    >
    >
    >#include <stdio.h>
    >#include <stdlib.h>
    >#include <string.h>
    >
    >
    >char** allocate_2d_array_of_chars(size_t m, size_t n);
    >char** split_string(char *instring);
    >
    >int main(void)
    >{
    > char **a=NULL;
    > char b[14]="*(10)*5*(1)*1";


    Better to omit the dimension and let the compiler decide how big it
    needs to be.

    > int i;
    >
    > a=split_string(b);
    >
    > if (a!=NULL) {
    > for (i=0;i<14;i++) {


    Since you use strlen for allocation, when i is 13 the call to printf
    will invoke undefined behavior.

    > printf("%d %c\n", i, a[0][i]);
    > }
    > /* to show the strange behavior : */
    > if (a[0][0]=='\0') {
    > printf("a[0][0] equals '\\0' \n Strange..., not?\n");
    > }
    > free(a);
    > }
    > return 0;
    >}
    >
    >char **split_string(char *instr)
    >{
    > int instrlen;
    > int i;
    > char **retarr;
    >
    > instrlen = strlen(instr);


    You need a +1 here to accommodate the '\0'.

    >
    > retarr = allocate_2d_array_of_chars(1,instrlen);
    > if (retarr==NULL) {
    > printf("could not allocate retarr\n");
    > return NULL;
    > }
    >
    > /* copy the string */
    > for (i=0;i<instrlen;i++) {
    > retarr[0][i]=instr[i];


    Without the +1, this will not copy the '\0'. When you actually split
    the string, this will be somewhat critical.

    > }
    >
    > return retarr;
    >}
    >
    >char** allocate_2d_array_of_chars(size_t m, size_t n)
    >{
    > /* adapted from "C unleashed", R Heath, L Kirby et al.*/
    > /* allocates a 2D array of one contiguous chunk of memory */
    >
    > typedef char T;
    >
    > T **a;
    > T *p;
    > size_t Row;
    >
    > a=malloc(m * n * sizeof **a + m * sizeof *a);
    > if (a != NULL) {
    > for (Row = 0, p = (T *)a + m; Row < m; Row++, p+=n) {


    Here is the error in the p= assignment. It stems from how C does
    pointer arithmetic. If Q is a pointer to type R (R *Q;), then the
    expression Q+i involves pointer arithmetic and is evaluated, in
    ordinary byte-address terms, as Q + i*sizeof(R). That is, the
    expression Q+i points to the i-th object of type R past the one Q
    currently points to.

    By casting a to T*, the expression (T*)a+m points to the m-th T
    after the one a currently points to. Since T is char and m is 1, it
    evaluates to the address of the first char after a, which is only one
    byte into the allocated area. Without the cast, a is a T** or, for
    this discussion, a pointer to T*. Then the expression a+m evaluates to
    the m-th T* after the one a points to. T is still char, but a char* is
    typically 4 bytes on a 32-bit system or 8 on a 64-bit one (the exact
    size doesn't matter). m is still 1, so a+1 points to the first char*
    after the one a currently points to, which is sizeof(char*) bytes
    beyond the address in a.

    Then, with p computed as (T *)(a + m) (the cast now applied to the
    result of the addition), when you set a[0] to p in the next statement,
    the address stored will be that of the first byte beyond the last
    pointer, which is where you really want the string to start.

    Now for the caution. If T is any type that has a more stringent
    alignment than T* (8 byte doubles and longs with 4 byte pointers would
    be an example), there is no guarantee that the value initially
    computed for p is properly aligned for the type T. On most systems,
    this is not a problem when T is char.

    > a[Row] = p;
    > }
    > }
    > return a;
    >}
    >
    >END OF LISTING




    <<Remove the del for email>>
     
    Barry Schwarz, Apr 8, 2004
    #3
  4. Simon Schaap

    Ravi Uday Guest

    <snip>

    > char** allocate_2d_array_of_chars(size_t m, size_t n)
    > {
    > /* adapted from "C unleashed", R Heath, L Kirby et al.*/
    > /* allocates a 2D array of one contiguous chunk of memory */
    >
    > typedef char T;
    >
    > T **a;
    > T *p;
    > size_t Row;
    >
    > a=malloc(m * n * sizeof **a + m * sizeof *a);
    > if (a != NULL) {
    > for (Row = 0, p = (T *)a + m; Row < m; Row++, p+=n) {
    > a[Row] = p;
    > }
    > }
    > return a;
    > }
    >


    Way too complicated; you can replace it with this one.

    char** allocate_2d_array_of_chars(size_t rows, size_t columns)
    {
        size_t i;
        char **db_array;

        db_array = malloc ( rows * sizeof *db_array);

        if ( db_array == NULL )
        {
            puts ("Unable to allocate.. returning");
            return NULL;
        }

        for ( i = 0; i<rows; i++)
        {
            db_array[i] = malloc ( columns * sizeof *db_array[i]);
            if ( db_array[i] == NULL )
            {
                /* Handle errors appropriately: here, release the rows
                   allocated so far and report failure. */
                printf ("Unable to allocate db_array[%zu]\n", i);
                while ( i > 0 )
                    free ( db_array[--i] );
                free ( db_array );
                return NULL;
            }
        }

        return db_array;
    }

    For freeing, you can use the one below.

    void free_2d_array_of_chars( char **db_array, size_t rows)
    {
        size_t i;

        for ( i = 0; i<rows; i++)
            free ( db_array[i] );

        free (db_array);
    }

    - Ravi
     
    Ravi Uday, Apr 8, 2004
    #4