Printing a char* which is not a string: I do not understand this

Discussion in 'C Programming' started by Hendrik Maryns, Jan 24, 2008.

  1. Hi group,

    I am working on a JNI project. However, since things didn’t work the
    way I wanted them, I wrote some utility functions. One of them is a
    function to print the following struct to stdout:

    typedef unsigned mgState;
    typedef unsigned mgId;
    typedef char *mA;

    typedef struct mgTreeNode { /* tree node */
    mA a; /* alphabet element */
    struct mgTreeNode *left, *right; /* successors */
    mgId id; /* state space id */
    mgState state; /* automaton state */
    } mgTreeNode;

    Important to notice here is that, although mA is typedef’ed to char*, it
    is not a string, but really an array of char, containing 0 and 1 (so
    some primitive form of bit vector). However, when printing it, I want
    to see '0' and '1', of course.

    My first try was the following:

    void printTreeNode(mgTreeNode * node, int labelLength) {
        int i;
        char *label = malloc(labelLength);
        for (i = 0; i < labelLength; ++i) {
            label[i] = node->a[i] + '0';
        }
        printf("a: %s, id: %d, state: %d, left: [", label, node->id, node->state);
        if (node->left) {
            printTreeNode(node->left, labelLength);
        } else {
            printf("nil");
        }
        printf("], right: [");
        if (node->right) {
            printTreeNode(node->right, labelLength);
        } else {
            printf("nil");
        }
        printf("]");
        fflush(stdout);
        free(label);
    }

    There definitely are more elegant ways to do it, but hey, I’m a real
    noob in C.

    Now, when trying this out in a C program, it gave the expected results:
    a: 000000, id: 0, state: 0, left: [a: 100000, id: 0, state: 0, left: [a:
    010001, id: 0, state: 0, … (continued recursively)

    However, I wrapped this function with SWIG (http://www.swig.org/), and
    when calling it from Java, I get to see the following stuff:
    a: 0�\Ӫ*, id: 0, state: 0, left: [a: 0�\Ӫ*, id: 0, state: 0, left: [a:
    0�\Ӫ*, …

    So some encoding problem is getting in the way. Maybe this is the time
    to say that I am on 64-bit Linux, where UTF-8 is the system standard.
    Note that those characters are being produced on the C side, the string
    is not passed to Java (I tried that as well, with the same result).

    I then messed around a bit and came up with the following:

    void printTreeNode(mgTreeNode * node, int labelLength) {
        int i;
        printf("a: ");
        for (i = 0; i < labelLength; ++i) {
            printf("%c", node->a[i] + '0');
        }
        printf(", id: %d, state: %d, left: [", node->id, node->state);
        if (node->left) {
            printTreeNode(node->left, labelLength);
        } else {
            printf("nil");
        }
        printf("], right: [");
        if (node->right) {
            printTreeNode(node->right, labelLength);
        } else {
            printf("nil");
        }
        printf("]");
        fflush(stdout);
    }

    You’ll notice a little difference in that the ‘label’ variable is no
    longer used and the chars are printed directly.

    My question: what the hell causes this strange stuff?

    Grateful for any clarifications, H.
    --
    Hendrik Maryns
    http://tcl.sfs.uni-tuebingen.de/~hendrik/
    ==================
    http://aouw.org
    Ask smart questions, get good answers:
    http://www.catb.org/~esr/faqs/smart-questions.html


     
    Hendrik Maryns, Jan 24, 2008
    #1

  2. Guest

    On Jan 24, 8:14 am, Hendrik Maryns <> wrote:
    > Hi group,
    >
    > I am working on a JNI project. However, since things didn't work the
    > way I wanted them, I wrote some utility functions. One of them is a
    > function to print the following struct to stdout:
    >
    > typedef unsigned mgState;
    > typedef unsigned mgId;
    > typedef char *mA;
    >
    > typedef struct mgTreeNode { /* tree node */
    > mA a; /* alphabet element */
    > struct mgTreeNode *left, *right; /* successors */
    > mgId id; /* state space id */
    > mgState state; /* automaton state */
    >
    > } mgTreeNode;
    >
    > Important to notice here is that, although mA is typedef'ed to char*, it
    > is not a string, but really an array of char, containing 0 and 1 (so
    > some primitive form of bit vector). However, when printing it, I want
    > to see '0' and '1', of course.
    >
    > My first try was the following:
    >
    > void printTreeNode(mgTreeNode * node, int labelLength) {
    > int i;
    > char *label = malloc(labelLength);
    > for (i = 0; i < labelLength; ++i) {
    > label[i] = node->a[i] + '0';
    > }
    > printf("a: %s, id: %d, state: %d, left: [", label, node->id, node->state);

    <snip>

    You do not show the code for storing information in node->a,
    so we can't tell whether a is even valid.

    The constant '0' is the character zero, which for ASCII
    is the integer 48. So you are adding 48 to each character??
    What is the purpose of this?

    Then you print using %s. But that requires a NULL-terminated
    array of characters. '\0' and '0' are two very different things.
    --
    Fred Kleinschmidt
     
    , Jan 24, 2008
    #2

  3. Re: Printing a char* which is not a string: I do not understand this

    On Thu, 24 Jan 2008 12:13:18 -0800, fred.l.kleinschmidt wrote:
    > On Jan 24, 8:14 am, Hendrik Maryns <> wrote:
    >> Important to notice here is that, although mA is typedef'ed to char*,
    >> it is not a string, but really an array of char, containing 0 and 1 (so
    >> some primitive form of bit vector). However, when printing it, I want
    >> to see '0' and '1', of course.
    >>
    >> My first try was the following:
    >>
    >> void printTreeNode(mgTreeNode * node, int labelLength) {
    >> int i;
    >> char *label = malloc(labelLength);
    >> for (i = 0; i < labelLength; ++i) {
    >> label[i] = node->a[i] + '0';
    >> }
    >> printf("a: %s, id: %d, state: %d, left: [", label, node->id,
    >> node->state);

    > <snip>
    >
    > You do not show the code for storing information in node->a, so we can't
    > tell whether a is even valid.
    >
    > The constant '0' is the character zero, which for ASCII is the integer
    > 48. So you are adding 48 to each character?? What is the purpose of
    > this?


    a is stated to contain 0 or 1, and a+'0' converts it to '0' or '1',
    as intended.

    > Then you print using %s. But that requires a NULL-terminated array of
    > characters. '\0' and '0' are two very different things.


    This is correct, and exactly the problem. The argument to malloc should
    be labelLength+1, and label[labelLength] should be set to '\0'.
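
    As a minimal sketch of that fix, assuming the bit array holds only the
    values 0 and 1: the helper name bits_to_label is hypothetical and not
    part of the original code. It allocates one extra byte and writes the
    terminator, so that %s stops where the label ends.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: turn an array of 0/1 values into a printable,
   null-terminated string. The caller frees the result. */
char *bits_to_label(const char *bits, int len)
{
    char *label;
    int i;

    assert(bits != NULL && len >= 0);
    label = malloc((size_t)len + 1);      /* one extra byte for '\0' */
    if (label == NULL)
        return NULL;
    for (i = 0; i < len; ++i)
        label[i] = (char)(bits[i] + '0'); /* 0 -> '0', 1 -> '1' */
    label[len] = '\0';                    /* without this, %s reads past the end */
    return label;
}
```

    Without the terminator, printf keeps reading whatever bytes follow the
    allocation until it happens to find a zero, which would explain why the
    output can look right in one program and show garbage in another.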
     
    Harald van Dijk, Jan 24, 2008
    #3
  4. Guest

    On Jan 24, 10:13 pm, wrote:
    > On Jan 24, 8:14 am, Hendrik Maryns <> wrote:
    >
    > > Hi group,

    >
    > > I am working on a JNI project. However, since things didn't work the
    > > way I wanted them, I wrote some utility functions. One of them is a
    > > function to print the following struct to stdout:

    >
    > > typedef unsigned mgState;
    > > typedef unsigned mgId;
    > > typedef char *mA;

    >
    > > typedef struct mgTreeNode { /* tree node */
    > > mA a; /* alphabet element */
    > > struct mgTreeNode *left, *right; /* successors */
    > > mgId id; /* state space id */
    > > mgState state; /* automaton state */

    >
    > > } mgTreeNode;

    >
    > > Important to notice here is that, although mA is typedef'ed to char*, it
    > > is not a string, but really an array of char, containing 0 and 1 (so
    > > some primitive form of bit vector). However, when printing it, I want
    > > to see '0' and '1', of course.

    >
    > > My first try was the following:

    >
    > > void printTreeNode(mgTreeNode * node, int labelLength) {
    > > int i;
    > > char *label = malloc(labelLength);
    > > for (i = 0; i < labelLength; ++i) {
    > > label[i] = node->a[i] + '0';
    > > }
    > > printf("a: %s, id: %d, state: %d, left: [", label, node->id, node->state);

    >
    > <snip>
    >
    > You do not show the code for storing information in node->a,
    > so we can't tell whether a is even valid.
    >
    > The constant '0' is the character zero, which for ASCII
    > is the integer 48. So you are adding 48 to each character??

    '0' to '9' are guaranteed to be sequential.
    Therefore, '0' + 2 == '2' et cetera
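
    A small sketch of that guarantee in use; the function name digit_to_char
    is made up for illustration. It relies only on the C rule that the
    characters '0' through '9' have consecutive values, not on ASCII.

```c
#include <assert.h>

/* Hypothetical helper: map a value 0..9 to its digit character.
   Portable because C guarantees '0'..'9' are consecutive. */
char digit_to_char(int d)
{
    assert(d >= 0 && d <= 9);
    return (char)('0' + d);
}
```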
     
    , Jan 24, 2008
    #4
  5. wrote:
    > <snip>
    > Then you print using %s. But that requires a NULL-terminated


    'null terminated' would be better; 'null byte terminated' is
    better still.

    > array of characters.


    Strictly speaking, it is possible to print a non string
    sequence of characters with the s conversion specifier,
    if you don't exceed the length of it...

    #include <stdio.h>

    int main(void)
    {
        const char huh[4] = { 'H', 'u', 'h', '!' };
        printf("%.4s\n", huh);
        return 0;
    }

    --
    Peter
     
    Peter Nilsson, Jan 25, 2008
    #5
  6. Peter Nilsson wrote:

    >Strictly speaking, it is possible to print a non string
    >sequence of characters with the s conversion specifier,
    >if you don't exceed the length of it...
    >
    > #include <stdio.h>
    >
    > int main(void)
    > {
    > const char huh[4] = { 'H', 'u', 'h', '!' };
    > printf("%.4s\n", huh);
    > return 0;
    > }


    And even more usefully, the length does not need to be constant:

    printf("%.*s\n", len, buf);

    I have seen code from a certain well-known large software company that
    used malloc()/memcpy()/printf()/free() repeatedly to achieve this effect.
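
    The run-time precision can be sketched as follows. The wrapper name
    format_counted is hypothetical, and it uses snprintf rather than printf
    only so that the result can be inspected; the "%.*s" conversion behaves
    the same way in both.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format the first len bytes of buf, which need not
   be null-terminated, into out. Returns what snprintf returns. */
int format_counted(char *out, size_t outsz, const char *buf, int len)
{
    assert(out != NULL && buf != NULL && len >= 0);
    /* The precision caps how many bytes %s may read from buf. */
    return snprintf(out, outsz, "%.*s", len, buf);
}
```

    The precision argument must have type int, so a size_t length needs a
    cast before it is passed.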

    -- Richard
    --
    :wq
     
    Richard Tobin, Jan 25, 2008
    #6
  7. Richard Tobin schreef:
    > In article <>,
    > Peter Nilsson <> wrote:
    >
    >> Strictly speaking, it is possible to print a non string
    >> sequence of characters with the s conversion specifier,
    >> if you don't exceed the length of it...
    >>
    >> #include <stdio.h>
    >>
    >> int main(void)
    >> {
    >> const char huh[4] = { 'H', 'u', 'h', '!' };
    >> printf("%.4s\n", huh);
    >> return 0;
    >> }

    >
    > And even more usefully, the length does not need to be constant:
    >
    > printf("%.*s\n", len, buf);
    >
    > I have seen code from a certain well-known large software company that
    > used malloc()/memcpy()/printf()/free() repeatedly to achieve this effect.


    Now that is a nice trick I will remember.

    Thank you all for your answers. We’re a little smarter now :)

    H.
     
    Hendrik Maryns, Jan 25, 2008
    #7
  8. Re: Printing a char* which is not a string: I do not understand this

    Harald van Dijk schreef:
    > On Thu, 24 Jan 2008 12:13:18 -0800, fred.l.kleinschmidt wrote:
    >> On Jan 24, 8:14 am, Hendrik Maryns <> wrote:
    >>> Important to notice here is that, although mA is typedef'ed to char*,
    >>> it is not a string, but really an array of char, containing 0 and 1 (so
    >>> some primitive form of bit vector). However, when printing it, I want
    >>> to see '0' and '1', of course.
    >>>
    >>> My first try was the following:
    >>>
    >>> void printTreeNode(mgTreeNode * node, int labelLength) {
    >>> int i;
    >>> char *label = malloc(labelLength);
    >>> for (i = 0; i < labelLength; ++i) {
    >>> label[i] = node->a[i] + '0';
    >>> }
    >>> printf("a: %s, id: %d, state: %d, left: [", label, node->id,
    >>> node->state);

    >> <snip>
    >>
    >> You do not show the code for storing information in node->a, so we can't
    >> tell whether a is even valid.


    Just suppose it is. As I said, there is a way to know the length of the
    array.

    >> The constant '0' is the character zero, which for ASCII is the integer
    >> 48. So you are adding 48 to each character?? What is the purpose of
    >> this?

    >
    > a is stated to contain 0 or 1, and a+'0' converts it to '0' or '1',
    > as intended.
    >
    >> Then you print using %s. But that requires a NULL-terminated array of
    >> characters. '\0' and '0' are two very different things.

    >
    > This is correct, and exactly the problem. The argument to malloc should
    > be labelLength+1, and label[labelLength] should be set to '\0'.


    I corrected the code as follows:

    char *printTreeNode(mgTreeNode * node, int labelLength) {
        int i;
        char *buffer;
        char *leftDaughter = 0;
        char *rightDaughter = 0;
        char *label = malloc(labelLength+1);
        for (i = 0; i < labelLength; ++i) {
            label[i] = node->a[i] + '0';
        }
        label[labelLength] = 0;
        if (node->left) {
            leftDaughter = printTreeNode(node->left, labelLength);
        }
        if (node->right) {
            rightDaughter = printTreeNode(node->right, labelLength);
        }
        buffer = malloc(500);
        sprintf(buffer, "a: %s, id: %d, state: %d, left: [%s], right: [%s]",
                label, node->id, node->state, leftDaughter, rightDaughter);
        free(label);
        if (leftDaughter) {
            free(leftDaughter);
        }
        if (rightDaughter) {
            free(rightDaughter);
        }
        return buffer;
    }

    Again, this gives correct output if I invoke it as a C program, but now
    the output I get when wrapping this function through JNI is the following:

    a: 0pš, id: 0, state: 0, left: [a: 0Ϳ, id: 0, state: 0, left:
    [(null)], right: [(null)]], right: [(null)

    So there is still something going wrong, and the output is chopped off
    arbitrarily. Do I have to make sure buffer is null-terminated as well?
    Maybe I should use calloc(labelLength + 1, sizeof(char)) for label?
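
    On the buffer question: sprintf always terminates what it writes, so
    the fixed 500-byte size and the null pointers passed to %s are the
    likelier trouble spots. The following is only a sketch under
    assumptions, not the thread's accepted fix: formatTreeNode and NODE_FMT
    are hypothetical names, the typedefs are repeated so the block stands
    alone, and C99 snprintf is assumed to be available for measuring the
    required size.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef unsigned mgState;
typedef unsigned mgId;
typedef char *mA;

typedef struct mgTreeNode {
    mA a;                              /* 0/1 values, not a string */
    struct mgTreeNode *left, *right;
    mgId id;
    mgState state;
} mgTreeNode;

#define NODE_FMT "a: %s, id: %u, state: %u, left: [%s], right: [%s]"

/* Hypothetical variant: returns a malloc'ed description of the subtree,
   or NULL on allocation failure. The caller frees the result. */
char *formatTreeNode(const mgTreeNode *node, int labelLength)
{
    char *label, *left = NULL, *right = NULL, *buffer = NULL;
    int i, needed;

    assert(node != NULL && labelLength >= 0);
    label = malloc((size_t)labelLength + 1);
    if (label == NULL)
        return NULL;
    for (i = 0; i < labelLength; ++i)
        label[i] = (char)(node->a[i] + '0');
    label[labelLength] = '\0';

    if (node->left)
        left = formatTreeNode(node->left, labelLength);
    if (node->right)
        right = formatTreeNode(node->right, labelLength);

    /* First call only measures; the second writes into a buffer of
       exactly the right size. A missing child (or one whose formatting
       failed) is shown as "nil" instead of passing NULL to %s. */
    needed = snprintf(NULL, 0, NODE_FMT, label, node->id, node->state,
                      left ? left : "nil", right ? right : "nil");
    if (needed >= 0 && (buffer = malloc((size_t)needed + 1)) != NULL)
        snprintf(buffer, (size_t)needed + 1, NODE_FMT, label, node->id,
                 node->state, left ? left : "nil", right ? right : "nil");

    free(label);
    free(left);                        /* free(NULL) is a no-op */
    free(right);
    return buffer;
}
```

    The conversions use %u because mgId and mgState are unsigned. Using
    calloc for label would not change the outcome here, since every byte
    including the terminator is written explicitly.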

    Thanks, H.
     
    Hendrik Maryns, Jan 25, 2008
    #8
  9. [please ignore previous post] Re: Printing a char* which is not a

    Hendrik Maryns schreef:
    > Harald van Dijk schreef:
    >> On Thu, 24 Jan 2008 12:13:18 -0800, fred.l.kleinschmidt wrote:
    >>> On Jan 24, 8:14 am, Hendrik Maryns <> wrote:
    >>>> Important to notice here is that, although mA is typedef'ed to char*,
    >>>> it is not a string, but really an array of char, containing 0 and 1 (so
    >>>> some primitive form of bit vector). However, when printing it, I want
    >>>> to see '0' and '1', of course.
    >>>>
    >>>> My first try was the following:
    >>>>
    >>>> void printTreeNode(mgTreeNode * node, int labelLength) {
    >>>> int i;
    >>>> char *label = malloc(labelLength);
    >>>> for (i = 0; i < labelLength; ++i) {
    >>>> label[i] = node->a[i] + '0';
    >>>> }
    >>>> printf("a: %s, id: %d, state: %d, left: [", label, node->id,
    >>>> node->state);
    >>> <snip>
    >>>
    >>> You do not show the code for storing information in node->a, so we can't
    >>> tell whether a is even valid.

    >
    > Just suppose it is. As I said, there is a way to know the length of the
    > array.
    >
    >>> The constant '0' is the character zero, which for ASCII is the integer
    >>> 48. So you are adding 48 to each character?? What is the purpose of
    >>> this?

    >>
    >> a is stated to contain 0 or 1, and a+'0' converts it to '0' or
    >> '1', as intended.
    >>
    >>> Then you print using %s. But that requires a NULL-terminated array of
    >>> characters. '\0' and '0' are two very different things.

    >>
    >> This is correct, and exactly the problem. The argument to malloc
    >> should be labelLength+1, and label[labelLength] should be set to '\0'.

    >
    > I corrected the code as follows:
    >
    > char *printTreeNode(mgTreeNode * node, int labelLength) {
    > int i;
    > char *buffer;
    > char *leftDaughter = 0;
    > char *rightDaughter = 0;
    > char *label = malloc(labelLength+1);
    > for (i = 0; i < labelLength; ++i) {
    > label[i] = node->a[i] + '0';
    > }
    > label[labelLength] = 0;
    > if (node->left) {
    > leftDaughter = printTreeNode(node->left, labelLength);
    > }
    > if (node->right) {
    > rightDaughter = printTreeNode(node->right, labelLength);
    > }
    > buffer = malloc(500);
    > sprintf(buffer, "a: %s, id: %d, state: %d, left: [%s], right: [%s]",
    > label, node->id, node->state, leftDaughter, rightDaughter);
    > free(label);
    > if (leftDaughter) {
    > free(leftDaughter);
    > }
    > if (rightDaughter) {
    > free(rightDaughter);
    > }
    > return buffer;
    > }
    >
    > Again, this gives correct output if I invoke it as a C program, but now
    > the output I get when wrapping this function through JNI is the following:
    >
    > a: 0pš, id: 0, state: 0, left: [a: 0Ϳ, id: 0, state: 0, left:
    > [(null)], right: [(null)]], right: [(null)
    >
    > So there is still something going wrong, and the output is chopped off
    > arbitrarily. Do I have to make sure buffer is null-terminated as well?
    > Maybe I should use calloc(labelLength + 1, sizeof(char)) for label?


    I am sorry, please ignore this, I was invoking the function with an
    incorrect length parameter. Everything works fine now.

    H.
     
    Hendrik Maryns, Jan 25, 2008
    #9