char * ptr = "hello" and char carray[] = "hello"

Discussion in 'C Programming' started by fei.liu, Jun 23, 2006.

  1. fei.liu

    fei.liu Guest

    Consider the following sample code:

    char * ptr = "hello";
    char carray[] = "hello";
    int main(void){
        return 0;
    }

    What does the standard have to say about the storage requirements
    of ptr and carray? Is it fair to say that char *ptr will take
    4 more bytes (on a 32-bit platform) in the DATA segment? I have found
    this to be true at least with gcc 2.96. I assume that under certain
    conditions the compiler can optimize the storage away?

    Thanks for your comments,

    fei.liu, Jun 23, 2006

  2. Any optimization that doesn't affect the output of the program is
    permitted. This includes eliminating unused objects.

    Barring that, ptr will occupy sizeof(char*) bytes, the first string
    literal "hello" will occupy 6 bytes, and carray will occupy another 6
    bytes. All of these will have static storage duration, meaning that
    they exist for the lifetime of the program. (C doesn't define
    anything called a "DATA segment".)
    Keith Thompson, Jun 23, 2006

  3. No. Conceptually, no.

    Conceptually, the first declaration creates a [non-modifiable] string
    literal "hello" in static memory (occupying 6 bytes) and a pointer (say,
    4 bytes) pointing to that string literal. This is 10 bytes total.

    The second declaration creates a [non-modifiable] string literal "hello"
    in static memory and also allocates a modifiable array of 6 chars, which
    will be initialized by copying data from the string literal at program
    startup. This requires 12 bytes total.

    This means that the second declaration requires more memory than the
    first. But that is a purely conceptual point of view.

    In practice the compiler is allowed to merge identical string literals
    and perform other types of optimizations, which might significantly
    affect the memory consumption in cases like this.
    Andrey Tarasevich, Jun 23, 2006
  4. pete

    pete Guest

    I disagree about the semantics of the second.
    char carray[] = "hello";
    is shorthand for
    char carray[] = {'h','e','l','l','o','\n'};
    which means that the initialiser for the array
    can be embedded in the opcode and that
    the initialiser need not exist in the same kind of
    memory as other string literals.
    pete, Jun 23, 2006
  5. pete

    pete Guest

    I meant:
    char carray[] = {'h','e','l','l','o','\0'};
    pete, Jun 23, 2006
  6. Hm... I agree that it "need not exist". However, I don't see anything in the
    standard that would confirm the equivalence to the above "shorthand".

    According to the standard (once again - conceptually) each string literal is a
    non-modifiable array of static storage duration. No exception is made for the
    situation when the literal is used as an array initializer. And when it is used
    as an initializer for a char array, according to 6.7.8/14, the characters of the
    literal (i.e. of the aforementioned static array, as I understand it) initialize
    the elements of the char array.

    It is quite possible that the intent of 6.7.8/14 was different from how I
    understood it. Maybe what you are saying is indeed closer to what the standard
    intended to say. Anyway, in practice it is a moot point, since in practice
    there's indeed no need to keep the initializer as a separate array.
    Andrey Tarasevich, Jun 23, 2006
  7. pete

    pete Guest

    Change that '\n' to '\0'
    6.7.8 Initialization
    [#32] EXAMPLE 8 The declaration
    char s[] = "abc", t[3] = "abc";
    defines ``plain'' char array objects s and t whose elements
    are initialized with character string literals. This
    declaration is identical to
    char s[] = { 'a', 'b', 'c', '\0' },
    t[] = { 'a', 'b', 'c' };

    It's also stated more plainly at the end of K&R,
    section 4.9 Initialization,
    which actually uses the word "shorthand".
    pete, Jun 23, 2006
  8. fei.liu

    fei.liu Guest

    This is how gcc handles char s[]: the array is placed in the .data
    section and is clearly treated using the 'shorthand' approach.

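    #include <stdio.h>

    static char * ptr = "hello";
    int x = 0x41414141;
    static char ptr8[] = "hello888";
    int y = 0x42424242;
    char ptr5[] = "hello";
    int z = 0x43434343;
    /* I got confused here between ptr8 and ptr8a: ptr8a has room for
       exactly 8 chars, so it gets no terminating '\0' */
    static char ptr8a[8] = "hello888";
    int u = 0x42424242;

    int main(void){

        int i;
        /* ptr8 holds 9 chars, including the terminating '\0' */
        for(i = 0; i < 9; i ++)
            printf("%d %c\n", i, ptr8[i]);
        /* ptr8a[8] is one past the end of the array: strictly undefined
           behaviour, used here only to peek at the neighbouring object u */
        if((unsigned char)ptr8a[8] == 0x42)
            printf("not null terminated\n");
        if((unsigned char)ptr5[5] != 0x43)
            printf("null terminated, aligned on 8 byte boundary\n");

        printf("ptr[0] = %c\n", ptr[0]);
        return 0;
    }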

    Contents of section .rodata:
    80485e0 03000000 01000200 00000000 00000000 ................
    80485f0 00000000 00000000 00000000 00000000 ................
    8048600 68656c6c 6f002564 2025630a 006e6f74 hello.%d %c..not
    8048610 206e756c 6c207465 726d696e 61746564 null terminated
    8048620 0a000000 00000000 00000000 00000000 ................
    8048630 00000000 00000000 00000000 00000000 ................
    8048640 6e756c6c 20746572 6d696e61 7465642c null terminated,
    8048650 20616c69 676e6564 206f6e20 38206279 aligned on 8 by
    8048660 74652062 6f756e64 6172790a 00707472 te boundary..ptr
    8048670 5b305d20 3d202563 0a00 [0] = %c..
    Contents of section .data:
    804967c 00000000 00000000 cc960408 00000000 ................
    804968c 00860408 41414141 68656c6c 6f383838 ....AAAAhello888
    804969c 00000000 42424242 68656c6c 6f000000 ....BBBBhello...
    80496ac 43434343 68656c6c 6f383838 42424242 CCCChello888BBBB
    fei.liu, Jun 23, 2006
  9. OK, I agree, you are right.

    One thing that was bothering me is that the situation when a string
    literal is used as an initializer for a 'char[]' array is sometimes
    mentioned as an example of a context where the array-to-pointer decay
    does not take place (C FAQ 6.3, for example). I'd say that, taking what
    you said into account, this context does not really qualify as an
    example, since there's no "array" here at all. In such a context a string
    literal is nothing more than a piece of syntactic sugar, a shorthand form
    of aggregate initializer, which does not really represent any array by
    itself. Since there's no array, the issue of array-to-pointer decay is
    irrelevant in such a context. The reference to a string literal
    initializer in FAQ 6.3 is misleading.
    Andrey Tarasevich, Jun 23, 2006
  10. (This is a rather old topic, but I still think it is worth updating it.)

    I was too quick to agree. After coming across this issue several times and
    reviewing the language in the standard, I have to revoke my agreement. Sorry,
    the standard explicitly refers to the "abc" initializer as a _string_ _literal_,
    which is explicitly guaranteed to become an object of static storage duration.

    What is meant by the text in the example quoted above is that this kind of
    initialization is _functionally_ identical to the aggregate form, but it does
    not mean that it is _semantically_ equivalent at the abstract language level. It
    is not. Of course, the "as-if" rules allow compilers to treat them identically,
    but conceptually in C the literal form _is_ _not_ a shorthand for the aggregate form.

    What K&R says is, of course, non-normative.
    Andrey Tarasevich, Nov 8, 2006
  11. pete

    pete Guest

    I don't see how "This declaration is identical to ..."
    can possibly mean that one declaration creates
    an object of static duration while the other does not.
    pete, Nov 9, 2006