char * ptr = "hello" and char carray[] = "hello"


fei.liu

Consider the following sample code

char * ptr = "hello";
char carray[] = "hello";
int main(void){
}

What does the standard have to say about the storage requirements
of ptr and carray? Is it a fair statement that char *ptr will take
4 more bytes (on a 32-bit platform) in the DATA segment? I have found
the statement true at least with gcc 2.96. I assume that under certain
conditions the compiler can optimize the storage away?

Thanks for your comments,

Fei
 

Keith Thompson


Any optimization that doesn't affect the observable behavior of the program
is permitted. This includes eliminating unused objects.

Barring that, ptr will occupy sizeof(char*) bytes, the first string
literal "hello" will occupy 6 bytes, and carray will occupy another 6
bytes. All of these will have static storage duration, meaning that
they exist for the lifetime of the program. (C doesn't define
anything called a "DATA segment".)
 

Andrey Tarasevich

fei.liu said:
...
Is it a fair statement that char *ptr will take 4 more bytes (on a 32-bit
platform) in the DATA segment?

No. Conceptually, no.

Conceptually, the first declaration creates a [non-modifiable] string
literal "hello" in static memory (occupying 6 bytes) and a pointer (say,
4 bytes) pointing to that string literal. This is 10 bytes total.

The second declaration creates a [non-modifiable] string literal "hello"
in static memory and also allocates a modifiable array of 6 chars, which
will be initialized by copying data from the string literal before program
startup. This requires 12 bytes total.

This means that the second declaration requires more memory than the
first. But that is a purely conceptual point of view.

In practice the compiler is allowed to merge identical string literals
and perform other types of optimizations, which might significantly
affect the memory consumption in cases like this.
 

pete

Andrey said:
The second declaration creates a [non-modifiable] string literal "hello"
in static memory and also allocates a modifiable array of 6 chars, which
will be initialized by copying data from the string literal before program
startup. This requires 12 bytes total.

This means that the second declaration requires more memory than the
first. But that is a purely conceptual point of view.

I disagree about the semantics of the second.
char carray[] = "hello";
is shorthand for
char carray[] = {'h','e','l','l','o','\n'};
which means that the initialiser for the array
can be embedded in the opcode and that
the initialiser need not exist in the same kind of
memory as other string literals.
 

pete

pete said:
char carray[] = "hello";
is shorthand for
char carray[] = {'h','e','l','l','o','\n'};

I meant:
char carray[] = {'h','e','l','l','o','\0'};
 

Andrey Tarasevich

pete said:
...
I disagree about the semantics of the second.
char carray[] = "hello";
is shorthand for
char carray[] = {'h','e','l','l','o','\n'};
which means that the initialiser for the array
can be embedded in the opcode and that
the initialiser need not exist in the same kind of
memory as other string literals.

Hm... I agree that it "need not exist". However, I don't see anything in the
standard that would confirm the equivalence to the above "shorthand".

According to the standard (once again - conceptually) each string literal is a
non-modifiable array of static storage duration. No exception is made for the
situation when the literal is used as an array initializer. And when it is used
as an initializer for a char array, according to 6.7.8/14, the characters of the
literal (i.e. of the aforementioned static array, as I understand it) initialize
the elements of the char array.

It is quite possible that the intent of 6.7.8/14 was different from how I
understood it. Maybe what you are saying is indeed closer to what the standard
intended to say. Anyway, in practice it is a moot point, since in practice
there's indeed no need to keep the initializer as a separate array.
 

pete

Andrey said:
...
I disagree about the semantics of the second.
char carray[] = "hello";
is shorthand for
char carray[] = {'h','e','l','l','o','\n'};

Change that '\n' to '\0'
Hm... I agree that it "need not exist".
However, I don't see anything in the
standard that would confirm the equivalence to the above "shorthand".

N869
6.7.8 Initialization
[#32] EXAMPLE 8 The declaration
char s[] = "abc", t[3] = "abc";
defines ``plain'' char array objects s and t whose elements
are initialized with character string literals. This
declaration is identical to
char s[] = { 'a', 'b', 'c', '\0' },
     t[] = { 'a', 'b', 'c' };

It's also stated more plainly at the end of K&R,
section 4.9 Initialization,
which actually uses the word "shorthand".
 

fei.liu


This is how gcc handles char s[]: the array is put in the .data segment and
is clearly treated using the 'shorthand' approach.

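#include <stdio.h>

// Marker ints between the objects make them easy to find in the .data dump.
static char * ptr = "hello";
int x = 0x41414141;                  // "AAAA"
static char ptr8[] = "hello888";     // 9 bytes, including '\0'
int y = 0x42424242;                  // "BBBB"
char ptr5[] = "hello";               // 6 bytes, including '\0'
int z = 0x43434343;                  // "CCCC"
static char ptr8a[8] = "hello888";   // 8 bytes, no room for '\0'
                                     // (I got confused here between ptr8 and ptr8a)
int u = 0x42424242;                  // "BBBB"

int main(void){

    int i;
    // Print each element of ptr8, including the terminating '\0'.
    for(i = 0; i < 9; i ++)
        printf("%d %c\n", i, ptr8[i]);

    // ptr8a[8] is one past the end of ptr8a: reading it is undefined
    // behavior and only probes how this gcc laid out .data (in the dump
    // below, the byte after ptr8a is the first byte of u).
    if((unsigned char)ptr8a[8] == 0x42)
        printf("not null terminated\n");
    // ptr5[5] is in bounds: it is the '\0' stored in ptr5 itself.
    if((unsigned char)ptr5[5] != 0x43)
        printf("null terminated, aligned on 8 byte boundary\n");

    printf("ptr[0] = %c\n", ptr[0]);
    return 0;
}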

Contents of section .rodata:
80485e0 03000000 01000200 00000000 00000000 ................
80485f0 00000000 00000000 00000000 00000000 ................
8048600 68656c6c 6f002564 2025630a 006e6f74 hello.%d %c..not
8048610 206e756c 6c207465 726d696e 61746564 null terminated
8048620 0a000000 00000000 00000000 00000000 ................
8048630 00000000 00000000 00000000 00000000 ................
8048640 6e756c6c 20746572 6d696e61 7465642c null terminated,
8048650 20616c69 676e6564 206f6e20 38206279 aligned on 8 by
8048660 74652062 6f756e64 6172790a 00707472 te boundary..ptr
8048670 5b305d20 3d202563 0a00 [0] = %c..
Contents of section .data:
804967c 00000000 00000000 cc960408 00000000 ................
804968c 00860408 41414141 68656c6c 6f383838 ....AAAAhello888
804969c 00000000 42424242 68656c6c 6f000000 ....BBBBhello...
80496ac 43434343 68656c6c 6f383838 42424242 CCCChello888BBBB
 

Andrey Tarasevich

pete said:
...
Hm... I agree that it "need not exist".
However, I don't see anything in the
standard that would confirm the equivalence to the above "shorthand".

N869
6.7.8 Initialization
[#32] EXAMPLE 8 The declaration
char s[] = "abc", t[3] = "abc";
defines ``plain'' char array objects s and t whose elements
are initialized with character string literals. This
declaration is identical to
char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };

It's also stated more plainly at the end of K&R,
section 4.9 Initialization,
which actually uses the word "shorthand".
...

OK, I agree, you are right.

One thing that was bothering me is that the situation when a string
literal is used as an initializer for a 'char[]' array is sometimes
mentioned as an example of a context when the array-to-pointer decay
does not take place (C FAQ 6.3, for example,
http://c-faq.com/aryptr/aryptrequiv.html). I'd say that, taking what you
said into account, this context does not really qualify as an example,
since there's no "array" here at all. In such a context the string literal is
nothing more than a piece of syntactic sugar, a shorthand form of an
aggregate initializer, which does not really represent any array by
itself. Since there's no array, the issue of array-to-pointer decay is
irrelevant in such a context. The reference to the string literal initializer
in 6.3 is misleading.
 

Andrey Tarasevich

Andrey said:
OK, I agree, you are right.
...

(This is a rather old topic, but I still think it is worth updating it.)

I was too quick to agree. After coming across this issue several times and
reviewing the language in the standard, I have to revoke my agreement. Sorry,
the standard explicitly refers to the "abc" initializer as a _string_ _literal_,
which is explicitly guaranteed to become an object of static storage duration.

What is meant by the text in the example quoted above is that this kind of
initialization is _functionally_ identical to the aggregate form, but it does
not mean that it is _semantically_ equivalent at the abstract language level. It
is not. Of course, the "as-if" rule allows compilers to treat them identically,
but conceptually in C the literal form _is_ _not_ a shorthand for the aggregate form.

What K&R says is, of course, non-normative.
 

pete


I don't see how "This declaration is identical to ..."
can possibly mean that one declaration creates
an object of static duration while the other does not.
 
