Regarding initialization of arrays, and the fact that, e.g.,
    char s[HUGE_NUMBER] = "";
has to fill all of the elements of s[] with zero:
Specifically dealing with automatic storage, it makes no sense to
me. The compiler has to generate code to perform the
initialization, which is a foolishly wasteful endeavor when the
code writer is right there to create such code. Untangling all
those braces and commas to create a herd of assignment statements
does not seem to be the best use of the compiler writer's time, nor
of the compiler itself.
I am not sure what you mean by "untangling all those braces and
commas" -- the original example has none at all, for instance.
Crossed to c.std.c, where maybe someone knows of a liberating
clause that enables what I consider common sense.
I am not sure what you "consider common sense" either, but it is
worth noting the historical progression here.
In C89, initializers for automatic aggregates (arrays, structures,
unions, arrays of structures containing unions, and so on) always
had to consist entirely of constant-expressions. That is:
    struct S { int a, b; };
    struct S static_s = { 1, 2 }; /* static_s.a = 1, static_s.b = 2 */

    void c89(void) {
        struct S auto_s = { 3, 4 }; /* auto_s.a = 3, auto_s.b = 4 */
        ...
    }
is legal in both C89 and C99. C99 now allows "non-constants" in
automatic aggregate initializers:
    void c99(int x, int y) {
        struct S auto_s = { x, y }; /* OK in C99, error in C89 */
        ...
    }
If we restrict ourselves to the C89 system, we can immediately see
that *all* aggregate initializers can *always* be compiled "as if"
they were initializing some static instance, plus a call to memcpy()
to copy it to any automatic instance. Clearly static_s can simply
be generated at compile time, as something like:
        .data
    static_s:
        .word 1
        .word 2
In c89(), auto_s can be "compiled" as:
    c89:
        // setup code if any
        .rodata
    .L1:                    // initializer for auto_s
        .word 3
        .word 4
        .text
        lea  .L1,a0         // &static_initializer
        lea  4(fp),a1       // &auto_s
        mov  8,d0           // sizeof(struct S)
        call memcpy         // memcpy 8 bytes from .L1 to 4(fp)
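
The same lowering can be written out in C itself. This is only a
sketch of the "as if" behavior, not what any particular compiler
emits; the name auto_s_image and the three function names are made up
for illustration.

```c
#include <assert.h>
#include <string.h>

struct S { int a, b; };

/* What the programmer writes. */
struct S init_direct(void) {
    struct S auto_s = { 3, 4 };
    return auto_s;
}

/* Roughly what the compiler may do: keep a hidden static image of the
   initializer (hypothetical name) and memcpy() it into the automatic
   object each time the function is entered. */
struct S init_by_copy(void) {
    static const struct S auto_s_image = { 3, 4 };
    struct S auto_s;
    memcpy(&auto_s, &auto_s_image, sizeof auto_s);
    return auto_s;
}

/* Both forms must leave the same member values behind. */
void check_same_result(void) {
    struct S d = init_direct();
    struct S c = init_by_copy();
    assert(d.a == c.a && d.b == c.b);
}
```

Calling check_same_result() should pass silently, since the two
functions differ only in how the initial values reach auto_s.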
Any C89 compiler can do this for all automatic initializers, because
they have the same constraints as static initializers. There is, in
effect, no "code penalty" for arbitrarily large automatic data structures
with initializers -- the same memcpy() that handles a small 4 or 8 byte
structure also handles the 40000 byte zero-filled string. Of course,
there is a (possibly huge) *data* penalty for storing the static copy,
and at least some compilers "just happen" to avoid it by handling the
zero fill with a call to memset -- for instance, gcc will handle:
    void f(void) {
        char line[100] = "hello";
        ...
    }
as something like:
    .L1:    .asciz "hello"; .align 4
    f:
        sub  128,sp
        lea  .L1,a0
        lea  8(fp),a1
        mov  6,d0
        call memcpy         // line[0] through line[5] inclusive
        lea  14(fp),a0
        mov  0,d0
        mov  94,d1
        call memset         // zero out line[6] through line[99]
(the last time I dealt with it, gcc did indeed copy the '\0' that
terminates the string literal, then memset the "default zero" bytes,
even though some cases would benefit from noticing that the implied
'\0' at the end of the string literal is the *same* zero as the
"default" zero, so that we could use .ascii instead of .asciz,
etc.).
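
For readers who prefer C to pseudo-assembly, here is a sketch of the
memcpy-plus-memset lowering described above, together with the
variant in which memset also supplies the terminating '\0'. The
function names, LINE_SIZE, and the image arrays are illustrative
assumptions, not anything gcc actually generates.

```c
#include <assert.h>
#include <string.h>

#define LINE_SIZE 100

/* What the programmer writes: "hello" followed by implied zero bytes. */
void fill_direct(char *out) {
    char line[LINE_SIZE] = "hello";
    memcpy(out, line, LINE_SIZE);
}

/* Lowered form from the text: copy the literal including its '\0',
   then zero everything after it. */
void fill_lowered(char *out) {
    static const char image[] = "hello";            /* 6 bytes with '\0' */
    memcpy(out, image, sizeof image);               /* out[0] .. out[5]  */
    memset(out + sizeof image, 0, LINE_SIZE - sizeof image); /* [6]..[99] */
}

/* Variant: store only the 5 letters and let memset write the
   terminator along with the rest of the zero fill. */
void fill_lowered_merged(char *out) {
    static const char image[5] = { 'h', 'e', 'l', 'l', 'o' };
    memcpy(out, image, sizeof image);               /* out[0] .. out[4]  */
    memset(out + sizeof image, 0, LINE_SIZE - sizeof image); /* [5]..[99] */
}

/* Returns nonzero if all three forms produce identical 100-byte arrays. */
int lowered_forms_agree(void) {
    char a[LINE_SIZE], b[LINE_SIZE], c[LINE_SIZE];
    fill_direct(a);
    fill_lowered(b);
    fill_lowered_merged(c);
    return memcmp(a, b, LINE_SIZE) == 0 && memcmp(a, c, LINE_SIZE) == 0;
}
```

The merged variant saves one byte of read-only data per initializer,
which is the small optimization the parenthetical remark is about.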
C99 removes the "must be constant" constraint, along with adding
anonymous aggregates (compound literals) whose members need not be
constant themselves, so C99 compilers do indeed have to "do more work"
than C89 compilers.
The same trick works though: the compiler can call memset() to fill
memory regions with zero bytes (if that suffices; it can call other
runtime support routines to "default initialize" other data types).
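
As a sketch of how that might look for a C99 initializer with
non-constant members: zero the whole object first, then store the
run-time values. The type point3 and the function names are
hypothetical, and the memset() step assumes that all-bits-zero is a
valid zero for every member, which holds for the int members used
here.

```c
#include <assert.h>
#include <string.h>

struct point3 { int x, y, z; };   /* hypothetical example type */

/* What the programmer writes (C99): z gets the default zero. */
struct point3 make_direct(int x, int y) {
    struct point3 p = { x, y };
    return p;
}

/* One possible lowering: zero-fill, then assign the non-constant
   members one by one. */
struct point3 make_lowered(int x, int y) {
    struct point3 p;
    memset(&p, 0, sizeof p);      /* "default zero" for all members */
    p.x = x;
    p.y = y;
    return p;
}

/* Both forms must agree member by member. */
void check_point3(int x, int y) {
    struct point3 d = make_direct(x, y);
    struct point3 l = make_lowered(x, y);
    assert(d.x == l.x && d.y == l.y && d.z == l.z);
}
```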
Some source fragments are best handled "as if" the aggregate
initializer were simply exploded out into a series of simple
initializations, of course. The obvious degenerate case is a
struct containing a single ordinary data object:
    struct temperature { int val; };
    struct squarefootage { int val; };
    ...
    void f(void) {
        struct temperature temperature = { NOT_TERRIBLY_WARM };
        struct squarefootage squarefootage = { VERY_LARGE_AREA };
        ...
        heat_function(temperature);
        area_function(squarefootage);
        ...
    }
There is no need to set one "int" to some value via memcpy(), just
because the user wrapped it in a struct to make sure a temperature is
never passed to an area-function.
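
In C terms, the "exploded" form for this degenerate case is just one
assignment. A small sketch, with a made-up value for
NOT_TERRIBLY_WARM and made-up function names:

```c
#include <assert.h>

#define NOT_TERRIBLY_WARM 15      /* hypothetical value for illustration */

struct temperature { int val; };

/* What the programmer writes. */
int init_with_braces(void) {
    struct temperature t = { NOT_TERRIBLY_WARM };
    return t.val;
}

/* The initializer exploded into a single plain store; no static
   image and no memcpy() call are needed. */
int init_exploded(void) {
    struct temperature t;
    t.val = NOT_TERRIBLY_WARM;
    return t.val;
}

/* Both forms must yield the same stored value. */
void check_exploded(void) {
    assert(init_with_braces() == init_exploded());
}
```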