Declaration vs definition of array

Noob

Hello everyone,

I was playing around with arrays, when I noticed something
I don't understand.

Consider the following code:

extern int u[10];
int v[10];
int x[10] = { };
int y[10] = { 0 };
int z[10] = { 42 };

y and z are proper array definitions, while u is merely
a declaration. But what about v and x?

On my platform, objdump says:

00000000 g O .data 00000028 _z
00000000 g O .bss 00000028 _y
00000028 g O .bss 00000028 _x
00000028 O *COM* 00000004 _v

No mention of u, which is expected for a declaration.
y in bss, expected since it's all-0.
z in data, expected because it's not all-0.

Apparently, x is equivalent to y.
So int x[10] = { }; is a proper definition?
(Can you cite C&V allowing this syntax?)

I'm puzzled with v. It seems to have been considered
a pointer? (Since the size is 4.) Did my compiler
consider this a tentative definition?

I would have expected v to be equivalent to x and y.
I suppose I was wrong, given the output?

So I have to use x or y syntax to define an empty array?

Relevant links
http://stackoverflow.com/questions/2331584/global-variable-implementation
http://david.tribble.com/text/cdiffs.htm#C99-odr

Regards.
 
James Kuyper

Hello everyone,

I was playing around with arrays, when I noticed something
I don't understand.

Consider the following code:

extern int u[10];
int v[10];
int x[10] = { };
int y[10] = { 0 };
int z[10] = { 42 };

y and z are proper array definitions, while u is merely
a declaration. But what about v and x?

The array 'v' is a tentative definition, since no initializers are
provided. It is covered by 6.9.2p2: "... If a translation unit contains
one or more tentative definitions for an identifier, and the translation
unit contains no external definition for that identifier, then the
behavior is exactly as if the translation unit contains a file scope
declaration of that identifier, with the composite type as of the end of
the translation unit, with an initializer equal to 0."

It has been pointed out that, technically, this wording implies that the
equivalent declaration should be:

int v[10] = 0;

Which would be a constraint violation (6.7.9p16) because there should be
braces around that '0'. However, the intent of the committee was almost
certainly that the equivalent declaration be:

int v[10] = {0};

Which would result in the entire array being zero-initialized, and that
is in fact the way that most (all?) real-world conforming C compilers
interpret it.
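
A minimal sketch of that reading, assuming a hosted C99 or later compiler. The helper names all_zero and v_matches_y are made up for illustration and do not come from the code under discussion:

```c
#include <assert.h>
#include <stddef.h>

int v[10];          /* tentative definition: no initializer, no storage class */
int y[10] = { 0 };  /* ordinary definition: first element 0, the rest implicitly 0 */

/* Hypothetical helper: returns 1 if the first n elements of a are all zero. */
int all_zero(const int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != 0)
            return 0;
    return 1;
}

/* Hypothetical helper: returns 1 if v has the same size and contents as y. */
int v_matches_y(void)
{
    assert(sizeof v == 10 * sizeof(int));  /* v is a full array, not a pointer */
    return sizeof v == sizeof y && all_zero(v, 10) && all_zero(y, 10);
}
```

Calling v_matches_y() from main should return 1. This only illustrates the zero-initialization rule; it says nothing about which section a particular toolchain uses for the symbol.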

You might think that x is a similar case, since no initializers are
explicitly provided there, either. However, a brace-enclosed initializer
list is itself an initializer - but an initializer consisting of braces
that don't enclose an initializer list is a syntax error (6.7.9p1), and
should therefore have triggered a diagnostic message. If your compiler
didn't provide one, you may need to set the warning level higher. gcc
requires the '-ansi -pedantic' options in order to properly diagnose
this issue. Your compiler may have similar requirements.

On my platform, objdump says:

00000000 g O .data 00000028 _z
00000000 g O .bss 00000028 _y
00000028 g O .bss 00000028 _x
00000028 O *COM* 00000004 _v

No mention of u, which is expected for a declaration.
y in bss, expected since it's all-0.
z in data, expected because it's not all-0.

Apparently, x is equivalent to y.
So int x[10] = { }; is a proper definition?
(Can you cite C&V allowing this syntax?)

No.

I'm puzzled with v. It seems to have been considered
a pointer? (Since the size is 4.) Did my compiler
consider this a tentative definition?

It should have set aside enough room for 10 ints, zero-initialized. I'm
not sufficiently familiar with objdump to be sure how to interpret those
results; perhaps it works correctly when linked with other code to make
a complete program?

I would have expected v to be equivalent to x and y.
I suppose I was wrong, given the output?

So I have to use x or y syntax to define an empty array?

The syntax you used for v should be sufficient, but y will work as well.
The syntax you used for x should not have worked.
 
Noob

James said:
Noob said:
extern int u[10];
int v[10];
int x[10] = { };
int y[10] = { 0 };
int z[10] = { 42 };

y and z are proper array definitions, while u is merely
a declaration. But what about v and x?

The array 'v' is a tentative definition, since no initializers are
provided. It is covered by 6.9.2p2: "... If a translation unit contains
one or more tentative definitions for an identifier, and the translation
unit contains no external definition for that identifier, then the

What's an external definition?

The following is just a declaration, right?
extern int v[10];

(For the record, adding that declaration to my source file doesn't
change the compiler's output.)
behavior is exactly as if the translation unit contains a file scope
declaration of that identifier, with the composite type as of the end of
the translation unit, with an initializer equal to 0."

It has been pointed out that, technically, this wording implies that the
equivalent declaration should be:

int v[10] = 0;

Which would be a constraint violation (6.7.9p16) because there should be
braces around that '0'. However, the intent of the committee was almost
certainly that the equivalent declaration be:

int v[10] = {0};

Which would result in the entire array being zero-initialized, and that
is in fact the way that most (all?) real-world conforming C compilers
interpret it.

You might think that x is a similar case, since no initializers are
explicitly provided there, either. However, a brace-enclosed initializer
list is itself an initializer - but an initializer consisting of braces
that don't enclose an initializer list is a syntax error (6.7.9p1), and
should therefore have triggered a diagnostic message. If your compiler
didn't provide one, you may need to set the warning level higher. gcc
requires the '-ansi -pedantic' options in order to properly diagnose
this issue. Your compiler may have similar requirements.

Indeed! So this is a gcc extension. Good to know.

$ sh4gcc -std=c99 -pedantic -Wall -Wextra -c -O3 -Wall arr.c
arr.c:3:13: warning: ISO C forbids empty initializer braces [-pedantic]
On my platform, objdump says:

00000000 g O .data 00000028 _z
00000000 g O .bss 00000028 _y
00000028 g O .bss 00000028 _x
00000028 O *COM* 00000004 _v

No mention of u, which is expected for a declaration.
y in bss, expected since it's all-0.
z in data, expected because it's not all-0.

Apparently, x is equivalent to y.
So int x[10] = { }; is a proper definition?
(Can you cite C&V allowing this syntax?)

No.

No C&V because it is an extension. (Thanks for pointing it out.)
It should have set aside enough room for 10 ints, zero-initialized. I'm
not sufficiently familiar with objdump to be sure how to interpret those
results; perhaps it works correctly when linked with other code to make
a complete program?

There probably is something special about the *COM* section.

This paragraph sounds very relevant:

3.18 Options for Code Generation Conventions
-fno-common
In C code, controls the placement of uninitialized global variables.
Unix C compilers have traditionally permitted multiple definitions of
such variables in different compilation units by placing the
variables in a common block. This is the behavior specified by
-fcommon, and is the default for GCC on most targets. On the other
hand, this behavior is not required by ISO C, and on some targets may
carry a speed or code size penalty on variable references. The
-fno-common option specifies that the compiler should place
uninitialized global variables in the data section of the object
file, rather than generating them as common blocks. This has the
effect that if the same variable is declared (without extern) in two
different compilations, you get a multiple-definition error when you
link them. In this case, you must compile with -fcommon instead.
Compiling with -fno-common is useful on targets for which it provides
better performance, or if you wish to verify that the program will
work on other systems that always treat uninitialized variable
declarations this way.
The syntax you used for v should be sufficient, but y will work as well.
The syntax you used for x should not have worked.

Thanks.
 
Barry Schwarz

James said:
Noob said:
extern int u[10];
int v[10];
int x[10] = { };
int y[10] = { 0 };
int z[10] = { 42 };

y and z are proper array definitions, while u is merely
a declaration. But what about v and x?

The array 'v' is a tentative definition, since no initializers are
provided. It is covered by 6.9.2p2: "... If a translation unit contains
one or more tentative definitions for an identifier, and the translation
unit contains no external definition for that identifier, then the

What's an external definition?

The phrase "external definition" is defined in the paragraph preceding
the one James quoted.
The following is just a declaration, right?
extern int v[10];

Non-sequitur. The storage class "extern" is not related to the phrase
"external definition".
(For the record, adding that declaration to my source file doesn't
change the compiler's output.)

The actions of one compiler need not reflect the intent of the
standard.
 
James Kuyper

James said:
Noob said:
extern int u[10];
int v[10];
int x[10] = { };
int y[10] = { 0 };
int z[10] = { 42 };

y and z are proper array definitions, while u is merely
a declaration. But what about v and x?

The array 'v' is a tentative definition, since no initializers are
provided. It is covered by 6.9.2p2: "... If a translation unit contains
one or more tentative definitions for an identifier, and the translation
unit contains no external definition for that identifier, then the

What's an external definition?

6.9p4: "... the unit of program text after preprocessing is a
translation unit, which consists of a sequence of external declarations.
These are described as ‘‘external’’ because they appear outside any
function (and hence have file scope). ..."

6.9p5: "An external definition is an external declaration that is also a
definition of a function (other than an inline definition) or an object.
...."

Providing a body for a function or an initializer for an object converts
the declaration into a definition.
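
As a rough illustration of that last point (all identifiers here are hypothetical, chosen only for the example), each external declaration below is marked as either a plain declaration or a definition:

```c
#include <assert.h>

extern int declared_only;   /* external declaration, not a definition */
int defined_obj = 1;        /* the initializer makes this an external definition */

int twice(int n);           /* function declaration: no body */

int twice(int n)            /* function definition: the body is supplied */
{
    return 2 * n;
}

/* Hypothetical helper: uses only the identifiers that are actually defined. */
int use_defined(void)
{
    assert(defined_obj == 1);
    return twice(defined_obj);
}
```

Because declared_only is never used in an expression, the program needs no definition for it; use_defined() should return 2.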
 
Tim Rentsch

Noob said:

There probably is something special about the *COM* section.

This paragraph sounds very relevant:

3.18 Options for Code Generation Conventions
-fno-common
In C code, controls the placement of uninitialized global
variables. Unix C compilers have traditionally permitted
multiple definitions of such variables in different compilation
units by placing the variables in a common block. This is the
behavior specified by -fcommon, and is the default for GCC on
most targets. On the other hand, this behavior is not required
by ISO C, and on some targets may carry a speed or code size
penalty on variable references. The -fno-common option specifies
that the compiler should place uninitialized global variables in
the data section of the object file, rather than generating them
as common blocks. This has the effect that if the same variable
is declared (without extern) in two different compilations, you
get a multiple-definition error when you link them. In this
case, you must compile with -fcommon instead. Compiling with
-fno-common is useful on targets for which it provides better
performance, or if you wish to verify that the program will work
on other systems that always treat uninitialized variable
declarations this way.

In case it isn't clear what this means: before C was standardized,
there was no notion of a 'tentative definition', and declaring a
variable without initializing it (and without using 'extern') made
the variable be a "shared global" (just a made-up term) without
strongly defining it in any one file. Thus, if we have in two
separate .c files, x.c and y.c, code like:

/* x.c */

int foo;

...


/* y.c */

int foo;

...

then both x.c and y.c can use the "shared global" variable 'foo',
which exists in one place in memory, not two. This usage was not
an error but the usual way things were done, and well-defined in
the sense that the compilers that existed at the time all did the
same thing.

When C was standardized, the language definition was changed so
that a declaration like 'int foo;' will define the variable in any
translation unit where such a declaration appears, and if there is
more than one then the result is undefined behavior (which is also
the case for any kind of multiple definition, not just ones that
don't have an initializer).

Note the significance of how the Standard addresses the situation.
Because (insofar as the Standard is concerned) the behavior is
undefined, an implementation is free to define the "shared global"
kind of declaration so it works just like the good old days. And
that's the reason for -fcommon, why your no-initializer variable
ended up in the COM area, etc.
 
Lowell Gilbert

Tim Rentsch said:
When C was standardized, the language definition was changed so
that a declaration like 'int foo;' will define the variable in any
translation unit where such a declaration appears, and if there is
more than one then the result is undefined behavior (which is also
the case for any kind of multiple definition, not just ones that
don't have an initializer).

If this is an impersonation, it's a pretty good one.
I think somebody with a rather strange sense of humor has actually
gotten Tim Rentsch's account.
 
Tim Rentsch

Lowell Gilbert said:
If this is an impersonation, it's a pretty good one.
I think somebody with a rather strange sense of humor has actually
gotten Tim Rentsch's account.

Yes, I have to admit, when I read it myself,
I'm amazed by how good the impersonation is.
 
glen herrmannsfeldt

(snip)
In case it isn't clear what this means: before C was standardized,
there was no notion of a 'tentative definition', and declaring a
variable without initializing it (and without using 'extern') made
the variable be a "shared global" (just a made-up term) without
strongly defining it in any one file.

Well, before there was C, there was Fortran. Starting,
I believe, in Fortran II, there was COMMON such that variables
could be shared between subroutines. In the beginning there was
only blank (unnamed) COMMON, but not so much later, named COMMON.

Once the idea of separate compilation came along, and linkage editors
to actually do it, the mechanism was there. As other languages came
along, they could use the existing mechanism.
Thus, if we have in two
separate .c files, x.c and y.c, code like:
/* x.c */
int foo;

/* y.c */
int foo;

then both x.c and y.c can use the "shared global" variable 'foo',
which exists in one place in memory, not two. This usage was not
an error but the usual way things were done, and well-defined in
the sense that the compilers that existed at the time all did the
same thing.

And reasonably likely, at least on systems also supporting Fortran,
the same as a named COMMON block named foo. (Or sometimes _foo
or foo_.)
When C was standardized, the language definition was changed so
that a declaration like 'int foo;' will define the variable in any
translation unit where such a declaration appears, and if there is
more than one then the result is undefined behavior (which is also
the case for any kind of multiple definition, not just ones that
don't have an initializer).

In addition to COMMON, though I believe added later, Fortran
has BLOCK DATA. Ordinarily, variables in Fortran COMMON are not
initialized to any specific value. BLOCK DATA allows one to
initialize variables in COMMON, but only in one place.

But C requires, I believe even back to K&R, that static variables,
including "shared global" be initialized, either to zero or another
specified value. Some linkers will allow multiple instances of
an initialized global (COMMON), others won't.

For the OS/360 linkage editor, the first initializing (not COM)
seen is used, and others are ignored. (That is one way to do actual
editing with the linkage editor. You can replace a CSECT in an existing
module by loading the new one first.)
Note the significance of how the Standard addresses the situation.
Because (insofar as the Standard is concerned) the behavior is
undefined, an implementation is free to define the "shared global"
kind of declaration so it works just like the good old days. And
that's the reason for -fcommon, why your no-initializer variable
ended up in the COM area, etc.

-- glen
 
Noob

Lowell said:
If this is an impersonation, it's a pretty good one.
I think somebody with a rather strange sense of humor has actually
gotten Tim Rentsch's account.

Why do you think that?
 
Keith Thompson

Lowell Gilbert said:
Well, these subtly-wrong postings were a little too early to be April
Fools' Day jokes.

Perhaps you could explain what's wrong with it.
 
Lowell Gilbert

Keith Thompson said:
Perhaps you could explain what's wrong with it.

It seems to be implying that "int foo;" has something other than
external linkage.
 
James Kuyper

It seems to be implying that "int foo;" has something other than
external linkage.

Well he didn't specify that he was talking exclusively about external
definitions, so that could actually be an accurate implication - but I
don't see that implication. Multiple external definitions of identifiers
with internal linkage are a constraint violation only when they
occur in a single translation unit (6.9p3); he said that it is always
undefined behavior, which is more consistent with identifiers that have
external linkage, though only if the identifier is actually "used in an
expression (other than as part of the operand of a sizeof or _Alignof
operator whose result is an integer constant)" (6.9p5).
 
Tim Rentsch

glen herrmannsfeldt said:
(snip)


Well, before there was C, there was Fortran. Starting,
I believe, in Fortran II, there was COMMON such that variables
could be shared between subroutines. In the beginning there was
only blank (unnamed) COMMON, but not so much later, named COMMON.

Once the idea of separate compilation came along, and linkage editors
to actually do it, the mechanism was there. As other languages came
along, they could use the existing mechanism.

You're assuming (or perhaps implying) that such a mechanism
existed in the environments where the early C compilers were
done, but I don't know any reason to expect that was so.
The mention of a COM section had to do with more recent
tools, which undoubtedly were influenced from other sources.
(Not to mention, a facility for sharably reserving memory
areas almost certainly pre-dates their appearance in
FORTRAN.)

By the way, it would be nice if you would mark these kinds
of postings about Fortran or PL/I, etc, as 'OT:' in the
subject line, so people who are here just for C can skip
them. I for one would prefer to do so.
 
Tim Rentsch

Lowell Gilbert said:
It seems to be implying that "int foo;" has something other than
external linkage.

For the record, my comments were meant to be only about
identifiers having (what is now called) external linkage.
I hadn't imagined anyone would think otherwise.
 
Tim Rentsch

James Kuyper said:
Well he didn't specify that he was talking exclusively about external
definitions, so that could actually be an accurate implication - but I
don't see that implication. Multiple external definitions of identifiers
with internal linkage are a constraint violation only when they
occur in a single translation unit (6.9p3); he said that it is always
undefined behavior, which is more consistent with identifiers that have
external linkage, though only if the identifier is actually "used in an
expression (other than as part of the operand of a sizeof or _Alignof
operator whose result is an integer constant)" (6.9p5).

Actually, an external-linkage identifier that has multiple
definitions results in undefined behavior whether it is used
in an expression or not.
 
Lowell Gilbert

Tim Rentsch said:
For the record, my comments were meant to be only about
identifiers having (what is now called) external linkage.
I hadn't imagined anyone would think otherwise.

Ah, so what you were saying was that at file scope,
int foo;
[if not overridden elsewhere in the file]
is a tentative definition (implicitly initialized to zero) while
extern int foo;
is not a definition at all.

Okay, that makes a lot more sense than what I thought you meant.

I should try to track the other thing I read as a joke (which I didn't
comment on), and see if I just generally needed more coffee on Saturday.

Thanks!
 
Tim Rentsch

Lowell Gilbert said:
Tim Rentsch said:
For the record, my comments were meant to be only about
identifiers having (what is now called) external linkage.
I hadn't imagined anyone would think otherwise.

Ah, so what you were saying was that at file scope,
int foo;
[if not overridden elsewhere in the file]
is a tentative definition (implicitly initialized to zero) while
extern int foo;
is not a definition at all.

Yes, except for a minor technical clarification -- 'int foo;'
at file scope is always a tentative definition, whether or not
the translation unit also has an explicit definition, and the
presence of a tentative definition in a TU means there will be
at least one (possibly implicit) definition for each identifier
tentatively defined. (And the absence of 'static' in a
tentative definition implies external linkage.)
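
A small sketch of that clarification, assuming a single translation unit and using foo only as a placeholder name (get_foo is likewise hypothetical): tentative definitions may sit alongside one explicit definition, and the explicit initializer is the one that takes effect.

```c
#include <assert.h>

int foo;         /* tentative definition */
int foo = 5;     /* explicit external definition of the same object */
int foo;         /* another tentative definition; still permitted */
extern int foo;  /* plain declaration; refers to the same object */

/* Hypothetical accessor: returns the current value of foo. */
int get_foo(void)
{
    assert(sizeof foo == sizeof(int));
    return foo;
}
```

Here get_foo() should return 5. If the line with the initializer were removed, the remaining tentative definitions would act as a single definition initialized to zero.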
 
