External object definition and linkage

J. J. Farrell

After many years of dealing with definition and linkage issues
in ways that I know to be safe, I've decided it's time to try
to understand this area properly. Consider a header file with
the file scope declaration

int i;

This header is included in two files that refer to i but do not
declare it. The two files build together into a single program.

i is therefore declared once in each translation unit. According
to C99 6.2.2 the declarations have external linkage and refer to
the same object. 6.9.2 says that each declaration is a tentative
external definition, which then gets converted to an actual
external definition. There are now 2 external definitions of i
in the program, which breaks the requirement in 6.9 that there
shall be exactly one external definition, so this results in
undefined behavior.

Have I got this right? Confirmation or correction appreciated.
 
Eric Sosman

J. J. Farrell said:
After many years of dealing with definition and linkage issues
in ways that I know to be safe, I've decided it's time to try
to understand this area properly. Consider a header file with
the file scope declaration

int i;

This header is included in two files that refer to i but do not
declare it. The two files build together into a single program.

i is therefore declared once in each translation unit. According
to C99 6.2.2 the declarations have external linkage and refer to
the same object. 6.9.2 says that each declaration is a tentative
external definition, which then gets converted to an actual
external definition. There are now 2 external definitions of i
in the program, which breaks the requirement in 6.9 that there
shall be exactly one external definition, so this results in
undefined behavior.

Have I got this right? Confirmation or correction appreciated.

You're right.

The usual practice (for any externally-linked object or
function) is to put a declaration, not a definition, in a
header file, and to #include that header wherever it's needed.
The actual definition goes in one and only one .c file, which
itself also #include's the header (so the compiler can complain
if the declaration and the definition disagree).

In your case, the header would say

extern int i;

... where "extern" means, roughly, "this is just a declaration;
there's a matching definition somewhere else." For a function
you could say

extern int func(void);

... except it turns out that "extern" is unnecessary (although
harmless) here: The compiler can tell it's not a definition
because there's no `{...}' function body.
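
A minimal sketch of that layout, squeezed into one file so that it compiles on its own. The file name "shared.h", the helper check_linkage and the value 42 are made up for illustration; in a real program the two halves would live in separate files:

```c
#include <assert.h>

/* ---- what would live in a header, e.g. a hypothetical "shared.h" ---- */
extern int i;          /* declaration only: no definition here */
int func(void);        /* function declaration; "extern" is implied */

/* ---- what would live in exactly one .c file that includes shared.h ---- */
int i = 42;            /* the single external definition (42 is arbitrary) */

int func(void)
{
    return i + 1;      /* other files reach i through the header */
}

/* Illustrative check, not part of the pattern itself. */
void check_linkage(void)
{
    assert(i == 42);
    assert(func() == 43);
}
```

Because the defining file sees both the declaration and the definition, a mismatch such as "extern long i;" against "int i = 42;" would be diagnosed at compile time.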

One final point: Many people feel that "global variables"
are Bad, mostly because they can create hard-to-see couplings
between apparently unrelated parts of the program. I'm less
bitterly opposed to them than some denizens hereabouts, but
whenever I stick a global variable into my program I pause
and consider whether this is *really* the right thing to do.
Quite often, it isn't.
 
Leor Zolman

J. J. Farrell said:
After many years of dealing with definition and linkage issues
in ways that I know to be safe, I've decided it's time to try
to understand this area properly. Consider a header file with
the file scope declaration

int i;

This header is included in two files that refer to i but do not
declare it. The two files build together into a single program.

i is therefore declared once in each translation unit. According
to C99 6.2.2 the declarations have external linkage and refer to
the same object. 6.9.2 says that each declaration is a tentative
external definition, which then gets converted to an actual
external definition. There are now 2 external definitions of i
in the program, which breaks the requirement in 6.9 that there
shall be exactly one external definition, so this results in
undefined behavior.

Have I got this right? Confirmation or correction appreciated.

The language of 6.9.2/2 is, IMHO, horribly ambiguous. Here it is (I've
used quotes to indicate what is in italics in the Standard):

-----------------
A declaration of an identifier for an object that has file scope without an
initializer, and without a storage-class specifier or with the
storage-class specifier static, constitutes a "tentative definition". If a
translation unit contains one or more tentative definitions for an
identifier, and the translation unit contains no external definition for
that identifier, then the behavior is exactly as if the translation unit
contains a file scope declaration of that identifier, with the composite
type as of the end of the translation unit, with an initializer equal to 0.
----------------

So what's that word "declaration" doing there on the next-to-last line? How
can a "declaration" have an "initializer equal to 0" (or behave "as if" it
had one) without being considered "the definition" of the object across all
TU's? I don't know.
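
For what it's worth, here is a small sketch of how that paragraph plays out inside a single translation unit. It assumes a C compiler (C++ rejects the repeated "int i;"), and the names j, sum_ij and check_tentative are only illustrative:

```c
#include <assert.h>

int i;            /* tentative definition */
int i;            /* a second one in the same translation unit is allowed */
/* No "int i = ...;" appears anywhere in this unit, so at its end the
   behavior is as if "int i = 0;" had been written: an external definition. */

int j;            /* tentative definition ... */
int j = 7;        /* ... and the actual definition in the same unit */

int sum_ij(void)
{
    return i + j; /* 0 + 7 */
}

/* Illustrative check of the values described above. */
void check_tentative(void)
{
    assert(i == 0);
    assert(j == 7);
    assert(sum_ij() == 7);
}
```

The open question in this thread only arises when a second translation unit does the same thing with the same name.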

Note that this is not within a constraints section, and therefore whatever
it means, the compiler is not obliged to produce a diagnostic if it is
violated. So it could in fact be saying that there's only supposed to be
one, but if an implementation carries the "tentativeness" over to the link
phase, and it works as someone would expect it to (or be used to it working
from a historical basis), that isn't necessarily a non-conforming
implementation.

[Admission: I had some help from Greg Comeau on this, and he essentially
confirmed my own confusion about it all. But I hadn't thought about the
implications of it not being in a constraints section...]
-leor
 
Leor Zolman

Eric Sosman said:
The usual practice (for any externally-linked object or
function) is to put a declaration, not a definition, in a
header file, and to #include that header wherever it's needed.
The actual definition goes in one and only one .c file, which
itself also #include's the header (so the compiler can complain
if the declaration and the definition disagree).

In your case, the header would say

extern int i;

... where "extern" means, roughly, "this is just a declaration;
there's a matching definition somewhere else."

Being my usual literal-minded-self, I answered the OP's specific question
the best I could...but now I feel obliged to add that the way Eric says to
do things is of course the /right/ way to set up inter-TU linkage, since it
is completely unambiguous and works the same in both C and C++.

Trying to understand the situation where "int i;" appears at file scope in
multiple TU's seems better left as an exercise in Standard decryption, and
probably shouldn't be allowed to become a practical matter ;-)
-leor
 
CBFalconer

J. J. Farrell said:
After many years of dealing with definition and linkage issues
in ways that I know to be safe, I've decided it's time to try
to understand this area properly. Consider a header file with
the file scope declaration

int i;

This header is included in two files that refer to i but do not
declare it. The two files build together into a single program.

and is thus a mistake. You omitted the word "extern".
i is therefore declared once in each translation unit. According
to C99 6.2.2 the declarations have external linkage and refer to
the same object. 6.9.2 says that each declaration is a tentative
external definition, which then gets converted to an actual
external definition. There are now 2 external definitions of i
in the program, which breaks the requirement in 6.9 that there
shall be exactly one external definition, so this results in
undefined behavior.

Have I got this right? Confirmation or correction appreciated.

No. Header files should only specify the exposure of elements in
the source. They should not declare any data or functions.
 
CBFalconer

CBFalconer said:
and is thus a mistake. You omitted the word "extern".


No. Header files should only specify the exposure of elements in
the source. They should not declare any data or functions.

Quick, change that 'declare' to 'define', before the rats get at
it.
 
J. J. Farrell

Leor Zolman said:
The language of 6.9.2/2 is, IMHO, horribly ambiguous. Here's it is (I've
used quotes to indicate what is in italics in the Standard):

-----------------
A declaration of an identifier for an object that has file scope without an
initializer, and without a storage-class specifier or with the
storage-class specifier static, constitutes a "tentative definition". If a
translation unit contains one or more tentative definitions for an
identifier, and the translation unit contains no external definition for
that identifier, then the behavior is exactly as if the translation unit
contains a file scope declaration of that identifier, with the composite
type as of the end of the translation unit, with an initializer equal to 0.
----------------

So what's that word "declaration" doing there on the next-to-last line? How
can a "declaration" have an "initializer equal to 0" (or behave "as if" it
had one) without being considered "the definition" of the object across all
TU's? I don't know.

I don't think it can - that's the point. I guess they avoided "definition"
since this is the section that's defining "definition".
Note that this is not within a constraints section, and therefore whatever
it means, the compiler is not obliged to produce a diagnostic if it is
violated. So it could in fact be saying that there's only supposed to be
one, but if an implementation carries the "tentativeness" over to the link
phase, and it works as someone would expect it to (or be used to it working
from a historical basis), that isn't necessarily a non-conforming
implementation.

I think it just means that it results in undefined behaviour. The
pre-standard compilers that implemented the 'common model' can continue
to do so, and other compilers don't have to do anything in particular.
 
J. J. Farrell

Eric Sosman said:
You're right.
Thanks.

The usual practice (for any externally-linked object or
function) is to put a declaration, not a definition, in a
header file, and to #include that header wherever it's needed.
The actual definition goes in one and only one .c file, which
itself also #include's the header (so the compiler can complain
if the declaration and the definition disagree).

Indeed. An alternative to avoid listing all the variables in
two places is for the header to contain something like

#if defined(DEFINE_GLOBALS)
#define EXTDECL
#else
#define EXTDECL extern
#endif
EXTDECL char global;

and then define DEFINE_GLOBALS in exactly one .c file before
including the header. Not exactly elegant, but it avoids
duplicating a list of variables, and avoids all the bugs that
come from not getting the duplication right and up to date.

The aim of this exercise was to make sure I understood if any
other ways of doing it are valid, and to be able to give a
better answer than "because I said so" or "because that's what
the Standard says" when people don't believe that that's the
only right way to do it.
 
J. J. Farrell

CBFalconer said:
Quick, change that 'declare' to 'define', before the rats get at
it.

Thanks Chuck, but I'm confused by the "no". Are you saying it's not
undefined behaviour, or that it is undefined behaviour but my reasons
why are wrong, or something else?
 
Irrwahn Grausewitz

J. J. Farrell said:
Indeed. An alternative to avoid listing all the variables in
two places is for the header to contain something like

#if defined(DEFINE_GLOBALS)
#define EXTDECL
#else
#define EXTDECL extern
#endif
EXTDECL char global;

This is however a badly chosen example, since macros that begin
with E followed by a digit or an uppercase letter are reserved
for future library extensions.

Regards
 
CBFalconer

J. J. Farrell said:
Thanks Chuck, but I'm confused by the "no". Are you saying it's
not undefined behaviour, or that it is undefined behaviour but my
reasons why are wrong, or something else?

No, you haven't got it right.
 
Dan Pop

CBFalconer said:
No, you haven't got it right.

He (J. J. Farrell) has got it perfectly right. It was presented as a C
question, not as an example of good coding practice. And his analysis
of his problem was correct: there are two external definitions for i,
one in each translation unit, and this leads to undefined behaviour.

Dan
 
J. J. Farrell

Irrwahn Grausewitz said:
This is however a badly chosen example, since macros that begin
with E followed by a digit or an uppercase letter are reserved
for future library extensions.

Aaaargh! Thanks, Irrwahn. I'd normally use some context-specific
prefix on the macros which would avoid this. Using more "obvious"
names in the example turned out to be a mistake ...
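
With a context-specific prefix, the same trick might look roughly like the sketch below. The MYAPP_ and myapp_ names and the file name are invented for illustration, and the header contents are pasted inline so that the example compiles as a single file; it plays the part of the one .c file that defines the globals:

```c
#include <assert.h>

/* In exactly one .c file, before including the header: */
#define MYAPP_DEFINE_GLOBALS

/* ---- contents of a hypothetical "myapp_globals.h" ---- */
#if defined(MYAPP_DEFINE_GLOBALS)
#define MYAPP_EXTERN              /* expands to nothing: this file defines */
#else
#define MYAPP_EXTERN extern       /* every other file only declares */
#endif

MYAPP_EXTERN char myapp_global;
/* ---- end of header ---- */

char myapp_get_global(void)
{
    return myapp_global;
}

/* Illustrative check: the global starts at zero and can be updated. */
void myapp_check(void)
{
    assert(myapp_get_global() == 0);
    myapp_global = 'x';
    assert(myapp_get_global() == 'x');
}
```

Every other .c file would include the header without defining MYAPP_DEFINE_GLOBALS and so would see only "extern char myapp_global;". Note that this form cannot give a global an initializer without a further macro.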
 
