Separate compilation and access to small integer data in other programs

James Harris

Having tried a few options I'm not sure how best to do the following
in C or how feasible this is. Any suggestions?

AIUI the normal way of working in C where one program, say AA, uses
integers (or bitfields) as part of its interface those integers are
defined in a header file, say AA.h. Then when program BB wants to
interface with AA it includes AA.h and uses those symbolic values when
talking to AA. This means that the values are compiled-in to BB.

By contrast, I want the two programs to be compiled separately. Program
AA is to be compiled with its integer constants wholly inside it.
Program BB is to refer to the values as externals. Example below. The
intention is that the constants not be resolved at compile time but at
link time or later.

Best I've come up with so far can be illustrated as follows.

File AA:
int AA_CONST1 = 1;
int AA_CONST2 = 4;
int AA_FLAGS1 = 0x8000;
void AA() {}

File BB:
extern int AA_CONST2;
extern int AA_FLAGS1;
void AA_function(int, int); /* assumed to be defined elsewhere */
void BB() {
int x = AA_CONST2;
AA_function(x, AA_FLAGS1);
}

It SEEMS to work but are there any gotchas? Is there a better way to
do it?

It is slightly annoying in that #defines in existing header files
cannot be used directly but ints have to be defined to represent them.

James
 
Ben Bacarisse

James Harris said:
Having tried a few options I'm not sure how best to do the following
in C or how feasible this is. Any suggestions?

AIUI the normal way of working in C where one program, say AA, uses
integers (or bitfields) as part of its interface those integers are
defined in a header file, say AA.h. Then when program BB wants to
interface with AA it includes AA.h and uses those symbolic values when
talking to AA. This means that the values are compiled-in to BB.

By contrast, I want the two programs to be compiled separately. Program
AA is to be compiled with its integer constants wholly inside it.
Program BB is to refer to the values as externals. Example below. The
intention is that the constants not be resolved at compile time but at
link time or later.

Best I've come up with so far can be illustrated as follows.

File AA:
int AA_CONST1 = 1;
int AA_CONST2 = 4;
int AA_FLAGS1 = 0x8000;
void AA() {}

File BB:
extern int AA_CONST2;
extern int AA_FLAGS1;
void AA_function(int, int); /* assumed to be defined elsewhere */
void BB() {
int x = AA_CONST2;
AA_function(x, AA_FLAGS1);
}

I'm having trouble squaring this with the words. Neither AA nor BB
seems to be a program. They just look like translation units that might
be linked into one single program (with some other bits of course).

It SEEMS to work but are there any gotchas? Is there a better way to
do it?

How many programs are there? Could you give short actually compilable
examples of them? It might then be clear why what you want can't be
done another way (or maybe it will become clear that it can be).

It is slightly annoying in that #defines in existing header files
cannot be used directly but ints have to be defined to represent them.

Maybe you could give an example of what the problem with macro constants
is. Is the problem extra-C? I get a feeling this is maybe a build
problem, not one of C structure, and there is some build constraint
that's not been described.
 
BartC

James Harris said:
Having tried a few options I'm not sure how best to do the following
in C or how feasible this is. Any suggestions?

AIUI the normal way of working in C where one program, say AA, uses
integers (or bitfields) as part of its interface those integers are
defined in a header file, say AA.h. Then when program BB wants to
interface with AA it includes AA.h and uses those symbolic values when
talking to AA. This means that the values are compiled-in to BB.

By contrast, I want the two programs to be compiled separately. Program
AA is to be compiled with its integer constants wholly inside it.
Program BB is to refer to the values as externals. Example below. The
intention is that the constants not be resolved at compile time but at
link time or later.

File AA:
int AA_CONST1 = 1;
int AA_CONST2 = 4;
int AA_FLAGS1 = 0x8000;
void AA() {}

File BB:
extern int AA_CONST2;
extern int AA_FLAGS1;
void AA_function(int, int); /* assumed to be defined elsewhere */
void BB() {
int x = AA_CONST2;
AA_function(x, AA_FLAGS1);
}

It seems fine provided you realise that these are not actually constants
(ie. immediate data), but variables.

Immediate data I think can be imported across modules via a linker, but I
don't think there's a mechanism in C to achieve this (even an assembler
would have difficulty because it might need to know the field-size needed to
contain the value).

What's the problem with having a header that is common to both AA and BB? Is
it that you are obliged to recompile both if some of the values
change?
 
James Harris

I'm having trouble squaring this with the words.  Neither AA nor BB
seems to be a program.  They just look like translation units that might
be linked into one single program (with some other bits of course).

Possibly poor choice of term on my part. "Program" may be too generic.
Yes, you could regard both of them linked together as a program. I
called each a program as the idea is that they be written, compiled
and distributed separately. It is intended that they are only linked
together when one is executed. Be that as it may, yes, they are
separate translation units.

How many programs are there? Could you give short actually compilable
examples of them? It might then be clear why what you want can't be
done another way (or maybe it will become clear that it can be).

I'll avoid the term "program" for the reasons mentioned above but I
can give a succinct example in which there are two source files. The
first, extern-t1.c, is

/*
* Test reference to external small constants.
*/
/* Build with
* cc -c extern-t1.c
*/
extern int unix_sys_open;
#include <stdio.h>
int main(void) {
fprintf(stderr, "Symbol number is %i\n", unix_sys_open);
}

The second file, os_sys.c, is

/*
* Test linkage to a small integer defined here
*/
/* Compile with
* cc -c os_sys.c
*/
#include <sys/syscall.h>
const int unix_sys_open = SYS_open;

Of course these are tiny examples. I figure small examples that
illustrate the point are preferred but perhaps I should say how such
translation units are intended to be run. Both will be compiled to
object (.o) files and not linked. Once the first, extern-t1, is
invoked the loader will look for and load extern-t1.o. It will then
resolve symbol unix_sys_open by loading os_sys.o and linking the two
modules together. In the above case it will then start extern-t1's
main() routine.

So only extern-t1 is invoked. The 'system' under which it runs will be
responsible for resolving the reference to the external symbol.

Maybe you could give an example of what the problem with macro constants
is.

One issue for me is that changes to symbolic constants in one routine
require recompilation of all modules which are potentially dependent.
In most cases the symbolic constants may not change but all dependent
files still need to be recompiled just in case. It would be better if
dependent modules need to be recompiled only when other more
substantive parts of the interface change - e.g. a new version of a
piece of server code *takes away* a feature that existed in earlier
versions or *changes* an interface to make it incompatible with
previous versions.

Is the problem extra-C? I get a feeling this is maybe a build
problem, not one of C structure, and there is some build constraint
that's not been described.

It is (at least intended to be) completely in keeping with C standards
and to run with standard C though Bart has raised a point about making
the values read-only. Will reply to him on that.

James
 
James Harris

It seems fine provided you realise that these are not actually constants
(ie. immediate data), but variables.

In other words, I cannot protect the original values from being
altered? That may be a problem. Is there no way with C to make them
read-only?

I suppose ideally the loader would place any constants in a read-only
page. I can do that and it would protect them perfectly. It would also
allow any attempt to update them to be trapped and pinpointed.

The slight issue is that there seems to be no way in C to say which
symbols should be made available to other routines, i.e. which should
be exported, so if two programs are linked together each has access to
all the data in the other. I'd welcome correction if there is a
feature of C which allows this control.

Immediate data I think can be imported across modules via a linker, but I
don't think there's a mechanism in C to achieve this (even an assembler
would have difficulty because it might need to know the field-size needed to
contain the value).

Field sizes should be fixed (at least under a 32-bit OS; a change to a
64-bit OS would be another issue but then everything would probably
require a recompile).

Interesting idea about having the linker resolve immediate values.
That would be good. I can see that if full-size 32-bit forms of
immediate instructions were used the linker could patch them by its
standard mechanisms as that's the kind of thing it does anyway. Like
you I cannot think of a way to do that in C.

What's the problem with having a header that is common to both AA and BB? Is
it that you are obliged to recompile both if some of the values
change?

Yes. In most setups it would be rare for such values to change but
even where there is the potential for them to have changed all
dependent modules also need to be recompiled.

James
 
Eric Sosman

[...] a succinct example in which there are two source files. The
first, extern-t1.c, is

/*
* Test reference to external small constants.
*/
/* Build with
* cc -c extern-t1.c
*/
extern int unix_sys_open;

Add "const" here, or ...

#include <stdio.h>
int main(void) {
fprintf(stderr, "Symbol number is %i\n", unix_sys_open);
}

The second file, os_sys.c, is

/*
* Test linkage to a small integer defined here
*/
/* Compile with
* cc -c os_sys.c
*/
#include <sys/syscall.h>
const int unix_sys_open = SYS_open;

... remove "const" here.

Of course these are tiny examples. I figure small examples that
illustrate the point are preferred but perhaps I should say how such
translation units are intended to be run. Both will be compiled to
object (.o) files and not linked. Once the first, extern-t1, is
invoked the loader will look for and load extern-t1.o. It will then
resolve symbol unix_sys_open by loading os_sys.o and linking the two
modules together. In the above case it will then start extern-t1's
main() routine.

Okay. The linkers and loaders I've used would need to be told
where to look for the module defining unix_sys_open, but assuming
it's found all will be well.

So only extern-t1 is invoked. The 'system' under which it runs will be
responsible for resolving the reference to the external symbol.

Now you've made me unsure of your intent. The unaided 'system'
probably can't find unix_sys_open without some help from you --
after all, there might be forty-two .o files lying around with
forty-two incompatible definitions of unix_sys_open, and someone
has to tell the system which of them to use. The 'system' will
(most likely) document the places it will search and the order in
which it will search them, and it'll be up to you to ensure that
the desired unix_sys_open is found in one of them. (If it's found
in more than one, the system presumably documents how ties are
broken.)

One issue for me is that changes to symbolic constants in one routine
require recompilation of all modules which are potentially dependent.
In most cases the symbolic constants may not change but all dependent
files still need to be recompiled just in case. It would be better if
dependent modules need to be recompiled only when other more
substantive parts of the interface change - e.g. a new version of a
piece of server code *takes away* a feature that existed in earlier
versions or *changes* an interface to make it incompatible with
previous versions.

Okay. Your use of an external-linkage variable initialized to
the desired value solves this. It has the drawback that the constant
can no longer be a "constant expression," so you couldn't (for
example) use `case unix_sys_open:' as a `switch' label. That may
or may not be important; it depends what you want to do with the
value.
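
A minimal sketch of that drawback, not taken from the code posted in this thread. The value 5 and the names EXAMPLE_CLOSE and classify_call are hypothetical, and unix_sys_open is defined in the same file only so that the fragment compiles on its own; in the intended layout it would live in os_sys.c and be visible here solely through an `extern' declaration.

```c
/* Stand-in for the definition that would live in os_sys.c.
   The value 5 is a placeholder, not a real system call number. */
const int unix_sys_open = 5;

/* An enumeration constant, by contrast, is a constant expression. */
enum { EXAMPLE_CLOSE = 6 };          /* hypothetical name and value */

/* Returns 1 for "open", 2 for "close", 0 for anything else. */
int classify_call(int n)
{
    /* `case unix_sys_open:' is not an integer constant expression, so a
       conforming compiler need not accept it; compare at run time. */
    if (n == unix_sys_open)
        return 1;

    switch (n) {
    case EXAMPLE_CLOSE:              /* fine: integer constant expression */
        return 2;
    default:
        return 0;
    }
}
```

Whether the loss matters depends, as said, on how the value is used: array sizes at file scope, `case' labels and static initializers all need constant expressions, while ordinary comparisons and function arguments do not.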
 
James Kuyper

The term "variable" is not defined in the standard, except as part of
the phrase "variable length array". From the way it's used in the
standard, I've inferred that when used as a noun, it means "named
object", and every use in the standard is consistent with that meaning.
However, every use in the standard is also consistent with a more
restricted meaning that some people consider more consistent with its
use in ordinary English, as something that can vary: "named object whose
definition is not const-qualified". His comment is more relevant if he's
referring to the more restricted definition.

In other words, I cannot protect the original values from being
altered? That may be a problem. Is there no way with C to make them
read-only?

Yes, you can add 'const' to the definition of such objects. That will
mean that diagnostics are mandatory if any part of the code makes a
naive attempt to modify them. The compiler has the option, after issuing
the mandatory diagnostic, of accepting your code and translating it into
a program; if you choose to run it, the behavior of that program is
undefined. The diagnostics can also be bypassed by use of a cast:

*(int*)&AA_CONST = 3;

but such code also has undefined behavior.

Therefore, const-qualification doesn't actually guarantee that the
object can't be modified. Undefined behavior includes the possibility
(among infinitely many others) that the object's value could be modified.
However, mandatory diagnostics is the strongest guarantee C provides on
such matters; if it's not good enough, you'll have to find some other
language that provides better guarantees.

The slight issue is that there seems to be no way in C to say which
symbols should be made available to other routines, i.e. which should
be exported, so if two programs are linked together each has access to
all the data in the other. I'd welcome correction if there is a
feature of C which allows this control.

C does not allow two programs to be linked together, so the issue never
comes up. Operating systems sometimes provide ways for programs to share
memory, for example if you're using a Unix-like system, you can use
mmap(), but the use of that function is outside the scope of the C
standard, and you'll get better answers to questions about it on
comp.unix.programmer. Other operating systems have similar features.

C does allow translation units to be linked together. I suspect that you
may be confusing translation units with programs. There can be at most
one main() function in any program (on freestanding systems, the
equivalent of main() may have a different name), but it can contain
multiple translation units. I'm most familiar with Unix-like systems, so
I'll use them as an example of how this usually works: you use a
compiler to convert a translation unit into an object file with an
extension of .o. Object files can be collected together into libraries
(with an extension of .a or .so). A linker is used to link one or more
.o files, whether stand-alone or extracted from a library, into a single
program.
Does that give you a better idea of what "program" means?

C does provide features that allow you to control whether or not an
identifier is shared between TUs. Every identifier has either internal
linkage, external linkage, or no linkage. Linkage applies only to
identifiers that identify objects or functions. Identifiers with
internal linkage identify the thing that they identify only within the
same translation unit where they are declared. Identifiers with external
linkage can be declared in one translation unit, and used within the
scope of that declaration to refer to things that might be defined in a
different translation unit. If your program has multiple definitions of
an identifier with external linkage, it has undefined behavior. If your
program actually uses the thing identified by such an identifier, it has
undefined behavior unless it includes exactly one definition of the
identifier.

When you define a function or a file scope object, the identifier has
external linkage unless the 'static' keyword is used. Declarations of
functions and objects that have the 'extern' keyword do not define
anything, but merely make the corresponding identifier usable within the
scope of the declaration to refer to something defined with external
linkage using the same identifier in some other part of the program.

As a general rule, every identifier with external linkage (except
possibly "main") in your program should be declared in a header file
(with the 'extern' keyword if it identifies an object). That header file
should be #included in every translation unit where that identifier is
referred to - INCLUDING the one where the thing it identifies is
defined. This is not something that's required by the standard, it's
merely a good way to ensure that the thing is declared consistently
where it is used. The behavior of your program can be undefined if the
declarations are inconsistent between different translation units.
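
The header rule above can be sketched in a single file, with comments marking where each hypothetical file (aa.h, aa.c, bb.c) would begin. The body of AA_function is a placeholder invented for illustration; only the names AA_CONST2, AA_FLAGS1, AA_function and BB come from the original post.

```c
/* ---- aa.h: hypothetical shared header ---- */
extern const int AA_CONST2;          /* declaration only, no storage */
extern const int AA_FLAGS1;
int AA_function(int x, int flags);   /* functions need no 'extern' */

/* ---- aa.c: would start with #include "aa.h" ---- */
const int AA_CONST2 = 4;             /* checked against the declaration above */
const int AA_FLAGS1 = 0x8000;

int AA_function(int x, int flags)
{
    return x | flags;                /* placeholder body for illustration */
}

/* ---- bb.c: would also start with #include "aa.h" ---- */
int BB(void)
{
    int x = AA_CONST2;               /* value fetched at run time */
    return AA_function(x, AA_FLAGS1);
}
```

Because the defining file sees the same declarations as its clients, changing a definition to, say, 'const double AA_CONST2' would draw a diagnostic when aa.c is compiled rather than leaving the mismatch to be found (or not) at link time.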
 
Ben Bacarisse

James Kuyper said:
However, mandatory diagnostics is the strongest guarantee C provides on
such matters; if it's not good enough, you'll have to find some other
language that provides better guarantees.

It's probably worth pointing out that there is no mandatory diagnostic
if a const-qualified object is accessed via a non-const-qualified
declaration in another translation unit. The behaviour is undefined,
but there won't necessarily be a message, and there's no cast to alert
anyone that something fishy may be going on.

(Your strategy below -- snipped -- of putting all extern declarations in
a header file to be included in all files that use or declare the
objects in question avoids this problem.)

<snip>
 
BartC

In other words, I cannot protect the original values from being
altered? That may be a problem. Is there no way with C to make them
read-only?

After linking? No. There are 'const' attributes, meaning read-only, that can
be applied in the C source to a variable, but that doesn't protect against
accidental, deliberate or malicious modifications. Depends how much you
trust the other module.

I suppose ideally the loader would place any constants in a read-only
page. I can do that and it would protect them perfectly. It would also
allow any attempt to update them to be trapped and pinpointed.

I think that in the general case, that's not practical to do (some
const-attribute variables are on the stack, some are on the heap (via a
pointer), and some programs might depend on being able to bypass the
protection).

But, if the values *are* changed in one module, the new values will be
tracked in the other, so will that really cause a problem? Eg. will there
be lots of dependencies which will then be out of step?

The slight issue is that there seems to be no way in C to say which
symbols should be made available to other routines, i.e. which should
be exported, so if two programs are linked together each has access to
all the data in the other. I'd welcome correction if there is a
feature of C which allows this control.

In C, every function and variable declaration outside of a function can be
assumed to have a 'global' attribute, meaning it is always exported. (In
fact I explicitly add such an attribute in my own code; defining it is easy:

#define global

This then makes it obvious.) If you don't want to export a name, use a
'static' attribute.
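
A small sketch of the difference, using made-up names (next_value, id_limit, bump, next_id) that do not appear elsewhere in the thread. It shows one translation unit that exports two names and keeps two private.

```c
static int next_value = 0;   /* internal linkage: not visible to other files */

int id_limit = 3;            /* external linkage: other files may declare
                                'extern int id_limit;' and use it */

/* Private helper; another file could define its own bump() without clash. */
static int bump(void)
{
    return ++next_value;
}

/* Exported function: returns 1, 2, ... up to id_limit, then -1. */
int next_id(void)
{
    if (next_value >= id_limit)
        return -1;
    return bump();
}
```

Only id_limit and next_id would show up as symbols that another module can link against; next_value and bump cannot be named from outside this file.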

The 'extern' attribute you already know about (it should match a 'global'
name in at most one other module). (But C also allows extern and global
declarations of the same name, in the same file - usually extern will be in
a header, shared with other files, and global in the main body of the file.)

Field sizes should be fixed (at least under a 32-bit OS; a change to a
64-bit OS would be another issue but then everything would probably
require a recompile).

They can; but sometimes you might be wasting a 32-bit field when there might
be an instruction available with an 8-bit field.

(BTW I trust you understand what I mean by 'variable' and I don't have to
expand on it!)
 
Jorgen Grahn

Possibly poor choice of term on my part. "Program" may be too generic.

No, it's simply the wrong term. A "program" is an executable which a
user can cause to run. It's a bit fuzzy around the edges, but the
word is never used the way you used it above.

/Jorgen
 
James Harris

No, it's simply the wrong term.  A "program" is an executable which a
user can cause to run.  It's a bit fuzzy around the edges, but the
word is never used the way you used it above.

I disagree, though I'm happy to work with the term as people here most
use it.

James
 
James Harris

....


After linking? No. There are 'const' attributes, meaning read-only, that can
be applied in the C source to a variable, but that doesn't protect against
accidental, deliberate or malicious modifications. Depends how much you
trust the other module.

I would rather specify in the defining module that they are constants
and have that honoured. I can see that (as someone suggested) I can
provide a header file which enumerates all the names that can be
referred to and that that header file can be included in 'client'
routines. I understand that C doesn't guarantee that there will be no
attempt to update them.

I think that in the general case, that's not practical to do (some
const-attribute variables are on the stack, some are on the heap (via a
pointer), and some programs might depend on being able to bypass the
protection).

An ability to bypass the protection is something I don't want! They
would have to be placed in RO memory. Fortunately, the defining
occurrences wouldn't be in arbitrary places.

....
In C, every function and variable declaration outside of a function can be
assumed to have a 'global' attribute, meaning it is always exported.

Understood. I didn't know. Thanks to you and others who have explained
this.

....
They can; but sometimes you might be wasting a 32-bit field when there might
be an instruction available with an 8-bit field.

For the relatively few times these are needed this would not be a
problem.

James
 
Ben Bacarisse

James Harris said:
I would rather specify in the defining module that they are constants
and have that honoured. I can see that (as someone suggested) I can
provide a header file which enumerates all the names that can be
referred to and that that header file can be included in 'client'
routines. I understand that C doesn't guarantee that there will be no
attempt to update them.

One of the most constant things in C is a function. Can you use

int constant_one() { return 42; }

in place of

const int constant_one = 42;

? Just a thought.

<snip>
 
James Kuyper

On 11/24/2012 10:07 AM, James Harris wrote:
....
An ability to bypass the protection is something I don't want! They
would have to be placed in RO memory. Fortunately, the defining
occurrences wouldn't be in arbitrary places.

A compiler that targets a platform which supports RO memory may put
const-qualified object definitions in such memory (possibly only the
ones with static storage duration). If an attempt is made to modify such
objects, they won't be modified, but that's something your compiler
guarantees, not the C standard. C is intended to be implementable even
on platforms where enforcement of such guarantees would not be easy to
achieve.
 
Eric Sosman

I would rather specify in the defining module that they are constants
and have that honoured. I can see that (as someone suggested) I can
provide a header file which enumerates all the names that can be
referred to and that that header file can be included in 'client'
routines. I understand that C doesn't guarantee that there will be no
attempt to update them.

Using such a header is the right thing to do for many other
reasons than `const', of course. #include the header not only
in the client modules, but also in the defining module: That way,
if the header says `extern const int razzle;' but the definer says
`const double razzle = 42.0;', the compiler will see the conflict
and complain about it. (The linker or loader might or might not,
but the compiler most definitely will.)

As for updating: One of C's underlying principles is "Give
the programmer enough rope, and see if he can hang himself in an
interesting way." C will allow you to try to change a `const'-
qualified object by subterfuge -- basically, the type system is too
loose to prevent it -- but does not define whether the attempt will
succeed or fail or do Something Really Strange. (It is conceivable
that the SRS case on some particular system might be exactly what
the programmer desires.) "Normal" attempts like `razzle++;' will,
however, elicit compiler errors.

For an almost-perfect read-only variable, you can keep the
variable private in the defining module where nobody can touch
it, and offer a "getter" function:

/* razzle.h */
int getRazzle(void);

/* razzle.c */
#include "razzle.h"
static const int privateRazzle = 42;
int getRazzle(void) {
return privateRazzle;
}

If you like, you could shroud it with some macro mystery:

/* razzle.h */
int getRazzle(void);
#define razzle getRazzle()

/* client.c */
#include "razzle.h"
...
printf("razzle = %d\n", razzle);

I said "almost" perfect because there's still the chance that
the privateRazzle variable could get clobbered by a wild pointer
or out-of-range array index, either accidentally or deliberately.
Sorry; that's life. (We do not insure against asteroid collisions,
either.)

An ability to bypass the protection is something I don't want! They
would have to be placed in RO memory. Fortunately, the defining
occurrences wouldn't be in arbitrary places.

If a system supports read-only memory, it *may* place at least
some `const'-qualified objects there. C does not require this, in
large part because C does not require memory with special attributes.
In portable C all you can do is apply `const', and hope, and avoid
subterfuges. On particular systems there may be ways to exercise
more control, possibly with #pragma and/or compiler options and/or
linker options.
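
As one illustration of "more control on particular systems", here is a rough sketch of what a loader might do on a POSIX-like system: copy the value into its own page and then make that page read-only. This goes beyond portable C. It assumes mmap(), mprotect() and the widely available MAP_ANONYMOUS flag, and the function name publish_readonly_int is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE              /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical loader-side helper: store `value' in a fresh page and make
   that page read-only.  Returns NULL on failure. */
const int *publish_readonly_int(int value)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    int *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    *p = value;                      /* write while the page is still writable */

    if (mprotect(p, (size_t)page, PROT_READ) != 0) {
        munmap(p, (size_t)page);
        return NULL;
    }
    return p;                        /* reads work; stores would now fault */
}
```

On systems that enforce page protection, a later store through the returned address is trapped by the hardware rather than silently succeeding, which is the kind of pinpointing asked for earlier in the thread. None of this is guaranteed by the C standard.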
 
James Harris

....


One of the most constant things in C is a function.  Can you use

  int constant_one() { return 42; }

in place of

  const int constant_one = 42;

Cool idea. Yes, I could use this as a fallback if the direct option
doesn't work. Thanks.

James
 
Keith Thompson

Ben Bacarisse said:
One of the most constant things in C is a function. Can you use

int constant_one() { return 42; }

in place of

const int constant_one = 42;

? Just a thought.

<snip>

And if you define it as "inline", it probably won't generate any
more code than a const-qualified object declaration (or an enum).
 
Ben Pfaff

Keith Thompson said:
And if you define it as "inline", it probably won't generate any
more code than a const-qualified object declaration (or an enum).

I agree, but I wouldn't expect either of the latter to generate
any code at all.
 
Eric Sosman

And if you define it as "inline", it probably won't generate any
more code than a const-qualified object declaration (or an enum).

The objective (as I understood it) was to make the value
accessible to a module that would not require recompilation
if the value were to change. There's no theoretical barrier
to link-time inlining, but my impression is that it's fairly
bleeding-edge stuff.

Summing up, the thread has touched on three ways of making
the value accessible:

0) #define it, and #include wherever needed. Pro: Simple,
potential for constant expression. Con: Must recompile
"everything" if value changes.

1) Use a `const'-qualified variable, and #include an `extern'
declaration. Pro: Minimizes recompilation. Con: No chance
for constant expression, `const' variable may be vulnerable
to change at run-time.

2) Use a function returning the value, #include function's
declaration. Pro: Minimizes recompilation, makes run-time
change very unlikely, allows non-constant initialization.
Con: No chance for constant expression.
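
The three options can be put side by side in one small file. The names razzle_table, razzle_object, razzle_function and razzle_sum are invented for this sketch; in real use the definitions for 1) and 2) would sit in the defining module and clients would see only the declarations from a shared header.

```c
/* 0) Macro: a constant expression, but compiled into every user. */
#define RAZZLE_MACRO 42
int razzle_table[RAZZLE_MACRO];      /* a file-scope array size must be a
                                        constant expression */

/* 1) const-qualified object: clients declare
      'extern const int razzle_object;' and read it at run time. */
const int razzle_object = 42;

/* 2) Function: clients declare 'int razzle_function(void);'. */
int razzle_function(void)
{
    return 42;
}

/* All three yield the same value at run time. */
int razzle_sum(void)
{
    return RAZZLE_MACRO + razzle_object + razzle_function();
}
```

Only the macro could be used for the array size; with options 1) and 2) a client compiled against the declarations alone picks up a changed value without being recompiled, once it is linked or loaded with the new defining module.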
 
Keith Thompson

Ben Pfaff said:
I agree, but I wouldn't expect either of the latter to generate
any code at all.

I was thinking of the code generated when you refer to it, which
presumably will load the value 42 into a register. You're right,
the definitions themselves likely won't generate any code.
 
