Non-constant initializers

  • Thread starter Fred the Freshwater Catfish

Uncle Steve

Ok, so my impression now is that by "cleaner," you mean that it
reduces typing redundancy by having the initialization appear
alongside the declaration, thus avoiding repeating the identifier's
name within an init() function.
Yeah.


Ah, now this gives me the impression that not only are you interested in
restoring saved settings, but that you'd _also_ like to be able to
compute things using C functions, and have those computation results
available at compile-time. Perhaps something like declaring an array
with a number of elements which is 'pow(5, 6)'. Do I understand your
wish correctly?

I didn't say anything about having function return values available
at compile-time. I can't even imagine why someone would want to do
such a thing.
Perhaps not in a standard, portable sense. As an example, we have
n1256.pdf, section 6.5.2.2, point 1:

"The expression that denotes the called function...shall have type
pointer to function returning void or returning an object type other
than an array type."

Ok, I guess that's what it says.
It's also nicely symmetrical if you have a save_settings(). The
suggested init() might as well be called restore_settings() or
load_settings().
But your "see above" suggests (to me) that you are interested in
computations which are needed in order to affect _compile-time_
decisions, rather than _run-time_ decisions.

That might be true. I have a hash function that could be exactly
tuned to have the optimum number of buckets. In theory.
One possible way to achieve _something_like_ the _look_ of the
declarations calling functions might be the following example "uncle.c":

#ifndef GLOBAL_INITS
# define TYPE(x) x
# define INIT(x) {0}
#else
# undef TYPE
# define TYPE(x)
# undef INIT
# define INIT(x) x
#endif /* GLOBAL_INITS */

#ifndef GLOBAL_INITS
#include <stdio.h>

#endif /* GLOBAL_INITS */
/*** Globals */
TYPE(int) saved_setting = INIT(get_saved_setting());
TYPE(int) saved_setting2 = INIT(get_saved_setting2());

#ifndef GLOBAL_INITS
int get_saved_setting(void) {
    /* TODO: Actually fetch the saved setting */
    return 3;
}

int get_saved_setting2(void) {
    /* TODO: Actually fetch the saved setting */
    return 5;
}

void init(void) {
#define GLOBAL_INITS 1
#include "uncle.c"
#undef GLOBAL_INITS
    return;
}

int main(void) {
    init();
    printf("saved_setting: %d\n", saved_setting);
    printf("saved_setting2: %d\n", saved_setting2);
    return 0;
}

#endif /* GLOBAL_INITS */

That's still kind of ugly. :)

Yup. Recursive includes are considered harmful to the normal human
mind. Banned in some jurisdictions where programmer-cruelty laws hold
sway.
However this would not help with a wish to call C functions in order to
compute something which needs to influence compile-time decisions, such
as the size of an array declared at file-scope. I think that folks
usually have a separate C program perform such computations and output
C source, which is then compiled with the rest of the code: code
generation.
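
A minimal sketch of that approach (the file names and the pow(5, 6)
sizing are illustrative assumptions, not anything from this thread):

/* gen_tables.c: run at build time, before compiling the real program. */
#include <math.h>
#include <stdio.h>

int main(void) {
    FILE * out = fopen("tables.h", "w");
    if (out == NULL)
        return 1;
    fprintf(out, "#define BUCKET_COUNT %d\n", (int) pow(5, 6));
    fclose(out);
    return 0;
}

The real program then #includes "tables.h" and can declare something
like 'int buckets[BUCKET_COUNT];' at file-scope, with the "computation"
already done.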

Or you might want to size your buffers as a percentage of total
physical RAM. There are several scenarios where this would be
convenient.



Regards,

Uncle Steve
 

Shao Miller


Well, you can get pretty wild with macros. For example:

#include <stdio.h>

#define UNIT_DECL(type_, id_, init_) type_ id_;
#define UNIT_INIT(type_, id_, init_) id_ = init_;

#define GLOBALS(UNIT_) \
\
UNIT_(int, saved_setting, get_saved_setting()) \
UNIT_(int, saved_setting2, get_saved_setting2())

/* Declare globals */
GLOBALS(UNIT_DECL)

int get_saved_setting(void) {
    /* TODO: Actually fetch the saved setting */
    return 3;
}

int get_saved_setting2(void) {
    /* TODO: Actually fetch the saved setting */
    return 5;
}

void init(void) {
    /* Populate globals */
    GLOBALS(UNIT_INIT)
    return;
}

int main(void) {
    init();
    printf("saved_setting: %d\n", saved_setting);
    printf("saved_setting2: %d\n", saved_setting2);
    return 0;
}

However, the types used in the 'GLOBALS' #definition in this example
are confined to simple types: no parentheses, and no struct/union
defined in place. You'd have to 'typedef' anything complicated, or
make the macro even uglier, to deal with types containing parentheses
(such as function pointer types) or with declarations that define a
struct/union type within the declaration itself.
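
For instance, a sketch of keeping a function pointer in the 'GLOBALS'
list via a 'typedef' ('handler_fn', 'on_signal', and 'default_handler'
are made-up names):

#include <stdio.h>

typedef void (*handler_fn)(int);

#define UNIT_DECL(type_, id_, init_) type_ id_;
#define UNIT_INIT(type_, id_, init_) id_ = init_;

#define GLOBALS(UNIT_) \
UNIT_(handler_fn, on_signal, default_handler)

/* Declare globals */
GLOBALS(UNIT_DECL)

void default_handler(int sig) {
    printf("handled %d\n", sig);
}

void init(void) {
    /* Populate globals */
    GLOBALS(UNIT_INIT)
}

int main(void) {
    init();
    on_signal(2);
    return 0;
}
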
I didn't say anything about having function return values available
at compile-time. I can't even imagine why someone would want to do
such a thing.

Ok. I misunderstood "empirically"-"derive"d "and assign"ed "once" to
mean using "computed for compile-time." Sorry about that.
Ok, I guess that's what it says.

A function can certainly return a pointer which points to the first
element of an array. Or a function could even return a pointer to a:

struct sized_array {
    size_t size;
    char * array;
};
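
A sketch of such a function, using the struct above ('make_sized_array'
is a made-up name):

#include <stdlib.h>

struct sized_array * make_sized_array(size_t size) {
    struct sized_array * sa = malloc(sizeof *sa);
    if (sa == NULL)
        return NULL;
    sa->array = malloc(size);
    if (sa->array == NULL) {
        free(sa);
        return NULL;
    }
    sa->size = size;
    return sa;
}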

Essentially, it wasn't clear to me what your code example was trying to
do. Based on your responses in this thread, it seems like you were keen on:

int foops[] = howmuchfoo(void);

Doing something like:

int * foops;
foops = malloc(howmuchfoo());

At file-scope and performing the calls at a pre-main() time. I'm not
sure what you'd want 'sizeof foops' to evaluate to. But please see "I
wonder... #2" below.
That might be true. I have a hash function that could be exactly
tuned to have the optimum number of buckets. In theory.

I'm not sure how that's congruent with:
I didn't say anything about having function return values available
at compile-time. I can't even imagine why someone would want to do
such a thing.

but perhaps that was imagining why someone would want to do such a
thing. I think there certainly are use cases.
Yup. Recursive includes are considered harmful to the normal human
mind. Banned in some jurisdictions where programmer-cruelty laws hold
sway.

Well, some of that recursive-inclusion ugliness is due to trying to
provide the example as a single, compilable file, which can be copied
and pasted and compiled.

As two files, it could look like "uncle.h" and "uncle.c", given, in that
order, below:
---
#ifndef GLOBAL_INITS
# define TYPE(x) x
# define INIT(x) {0}
#else
# define TYPE(x)
# define INIT(x) x
#endif /* GLOBAL_INITS */

/*** Globals */
TYPE(int) saved_setting = INIT(get_saved_setting());
TYPE(int) saved_setting2 = INIT(get_saved_setting2());

#undef TYPE
#undef INIT
---
#include <stdio.h>

#include "uncle.h"

int get_saved_setting(void) {
    /* TODO: Actually fetch the saved setting */
    return 3;
}

int get_saved_setting2(void) {
    /* TODO: Actually fetch the saved setting */
    return 5;
}

void init(void) {
#define GLOBAL_INITS 1
#include "uncle.h"
#undef GLOBAL_INITS
    return;
}

int main(void) {
    init();
    printf("saved_setting: %d\n", saved_setting);
    printf("saved_setting2: %d\n", saved_setting2);
    return 0;
}
---
Or you might want to size your buffers as a percentage of total
physical RAM. There are several scenarios where this would be
convenient.

A run-time decision; right. That scenario reads to me like it could
warrant two globals:

size_t buf_size;
char * buf;

Where you'd populate 'buf_size' in an init() function and then malloc()
the buffer (populating the 'buf' pointer) in the same init() function, also.
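
As a sketch, with 'total_physical_ram' as a stub standing in for
whatever platform-specific query would really be used:

#include <stdlib.h>

size_t buf_size;
char * buf;

/* Stub; a real program would ask the platform. Not portable C. */
static size_t total_physical_ram(void) {
    return (size_t) 512 * 1024 * 1024;
}

void init(void) {
    buf_size = total_physical_ram() / 100; /* e.g. 1% of RAM */
    buf = malloc(buf_size);
}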

If C had a pre-main() time where globals could be initialized with the
results of function calls, such a time period would still have to answer
the questions put forward by Eric Sosman, else-thread, regarding the
state of the environment at such a pre-main() time.

Along with his small sampling of questions, I wonder:

1. If such a function used for initialization never returned (infinite
loop, for example), then main() would never be executed. If such a
function was your own, and included a bug which caused this, and the bug
was based on the environment (such as reading saved settings from a
file), then main() wouldn't even be able to report the program version
or a usage statement to a user. Would that be acceptable?

A: "Program just freezes."
B: "What version? Use 'program -v' to find out."
A: "It freezes on 'program -v', so I don't know."

I don't remember C++ well enough to be able to suggest whether or not
C++ programmers have to be careful about this type of thing with
constructors, but that's off-topic, anyway.

2. If such a function used for initialization encountered another kind
of error (insufficient memory while allocating an array object, for
example), how could this be communicated to program portions using those
globals? It would seem that it'd be up to main() or some other function
called by main() (such as an init() function) to test the success of
such initializations.
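
A sketch of that error-checking shape, with init() reporting failure
for main() to test (all names here are illustrative):

#include <stdio.h>
#include <stdlib.h>

char * buf;

int init(void) {
    buf = malloc(1024);
    if (buf == NULL)
        return -1; /* initialization failed */
    return 0;
}

int main(void) {
    if (init() != 0) {
        fputs("init failed\n", stderr);
        return EXIT_FAILURE;
    }
    /* ... use the globals ... */
    free(buf);
    return 0;
}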
 

Ben Bacarisse

Uncle Steve said:
Hmm. I'm not sure why you think that's important.

It's right there in the quotes. You asked if "it" (referring to your
howmuchfoo function) "could do just about anything if it was a C
function, couldn't it?". Eric answered by pointing out two things that
it could *not* do. What's more, these are things that might be
significant since the code fragment you showed:

| int foops[] = howmuchfoo(void);

suggested that howmuchfoo might be either returning an array or it might
be trying to return an array initialiser.
 

Seebs

You're making this too easy. What could be cleaner than a variable
declaration outside of main() which also initializes it to the value it
should have before main() is called?

Nearly anything? Most cesspits, for instance.
I'm fully aware that I can
assign a value to a variable any time I write code or macros to do it,
but there are certain kinds of application parameters, for instance,
which you may derive empirically, and assign once. I should think
that most programmers know about these types of variables.

Yes. I set them up at some point AFTER the program has started running,
because before it's running the world is not in a known or well-defined
state.
It could do just about anything if it was a C function, couldn't it?

Well, that's the thing. How much do you know about the way in which
the library initializes various internals which need to be in a particular
state for your code to execute?
I'm big on macros, but sometimes they clutter up the code with
redundant code blocks that are deader than a doornail. Maybe the
compiler will weed them out with global CSE, maybe not. If they
aren't in the code to begin with, they can't possibly get into the
instruction cache.

I refer you to Rules 1 and 2 of optimization.

Which is to say:

Rule 1: Don't do it.
Rule 2 (only for experts): Don't do it yet.

For the sorts of things it makes sense to try to use as initializers, you
can reasonably be confident that the value will be fully computed before
your code is done compiling.

There are, right now, probably as many as two dozen people in the world
who have a legitimate reason to form thoughts like "what if this code
ends up in the instruction cache?". You are not one of them.

-s
 

Uncle Steve

Uncle Steve said:
Hmm. I'm not sure why you think that's important.

It's right there in the quotes. You asked if "it" (referring to your
howmuchfoo function) "could do just about anything if it was a C
function, couldn't it?". Eric answered by pointing out two things that
it could *not* do. What's more, these are things that might be
significant since the code fragment you showed:

| int foops[] = howmuchfoo(void);

suggested that howmuchfoo might be either returning an array or it might
be trying to return an array initialiser.

But only if it was a macro. I'm not likely to use that exact scenario
in real code; perhaps I hammered out my initial example a little too
quickly. I don't pretend to be an expert on C grammar. Too often the
compiler identifies my typos and coding errors so I guess it may be
obvious I don't use a syntax-highlighting editor.

In any case, I'm not hung up on what kind of data is returned from a
function call in this usage scenario. It may very well come from a
file containing random data owing to filesystem corruption from
bashing my head on the keyboard of my notebook. ;)



Regards,

Uncle Steve
 

Uncle Steve

Well, you can get pretty wild with macros. For example:

#include <stdio.h>

#define UNIT_DECL(type_, id_, init_) type_ id_;
#define UNIT_INIT(type_, id_, init_) id_ = init_;

#define GLOBALS(UNIT_) \
\
UNIT_(int, saved_setting, get_saved_setting()) \
UNIT_(int, saved_setting2, get_saved_setting2())

/* Declare globals */
GLOBALS(UNIT_DECL)

int get_saved_setting(void) {
    /* TODO: Actually fetch the saved setting */
    return 3;
}

int get_saved_setting2(void) {
    /* TODO: Actually fetch the saved setting */
    return 5;
}

void init(void) {
    /* Populate globals */
    GLOBALS(UNIT_INIT)
    return;
}

int main(void) {
    init();
    printf("saved_setting: %d\n", saved_setting);
    printf("saved_setting2: %d\n", saved_setting2);
    return 0;
}

However, the types used in the 'GLOBALS' #definition in this example
are confined to simple types: no parentheses, and no struct/union
defined in place. You'd have to 'typedef' anything complicated, or
make the macro even uglier, to deal with types containing parentheses
(such as function pointer types) or with declarations that define a
struct/union type within the declaration itself.

That's fairly slick. Mostly I typedef structures and follow a
consistent naming scheme, so I think I could use your construction. I
just have to wrestle with the back-end a little more before it all
gels into something that fits in my existing code.
Ok. I misunderstood "empirically"-"derive"d "and assign"ed "once" to
mean using "computed for compile-time." Sorry about that.

You had me for a moment because I think I read that gcc will use
branch profiling feedback in its instruction scheduler(?) and that's
pretty much data computed for compile-time, except that it doesn't
directly relate to application logic.
Ok, I guess that's what it says.

A function can certainly return a pointer which points to the first
element of an array. Or a function could even return a pointer to a:

struct sized_array {
    size_t size;
    char * array;
};

Essentially, it wasn't clear to me what your code example was trying to
do. Based on your responses in this thread, it seems like you were keen on:

int foops[] = howmuchfoo(void);

Doing something like:

int * foops;
foops = malloc(howmuchfoo());

At file-scope and performing the calls at a pre-main() time. I'm not
sure what you'd want 'sizeof foops' to evaluate to. But please see "I
wonder... #2" below.

Nothing quite so byzantine. I'm thinking more along the lines of an
.ini file that is consulted by a lookup function. I doubt there'd
really be much arbitrary binary data.
I'm not sure how that's congruent with:


but perhaps that was imagining why someone would want to do such a
thing. I think there certainly are use cases.

That's a bit of a grey area, but then maybe we're in violent
agreement. I would never expect to have a function call launch in the
pre-processor pass of the compilation. Assuming it compiles.
Well, some of that recursive-inclusion ugliness is due to trying to
provide the example as a single, compilable file, which can be copied
and pasted and compiled.

As two files, it could look like "uncle.h" and "uncle.c", given, in that
order, below:
[snip]

I'm fine with that, I plan to give them both a close look later on
just to get a better feel for your approach.
A run-time decision; right. That scenario reads to me like it could
warrant two globals:

size_t buf_size;
char * buf;

At least two; don't forget that double-buffering schemes and the like
are really quite common.
Where you'd populate 'buf_size' in an init() function and then malloc()
the buffer (populating the 'buf' pointer) in the same init() function, also.
Yes.

If C had a pre-main() time where globals could be initialized with the
results of function calls, such a time period would still have to answer
the questions put forward by Eric Sosman, else-thread, regarding the
state of the environment at such a pre-main() time.

Along with his small sampling of questions, I wonder:

1. If such a function used for initialization never returned (infinite
loop, for example), then main() would never be executed. If such a
function was your own, and included a bug which caused this, and the bug
was based on the environment (such as reading saved settings from a
file), then main() wouldn't even be able to report the program version
or a usage statement to a user. Would that be acceptable?

I think so, at least in many scenarios. This is also implementation-specific.
If the system libraries the code is linked with are already initialized,
the error recovery procedure can be fairly complicated without worry:
open the tty directly, open a socket to syslog, etc. What is required is
a guarantee that these application variable initializations occur
immediately prior to the call to main() by the .init code section, or
whatever it is on your platform.
A: "Program just freezes."
B: "What version? Use 'program -v' to find out."
A: "It freezes on 'program -v', so I don't know."

Yes, I can see that the user might yell at the programmer if there is
no feedback and the application just stops working.
I don't remember C++ well enough to be able to suggest whether or not
C++ programmers have to be careful about this type of thing with
constructors, but that's off-topic, anyway.

2. If such a function used for initialization encountered another kind
of error (insufficient memory while allocating an array object, for
example), how could this be communicated to program portions using those
globals? It would seem that it'd be up to main() or some other function
called by main() (such as an init() function) to test the success of
such initializations.

This is a complication, but you could make program execution
conditional on the success of the initialization sequence. Anything
more complicated gets hairy. I suppose I could just switch to C++,
which as others have said includes the necessary compiler support for
this operation. But I'm comfortable with C enough to view the switch
to C++ with more than a little trepidation.



Regards,

Uncle Steve
 

Ben Bacarisse

Uncle Steve said:
Uncle Steve said:
Uncle Steve wrote:
It could do just about anything if it was a C function, couldn't it?

C functions cannot return arrays, nor can they return array
initializers.

Hmm. I'm not sure why you think that's important.

It's right there in the quotes. You asked if "it" (referring to your
howmuchfoo function) "could do just about anything if it was a C
function, couldn't it?". Eric answered by pointing out two things that
it could *not* do. What's more, these are things that might be
significant since the code fragment you showed:

| int foops[] = howmuchfoo(void);

suggested that howmuchfoo might be either returning an array or it might
be trying to return an array initialiser.

But only if it was a macro.

Yes, that's been covered several times already. You specifically
referred to "it" being a function -- hence Eric's answer and my attempt
at clarification.

<snip>
 

Uncle Steve

Sorry, skipped your message this morning by accident.

Nearly anything? Most cesspits, for instance.

Yes. I set them up at some point AFTER the program has started running,
because before it's running the world is not in a known or well-defined
state.

Well, it's fairly difficult to approach this without making reference
to a specific implementation. main() is supposed to be the defined
entrypoint for C programs, and anything that happens before that is
supposed (I assume) to set up the execution environment defined by the
standard, and is implementation specific. OK. I get it. Sorry for
my confusion over what the gcc info doc said.
Well, that's the thing. How much do you know about the way in which
the library initializes various internals which need to be in a particular
state for your code to execute?

I don't know anything about how the system libraries are initialized.
Perhaps I should look; glibc, ulibc, bsd libc, etc. probably all do
things differently (within reason), meaning I'd probably have to code
a special case for every port of the app if I were to use implementation-
specific hooks.
I refer you to Rules 1 and 2 of optimization.

Which is to say:

Rule 1: Don't do it.
Rule 2 (only for experts): Don't do it yet.

I rely on a consistent heuristic: develop efficient coding habits.
Within the limits of my skill, this seems to work ok. I don't really
agonize over loops as I might have when I was a teenager fooling
around with asm on small machines. I know the other Rule of
optimization:

Rule 3: Write better algorithms.

Which isn't to say that it can't be fun to find a clever way to shave
cycles. There are an endless number of programming tricks that one
can experiment with, on critical paths, if you've got the time.
For the sorts of things it makes sense to try to use as initializers, you
can reasonably be confident that the value will be fully computed before
your code is done compiling.

For many scalars and string constants this is true, and I have no
argument about that fact. What this is becoming, however, is a
pedantic disagreement over the validity of function calls in the
outermost scope of a C program. It seems a little arbitrary that
function calls are only valid in the main() scope, or at least this is
what I perceive as being what the standard says from your replies.

Don't worry, I'm not going to press the C2015 Committee[1] to revise
the standard. I simply don't care about it all that much, but it is
worth noting.
There are, right now, probably as many as two dozen people in the world
who have a legitimate reason to form thoughts like "what if this code
ends up in the instruction cache?". You are not one of them.

Alas, you have penetrated my disguise. I am not worthy to decide what
code should or should not be in the instruction cache. I am sorry if
I have offended your grace, or shamed the honored guild of instruction
cachers with my unworthy considerations. In the Future, I will
endeavor to think about memory as a transparent medium of data storage.


[1] Joke. Not sure what the next revision of C will be called. I
can't see there being many changes at this point, C should be about
ready to be carved in stone by now.



Regards,

Uncle Steve
 

Uncle Steve

External objects are initialized before program startup.
The initial values can be embedded in the opcode.

Sure, I know that, and it even makes sense. That's not my issue at
this point. What bugs me now about all this is that a C source file
contains two sections: one in which variable declarations and
program statements are valid, and one in which variable declarations
(and assignments) are valid, but where scalar quantities must be
resolved at compile time.

I realize that in practice, variables may be declared out of function
scope, and that the language must offer a facility to initialize them
to a known value (for practical reasons). But when you're looking at
the code in an editor the only thing distinguishing these two "zones"
is the presence of an open/close brace pair from a function definition. It
seems arbitrary, and I guess there's lots of reasons why it was done
this way -- which I won't speculate about here to avoid flames.

I suppose the real thing here to note is that languages like Perl or
Python are insulated from the processor architecture to a much
greater degree than C, and so some things are easier/harder than
others in these languages from the programmer standpoint. I'm not
going to switch to C++, so for my purposes, I'll be implementing some
other solution. Possibly, I may end up without any global variables
at all.



Regards,

Uncle Steve
 

Seebs

Sorry, skipped your message this morning by accident.

No worries.
Well, it's fairly difficult to approach this without making reference
to a specific implementation. main() is supposed to be the defined
entrypoint for C programs, and anything that happens before that is
supposed (I assume) to set up the execution environment defined by the
standard, and is implementation specific. OK. I get it. Sorry for
my confusion over what the gcc info doc said.

It's implementation-specific, and there's also no obvious reason to imagine
it not to be subject to change-at-whim, since it's not part of a spec.

When my initializer runs, is stderr available? Heck if I know.

(And I say this as someone who is, in fact, using this feature...)
I don't know anything about how the system libraries are initialized.
Perhaps I should look; glibc, ulibc, bsd libc, etc. probably all do
things differently (within reason), meaning I'd probably have to code
a special case for every port of the app if I were to use implementation-
specific hooks.

Basically...

My advice would be, don't rely on such functionality unless you *have* to.
It's an extra source of things which could go wrong, and it's a source of
things which could go wrong in a not-well-specified state, using code paths
that are likely not to be getting tested as carefully as the rest of the
system, and which go wrong in a way that may be a nightmare for a debugger.
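
For concreteness, the kind of implementation-specific hook under
discussion looks like this on GCC-compatible compilers (not standard C;
a sketch):

#include <stdio.h>

static int saved_setting;

__attribute__((constructor))
static void early_init(void) {
    /* Runs before main() on GCC/Clang. Whether stdio and the rest of
    the library are usable here yet is exactly the not-well-specified
    part being warned about above. */
    saved_setting = 3;
}

int main(void) {
    printf("saved_setting: %d\n", saved_setting);
    return 0;
}
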
I rely on a consistent heuristic: develop efficient coding habits.

My experience has been that "efficient" should usually be a matter of
coder time, not speculative execution time.
Which isn't to say that it can't be fun to find a clever way to shave
cycles. There are an endless number of programming tricks that one
can experiment with, on critical paths, if you've got the time.

Yeah. But... It's harder to debug code than write it. You should stay
very far away from the limits of what you can do when writing code, because
if you don't, you won't be able to debug it.
For many scalars and string constants this is true, and I have no
argument about that fact. What this is becoming, however, is a
pedantic disagreement over the validity of function calls in the
outermost scope of a C program. It seems a little arbitrary that
function calls are only valid in the main() scope, or at least this is
what I perceive as being what the standard says from your replies.

I don't think it's arbitrary at all. This isn't a question of scope. It's
a question of lifetimes and execution sequence.

Starting with main(), we have a nice, robust, notion of what executes and
when, and what state the environment is in.

Before that, we have NO IDEA. Consider stderr. Imagine that the
implementation writes some "init" code which does stuff like:

stderr = fdopen(2, "w");

now... Where's the guarantee that their code runs before your code which
does
fprintf(stderr, "help!\n");
?

Answer: There isn't one and can't really be one.
Don't worry, I'm not going to press the C2015 Committee[1] to revise
the standard. I simply don't care about it all that much, but it is
worth noting.

Yes. It's not "arbitrary", though, except in the sense that any specification
is a little bit arbitrary.

The decision of whether people drive on the left or right side of the road
is arbitrary, but this doesn't mean it's a bad idea to make such a decision
and impose it as a standard.

The C language offers you a well-defined execution environment. For this to
be feasible, it has to say "starting *here*, you know the following will
work". The obvious way to do this would be to say "execution starts here,
and then statements occur in order." Initializers which have to be able
to execute code create all sorts of fun. Do they execute in parallel? In
series? In a consistent series from one run to another, or one compile to
another?

Can of worms. BIIIIG can of worms.

It was not a careless oversight that the language doesn't support this; it
was a carefully considered decision.
Alas, you have penetrated my disguise. I am not worthy to decide what
code should or should not be in the instruction cache.

"Worthy" isn't at issue.
I am sorry if
I have offended your grace, or shamed the honored guild of instruction
cachers with my unworthy considerations. In the Future, I will
endeavor to think about memory as a transparent medium of data storage.

It's not a question of worthy, or an "honored guild". It's that you can
be reasonably confident that, if you try to think about this stuff, your
code will perform worse than it would have if you didn't.

Sound surprising? It shouldn't be. Code is, in general, run on multiple
targets. Something that tries to optimize for one is likely to de-optimize
for others.

Time spent thinking about whether something which will actually have been
converted to a constant at compile time "fits in the instruction cache" is
time wasted, and if you'd spent it on something more useful, your code would
be better; more stable, faster, whatever else you'd have been working on.
In the real world, your code will be preempted, causing stuff to leave the
instruction cache anyway, it will be run virtualized, it will be translated
into hardware-specific microcode by a JIT compiler running on the CPU...

Do you seriously think that modern x86 CPUs actually *execute* x86
instructions as you would see them in an assembler? They haven't for years.

Thinking about stuff like that is a distraction from focusing on the code at
a level where you can actually do stuff. It will only make things worse, not
better.
[1] Joke. Not sure what the next revision of C will be called. I
can't see there being many changes at this point, C should be about
ready to be carved in stone by now.

Nah, lots left to be done.

-s
 

Seebs

Sure, I know that, and it even makes sense. That's not my issue at
this point. What bugs me now about all this is that a C source file
contains two sections: one in which variable declarations and
program statements are valid, and one in which variable declarations
(and assignments) are valid, but where scalar quantities must be
resolved at compile time.
Yup.

I realize that in practice, variables may be declared out of function
scope, and that the language must offer a facility to initialize them
to a known value (for practical reasons). But when you're looking at
the code in an editor the only thing distinguishing these two "zones"
is the presence of an open/close brace pair from a function definition. It
seems arbitrary, and I guess there's lots of reasons why it was done
this way -- which I won't speculate about here to avoid flames.

Calling it arbitrary will get you a lot more "flames" than thinking about
the reasons would.

Looking at things in an editor is uninformative. Think about them compiling.
I suppose the real thing here to note is that languages like Perl or
Python are insulated from the processor architecture to a much
greater degree than C, and so some things are easier/harder than
others in these languages from the programmer standpoint.

This hasn't got a thing to do with processor architecture. It has to do
with the order in which things happen.

In perl, Python, Ruby, and the like, everything is being logically executed
when you hit it. Except BEGIN sections and the like... and even those are
logically "executed" as you hit them, except that executing them consists of
putting them on the list of things to do before running anything else...
which list is executed in the order they were hit.

In C, though, things aren't being "executed" when the compiler sees them.
They're being translated, stored up, and then shuffled together.

Imagine two modules of perl:
foo.pl:
    sub bar() { return 3; }
    $x = foo();
    1;

bar.pl:
    sub foo() { return 3; }
    $y = bar();
    1;

Now require these both from another module. You'll get an error. Which
error? The one that comes from the one you require first. But... Put
the functions in one, the assignments in the other, require the functions
first, and it works.

This is because the evaluation of these things is *well ordered*. There
is a strict, unambiguous, definition of the order in which things occur.

Now consider two modules of C:
foo.c:
    extern int foo(void); /* defined in bar.c */
    int bar(void) { return 3; }
    int x;
    void init_x(void) { x = foo(); }

bar.c:
    extern int bar(void); /* defined in foo.c */
    int foo(void) { return 3; }
    int y;
    void init_y(void) { y = bar(); }

main.c:
    extern int foo(), bar(), x, y;
    extern void init_x(), init_y();

    int
    main(void) {
        init_x();
        init_y();
    }

Now try any combo you like:
$ cc -o t foo.c bar.c main.c
$ cc -o t main.c bar.c foo.c

It'll always work. Why? Because things aren't being "executed" in the order
that the compiler sees them. Things are being compiled, which doesn't involve
any ordering of anything, then linked, which doesn't involve any ordering
of anything (except that some linkers are sorta dumb), and then at runtime,
everything has already been resolved, adjusted, and figured out...

But that's because the language can provide an unambiguous definition of
the order in which it runs code, and that ordering doesn't depend on when
you compile things, or in which order, or anything like that.

The C solution is, arguably, less flexible in terms of what you write, but
it provides much greater confidence in what your code will do when executed.

This distinction isn't arbitrary. Yes, some implementations allow some kind
of initialization hooks, which may or may not have well-defined interactions
with each other, or with the library. But... In perl, if you need one
initializer to run before another, you execute it first. In C, you don't
really have that option for file-scope things, because there's no "first".
I'm not
going to switch to C++, so for my purposes, I'll be implementing some
other solution. Possibly, I may end up without any global variables
at all.

This is nearly always the right choice. :)

-s
 

Uncle Steve

No worries.


It's implementation-specific, and there's also no obvious reason to imagine
it not to be subject to change-at-whim, since it's not part of a spec.

When my initializer runs, is stderr available? Heck if I know.

(And I say this as someone who is, in fact, using this feature...)

Basically...

My advice would be, don't rely on such functionality unless you *have* to.
It's an extra source of things which could go wrong, and it's a source of
things which could go wrong in a not-well-specified state, using code paths
that are likely not to be getting tested as carefully as the rest of the
system, and which go wrong in a way that may be a nightmare for a debugger.

You're preaching to the converted here. I've got an irrational
aversion to #ifdefs surrounding platform specific code. Like, no shit
POSIX was invented. Sure I could deal with it if I *had to*, but who
would willingly subject themselves to that horror?
My experience has been that "efficient" should usually be a matter of
coder time, not speculative execution time.

That depends. Every function has its own ranking of relative hotness
within the application. If you've got money to burn, you can afford
to throw as many programmers as you want at the problem. Maybe you
have so much money you don't care that they're optimizing the
documentation tools. All I know is that I have programming
proclivities that lead to Cycle Conservation, and who could argue
against that?
Yeah. But... It's harder to debug code than write it. You should stay
very far away from the limits of what you can do when writing code, because
if you don't, you won't be able to debug it.

I've heard that before, and what I say is that the same skill that is
used to write the code is also used in debugging the code. I really
don't see how I'm going to write myself into a situation where I cannot
debug any system that I wrote.
I don't think it's arbitrary at all. This isn't a question of scope. It's
a question of lifetimes and execution sequence.

Starting with main(), we have a nice, robust, notion of what executes and
when, and what state the environment is in.

Before that, we have NO IDEA. Consider stderr. Imagine that the
implementation writes some "init" code which does stuff like:

stderr = fdopen(2, "w");

now... Where's the guarantee that their code runs before your code which
does
fprintf(stderr, "help!\n");
?

Answer: There isn't one and can't really be one.

Sure there can, but it must rely on implementation-specific
documentation.
Don't worry, I'm not going to press the C2015 Committee[1] to revise
the standard. I simply don't care about it all that much, but it is
worth noting.

Yes. It's not "arbitrary", though, except in the sense that any specification
is a little bit arbitrary.

The decision of whether people drive on the left or right side of the road
is arbitrary, but this doesn't mean it's a bad idea to make such a decision
and impose it as a standard.

The C language offers you a well-defined execution environment. For this to
be feasible, it has to say "starting *here*, you know the following will
work". The obvious way to do this would be to say "execution starts here,
and then statements occur in order." Initializers which have to be able
to execute code create all sorts of fun. Do they execute in parallel? In
series? In a consistent series from one run to another, or one compile to
another?

Can of worms. BIIIIG can of worms.

It was not a careless oversight that the language doesn't support this; it
was a carefully considered decision.

I realize the C standard was not designed arbitrarily, as I understand
the definition of "arbitrary". Look. I'm not really worried about
this issue; my program will get written with code that conforms to the
existing standards applicable to the problem domain, and everything
will be just fine.
"Worthy" isn't at issue.


It's not a question of worthy, or an "honored guild". It's that you can
be reasonably confident that, if you try to think about this stuff, your
code will perform worse than it would have if you didn't.
Sound surprising? It shouldn't be. Code is, in general, run on multiple
targets. Something that tries to optimize for one is likely to de-optimize
for others.

Maybe, but I think you presume too much when you assume that the
compiler should be the arbiter of optimization. Possibly some far-in-
the-future compiler will have strong-AI capabilities wrt code
generation, but until that time the human programmer will probably be
able to do a better job than the compiler. All things being equal.
Time spent thinking about whether something which will actually have been
converted to a constant at compile time "fits in the instruction cache" is
time wasted, and if you'd spent it on something more useful, your code would
be better; more stable, faster, whatever else you'd have been working on.
In the real world, your code will be preempted, causing stuff to leave the
instruction cache anyway, it will be run virtualized, it will be translated
into hardware-specific microcode by a JIT compiler running on the CPU...

Do you seriously think that modern x86 CPUs actually *execute* x86
instructions as you would see them in an assembler? They haven't for years.

Thinking about stuff like that is a distraction from focusing on the code at
a level where you can actually do stuff. It will only make things worse, not
better.

We'll see.
[1] Joke. Not sure what the next revision of C will be called. I
can't see there being many changes at this point, C should be about
ready to be carved in stone by now.

Nah, lots left to be done.

Are you saying that there are major changes to the C standard in the
offing? I don't see how that could be without changing fundamentals,
and we all know that that is about as likely as porcine avians.



Regards,

Uncle Steve
 

Uncle Steve

Calling it arbitrary will get you a lot more "flames" than thinking about
the reasons would.

Looking at things in an editor is uninformative. Think about them compiling.


This hasn't got a thing to do with processor architecture. It has to do
with the order in which things happen.

In perl, Python, Ruby, and the like, everything is being logically executed
when you hit it. Except BEGIN sections and the like... and even those are
logically "executed" as you hit them, except that executing them consists of
putting them on the list of things to do before running anything else...
which list is executed in the order they were hit.

In C, though, things aren't being "executed" when the compiler sees them.
They're being translated, stored up, and then shuffled together.

Imagine two modules of perl:
foo.pl:
    sub bar() { return 3; }
    $x = foo();
    1;

bar.pl:
    sub foo() { return 3; }
    $y = bar();
    1;

Now require these both from another module. You'll get an error. Which
error? The one that comes from the one you require first. But... Put
the functions in one, the assignments in the other, require the functions
first, and it works.

This is because the evaluation of these things is *well ordered*. There
is a strict, unambiguous, definition of the order in which things occur.

Now consider two modules of C:
foo.c:
    extern int foo(void); /* defined in bar.c */
    int bar(void) { return 3; }
    int x;
    void init_x(void) { x = foo(); }

bar.c:
    extern int bar(void); /* defined in foo.c */
    int foo(void) { return 3; }
    int y;
    void init_y(void) { y = bar(); }

main.c:
    extern int foo(), bar(), x, y;
    extern void init_x(), init_y();

    int
    main(void) {
        init_x();
        init_y();
    }

Now try any combo you like:
$ cc -o t foo.c bar.c main.c
$ cc -o t main.c bar.c foo.c

It'll always work. Why? Because things aren't being "executed" in the order
that the compiler sees them. Things are being compiled, which doesn't involve
any ordering of anything, then linked, which doesn't involve any ordering
of anything (except that some linkers are sorta dumb), and then at runtime,
everything has already been resolved, adjusted, and figured out...

But that's because the language can provide an unambiguous definition of
the order in which it runs code, and that ordering doesn't depend on when
you compile things, or in which order, or anything like that.

The C solution is, arguably, less flexible in terms of what you write, but
it provides much greater confidence in what your code will do when executed.

This distinction isn't arbitrary. Yes, some implementations allow some kind
of initialization hooks, which may or may not have well-defined interactions
with each other, or with the library. But... In perl, if you need one
initializer to run before another, you execute it first. In C, you don't
really have that option for file-scope things, because there's no "first".

I have no argument with this; I am in full agreement with your summary.
Since you've arranged the execution order in the code, the compiler
just does what you want.
This is nearly always the right choice. :)

It's a bloody lot of work though!



Regards,

Uncle Steve
 

James Kuyper

On 05/16/2011 09:16 PM, Uncle Steve wrote:
....
Maybe, but I think you presume too much when you assume that the
compiler should be the arbiter of optimization. Possibly some far-in-
the-future compiler will have strong-AI capabilities wrt code
generation, but until that time the human programmer will probably be
able to do a better job than the compiler. All things being equal.

Average C programmers fell behind compilers in their ability to perform
optimization at least a decade ago; probably more. An expert assembly
language programmer can still produce code that is better optimized than
a compiler can produce, but it would take that expert days to optimize
code that the compiler can optimize in seconds. It is, in most cases, a
horrible waste of time (==money) to let humans do something that
computers are so much better at. Algorithm development is a much more
profitable use of expensive human expertise than routine optimization.
[1] Joke. Not sure what the next revision of C will be called. I
can't see there being many changes at this point, C should be about
ready to be carved in stone by now.

Nah, lots left to be done.

Are you saying that there are major changes to the C standard in the
offing? I don't see how that could be without changing fundamentals,
and we all know that that is about as likely as porcine avians.

The latest draft is dated 2010-12-02, and can be found at
 

Seebs

That depends. Every function has its own ranking of relative hotness
within the application. If you've got money to burn, you can afford
to throw as many programmers as you want at the problem. Maybe you
have so much money you don't care that they're optimizing the
documentation tools. All I know is that I have programming
proclivities that lead to Cycle Conservation, and who could argue
against that?

Someone who has had to debug code that was written with "efficient" habits.
I've heard that before, and what I say is that the same skill that is
used to write the code is also used in debugging the code. I really
don't see how I'm going to write myself into a situation where I cannot
debug any system that I wrote.

Very easily. It is harder to understand how code is failing than to
understand how it's supposed to work.
Sure there can, but it must rely on implementation-specific
documentation.

And even given that... Who says they'll not change it later? Nothing
to keep them from doing so. Who says the documentation is right? Who
says they'll fix a "bug" in this stuff?

It's the kind of thing people tend to not care about.
Maybe, but I think you presume too much when you assume that the
compiler should be the arbiter of optimization. Possibly some far-in-
the-future compiler will have strong-AI capabilities wrt code
generation, but until that time the human programmer will probably be
able to do a better job than the compiler. All things being equal.

Well, uhm.

No.

It's been at least a decade, probably longer, during which humans have
consistently lost to compilers at optimization.
Are you saying that there are major changes to the C standard in the
offing? I don't see how that could be without changing fundamentals,
and we all know that that is about as likely as porcine avians.

I don't know about "major", but certainly "significant". Compound literals
were a major change, and I wouldn't be surprised to see comparably significant
changes in future revisions.

-s
 

Uncle Steve

On 05/16/2011 09:16 PM, Uncle Steve wrote:
...

Average C programmers fell behind compilers in their ability to perform
optimization at least a decade ago; probably more. An expert assembly
language programmer can still produce code that is better optimized than
a compiler can produce, but it would take that expert days to optimize
code that the compiler can optimize in seconds. It is, in most cases, a
horrible waste of time (==money) to let humans do something that
computers are so much better at. Algorithm development is a much more
profitable use of expensive human expertise than routine optimization.

Yeah, well algorithm development isn't all that sexy these days. It's
all about the Next! Great! Platform! like Java or whatever. Ruby
looks good, and just as soon as I free up some CFT, I'm really going
to check it out.
[1] Joke. Not sure what the next revision of C will be called. I
can't see there being many changes at this point, C should be about
ready to be carved in stone by now.

Nah, lots left to be done.

Are you saying that there are major changes to the C standard in the
offing? I don't see how that could be without changing fundamentals,
and we all know that that is about as likely as porcine avians.

The latest draft is dated 2010-12-02, and can be found at
— conditional (optional) features (including some that were previously mandatory)
— support for multiple threads of execution including an improved memory sequencing
model, atomic objects, and thread-local storage (<stdatomic.h> and
<threads.h>)
— additional floating-point characteristic macros (<float.h>)
— querying and specifying alignment of objects (<stdalign.h>, <stdlib.h>)
— Unicode characters and strings (<uchar.h>) (originally specified in
ISO/IEC TR 19769:2004)
— type-generic expressions
— static assertions
— anonymous structures and unions
— no-return functions
— macros to create complex numbers (<complex.h>)
— support for opening files for exclusive access
— removed the gets function (<stdio.h>)
— added the aligned_alloc, at_quick_exit, and quick_exit functions
(<stdlib.h>)
— (conditional) support for bounds-checking interfaces (originally specified in
ISO/IEC TR 24731−1:2007)
— (conditional) support for analyzability
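
As a taste of two of those additions, here is a small illustrative
sketch in C11 syntax (not from the thread):

#include <stdio.h>

/* Static assertion: checked at compile time. */
_Static_assert(sizeof(int) >= 2, "int is too small");

/* Type-generic expression: selects by the operand's type. */
#define type_name(x) _Generic((x), \
    int: "int", double: "double", default: "other")

int main(void) {
    printf("%s\n", type_name(42));  /* prints "int" */
    printf("%s\n", type_name(1.0)); /* prints "double" */
    return 0;
}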

Yikes! Well, I knew RDMA was going to make waves, but I haven't been
programming with threads long enough to really get a feel for how it
might affect the language. And then there's all that math stuff.

I guess there really is a lot yet to do.



Regards,

Uncle Steve
 

Uncle Steve

Someone who has had to debug code that was written with "efficient" habits.


Very easily. It is harder to understand how code is failing than to
understand how it's supposed to work.

If you don't understand the machine. Or if there's stuff going on
that's undocumented. Otherwise, what's the big deal?
And even given that... Who says they'll not change it later? Nothing
to keep them from doing so. Who says the documentation is right? Who
says they'll fix a "bug" in this stuff?

It's the kind of thing people tend to not care about.


Well, uhm.

No.

It's been at least a decade, probably longer, during which humans have
consistently lost to compilers at optimization.

What do you mean by "optimization"?
I don't know about "major", but certainly "significant". Compound literals
were a major change, and I wouldn't be surprised to see comparably significant
changes in future revisions.

Yeah, well maybe I'll just become one of those weird people who lock
in all their code to C99, no matter what developments arise. I could
do that, you know. I'm a weirdo.



Regards,

Uncle Steve
 

Seebs

If you don't understand the machine. Or if there's stuff going on
that's undocumented. Otherwise, what's the big deal?

Otherwise, the big deal is that if there's a bug, *by definition* you
don't quite understand how the code works. Figuring it out necessarily
involves understanding something you didn't understand when writing it.

If something is near the limits of what I can design and code correctly,
it will be Too Hard to debug. I might be able to muddle through, but in
general, it'll stump me.

So I carefully write stuff that's as simple as I can make it, to reduce
the risk.
What do you mean by "optimization"?

I mean "trying to make code faster by means other than algorithm design".
Stuff like, say, trying to hand-tune assembly, or trying to decide when
a loop should or shouldn't be unrolled, anything like that. The kind of
thing that is implied by trying to decide whether something will fit in
an instruction cache.

-s
 

Uncle Steve

Otherwise, the big deal is that if there's a bug, *by definition* you
don't quite understand how the code works. Figuring it out necessarily
involves understanding something you didn't understand when writing it.

Your approach to the problem assumes bugs a priori. It's fairly
difficult to argue against a position where someone has already
assumed the concluding point.
If something is near the limits of what I can design and code correctly,
it will be Too Hard to debug. I might be able to muddle through, but in
general, it'll stump me.

So I carefully write stuff that's as simple as I can make it, to reduce
the risk.

Of course. No programmers expose themselves to the risk of error
intentionally. Slow and steady as she goes is the order of the day in
programming.
I mean "trying to make code faster by means other than algorithm design".
Stuff like, say, trying to hand-tune assembly, or trying to decide when
a loop should or shouldn't be unrolled, anything like that. The kind of
thing that is implied by trying to decide whether something will fit in
an instruction cache.

I suppose it depends on the application. Some people go to lengths to
avoid cache-line thrashing, and all the rest. Most applications don't
require that level of analytical detail, and so your point is valid
for the majority of cases.



Regards,

Uncle Steve
 

Seebs

Your approach to the problem assumes bugs a priori.

Why, yes.

I am totally open to reconsidering this if I am ever exposed to an example
of a bug-free program. And no, I haven't seen one yet. And I'm including
things like "an assembly program which exists only to execute a single
instruction which takes no operands".
I suppose it depends on the application. Some people go to lengths to
avoid cache-line thrashing, and all the rest. Most applications don't
require that level of analytical detail, and so your point is valid
for the majority of cases.

And the people who are doing that are often wrong -- if not on the machine
they started on, on the machine their code goes live on.

There are a few people doing stuff on multi-billion dollar supercomputers
who might not be stupid to do that stuff. Not many. Mostly, it's better
to spend the time on things that will have higher payoffs.

-s
 
