Getting the size of a C function


Mark Borgerson

Anything that relies on the compiler being stupid, or deliberately
crippled ("disable all optimisations") or other such nonsense is a bad
solution. It is conceivable that it might happen to work - /if/ you can
get the compiler in question to generate bad enough code. But it is
highly dependent on the tools in question, and needs to be carefully
checked at the disassembly level after any changes.

In this particular example of a highly risky solution, what happens when
the compiler generates proper code? The compiler is likely to generate
the equivalent of :

int MoveMe(..., bool findend) {
if (findend) "jump" Markend();
// do all the stuff
}

Or perhaps it will inline Markend, MoveMe, or both. Or maybe it will
figure out that MoveMe is never called with "findend" set, and thus
optimise away that branch. All you can be sure of, is that there is no
way you can demand that a compiler produces directly the code you
apparently want it to produce - C is not assembly.
That's true. But it is also true that you can verify that a particular
compiler DOES produce the desired code and use that code effectively.
For embedded programming, it doesn't particularly matter if 50
other compilers don't produce what you want, as long as the compiler
you are using does.

Mark Borgerson
 

Mark Borgerson

You get good and bad compilers for all sorts of processors, and even a
half-decent one will be able to move code around if it improves the
speed or size of the target - something that can apply on any size of
processor.



I don't know about typical "comp.lang.c" programmers, but typical
"comp.arch.embedded" programmers use compilers that generate tight code,
and they let the compiler do its job without trying to force the tools
into their way of thinking. At least, that's the case for good embedded
programmers - small and fast code means cheap and reliable
microcontrollers in this line of work. And code that has to be
disassembled and manually checked at every change is not reliable or
quality code.
None of that is at odds with writing a flash update routine once,
verifying that the end of the code is properly marked and using
the code. If you are worried about changes in optimization
levels for future compiles, you can generate a binary library
for the flash update function and link that in future applications.
AFAIK, linking a library does not generally result in any change
in the binary code of the library if the library was generated
from position-independent code. (And, if you're going to copy
a function to a different location for execution, it had better
be position-independent.)
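Mark's copy-before-use step can be sketched in portable C. The names ram_buffer, ram_fn_t and the buffer size below are invented for illustration, and the start/end pair is assumed to come from end-of-function markers or linker symbols that have been verified against the map file:

```c
#include <stdint.h>
#include <string.h>

typedef void (*ram_fn_t)(void);

static uint8_t ram_buffer[512];   /* RAM reserved for the relocated copy */

/* Copy a (position-independent) function image into RAM and return a
   callable pointer, or NULL if it does not fit.  The start/end pointers
   are assumed to bracket the function -- something that must be checked
   against the map file or disassembly, as discussed above. */
static ram_fn_t copy_to_ram(const void *start, const void *end)
{
    size_t len = (size_t)((const uint8_t *)end - (const uint8_t *)start);
    ram_fn_t fn;
    void *dst = ram_buffer;

    if (len > sizeof ram_buffer)
        return NULL;
    memcpy(ram_buffer, start, len);
    /* Convert the object pointer to a function pointer via memcpy,
       since C does not define a direct cast between the two. */
    memcpy(&fn, &dst, sizeof fn);
    return fn;
}
```

Calling through the returned pointer is only safe if the routine really is position-independent and calls nothing left behind in flash.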


That said, I will have to look very carefully at some of the
MSP430 code that I have generated---the compiler may access
I/O locations using PC-relative addressing. That would totally
mess up code that got simply copied to RAM. However, that's
an altogether different problem than simply finding the length
of the function.

Mark Borgerson
 

David Brown

You give me a great way to segue into something. There are
cases where you simply have no other option than to do
exactly that. I'll provide one example. There are others.

In embedded development, /every/ rule has an exception, except this one :).

There are definitely times when you have to manually check your outputs,
or write code that only works with specific compiler options, or add
assembly code hacks that rely on details of the compiler working. But
you don't do it unless you have no better way - you certainly don't
design in your hacks at the first step.

Another rule for embedded development is always know your tools, and
preferably pick /good/ tools. Microchip are known to be good for many
things - the quality of their 16-bit PIC C compilers is definitely not
one of them.
 

David Brown

On 24 Jan, 21:44, David Brown<[email protected]>
wrote:
...

I *think* Mark is aware of the limitations of his suggestion but there
seems to be no C way to solve the OP's problem. It does sound like the
problem only needs to be solved as a one-off in a particular
environment.

You are correct that there is no standard C way to solve the problem.
But for the majority of compilers used in embedded development, there
are ways that will reliably solve this problem when working /with/ the
compiler, rather than /against/ the compiler. We are not trying to get
a highly portable solution here, but it is always better to find a
design that could be reused if possible. And it is always better to
work with the features of your toolset, especially when there is no
standard C solution, rather than trying to find ways to limit your tools.

For this problem, the best solution is generally to use a specific
section for the functions in question. This can often be done using the
gcc "__attribute__" syntax (even for non-gcc compilers), or by using
compiler-specific pragmas. Any tools suitable for embedded development
will support something to this effect, and give you control over the
linking and placement of the function (this is assuming, of course, you
are working with a microcontroller that supports execution from ram).

The details of how you do this depend on the situation. For example,
you may be happy to dedicate the required ram space to the function, or
you may want to copy it into ram only when needed. The former case is
the easiest, as you can arrange for the linker to put the code in flash,
but linked as though it were in ram. There is no need for any
position-independent code, and you can happily debug and step through
the code in ram. You can often "cheat" and put the code in the ".data"
section, then you don't even have to think about the linker file or
copying over the function - the C startup code handles that (since it
treats the function like initialised data). With gcc on the msp430, you
have your function defined something like this:

static void __attribute__ ((critical, section(".data"))) progflash(...)
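A slightly fuller sketch of the same trick might look like the following. The function body and parameter names are invented for illustration; the actual flash-write sequence is device-specific and must come from the device manual, and the whole fragment is mspgcc/target-specific rather than portable C:

```
#include <stdint.h>
#include <stddef.h>

/* Linked into .data: the C startup code copies it into RAM along with
   the initialised variables, so it runs from RAM with no extra effort. */
static void __attribute__ ((section(".data")))
progflash(uint16_t *dest, const uint16_t *src, size_t words)
{
    /* hypothetical write loop -- no calls to functions still in flash,
       no library calls, interrupts disabled around the actual writes */
    while (words--)
        *dest++ = *src++;
}
```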


Of course, you still have to ensure that the function doesn't call other
functions - or that these are also in ram. And it is worth checking the
disassembly here if you are not sure - it is easy to accidentally
include library functions calls. But the difference is that you have a
reliable and safe way to achieve the effect you want, that is
independent of details such as the compiler flags or precise compiler
version, and will continue to work even if the source is changed.
Because you are working /with/ the tools, you can take full advantage of
debugging and optimisation. And though the details may vary for
different processors or toolchains, the principle can be re-used. As
with all code that cannot be implemented in standard C, there is always
the possibility of this solution failing with future compilers or
different devices, and you must check the results carefully - but this
is the best you can get.
That said, what about taking function pointers for all functions and
sorting their values? It still wouldn't help with the size of the last
function. Can we assume the data area would follow the code? I guess
not.

You can't make any assumptions about the ordering of code or data. You
cannot practically speaking make function pointers for all functions
without a great deal of effort, and making an unnecessary pointer to a
function cripples the compiler's optimisations of that function and
functions that call it.
 

David Brown

That's true. But it is also true that you can verify that a particular
compiler DOES produce the desired code and use that code effectively.
For embedded programming, it doesn't particularly matter if 50
other compilers don't produce what you want, as long as the compiler
you are using does.

True enough - but it /does/ matter that the compiler you are using
produces the code you want each of the 50 times you change and compile
the program, or when you change the compiler flags and compile them, or
when you update the compiler and recompile (I recommend keeping exactly
the same compiler version for any given project, but sometimes that is
not practical). If you have code that relies on working around the
compiler, you need to check it /every/ time, and you are never able to
take advantage of your tools to generate the best code.
 

Jon Kirwan

In embedded development, /every/ rule has an exception, except this one :).
:)

There are definitely times when you have to manually check your outputs,
or write code that only works with specific compiler options, or add
assembly code hacks that rely on details of the compiler working. But
you don't do it unless you have no better way - you certainly don't
design in your hacks at the first step.

Just to be argumentative (no other good reason, really), one
of my applications requires equal execution times across two
code edges. In other words, the execution time must be
constant regardless which branch is taken. c doesn't provide
for that, quite simply. So the very first thing I do porting
this application to a new processor is to ensure that I can
achieve this well, or if not, exactly what the variability
will be (because I must then relax the clocking rate to
account for it.) It's one of those unknowns that must be
locked down, immediately.

So yes, I hack at the very first step in this case. But I'm
just toying. In general, I take your point here.

......

As an aside, one of the first things I may do with a new c
compiler and target is to explore methods to support process
semantics. The c language doesn't provide quite a number of
very useful semantics, this being one of them.

(Another I enjoy the use of is named, link-time constants.
They are not variable instances, in case you are confused
about my wording here. Instead, they are much like #define
in c except that these constants are link-time, not compile-
time, and if you change them there is no need to recompile
all the c code that uses them. You just change one file that
creates those constants and re-link. The linker patches in
the values, directly. Saves recompile time. Probably every
assembler supports them, and every linker _must_ support
them. But c does not provide syntax to access the semantic
that is available in its own linker.)
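For what it's worth, the closest C-side idiom -- toolchain-dependent, not standard C -- is to declare the link-time constant as an external object and use its *address* as the value, with the definition living on the assembler or linker-script side. The symbol name here is just the QUANTUM example from above:

```
/* Assembler side (GNU as syntax):
       .global QUANTUM
       .set    QUANTUM, 47
   or equivalently, in a GNU ld linker script:
       QUANTUM = 47;                                       */

/* C side: the symbol occupies no storage; only its address carries
   the value, which the linker patches in at link time. */
#include <stdint.h>
extern char QUANTUM;
#define QUANTUM_VALUE ((uintptr_t)&QUANTUM)
```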

With cooperative switching (and I use that where possible,
because it is much easier to implement and support) I may be
able to write fairly simple routines in assembly to support
it (a dozen lines, or two.) But there is no escaping the
idea that whatever I do there relies on details about the
compiler. Different compilers on the MSP430, for example,
make different choices about register assignments, which must
be preserved across calls, which are scratchable, and which
are used to optionally pass parameters (and the conditions
under which registers may be chosen to pass them.)

With pre-emptive switching, it opens up a Pandora's box.
Library routines that may use static memory, for example. But
if pre-emptive switching is a part of the product, then I
face the problems squarely and usually up front in the
development. It's crucial to know exactly what works and how
well it works, right away.

I also enjoy the use of coroutine thunking, from time to
time. This, and process semantics, make for clear, very
readable code that works well and is able to be maintained by
a broader range of programmers (so long as they don't try and
rewrite the core o/s code, of course.)

I still take your point. But I hope you don't mind a small
moment of banter just to add to your suggestion that every
rule has exceptions, including the rule of not hacking things
at the outset. ;)
Another rule for embedded development is always know your tools, and
preferably pick /good/ tools. Microchip are known to be good for many
things - the quality of their 16-bit PIC C compilers is definitely not
one of them.
<snip>

Well, there is that. I cannot defend their use of _static_
memory for compiler temporaries, as they chose to do. It's
unconscionable. Their argument to me (one or two of those
who actually _wrote_ its code) was that it led to faster
emitted code -- in short, it appeared to show off their parts
better. And they felt they "had it covered."

Well, they were wrong and a false bargain was made.

I'm sure they aren't the only ones guilty of choosing to sell
the smell of sizzle over the quality of meat, though. Not by
a long shot.

Jon
 

David Brown

Just to be argumentative (no other good reason, really), one

Being argumentative /is/ a good reason if it makes us think.
of my applications requires equal execution times across two
code edges. In other words, the execution time must be
constant regardless which branch is taken. c doesn't provide
for that, quite simply. So the very first thing I do porting
this application to a new processor is to ensure that I can
achieve this well, or if not, exactly what the variability
will be (because I must then relax the clocking rate to
account for it.) It's one of those unknowns that must be
locked down, immediately.

So yes, I hack at the very first step in this case. But I'm
just toying. In general, I take your point here.

That's an example of when you need special consideration. My point is
that you only do that sort of thing if you have no better way to
implement the required functionality.
.....

As an aside, one of the first things I may do with a new c
compiler and target is to explore methods to support process
semantics. The c language doesn't provide quite a number of
very useful semantics, this being one of them.

(Another I enjoy the use of is named, link-time constants.
They are not variable instances, in case you are confused
about my wording here. Instead, they are much like #define
in c except that these constants are link-time, not compile-
time, and if you change them there is no need to recompile
all the c code that uses them. You just change one file that
creates those constants and re-link. The linker patches in
the values, directly. Saves recompile time. Probably every
assembler supports them, and every linker _must_ support
them. But c does not provide syntax to access the semantic
that is available in its own linker.)

Are you talking about using constants in your code which are evaluated
at link time, much in the way that static addresses are handled? Maybe
I've misunderstood you, but that strikes me as a poor way to handle what
are really compile-time constants - it's bad modularisation and
structure (sometimes a single file is the best place to put these
constants - but it should be because that's the best place, not because
you want to fit some weird way of compiling). It is highly
non-standard, potentially leading to confusion and maintenance issues.
It also limits the compiler's options for optimising the code. And if
re-compilation time is a serious issue these days, you need to consider
getting better tools (PC and/or compiler), or making better use of them
(better makefile setup, or use ccache).

Of course, it is always fun getting your tools to do interesting things
in unusual ways - but it's not always a good idea for real work.
With cooperative switching (and I use that where possible,
because it is much easier to implement and support) I may be
able to write fairly simple routines in assembly to support
it (a dozen lines, or two.) But there is no escaping the
idea that whatever I do there relies on details about the
compiler. Different compilers on the MSP430, for example,
make different choices about register assignments, which must
be preserved across calls, which are scratchable, and which
are used to optionally pass parameters (and the conditions
under which registers may be chosen to pass them.)

Yes, these are more examples of where you need to work with the compiler
details.
With pre-emptive switching, it opens up a Pandora's box.
Library routines that may use static memory, for example. But
if pre-emptive switching is a part of the product, then I
face the problems squarely and usually up front in the
development. It's crucial to know exactly what works and how
well it works, right away.

I also enjoy the use of coroutine thunking, from time to
time. This, and process semantics, make for clear, very
readable code that works well and is able to be maintained by
a broader range of programmers (so long as they don't try and
rewrite the core o/s code, of course.)

I still take your point. But I hope you don't mind a small
moment of banter just to add to your suggestion that every
rule has exceptions, including the rule of not hacking things
at the outset. ;)


Well, there is that. I cannot defend their use of _static_
memory for compiler temporaries, as they chose to do. It's
unconscionable. Their argument to me (one or two of those
who actually _wrote_ its code) was that it led to faster
emitted code -- in short, it appeared to show off their parts
better. And they felt they "had it covered."

It is certainly perfectly possible to use static memory for compiler
temporaries, and it will certainly be faster than the normal alternative
(temporaries on a stack) for many small processors. But it has to be
implemented correctly!
 

Mark Borgerson

True enough - but it /does/ matter that the compiler you are using
produces the code you want each of the 50 times you change and compile
the program, or when you change the compiler flags and compile them, or
when you update the compiler and recompile (I recommend keeping exactly
the same compiler version for any given project, but sometimes that is
not practical). If you have code that relies on working around the
compiler, you need to check it /every/ time, and you are never able to
take advantage of your tools to generate the best code.
Unless you put the function into a separately-compiled library to
be linked in when you build the program the next 50 times. If
you change compilers, you may have to rebuild and verify the
library.


Mark Borgerson
 

Jon Kirwan

Being argumentative /is/ a good reason if it makes us think.


That's an example of when you need special consideration. My point is
that you only do that sort of thing if you have no better way to
implement the required functionality.

Understood, and agreed.
Are you talking about using constants in your code which are evaluated
at link time, much in the way that static addresses are handled? Maybe
I've misunderstood you, but that strikes me as a poor way to handle what
are really compile-time constants - it's bad modularisation and
structure (sometimes a single file is the best place to put these
constants - but it should be because that's the best place, not because
you want to fit some weird way of compiling). It is highly
non-standard, potentially leading to confusion and maintenance issues.
It also limits the compiler's options for optimising the code. And if
re-compilation time is a serious issue these days, you need to consider
getting better tools (PC and/or compiler), or making better use of them
(better makefile setup, or use ccache).

Of course, it is always fun getting your tools to do interesting things
in unusual ways - but it's not always a good idea for real work.

Well, you are of course correct in the sense that a specific
constant value shouldn't be scattered throughout a series of
modules like casting dust to the winds. It's not a good
idea. Your point is wisely made. However, you are also
wrong in suggesting, once again, some absolute rule that
_always_ applies. In this case, my point remains because
there is _some_ need for the semantic. It doesn't matter if
there are better ways for most things, if there are some
times a need for this semantic.

I think you understood me, correctly. Just in case there is
any question at all, I'm talking about this semantic, if you
are familiar with the Microsoft assembler:

QUANTUM EQU 47
PUBLIC QUANTUM

You can't do that in c. There is no syntax for it.

In the above example, this constant might be the default
number of timer ticks used per process quantum in a round
robin arrangement. But as you say, you are correct to
suggest that this kind of value usually only needs placement
in a single module, so the advantage may arguably be reduced
to a theoretical one, not a practical one. (Though I suppose
I could always posit a specific case where this QUANTUM might
be used in several reasonable places.)

However, there are times where there are values which may be
required in several modules. These may be field masks and
init values, for example, of hardware registers or software
control flags. It's not always the case that writing a
specific subroutine to compose them for you is the better
solution. Sometimes, it's better to expose the constants,
broadly speaking, and use them in a simple, constant-folding,
c language way. Libraries in c are riddled with these.

In addition, these public link-time constants can be used to
conditionally include or exclude code sections. In fact,
almost every compiler uses this fact in one way or the other.
CRT0, in particular, may take advantage of such features to
conditionally include or exclude initialization code for
libraries which may, or may not, have been linked in. And
most linkers support the concept in some fashion -- because
it is needed.

And yes, I'd sometimes like c-level access to it.
Yes, these are more examples of where you need to work with the compiler
details.

Yes. No question.
It is certainly perfectly possible to use static memory for compiler
temporaries, and it will certainly be faster than the normal alternative
(temporaries on a stack) for many small processors. But it has to be
implemented correctly!

Well, _if_ one is going to use statics _then_ of course it
has to be implemented correctly! Who could argue otherwise?

The problem is in the _doing_ of that. It requires (or at
least I imagine so, right now, being ignorant of a better
way) looking at the entire program block to achieve. And
that is a bit of a step-change away from the usual c compiler
mode of operation. It _might_ be implemented in the linker
stage, I suppose. Though I'm struggling to imagine something
a little less than a Rube Goldberg contraption to get there
in the linker side.

As an aside, I have a lot of other things I'd like in c or
c++ which just aren't there. For example, I dearly miss
having access to thunking semantics in c or c++ (which does
NOT break the c/c++ program model in any way, shape, or form,
and could easily be implemented as part of either language
with no dire impacts at all). I might use this for efficient
iterators (don't imagine that I'm talking about std library
iterators here, which are somewhat similar in use but in no
way similar in their implementation details -- they are much
less efficient). There is no good
reason I can think of not to have it and its utility is
wonderful. (I'd be so happy to talk about it, at some point,
as the examples are excellent and easily shown.)

Jon
 

Grant Edwards

Ditto.

If the function is not linked to the RAM run address then it
probably won't work.

Did this a year or so ago with Metrowerks and an HC12 but
don't remember quite how. It wasn't terribly hard once one
knows what is needed. The linker properly generated constant
"variables" containing start and end addresses for me. Quite
properly it would not generate the runtime code for moving the
image from FLASH to RAM as this code was not intended to
permanently reside in RAM.

You've also got to be careful in RAM-resident routine not to
write any code that generates library calls. On a 16-bit CPU,
doing long arithmetic will likely generate a library call, so
be mindful of which integer types you use.
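Grant's warning can be made concrete: on a 16-bit part, an innocent-looking 32-bit multiply may compile into a call to a runtime helper such as __mulsi3, which would still live in flash. One hedged workaround (a sketch, not a drop-in) is to keep the operands 16-bit and build the wide product inline by shift-and-add, so nothing in the RAM-resident routine calls back into the library:

```c
#include <stdint.h>

/* 16x16 -> 32 multiply done inline with shifts and adds, avoiding a
   possible library call for the 32-bit multiply on a 16-bit target.
   (32-bit add and shift-by-one are normally generated inline.) */
static uint32_t mul16_inline(uint16_t a, uint16_t b)
{
    uint32_t result = 0;
    uint32_t addend = a;
    while (b) {
        if (b & 1)
            result += addend;   /* add this power-of-two multiple */
        addend <<= 1;
        b >>= 1;
    }
    return result;
}
```

Whether a given expression actually produces a library call still has to be confirmed in the disassembly, as Grant says.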
 

David Brown

Understood, and agreed.


Well, you are of course correct in the sense that a specific
constant value shouldn't be scattered throughout a series of

Constant values should, like everything else, be declared and defined in
the place that makes most sense for the structure of the program. That
may mean just locally within a module or function, or in a module's
header file, in a program-global header, or occasionally declared in a
header and defined in an implementation file (for better data hiding,
though perhaps missed optimisation opportunities). So the only rule
here is to put them in the right place for the program, not just because
it shaves a quarter second off the re-compile time.
modules like casting dust to the winds. It's not a good
idea. Your point is wisely made. However, you are also
wrong in suggesting, once again, some absolute rule that
_always_ applies. In this case, my point remains because

Rules in embedded development are never absolute - we have both said as
much in this thread. But there are plenty of rules, written and
unwritten, that are strong enough to state as though they always apply.
If you feel you need to break them, you do so when you have clear and
reasoned arguments why your software will be better with the rule broken.
there is _some_ need for the semantic. It doesn't matter if
there are better ways for most things, if there are some
times a need for this semantic.

I think you understood me, correctly. Just in case there is
any question at all, I'm talking about this semantic, if you
are familiar with the Microsoft assembler:

QUANTUM EQU 47
PUBLIC QUANTUM

You can't do that in c. There is no syntax for it.

I can't imagine a time when I would need to do that, or any problem it
might solve. Sometimes you want symbols that are defined at the linker
level to be exported to C - the start and end of a section, for example
- but I don't see any reason to pass constants around within the C
program itself in this way. I suppose you might have an assembly module
which exports constants that you then want to use in C, but it would be
better to use a common #define that is available to both C code and the
assembly code.
In the above example, this constant might be the default
number of timer ticks used per process quantum in a round
robin arrangement. But as you say, you are correct to
suggest that this kind of value usually only needs placement
in a single module, so the advantage may arguably be reduced
to a theoretical one, not a practical one. (Though I suppose
I could always posit a specific case where this QUANTUM might
be used in several reasonable places.)

Can you give me an example in which this is actually a required way to
handle such constants? As I said above, you could have "#define QUANTUM
47" in a header that is included by both the C code and assembly code as
needed. (And if your assembler doesn't like that sort of syntax, get a
better assembler. If that fails, between the C preprocessor and the
assembler's macro capabilities, you should be able to concoct a common
way of including the constants. And if that also fails, write a script
that is called by your Makefile and generates the required headers and
include files.)
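The shared-header approach can be sketched like this. The file name and value are just the QUANTUM example from above; __ASSEMBLER__ is predefined by GCC when it preprocesses assembly, so the C-only parts can be guarded out:

```c
/* quantum.h -- one definition shared by C and (cpp-preprocessed) assembly */
#define QUANTUM 47

#ifndef __ASSEMBLER__
/* C-only part: the assembler never sees this. */
static inline int ticks_remaining(int elapsed)
{
    return QUANTUM - elapsed;
}
#endif
```

An assembly file would then be run through the preprocessor (e.g. gcc -x assembler-with-cpp) and could use QUANTUM directly.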
However, there are times where there are values which may be
required in several modules. These may be field masks and
init values, for example, of hardware registers or software
control flags. It's not always the case that writing a
specific subroutine to compose them for you is the better
solution. Sometimes, it's better to expose the constants,
broadly speaking, and use them in a simple, constant-folding,
c language way. Libraries in c are riddled with these.

I agree entirely that exposing the global constants is often the best
way of using such values - I hate these silly little "accessor"
functions people write because they think that global data (variable or
constant) is somehow "bad", and it's better to write an inefficient and
unclear global function instead. But your constants should be available
to the compiler at compile time if at all possible - having them
available only at link time wastes your compiler's strengths.
In addition, these public link-time constants can be used to
conditionally include or exclude code sections. In fact,
almost every compiler uses this fact in one way or the other.
CRT0, in particular, may take advantage of such features to
conditionally include or exclude initialization code for
libraries which may, or may not, have been linked in. And
most linkers support the concept in some fashion -- because
it is needed.

And yes, I'd sometimes like c-level access to it.

On devices for which I write my own CRT0 or other pre-C startup code, I
write it in C. Typically there's a couple of lines of assembly to set
the stack pointer and jump to the C startup function, but things like
clearing .bss and copying .data are all done in C. If I wanted sections
that may or may not be included, I'd use C - either with #if's, or by
relying on the compiler to eliminate dead code. And some symbols, such
as section start and end points, are passed from the linker into the C
code - but only those that /must/ be passed in that way.
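The C-language CRT0 work described here -- copying .data and clearing .bss -- reduces to two loops over linker-provided boundaries. In this self-contained sketch the linker symbols are replaced by ordinary arrays so it can run anywhere; on a real target they would be extern symbols supplied by the linker script (e.g. __data_load, __data_start, __bss_start, __bss_end):

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-ins for linker-script symbols, so the sketch is self-contained. */
static uint8_t data_load[4] = { 1, 2, 3, 4 };          /* image in "flash" */
static uint8_t data_ram[4]  = { 0xFF, 0xFF, 0xFF, 0xFF }; /* .data in RAM  */
static uint8_t bss_ram[4]   = { 0xAA, 0xAA, 0xAA, 0xAA }; /* .bss in RAM   */

/* The C part of a CRT0: copy .data from its load address, then zero .bss.
   On a real target this runs after a few assembly lines set the stack. */
static void crt0_init(void)
{
    for (size_t i = 0; i < sizeof data_ram; i++)
        data_ram[i] = data_load[i];
    for (size_t i = 0; i < sizeof bss_ram; i++)
        bss_ram[i] = 0;
}
```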
Yes. No question.


Well, _if_ one is going to use statics _then_ of course it
has to be implemented correctly! Who could argue otherwise?

Apparently the quality controllers and testers of your particular
compiler would argue otherwise!
The problem is in the _doing_ of that. It requires (or at
least I imagine so, right now, being ignorant of a better
way) looking at the entire program block to achieve. And
that is a bit of a step-change away from the usual c compiler
mode of operation. It _might_ be implemented in the linker
stage, I suppose. Though I'm struggling to imagine something
a little less than a Rube Goldberg contraption to get there
in the linker side.

It certainly isn't an easy problem to use statics for temporary data in
a way that makes efficient use of memory - and of course, in a way that
is safe and correct. I'm sure Walter Banks could tell you all about it
(though he might consider it a trade secret).
As an aside, I have a lot of other things I'd like in c or
c++ which just aren't there. For example, I dearly miss
having access to thunking semantics in c or c++ (which does
NOT break the c/c++ program model in any way, shape, or form,
and could easily be implemented as part of either language
with no dire impacts at all). I might use this for efficient
iterators (don't imagine that I'm talking about std library
iterators here, which are somewhat similar in use but in no
way similar in their implementation details -- they are much
less efficient). There is no good
reason I can think of not to have it and its utility is
wonderful. (I'd be so happy to talk about it, at some point,
as the examples are excellent and easily shown.)

What do you mean by "thunking" in this context? The term has several
meanings, as far as I know. If your answer is going to take more than a
dozen lines (you are better known for your in-depth explanations than
your short summaries!), it should probably be in its own thread.

mvh.,

David
 

Jon Kirwan

Constant values should, like everything else, be declared and defined in
the place that makes most sense for the structure of the program. That
may mean just locally within a module or function, or in a module's
header file, in a program-global header, or occasionally declared in a
header and defined in an implementation file (for better data hiding,
though perhaps missed optimisation opportunities). So the only rule
here is to put them in the right place for the program, not just because
it shaves a quarter second off the re-compile time.


Rules in embedded development are never absolute - we have both said as
much in this thread. But there are plenty of rules, written and
unwritten, that are strong enough to state as though they always apply.
If you feel you need to break them, you do so when you have clear and
reasoned arguments why your software will be better with the rule broken.


I can't imagine a time when I would need to do that, or any problem it
might solve. Sometimes you want symbols that are defined at the linker
level to be exported to C - the start and end of a section, for example
- but I don't see any reason to pass constants around within the C
program itself in this way. I suppose you might have an assembly module
which exports constants that you then want to use in C, but it would be
better to use a common #define that is available to both C code and the
assembly code.


Can you give me an example in which this is actually a required way to
handle such constants? As I said above, you could have "#define QUANTUM
47" in a header that is included by both the C code and assembly code as
needed. (And if your assembler doesn't like that sort of syntax, get a
better assembler. If that fails, between the C preprocessor and the
assembler's macro capabilities, you should be able to concoct a common
way of including the constants. And if that also fails, write a script
that is called by your Makefile and generates the required headers and
include files.)


I agree entirely that exposing the global constants is often the best
way of using such values - I hate these silly little "accessor"
functions people write because they think that global data (variable or
constant) is somehow "bad", and it's better to write an inefficient and
unclear global function instead. But your constants should be available
to the compiler at compile time if at all possible - having them
available only at link time wastes your compiler's strengths.


On devices for which I write my own CRT0 or other pre-C startup code, I
write it in C. Typically there's a couple of lines of assembly to set
the stack pointer and jump to the C startup function, but things like
clearing .bss and copying .data are all done in C. If I wanted sections
that may or may not be included, I'd use C - either with #if's, or by
relying on the compiler to eliminate dead code. And some symbols, such
as section start and end points, are passed from the linker into the C
code - but only those that /must/ be passed in that way.

Let's just leave it here as an agreement to disagree, then. I
believe you are sincere in considering what I've already
written, probably doing me more justice than I may deserve.
But since you still cannot gather my meaning, I have to
assume it's my fault for writing poorly, and that the effort
may require more time than I care to spend on a semantic of
only small value. We've already made more of it than the newsgroup's
time is worth. So I'm fine leaving the topic behind and just
saying that I have used the semantic before to good use and
sometimes miss it in c. There's no payoff here. Let's just
leave it as a mild disagreement over a not-terribly-important
issue.
Apparently the quality controllers and testers of your particular
compiler would argue otherwise!

I think they simply didn't see it, before. A failure to
imagine as well as they should have done. Nothing more.
It certainly isn't an easy problem to use statics for temporary data in
a way that makes efficient use of memory - and of course, in a way that
is safe and correct. I'm sure Walter Banks could tell you all about it
(though he might consider it a trade secret).

I'm sure he knows a great deal more about the complexities
than I do, so you are most certainly right that he could tell
me about the subject. I've never claimed expertise here.
Merely the ability to observe specific failures when I see
poor results from poorly imagined solutions.
What do you mean by "thunking" in this context? The term has several
meanings, as far as I know. If your answer is going to take more than a
dozen lines (you are better known for your in-depth explanations than
your short summaries!), it should probably be in its own thread.

There are some excellent discussions already available. See,
for example, the Metaware C (and Pascal) compiler manuals and
their implementation of an iterator semantic, as well as their
extensive discussion (with well-made examples) of its
abundant benefits. There is also a somewhat different, but
also interesting discussion made by Randall Hyde in The Art
of Assembly manuals he generously put together some years
back. (Don't worry yourself thumbing through Microsoft's use
of the term.)

....

The following case is not nearly as well thought out,
syntax-wise, as Metaware's implementation (in other words,
don't mistake it for a complete syntax designed by experts),
but it may get the seed of a point across.

for ( p in Primes( 20, 70 ) )
    printf( "%d\n", p );

The code for Primes() might be, in short-hand, something like
this below. (Please excuse my abuse of basic knowledge about
prime numbers by using an increment by 1 in the for loop or
any other tests for starting or ending on an even number only
because I want to keep the code with as few lines as is
reasonable to make the point. The idea is the point, not the
implementation here.)

integer Primes( int a, int b ) {
    int i;
    for ( i = a; i <= b; ++i )
        if ( IsPrime( i ) )
            yield( i );
    return;
}

I'm intentionally being brief, as well as leaving out a short
discussion of each line. Partly, because longer examples may
interfere with the central points. Partly, because you
shouldn't need any discussion and should be able to almost
instantly see what the above code "means" without that extra.
It's in a form that should be plain without a manual.

The place to focus isn't so much on examining Primes(), but
instead more by imagining a wide variety of the types of
for() loops which may require the mechanism itself. (Not my
poor example, which may or may not be useful to anyone.)

For example, you might prefer to imagine that Primes() is
instead a routine that yields all the nodes of some arbitrary
tree or graph using some very specific walking mechanism. If
you use your imagination and let it take you for a ride, then
perhaps the point may be clarified.

If that gets you towards where I'm pointing, then the next
question is how would you implement this in assembly code,
consistent with c compilers you are aware of and in such a
way that does _not_ break an existing linker in the process?

On the other hand, if this doesn't do it for you at all -- if
in short, your imagination isn't moved by those examples
beyond the short distance I walked with them -- then let me
commend again Metaware's c/pascal implementations and Hyde's
AofA documentation before further discussion continues.

But even less than the above made my imagination spin
with ideas when I first came across it. So maybe the above
is enough. I hope so. It's also closely connected, in a
roundabout way, to the idea of nested functions, a la Pascal.
(If you see the connection, then I think you've probably got
the larger picture.)

Jon
 

Jon Kirwan

Okay. I'm not sure whether I am misunderstanding you, or disagreeing
with you, but I'm happy to leave it for now. Maybe the topic will turn
up another time, and it will all suddenly become obvious.

I only introduced it as an aside, to start. It's just not
that important.
Ah, what you are talking about is often called a generator (for example,
in Python or JavaScript), or perhaps a closure. A generator is somewhat
like a lazily evaluated list, although it could be generalised (for
example, do the parameter values have to be the same for repeat calls as
they are for the initial call?).


I've used generators in Python - they are a very nice way to solve some
kinds of problems. Unfortunately, they are not easy to implement
cleanly in a stack-based language because a general implementation
requires that the generator (or "thunk", if you prefer) has its own
stack for arbitrary local variables and state. Thus most languages that
implement generators use garbage collection.

You are way off the reservation, already. Let me suggest you
think about the _implementation_ for a moment. Maybe that
will clarify what I'm pointing towards. For one thing,
garbage collection has _NOTHING_ whatever to do with it. If
you think so, you are far off-point.
You can get some of the features of generators in C++ using a class to
encapsulate the generator's state in class data. The newer lambda
syntax makes it a little neater, and comes somewhat closer to generators
or closures - but there are limitations. You can't implement them in
general without substantial helper code (as seen in boost's libraries)
or a major change in the structure of the language (including garbage
collection).

In short, generators (and closures) are a very nice high-level concept -
you need a high level language (or, somewhat ironically, assembly) to
use them, and C / C++ are not suitable.

I recommend that reading I suggested. Take the AofA one
first. It's easier to get ahold of. It gets into the
details of implementation, but does NOT deal with it as a c
level concept. So you will have to fend for yourself there
until you can get ahold of Metaware docs. (Or, I might be
tempted to copy some of them for this purpose.)

Anyway, I can see I've sent you spinning in the wrong
direction. Take a breath, read AofA on the topic of thunks
and the nearby related chapters to it. That should provide
an idea about implementation. Not the _whole_ idea, by the
way. As it might be done in c, it involves the concept of
nested functions (which you clearly don't yet see) without
the use of the specific syntax you are used to seeing for
them (it's entirely hidden at the c language level, but
explicit at the assembly level.) If you _see_ this much, we
are probably on the same page.

Jon
 

David Brown

You are way off the reservation, already. Let me suggest you
think about the _implementation_ for a moment. Maybe that
will clarify what I'm pointing towards. For one thing,
garbage collection has _NOTHING_ whatever to do with it. If
you think so, you are far off-point.


I recommend that reading I suggested. Take the AofA one
first. It's easier to get ahold of. It gets into the
details of implementation, but does NOT deal with it as a c
level concept. So you will have to fend for yourself there
until you can get ahold of Metaware docs. (Or, I might be
tempted to copy some of them for this purpose.)

I don't have these books and manuals you are referring to, nor do I have
easy access to them. If you have web links of interest then I'll
happily look at them - but I am not going to find and order books just
to read a few pages. This discussion is interesting, but there's a
limit to what is a practical and appropriate use of time and money for a
discussion.
Anyway, I can see I've sent you spinning in the wrong
direction. Take a breath, read AofA on the topic of thunks
and the nearby related chapters to it. That should provide
an idea about implementation. Not the _whole_ idea, by the
way. As it might be done in c, it involves the concept of
nested functions (which you clearly don't yet see) without
the use of the specific syntax you are used to seeing for
them (it's entirely hidden at the c language level, but
explicit at the assembly level.) If you _see_ this much, we
are probably on the same page.

Nested functions are perfectly possible in some extensions to C - in
particular, gcc supports them (since gcc also supports Ada, which has
nested functions, much of the gcc structure already supports nested
functions, and thus the C and C++ front-ends can get them almost for free).

Nested functions, C++ classes, the new C++ lambda syntax, etc., are all
ways to implement a limited form of generator or iterator. Compiler
extensions can be used to make a nicer syntax, and to automate some of
the manual work involved. But without some sort of multiple stack
system or garbage collection, you have serious limitations. I don't
mean to say that these ideas are not useful despite the limitations -
just that you cannot add proper flexible generators to a language with
the sort of structure of C or C++ without fundamental changes to the way
the language works - the compiler would need to be free to allocate (and
free) dynamic memory as needed, rather than through explicit malloc /
new calls.

It could well be that what you call "thunking" really means "generators
with various limitations", in which case you are right that garbage
collection is not needed, and it's reasonably easy to figure out several
good implementations. But the term "thunking" is not a well known or
well-defined expression, and is used in many different ways by different
people - I have no idea how the author of a particular book you've read
happens to use it.

To look at some more general generators, and see why they can be used
much more freely in a language like Python than they can in C or C++,
let's vary your Primes function, using Python syntax so that we have
actual working code:

import math

def IsPrime(i) :
    if i < 2 :
        return False
    for j in range(2, int(math.sqrt(i)) + 1) :
        if (i % j) == 0 :
            return False
    return True

def Primes(a, b) :
    i = a
    if (i == 2) :
        yield i
    if (i % 2 == 0) :
        i = i + 1
    while ( i <= b ) :
        if (IsPrime(i)) :
            yield i
        i = i + 1
    return

for p in Primes(1, 20) :
    print p

To make this implementation of Primes work, the Primes closure has to
include information about where the execution currently is in the Primes
function, and in general it must also track any local variables. This
becomes increasingly difficult for more complex functions, especially
when doing it manually - you have to define a struct (or C++ class) to
hold all the local data, as well as a current "state" which is used for
jumping to the correct re-entry point on later calls. A language
extension could hide and automate much of this, however.

If you were using such generators in a real program, you would want to
use structured and modular programming - and then things get difficult.
To use generators in a stack-based language, you would have to
allocate a structure containing all the local data on the caller
function's stack - that means you (either the programmer figuring out
the closure data manually, or the extended compiler) need access to the
implementation when declaring the generator and using it. With a
garbage collecting language, the generator itself would allocate space
on the heap as needed - the caller need not know anything about the details.

You start getting really high-level programming when you can pass
generators around as parameters and return values. This is something
that cannot be done with a stack model - if a function returns a
generator (or any function which requires closure data), the closure
data must exist even after the calling stack frame has exited. There
are ways to implement this without a general garbage collection facility
(for example, a pointer to a clean-up function could be passed up or
down the call chain while the closure itself is on the heap). But
basically, complex function manipulation like this needs more advanced
automatic control of memory than you get in C or C++.
 

Jon Kirwan

I don't have these books and manuals you are referring to, nor do I have
easy access to them. If you have web links of interest then I'll
happily look at them - but I am not going to find and order books just
to read a few pages. This discussion is interesting, but there's a
limit to what is a practical and appropriate use of time and money for a
discussion.

Go here:
http://webster.cs.ucr.edu/AoA/Windows/PDFs/0_PDFIndexWin.html

Then download and read all of Volume 5.
http://webster.cs.ucr.edu/AoA/Windows/PDFs/Volume5.pdf
http://webster.cs.ucr.edu/AoA/Windows/PDFs/Thunks.pdf
http://webster.cs.ucr.edu/AoA/Windows/PDFs/Iterators.pdf
http://webster.cs.ucr.edu/AoA/Windows/PDFs/Coroutines.pdf
http://webster.cs.ucr.edu/AoA/Windows/PDFs/ParameterImplementation.pdf
http://webster.cs.ucr.edu/AoA/Windows/PDFs/LexicalNesting.pdf
http://webster.cs.ucr.edu/AoA/Windows/PDFs/V5Questions.pdf

I apologize for not doing this earlier. I had expected that
you already knew about AofA and its availability on the web.
Had I known you didn't know about it, I would have
immediately provided you the links. Again, my sincere
apologies for not doing this earlier.
Nested functions are perfectly possible in some extensions to C - in
particular, gcc supports them (since gcc also supports Ada, which has
nested functions, much of the gcc structure already supports nested
functions, and thus the C and C++ front-ends can get them almost for free).

Yes, but as a general rule I don't always have the option of
using gcc. Customers sometimes already have existing tools
they want used, for example. There are other reasons, too.
So it's not a general solution. Just an interesting one.
Nested functions, C++ classes, the new C++ lambda syntax, etc., are all
ways to implement a limited form of generator or iterator. Compiler
extensions can be used to make a nicer syntax, and to automate some of
the manual work involved. But without some sort of multiple stack
system or garbage collection, you have serious limitations. I don't
mean to say that these ideas are not useful despite the limitations -
just that you cannot add proper flexible generators to a language with
the sort of structure of C or C++ without fundamental changes to the way
the language works - the compiler would need to be free to allocate (and
free) dynamic memory as needed, rather than through explicit malloc /
new calls.

It could well be that what you call "thunking" really means "generators
with various limitations", in which case you are right that garbage
collection is not needed, and it's reasonably easy to figure out several
good implementations. But the term "thunking" is not a well known or
well-defined expression, and is used in many different ways by different
people - I have no idea how the author of a particular book you've read
happens to use it.

Hopefully, the above will tell you more.
To look at some more general generators, and see why they can be used
much more freely in a language like Python than they can in C or C++,
let's vary your Primes function, using Python syntax so that we have
actual working code:

import math

def IsPrime(i) :
    if i < 2 :
        return False
    for j in range(2, int(math.sqrt(i)) + 1) :
        if (i % j) == 0 :
            return False
    return True

def Primes(a, b) :
    i = a
    if (i == 2) :
        yield i
    if (i % 2 == 0) :
        i = i + 1
    while ( i <= b ) :
        if (IsPrime(i)) :
            yield i
        i = i + 1
    return

for p in Primes(1, 20) :
    print p

To make this implementation of Primes work, the Primes closure has to
include information about where the execution currently is in the Primes
function, and in general it must also track any local variables. This
becomes increasingly difficult for more complex functions, especially
when doing it manually - you have to define a struct (or C++ class) to
hold all the local data, as well as a current "state" which is used for
jumping to the correct re-entry point on later calls. A language
extension could hide and automate much of this, however.

In fact, that's what is vital. In the implementation done by
Metaware's compilers, it was very well handled and the
implementation was quite general and nestable to any depth
without the programmer worrying over details such as that.
It's all simply kept as stack frame contexts, just as normal
functions do. The difference is that a thunk is used,
instead, to move back and forth in order to preserve the
stack context while the iterator remains "live." Once the
iterator completes, though, the stack is unwound in the usual
way and the context disappears just as you would expect for
any function call.
If you were using such generators in a real program, you would want to
use structured and modular programming - and then things get difficult.

Things do not get difficult. I've used Metaware's compiler
tools and there was NO difficulty involved. It's _exactly_
like using c, except you've got a wonderful additional
semantic to handle, in beautiful and efficient ways, concepts
like walking graphs. The idea of "data hiding" is expanded
to also now include "algorithm hiding," but in a very light
weight fashion that is entirely consistent with the c
worldview.
To use generators in a stack-based language, you would have to
allocate a structure containing all the local data on the caller
function's stack - that means you (either the programmer figuring out
the closure data manually, or the extended compiler) need access to the
implementation when declaring the generator and using it. With a
garbage collecting language, the generator itself would allocate space
on the heap as needed - the caller need not know anything about the details.

Got it. I understand your point now about garbage collection
-- makes sense. But the way Metaware handles it is beautiful
and doesn't require any of that. It's entirely handled
within the standard c style program model with a single stack
and all the usual, normal stack frame elements. The body of
a for loop is placed into a separate, nested function within
the body of the enclosing function. The iterator is called
using all the usual means, but includes a pointer to the
body. The iterator may itself call any number of other
functions, as well as other iterators if it likes, which may
be nested down the stack to any depth you want. When a yield
takes place, it is really a call to the for-body nested
function but with the stack frame pointer set to the
enclosing function so all the usual local variables are
appropriately accessible off of the base pointer reference
that all c compilers may normally use. The nested function
returns rather normally, restoring the frame back to the down
stream end of the stack. If the for-body temporarily stores
on the stack, it does so at the end of course and obviously
must restore it before returning. But that's just basic,
anyway.
You start getting really high-level programming when you can pass
generators around as parameters and return values. This is something
that cannot be done with a stack model - if a function returns a
generator (or any function which requires closure data), the closure
data must exist even after the calling stack frame has exited. There
are ways to implement this without a general garbage collection facility
(for example, a pointer to a clean-up function could be passed up or
down the call chain while the closure itself is on the heap). But
basically, complex function manipulation like this needs more advanced
automatic control of memory than you get in C or C++.

So let me think about this for a second. Passing a generator
would involve being able to return not only a pointer to
code, but also its entire current context (and any such
context of all activation records still active at the time?)
In other words, it's not just a generator at its initial
point, but one that may have already been used for a bit but
hasn't yet completed and so it can be returned to a caller
for more continued use? Interesting, and I gather the
additional value here.

However, as an _embedded_ programmer usually working on tiny
micros, not workstation-competent board-level systems, I'm
focused upon very modest but very useful extensions where I
_know_ I have good application and where I don't have to pay
for it with a significant change to the existing models I
have grown to know well and fully understand and trust.

That said, I'd be interested in seeing how to implement
something like that which would work in ways where the
run-time execution duration is entirely predictable and
invariant (knowing, obviously, the initial conditions for the
generator.) I think you hint towards this, but I'd need to
see a specific implementation.

In the meantime, you might look at the PDFs I've referred you
towards. They aren't that long and are quite detailed. They
do NOT show you the implementation used by Metaware, but I
can talk about that.

Jon
 

David Brown


Thanks - that makes a /huge/ difference! Of course, now I just need the
time to read it. As I've only had a brief look at it (I read most of
the chapter on thunks), I may be misjudging it here, but I have
difficulty seeing the relevance of the book at this time. I can see the
point of a DOS assembly book long ago, and I can see the point of a
general book on assembly for multiple architectures. But I can't think
of any reason (other than for fun) why anyone would write software in
assembly for the x86 - and certainly not for Windows. There are
certainly times when you might want to /use/ assembly on an x86 -
speeding up critical loops, for example - but not to write entire
programs. The HLA concept strikes me as a waste of time in this day and
age.

Having said that, some of the concepts (such as in the chapters you have
indicated) are interesting and have wider applications. Coroutines are
useful devices - it's just that the implementation details of how to use
them with HLA x86 assembly are irrelevant to reality. Had the author
shown how to use them in C, Java, Python, or even in an artificial HLL,
it would have been more useful.


Anyway, now I see what you mean by the term "thunk" - and it is clear
from the book that these are limited devices that are basically
equivalent to a C++ class with initialisation of private data values and
a single method (or alternatively a C function that takes a struct
pointer). Useful, but hardly revolutionary. Your proposed syntax for
them in C is, however, neat and elegant - that would be a useful
addition to the C language.
Yes, but as a general rule I don't always have the option of
using gcc. Customers sometimes already have existing tools
they want used, for example. There are other reasons, too.
So it's not a general solution. Just an interesting one.

Agreed. I use various gcc extensions if I think they improve the code
(with the emphasis here on improving the source code rather than the
target code - that's a bonus). I haven't used nested functions - they
often make code less readable because it is often unclear where
different functions start and end.
Hopefully, the above will tell you more.


In fact, that's what is vital. In the implementation done by
Metaware's compilers, it was very well handled and the
implementation was quite general and nestable to any depth
without the programmer worrying over details such as that.
It's all simply kept as stack frame contexts, just as normal
functions do. The difference is that a thunk is used,
instead, to move back and forth in order to preserve the
stack context while the iterator remains "live." Once the
iterator completes, though, the stack is unwound in the usual
way and the context disappears just as you would expect for
any function call.

I agree here that such a syntax and compiler-aided handling of the
details would give you a very nice way to use these "thunks" - much more
convenient than doing things manually in C or C++. I suspect you could
get a fair way with "normal C" using a system similar to Adam Dunkels'
protothreads - but integrating it into the language would be best.
Things do not get difficult. I've used Metaware's compiler
tools and there was NO difficulty involved. It's _exactly_
like using c, except you've got a wonderful additional
semantic to handle, in beautiful and efficient ways, concepts
like walking graphs. The idea of "data hiding" is expanded
to also now include "algorithm hiding," but in a very light
weight fashion that is entirely consistent with the c
worldview.


Got it. I understand your point now about garbage collection
-- makes sense. But the way Metaware handles it is beautiful
and doesn't require any of that. It's entirely handled
within the standard c style program model with a single stack
and all the usual, normal stack frame elements. The body of

The Metaware implementation, as far as I can see, is limited to
situations where the thunk's frame can be allocated on the stack (or
possibly as a statically allocated region). That is certainly the
situation described in AofA. That is, of course, an entirely reasonable
limitation for an extension to C.

My view of such concepts has come down from higher-level languages like
Python (and also functional programming languages), in which you have
much more general capability in how you work with function-like objects
and closures. From that angle, these "thunks" look limited, because you
need a compiler and run-time that handles dynamic memory (typically some
sort of garbage collection, but that's not absolutely necessary) to
implement the capabilities I assume. But when you are thinking of these
as an upwards extension of C, I can see these being a very useful
addition to the language.
a for loop is placed into a separate, nested function within
the body of the enclosing function. The iterator is called
using all the usual means, but includes a pointer to the
body. The iterator may itself call any number of other
functions, as well as other iterators if it likes, which may
be nested down the stack to any depth you want. When a yield
takes place, it is really a call to the for-body nested
function but with the stack frame pointer set to the
enclosing function so all the usual local variables are
appropriately accessible off of the base pointer reference
that all c compilers may normally use. The nested function
returns rather normally, restoring the frame back to the down
stream end of the stack. If the for-body temporarily stores
on the stack, it does so at the end of course and obviously
must restore it before returning. But that's just basic,
anyway.


So let me think about this for a second. Passing a generator
would involve being able to return not only a pointer to
code, but also its entire current context (and any such
context of all activation records still active at the time?)
In other words, it's not just a generator at its initial
point, but one that may have already been used for a bit but
hasn't yet completed and so it can be returned to a caller
for more continued use? Interesting, and I gather the
additional value here.

That's correct. You can see here how this requires the generator's
local frame to remain valid after the function that created it has
exited - that means it has to exist outside the main stack. Thus for
this sort of thing to be handled directly by the language and the
compiler, rather than through explicit "new" or "malloc" calls, the
compiler has to have a direct understanding and control of dynamic memory.
However, as an _embedded_ programmer usually working on tiny
micros, not workstation-competent board-level systems, I'm
focused upon very modest but very useful extensions where I
_know_ I have good application and where I don't have to pay
for it with a significant change to the existing models I
have grown to know well and fully understand and trust.

I agree here - and I would appreciate the addition to C of the sort of
capabilities you have been describing. For embedded systems, it is
important to be able to understand the implementation for the code you
write, and that is possible for "thunks" as you have described them. On
a PC, it (typically) doesn't matter if the software takes a few extra MB
of run space, and runs through a byte code virtual machine - thus I
program in Python and take advantage of the language's power to write
shorter code.
 

bartc

David Brown said:
On 27/01/2010 21:15, Jon Kirwan wrote:

Thanks - that makes a /huge/ difference! Of course, now I just need the
time to read it. As I've only had a brief look at it (I read most of the
chapter on thunks), I may be misjudging it here, but I have difficulty
seeing the relevance of the book at this time. I can see the point of a
DOS assembly book long ago, and I can see the point of a general book on
assembly for multiple architectures. But I can't think of any reason
(other than for fun) why anyone would write software in assembly for the
x86 - and certainly not for Windows. There are certainly times when you
might want to /use/ assembly on an x86 - speeding up critical loops, for
example - but not to write entire programs. The HLA concept strikes me as
a waste of time in this day and age.

Assembler is 100% flexible compared to any HLL, even C, so sometimes it
makes life easier.

While you probably wouldn't write applications in it, there are types of
programs which do have a big proportion of assembler (in my case, these are
interpreters).

Hyde's HLA is not for everyone, but I use a form of HLA (inline assembler
within an HLL) which makes writing large amounts of assembler much less
painful.

And, if you are working on a language product which generates assembler
code, then you need to understand how it works even if you are not manually
writing the code yourself.
 
J

Jon Kirwan

Thanks - that makes a /huge/ difference! Of course, now I just need the
time to read it. As I've only had a brief look at it (I read most of
the chapter on thunks), I may be misjudging it here, but I have
difficulty seeing the relevance of the book at this time. I can see the
point of a DOS assembly book long ago, and I can see the point of a
general book on assembly for multiple architectures. But I can't think
of any reason (other than for fun) why anyone would write software in
assembly for the x86 - and certainly not for Windows. There are
certainly times when you might want to /use/ assembly on an x86 -
speeding up critical loops, for example - but not to write entire
programs. The HLA concept strikes me as a waste of time in this day and
age.

Having said that, some of the concepts (such as in the chapters you have
indicated) are interesting and have wider applications. Coroutines are
useful devices - it's just that the implementation details of how to use
them with HLA x86 assembly are irrelevant to reality. Had the author
shown how to use them in C, Java, Python, or even in an artificial HLL,
it would have been more useful.


Anyway, now I see what you mean by the term "thunk" - and it is clear
from the book that these are limited devices that are basically
equivalent to a C++ class with initialisation of private data values and
a single method (or alternatively a C function that takes a struct
pointer). Useful, but hardly revolutionary. Your proposed syntax for
them in C is, however, neat and elegant - that would be a useful
addition to the C language.


Agreed. I use various gcc extensions if I think they improve the code
(with the emphasis here on improving the source code rather than the
target code - that's a bonus). I haven't used nested functions - they
can make code less readable because it is often unclear where
different functions start and end.


I agree here that such a syntax and compiler-aided handling of the
details would give you a very nice way to use these "thunks" - much more
convenient than doing things manually in C or C++. I suspect you could
get a fair way with "normal C" using a system similar to Adam Dunkels'
protothreads - but integrating it into the language would be best.


The Metaware implementation, as far as I can see, is limited to
situations where the thunk's frame can be allocated on the stack (or
possibly as a statically allocated region). That is certainly the
situation described in AofA. That is, of course, an entirely reasonable
limitation for an extension to C.

My view of such concepts has come down from higher-level languages like
Python (and also functional programming languages), in which you have
much more general capability in how you work with function-like objects
and closures. From that angle, these "thunks" look limited, because I
assume you need a compiler and run-time that handle dynamic memory
(typically some sort of garbage collection, though that's not strictly
necessary) to implement such capabilities. But when you are thinking of these
as an upwards extension of C, I can see these being a very useful
addition to the language.


That's correct. You can see here how this requires the generator's
local frame to remain valid after the function that created it has
exited - that means it has to exist outside the main stack. Thus for
this sort of thing to be handled directly by the language and the
compiler, rather than through explicit "new" or "malloc" calls, the
compiler has to have a direct understanding and control of dynamic memory.


I agree here - and I would appreciate the addition to C of the sort of
capabilities you have been describing. For embedded systems, it is
important to be able to understand the implementation for the code you
write, and that is possible for "thunks" as you have described them. On
a PC, it (typically) doesn't matter if the software takes a few extra MB
of run space, and runs through a byte code virtual machine - thus I
program in Python and take advantage of the language's power to write
shorter code.

Sweet. Now if we can just convince those c-standard folks!

By the way, just so you know, Metaware's founder (one of
them, at least) was Dr. Frank DeRemer. He's known well for
his Ph.D. thesis on LALR parsing, "Practical translators for
LR(k) languages," MIT, Cambridge, Massachusetts, 1969. He
and Dr. Tom Pennello went on to write some tools for compiler
compilers and an article called "Efficient Computation of
LALR(1) Look-Ahead Sets," TOPLAS, vol. 4, no. 4, October
1982 - which was around the time, I think, that Metaware was
becoming a reality of sorts.

I very much enjoyed my conversations and learned a few things
from them (especially Tom), back around that time. They were
generous with their time and help and willingness to teach.

If you wonder how a paper that doesn't use LALR in its title
is about that, take a look at the wiki page here:

http://en.wikipedia.org/wiki/LALR_parser_generator

Dr. DeRemer invented LALR.

Jon
 
J

Jon Kirwan

Assembler is 100% flexible compared to any HLL, even C, so sometimes it
makes life easier.

While you probably wouldn't write applications in it, there are types of
programs which do have a big proportion of assembler (in my case, these are
interpreters).

Hyde's HLA is not for everyone, but I use a form of HLA (inline assembler
within an HLL) which makes writing large amounts of assembler much less
painful.

And, if you are working on a language product which generates assembler
code, then you need to understand how it works even if you are not manually
writing the code yourself.

This last paragraph makes an excellent point, regardless of
how one may take the rest of what you say (which I also
consider well-spoken.)

Jon
 
D

David Brown

Assembler is 100% flexible compared to any HLL, even C, so sometimes it
makes life easier.

While you probably wouldn't write applications in it, there are types of
programs which do have a big proportion of assembler (in my case, these
are interpreters).

Hyde's HLA is not for everyone, but I use a form of HLA (inline
assembler within an HLL) which makes writing large amounts of assembler
much less painful.

And, if you are working on a language product which generates assembler
code, then you need to understand how it works even if you are not
manually writing the code yourself.

I am not saying there is no place for assembly - for small systems,
assembly can still be a good choice (and I have done a lot of assembly
programming on small systems through the years). There are also parts
of large systems that are best done in assembly. And of course you
should understand assembly when working with embedded systems, and as
you say, a compiler writer is going to have to be an assembly expert.
But the days of writing large applications in x86 assembly for PCs (this
book is targeting x86 assembly for windows) are long gone, bar a few
specialist applications or keen enthusiasts.
 
