C Compilation


Keith Thompson

jacob navia said:
Look Heathfield, if you can't read it's nobody's problem but yours.

As usual, you choose to be insulting just because you're talking to
Richard Heathfield.

It's a 258-page document. Do you expect him, or anyone else, to read
the whole thing just to answer a question? Sheesh.
Take for instance this C code. I add line numbers for reference

(1) int heathfield(short cbfalconer, double jkuyper)
(2) {
(3) }

I presume that doesn't appear in the ABI.
Here we have in line 1 a function that returns an integer
and receives two arguments: a short and a double.

Now, in the *real* world, we have to specify in which machine
register the result is returned, and in which machine
registers the arguments are passed, if at all.

The C standard doesn't mention this at all, for obvious reasons.
An ABI, then, is a standard that allows programs compiled by
different vendors, and even in different languages, to cooperate
by means of a common interface.

The ABI mentioned above specifies how arguments are passed in
machine code to the function, and how the result is stored
when the function exits.
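
(For concreteness, a sketch of what that means under the stack-based
SysV i386 convention; the caller below, and the details in its
comments, are illustrative rather than quoted from the ABI document:)

/* Caller-side sketch, SysV i386: arguments go on the stack, pushed
   right to left; the int result comes back in EAX; the caller removes
   the arguments afterwards. */
int heathfield(short cbfalconer, double jkuyper);

int caller(void)
{
    /* A compiler following this ABI would typically push the 8-byte
       double first, then the short (widened to a 4-byte stack slot),
       call heathfield, and read the return value from EAX. */
    return heathfield(1, 2.0);
}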

In line 2 we have an opening brace, i.e. a function prologue.
This is specified too, as well as line 3, the function epilogue.

So the ABI specifies how C code interfaces with other code, and that
certainly affects the generated code in some circumstances. But
that's nowhere near what the OP was asking about, namely "standards
that tell us how c code has to be compiled into machine code".

The ABI presumably says that the first argument, if it's of certain
types, is passed by copying it to some specified register (for
example; I don't know the details). That specifies the *effect* of
the generated code. My guess is that it doesn't mandate any
particular machine instructions to accomplish that effect. I would
expect that two compilers that achieve the same effect using different
instruction sequences would both satisfy the ABI's requirements.

Perhaps I've guessed incorrectly. (I'm not going to take the time to
download and read a 258-page document.) Perhaps, for some reason, it
really does mandate specific instruction sequences (if so, it would be
difficult to detect an implementation that violates such a requirement
by using a different instruction sequence to achieve the same effect).
But even if that's the case, I doubt that the ABI says anything about
*internal* code, i.e., for C code that doesn't interact with anything
outside the program.

You presented an example of a function that takes two arguments and
does nothing with them, an example specifically restricted to what the
ABI covers. What about something like this?

unsigned long factorial(unsigned long n)
{
    if (n <= 1) return 1;
    return n * factorial(n - 1);
}

Assume that the compiler can determine that this function isn't called
from outside the compilation unit that contains it.

The ABI specifies how the function is to be called, how the argument
is to be passed to it, and how the returned value is to be made
available to the caller, but does it say *anything* else about the
machine code generated for this function? Does it specify whether
"n - 1" is computed using a subtraction or a decrement instruction?

Does it either require or forbid tail recursion optimization? Does it
require "n <= 1" to be computed using an integer comparison operator,
or can the compiler generate code to evaluate "!(n & ~1)"?
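
(For what it's worth, the two tests are interchangeable for unsigned
operands; a quick self-contained check, added here for illustration:)

#include <assert.h>

int main(void)
{
    /* For unsigned n, !(n & ~1UL) is nonzero exactly when n is 0 or 1,
       i.e. exactly when n <= 1, so a compiler may pick either form. */
    unsigned long n;
    for (n = 0; n < 100000; n++)
        assert(!(n & ~1UL) == (n <= 1));
    return 0;
}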

An ABI specifies how generated code interfaces with other code. It
*might* specify the actual generated machine code to accomplish this,
though I don't see why it would need to do so. It *doesn't* specify
in general what machine code is generated for C source code, which is
what the OP was asking about.
Yeah, C can be interpreted by Heathfield and co.
Instead of buying a computer, I can send my program to
Heathfield, he will parse it, and then
calculate everything with paper and pencil.

You don't do sarcasm well.

C can be compiled into some intermediate form and interpreted. That's
not a common implementation strategy, but it's perfectly legal.
There are as many standards as there are operating systems and
processors. For each combination of OS and processor, there is a
standard. For instance, for the Solaris operating system there is an
ABI for each of x86 (32-bit), SPARC (32-bit), SPARC (64-bit), and
x86 (64-bit).



Obviously, all the assembly examples in the ABI are not machine code.
For instance, on page 40 we have

epilogue:
popl %ebx /restore local register
popl %esi /restore local register
popl %edi /restore local register
leave /restore framepointer
ret /pop return address

This is not machine code, since if I search for
"machine code" without bothering to read
the document, I will not find it.

:)
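
(For reference: each of those mnemonics has a fixed IA-32 encoding, so
the listing corresponds directly to machine code. A sketch of the same
epilogue as raw opcode bytes; the byte values come from the processor
manuals, not from the ABI document:)

/* The epilogue above, as the bytes an assembler would emit: */
const unsigned char epilogue_bytes[] = {
    0x5B, /* popl %ebx */
    0x5E, /* popl %esi */
    0x5F, /* popl %edi */
    0xC9, /* leave     */
    0xC3  /* ret       */
};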

Are you suggesting that unless Richard takes the time to sit down and
read the entire 258-page document, he shouldn't comment on it?

Somebody else made the extraordinary claim (or at least implied it)
that this document specifies how to generate machine code from C.
It's up to the person making that claim, or to those supporting the
claim, to justify it.
Yes, not difficult to "set you straight". Learn to read.

Good advice. Please follow it.
This is the way those standards are done. We are not speaking about ISO
here, nor about C89...

Nor are we speaking about standards for generating machine code, which
was the original question.

The answer to the original question is "No". There may be some
standards that specify some aspects of the generated machine code for
some systems.
 

Kenny McCormack

Antoninus Twink said:
He is deliberately interpreting "standard" as "ISO standard", even
though it's obvious to every other poster in this thread, and indeed
every reasonable person, that that isn't what the OP meant. He knows
this full well, but will continue to play along telling everyone else
they're idiots because they won't play by his silly rules.

Word games - SMOKE and MIRRORS. This is Heathfield's "contribution" to
clc.

On a good day...
 

Philipp Klaus Krause

Malcolm said:
C is fast because, generally, it is easy to write an optimising compiler
for it. Most C statements map quite naturally to only a few assembler
instructions.
However if you use an old C compiler that maybe doesn't make full use of
the computer's instruction set, the code will be slower than a competing
language run under a new environment. Theoretically you could also have
a bad C compiler, but in practice such compilers aren't released.

Well, even between today's compilers there can be huge differences. For
example http://sdcc.wiki.sourceforge.net/Philipp's+TODO+list has a
comparison of some current C compilers for the Z80 and there are code
size differences in excess of 300% for some functions.

Philipp
 

Flash Gordon

Malcolm McLean wrote, On 23/08/08 11:00:
C is fast because, generally, it is easy to write an optimising compiler
for it. Most C statements map quite naturally to only a few assembler
instructions.

Nowhere near as relevant as it used to be. Modern optimisers (or even
optimisers from 10 years ago) will drastically rearrange code so it
looks nothing like the original code.
However if you use an old C compiler that maybe doesn't make full use of
the computer's instruction set,

This is very common with modern compilers. Most people want their
application to run on more than just the latest processor in the series,
and if you support the previous version of a processor you cannot use
the instructions added in the new version!
the code will be slower than a competing
language run under a new environment. Theoretically you could also have
a bad C compiler, but in practice such compilers aren't released.

Oh, there are bad C compilers around, they just are not the most used
compilers.
 

Flash Gordon

Malcolm McLean wrote, On 23/08/08 13:04:
Algorithms are inherently portable.

I generally find algorithms *very* portable. So much so that I've ported
algorithms between different languages and processors without ever
having to change the algorithm.
Of course an O(N) library might be
available for one particular platform, and be too difficult to
reimplement with reasonable effort.

If the algorithm was implemented properly in the first place it is
normally extremely simple to port. The exception comes when the
algorithm assumes that more of some resource (e.g. memory) is available
than is available on the new target.
Hence there could be a choice
between a portable O(N^2) and a non-portable O(N) solution. But this is
rare.

OK, so give an example of an *algorithm* that is not portable and say
*what* prevents it from being portable.
Most people naturally assume that the reason for writing portable code
is to run the same code on different machines. Whilst this is a benefit,
the real reason is to separate the logic from the IO. Logic is portable;
IO tends to be very platform-dependent. The structure of the program
benefits considerably from this.

I agree that this is a major additional benefit.
 

Flash Gordon

jacob navia wrote, On 23/08/08 18:13:
Look Heathfield, if you can't read it's nobody's problem but yours.

Take for instance this C code. I add line numbers for reference

(1) int heathfield(short cbfalconer, double jkuyper)
(2) {
(3) }

Here we have in line 1 a function that returns an integer
and receives two arguments: a short and a double.

Now, in the *real* world, we have to specify in which machine
register the result is returned, and in which machine
registers the arguments are passed, if at all.

Only *if* the compiler does not inline the function. An ABI does not
specify whether or not functions can be inlined in general (I'm thinking
static functions here), although obviously it implicitly defines this
for exported functions.
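
(A minimal sketch of that point about static functions, assuming a
typical optimising compiler:)

/* double_it() has internal linkage, so no code outside this translation
   unit can call it. Nothing obliges the compiler to emit an out-of-line
   body at all: it may inline the multiplication into exported(), and
   then the ABI's calling convention never applies to double_it(). */
static int double_it(int x)
{
    return 2 * x;
}

int exported(int x)   /* externally visible, so it must follow the ABI */
{
    return double_it(x) + 1;
}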
The C standard doesn't mention this at all, for obvious reasons.
An ABI, then, is a standard that allows programs compiled by
different vendors, and even in different languages, to cooperate
by means of a common interface.
Agreed.

The ABI mentioned above specifies how arguments are passed in
machine code to the function, and how the result is stored
when the function exits.

In line 2 we have an opening brace, i.e. a function prologue.
This is specified too, as well as line 3, the function epilogue.

Agreed for exported functions.

<snip>

I agree that an ABI *is* a standard, even if it is not an ISO (or ANSI
or some other national body) standard. However, it only specifies a
small part of the machine code generated for any given program. It does
not, for example, specify the instruction to use to increment a variable
by 1 or how to determine which local variables should be held in
registers; these and many other issues in translating C (or any other
language) to machine code are left up to the ingenuity of the compiler
writers.
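
(A small example of that freedom; the instruction alternatives in the
comment are illustrative i386 choices, not requirements from any ABI:)

int bump(int x)
{
    /* The ABI says nothing about which instruction implements this;
       a compiler might emit incl, addl $1, or leal 1(%eax),%eax, and
       all of them conform equally well. */
    return x + 1;
}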
 

Philipp Klaus Krause

Flash said:
If the algorithm was implemented properly in the first place it is
normally extremely simple to port. The exception comes when the
algorithm assumes that more of some resource (e.g. memory) is available
than is available on the new target.


OK, so give an example of an *algorithm* that is not portable and say
*what* prevents it from being portable.

LZW decompression isn't really portable to the ColecoVision since LZW
builds a dictionary at run-time, which thus has to reside in RAM. This
dictionary gets relatively large, so LZW isn't usable on the
ColecoVision with its 1 KB of RAM.
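
(A rough sizing, assuming a classic 12-bit LZW decoder; the figures are
back-of-envelope additions, not from the post above:)

/* Each of the up-to-4096 dictionary entries stores the code of a
   prefix string plus one suffix byte. */
struct lzw_entry {
    unsigned short prefix;  /* code of the preceding string */
    unsigned char  suffix;  /* final byte of this string    */
};
/* 4096 * sizeof(struct lzw_entry) comes to at least 12 KB, an order
   of magnitude more than the ColecoVision's 1 KB of RAM. */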

There exist algorithms which are not portable to the Colossus computer,
since the instruction set of Colossus was not Turing complete.

Philipp
 

Chris Torek

Regarding "standards that tell us how c code has to be compiled
into machine code", in the original poster's words...
Richard Tobin said:

It appears to be a 258-page document.

(The one I am looking at is a 377-page document.)
Could you possibly help me out by telling me whereabouts in that
document I can find the bit that tells us how C code has to be
compiled into machine code?

It would be fun to be disingenuous, with the punchline being that C doesn't
have to be compiled into machine code in the first place, and therefore
there can't be a standard telling us it does have to be. But that's not
what I'm getting at here.

Nonetheless, it is quite true. :)

However, if one *has* chosen to produce x86 machine code from C
source code, and if one decides to be compatible with the System V
ABI, it does have information for this.
Secondly, I see no claim within the document itself to be a standard, and
the frontispiece doesn't bode well:

"Information in this document is subject to change without notice and does
not represent a commitment on the part of The Santa Cruz Operation, Inc."

If it's subject to change without notice, it's hard to see how it can
reasonably be called a standard.

(This wording exists at least in part because of lawyers. :) )

It is not a *formal* standard, to be sure, but it works to some
extent as a de facto standard. For that matter, it might eventually
become a base document for a "real" standard (AMD, for instance,
have done some work to this end). As for *how* well it works...?

The version I am looking at (2nd hit on Google for "system v abi x86";
the first was an AMD draft with x86-64 support) has, for instance,
the mapping from C types to x86 architecture types on page 28,
in section 3-2. If we proceed to page 31, however, we find that
"double"s are to be aligned on 4-byte boundaries in structure types,
and most x86 C compilers that I have used align them on 8-byte
boundaries by default.
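
(A quick way to see that discrepancy on a given compiler; in this
sketch, an output of 4 would match the documented SysV i386 layout,
while 8 is the common default:)

#include <stdio.h>
#include <stddef.h>

struct probe {
    char   c;
    double d;   /* the ABI document prescribes 4-byte alignment here */
};

int main(void)
{
    printf("offsetof(struct probe, d) = %lu\n",
           (unsigned long)offsetof(struct probe, d));
    return 0;
}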

The description of struct-or-union-valued functions (pp. 41--42)
requires an "inefficient" call/return sequence. GCC, at least,
does not use this method for small structures by default.

While neither is required for C89, there are no prescriptions for "long
long", nor for complex types. To be a workable de facto standard
for today, it needs to address at least "long long" (in my opinion,
of course -- one big problem with de-facto standards, at least when
they are not derived by referring to "real" standards, is that they
tend to be poorly defined, especially around various "edges", but
this has advantages as well as drawbacks).

In short, it needs quite a bit more to turn it into a "real"
standard, along the same lines as K&R-1 needed to turn it into
C89. It is a decent starting point, though.

It is also worth adding two more points:

- The ABI says little (and should say little) about specific
instruction choices. That is, it is not a "standard for
turning C code into x86 machine code", but rather a "standard
for how x86 machine code should behave at particular interface
boundary points with other x86 machine code". Some of those
boundary points tie in directly to C source code constructs
(data types being one particularly relevant point); others
are looser and/or more-widely applicable (function-call
sequences can apply from C code to Fortran code, or from a
Python interpreter to compiled C code, and so on).

- In any case, an ABI for the x86 says nothing at all about
compiling C code to PowerPC or MIPS machine code. Nor, as
noted above, is it necessary to turn C code into machine code
at all. It is just the usual approach. Having made a decision
to compile C code to machine code, one must then choose an
existing standard, or make up one's own (or some mix of the
two). Once they have been "concretized" (concreted? realized?
whatever verb you like goes here), the various boundary points
one has chosen turn out to be more difficult to change than
some would like -- i.e., poor decisions can come back to haunt
you -- but because few if any of these "standards" are formalized,
they may also be less fixed and stable than others would like.
That is, if *you* try to follow some other compiler vendor's
methods (in order to be compatible with their compiler), you
can get burned anyway.

"That's the great thing about standards: there are so many to choose
from, and if you don't like any of those, you can just make up your
own!"
 

Walter Banks

Philipp said:
LZW decompression isn't really portable to the ColecoVision since LZW
builds a dictionary at run-time, which thus has to reside in RAM. This
dictionary gets relatively large, so LZW isn't usable on the
ColecoVision with its 1 KB of RAM.

LZW was implemented on the ColecoVision during development. I can't
remember the typical or maximum RAM limits on the ColecoVision, but
executables were loaded from tape to RAM and executed.

w..
 

Lassie

Keith Thompson said:
As usual, you choose to be insulting just because you're talking to
Richard Heathfield.

It's a 258-page document. Do you expect him, or anyone else, to read
the whole thing just to answer a question? Sheesh.

You don't have to read it to understand what kind of standard this is,
especially if you have some practical experience in programming.
I presume that doesn't appear in the ABI.

It does: return value, parameters, function calls, etc.
So the ABI specifies how C code interfaces with other code, and that
certainly affects the generated code in some circumstances. But
that's nowhere near what the OP was asking about, namely "standards
that tell us how c code has to be compiled into machine code".

It is; this is the way to go for that platform. You can view it as UB if a
compiler tries to be smart and does things completely differently from other
compilers on the same machine. This code will not work together with other
binaries anymore. But I'm sure you won't view it like that, otherwise you
wouldn't be right anymore. Disagreement for the sake of argument.
The ABI presumably says that the first argument, if it's of certain
types, is passed by copying it to some specified register (for
example; I don't know the details). That specifies the *effect* of
the generated code. My guess is that it doesn't mandate any
particular machine instructions to accomplish that effect. I would
expect that two compilers that achieve the same effect using different
instruction sequences would both satisfy the ABI's requirements.

So the ABI says that a return value must be passed in the EAX register. How
many "instruction sequences" can you think of that write and read from that
register?
 

cr88192

Boon said:
Evil empire? GCC conforms to the System V ABI as you point out.

the System V ABI is also an Evil Empire...


for example, consider the SysV / AMD64 ABI, with its elaborate and
complicated register-based calling convention, making it difficult to target
with all but the most elaborate of compilation techniques...

we should just be glad that they did not try to replace ELF with some
similarly complicated beast (such as not only making ELF64 to replace
ELF32, but also merging it with DWARF at some fine level, making it
impossible to process the files without dealing head-on with DWARF's
encoding issues...).


OTOH, for 32-bit code, the conventions are far simpler, and much easier to
target.

now, of course, I "could" potentially target x86-64 using a hack similar to
how I handled "first-class function calls" on x86-64, basically building an
argument list in memory, and then using a signature string to pack/unpack
this buffer to/from the correct registers and stack positions (via wonky
assembler code), but this would suck performance-wise.

I also lost motivation and so never did finish the newer SSA-based compiler
core (would probably be able to target it directly, but I lost motivation
since I primarily just develop on Win32 anyways...).


actually, I personally find the convention both annoying and silly, since
the design is such that, much of the time, internal overheads (trying to get
everything shuffled around, ... for any non-trivial functions) are likely to
exceed any such gains from using these regs and this layout anyways (or,
FFS, they could have at least left free-spots/holes on the stack, like in
the PowerPC conventions, ...).

this shuffling may well cost some, making the convention, in general, slower
than simply upscaling the 32-bit x86 cdecl convention would have been...

the sad thing is that Linux adopted this convention as is, and now people
have to live with it.


now, of course, it could also be possible to use multiple conventions
(allowing a simpler internal convention, and generate ugly patch stubs to
deal with interfacing them), but this introduces a good deal more issues
(performance, issues when using function pointers, ...), or a case similar
to in Win32, where many functions have to be declared with specific calling
conventions (__cdecl, __stdcall, ...).

on Linux this would mean customized system headers, or maybe wrapping all of
the native system headers with something like this:

in stdio.h:
extern __sysv {
#include <OS/include/stdio.h>
}

....


so, yes, maybe all this is an evil empire...
 

cr88192

raashid bhatt said:
is there any standards that tell us how c code has to be compiled into
machine code

not really.

there are standards for C.

there are standards which tell how machine code is to be structured
(various ABIs, instruction set architectures, ...).

but, between these extremes, people are allowed to do pretty much anything
they can make work...


for example, whether their compiler internally uses a stack-based
intermediate representation, SSA, ... doesn't usually matter that much...
 

Keith Thompson

Lassie said:
You don't have to read it to understand what kind of standard this is,
especially if you have some practical experience in programming.


It does: return value, parameters, function calls, etc.

I meant that that specific code, with identifiers "heathfield",
"cbfalconer", and "jkuyper", doesn't appear in the ABI.
It is; this is the way to go for that platform. You can view it as UB
if a compiler tries to be smart and does things completely differently
from other compilers on the same machine. This code will not work
together with other binaries anymore. But I'm sure you won't view it
like that, otherwise you wouldn't be right anymore. Disagreement for
the sake of argument.

Not at all.

My point, which you conveniently snipped, is that the ABI presumably
doesn't specify what machine code should be generated from C code in
all circumstances. It merely defines the behavior, and *maybe* the
actual machine code, for the interaction between the generated code
and other code in the system.

[snip]

Re-read the original question that started this thread. Do you really
think the OP was asking about ABIs?
 

Philipp Klaus Krause

Walter said:
LZW was implemented on the ColecoVision during development. I can't
remember the typical or maximum RAM limits on the ColecoVision, but
executables were loaded from tape to RAM and executed.

w..

You're thinking of the Coleco Adam (the computer). The Coleco Adam has
enough RAM for LZW. The ColecoVision (the video game console) doesn't.
The ColecoVision executes directly from ROM chips in the cartridge.

Philipp
 

Lassie

Keith Thompson said:
Re-read the original question that started this thread. Do you really
think the OP was asking about ABIs?

The OP is not experienced in C; I believe he didn't know himself what he was
asking. But as far as his question is concerned, an ABI pretty much answers
it. It's not a 100% C-to-machine-instruction standard, no, but ABIs cover a
lot of binary interfacing, yes.
 

santosh

Bartc said:
ABIs are not necessarily tied to any one language, which may explain
why C was not singled out (I haven't seen the document).

But, if you were tomorrow given the task of translating C source to
binary x86 code for System V, you might suddenly find this document
becoming a lot more interesting.

Though it's called the System V ABI, it's actually applicable to all x86
machine code, whatever the source language. It simply allows binary-level
interoperability between programs compiled from different
languages and compilers.
Yes (or, No), C doesn't need translating to machine code, but usually
is, by some process or other. In that case the conventions for talking
and linking to other software can become important, if that's what the
OP was on about.

(Myself, I've mostly ignored any such conventions when I've done
similar work.)

Even when I write in assembler I usually stick to the main guidelines of
the ABI, since this allows me to make use of many Standard library
functions.
 

Bartc

santosh said:
Though it's called the System V ABI, it's actually applicable to all x86
machine code, whatever the source language. It simply allows binary-level
interoperability between programs compiled from different
languages and compilers.


Even when I write in assembler I usually stick to the main guidelines of
the ABI, since this allows me to make use of many Standard library
functions.

For x86-32, the guidelines can be summarised as follows: when calling
Windows functions, push the parameters right to left, and the callee adjusts
the stack on return. When calling C (I've found), do the same, except the
caller adjusts the stack. Oh, and when Windows (and, I think, C) calls
/your/ functions, you need to save most of the registers.
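
(Spelled out with the usual non-standard keywords that many Windows
compilers accept, the two cases above look like this:)

/* stdcall, used by most Win32 API functions: the callee pops the
   arguments. cdecl, the default C convention: the caller pops them. */
int __stdcall win_style(int a, int b);
int __cdecl   c_style(int a, int b);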

And... that's it. I've never had a need to look at any 258- or 377-page
documents (largely through ignorance of their existence).

x86-64 seems a lot more intricate, but even then, if I was still working at
that level, I think I could still do whatever I wanted (I used to push left
to right, with the last/only parameter passed via a register), /except/ when
calling, or being called by, an external function.
 

Antoninus Twink

Re-read the original question that started this thread. Do you really
think the OP was asking about ABIs?

To turn the tables, do you really think the OP was asking about ISO
Standards?

My own opinion is that the word "standard" as the OP used it was
deliberately vague, and ABIs fit it well enough as an answer. As you'll
see from my first followup to the OP, I interpreted his question
differently, but discussing ABIs is a perfectly valid response too to a
broad and not-very-well-specified question.
 

Walter Banks

Philipp said:
You're thinking of the Coleco Adam (the computer). The Coleco Adam has
enough RAM for LZW. The ColecoVision (the video game console) doesn't.
The ColecoVision executes directly from ROM chips in the cartridge.

Philipp

You're right, I was thinking of the ADAM. My apologies.

w..
 

jameskuyper

Keith said:
You're both right. Sometimes speed is simply part of the
requirements. (And sometimes it isn't.)

*Sometimes* getting the right answers too late isn't much better than
getting wrong answers.

Can you give any example of a situation where it's BETTER to get a
wrong answer early than to get the right answer late? By wrong, I
don't mean an inaccurate answer that is acceptably close to the true
value; I mean an answer which is unacceptably far away from the true
value.
 
