Does C implement the first C compiler itself?

L

lovecreatesbea...

K&R says the following in the preface to the first edition,

"... the C compiler, and ... are written in C."

I'm wondering, does it say even the first / original C compiler was
written in C?
 
C

Christopher Layne

K&R says the following in the preface to the first edition,

"... the C compiler, and ... are written in C."

I'm wondering, does it say even the first / original C compiler was
written in C?

Put some deductive reasoning/logic into it.
 
R

Roberto Waltman

Read "The Development of the C Language" at Dennis Ritchie home page (
http://cm.bell-labs.com/who/dmr )
Put some deductive reasoning/logic into it.

[OT] That may lead you to logically correct, but nevertheless false
conclusions. The first version of an Algol-like system programming
language for Burroughs mainframes (B5000?) was written in itself.
It was "hand compiled" into assembler to get the first running
version, and it was self-sustained from there.


Roberto Waltman

[ Please reply to the group,
return address is invalid ]
 
C

Christopher Layne

Roberto said:
[OT] That may lead you to logically correct, but nevertheless false
conclusions. The first version of an Algol-like system programming
language for Burroughs mainframes (B5000?) was written in itself.
It was "hand compiled" into assembler to get the first running
version, and it was self-sustained from there.

You just contradicted yourself. That was my point.

The first version was written in another language.
 
D

Dave Hansen

Roberto said:
[OT] That may lead you to logically correct, but nevertheless false
conclusions. The first version of an Algol-like system programming
language for Burroughs mainframes (B5000?) was written in itself.
It was "hand compiled" into assembler to get the first running
version, and it was self-sustained from there.

You just contradicted yourself. That was my point.

The first version was written in another language.


Not so.

The compiler was _written_ in the target language. It was
_translated_ to assembly.

This is a very common process referred to as "bootstrapping" a
compiler. It's a very useful exercise, in that it proves (or
disproves) the utility of the language with a non-trivial application,
and provides an immediate base of test code. Compilers tend to
require a wide range of algorithmic techniques and a variety of data
structures. If your language can implement a compiler, it's likely to
be useful for most other tasks as well.

Though these days, the intermediate language would more likely be C
than assembly, the process is essentially unchanged.

I guess you could argue that the first version of the compiler was
wetware writ by the finger of God, but that probably isn't very
helpful.

Regards,

-=Dave
 
D

David T. Ashley

K&R says the following in the preface to the first edition,

"... the C compiler, and ... are written in C."

I'm wondering, does it say even the first / original C compiler was
written in C?

Surprisingly, I understand the thought process behind this frivolous
question.

Here is another related question:

If one has the C source code for an optimizing compiler, the compiles the
compiler, creating a new compiler, and repeats this process over many
iterations, will the compiler executable get smaller and smaller each time?

(This is no more and no less frivolous than the original.)
 
C

Chris McDonald

David T. Ashley said:
If one has the C source code for an optimizing compiler, the compiles the
compiler, creating a new compiler, and repeats this process over many
iterations, will the compiler executable get smaller and smaller each time?

But there's no guarantees that there's optimizations to be made...
 
K

Keith Thompson

David T. Ashley said:
Surprisingly, I understand the thought process behind this frivolous
question.

Here is another related question:

If one has the C source code for an optimizing compiler, the compiles the
compiler, creating a new compiler, and repeats this process over many
iterations, will the compiler executable get smaller and smaller each time?

No. An optimizer (if it's working properly) makes the target code run
more quickly; it doesn't change its behavior.

Suppose you have C sources for two C compilers, compiler1 and
compiler2. compiler1 includes a very clever optimizer; compiler2
doesn't. Use compiler2 to compile itself, yielding compiler2a.exe,
which compiles slowly. Use compiler1 to compile compiler2, yielding
compiler2b.exe, which compiles more quickly. But compiler2a.exe and
compiler2b.exe work the same way; the code they generate is identical.
 
R

Roberto Waltman

David T. Ashley said:
in message

Surprisingly, I understand the thought process behind this frivolous
question.

Why frivolous?
Here is another related question:

If one has the C source code for an optimizing compiler, the compiles the
compiler, creating a new compiler, and repeats this process over many
iterations, will the compiler executable get smaller and smaller each time?

[OT] No, (at least not as worded) A simple test that every
self-compiling compiler should pass, is that when compiling its own
sources it should produce exact duplicates of itself.

If you then change the sources improving the optimization algorithms,
the compiler should produce a faster and/or smaller version of itself
reaching again a steady state after a few iterations.

To expect the cycle to continue for ever is akin to expect a file
compression algorithm to always reduce a file's size, which leads to
the conclusion that all files can be compressed into a single bit.

I could not find a reference now, but I recall reading about Niklaus
Wirth using this process to evaluate optimizations for his Pascal
compilers.

An optimization that would result in a faster or smaller compiler was
integrated in the code, but increasing the complexity of the compiler
with additional optimization steps that could improve code in some
marginal cases, but would not have a significant effect in the Pascal
compiler itself, was not considered a worthwhile trade-off.

Roberto Waltman

[ Please reply to the group,
return address is invalid ]
 
S

Stephen Sprunk

K&R says the following in the preface to the first edition,

"... the C compiler, and ... are written in C."

I'm wondering, does it say even the first / original C compiler was
written in C?

That would leave you with a rather serious chicken-and-egg problem.

The very, very first C compiler was written in B. It was subsequently
rewritten in C (which wasn't a major change) and recompiled with the first
compiler.

DMR has a web site that explains the history of how C (and its original
compiler) evolved into existence.

S
 
D

Dave Hansen

If one has the C source code for an optimizing compiler, the compiles the
compiler, creating a new compiler, and repeats this process over many
iterations, will the compiler executable get smaller and smaller each time?

Umm. The generated code should always be the same. Unless there's a
bug in the optimizer. Optimization is easy if you don't have to get
the right answer.

But you reminded me of something I heard years ago: It is said that
every program contains at least one bug, and can be shortened by one
instruction. Therefore, by induction, every program can be reduced to
a single instruction, but it will be wrong.

Regards,

-=Dave
 
F

Flash Gordon

Dave Hansen wrote, On 14/02/07 19:51:
Umm. The generated code should always be the same. Unless there's a
bug in the optimizer. Optimization is easy if you don't have to get
the right answer.
Indeed.

But you reminded me of something I heard years ago: It is said that
every program contains at least one bug, and can be shortened by one
instruction. Therefore, by induction, every program can be reduced to
a single instruction, but it will be wrong.

The versions I have heard have an additional clause which removes the
problem. The additional clause being something like, "of significant
complexity." It is important because of useful programs such as:

int main()
{
return 0;
}

Note, I've deliberately not specified a void parameter list to make it
compatible with what I know of pre-standard C as well as C89 & C99.
 
C

christian.bau

"David T. Ashley" wrote:
[OT] No, (at least not as worded) A simple test that every
self-compiling compiler should pass, is that when compiling its own
sources it should produce exact duplicates of itself.

Not necessarily.

The order in which subexpressions are evaluated is compiler-dependent.
For example, if we have a function

int f (void* p);

then it is compiler-dependent whether in the expression

f (right) + f (left)

f (right) is called first or f (left) is called first.

Now assume that your compiler A has a function codegen (), and when
the compiler wishes to generate code for two function calls in
unspecified order, with the results discarded, it calls

codegen (right_tree) + codegen (left_tree)

Lets say we compile this with a compiler X, which calls the left
function first. So the assembler code in the compiled code is

codegen (right_tree)
codegen (left_tree)

Compile the compiler using the compiler compiled by X. The assembler
code will now be

codegen (left_tree)
codegen (right_tree).

Compile the compiler with the generated code. The assembler code will
be

codegen (right_tree)
codegen (left_tree)

and so on. All the even versions are different from all the odd
versions.
 
R

Roberto Waltman

christian.bau said:
Roberto said:
[OT] No, (at least not as worded) A simple test that every
self-compiling compiler should pass, is that when compiling its own
sources it should produce exact duplicates of itself.

Not necessarily.

The order in which subexpressions are evaluated is compiler-dependent.
For example, if we have a function

int f (void* p);

then it is compiler-dependent whether in the expression

f (right) + f (left)

f (right) is called first or f (left) is called first.

Now assume that your compiler A has a function codegen (), and when
the compiler wishes to generate code for two function calls in
unspecified order, with the results discarded, it calls

codegen (right_tree) + codegen (left_tree)

Lets say we compile this with a compiler X, which calls the left
function first. So the assembler code in the compiled code is

codegen (right_tree)
codegen (left_tree)

Compile the compiler using the compiler compiled by X. The assembler
code will now be

codegen (left_tree)
codegen (right_tree).

Compile the compiler with the generated code. The assembler code will
be

codegen (right_tree)
codegen (left_tree)

and so on. All the even versions are different from all the odd
versions.

OK, I stand corrected. So the generated compilers fall into a cycle,
of length 2 in this case, (easily expanded to more.)
Is this always the case? (To have cyclic versions, that is.)

Roberto Waltman

[ Please reply to the group,
return address is invalid ]
 
M

Malcolm McLean

K&R says the following in the preface to the first edition,

"... the C compiler, and ... are written in C."

I'm wondering, does it say even the first / original C compiler was
written in C?
Yes.
So how did it compile itself? The usual way is to write a little
interpreter. Interpreters are far easier to write than compilers. See my
homepage for one in a single reasonable-length file.

The interpreter runs the compiler source on itself and creates the first
native compiler.
 
K

Keith Thompson

Malcolm McLean said:

No, it doesn't say that. Somebody in this thread said that the first
C compiler was written in B. I just scanned
<http://cm.bell-labs.com/cm/cs/who/dmr/chist.html> for the word
"compiler"; I didn't see a clear statement to that effect. It appears
that there was an evolutionary path from B to C, with "NB" (New B)
somewhere in the middle. I think that the compiler and the language
evolved in parallel, with new language features being implemented by
the compiler, then used in the next version of the compiler.
So how did it compile itself? The usual way is to write a little
interpreter. Interpreters are far easier to write than compilers. See my
homepage for one in a single reasonable-length file.

The interpreter runs the compiler source on itself and creates the first
native compiler.

It could have been done that way, but as far as I can tell it wasn't.
 
D

David T. Ashley

Dave Hansen said:
Optimization is easy if you don't have to get
the right answer.

Regrettably, some compiler vendors have this integrated into their business
model. I was once told by tech support "Don't use optimizaton level 5 until
we get the bugs fixed".
 
G

Guest

Roberto said:
christian.bau said:
Roberto said:
[OT] No, (at least not as worded) A simple test that every
self-compiling compiler should pass, is that when compiling its own
sources it should produce exact duplicates of itself.

Not necessarily.

The order in which subexpressions are evaluated is compiler-dependent.
For example, if we have a function

int f (void* p);

then it is compiler-dependent whether in the expression

f (right) + f (left)

f (right) is called first or f (left) is called first.

Now assume that your compiler A has a function codegen (), and when
the compiler wishes to generate code for two function calls in
unspecified order, with the results discarded, it calls

codegen (right_tree) + codegen (left_tree)

Lets say we compile this with a compiler X, which calls the left
function first. So the assembler code in the compiled code is

codegen (right_tree)
codegen (left_tree)

Compile the compiler using the compiler compiled by X. The assembler
code will now be

codegen (left_tree)
codegen (right_tree).

Compile the compiler with the generated code. The assembler code will
be

codegen (right_tree)
codegen (left_tree)

and so on. All the even versions are different from all the odd
versions.

OK, I stand corrected. So the generated compilers fall into a cycle,
of length 2 in this case, (easily expanded to more.)
Is this always the case? (To have cyclic versions, that is.)

No, this is not always the case. tinycc, for example, puts garbage in
its generated output. Its x86 long doubles are 96 bits, with only 80
value bits. When storing a long double in the generated object file,
all 96 bits are copied, but the 16 padding bits are never explicitly
initialised, and are for practical purposes completely random. You
can't even assume that one specific binary of the compiler with one
specific set of compiler options will always generate bitwise-
identical object code.

(I hope what I mean by "value bit" and "padding bit" is clear enough,
even though it does not exactly match the meanings of the same terms
as used by the standard.)
 
R

Roberto Waltman

Harald van D?k said:
...
No, this is not always the case. tinycc, for example, puts garbage in
its generated output. Its x86 long doubles are 96 bits, with only 80
value bits. When storing a long double in the generated object file,
all 96 bits are copied, but the 16 padding bits are never explicitly
initialised, and are for practical purposes completely random. You
can't even assume that one specific binary of the compiler with one
specific set of compiler options will always generate bitwise-
identical object code.

[OT] Thanks, after I posted my last question I realized this is
possible, (thinking of uninitialized structs padding.)
There was a recent thread in comp.software-eng about verifying that
two source files are equivalent after "pretty-printing" one.
The OP resorted to compiling them and comparing the resulting object
files, it is clear now this is not guaranteed to work.



Roberto Waltman

[ Please reply to the group,
return address is invalid ]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,432
Messages
2,571,680
Members
48,796
Latest member
Greg L.

Latest Threads

Top