How do I make my own custom C compiler?

smnoff

OK, I think I am a little more knowledgeable about C and pointers, ugh.

And likewise, I want to fix C... and not so much make a C++ or Java or
C# or even D-like language.

So, if I wanted to make my "custom" C compiler that's different from the
current C99 or ANSI C, where would I start?

Thanks.
 
Allan Adler

smnoff said:
OK, I think I am a little more knowledgeable about C and pointers, ugh.
And likewise, I want to fix C... and not so much make a C++ or Java or
C# or even D-like language.
So, if I wanted to make my "custom" C compiler that's different from the
current C99 or ANSI C, where would I start?

Speaking as someone who has never written a compiler, I'd suggest:
(1) The Red Dragon Book
(2) Introduction to Compiler Construction with UNIX, by Axel T. Schreiner
and H. George Friedman, Jr. They take you through the design and
implementation of a compiler for smallC. It was published in 1985.
You might still be able to get a used copy.
 
Malcolm

smnoff said:
OK, I think I am a little more knowledgeable about C and pointers,
ugh.

And likewise, I want to fix C... and not so much make a C++ or Java or
C# or even D-like language.

So, if I wanted to make my "custom" C compiler that's different from the
current C99 or ANSI C, where would I start?

Thanks.
Hit my website

www.personal.leeds.ac.uk/~bgy1mm

and look at the MiniBasic section.

Writing a Basic interpreter is not trivial, but it is much easier than
writing a compiler.
Once you understand how to write an interpreter, you will have a good
foundation for moving on to a compiler.
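To give a feel for the interpreter-first route, here is a minimal sketch in C (it is not taken from MiniBasic): a recursive-descent evaluator for integer arithmetic that computes each value as soon as it has parsed it. The grammar, the function names, and the global cursor `src` are illustrative assumptions; there is no error reporting and no check for division by zero.

```c
#include <ctype.h>

/* Minimal expression interpreter (illustrative, not MiniBasic's code).
   Grammar, parsed by recursive descent and evaluated on the fly:
     expr   := term   { ('+' | '-') term }
     term   := factor { ('*' | '/') factor }
     factor := number | '(' expr ')' | '-' factor                    */

static const char *src;   /* current position in the input text */

static void skip_spaces(void) {
    while (isspace((unsigned char)*src))
        src++;
}

static long expr(void);   /* forward declaration: factor calls expr */

static long factor(void) {
    skip_spaces();
    if (*src == '(') {
        src++;                    /* consume '(' */
        long v = expr();
        skip_spaces();
        if (*src == ')')
            src++;                /* consume ')' (no error reporting) */
        return v;
    }
    if (*src == '-') {            /* unary minus */
        src++;
        return -factor();
    }
    long v = 0;
    while (isdigit((unsigned char)*src))
        v = v * 10 + (*src++ - '0');
    return v;
}

static long term(void) {
    long v = factor();
    for (;;) {
        skip_spaces();
        if (*src == '*') { src++; v *= factor(); }
        else if (*src == '/') { src++; v /= factor(); } /* no zero check */
        else return v;
    }
}

static long expr(void) {
    long v = term();
    for (;;) {
        skip_spaces();
        if (*src == '+') { src++; v += term(); }
        else if (*src == '-') { src++; v -= term(); }
        else return v;
    }
}

/* Entry point: evaluate one expression held in a string. */
long eval(const char *text) {
    src = text;
    return expr();
}
```

For example, `eval("(1 + 2) * 3")` returns 9. A compiler can keep the same parsing functions but emit code instead of computing values, which is why the interpreter is a useful stepping stone.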
 
Keith Thompson

smnoff said:
OK, I think I am a little more knowledgeable about C and pointers, ugh.

And likewise, I want to fix C... and not so much make a C++ or Java or
C# or even D-like language.

So, if I wanted to make my "custom" C compiler that's different from the
current C99 or ANSI C, where would I start?

I'd start with an existing open-source compiler, such as gcc or lcc.
 
Giannis Papadopoulos

Keith said:
I'd start with an existing open-source compiler, such as gcc or lcc.

Isn't it a bit risky to start with such a behemoth?

--
one's freedom stops where others' begin

Giannis Papadopoulos
Computer and Communications Engineering dept. (CCED)
University of Thessaly
http://dop.freegr.net/
 
jacob navia

Giannis Papadopoulos wrote:
Isn't it a bit risky to start with such a behemoth?

gcc is impossible to understand unless you spend at least
2-3 YEARS working in it full time.

There are at most 20 people in the world who can understand
that compiler, and by understanding I mean that they are
able to modify something in it, something basic like
the parser for instance.

I tried something much simpler: to fix a bug.

Under Windows, when a function was _stdcall, it would screw up
the floating-point stack.

I spent two weeks trying to fix it, learning how it works,
etc.

The first problem is to know RTL. You have to completely understand
RTL to understand the flow of things.

Second, the sheer size of the code base. There are 13-15 MB
of C source code to understand. And the code is mostly very sparsely
commented. Macros everywhere hide from you what is going on.

Accessing data structures is always done with macros, to ease
things when the structure layout changes, but this makes it very
hard for newcomers to understand what the hell those macros
are DOING...

Third, you have to find your way in a mess of #ifdefs that defies
the imagination. gcc runs on many machines, and "portability"
has been taken to ridiculous extremes (the assembler, for instance).

This means that the same macro can have several interpretations
depending on which combination of machine/os you are running.
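A small, hypothetical illustration of the point: the macro names below (`TARGET_WIDE_WORDS`, `TARGET_HALF_WORDS`, `WORD_BYTES`, `ALIGN_WORD`) are invented for this sketch and are not taken from the gcc sources, but they show how one macro can mean different things depending on which target macro the build defines.

```c
/* Hypothetical target configuration: one macro name, several meanings,
   selected by whichever target macro the build defines. */
#if defined(TARGET_WIDE_WORDS)
#  define WORD_BYTES 8
#  define ALIGN_WORD(n) (((n) + 7) & ~7UL)
#elif defined(TARGET_HALF_WORDS)
#  define WORD_BYTES 2
#  define ALIGN_WORD(n) (((n) + 1) & ~1UL)
#else                    /* default when no target macro is defined */
#  define WORD_BYTES 4
#  define ALIGN_WORD(n) (((n) + 3) & ~3UL)
#endif

/* This function reads the same on every target, but what it computes
   depends on the #if branch chosen above. */
unsigned long frame_size(unsigned long locals) {
    /* round the locals up to a word boundary, then add one word
       (hypothetically, room for a saved pointer) */
    return ALIGN_WORD(locals) + WORD_BYTES;
}
```

Running the file through `gcc -E -DTARGET_WIDE_WORDS` (or with no `-D` at all) and reading the output is one way to see which branch survives for a given configuration; with no target macro, `frame_size(5)` is 12.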

Fourth, as in any beast like this, you are bound to encounter
the horrible hacks that will kill you.

For instance I am trying to understand the way gcc generates the
DWARF tables for C++ exception handling, and I spent several
days trying to understand why the assembler instructions:

.byte 0x4
.long 1

would produce a single byte "0x41" instead of a byte 0x4 and
a 32 bit integer 1.
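One plausible reading of this, assuming the two directives form the DWARF call-frame instruction DW_CFA_advance_loc4 (opcode 0x04 followed by a 4-byte address delta): DWARF also defines DW_CFA_advance_loc, which stores a delta below 64 in the low 6 bits of an opcode whose top two bits are 01, so a delta of 1 becomes the single byte 0x41. The sketch below shows that size-reducing choice; the function name is hypothetical, it is not the assembler's actual code, and operands are written little-endian, as on x86.

```c
#include <stddef.h>
#include <stdint.h>

/* DWARF call-frame opcodes used here (values from the DWARF spec). */
enum {
    CFA_ADVANCE_LOC  = 0x40,  /* high 2 bits 01, delta in low 6 bits */
    CFA_ADVANCE_LOC1 = 0x02,  /* followed by a 1-byte delta */
    CFA_ADVANCE_LOC2 = 0x03,  /* followed by a 2-byte delta */
    CFA_ADVANCE_LOC4 = 0x04   /* followed by a 4-byte delta */
};

/* Hypothetical helper: write the smallest encoding of "advance the
   location by delta" into out (little-endian operands) and return the
   number of bytes used. */
size_t encode_advance_loc(uint32_t delta, unsigned char out[5]) {
    if (delta < 64) {                 /* fits inside the opcode byte */
        out[0] = (unsigned char)(CFA_ADVANCE_LOC | delta);
        return 1;
    }
    if (delta <= 0xFF) {
        out[0] = CFA_ADVANCE_LOC1;
        out[1] = (unsigned char)delta;
        return 2;
    }
    if (delta <= 0xFFFF) {
        out[0] = CFA_ADVANCE_LOC2;
        out[1] = (unsigned char)(delta & 0xFF);
        out[2] = (unsigned char)(delta >> 8);
        return 3;
    }
    out[0] = CFA_ADVANCE_LOC4;        /* the form written in the source */
    for (int i = 0; i < 4; i++)
        out[1 + i] = (unsigned char)(delta >> (8 * i));
    return 5;
}
```

`encode_advance_loc(1, buf)` returns 1 with `buf[0] == 0x41`, which matches the byte that appeared in place of the five bytes written in the source.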

First, most gcc developers told me I was wrong and that it was impossible.
I learned then that most people in the mailing lists do not know what
they are talking about.

You have to find the guy that knows what he/she is talking about. It
took me a week to find him, and then he told me that the assembler,
when assembling the debug_frame section, does not follow what is written
in the assembly directives but "optimizes" it, to save space.

Ahhhhhh.

I would have never found it, it just never crossed my mind...
Lesson learned: Be prepared to find all possible hacks.

ATTENTION IMPORTANT STUFF
-------------------------

gcc is a very good compiler. It is a compiler that generates code for
MANY machines, and is therefore very complex. Nowhere in this message
do I want to imply that it is "crap" or "a bad compiler". I just
want to tell people here that it is surely not something you
want to *start* with.

jacob
 
Ben Pfaff

Giannis Papadopoulos said:
Isn't it a bit risky to start with such a behemoth?

Why? Hacking simple features into GCC is not that difficult.
I've done it a couple of times and so have my officemates.
 
osmium

Ben Pfaff said:
Why? Hacking simple features into GCC is not that difficult.
I've done it a couple of times and so have my officemates.

So how is that PhD coming? Is it still in the works or did it already
happen?
 
santosh

smnoff said:
OK, I think I am a little more knowledgeable about C and pointers, ugh.

And likewise, I want to fix C... and not so much make a C++ or Java or
C# or even D-like language.

By "fixing" C you create a language which can no longer be called C
(as standardised by ISO).
So, if I wanted to make my "custom" C compiler that's different from the
current C99 or ANSI C, where would I start?

lcc is said to be an easy compiler to customise and work with.
http://www.cs.princeton.edu/software/lcc/

You might also take a look at the following:
http://fabrice.bellard.free.fr/tcc/

In any case, starting with a monster like gcc is not easy, unless you
are already familiar with its source.
 
Ben Pfaff

osmium said:
So how is that PhD coming? Is it still in the works or did it already
happen?

Still in the works. ETA December 2006, but hard to say with
accuracy...
 
Roberto Waltman

smnoff said:
So, if I wanted to make my "custom" C compiler that's different from the
current C99 or ANSI C, where would I start?

Others have already given you good advice. Here is a short bibliography you
may find useful; all these books take a practical approach, as opposed
to a theoretical one (like the Dragon Book).

Holub: "Compiler Design in C"
Wirth: "Compiler Construction" (Free on-line. Oberon subset)
Pemberton & Daniels: "Pascal Implementation: The P4 Compiler and
Interpreter" (Free on-line)
Hendrix: "The Small-C Handbook" (C subset)
Brinch Hansen: "Brinch Hansen on Pascal Compilers" (Pascal subset)
Crenshaw: "Let's Build a Compiler" (Free articles on-line. Basic(?) )
Appel: "Modern Compiler Implementation in C"
Wirth & Gutknecht: "Project Oberon - The Design of an Operating System
and Compiler" (Free on-line)

I agree that gcc is *not* a good choice for a beginner compiler
writer. I would recommend starting with Wirth's or Brinch Hansen's books.
They implement compilers for "toy" languages, using recursive descent
parsers, so there is no need (at least at this stage) to learn about
additional parsing tools. LCC (a full C compiler) could follow.
Try also posting in comp.compilers.
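To show what a recursive-descent parser looks like in practice, here is a minimal sketch in C, in the spirit of (but not copied from) the books above: one function per grammar rule, with each function emitting instructions for a hypothetical stack machine (`PUSH`, `ADD`, `SUB`, `MUL`, `DIV`) instead of computing a value. No parser generator is needed; error handling is omitted, and the names are illustrative.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* One-pass recursive-descent compiler for arithmetic expressions.
     expr   := term   { ('+' | '-') term }
     term   := factor { ('*' | '/') factor }
     factor := number | '(' expr ')'
   Each rule emits code for a hypothetical stack machine as text. */

static const char *in;     /* input cursor */
static char *code;         /* output buffer */
static size_t code_cap;    /* output buffer size in bytes */

static void emit(const char *line) {
    size_t used = strlen(code);
    /* append "line\n", truncating silently if the buffer is full */
    snprintf(code + used, code_cap - used, "%s\n", line);
}

static void skip(void) {
    while (isspace((unsigned char)*in))
        in++;
}

static void c_expr(void);

static void c_factor(void) {
    skip();
    if (*in == '(') {
        in++;
        c_expr();
        skip();
        if (*in == ')')
            in++;
        return;
    }
    long v = 0;
    while (isdigit((unsigned char)*in))
        v = v * 10 + (*in++ - '0');
    char buf[32];
    snprintf(buf, sizeof buf, "PUSH %ld", v);
    emit(buf);
}

static void c_term(void) {
    c_factor();
    for (;;) {
        skip();
        char op = *in;
        if (op != '*' && op != '/')
            return;
        in++;
        c_factor();                       /* right operand first... */
        emit(op == '*' ? "MUL" : "DIV");  /* ...then the operator */
    }
}

static void c_expr(void) {
    c_term();
    for (;;) {
        skip();
        char op = *in;
        if (op != '+' && op != '-')
            return;
        in++;
        c_term();
        emit(op == '+' ? "ADD" : "SUB");
    }
}

/* Compile text into out (cap bytes, cap >= 1); returns out. */
char *compile(const char *text, char *out, size_t cap) {
    in = text;
    code = out;
    code_cap = cap;
    out[0] = '\0';
    c_expr();
    return out;
}
```

Compiling `2 * (3 + 4)` produces `PUSH 2`, `PUSH 3`, `PUSH 4`, `ADD`, `MUL`, one per line. Operator precedence falls out of which function calls which, with no tables involved.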
 
Giannis Papadopoulos

Ben said:
Why? Hacking simple features into GCC is not that difficult.
I've done it a couple of times and so have my officemates.

Yes, but since this question is being asked, I'd expect that the OP does not
have the necessary experience to pursue such a quest.

 
Keith Thompson

Giannis Papadopoulos said:
Yes, but since this question is being asked, I'd expect that the OP does not
have the necessary experience to pursue such a quest.

I'll concede that hacking gcc is probably not a good starting point
for a beginner. (I've never really looked at the gcc sources.)

As someone else mentioned, lcc is said to be reasonably easy to hack
-- and it even has its own newsgroup.
 
Peter Shaggy Haywood

Groovy hepcat smnoff was jivin' on Wed, 7 Jun 2006 22:49:37 -0500 in
comp.lang.c.
How do I make my own custom C compiler?'s a cool scene! Dig it!
OK, I think I am a little more knowledgeable about C and pointers, ugh.

And likewise, I want to fix C... and not so much make a C++ or Java or
C# or even D-like language.

So, if I wanted to make my "custom" C compiler that's different from the
current C99 or ANSI C, where would I start?

This would probably be best asked in comp.compilers. But anyhow...
Writing a C compiler is no mean feat. It is quite a complex language.
My advice is to start with an easier language.
Others have mentioned the "Dragon Book", also known as Compilers:
Principles, Techniques & Tools by Aho, Sethi & Ullman. This is
generally considered *the* book on compiler design, but is very dry
and technical. I'm currently reading it.
I highly recommend Compiler Construction by Wirth
(http://www.oberon.ethz.ch/books.html). It's an excellent work, and
quite hands-on. Wirth takes you through the construction of a compiler
for a subset of the Oberon language (similar to Pascal). I didn't
really feel fully confident about writing my own compiler until I read
this one. (Actually, it's an assembler I'm writing. I'll write
compilers for high level languages later.)
Crenshaw's series of articles entitled Let's Build a Compiler (URL
unavailable at this time) is aimed squarely at the rank beginner, and
is intended to get you writing compilers quickly. Unfortunately it has
its problems. For one thing the series was never finished. For another
thing it's rather haphazard, chopping and changing all over the place,
going over the same ground repeatedly, looking like he was making it
all up as he went along. There is much useful information in it,
though. This series takes you through the process of building a
compiler for a subset of a language the author made up, called KISS.

--

Dig the even newer still, yet more improved, sig!

http://alphalink.com.au/~phaywood/
"Ain't I'm a dog?" - Ronny Self, Ain't I'm a Dog, written by G. Sherry & W. Walker.
I know it's not "technically correct" English; but since when was rock & roll "technically correct"?
 
Morris Dovey

smnoff (in n7Nhg.5643$f76.4621@dukeread06) said:

| OK, I think I am a little more knowledgeable about C and
| pointers, ugh.
|
| And likewise, I want to fix C... and not so much make a C++ or
| Java or C# or even D-like language.
|
| So, if I wanted to make my "custom" C compiler that's different
| from the current C99 or ANSI C, where would I start?

There are several ways to approach the problem: modify the source
for an existing C compiler, or start from scratch and write the whole
thing in the language of your choosing.

Either way you'll learn much more than you expect. Some time back I
approached a similar goal by creating an intermediate compiler (which
compiled PL/C, a superset of BNF) - but by the time the PL/C compiler
was running cleanly, I'd lost interest in the original problem (mostly
because I'd learned enough that the original problem looked trivial.)

Go for it. I predict that you won't arrive at the originally intended
destination - but you will have learned a lot getting wherever you do
arrive. :)
 
Allan Adler

jacob navia said:
gcc is impossible to understand unless you spend at least 2-3 YEARS working
in it full time. [...]
The first problem is to know RTL. You have to completely understand
RTL to understand the flow of things.

I've already pointed out that I am not qualified to give advice about
this, but I will give some anyway.

I spent some time about 20 years ago trying to read some of the
source code for GCC and to configure it for a hypothetical machine.
I was singularly unqualified to do that and am no less so now.
However, it was very educational and I would be glad to have an
excuse to do something like that again. I do remember some of the
things I learned. I thought RTL was a lot of fun since it was
conceptually simple and fairly self-contained. Where I got into
trouble was in filling in the machine description files. To the
extent that it just described hardware and big- vs. little-
endianness, it was no problem, but there are places where you
have to give exact details about the calling sequence the operating
system uses to load a program on the target machine. I didn't know
enough about operating systems to guess what the calling sequence
would be on the machine I was trying to imagine.

Even if you fail to understand the code for GCC, it probably won't
do you any harm to try. You might find yourself going back to the
source code again and again for guidance and inspiration as you learn
more about compilers in other ways.
Second, the sheer size of the code base. There are 13-15 MB
of C source code to understand. And the code is mostly very sparsely
commented. Macros everywhere hide from you what is going on.

One way of getting around that problem is to download an old version
of GCC, before it was ported to so many machines and before it supported
so many languages.
Accessing data structures is always done with macros, to ease
things when the structure layout changes, but this makes it very
hard for newcomers to understand what the hell those macros
are DOING...

How about this: GCC is full of interesting data structures. You can
just take their definitions in isolation and try to figure out what
to do with them, even if their relevance to compilers is not immediately
apparent. Maybe the original code uses macros for greater efficiency,
but there are certain things you would always want to be able to do
with a given data structure and you can just write them yourself using
functions. Once you have a set of functions that will create or modify
or copy one of these data structures, or print one of them out in some
way, you can then try these macros out on them and see exactly what their
effects are, since you will know exactly what the data structure looks
like before you feed it to the macro.
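A minimal sketch of this approach, using an invented node type: the struct and the macros `NODE_CODE`, `NODE_OPERAND`, `NODE_VALUE`, and `IS_CONST` are hypothetical and only imitate the macro-accessor style; they are not gcc's real definitions. The hand-written `make_node`, `print_node`, and `free_node` functions let you build a node you fully understand and then check what each macro returns.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node type with macro accessors (NOT gcc's definitions). */
struct node {
    int code;              /* kind of node: 'k' = constant, else operator */
    struct node *op[2];    /* operands (NULL for constants) */
    long value;            /* payload for constant nodes */
};

#define NODE_CODE(N)        ((N)->code)
#define NODE_OPERAND(N, I)  ((N)->op[(I)])
#define NODE_VALUE(N)       ((N)->value)
#define IS_CONST(N)         (NODE_CODE(N) == 'k')

/* Build a node by hand so its exact contents are known. */
struct node *make_node(int code, struct node *a, struct node *b, long value) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->code = code;
    n->op[0] = a;
    n->op[1] = b;
    n->value = value;
    return n;
}

/* Print a node tree in prefix form, e.g. (+ 2 3). */
void print_node(const struct node *n) {
    if (n == NULL) { printf("()"); return; }
    if (n->code == 'k') { printf("%ld", n->value); return; }
    printf("(%c ", n->code);
    print_node(n->op[0]);
    printf(" ");
    print_node(n->op[1]);
    printf(")");
}

/* Free a node and its operands. */
void free_node(struct node *n) {
    if (n == NULL)
        return;
    free_node(n->op[0]);
    free_node(n->op[1]);
    free(n);
}
```

Building a `'+'` node over the constants 2 and 3 and passing it to `print_node` prints `(+ 2 3)`; `IS_CONST(NODE_OPERAND(n, 0))` is then true for that node, and `IS_CONST(n)` is false.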

In other words, as long as you are patient and don't mind studying the
code for its own sake, it seems to me that there are a lot of ways to
understand it. If you are in a hurry because you need to use the code
or modify it, or if you want to learn it quickly and then go write your
own, then the code appears as an obstacle and that might get in the way
of studying it. Just get what you can out of it and be glad that you got
that much.
Third, you have to find your way in a mess of #ifdefs that defies
the imagination. gcc runs on many machines, and "portability"
has been taken to ridiculous extremes (the assembler, for instance).
This means that the same macro can have several interpretations
depending on which combination of machine/os you are running.

I am not very good at GCC but I vaguely recall that it has a lot of options
that let you print out the results of various stages of processing a program.
For example, you can tell GCC to give you RTL output. Maybe if you compile
GCC with GCC and look at the output at the right stage (e.g. after cpp gets
through with it) you can get rid of all the #ifdefs by compiling with all
the things defined that need to be defined. As Jacob Navia points out,
that may not give you the meaning of a given macro on all possible platforms,
but for starters I think one would be happy to know what it means on one
platform.
 
spibou

Allan said:
I am not very good at GCC but I vaguely recall that it has a lot of options
that let you print out the results of various stages of processing a program.
For example, you can tell GCC to give you RTL output. Maybe if you compile
GCC with GCC and look at the output at the right stage (e.g. after cpp gets
through with it) you can get rid of all the #ifdefs by compiling with all
the things defined that need to be defined. As Jacob Navia points out,
that may not give you the meaning of a given macro on all possible platforms,
but for starters I think one would be happy to know what it means on one
platform.

You can get the output of the preprocessor using the -E option. But the
horrendous format will very likely make this output unreadable by a
human.

By the way, since no one has mentioned it, doesn't one need to be
fairly proficient in the assembly language of some processor before
writing a compiler?
 
Morris Dovey

spibou said:

| By the way, since no one has mentioned it, doesn't one need to be
| fairly proficient in the assembly language of some processor before
| writing a compiler?

Only if the compiler is to output assembly code. :)

[ Imagine a compiler that translated its source language into C, or
COBOL, or APL... ]
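A sketch of such a compiler, under the assumption of a made-up toy source language: integer expressions in postfix (RPN) form with space-separated tokens. `rpn_to_c` is a hypothetical name; it writes a complete C program, and an ordinary C compiler then turns that into machine code, so no assembly knowledge is required.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Compile a toy postfix expression language into a C program.
   Each number or operator result gets its own temporary t0, t1, ...
   Returns 0 on success, -1 on a malformed expression. */
int rpn_to_c(const char *src, FILE *out) {
    int stack[64];          /* temporaries waiting to be used as operands */
    int depth = 0, next = 0;

    fprintf(out, "#include <stdio.h>\nint main(void) {\n");
    while (*src) {
        if (isspace((unsigned char)*src)) {
            src++;
        } else if (isdigit((unsigned char)*src)) {
            long v = 0;
            while (isdigit((unsigned char)*src))
                v = v * 10 + (*src++ - '0');
            if (depth == 64)
                return -1;                   /* too many pending operands */
            fprintf(out, "    long t%d = %ld;\n", next, v);
            stack[depth++] = next++;
        } else if (strchr("+-*/", *src)) {
            if (depth < 2)
                return -1;                   /* operator lacks operands */
            int b = stack[--depth];          /* right operand */
            int a = stack[--depth];          /* left operand */
            fprintf(out, "    long t%d = t%d %c t%d;\n", next, a, *src, b);
            stack[depth++] = next++;
            src++;
        } else {
            return -1;                       /* unknown character */
        }
    }
    if (depth != 1)
        return -1;                           /* leftover operands */
    fprintf(out, "    printf(\"%%ld\\n\", t%d);\n    return 0;\n}\n", stack[0]);
    return 0;
}
```

For the input `2 3 4 * +`, the generated program contains `long t3 = t1 * t2;` and `long t4 = t0 + t3;`, and prints 14 once compiled and run.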
 
Giannis Papadopoulos

By the way, since no one has mentioned it, doesn't one need to be
fairly proficient in the assembly language of some processor before
writing a compiler?


If one needs a full-featured compiler, yes. But one might stop the
compiler just before the generation of assembly language.
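For instance, the front end can stop at an intermediate form and run it directly. The sketch below is a hypothetical stack-machine bytecode and its interpreter loop; the instruction set is invented for illustration, and there is no checking for stack overflow, underflow, or division by zero.

```c
#include <stddef.h>

/* Hypothetical bytecode for a stack machine. */
enum op { OP_PUSH, OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_HALT };

struct insn {
    enum op op;
    long arg;      /* used only by OP_PUSH */
};

/* Execute a program and return the value left on top of the stack. */
long run(const struct insn *prog) {
    long stack[256];
    size_t sp = 0;                 /* number of values on the stack */
    for (;; prog++) {
        switch (prog->op) {
        case OP_PUSH: stack[sp++] = prog->arg; break;
        /* binary operators pop the right operand into the left slot */
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_SUB:  sp--; stack[sp - 1] -= stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_DIV:  sp--; stack[sp - 1] /= stack[sp]; break;
        case OP_HALT: return sp ? stack[sp - 1] : 0;
        }
    }
}
```

The program `PUSH 2, PUSH 3, PUSH 4, ADD, MUL, HALT` evaluates to 14. Replacing `run` with a routine that prints assembly for each instruction is the step that does require knowing a processor's assembly language.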
 
