How do I make my own custom C compiler?

Discussion in 'C Programming' started by smnoff, Jun 8, 2006.

  1. smnoff

    smnoff Guest

    OK, I think I am a little more knowledgeable about C and pointers, ughh.

    And likewise, I want to fix C.....and not so much to make a C++ or Java or
    C# or even D like language.

    So, if I wanted to make my "custom" C compiler that's different from the
    current C99 or ANSI C, where would I start?

    Thanks.
     
    smnoff, Jun 8, 2006
    #1

  2. smnoff

    Allan Adler Guest

    "smnoff" <> writes:

    > Ok, I am think I am a little more knowledgeable about C and pointers, ughh.
    > And likewise, I want to fix C.....and not so much to make a C++ or Java or
    > C# or even D like language.
    > So, if I wanted to make my "custom" C compiler that's different that the
    > current C99 or ANSI C, where would I start?


    Speaking as someone who never wrote a compiler, I'd suggest:
    (1) The Red Dragon Book
    (2) Introduction to Compiler Construction with UNIX, by Axel T. Schreiner
    and H. George Friedman, Jr. They take you through the design and
    implementation of a compiler for smallC. It was printed in 1985.
    You might still be able to get a used copy.
    --
    Ignorantly,
    Allan Adler <>
    * Disclaimer: I am a guest and *not* a member of the MIT CSAIL. My actions and
    * comments do not reflect in any way on MIT. Also, I am nowhere near Boston.
     
    Allan Adler, Jun 8, 2006
    #2

  3. smnoff

    Malcolm Guest

    "smnoff" <> wrote in message
    news:n7Nhg.5643$f76.4621@dukeread06...
    > Ok, I am think I am a little more knowledgeable about C and pointers,
    > ughh.
    >
    > And likewise, I want to fix C.....and not so much to make a C++ or Java or
    > C# or even D like language.
    >
    > So, if I wanted to make my "custom" C compiler that's different that the
    > current C99 or ANSI C, where would I start?
    >
    > Thanks.
    >

    Hit my website

    www.personal.leeds.ac.uk/~bgy1mm

    and look at the MiniBasic section.

    Writing a Basic interpreter is not trivial, but it is much easier than
    writing a compiler.
    Once you understand how to write an interpreter, you will have a good
    foundation for moving on to a compiler.
     
    Malcolm, Jun 8, 2006
    #3
  4. "smnoff" <> writes:
    > Ok, I am think I am a little more knowledgeable about C and pointers, ughh.
    >
    > And likewise, I want to fix C.....and not so much to make a C++ or Java or
    > C# or even D like language.
    >
    > So, if I wanted to make my "custom" C compiler that's different that the
    > current C99 or ANSI C, where would I start?


    I'd start with an existing open-source compiler, such as gcc or lcc.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
     
    Keith Thompson, Jun 8, 2006
    #4
  5. Keith Thompson wrote:
    > "smnoff" <> writes:
    >> Ok, I am think I am a little more knowledgeable about C and pointers, ughh.
    >>
    >> And likewise, I want to fix C.....and not so much to make a C++ or Java or
    >> C# or even D like language.
    >>
    >> So, if I wanted to make my "custom" C compiler that's different that the
    >> current C99 or ANSI C, where would I start?

    >
    > I'd start with an existing open-source compiler, such as gcc or lcc.
    >


    Isn't it a bit risky to start with such a behemoth?

    --
    one's freedom stops where others' begin

    Giannis Papadopoulos
    Computer and Communications Engineering dept. (CCED)
    University of Thessaly
    http://dop.freegr.net/
     
    Giannis Papadopoulos, Jun 8, 2006
    #5
  6. smnoff

    jacob navia Guest

    Giannis Papadopoulos a écrit :
    > Keith Thompson wrote:
    >
    >>"smnoff" <> writes:
    >>
    >>>Ok, I am think I am a little more knowledgeable about C and pointers, ughh.
    >>>
    >>>And likewise, I want to fix C.....and not so much to make a C++ or Java or
    >>>C# or even D like language.
    >>>
    >>>So, if I wanted to make my "custom" C compiler that's different that the
    >>>current C99 or ANSI C, where would I start?

    >>
    >>I'd start with an existing open-source compiler, such as gcc or lcc.
    >>

    >
    >
    > Isn't a bit risky to start with such a behemoth?
    >


    gcc is impossible to understand unless you spend at least
    2-3 YEARS working in it full time.

    There are at most 20 people in the world that can understand
    that compiler, and by understanding I mean that they are
    able to modify something in it, something basic like
    the parser for instance.

    I tried something much simpler: to fix a bug.

    Under Windows, when a function was _stdcall, it would screw up
    the floating point stack.

    I spent two weeks trying to fix it, learning how it works,
    etc.

    The first problem is to know RTL. You have to completely understand
    RTL to understand the flow of things.

    Second, the sheer size of the code base. There are 13-15 MB
    of C source code to understand. And the code is mostly very sparsely
    commented. Macros everywhere hide from you what is going on.

    Accessing data structures is always done with macros, to ease
    things when structure layout changes, but this makes it very
    hard for newcomers to understand what the hell those macros
    are DOING...

    Third, you have to find your way in a mess of #ifdefs that defies
    the imagination. gcc runs on many machines, and "portability"
    has been taken to ridiculous extremes (the assembler, for instance).

    This means that the same macro can have several interpretations
    depending on which combination of machine/os you are running.

    Fourth, like in any beast like this, you are bound to encounter
    the horrible hacks that will kill you.

    For instance I am trying to understand the way gcc generates the
    DWARF tables for C++ exception handling, and I spent several
    days trying to understand why the assembler instructions:

    .byte 0x4
    .long 1

    would produce a single byte "0x41" instead of a byte 0x4 and
    a 32 bit integer 1.

    First, most gcc developers told me I was wrong and that was impossible.
    I learned then, that most people in the mailing lists do not know what
    they are talking about.

    You have to find the guy that knows what he/she is talking about. It
    took me a week to find him, and then he told me that the assembler,
    when assembling the debug_frame section, does not follow what is written
    in the assembly directives but "optimizes" it to save space.

    Ahhhhhh.

    I would have never found it, it just never crossed my mind...
    Lesson learned: Be prepared to find all possible hacks.
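
One plausible reading of those bytes, assuming they are DWARF call-frame instructions (the post does not say so explicitly): 0x4 is the opcode DW_CFA_advance_loc4, which is followed by a 4-byte delta, while DW_CFA_advance_loc packs a delta of up to 63 into the low six bits of a single byte whose top two bits are 01. A delta of 1 would then shrink to 0x40 | 1 = 0x41. The sketch below shows only that compaction; the helper name encode_advance_loc is invented for illustration and is not taken from gcc or the assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DWARF call-frame opcodes (values from the DWARF specification). */
enum {
    DW_CFA_advance_loc  = 0x40, /* top two bits 01, delta in the low six bits */
    DW_CFA_advance_loc4 = 0x04  /* followed by a 4-byte delta */
};

/*
 * Hypothetical helper: write the shortest encoding of "advance the
 * location by delta" into out (at least 5 bytes) and return the number
 * of bytes used. Only the two forms discussed above are handled; the
 * 1-byte and 2-byte delta forms are left out to keep the sketch short.
 */
size_t encode_advance_loc(uint32_t delta, uint8_t *out)
{
    assert(out != NULL);

    if (delta <= 0x3f) {
        /* Compact form: opcode and delta share one byte (delta 1 -> 0x41). */
        out[0] = (uint8_t)(DW_CFA_advance_loc | delta);
        return 1;
    }

    /* Long form: opcode byte, then the delta in little-endian order. */
    out[0] = DW_CFA_advance_loc4;
    out[1] = (uint8_t)(delta & 0xff);
    out[2] = (uint8_t)((delta >> 8) & 0xff);
    out[3] = (uint8_t)((delta >> 16) & 0xff);
    out[4] = (uint8_t)((delta >> 24) & 0xff);
    return 5;
}
```

Under that assumption, the ".byte 0x4 / .long 1" pair and the single byte 0x41 describe the same advance, which would explain why the shorter form appeared in the object file.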

    ATTENTION IMPORTANT STUFF
    -------------------------

    Gcc is a very good compiler. It is a compiler that generates code for
    MANY machines, and is therefore very complex. Nowhere do I want to
    imply with this message that it's "crap" or "a bad compiler". I just
    want to tell people here that it is surely not something you
    want to *start* with.

    jacob
     
    jacob navia, Jun 8, 2006
    #6
  7. smnoff

    Ben Pfaff Guest

    Giannis Papadopoulos <> writes:

    > Keith Thompson wrote:
    >> "smnoff" <> writes:
    >>> So, if I wanted to make my "custom" C compiler that's different that the
    >>> current C99 or ANSI C, where would I start?

    >>
    >> I'd start with an existing open-source compiler, such as gcc or lcc.

    >
    > Isn't a bit risky to start with such a behemoth?


    Why? Hacking simple features into GCC is not that difficult.
    I've done it a couple of times and so have my officemates.
    --
    int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
    \n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
    );while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
    );}return 0;}
     
    Ben Pfaff, Jun 8, 2006
    #7
  8. smnoff

    osmium Guest

    "Ben Pfaff" writes:

    >> Isn't a bit risky to start with such a behemoth?

    >
    > Why? Hacking simple features into GCC is not that difficult.
    > I've done it a couple of times and so have my officemates.


    So how is that PhD coming? Is it still in the works or did it already
    happen?
     
    osmium, Jun 8, 2006
    #8
  9. smnoff

    santosh Guest

    Re: How do I make my own custom C compiler?

    smnoff wrote:
    > Ok, I am think I am a little more knowledgeable about C and pointers, ughh.
    >
    > And likewise, I want to fix C.....and not so much to make a C++ or Java or
    > C# or even D like language.


    By "fixing" C you create a language which can no longer be called C,
    (as standardised by ISO).

    > So, if I wanted to make my "custom" C compiler that's different that the
    > current C99 or ANSI C, where would I start?


    lcc is said to be an easy compiler to customise and work with.
    http://www.cs.princeton.edu/software/lcc/

    You might also take a look at the following:
    http://fabrice.bellard.free.fr/tcc/

    In any case, starting with a monster like gcc is not easy, unless you
    already happen to be familiar with its source.
     
    santosh, Jun 8, 2006
    #9
  10. smnoff

    Ben Pfaff Guest

    "osmium" <> writes:

    >> Why? Hacking simple features into GCC is not that difficult.
    >> I've done it a couple of times and so have my officemates.

    >
    > So how is that PhD coming? Is it still in the works or did it already
    > happen?


    Still in the works. ETA December 2006, but hard to say with
    accuracy...
    --
    int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
    \n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
    );while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
    );}return 0;}
     
    Ben Pfaff, Jun 8, 2006
    #10
  11. "smnoff" <> wrote:
    >So, if I wanted to make my "custom" C compiler that's different that the
    >current C99 or ANSI C, where would I start?


    Others gave you good advice already. This is a short bibliography you
    may find useful. All these books take a practical approach, as opposed
    to a theoretical one (Dragon book).

    Holub: "Compiler Design in C"
    Wirth: "Compiler Construction" (Free on-line. Oberon subset)
    Pemberton & Daniels: "Pascal Implementation: The P4 Compiler and
    Interpreter" (Free on-line)
    Hendrix: "The Small-C Handbook" (C subset)
    Brinch Hansen: "Brinch Hansen on Pascal Compilers" (Pascal subset)
    Crenshaw: "Let's Build a Compiler" (Free articles on-line. Basic(?) )
    Appel: "Modern Compiler Implementation in C"
    Wirth & Gutknecht: "Project Oberon - The Design of an Operating System
    and Compiler" (Free on-line)

    I agree that gcc is *not* a good choice for a beginner compiler
    writer. I would recommend starting with Wirth's or Brinch Hansen's books.
    They implement compilers for "toy" languages, using recursive descent
    parsers, so there is no need (at least at this stage) to learn about
    additional parsing tools. LCC (a full C compiler) could follow.
    Try also posting in comp.compilers.
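
To give a feel for the recursive descent technique mentioned above, here is a minimal sketch that evaluates integer arithmetic, with one C function per grammar rule. The grammar, the function names and the decision to ignore malformed input are all assumptions made for this illustration; none of it is taken from the books listed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Grammar assumed for this sketch:
 *   expr   = term   { ("+" | "-") term }
 *   term   = factor { ("*" | "/") factor }
 *   factor = number | "(" expr ")"
 */

static const char *cur; /* next unread character of the input */

static long parse_expr(void);

static void skip_spaces(void)
{
    while (isspace((unsigned char)*cur))
        cur++;
}

/* factor = number | "(" expr ")" */
static long parse_factor(void)
{
    long value = 0;

    skip_spaces();
    if (*cur == '(') {
        cur++;
        value = parse_expr();
        skip_spaces();
        if (*cur == ')')
            cur++;
        return value;
    }
    while (isdigit((unsigned char)*cur))
        value = value * 10 + (*cur++ - '0');
    return value;
}

/* term = factor { ("*" | "/") factor } */
static long parse_term(void)
{
    long value = parse_factor();

    for (;;) {
        skip_spaces();
        if (*cur == '*') {
            cur++;
            value *= parse_factor();
        } else if (*cur == '/') {
            long divisor;

            cur++;
            divisor = parse_factor();
            if (divisor != 0) /* sketch: silently skip division by zero */
                value /= divisor;
        } else {
            return value;
        }
    }
}

/* expr = term { ("+" | "-") term } */
static long parse_expr(void)
{
    long value = parse_term();

    for (;;) {
        skip_spaces();
        if (*cur == '+') {
            cur++;
            value += parse_term();
        } else if (*cur == '-') {
            cur++;
            value -= parse_term();
        } else {
            return value;
        }
    }
}

/* Entry point: evaluate a whole expression held in a string. */
long evaluate(const char *text)
{
    assert(text != NULL);
    cur = text;
    return parse_expr();
}
```

Operator precedence falls out of the call structure: parse_expr only ever sees whole terms, so in "1 + 2 * 3" the multiplication is finished inside parse_term before the addition happens. A real compiler would build a tree or emit code at the points where this sketch does arithmetic.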
     
    Roberto Waltman, Jun 8, 2006
    #11
  12. Ben Pfaff wrote:
    > Giannis Papadopoulos <> writes:
    >
    >> Keith Thompson wrote:
    >>> "smnoff" <> writes:
    >>>> So, if I wanted to make my "custom" C compiler that's different that the
    >>>> current C99 or ANSI C, where would I start?
    >>> I'd start with an existing open-source compiler, such as gcc or lcc.

    >> Isn't a bit risky to start with such a behemoth?

    >
    > Why? Hacking simple features into GCC is not that difficult.
    > I've done it a couple of times and so have my officemates.


    Yes, but since this question is asked I'd expect that the OP does not
    have the necessary experience to pursue such a quest.

    --
    one's freedom stops where others' begin

    Giannis Papadopoulos
    Computer and Communications Engineering dept. (CCED)
    University of Thessaly
    http://dop.freegr.net/
     
    Giannis Papadopoulos, Jun 8, 2006
    #12
  13. Giannis Papadopoulos <> writes:
    > Ben Pfaff wrote:
    >> Giannis Papadopoulos <> writes:
    >>
    >>> Keith Thompson wrote:
    >>>> "smnoff" <> writes:
    >>>>> So, if I wanted to make my "custom" C compiler that's different that the
    >>>>> current C99 or ANSI C, where would I start?
    >>>> I'd start with an existing open-source compiler, such as gcc or lcc.
    >>> Isn't a bit risky to start with such a behemoth?

    >>
    >> Why? Hacking simple features into GCC is not that difficult.
    >> I've done it a couple of times and so have my officemates.

    >
    > Yes, but since this question is asked I'd expect that the OP does not
    > have the necessary experience to pursue such a quest.


    I'll concede that hacking gcc is probably not a good starting point
    for a beginner. (I've never really looked at the gcc sources.)

    As someone else mentioned, lcc is said to be reasonably easy to hack
    -- and it even has its own newsgroup.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
     
    Keith Thompson, Jun 8, 2006
    #13
  14. Groovy hepcat smnoff was jivin' on Wed, 7 Jun 2006 22:49:37 -0500 in
    comp.lang.c.
    How do I make my own custom C compiler?'s a cool scene! Dig it!

    >Ok, I am think I am a little more knowledgeable about C and pointers, ughh.
    >
    >And likewise, I want to fix C.....and not so much to make a C++ or Java or
    >C# or even D like language.
    >
    >So, if I wanted to make my "custom" C compiler that's different that the
    >current C99 or ANSI C, where would I start?


    This would probably be best asked in comp.compilers. But anyhow...
    Writing a C compiler is no mean feat. It is quite a complex language.
    My advice is to start with an easier language.
    Others have mentioned the "Dragon Book", also known as Compilers:
    Principles, Techniques & Tools by Aho, Sethi & Ullman. This is
    generally considered *the* book on compiler design, but is very dry
    and technical. I'm currently reading it.
    I highly recommend Compiler Construction by Wirth
    (http://www.oberon.ethz.ch/books.html). It's an excellent work, and
    quite hands-on. Wirth takes you through the construction of a compiler
    for a subset of the Oberon language (similar to Pascal). I didn't
    really feel fully confident about writing my own compiler until I read
    this one. (Actually, it's an assembler I'm writing. I'll write
    compilers for high level languages later.)
    Crenshaw's series of articles entitled Let's Build a Compiler (URL
    unavailable at this time) is aimed squarely at the rank beginner, and
    is intended to get you writing compilers quickly. Unfortunately it has
    its problems. For one thing the series was never finished. For another
    thing it's rather haphazard, chopping and changing all over the place,
    going over the same ground repeatedly, looking like he was making it
    all up as he went along. There is much useful information in it,
    though. This series takes you through the process of building a
    compiler for a subset of a language the author made up, called KISS.

    --

    Dig the even newer still, yet more improved, sig!

    http://alphalink.com.au/~phaywood/
    "Ain't I'm a dog?" - Ronny Self, Ain't I'm a Dog, written by G. Sherry & W. Walker.
    I know it's not "technically correct" English; but since when was rock & roll "technically correct"?
     
    Peter Shaggy Haywood, Jun 11, 2006
    #14
  15. smnoff

    Morris Dovey Guest

    smnoff (in n7Nhg.5643$f76.4621@dukeread06) said:

    | Ok, I am think I am a little more knowledgeable about C and
    | pointers, ughh.
    |
    | And likewise, I want to fix C.....and not so much to make a C++ or
    | Java or C# or even D like language.
    |
    | So, if I wanted to make my "custom" C compiler that's different
    | that the current C99 or ANSI C, where would I start?

    There are several ways to approach the problem: modify the source
    for an existing C compiler - or start from scratch and write the whole
    thing in the language of your choosing.

    Either way you'll learn much more than you expect. Some time back I
    approached a similar goal by creating an intermediate compiler (which
    compiled PL/C, a superset of BNF) - but by the time the PL/C compiler
    was running cleanly, I'd lost interest in the original problem (mostly
    because I'd learned enough that the original problem looked trivial.)

    Go for it. I predict that you won't arrive at the originally intended
    destination - but you will have learned a lot getting wherever you do
    arrive. :)


    --
    Morris Dovey
    DeSoto Solar
    DeSoto, Iowa USA
    http://www.iedu.com/DeSoto
     
    Morris Dovey, Jun 11, 2006
    #15
  16. smnoff

    Allan Adler Guest

    jacob navia <> writes:

    > gcc is impossible to understand unles you spend at least 2-3 YEARS working
    > in it full time. [...]
    > The first problem is to know RTL. You have to completely understand
    > RTL to understand the flow of things.


    I've already pointed out that I am not qualified to give advice about
    this, but I will give some anyway.

    I spent some time about 20 years ago trying to read some of the
    source code for GCC and to configure it for a hypothetical machine.
    I was singularly unqualified to do that and am no less so now.
    However, it was very educational and I would be glad to have an
    excuse to do something like that again. I do remember some of the
    things I learned. I thought RTL was a lot of fun since it was
    conceptually simple and fairly self-contained. Where I got into
    trouble was in filling in the machine description files. To the
    extent that it just described hardware and big- vs. little-
    endianness, it was no problem, but there are places where you
    have to give exact details about the calling sequence the operating
    system uses to load a program on the target machine. I didn't know
    enough about operating systems to guess what the calling sequence
    would be on the machine I was trying to imagine.

    Even if you fail to understand the code for GCC, it probably won't
    do you any harm to try. You might find yourself going back to the
    source code again and again for guidance and inspiration as you learn
    more about compilers in other ways.

    > Second, the sheer size of the code base. There are 13-15 MB
    > of C source code to understand. And the code is mostly very sparsely
    > commented. Macros everywhere hide from you what is going on.


    One way of getting around that problem is to download an old version
    of GCC, before it was ported to so many machines and before it supported
    so many languages.

    > Accessing data structures is always done with macros, to easy
    > things when structure layout changes, but this makes it very
    > hard for newcomers to understand what the hell those macros
    > are DOING...


    How about this: GCC is full of interesting data structures. You can
    just take their definitions in isolation and try to figure out what
    to do with them, even if their relevance to compilers is not immediately
    apparent. Maybe the original code uses macros for greater efficiency,
    but there are certain things you would always want to be able to do
    with a given data structure and you can just write them yourself using
    functions. Once you have a set of functions that will create or modify
    or copy one of these data structures, or print one of them out in some
    way, you can then try these macros out on them and see exactly what their
    effects are, since you will know exactly what the data structure looks
    like before you feed it to the macro.
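
A minimal sketch of that approach, using a node layout and accessor macros invented for this example (they are not GCC's real definitions): build a small structure with plain functions, print it, and then check what each macro yields on data whose shape is already known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical node layout, made up for this sketch; it is not GCC's. */
enum node_code { NODE_CONST, NODE_PLUS };

struct node {
    enum node_code code;
    long value;              /* meaningful when code == NODE_CONST */
    struct node *operand[2]; /* meaningful when code == NODE_PLUS */
};

/* Accessor macros in the style described in the thread. */
#define NODE_CODE(n)       ((n)->code)
#define NODE_VALUE(n)      ((n)->value)
#define NODE_OPERAND(n, i) ((n)->operand[(i)])

/* Plain functions that create nodes, so their contents are known exactly. */
struct node make_const(long value)
{
    struct node n = { NODE_CONST, value, { NULL, NULL } };
    return n;
}

struct node make_plus(struct node *left, struct node *right)
{
    struct node n = { NODE_PLUS, 0, { left, right } };
    return n;
}

/* Print a node as a parenthesised prefix expression, e.g. "(plus 1 2)". */
void print_node(const struct node *n, FILE *out)
{
    assert(n != NULL);
    if (NODE_CODE(n) == NODE_CONST) {
        fprintf(out, "%ld", NODE_VALUE(n));
    } else {
        fputs("(plus ", out);
        print_node(NODE_OPERAND(n, 0), out);
        fputc(' ', out);
        print_node(NODE_OPERAND(n, 1), out);
        fputc(')', out);
    }
}
```

With two constants a and b joined by make_plus(&a, &b), print_node shows the whole tree, and an expression such as NODE_VALUE(NODE_OPERAND(&p, 1)) can be compared against the value that was stored in b.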

    In other words, as long as you are patient and don't mind studying the
    code for its own sake, it seems to me that there are a lot of ways to
    understand it. If you are in a hurry because you need to use the code
    or modify it, or if you want to learn it quickly and then go write your
    own, then the code appears as an obstacle and that might get in the way
    of studying it. Just get what you can out of it and be glad that you got
    that much.

    > Third, you have to find your way in a mess of #ifdefs that defies
    > the imagination. gcc runs in many machines, and "portability"
    > has been taken to ridiculous extremes (the assembler, for instance).
    > This means that the same macro can have several interpretations
    > depending on which combination of machine/os you are running.


    I am not very good at GCC but I vaguely recall that it has a lot of options
    that let you print out the results of various stages of processing a program.
    For example, you can tell GCC to give you RTL output. Maybe if you compile
    GCC with GCC and look at the output at the right stage (e.g. after cpp gets
    through with it) you can get rid of all the #ifdefs by compiling with all
    the things defined that need to be defined. As Jacob Navia points out,
    that may not give you the meaning of a given macro on all possible platforms,
    but for starters I think one would be happy to know what it means on one
    platform.
    --
    Ignorantly,
    Allan Adler <>
    * Disclaimer: I am a guest and *not* a member of the MIT CSAIL. My actions and
    * comments do not reflect in any way on MIT. Also, I am nowhere near Boston.
     
    Allan Adler, Jun 11, 2006
    #16
  17. smnoff

    Guest

    Re: How do I make my own custom C compiler?

    Allan Adler wrote:

    > > Third, you have to find your way in a mess of #ifdefs that defies
    > > the imagination. gcc runs in many machines, and "portability"
    > > has been taken to ridiculous extremes (the assembler, for instance).
    > > This means that the same macro can have several interpretations
    > > depending on which combination of machine/os you are running.

    >
    > I am not very good at GCC but I vaguely recall that it has a lot of options
    > that let you print out the results of various stages of processing a program.
    > For example, you can tell GCC to give you RTL output. Maybe if you compile
    > GCC with GCC and look at the output at the right stage (e.g. after cpp gets
    > through with it) you can get rid of all the #ifdefs by compiling with all
    > the things defined that need to be defined. As Jacob Navia points out,
    > that may not give you the meaning of a given macro on all possible platforms,
    > but for starters I think one would be happy to know what it means on one
    > platform.
    > --


    You can get the output of the preprocessor using the -E option. But the
    horrendous format will very likely make this output unreadable by a
    human.

    By the way, since no one has mentioned it, doesn't one need to be
    fairly proficient in the assembly of some processor before writing
    a compiler?
     
    , Jun 11, 2006
    #17
  18. smnoff

    Morris Dovey Guest

    Re: How do I make my own custom C compiler?

    (in
    ) said:

    | By the way , since noone has mentioned it , doesn't one need to be
    | fairly
    | proficient in the assembly of some processor before writing a
    | compiler ?

    Only if the compiler is to output assembly code. :)

    [ Imagine a compiler that translated its source language into C, or
    COBOL, or APL... ]
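
As a toy illustration of a translator whose output is C rather than assembly, the sketch below turns a postfix expression into a C expression string. The source "language", the function name rpn_to_c and the size limits are all assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 16  /* deepest operand stack the sketch accepts */
#define MAX_TEXT  256 /* longest intermediate expression, including NUL */

/*
 * Hypothetical toy translator: convert a postfix expression such as
 * "3 4 + 2 *" into the C expression "((3 + 4) * 2)".
 * Returns 0 on success, -1 on malformed or oversized input.
 */
int rpn_to_c(const char *src, char *out, size_t out_size)
{
    char stack[MAX_DEPTH][MAX_TEXT];
    int depth = 0;

    assert(src != NULL && out != NULL);

    while (*src != '\0') {
        if (isspace((unsigned char)*src)) {
            src++;
        } else if (isdigit((unsigned char)*src)) {
            size_t len = 0;

            /* Operand: copy the digits onto the stack as text. */
            if (depth == MAX_DEPTH)
                return -1;
            while (isdigit((unsigned char)*src)) {
                if (len + 1 >= MAX_TEXT)
                    return -1;
                stack[depth][len++] = *src++;
            }
            stack[depth][len] = '\0';
            depth++;
        } else if (strchr("+-*/", *src) != NULL) {
            char merged[MAX_TEXT];
            int n;

            /* Operator: combine the top two entries into "(a op b)". */
            if (depth < 2)
                return -1;
            n = snprintf(merged, sizeof merged, "(%s %c %s)",
                         stack[depth - 2], *src, stack[depth - 1]);
            if (n < 0 || (size_t)n >= sizeof merged)
                return -1;
            strcpy(stack[depth - 2], merged);
            depth--;
            src++;
        } else {
            return -1; /* unknown character */
        }
    }

    /* A well-formed input leaves exactly one expression on the stack. */
    if (depth != 1 || strlen(stack[0]) >= out_size)
        return -1;
    strcpy(out, stack[0]);
    return 0;
}
```

The resulting string could be pasted into a generated C file and handed to an existing C compiler, which is the sense in which such a translator never has to know any assembly.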

    --
    Morris Dovey
    DeSoto Solar
    DeSoto, Iowa USA
    http://www.iedu.com/DeSoto
     
    Morris Dovey, Jun 11, 2006
    #18
  19. Re: How do I make my own custom C compiler?

    wrote:
    > By the way , since noone has mentioned it , doesn't one need to be
    > fairly
    > proficient in the assembly of some processor before writing a compiler ?



    If one needs a full-featured compiler, yes. But one might stop the
    compiler just before it generates assembly language.
     
    Giannis Papadopoulos, Jun 12, 2006
    #19