jacob navia said:

gcc is impossible to understand unless you spend at least 2-3 YEARS working
in it full time. [...]
The first problem is to know RTL. You have to completely understand
RTL to understand the flow of things.

I've already pointed out that I am not qualified to give advice about
this, but I will give some anyway.
I spent some time about 20 years ago trying to read some of the
source code for GCC and to configure it for a hypothetical machine.
I was singularly unqualified to do that and am no less so now.
However, it was very educational and I would be glad to have an
excuse to do something like that again. I do remember some of the
things I learned. I thought RTL was a lot of fun since it was
conceptually simple and fairly self-contained. Where I got into
trouble was in filling in the machine description files. To the
extent that it just described hardware and big- vs. little-
endianness, it was no problem, but there are places where you
have to give exact details about the calling sequence the operating
system uses to load a program on the target machine. I didn't know
enough about operating systems to guess what the calling sequence
would be on the machine I was trying to imagine.

Even if you fail to understand the code for GCC, it probably won't
do you any harm to try. You might find yourself going back to the
source code again and again for guidance and inspiration as you learn
more about compilers in other ways.

Second, the sheer size of the code base. There are 13-15 MB
of C source code to understand. And the code is mostly very sparsely
commented. Macros everywhere hide from you what is going on.

One way of getting around that problem is to download an old version
of GCC, before it was ported to so many machines and before it supported
so many languages.

Accessing data structures is always done with macros, to ease
things when structure layout changes, but this makes it very
hard for newcomers to understand what the hell those macros
are DOING...

How about this: GCC is full of interesting data structures. You can
just take their definitions in isolation and try to figure out what
to do with them, even if their relevance to compilers is not immediately
apparent. Maybe the original code uses macros for greater efficiency,
but there are certain things you would always want to be able to do
with a given data structure and you can just write them yourself using
functions. Once you have a set of functions that will create or modify
or copy one of these data structures, or print one of them out in some
way, you can then try these macros out on them and see exactly what their
effects are, since you will know exactly what the data structure looks
like before you feed it to the macro.
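
Here is a minimal sketch of that idea in C. The node type, the accessor
macros and every function name below are invented for illustration; none of
them is taken from the GCC sources. The point is only that once you have your
own constructor and printer, you know exactly what the structure holds before
you apply an accessor macro to it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical expression node, made up for this sketch (not GCC's). */
enum node_code { NODE_CONST, NODE_PLUS };

struct node {
    enum node_code code;
    int value;                /* meaningful when code == NODE_CONST */
    struct node *operand[2];  /* meaningful when code == NODE_PLUS  */
};

/* Accessor macros in the style the post describes: callers go through
   these instead of naming the fields directly. */
#define NODE_CODE(n)       ((n)->code)
#define NODE_VALUE(n)      ((n)->value)
#define NODE_OPERAND(n, i) ((n)->operand[(i)])

/* Allocate a node with a known, fully initialized layout. */
static struct node *new_node(enum node_code code)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->code = code;
    n->value = 0;
    n->operand[0] = NULL;
    n->operand[1] = NULL;
    return n;
}

struct node *make_const(int value)
{
    struct node *n = new_node(NODE_CONST);
    n->value = value;
    return n;
}

struct node *make_plus(struct node *left, struct node *right)
{
    struct node *n = new_node(NODE_PLUS);
    n->operand[0] = left;
    n->operand[1] = right;
    return n;
}

/* Print the node in a parenthesized form so it can be inspected
   before and after a macro has been applied to it. */
void print_node(const struct node *n, FILE *out)
{
    assert(n != NULL);
    if (n->code == NODE_CONST) {
        fprintf(out, "(const %d)", n->value);
    } else {
        fputs("(plus ", out);
        print_node(n->operand[0], out);
        fputc(' ', out);
        print_node(n->operand[1], out);
        fputc(')', out);
    }
}

/* Release a node and everything below it. */
void free_node(struct node *n)
{
    if (n == NULL)
        return;
    free_node(n->operand[0]);
    free_node(n->operand[1]);
    free(n);
}
```

For example, calling print_node(t, stdout) on
t = make_plus(make_const(1), make_const(2)) prints (plus (const 1) (const 2)).
After NODE_VALUE(NODE_OPERAND(t, 0)) = 7; the same call prints
(plus (const 7) (const 2)), which shows exactly which field that macro reaches.
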
In other words, as long as you are patient and don't mind studying the
code for its own sake, it seems to me that there are a lot of ways to
understand it. If you are in a hurry because you need to use the code
or modify it, or if you want to learn it quickly and then go write your
own, then the code appears as an obstacle and that might get in the way
of studying it. Just get what you can out of it and be glad that you got
that much.

Third, you have to find your way in a mess of #ifdefs that defies
the imagination. gcc runs on many machines, and "portability"
has been taken to ridiculous extremes (the assembler, for instance).
This means that the same macro can have several interpretations
depending on which combination of machine/os you are running.

I am not very good at GCC but I vaguely recall that it has a lot of options
that let you print out the results of various stages of processing a program.
For example, you can tell GCC to give you RTL output. Maybe if you compile
GCC with GCC and look at the output at the right stage (e.g. after cpp gets
through with it) you can get rid of all the #ifdefs by compiling with all
the things defined that need to be defined. As Jacob Navia points out,
that may not give you the meaning of a given macro on all possible platforms,
but for starters I think one would be happy to know what it means on one
platform.
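
As a small illustration of asking the preprocessor what a macro means on the
platform in front of you, here is a sketch in C. TARGET_POINTER_BITS,
EXPANSION_OF and the function names are invented for this example and do not
come from GCC; __LP64__ and _WIN64 are macros that some compilers predefine,
and which branch is taken depends on yours. For whole files, gcc -E stops
after preprocessing and writes the result to standard output, which gives the
same information on a larger scale.

```c
#include <stdio.h>

/* TARGET_POINTER_BITS is a made-up macro for this sketch, not a GCC name.
   Its meaning depends on which #if branch the current platform selects. */
#if defined(__LP64__) || defined(_WIN64)
#  define TARGET_POINTER_BITS 64
#else
#  define TARGET_POINTER_BITS 32
#endif

/* STRINGIFY turns its argument into a string literal without expanding it.
   EXPANSION_OF adds one level of indirection, so the argument is
   macro-expanded first and the result is what gets turned into a string. */
#define STRINGIFY(x)    #x
#define EXPANSION_OF(x) STRINGIFY(x)

/* Return the text that TARGET_POINTER_BITS expands to in this build. */
const char *pointer_bits_expansion(void)
{
    return EXPANSION_OF(TARGET_POINTER_BITS);
}

/* Print the macro's name next to its expansion on this platform. */
void report_macro(FILE *out)
{
    fprintf(out, "%s expands to %s here\n",
            STRINGIFY(TARGET_POINTER_BITS), pointer_bits_expansion());
}
```

Calling report_macro(stdout) prints the macro name followed by whichever
value the #if selected. It tells you nothing about the other branches, but it
does settle what the macro means in the build you are actually looking at.
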