query about compiling and linking


gurpreet

Hi, this is gurpreet.
I know this is a very simple question, but I still want
to clear up some doubts.
What happens when we compile and link a C program?


I hope to get quite a lot of responses to my query.


BYE! BYE!
 

Joona I Palaste

gurpreet said:
Hi this is gurpreet,
I know this is a very simple question but still I want
to clear some doubts.
What happens when we compile and link a C program?
I hope to get quite a lot of responses to my query.

It depends entirely on the implementation.
 

Walter Roberson

: I know this is a very simple question, but I still want
: to clear up some doubts.
: What happens when we compile and link a C program?

The ANSI/ISO C standard lists a series of eight "translation phases".
You pretty much have to read the phase descriptions for yourself to
understand all the implications.

If one takes a program in its most basic form, without preprocessor
magic (which is important magic) and without comments or oddities
such as trigraphs, and without \ sequences in quoted strings,
then -generally- speaking, the compiler examines sequences of
characters, internally breaks them up at "token" boundaries
(e.g., breaks 2*-36 into "2", "*", "-" and "36") and then starts
examining the sequence of tokens.

There is more than one major method of figuring out what the program
"means" from the sequence of tokens. In one of the methods, the
compiler proceeds token by token, with each new token encountered
causing a transition to a new state that depends on the token.

For example, upon seeing the "2" token it would enter the state
corresponding to "I have a number". Then when it saw the "*" it would
say "'*' can mean pointer indirection or multiplication, but pointer
indirection is not valid right after a number, so I must have a
partially-completed binary term." Then it would see the "-" and say
"'-' can mean subtraction or unary negation, but subtraction is not
valid at this point in a binary term, so it must be negation." Upon
seeing the "36" it would say "Okay, that's a number" and then "Okay,
now I know what was being negated" and would take note of that, but it
would -not- immediately record the -36 as the second part of the term.
It would instead look at the next token first. Here it would see that
there is no more input, so it would know that it was not, for example,
part way through 2*-36/6 (where the '/' would tell it that the
multiplication is complete and becomes the left operand of the
division, because * and / group left to right). Knowing that there was
nothing following the -36, it would look at its state and say "Okay, so
the negated number must be the second part of the term that I'm in the
middle of processing," and would then record that it had figured out
that you were multiplying two things together, the first of which is
the number 2 and the second of which is the number -36. After that, it
would look again, see that there was no more input and no pending
action such as an assignment, and so it would figure out that you must
be using the kind of C statement that is just an expression on its own
(an expression statement, evaluated for its side effects).

At that point it might look and say "Oh, but an expression statement
is only valid inside a block, and you aren't inside a block" and
so emit a compiler error. But if there was sufficient syntactical
context, then after processing the rest of that context, it could
emit some kind of representation of "multiply" 2 -36 and send that
to the code generator.

The code generation phase gets passed an in-memory representation
of the code to be generated, with any mandatory order of operations
already worked out, but possibly without having nailed down the order
of operations in cases where the standard leaves the order to the
implementation. The code generator would usually run one or more
optimization phases, such as "dead code elimination" [in which it is
noticed that certain code can never be executed, perhaps because it is
inside an 'if' statement whose condition is always false.] Much more
sophisticated optimization is possible as well. Once an optimized
internal representation of the code exists, the code generator would
then produce a platform-dependent sequence of instructions needed to
run the program. For example, it might emit some headers and then
machine code equivalent
to

load data register #1 with the constant value 2
load data register #2 with the constant value -36
multiply data register #1 by #2, putting the result into the
register pair (#1,#2) [multiplication might produce a result
with twice as many bits as the original values]
ignore the overflow value stored in data register #1
store the value currently in data register #2 to memory location ...

Sometimes more optimization is done after this, especially
"peephole optimization", which makes very local changes by looking at
only a few adjacent instructions at a time. An example of a "peephole
optimization" would be for the optimizer to notice that the value being
multiplied by is a small constant, and that the instruction set has a
combined multiply instruction that takes a small constant directly in
the case where the overflow is being ignored, and so it might
transform the code to something like:

load data register #1 with the constant value 2
multiply data register #1 by the constant -36, putting the result
in data register #1
store the value currently in data register #1 to memory location...


After all the code is generated comes the linker phase. There
is more than one way that linkers can do their work, but generally
speaking they look for unresolved variable and function references
in the object code, figure out where those variables or functions
are defined, and then patch up all the code references to use
the appropriate addresses or offsets. In some systems, there is a rigid
separation between an "executable" and a partially-compiled file, with
the linker taking the partially-compiled files and emitting an
"executable". In other systems, executables and partially-compiled
files have the same format, and as far as the linker is concerned the
operation is just one of taking two partially-compiled files and
combining them into a new composite partially-compiled file; such
systems don't notice that there are missing definitions until you
actually tell them to start executing the file.