How do linkers work?

Kenny McCormack

No, by both.


Obviously a sufficiently complicated linker can be used for
everything. But until quite recently most linkers that were adequate
for C were *not* adequate for C++. Gcc's C++ for example had an
associated program "collect2" that provided the linkage features
absent in typical unix linkers (in particular, if I understand
correctly, support for initialisation by C++ constructors).

And therefore, I fear deeply for his clique membership.
Downfall looks imminent.
 
jacob navia

(e-mail address removed) wrote:
[snip]

Look, I just do not understand where you want to go

It is impossible to explain object files in a few words, i.e. in a
newsgroup posting, mentioning ALL possible forms an object file
can have.

That is why I explained the concept of separate compilation,
and the three parts that an object file must have, when separate
compilation is done.

Your example of the switch of MSVC doesn't apply here because actually
there is no separate compilation precisely. There is no linking and
there is no linker since all the intermediate code is just
thrown into a back end.

I should have answered you that first.

Then, what it means "conceptually"

It means that I did not want to get very precise as to what is
actually in each of the formats of object files around because they
can change a lot.

That is why I insisted only on the general concepts of what must
be inside:
o Symbol Table
o Relocations
o Sections

And even if the object files vary a lot, there isn't
any object file (that supports separate compilation) that
doesn't have at least that inside.
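
To make those three parts concrete, here is a minimal sketch in C. All struct and field names are hypothetical and chosen only for illustration; real formats such as ELF or COFF differ in many details, so this is an abstraction and not a description of any particular implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified layout for illustration only. */

typedef struct {
    const char *name;          /* e.g. ".text" or ".data" */
    const unsigned char *data; /* raw contents of the section */
    size_t size;               /* number of bytes in data */
} Section;

typedef struct {
    const char *name; /* symbol name as the linker sees it */
    int section;      /* index into sections, or -1 if defined elsewhere */
    size_t offset;    /* position inside that section when defined here */
} Symbol;

typedef struct {
    int section;   /* section that contains the spot to patch */
    size_t offset; /* where in that section the address must be written */
    int symbol;    /* index of the symbol whose address goes there */
} Relocation;

typedef struct {
    const Section *sections;  size_t nsections;
    const Symbol *symbols;    size_t nsymbols;
    const Relocation *relocs; size_t nrelocs;
} ObjectFile;

/* Count the symbols this object refers to but does not define itself. */
size_t count_undefined(const ObjectFile *obj)
{
    size_t n = 0;
    assert(obj != NULL);
    for (size_t i = 0; i < obj->nsymbols; i++) {
        if (obj->symbols[i].section < 0)
            n++;
    }
    return n;
}
```

In this sketch an object that defines main and calls printf would carry two symbols, one of them undefined, and count_undefined would report 1. Those undefined entries are what the link step has to satisfy from other objects or libraries.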

Obviously, you feel an urge to polemic, irony etc.

Maybe because you have still not swallowed the fact that
the non existent stack is universally present :)

Now, you get into philosophy.

EPROMS are executables?

They have surely another format, but they do not have
an essential characteristic of executable files.

Executable files are files that are RELOCATABLE. They
can be loaded at different addresses in memory. Even if in
many systems that use virtual memory that address is fixed,
the shared objects (dlls) are truly relocatable.

Files in EPROM/ ROM are NOT relocatable therefore they are NOT
executable files in the normal sense.

They can be executed of course, but their organization is completely
different because there is no loader.

I will come back to executable formats when I describe the linker in
more detail.
 
Flash Gordon

jacob navia wrote, On 24/03/08 20:38:
(e-mail address removed) wrote:
[snip]

Look, I just do not understand where you want to go

He probably wants this to go to a group where it is topical.
It is impossible to explain object files in a few words, i.e. in a
newsgroup posting, mentioning ALL possible forms an object file
can have.

Which is why there are system specific groups and for more general stuff
more general groups.
That is why I explained the concept of separate compilation,
and the three parts that an object file must have, when separate
compilation is done.

Ah, but it isn't always done and isn't topical here.
Your example of the switch of MSVC doesn't apply here because actually
there is no separate compilation precisely. There is no linking and
there is no linker since all the intermediate code is just
thrown into a back end.

So, in other words, your post was specific to certain implementations
with certain combinations of switches. I.e. it is not generally correct.
I should have answered you that first.

Then, what it means "conceptually"

It means that I did not want to get very precise as to what is
actually in each of the formats of object files around because they
can change a lot.

That is why I insisted only on the general concepts of what must
be inside:
o Symbol Table
o Relocations
o Sections

Some code on some processors can be relocatable *without* relocation
tables. In fact, on some processors the code is smaller and faster if it
is relocatable! Then, of course, there are C interpreters which don't
have any of that.
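
As a rough analogy in data rather than machine code, the following C sketch shows why relative addressing needs no fix-ups. The Record type and both functions are hypothetical names invented for this example. The record stores an offset from its own start instead of an absolute pointer, so a byte-for-byte copy at a different address still works, which is the same idea that lets position-independent code run wherever it is loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record that refers to its own storage by offset. */
typedef struct {
    size_t name_offset; /* distance from the start of the record, not a pointer */
    char storage[16];   /* the bytes that name_offset refers to */
} Record;

/* Fill in the record; the stored name is truncated to fit the storage. */
void record_init(Record *r, const char *name)
{
    assert(r != NULL && name != NULL);
    r->name_offset = offsetof(Record, storage);
    strncpy(r->storage, name, sizeof r->storage - 1);
    r->storage[sizeof r->storage - 1] = '\0';
}

/* Compute the address of the name from wherever the record lives now. */
const char *record_name(const Record *r)
{
    return (const char *)r + r->name_offset;
}
```

Had the record stored an absolute pointer instead, every copy would need that pointer patched, which is the job a relocation table exists to describe.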

I will come back to executable formats when I describe the linker in
more detail.

Please don't. You have already demonstrated in this post that it is
highly implementation specific which *should* be enough for you to see
why it is not topical here.
 
robertwessel2

(e-mail address removed) wrote:

[snip]

Look, I just do not understand where you want to go

It is impossible to explain object files in a few words, i.e. in a
newsgroup posting, mentioning ALL possible forms an object file
can have.

That is why I explained the concept of separate compilation,
and the three parts that an object file must have, when separate
compilation is done.

Your example of the switch of MSVC doesn't apply here because actually
there is no separate compilation precisely. There is no linking and
there is no linker since all the intermediate code is just
thrown into a back end.

I should have answered you that first.


OK, I type:

cl -GL program1.c
cl -GL program2.c
link program1.obj+program2.obj

You're claiming that this is somehow *not* separate compilation as
normally understood? Even though the invocations of CL fully parse
and error check the input programs? And do who knows how much
analysis.

I'm sorry, but that's a really silly assertion. The implementation
details of what happens when and where in the process are just that:
implementation details. The fact that some part (*part*) of what's
traditionally considered the compilation process happens a bit later
than usual makes no difference.

And if I remove the "-GL" from the command lines, I get a program that
does exactly the same thing, just a little slower. Again, so what?
The fact that the vendor did something odd and funky under the hood
that allows the final program to run faster is not relevant to the
process.

And what would be your dividing line between what functions are
allowed in a linker before it becomes something other than a linker?
Plenty of linkers have done all sorts of things in the process of
generating executables. Plenty of linkers can patch up branches so
that a shorter form is used if the target is in range. Some linkers
can add overlay processing to a program. Some linkers can search for
missing code in libraries and conditionally add it to the executable.
Heck, what happens when code gets patched up at *runtime*? Is the
compiler now in the runtime somewhere?

Then, what it means "conceptually"

It means that I did not want to get very precise as to what is
actually in each of the formats of object files around because they
can change a lot.

That is why I insisted only on the general concepts of what must
be inside:
o Symbol Table
o Relocations
o Sections

And even if the object files vary a lot, there isn't
any object file (that supports separate compilation) that
doesn't have at least that inside.


While you might have such things internally in the linker in some
cases, are you really trying to argue that the transitory existence of
some internal structure is a fundamental part of C?

Obviously, you feel an urge to polemic, irony etc.

Maybe because you have still not swallowed the fact that
the non existent stack is universally present :)


No. You've said "an object file has..." I think that's overbroad, to
the point of being an outright error. "Commonly, an object file
has..." would be rather more reasonable. You're trying to argue the
former, even when I've given you a simple example where it's not
correct.

In essence you're trying to apply the as-if rule to the behavior of
the compiler/linker system in regards to the stuff in the output from
a translation of a translation unit. That's reasonable, except that
you've got the wrong basis for your as-if. Program1 has a reference
of some type to program2. That has to be resolved in the linking
step, but it does *not* require the traditional object file structures
to do that.

As a practical example, most systems that support link-time code
generation do not actually produce anything, internal or external,
that looks like a traditional object file for each translation unit.
It simply never exists. Rather they *do* typically produce an object
file (often internally, or as a temporary file - same thing), but
*one* object file, that is the combination of all the programs tossed
into the back end. In fact the object file looks like what would
happen if you pasted the source code for all the programs together and
then compiled them as a unit. In practice, the common technique is to
paste together the parse trees, after patching up some names so you
don't get collisions and the like.

That one object file is then typically run through the traditional
link process to deal with libraries and whatnot (and other objects
from different compilers).

Now, you get into philosophy.

EPROMS are executables?

They have surely another format, but they do not have
an essential characteristic of executable files.

Executable files are files that are RELOCATABLE. They
can be loaded at different addresses in memory. Even if in
many systems that use virtual memory that address is fixed,
the shared objects (dlls) are truly relocatable.

Files in EPROM/ ROM are NOT relocatable therefore they are NOT
executable files in the normal sense.

They can be executed of course, but their organization is completely
different because there is no loader.


Good god I must be old. I remember that relocating linkers were once
the option or *upgrade* from the "normal" form. Heck, once upon a
time we worried that the extra overhead from the relocated versions of
the executables cost too much. Relocation is *not* a characteristic
attribute of executable files.

But I'm confused. It sounds like *you're* arguing that executables
are not really required. Which doesn't sound like you...
 
Bartc

You keep presenting this stuff as absolute, when it's not.

As I understand linking, it's resolving dependencies between separately
compiled modules before creating a runnable form of the program. That could
be done by any process at all, with intermediate files or not.

But *traditionally* it would be done the way Jacob has outlined.

So you're right, there are any number of ways of achieving the same ends.
But it doesn't mean you can't talk about a very common way of doing it.
Otherwise you wouldn't be able to talk about anything.

Whether c.l.c is an appropriate place for it is another matter. But I'm not
bothered about it myself.
 
user923005

As I understand linking, it's resolving dependencies between separately
compiled modules before creating a runnable form of the program. That could
be done by any process at all, with intermediate files or not.

But *traditionally* it would be done the way Jacob has outlined.

So you're right, there are any number of ways of achieving the same ends.
But it doesn't mean you can't talk about a very common way of doing it.
Otherwise you wouldn't be able to talk about anything.

Whether c.l.c is an appropriate place for it is another matter. But I'm not
bothered about it myself.

Let's not forget GSMATCH:

LINK

GSMATCH

Sets match control parameters for a shareable image and specifies
the match algorithm. This option allows you to control whether
executable images that link with a shareable image must be
relinked each time the shareable image is updated and relinked.

Format

GSMATCH=keyword,major-id,minor-id

GSMATCH=EQUAL,link-time-derived-major-id,link-time-derived-minor-id (default)




Additional information available:

Option_Values
 
Richard

user923005 said:
{snip}
Here is the same thing, more succinctly and accurately:
http://en.wikipedia.org/wiki/Linker
I guess that anybody who wants to program and is smart enough to
become a programmer could find it in two seconds.

And anyone who is "smart enough" and "wants to program" can google up
the return code of main() and read enough reviews to know that reading
Knuth is not the way for the average "smart enough" guy to learn
programming. Your point is? And if you have a point would you like to
define just how hard something must be before they are allowed to post
here to ask a question on it and you will deign to answer?
 
jacob navia

user923005 said:
Let's not forget GSMATCH:

LINK

GSMATCH

Sets match control parameters for a shareable image and specifies
the match algorithm. This option allows you to control whether
executable images that link with a shareable image must be
relinked each time the shareable image is updated and relinked.

Format

GSMATCH=keyword,major-id,minor-id

GSMATCH=EQUAL,link-time-derived-major-id,link-time-derived-minor-id (default)




Additional information available:

Option_Values

And your point is what?
What does GSMATCH have to do with this?
 
jacob navia

(e-mail address removed) wrote:

[snip]

Look, I just do not understand where you want to go

It is impossible to explain object files in a few words, i.e. in a
newsgroup posting, mentioning ALL possible forms an object file
can have.

That is why I explained the concept of separate compilation,
and the three parts that an object file must have, when separate
compilation is done.

Your example of the switch of MSVC doesn't apply here because actually
there is no separate compilation precisely. There is no linking and
there is no linker since all the intermediate code is just
thrown into a back end.

I should have answered you that first.


OK, I type:

cl -GL program1.c
cl -GL program2.c
link program1.obj+program2.obj

You're claiming that this is somehow *not* separate compilation as
normally understood? Even though the invocations of CL fully parse
and error check the input programs? And do who knows how much
analysis.

I'm sorry, but that's a really silly assertion. The implementation
details of what happens when and where in the process are just that:
implementation details. The fact that some part (*part*) of what's
traditionally considered the compilation process happens a bit later
than usual makes no difference.

If you are not interested in the "implementation details"
then please stop this stupid discussion. I want to explain
those details precisely. If you do not want to know
anything about them, just type your commands
and forget about the internals

And if I remove the "-GL" from the command lines, I get a program that
does exactly the same thing, just a little slower. Again, so what?
The fact that the vendor did something odd and funky under the hood
that allows the final program to run faster is not relevant to the
process.

And what would be your dividing line between what functions are
allowed in a linker before it becomes something other than a linker?
Plenty of linkers have done all sorts of things in the process of
generating executables. Plenty of linkers can patch up branches so
that a shorter form is used if the target is in range. Some linkers
can add overlay processing to a program. Some linkers can search for
missing code in libraries and conditionally add it to the executable.
Heck, what happens when code gets patched up at *runtime*? Is the
compiler now in the runtime somewhere?

Linkers do "linking", i.e. they
o Read all the symbol tables of the object files
o Assemble the different sections together
o Resolve all the symbols
o Output an executable

Whatever *else* they do is of no importance if they do
this.
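
The first three steps in the list above can be sketched in a few lines of C. This is a toy model under stated assumptions: Def, Module, layout and resolve are hypothetical names, each module has a single code section, and nothing is written to disk. Real linkers handle many section types, libraries, duplicate definitions and much more.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A symbol defined by a module, at an offset inside its code section. */
typedef struct {
    const char *name;
    size_t offset;
} Def;

/* One separately compiled module, reduced to what the toy linker needs. */
typedef struct {
    size_t code_size; /* size of this module's code section */
    const Def *defs;  /* symbols this module defines */
    size_t ndefs;
    size_t base;      /* filled in by layout(): start address in the image */
} Module;

/* Place the sections one after another; return the total image size. */
size_t layout(Module *mods, size_t n, size_t image_base)
{
    size_t addr = image_base;
    for (size_t i = 0; i < n; i++) {
        mods[i].base = addr;
        addr += mods[i].code_size;
    }
    return addr - image_base;
}

/* Look a name up in every module's symbol table.
   Returns 1 and stores the final address, or 0 if nobody defines it. */
int resolve(const Module *mods, size_t n, const char *name, size_t *addr)
{
    assert(addr != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < mods[i].ndefs; j++) {
            if (strcmp(mods[i].defs[j].name, name) == 0) {
                *addr = mods[i].base + mods[i].defs[j].offset;
                return 1;
            }
        }
    }
    return 0;
}
```

With two modules of 32 and 16 bytes laid out from address 0x1000, a symbol at offset 8 in the second module resolves to 0x1000 + 32 + 8. A name that no module defines makes resolve return 0, which corresponds to the familiar "unresolved external" error.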

While you might have such things internally in the linker in some
cases, are you really trying to argue that the transitory existence of
some internal structure is a fundamental part of C?

Not of C but of the linking process.
I am speaking about linking here.
No. You've said "an object file has..." I think that's overbroad, to
the point of being an outright error. "Commonly, an object file
has..." would be rather more reasonable. You're trying to argue the
former, even when I've given you a simple example where it's not
correct.

In essence you're trying to apply the as-if rule to the behavior of
the compiler/linker system in regards to the stuff in the output from
a translation of a translation unit. That's reasonable, except that
you've got the wrong basis for your as-if. Program1 has a reference
of some type to program2. That has to be resolved in the linking
step, but it does *not* require the traditional object file structures
to do that.

Maybe. I am speaking about the most common situation. Feel free to
explain other situations if you wish.

As a practical example, most systems that support link-time code
generation do not actually produce anything, internal or external,
that looks like a traditional object file for each translation unit.
It simply never exists. Rather they *do* typically produce an object
file (often internally, or as a temporary file - same thing), but
*one* object file, that is the combination of all the programs tossed
into the back end. In fact the object file looks like what would
happen if you pasted the source code for all the programs together and
then compiled them as a unit. In practice, the common technique is to
paste together the parse trees, after patching up some names so you
don't get collisions and the like.

That one object file is then typically run through the traditional
link process to deal with libraries and whatnot (and other objects
from different compilers).

You confirm then, what I said in my previous message.
The object file is there, containing all the
program, that is linked with libraries etc.
Good god I must be old. I remember that relocating linkers were once
the option or *upgrade* from the "normal" form. Heck, once upon a
time we worried that the extra overhead from the relocated versions of
the executables cost too much. Relocation is *not* a characteristic
attribute of executable files.

I use that definition of executable files: relocation.
If not, an EPROM file would be an executable file.

In some primitive OSes executable files can be like that.
So what? I use the relocation as definition.
But I'm confused.

Yes, that is obvious
 
Tony Giles

John Bode wrote:
That's kind of how you need to look at technical newsgroups.

Hi John,

Thanks to you (and others) for this detailed reply. I guess I'd had a
bad day and apologise for any noise. As I said, Jacob's topics were
informative to me (I never believed them to be authoritative Richard)
but after your post I see what you mean about perpetuating problems and
the topicality of technical newsgroups.

I'll be taking the advice to start browsing comp.programming and others
for non-specific C stuff and stop thinking that comp.lang.c should be my
only port of call.

Cheers, Tony.
 
Flash Gordon

Tony Giles wrote, On 25/03/08 06:46:
John Bode wrote:


Hi John,

Thanks to you (and others) for this detailed reply. I guess I'd had a
bad day and apologise for any noise. As I said, Jacob's topics were
informative to me (I never believed them to be authoritative Richard)
but after your post I see what you mean about perpetuating problems and
the topicality of technical newsgroups.

I'll be taking the advice to start browsing comp.programming and others
for non-specific C stuff and stop thinking that comp.lang.c should be my
only port of call.

Tony, thank you for being reasonable. You have just proved it is
possible to have a reasonable discussion about topicality. You might
also find the following pages interesting as to why some of the
attitudes are what they are http://clc-wiki.net/wiki/intro_to_clc
http://clc-wiki.net/wiki/C_community:comp.lang.c:Portability_attitude
 
Flash Gordon

jacob navia wrote, On 25/03/08 05:30:
(e-mail address removed) wrote:


If you are not interested in the "implementation details"
then please stop this stupid discussion. I want to explain
those details precisely. If you do not want to know
anything about them, just type your commands
and forget about the internals

Ah, but he has just shown that your "implementation details" are not
precise. Or to be more accurate they might be precise for some
implementations, but they are incorrect for others.

Linkers do "linking", i.e. they
o Read all the symbol tables of the object files
o Assemble the different sections together
o Resolve all the symbols
o Output an executable

Whatever *else* they do is of no importance if they do
this.

Ah, so the relocation is not important, so why did you mention it earlier?

Although there are systems that do not output an executable at all, e.g.
C interpreters, and others where it is arguable because they do the
linking at load time, so there is never an executable written anywhere
other than memory.

^^^^^^^^^^^^^

Above you claim this is not relevant. In terms of C it is not relevant
and it is not always done and sometimes it is done at load time rather
than link time.

Not of C but of the linking process.
I am speaking about linking here.

Which needs a lot less discussion.

Maybe. I am speaking about the most common situation. Feel free to
explain other situations if you wish.

I think the most common situations are what goes on in the embedded
world rather than on Windows, although your "most common situation" was
demonstrated not to be true for MS Visual Studio with one of its
options, so even there you are on dodgy ground.

I use that definition of executable files: relocation.

In that case you are deliberately excluding vast swathes of applications
written in C thus showing yet again why this is not topical.
 
Bartc

Richard said:
And anyone who is "smart enough" and "wants to program" can google up
the return code of main() and read enough reviews to know that reading
Knuth is not the way for the average "smart enough" guy to learn
programming. Your point is? And if you have a point would you like to
define just how hard something must be before they are allowed to post
here to ask a question on it and you will deign to answer?

Sometimes trying to find an answer on Google or whatever is like phoning a
company with a query only to be presented with a plethora of prerecorded
messages none of which is relevant to your problem.

It's a lot easier to speak to a human and ask your question directly and get
a straight answer. And usenet is a bit like that.

The only downside is you don't get great deals on your problem on eBay.
 
lawrence.jones

jacob navia said:
If you are not interested in the "implementation details"
then please stop this stupid discussion. I want to explain
those details precisely. If you do not want to know
anything about them, just type your commands
and forget about the internals

That's the problem -- you want to explain the details precisely, but you
can't because they vary from implementation to implementation. So you
end up describing some abstract implementation very precisely, which is
likely to confuse your intended audience into thinking that every
implementation has to work exactly the way you've described it,
prompting people who know better to immediately jump in to clarify that
that is not the case. You could avoid a lot of those comments by simply
making it clear that what you're describing is an abstraction that's
very similar to most real implementations. All that's required is a
liberal dose of phrases like "in general", "for example", "most", etc.

-Larry Jones

I don't see why some people even HAVE cars. -- Calvin
 
jacob navia

Flash said:
Ah, but he has just shown that your "implementation details" are not
precise. Or to be more accurate they might be precise for some
implementations, but they are incorrect for others.

No, he just showed that when invoked with some
special flags, the msvc compiler produces a single object file
that is linked differently with the system libraries.

Conceptually however, the object file is probably the same as the
other object files. But the model of SEPARATE compilation
does not hold since all source files are thrown together.
Ah, so the relocation is not important, so why did you mention it earlier?

"Resolve all symbols" is what then?
Reading all the relocation records, and producing a relocatable
executable.
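
As a hedged illustration of that step, the C sketch below applies a list of relocation records to an image. Reloc and apply_relocs are hypothetical names, and the record is reduced to a 32-bit absolute address in host byte order; real relocation types (PC-relative, split fields and so on) are more varied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One simplified relocation record: "write an address here". */
typedef struct {
    size_t offset;   /* position in the image of the 32-bit field to patch */
    uint32_t target; /* resolved address of the referenced symbol */
} Reloc;

/* Patch every recorded field: final value = symbol address + stored addend. */
void apply_relocs(unsigned char *image, size_t image_size,
                  const Reloc *relocs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t addend, value;
        assert(relocs[i].offset + sizeof addend <= image_size);
        /* memcpy avoids unaligned access when reading the stored addend. */
        memcpy(&addend, image + relocs[i].offset, sizeof addend);
        value = relocs[i].target + addend;
        memcpy(image + relocs[i].offset, &value, sizeof value);
    }
}
```

If the compiler left an addend of 4 in the field and the symbol ends up at 0x1000, the field holds 0x1004 after patching. A loader that moves the image can repeat the same walk with new addresses, which is what makes the result relocatable.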

But you just have no interest in anything else but proving me
wrong. Look, the best thing then is for you to see:

"jacob is wrong".

Happy now?
Although there are systems that do not output an executable at all, e.g.
C interpreters,

Great, but they still have dynamic linking. Anyway who cares?
and others where it is arguable because they do the
linking at load time, so there is never an executable written anywhere
other than memory.

So, all the modules are loaded separately each time?

It would be quite surprising if they did. No, what they do is
to dynamically link the executable that is in disk form.

The dlls/shared objects are linked on the fly when they are
loaded, under Windows/Linux. You wouldn't say that the
executable doesn't exist, would you?

^^^^^^^^^^^^^

Above you claim this is not relevant.

No. I said "Resolving all symbols"

In my next installment I will explain how this is done using the
relocation records.
In terms of C it is not relevant
and it is not always done and sometimes it is done at load time rather
than link time.

This is wrong.

It is relevant. And very relevant, see the questions coming up every
time about linking.
In that case you are deliberately excluding vast swathes of applications
written in C thus showing yet again why this is not topical.

I am excluding them from this discussion about linkers and linking!

I am NOT excluding them from the language or saying that those
applications aren't valid or whatever your imagination is
constructing here!
 
robertwessel2

No, he just showed that when invoked with some
special flags, the msvc compiler produces a single object file
that is linked differently with the system libraries.

Conceptually however, the object file is probably the same as the
other object files. But the model of SEPARATE compilation
does not hold since all source files are thrown together.


Again, no they're not. The original source file has no meaning when
you get to the link step. It has been fully parsed, preprocessed,
syntax analyzed, templates expanded (after all this is a C++ compiler
too), folded spindled and mutilated, and who knows what else. The
traditional compiler back end has not been run yet, but the source
file is utterly irrelevant - you can, in fact, delete it if you wish.
IOW, you cannot get any of the traditional compile errors from the
link step - if you mistype "long" as "lnog" you'll get a diagnostic
when you run the compile step.

In fact, there is no programmer visible difference in the compile and
link process with link time code generation, except that the compile
steps are faster and the link step slower (and your program usually
runs a bit faster). You still get your diagnostics at compile time,
and you get your link errors at link time. I'm excluding errors in
the back end here, which shouldn't occur. And oh yes, the MS linker
says "code generation complete" if it does any code generation.

My mention of combined source files was only in the context of
describing what the resulting (internal) object file looked like -
which is potentially quite different than what any of the per-
translation-unit object files might look like in the traditional
process, as is the final result. Thus it is not conceptually the same
process and many of the things that would need to exist with
traditional separate machine code object files can be omitted.

The compiler outputs some intermediate representation of the source
program into the ".obj" file. Just like it does in the traditional
case. Except that in the link-time-code-generation case, it's a much
higher level representation of the program, whereas the traditional
case has a heavily decorated form of machine code (which is *also* an
intermediate representation, but one somewhat closer to the final
executable). You're trying to claim that one is separate compilation,
and the other isn't. IMO, you're flatly wrong. The linking process
does exist to glue together bits of separately compiled code. Whether
or not all, some, or even any, of the bits of separately compiled code
are similar at time of input to the linker to a particular, or even
any, machine code, is quite irrelevant.
 
