How do linkers work?

Kenny McCormack · Mar 24, 2008

No, by both.

Obviously a sufficiently complicated linker can be used for
everything. But until quite recently most linkers that were adequate
for C were *not* adequate for C++. Gcc's C++ for example had an
associated program "collect2" that provided the linkage features
absent in typical unix linkers (in particular, if I understand
correctly, support for initialisation by C++ constructors).

And therefore, I fear deeply for his clique membership.
Downfall looks imminent.

jacob navia · Mar 24, 2008

(e-mail address removed) wrote:
[snip]

Look, I just do not understand where you want to go

It is impossible to explain object files in a few words, i.e. in a
newsgroup posting, mentioning ALL possible forms an object file
can have.

That is why I explained the concept of separate compilation,
and the three parts that an object file must have, when separate
compilation is done.

Your example of the switch of MSVC doesn't apply here because actually
there is no separate compilation precisely. There is no linking and
there is no linker since all the intermediate code is just
thrown into a back end.

I should have answered you that first.

Then, what it means "conceptually"

It means that I did not want to get very precise as to what is
actually in each of the formats of object files around because they
can change a lot.

That is why I insisted only in the general concepts of what must
be inside:
o Symbol Table
o Relocations
o Sections

And even if the object files vary a lot, there isn't
any object file (that supports separate compilation) that
doesn't have at least that inside.
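As a rough illustration of those three parts, here is a minimal sketch in Python. It does not follow any real format such as ELF or COFF; the class names, field names and the `main.o` example are all made up for this post.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Symbol:
    name: str
    section: Optional[str]  # None: referenced here but defined elsewhere
    offset: int = 0         # position inside the defining section


@dataclass
class Relocation:
    section: str  # section that contains the location to patch
    offset: int   # where in that section the address must be written
    symbol: str   # symbol whose final address goes there


@dataclass
class ObjectFile:
    name: str
    sections: Dict[str, bytearray]  # e.g. ".text", ".data"
    symbols: List[Symbol]           # the symbol table
    relocations: List[Relocation]


def undefined_symbols(obj: ObjectFile) -> List[str]:
    """Names this object file needs some other module to supply."""
    return [s.name for s in obj.symbols if s.section is None]


# Hypothetical module: defines main, calls helper from another module.
main_o = ObjectFile(
    name="main.o",
    sections={".text": bytearray(8)},
    symbols=[Symbol("main", ".text", 0), Symbol("helper", None)],
    relocations=[Relocation(".text", 4, "helper")],
)

print(undefined_symbols(main_o))  # ['helper']
```

The undefined entry in the symbol table, together with the relocation that points at it, is what tells a linker which hole to fill and with what.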

Obviously, you feel an urge to polemic, irony etc.

Maybe because you have still not swallowed the fact that
the non existent stack is universally present

Now, you get into philosophy.

EPROMS are executables?

They have surely another format, but they do not have
an essential characteristic of executable files.

Executable files are files that are RELOCATABLE. They
can be loaded at different addresses in memory. Even if in
many systems that use virtual memory that address is fixed,
the shared objects (dlls) are truly relocatable.

Files in EPROM/ ROM are NOT relocatable therefore they are NOT
executable files in the normal sense.

They can be executed of course, but their organization is completely
different because there is no loader.

I will come back to executable formats when I describe the linker in
more detail.

John Bode · Mar 24, 2008

And therefore, I fear deeply for his clique membership.
Downfall looks imminent.

Breaking my own rule, but...

TINFC. GTFOY. HTH. HAND.

Flash Gordon · Mar 24, 2008

jacob navia wrote, On 24/03/08 20:38:

(e-mail address removed) wrote:
[snip]

Look, I just do not understand where you want to go

He probably wants this to go to a group where it is topical.

It is impossible to explain object files in a few words, i.e. in a
newsgroup posting, mentioning ALL possible forms an object file
can have.

Which is why there are system specific groups and for more general stuff
more general groups.

That is why I explained the concept of separate compilation,
and the three parts that an object file must have, when separate
compilation is done.

Ah, but it isn't always done and isn't topical here.

Your example of the switch of MSVC doesn't apply here because actually
there is no separate compilation precisely. There is no linking and
there is no linker since all the intermediate code is just
thrown into a back end.

So, in other words, your post was specific to certain implementations
with certain combinations of switches. I.e. it is not generally correct.

I should have answered you that first.

Then, what it means "conceptually"

It means that I did not want to get very precise as to what is
actually in each of the formats of object files around because they
can change a lot.

That is why I insisted only in the general concepts of what must
be inside:
o Symbol Table
o Relocations
o Sections

Some code on some processors can be relocatable *without* relocation
tables. In fact, on some processors the code is smaller and faster if it
is relocatable! Then, of course, there are C interpreters which don't
have any of that.

I will come back to executable formats when I describe the linker in
more detail.

Please don't. You have already demonstrated in this post that it is
highly implementation specific which *should* be enough for you to see
why it is not topical here.

robertwessel2 · Mar 24, 2008

(e-mail address removed) wrote:

[snip]

Look, I just do not understand where you want to go

It is impossible to explain object files in a few words, i.e. in a
newsgroup posting, mentioning ALL possible forms an object file
can have.

That is why I explained the concept of separate compilation,
and the three parts that an object file must have, when separate
compilation is done.

Your example of the switch of MSVC doesn't apply here because actually
there is no separate compilation precisely. There is no linking and
there is no linker since all the intermediate code is just
thrown into a back end.

I should have answered you that first.

OK, I type:

cl -GL program1.c
cl -GL program2.c
link program1.obj+program2.obj

You're claiming that this is somehow *not* separate compilation as
normally understood? Even though the invocations of CL fully parse
and error check the input programs? And do who knows how much
analysis.

I'm sorry, but that's a really silly assertion. The implementation
details of what happens when and where in the process are just that:
implementation details. The fact that some part (*part*) of what's
traditionally considered the compilation process happens a bit later
than usual, makes no difference.

And if I remove the "-GL" from the command lines, I get a program that
does exactly the same thing, just a little slower. Again, so what?
The fact that the vendor did something odd and funky under the hood
that allows the final program to run faster is not relevant to the
process.

And what would be your dividing line between what functions are
allowed in a linker before it becomes something other than a linker?
Plenty of linkers have done all sorts of things in the process of
generating executables. Plenty of linkers can patch up branches so
that a shorter form is used if the target is in range. Some linkers
can add overlay processing to a program. Some linkers can search for
missing code in libraries and conditionally add it to the executable.
Heck, what happens when code gets patched up at *runtime*? Is the
compiler now in the runtime somewhere?
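To make the branch patching concrete, here is a minimal sketch of the idea in Python. It is not taken from any particular linker. The 5-byte and 2-byte branch sizes and the signed 8-bit range are assumptions (they happen to resemble x86 near and short jumps), and the item format is invented for the example.

```python
LONG, SHORT = 5, 2  # assumed encodings: 32-bit vs 8-bit displacement


def relax(items):
    """items: list of ("code", nbytes) or ("jmp", index_of_target_item).

    Start with every branch in its long form and shrink any branch whose
    target is in short range, repeating until nothing changes. Returns the
    final size in bytes of every item.
    """
    sizes = [arg if kind == "code" else LONG for kind, arg in items]
    changed = True
    while changed:
        changed = False
        # starts[i] is the address of item i under the current size guesses
        starts = [0]
        for size in sizes:
            starts.append(starts[-1] + size)
        for i, (kind, target) in enumerate(items):
            if kind != "jmp" or sizes[i] == SHORT:
                continue
            if target > i:
                # forward: measured from the end of the branch itself
                disp = starts[target] - starts[i + 1]
            else:
                # backward: the branch would end SHORT bytes after its start
                disp = starts[target] - (starts[i] + SHORT)
            if -128 <= disp <= 127:
                sizes[i] = SHORT
                changed = True
    return sizes


items = [
    ("jmp", 3),     # needs 129 bytes of reach while item 1 is still long
    ("jmp", 2),     # trivially in range
    ("code", 124),
    ("code", 1),
]
print(relax(items))  # [2, 2, 124, 1]
```

Shrinking the second branch pulls the first one's target into range, which is why the loop runs to a fixed point. Shrinking can only bring targets closer, so starting from the long form never picks a short form that later turns out not to fit.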

Then, what it means "conceptually"

It means that I did not want to get very precise as to what is
actually in each of the formats of object files around because they
can change a lot.

That is why I insisted only in the general concepts of what must
be inside:
o Symbol Table
o Relocations
o Sections

And even if the object files vary a lot, there isn't
any object file (that supports separate compilation) that
doesn't have at least that inside.

While you might have such things internally in the linker in some
cases, are you really trying to argue that the transitory existence of
some internal structure is a fundamental part of C?

Obviously, you feel an urge to polemic, irony etc.

Maybe because you have still not swallowed the fact that
the non existent stack is universally present

No. You've said "an object file has..." I think that's overbroad, to
the point of being an outright error. "Commonly, an object file
has..." would be rather more reasonable. You're trying to argue the
former, even when I've given you a simple example where it's not
correct.

In essence you're trying to apply the as-if rule to the behavior of
the compiler/linker system in regards to the stuff in the output from
a translation of a translation unit. That's reasonable, except that
you've got the wrong basis for your as-if. Program1 has a reference
of some type to program2. That has to be resolved in the linking
step, but it does *not* require the traditional object file structures
to do that.

As a practical example, most systems that support link-time code
generation do not actually produce anything, internal or external,
that looks like a traditional object file for each translation unit.
It simply never exists. Rather they *do* typically produce an object
file (often internally, or as a temporary file - same thing), but
*one* object file, that is the combination of all the programs tossed
into the back end. In fact the object file looks like what would
happen if you pasted the source code for all the programs together and
then compiled them as a unit. In practice, the common technique is to
paste together the parse trees, after patching up some names so you
don't get collisions and the like.
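A toy sketch of that name patching, in Python, purely for illustration. How any real compiler represents or merges its parse trees is not something this shows. The dictionary layout, the `internal` flag (standing in for C's `static`) and the `name.unit` renaming scheme are all assumptions.

```python
def merge_units(units):
    """units: {unit_name: {func_name: {"internal": bool, "calls": [names]}}}

    Combine several translation units into one, renaming functions with
    internal linkage so that equal names in different units do not collide.
    """
    merged = {}
    for unit, funcs in units.items():
        # Only internal names are private to the unit, so only they are renamed.
        rename = {name: f"{name}.{unit}"
                  for name, info in funcs.items() if info["internal"]}
        for name, info in funcs.items():
            new_name = rename.get(name, name)
            if new_name in merged:
                raise ValueError(f"duplicate definition: {new_name}")
            # Calls to a renamed function inside this unit must follow it.
            merged[new_name] = {"calls": [rename.get(c, c) for c in info["calls"]]}
    return merged


units = {
    "program1": {
        "main": {"internal": False, "calls": ["helper", "run"]},
        "helper": {"internal": True, "calls": []},
    },
    "program2": {
        "run": {"internal": False, "calls": ["helper"]},
        "helper": {"internal": True, "calls": []},
    },
}

merged = merge_units(units)
print(sorted(merged))
# ['helper.program1', 'helper.program2', 'main', 'run']
```

Both units keep their own private `helper`, while the call from `main` to `run` crosses units with no symbol table or relocation record in between.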

That one object file is then typically run through the traditional
link process to deal with libraries and whatnot (and other objects
from different compilers).

Now, you get into philosophy.

EPROMS are executables?

They have surely another format, but they do not have
an essential characteristic of executable files.

Executable files are files that are RELOCATABLE. They
can be loaded at different addresses in memory. Even if in
many systems that use virtual memory that address is fixed,
the shared objects (dlls) are truly relocatable.

Files in EPROM/ ROM are NOT relocatable therefore they are NOT
executable files in the normal sense.

They can be executed of course, but their organization is completely
different because there is no loader.

Good god I must be old. I remember that relocating linkers were once
the option or *upgrade* from the "normal" form. Heck, once upon a
time we worried that the extra overhead from the relocated versions of
the executables cost too much. Relocation is *not* a characteristic
attribute of executable files.

But I'm confused. It sounds like *you're* arguing that executables
are not really required. Which doesn't sound like you...

Bartc · Mar 25, 2008

You keep presenting this stuff as absolute, when it's not.

As I understand linking, it's resolving dependencies between separately
compiled modules before creating a runnable form of the program. That could
be done by any process at all, with intermediate files or not.

But *traditionally* it would be done the way Jacob has outlined.

So you're right, there are any number of ways of achieving the same ends.
But it doesn't mean you can't talk about a very common way of doing it.
Otherwise you wouldn't be able to talk about anything.

Whether c.l.c is an appropriate place for it is another matter. But I'm not
bothered about it myself.

user923005 · Mar 25, 2008

As I understand linking, it's resolving dependencies between separately
compiled modules before creating a runnable form of the program. That could
be done by any process at all, with intermediate files or not.

But *traditionally* it would be done the way Jacob has outlined.

So you're right, there are any number of ways of achieving the same ends.
But it doesn't mean you can't talk about a very common way of doing it.
Otherwise you wouldn't be able to talk about anything.

Whether c.l.c is an appropriate place for it is another matter. But I'm not
bothered about it myself.

Let's not forget GSMATCH:

LINK

GSMATCH

Sets match control parameters for a shareable image and specifies
the match algorithm. This option allows you to control whether
executable images that link with a shareable image must be
relinked each time the shareable image is updated and relinked.

Format

GSMATCH=keyword,major-id,minor-id

GSMATCH=EQUAL,link-time-derived-major-id,link-time-derived-minor-id
(default)

Additional information available:

Option_Values

user923005 · Mar 25, 2008

{snip}
Here is the same thing, more succinctly and accurately:
http://en.wikipedia.org/wiki/Linker
I guess that anybody who wants to program and who is smart enough to
become a programmer could find it in two seconds.

Richard · Mar 25, 2008

John Bode said:
Breaking my own rule, but...

TINFC. GTFOY. HTH. HAND.

Could you state the cross section #s in the standard for those please
John :-;

Richard · Mar 25, 2008

user923005 said:
{snip}
Here is the same thing, more succinctly and accurately:
http://en.wikipedia.org/wiki/Linker
I guess that anybody who wants to program and who is smart enough to
become a programmer could find it in two seconds.

And anyone who is "smart enough" and "wants to program" can google up
the return code of main() and read enough reviews to know that reading
Knuth is not the way for the average "smart enough" guy to learn
programming. Your point is? And if you have a point would you like to
define just how hard something must be before they are allowed to post
here to ask a question on it and you will deign to answer?

jacob navia · Mar 25, 2008

user923005 said:
Let's not forget GSMATCH:

LINK

GSMATCH

Sets match control parameters for a shareable image and specifies
the match algorithm. This option allows you to control whether
executable images that link with a shareable image must be
relinked each time the shareable image is updated and relinked.

Format

GSMATCH=keyword,major-id,minor-id

GSMATCH=EQUAL,link-time-derived-major-id,link-time-derived-minor-id
(default)

Additional information available:

Option_Values

And your point is what?
What does GSMATCH have to do here?

jacob navia · Mar 25, 2008

(e-mail address removed) wrote:

[snip]

Look, I just do not understand where you want to go

It is impossible to explain object files in a few words, i.e. in a
newsgroup posting, mentioning ALL possible forms an object file
can have.

That is why I explained the concept of separate compilation,
and the three parts that an object file must have, when separate
compilation is done.

Your example of the switch of MSVC doesn't apply here because actually
there is no separate compilation precisely. There is no linking and
there is no linker since all the intermediate code is just
thrown into a back end.

I should have answered you that first.

OK, I type:

cl -GL program1.c
cl -GL program2.c
link program1.obj+program2.obj

You're claiming that this is somehow *not* separate compilation as
normally understood? Even though the invocations of CL fully parse
and error check the input programs? And do who knows how much
analysis.

I'm sorry, but that's a really silly assertion. The implementation
details of what happens when and where in the process are just that:
implementation details. The fact that some part (*part*) of what's
traditionally considered the compilation process happens a bit later
than usual, makes no difference.

If you are not interested in the "implementation details"
then please stop this stupid discussion. I want to explain
those details precisely. If you do not want to know
anything about them, just type your commands
and forget about the internals

And if I remove the "-GL" from the command lines, I get a program that
does exactly the same thing, just a little slower. Again, so what?
The fact that the vendor did something odd and funky under the hood
that allows the final program to run faster is not relevant to the
process.

And what would be your dividing line between what functions are
allowed in a linker before it becomes something other than a linker?
Plenty of linkers have done all sorts of things in the process of
generating executables. Plenty of linkers can patch up branches so
that a shorter form is used if the target is in range. Some linkers
can add overlay processing to a program. Some linkers can search for
missing code in libraries and conditionally add it to the executable.
Heck, what happens when code gets patched up at *runtime*? Is the
compiler now in the runtime somewhere?

Linkers do "linking", i.e.
they read all the symbol tables of the object files
assemble the different sections together
Resolve all the symbols
Output an executable.

Whatever *else* they do is of no importance if they do
this.
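Those four steps can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real linker is written: each object is a plain dict with a single `text` section, relocations are absolute 32-bit little-endian addresses, and the base address `0x1000` and all names are invented.

```python
import struct


def link(objects, base=0x1000):
    """objects: list of dicts with
         "text":   bytes of the section,
         "defs":   {symbol: offset inside text},
         "relocs": [(offset inside text, symbol), ...]
    Returns (image bytes, {symbol: absolute address})."""
    image = bytearray()
    addresses = {}  # global symbol table
    pending = []    # (position in image, symbol) still to be patched

    # Steps 1 and 2: read each symbol table, place sections one after another.
    for obj in objects:
        start = len(image)
        image += obj["text"]
        for name, offset in obj["defs"].items():
            if name in addresses:
                raise ValueError(f"duplicate symbol: {name}")
            addresses[name] = base + start + offset
        for offset, name in obj["relocs"]:
            pending.append((start + offset, name))

    # Step 3: resolve every reference by patching in the final address.
    for position, name in pending:
        if name not in addresses:
            raise ValueError(f"undefined symbol: {name}")
        struct.pack_into("<I", image, position, addresses[name])

    # Step 4: output the "executable" (here just the raw image).
    return bytes(image), addresses


main_o = {"text": bytes(8), "defs": {"main": 0}, "relocs": [(4, "helper")]}
util_o = {"text": bytes(4), "defs": {"helper": 0}, "relocs": []}

image, addresses = link([main_o, util_o])
print({name: hex(addr) for name, addr in addresses.items()})
# {'main': '0x1000', 'helper': '0x1008'}
```

Leaving `util_o` out of the list makes the call raise "undefined symbol: helper", which is the toy version of the familiar unresolved external error.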

While you might have such things internally in the linker in some
cases, are you really trying to argue that the transitory existence of
some internal structure is a fundamental part of C?

Not of C but of the linking process.
I am speaking about linking here.

No. You've said "an object file has..." I think that's overbroad, to
the point of being an outright error. "Commonly, an object file
has..." would be rather more reasonable. You're trying to argue the
former, even when I've given you a simple example where it's not
correct.

In essence you're trying to apply the as-if rule to the behavior of
the compiler/linker system in regards to the stuff in the output from
a translation of a translation unit. That's reasonable, except that
you've got the wrong basis for your as-if. Program1 has a reference
of some type to program2. That has to be resolved in the linking
step, but it does *not* require the traditional object file structures
to do that.

Maybe. I am speaking about the most common situation. Feel free to
explain other situations if you wish.

As a practical example, most systems that support link-time code
generation do not actually produce anything, internal or external,
that looks like a traditional object file for each translation unit.
It simply never exists. Rather they *do* typically produce an object
file (often internally, or as a temporary file - same thing), but
*one* object file, that is the combination of all the programs tossed
into the back end. In fact the object file looks like what would
happen if you pasted the source code for all the programs together and
then compiled them as a unit. In practice, the common technique is to
paste together the parse trees, after patching up some names so you
don't get collisions and the like.

That one object file is then typically run through the traditional
link process to deal with libraries and whatnot (and other objects
from different compilers).

You confirm then, what I said in my previous message.
The object file is there, containing all the
program, that is linked with libraries etc.

Good god I must be old. I remember that relocating linkers were once
the option or *upgrade* from the "normal" form. Heck, once upon a
time we worried that the extra overhead from the relocated versions of
the executables cost too much. Relocation is *not* a characteristic
attribute of executable files.

I use that definition of executable files: relocation.
If not an EPROM file would be an executable file.

In some primitive OSes executable files can be like that.
So what? I use the relocation as definition.

But I'm confused.

Yes, that is obvious

Tony Giles · Mar 25, 2008

John Bode wrote:

That's kind of how you need to look at technical newsgroups.

Hi John,

Thanks to you (and others) for this detailed reply. I guess I'd had a
bad day and apologise for any noise. As I said, Jacob's topics were
informative to me (I never believed them to be authoritative Richard)
but after your post I see what you mean about perpetuating problems and
the topicality of technical newsgroups.

I'll be taking the advice to start browsing comp.programming and others
for non-specific C stuff and stop thinking that comp.lang.c should be my
only port of call.

Cheers, Tony.

Flash Gordon · Mar 25, 2008

Tony Giles wrote, On 25/03/08 06:46:

John Bode wrote:

Hi John,

Thanks to you (and others) for this detailed reply. I guess I'd had a
bad day and apologise for any noise. As I said, Jacob's topics were
informative to me (I never believed them to be authoritative Richard)
but after your post I see what you mean about perpetuating problems and
the topicality of technical newsgroups.

I'll be taking the advice to start browsing comp.programming and others
for non-specific C stuff and stop thinking that comp.lang.c should be my
only port of call.

Tony, thank you for being reasonable. You have just proved it is
possible to have a reasonable discussion about topicality. You might
also find the following pages interesting as to why some of the
attitudes are what they are http://clc-wiki.net/wiki/intro_to_clc
http://clc-wiki.net/wiki/C_community:comp.lang.c:Portability_attitude

Flash Gordon · Mar 25, 2008

jacob navia wrote, On 25/03/08 05:30:

(e-mail address removed) wrote:

If you are not interested in the "implementation details"
then please stop this stupid discussion. I want to explain
those details precisely. If you do not want to know
anything about them, just type your commands
and forget about the internals

Ah, but he has just shown that your "implementation details" are not
precise. Or to be more accurate they might be precise for some
implementations, but they are incorrect for others.

Linkers do "linking", i.e.
they read all the symbol tables of the object files
assemble the different sections together
Resolve all the symbols
Output an executable.

Whatever *else* they do is of no importance if they do
this.

Ah, so the relocation is not important, so why did you mention it earlier?

Although there are systems that do not output an executable at all, e.g.
C interpreters, and others where it is arguable because they do the
linking at load time, so there is never an executable written anywhere
other than memory.

^^^^^^^^^^^^^

Above you claim this is not relevant. In terms of C it is not relevant
and it is not always done and sometimes it is done at load time rather
than link time.

Not of C but of the linking process.
I am speaking about linking here.

Which needs a lot less discussion.

Maybe. I am speaking about the most common situation. Feel free to
explain other situations if you wish.

I think the most common situations are what goes on in the embedded
world rather than on Windows, although your "most common situation" was
demonstrated not to be true for MS Visual Studio with one of its
options, so even there you are on dodgy ground.

I use that definition of executable files: relocation.

In that case you are deliberately excluding vast swathes of applications
written in C thus showing yet again why this is not topical.

Bartc · Mar 25, 2008

Richard said:
And anyone who is "smart enough" and "wants to program" can google up
the return code of main() and read enough reviews to know that reading
Knuth is not the way for the average "smart enough" guy to learn
programming. Your point is? And if you have a point would you like to
define just how hard something must be before they are allowed to post
here to ask a question on it and you will deign to answer?

Sometimes trying to find an answer on Google or whatever is like phoning a
company with a query only to be presented with a plethora of prerecorded
messages none of which is relevant to your problem.

It's a lot easier to speak to a human and ask your question directly and get
a straight answer. And usenet is a bit like that.

The only downside is you don't get great deals on your problem on eBay.

lawrence.jones · Mar 25, 2008

jacob navia said:
If you are not interested in the "implementation details"
then please stop this stupid discussion. I want to explain
those details precisely. If you do not want to know
anything about them, just type your commands
and forget about the internals

That's the problem -- you want to explain the details precisely, but you
can't because they vary from implementation to implementation. So you
end up describing some abstract implementation very precisely, which is
likely to confuse your intended audience into thinking that every
implementation has to work exactly the way you've described it,
prompting people who know better to immediately jump in to clarify that
that is not the case. You could avoid a lot of those comments by simply
making it clear that what you're describing is an abstraction that's
very similar to most real implementations. All that's required is a
liberal dose of phrases like "in general", "for example", "most", etc.

-Larry Jones

I don't see why some people even HAVE cars. -- Calvin

user923005 · Mar 25, 2008

And your point is what?
What does GSMATCH have to do here?

*(exactly!)*
Nice to see that you are getting the point.

jacob navia · Mar 25, 2008

Flash said:
Ah, but he has just shown that your "implementation details" are not
precise. Or to be more accurate they might be precise for some
implementations, but they are incorrect for others.

No, he just showed that when invoked with some
special flags, the msvc compiler produces a single object file
that is linked differently with the system libraries.

Conceptually however, the object file is probably the same as the
other object files. But the model of SEPARATE compilation
does not hold since all source files are thrown together.

Ah, so the relocation is not important, so why did you mention it earlier?

"Resolve all symbols" is what then?
Reading all the relocation records, and producing a relocatable
executable.

But you just have no interest in anything else but proving me
wrong. Look, the best thing then is for you to see:

"jacob is wrong".

Happy now?

Although there are systems that do not output an executable at all, e.g.
C interpreters,

Great, but they still have dynamic linking. Anyway who cares?

and others where it is arguable because they do the
linking at load time, so there is never an executable written anywhere
other than memory.

So, all the modules are loaded separately each time?

It would be quite surprising if they did. No, what they do is
to dynamically link the executable that is in disk form.

The dlls/shared objects are linked on the fly when they are
loaded, under windows/linux. You wouldn't say that the
executable doesn't exist, would you?

^^^^^^^^^^^^^

Above you claim this is not relevant.

No. I said "Resolving all symbols"

In my next installment I will explain how this is done using the
relocation records.

In terms of C it is not relevant
and it is not always done and sometimes it is done at load time rather
than link time.

This is wrong.

It is relevant. And very relevant, see the questions coming up every
time about linking.

In that case you are deliberately excluding vast swathes of applications
written in C thus showing yet again why this is not topical.

I am excluding them from this discussion about linkers and linking!

I am NOT excluding them from the language or saying that those
applications aren't valid or whatever your imagination is
constructing here!

robertwessel2 · Mar 25, 2008

No, he just showed that when invoked with some
special flags, the msvc compiler produces a single object file
that is linked differently with the system libraries.

Conceptually however, the object file is probably the same as the
other object files. But the model of SEPARATE compilation
does not hold since all source files are thrown together.

Again, no they're not. The original source file has no meaning when
you get to the link step. It has been fully parsed, preprocessed,
syntax analyzed, templates expanded (after all this is a C++ compiler
too), folded spindled and mutilated, and who knows what else. The
traditional compiler back end has not been run yet, but the source
file is utterly irrelevant - you can, in fact, delete it if you wish.
IOW, you cannot get any of the traditional compile errors from the
link step - if you mistype "long" as "lnog" you'll get a diagnostic
when you run the compile step.

In fact, there is no programmer visible difference in the compile and
link process with link time code generation, except that the compile
steps are faster and the link step slower (and your program usually
runs a bit faster). You still get your diagnostics at compile time,
and you get your link errors at link time. I'm excluding errors in
the back end here, which shouldn't occur. And oh yes, the MS linker
says "code generation complete" if it does any code generation.

My mention of combined source files was only in the context of
describing what the resulting (internal) object file looked like -
which is potentially quite different than what any of the per-
translation-unit object files might look like in the traditional
process, as is the final result. Thus it is not conceptually the same
process and many of the things that would need to exist with
traditional separate machine code object files can be omitted.

The compiler outputs some intermediate representation of the source
program into the ".obj" file. Just like it does in the traditional
case. Except that in the link-time-code-generation case, it's a much
higher level representation of the program, whereas the traditional
case has a heavily decorated form of machine code (which is *also* an
intermediate representation, but one somewhat closer to the final
executable). You're trying to claim that one is separate compilation,
and the other isn't. IMO, you're flatly wrong. The linking process
does exist to glue together bits of separately compiled code. Whether
or not all, some, or even any, of the bits of separately compiled code
are similar at time of input to the linker to a particular, or even
any, machine code, is quite irrelevant.
