How ELF libraries work

sps

Hi All

In an attempt to learn more about ELF files (in particular how the GOT
and PLT work), I compiled a very simple C module as below:

#include <stdio.h>

int num1 = 32;
extern int num2;    /* defined in some other module */

void print1(void)
{
    printf("%d", num1);
    printf("%d", num2);
}

void print2(void)
{
    print1();
}

I then compiled it into a shared object (.so) file:
gcc -c -fPIC print.c
gcc -shared -o libprint.so print.o

Using readelf and objdump, I examined the contents of the file.

readelf -a libprint.so
objdump -d libprint.so

I have mostly figured it out, but there are a couple of points that I
need clarified:

a) Why are data objects and functions defined within the code module
accessed indirectly through the GOT? For example, the call to print1
is routed through the PLT. Why do this when you already know where
print1 is relative to the calling point? Is it just convenient to lump
everything in the GOT, or can these definitions be overridden?

b) There are two "mystery variables" which occur as the first and
second DWORD in the .data section. They have a RELATIVE relocation
applied to them. Because RELATIVE relocations do not relocate a
symbol, I don't know what these variables are for. And in general,
when are RELATIVE relocations used? I understand that you are just
adding the load address to the location, but what code semantics
create this relocation?

c) Why are there two entries in the GOT for __cxa_finalize? They seem
to be identical, except that one is declared GLOB_DAT and the other
JMP_SLOT. The same occurs for _Jv_RegisterClasses.

d) What is the initial stack size of a process in Linux?

e) When the dynamic linker is called to resolve a function (lazy
linkage), the PLT code pushes two values on the stack before jumping
to the DL: the first identifies the symbol to be resolved, and the
second identifies the calling module. How does this first value map
to the symbol? It doesn't seem to be the symbol index in .dynsym, and
I see no relation to anything else.

f) Will a loader/dynamic linker only ever see GLOB_DAT, JMP_SLOT, COPY
and RELATIVE relocations? I assume the other relocation types only
apply to .o files. Is this correct? And if not, with what program
semantics would these other relocations appear?

Thanks for your answers.
 
BGB / cr88192

sps said:
> Hi All
>
> In an attempt to learn more about ELF files (in particular how the GOT
> and PLT work), I compiled a very simple C module as below:

probably OT for CLC, but oh well...

> a) Why are data objects and functions defined within the code module
> accessed indirectly through the GOT? For example, the call to print1
> is routed through the PLT. Why do this when you already know where
> print1 is relative to the calling point? Is it just convenient to lump
> everything in the GOT, or can these definitions be overridden?

it is so that code can be position independent without needing internal
fixups.
it is also because, otherwise, the compiler would need to know which code is
internal or external (at compile time), or risk another overhead (having to
emit an additional indirect jump at link-time).

on Windows, the strategy is instead to assume local (first) and fall back to
an indirect jump, I guess assuming that local jumps are a lot more likely
than imports.

it is also worth noting that DLLs are not, as a general rule, position
independent, meaning that they have to be relocated if loaded to a
non-preferred address (commonly referred to as "rebasing").

for variables, issues get a little more ugly, which is why one can't
(generally) share global variables between DLLs.
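On the ELF side, part of the answer to "can these definitions be overridden?" is yes: a global symbol with default visibility in a shared object can be interposed by a definition in the executable or in an earlier-loaded library (LD_PRELOAD is the usual example), so the library's own references to print1 and num1 have to be bound at run time through the PLT and GOT. A minimal sketch, assuming GCC or Clang (the visibility attribute is a compiler extension) and using made-up function names, of the ways a function can opt out of interposition:

```c
/* Sketch: build with  gcc -fPIC -shared -O0 -o libvis.so vis.c
   and look at the calls in sum_all() with  objdump -d libvis.so  */

int num1 = 32;   /* default visibility, so it is also accessed via the GOT */

/* Default visibility: exported, and a definition elsewhere may take
   precedence, so a call from inside this .so normally goes via the PLT. */
int get_num1(void)
{
    return num1;
}

/* Hidden visibility: not exported from the .so, so it cannot be
   interposed and the call can be a direct, PC-relative one. */
__attribute__((visibility("hidden"))) int get_num1_hidden(void)
{
    return num1;
}

/* static: also not interposable, but only visible in this file. */
static int get_num1_static(void)
{
    return num1;
}

int sum_all(void)
{
    return get_num1() + get_num1_hidden() + get_num1_static();
}
```

With this build, the disassembly would typically show a call through `get_num1@plt` but direct calls for the other two. GCC's `-fvisibility=hidden` and the linker's `-Bsymbolic` option change this for a whole library at once; both give up interposition, so they are a design choice rather than a free optimization.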

> b) There are two "mystery variables" which occur as the first and
> second DWORD in the .data section. They have a RELATIVE relocation
> applied to them. Because RELATIVE relocations do not relocate a
> symbol, I don't know what these variables are for. And in general,
> when are RELATIVE relocations used? I understand that you are just
> adding the load address to the location, but what code semantics
> create this relocation?

I would have to go check, but I think those are self-reference pointers.
I am not certain, not having dug into ELF shared-object mechanics that
much (I know more at this point about PE/COFF DLLs...).
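Roughly, a RELATIVE relocation comes from any initialized data word holding an address that is fixed relative to the library itself: a pointer to a static variable, to a string literal, or to a static function. Its value is just "load base + link-time offset", so no symbol lookup is needed. In GCC-built libraries, one likely candidate for a mystery word is `__dso_handle`, which the shared-object startup file initializes to its own address, i.e. a self-reference pointer. A minimal sketch with made-up names; the relocation names in the comments are what one would typically expect on i386/x86-64:

```c
/* Sketch: build with  gcc -fPIC -shared -o librel.so rel.c
   then inspect the dynamic relocations with  readelf -r librel.so  */

static int counter;        /* local to this .so, cannot be interposed */
int exported = 7;          /* default visibility, can be interposed */

/* Address of a local object: known relative to the load base,
   so this word typically gets a RELATIVE relocation. */
int *counter_ptr = &counter;

/* A string literal's address is local too: also RELATIVE. */
const char *greeting = "hello";

/* Address of an interposable global: the linker cannot know which
   definition will win, so this typically gets a symbolic relocation
   against `exported` (R_386_32 on i386, R_X86_64_64 on x86-64). */
int *exported_ptr = &exported;
```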

> c) Why are there two entries in the GOT for __cxa_finalize? They seem
> to be identical, except that one is declared GLOB_DAT and the other
> JMP_SLOT. The same occurs for _Jv_RegisterClasses.

these would appear to be related to g++ and GCJ.

AFAIK, '__cxa_finalize' is called during teardown (including when a shared
object is unloaded) to run the C++ static destructors that were registered
for that object through its counterpart, '__cxa_atexit'.

'_Jv_' is a prefix generally used for pretty much anything GCJ-related.
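The two GOT entries for one symbol most likely come from code that first tests whether the function exists and then calls it: testing the address needs a GOT entry (GLOB_DAT), while the call goes through a PLT slot (JMP_SLOT). GCC's startup code is believed to do this for __cxa_finalize and _Jv_RegisterClasses, declaring them weak so a library still loads when no C++ or GCJ runtime provides them. A minimal sketch of the pattern, with the hypothetical name `optional_hook` standing in for the real functions:

```c
#include <stddef.h>

/* Hypothetical optional function: declared weak and not defined here,
   so at run time its address is either a real definition or NULL. */
extern void optional_hook(void *arg) __attribute__((weak));

/* Calls optional_hook(arg) only if some loaded module defines it.
   Returns 1 if the hook ran, 0 otherwise. */
int call_hook_if_present(void *arg)
{
    /* Comparing the address with NULL needs the address itself,
       which PIC code loads from a GOT entry (GLOB_DAT). */
    if (optional_hook != NULL) {
        /* The call itself goes through a PLT slot (JMP_SLOT). */
        optional_hook(arg);
        return 1;
    }
    return 0;
}
```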

> d) What is the initial stack size of a process in Linux?

I think like 8MB or something...
note that this is not generally mapped in all at once, but the backing
memory gets paged into existence on write.

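Windows takes the default from the executable's PE header (1MB reserved
with Microsoft's linker), and paging in the stack behaves slightly
differently (it grows one guard page at a time, so one needs to be careful
if grabbing too much stack memory at once).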
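On Linux the figure is not fixed by the kernel: the main thread's stack may grow up to the RLIMIT_STACK soft limit, which many distributions set to 8MB by default (`ulimit -s` reports it in KB, e.g. 8192). A minimal sketch of querying it from C with the POSIX getrlimit() call:

```c
#include <sys/resource.h>

/* Returns the soft limit on this process's stack size in bytes
   (the value of RLIM_INFINITY if it is unlimited), or 0 on failure. */
unsigned long long stack_soft_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 0;
    return (unsigned long long)rl.rlim_cur;
}
```

Dividing the result by 1024 * 1024 gives the limit in MB.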

> e) When the dynamic linker is called to resolve a function (lazy
> linkage), the PLT code pushes two values on the stack before jumping
> to the DL: the first identifies the symbol to be resolved, and the
> second identifies the calling module. How does this first value map
> to the symbol? It doesn't seem to be the symbol index in .dynsym, and
> I see no relation to anything else.

maybe a pointer?...
I really don't know on this one.
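For the record, the first value is not a .dynsym index but a reference into the PLT relocation table: on i386 the PLT entry pushes a byte offset into .rel.plt, and on x86-64 it pushes an index into .rela.plt. The resolver finds that relocation entry, reads the GOT slot to patch from r_offset, and reads the .dynsym index out of r_info. Since .rel.plt need not be in .dynsym order, the pushed value does not look like a symbol index. A minimal sketch using the glibc <elf.h> types; the helper names are hypothetical:

```c
#include <elf.h>
#include <stdint.h>

/* i386: the PLT entry pushes a byte offset into .rel.plt. */
const Elf32_Rel *plt_reloc_i386(const Elf32_Rel *rel_plt, uint32_t reloc_offset)
{
    return (const Elf32_Rel *)((const char *)rel_plt + reloc_offset);
}

/* x86-64: the PLT entry pushes an index into .rela.plt instead. */
const Elf64_Rela *plt_reloc_x86_64(const Elf64_Rela *rela_plt, uint64_t reloc_index)
{
    return &rela_plt[reloc_index];
}

/* The relocation entry's r_offset is the GOT slot to patch, and the
   symbol part of r_info is the symbol's index in .dynsym. */
uint32_t plt_symidx_i386(const Elf32_Rel *rel_plt, uint32_t reloc_offset)
{
    return ELF32_R_SYM(plt_reloc_i386(rel_plt, reloc_offset)->r_info);
}
```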

> f) Will a loader/dynamic linker only ever see GLOB_DAT, JMP_SLOT, COPY
> and RELATIVE relocations? I assume the other relocation types only
> apply to .o files. Is this correct? And if not, with what program
> semantics would these other relocations appear?

you may want to check the ELF spec for this one...


in my case (for a custom DLL loader), I just implemented all of the
relocation types.
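Other types do show up in shared objects: R_386_32 / R_X86_64_64 for data words holding the address of an interposable symbol, TLS types for __thread variables, and R_386_PC32 when non-PIC code is linked into an i386 .so (text relocations). A minimal sketch of the core of such a loader for i386, in the spirit of the custom loader above; it assumes symbols were already looked up (the `symaddr` table is hypothetical) and leaves COPY, PC32 and TLS types to a real implementation:

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read/write a 32-bit word at a possibly unaligned address. */
static uint32_t load32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static void store32(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Apply i386 (REL-style) dynamic relocations to a shared object image.
   image   : where the bytes live in memory (offset 0 = link address 0)
   base    : the run-time load address being relocated for
   symaddr : hypothetical table of resolved addresses, by .dynsym index
   Returns the number of entries of types this sketch does not handle. */
size_t apply_rel32(unsigned char *image, uint32_t base,
                   const Elf32_Rel *rels, size_t count,
                   const uint32_t *symaddr)
{
    size_t unhandled = 0;

    for (size_t i = 0; i < count; i++) {
        unsigned char *where = image + rels[i].r_offset;
        uint32_t sym = ELF32_R_SYM(rels[i].r_info);

        switch (ELF32_R_TYPE(rels[i].r_info)) {
        case R_386_RELATIVE:     /* B + A, addend stored in place */
            store32(where, load32(where) + base);
            break;
        case R_386_GLOB_DAT:     /* S */
        case R_386_JMP_SLOT:     /* S (eager binding, no lazy PLT) */
            store32(where, symaddr[sym]);
            break;
        case R_386_32:           /* S + A */
            store32(where, load32(where) + symaddr[sym]);
            break;
        default:                 /* COPY, PC32, TLS types, ... */
            unhandled++;
            break;
        }
    }
    return unhandled;
}
```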
 
