How ELF libraries work

Discussion in 'C Programming' started by sps, Dec 28, 2009.

  1. sps

    sps Guest

    Hi All

    In an attempt to learn more about ELF files (in particular how the GOT
    and PLT work), I compiled a very simple C module as below

    #include <stdio.h>

    int num1 = 32;
    extern int num2;

    void print1(void)
    {
        printf("%d", num1);
        printf("%d", num2);
    }

    void print2(void)
    {
        print1();
    }

    I then compiled it into a shared object (.so) file
    gcc -c -fPIC print.c
    gcc -shared -o libprint.so print.o

    Using readelf and objdump, I examined the contents of the file.

    readelf -a libprint.so
    objdump -d libprint.so

    I have mostly figured it out, but there are a couple of points that I
    need clarified

    a) Why are data objects and functions defined within the code module
    accessed indirectly through the GOT? For example, the call to print1
    is routed through the PLT. Why do this when you already know where
    print1 is relative to the calling point? Is it just convenient to lump
    everything in the GOT, or can these definitions be overridden?

    b) There are two "mystery variables" which occur as the first and
    second DWORD in the .data section. They have a RELATIVE relocation
    applied to them. Because RELATIVE relocations do not refer to a
    symbol, I don't know what these variables do. And in general, when do
    you use RELATIVE relocations? I understand that you are just adding the
    load address to the location, but what code semantics create this
    relocation?

    c) Why are there two entries in the GOT for __cxa_finalize? They seem
    to be identical, except that one is declared GLOB_DAT and the other
    JMP_SLOT. The same occurs for _Jv_RegisterClasses.

    d) What is the initial stack size of a process in Linux?

    e) When a function is to be resolved lazily (lazy linkage), the PLT code
    pushes two values on the stack before it jumps to the dynamic linker:
    the first identifies the symbol to be resolved, and the second
    identifies the calling module. How does the first value pushed map to
    the symbol? It doesn't seem to be the symbol index in dynsym, and I
    see no relation to anything else.

    f) Will a loader/dynamic linker only ever see GLOB_DAT, JMP_SLOT, COPY
    and RELATIVE relocations? I assume the other relocation types only
    apply to .o files. Is this correct? And if not, with what program
    semantics would these other relocations appear?

    Thanks for your answers.
    sps, Dec 28, 2009
    #1

  2. "sps" <> wrote in message
    news:hhb87l$18h$...
    > Hi All
    >
    > In an attempt to learn more about ELF files (in particular how the GOT
    > and PLT work), I compiled a very simple C module as below
    >


    probably OT for CLC, but oh well...


    > int num1 = 32;
    > extern int num2;
    >
    > void print1()
    > {
    > printf("%d",num1);
    > printf("%d",num2);
    > }
    >
    > void print2()
    > {
    > print1();
    >
    > }
    >
    > I then compiled it into a shared object (.so) file
    > gcc -c -fPIC print.c
    > gcc -shared -o libprint.so print.o
    >
    > Using readelf and objdump, I examined the contents of the file.
    >
    > readelf -a libprint.so
    > objdump -d libprint.so
    >
    > I have mostly figured it out, but there are a couple of points that I
    > need clarified
    >
    > a) Why are data objects and functions defined within the code module
    > indirectly accessed through the GOT. For example, the call to print1
    > is routed through the PLT. Why do this when you already know where
    > print1 is relative to the calling point? Is it just convenient to lump
    > everything in the GOT, or can these definitions be overriden?
    >


    it is so that code can be position independent without needing internal
    fixups in the code itself.
    it is also because, otherwise, the compiler would need to know at compile
    time which symbols are internal and which are external, or risk another
    overhead (having to emit an additional indirect jump at link time).

    on Windows, the strategy is instead to assume local first and fall back to
    an indirect jump, I guess on the assumption that local jumps are a lot
    more likely than imports.

    it is also worth noting that DLLs are not, as a general rule, position
    independent, meaning that they have to be relocated if loaded at a
    non-preferred address (commonly referred to as "rebasing").

    for variables, things get a little uglier, which is why one can't
    (generally) share global variables between DLLs.
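
    To the "can these definitions be overridden?" part: on ELF, yes. A symbol
    with default visibility in a shared object can be interposed by a
    definition in the executable or in a library loaded earlier, which is one
    reason the call to print1 goes through the PLT. A minimal sketch of how a
    library can opt out for purely internal helpers, assuming GCC or Clang
    (the visibility attribute is a compiler extension, and the function names
    here are hypothetical, not from the original module):

```c
int num1 = 32;  /* default visibility: may be interposed, so PIC code reaches it via the GOT */

/* Hidden visibility: the symbol is not exported from the shared object and
   cannot be interposed, so calls from inside the library can bind directly. */
__attribute__((visibility("hidden")))
int helper_hidden(void)
{
    return num1 + 1;
}

/* Default visibility: another module may override this definition, so a
   call from inside the library is typically routed through the PLT. */
int helper_default(void)
{
    return num1 + 2;
}

/* Calls one helper of each kind; compare the two call sites in objdump -d. */
int caller(void)
{
    return helper_hidden() + helper_default();
}
```

    Building this with -fPIC -shared and disassembling it should show the
    difference between the two call sites; the exact output depends on the
    compiler version and optimisation flags.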


    > b) There are two "mystery variables" which occur as the first and
    > second DWORD in the .data section. They have a RELATIVE relocation
    > applied to them. Because RELATIVE relocations do not relocate a
    > symbol, I don't know what these variables do? And in general, when do
    > use RELATIVE relocations. I understand that you are just adding the
    > load address to the location, but what code semantics create this
    > relocation?
    >


    I would have to go check, but I think those are self-reference pointers.
    I am not certain, not having dug into ELF shared-object mechanics that
    much (I know more at this point about PE/COFF DLLs...).
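
    As for which code semantics produce a RELATIVE relocation: any data word
    that must hold the run-time address of something inside the same module,
    where the target cannot be overridden, needs the load address added to
    it. A minimal sketch with hypothetical names follows. The self-pointing
    variable mirrors what GCC's startup code is believed to do for
    __dso_handle in shared objects, which would explain one of the two
    mystery words, but that is an assumption, not something verified against
    this particular library:

```c
/* A local (static) object: its address is fixed relative to the load base. */
static int value = 32;

/* A pointer initialised with the address of a local object. In a PIC shared
   object the stored word is a link-time offset, and the dynamic linker adds
   the load address to it (a RELATIVE relocation on most architectures). */
int *value_ptr = &value;

/* A variable that holds its own address, in the style of __dso_handle. */
void *self_ref = &self_ref;

/* Reads the local object through the relocated pointer. */
int read_through_pointer(void)
{
    return *value_ptr;
}
```

    Running readelf -r on a shared object built from this should list
    relocation entries for value_ptr and self_ref without a symbol name.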


    > c) Why are there two entries in the GOT for __cxa_finalize. They seem
    > to be identical, except that one is declared GLOB_DAT and the other
    > JMP_SLOT. The same occurs for __Jv_RegisterClasses.
    >


    these would appear to be related to g++ and GCJ.

    AFAIK, '__cxa_finalize' is called during teardown (library unload or
    process exit) to run the destructors and exit handlers registered for
    that module.
    its registration counterpart is '__cxa_atexit', again related to C++.

    '_Jv_' is a prefix generally used for much of anything GCJ related.
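
    One plausible explanation for the duplicate entries, offered as an
    assumption rather than something confirmed in this thread: the startup
    code that GCC links into every shared object refers to these symbols
    weakly, and both tests their address and calls them. The address test
    needs the symbol's address as data (a GLOB_DAT entry), while the call
    goes through the PLT (a JMP_SLOT entry). A minimal sketch of that
    pattern, assuming GCC or Clang and using a hypothetical hook name:

```c
/* Hypothetical hook. It is declared weak, so the link succeeds even when no
   module defines it; in that case its address resolves to null. */
extern void optional_hook(void) __attribute__((weak));

/* Calls the hook only if some module provides it.
   Returns 1 if the hook was called, 0 if it was absent. */
int call_hook_if_present(void)
{
    if (optional_hook) {    /* address test: the address is read as data */
        optional_hook();    /* call: in PIC code this can go through the PLT */
        return 1;
    }
    return 0;
}
```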


    > d) What is the initial stack size of a process in Linux?
    >


    I think it is 8MB by default on most distributions (it is the soft
    RLIMIT_STACK limit, which 'ulimit -s' reports).
    note that this is not generally mapped in all at once, but the backing
    memory gets paged into existence on first access.

    Windows reserves 1MB by default, and has slightly different behavior for
    paging in the stack (one needs to be careful if grabbing too much stack
    memory at once).
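
    Rather than relying on a remembered figure, the limit can be queried at
    run time. A minimal sketch using the POSIX getrlimit call; the wrapper
    name is hypothetical:

```c
#include <sys/resource.h>

/* Stores the soft and hard stack size limits, in bytes, in *soft and *hard.
   Either value may be RLIM_INFINITY. Returns 0 on success, -1 on failure. */
int stack_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    *soft = rl.rlim_cur;  /* the limit that currently applies to the main thread's stack */
    *hard = rl.rlim_max;  /* ceiling up to which the soft limit may be raised */
    return 0;
}
```

    The soft limit is a maximum that the main thread's stack may grow to,
    not an amount allocated up front.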


    > e) When the dynamic linker is called when a function is to be resolved
    > (lazy linkage), before it jumps to the DL, it pushes two values on the
    > stack: the first identifies the symbol to be resolved, and the second
    > identifies the calling module. How does this first value pushed map to
    > the symbol? It doesn't seem to be the symbol index in dynsym, and I
    > see no other relation to anything else?
    >


    maybe a pointer?...
    I really don't know on this one.
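
    For what it is worth, on 32-bit x86 (an assumption; the thread does not
    name the architecture, although "DWORD" and the relocation names suggest
    it) the first value pushed by a PLT entry is, as far as I know, the byte
    offset of that entry's record within the PLT relocation table (.rel.plt,
    found through DT_JMPREL), not a dynsym index. The resolver reads the
    record at that offset and takes the symbol index from its r_info field.
    On x86-64 the pushed value is an index into .rela.plt instead. A minimal
    sketch of the lookup, with a hypothetical function name and the types
    from the Linux <elf.h> header:

```c
#include <elf.h>
#include <stdint.h>

/* Maps the value pushed by an i386 PLT entry to a dynsym index.
   jmprel       points at the start of .rel.plt (the DT_JMPREL table)
   reloc_offset is the pushed value: a byte offset into that table */
uint32_t plt_arg_to_symbol_index(const Elf32_Rel *jmprel, uint32_t reloc_offset)
{
    const unsigned char *table = (const unsigned char *)jmprel;
    const Elf32_Rel *rel = (const Elf32_Rel *)(table + reloc_offset);

    /* The upper 24 bits of r_info hold the index into .dynsym. */
    return ELF32_R_SYM(rel->r_info);
}
```

    Dividing the pushed value by the size of one record (8 bytes for
    Elf32_Rel) gives the row to look for in the readelf -r listing of
    .rel.plt.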


    > f) Will a loader/dynamic linker only ever see GLOB_DAT, JMP_SLOT, COPY
    > and RELATIVE relocations? I assume the other relocation types only
    > apply to .o files. IS this correct? And if not, with what program
    > semantics would these other relocations appear?
    >


    you may want to check the ELF spec for this one...


    in my case (for a custom DLL loader), I just implemented all of the
    relocation types.
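
    Other types can reach the dynamic linker as well. On i386, for example,
    a data pointer initialised with the address of a global symbol produces
    R_386_32, and code built without -fPIC can leave R_386_PC32 relocations
    in the text segment. A minimal sketch of how a loader might apply a few
    of these, following the calculations given in the i386 psABI (B = load
    address, S = symbol value, A = addend stored in place). The function
    name and its simplified interface are hypothetical, and this is not how
    any particular loader is written:

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Applies one Elf32_Rel record to an in-memory image.
   image     buffer holding the loaded module
   load_addr address the module is treated as loaded at (B)
   rel       relocation record; r_offset is relative to the image start
   sym_value resolved address of the referenced symbol (S), if any
   Returns 0 on success, -1 for a type this sketch does not handle. */
int apply_rel(uint8_t *image, uint32_t load_addr,
              const Elf32_Rel *rel, uint32_t sym_value)
{
    uint32_t word;

    /* REL-style records keep the addend (A) in the word being patched. */
    memcpy(&word, image + rel->r_offset, sizeof word);

    switch (ELF32_R_TYPE(rel->r_info)) {
    case R_386_RELATIVE:            /* B + A */
        word += load_addr;
        break;
    case R_386_GLOB_DAT:            /* S */
    case R_386_JMP_SLOT:            /* S */
        word = sym_value;
        break;
    case R_386_32:                  /* S + A */
        word += sym_value;
        break;
    default:
        return -1;
    }

    memcpy(image + rel->r_offset, &word, sizeof word);
    return 0;
}
```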


    > Thanks for your answers.
    BGB / cr88192, Dec 28, 2009
    #2

  3. Beej Jorgensen, Dec 29, 2009
    #3