How a linker works (continued)

Discussion in 'C Programming' started by jacob navia, Mar 26, 2008.

  1. jacob navia

    jacob navia Guest

    In the last installment we looked at object files and what they
    contain.

    Some people insisted that I was generalizing too much and there could be
    C implementations without object files (like C interpreters) and C
    implementations that do not link files in separate compilation but just
    parse and digest each module, making the whole code generation step in
    the linker, from an unknown representation.

    Granted, weird implementations and special options may exist. Here I am
    speaking about the very common (or most common) case where the compiler
    produces traditional object files, stored on disk somewhere.

    Those object files in an abstract way contain:
    (1) A symbol table that specifies which symbols are exported and which
    symbols are imported
    (2) Several "Sections" containing the data of the program. (Code
    instructions, initialized tables, and just reserved space)
    (3) A series of relocation records that specify which parts of the data
    (code or tables) must be patched by the linker to insert the external
    symbols required by the module

    The linking process
    -------------------

    The linker opens all object files that it receives, and builds a symbol
    table. In this table we have several sets of symbols

    (a) The set of defined symbols, not in the common section. All these
    symbols have a fixed address already.

    (b) The set of symbols in the common section

    (c) The set of undefined symbols that have been seen as externals but
    whose definition has not yet been processed.

    Symbols can be moved from the undefined set into the common set or into
    the defined set.

    This needs some explanation. Suppose you have in the file file1.c the
    following declaration:

    int iconst;

    The symbol ‘iconst’ will be assigned to the common section that is
    initialized to zero at program startup. But consider what happens if you
    include ‘file2.c’ in the link, that contains the declaration:

    int iconst = 53433;

    The linker will move the symbol ‘iconst’ from the common section to the
    data section. The definition in file1.c will be lost. If you relied on
    "iconst" being zero at startup, you are now wrong.

    And there are worse things that can be done:
    file1.c:
    int buf[256];

    file2.c:

    int buf[512];

    The linker will leave ‘buf’ in the common section, but will set its size
    to the bigger value, i.e. 512. This is harmless, but beware if you
    make a definition in a file3.c

    int buf[4] = {0,0,0,0};

    Your table will have a size of just four positions instead of 512!!
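
    The rules above can be summarized in a few lines of C. This is only a
    sketch of the behavior described here, not the actual code of lcclnk or
    of any other linker: the names SymState, Symbol and merge_symbol are made
    up for the illustration, and treating two initialized definitions as an
    error is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the three symbol sets kept by the linker. */
typedef enum { SYM_UNDEFINED, SYM_COMMON, SYM_DEFINED } SymState;

typedef struct {
    const char *name;
    SymState    state;
    size_t      size;   /* size in bytes, meaningful for COMMON/DEFINED */
} Symbol;

/* Merge a new sighting of a symbol (state `st`, size `size`) into `sym`.
   Returns 0 on success, -1 when two initialized definitions collide. */
static int merge_symbol(Symbol *sym, SymState st, size_t size)
{
    assert(sym != NULL);
    switch (st) {
    case SYM_UNDEFINED:
        /* A mere reference never changes what is already known. */
        return 0;
    case SYM_COMMON:
        if (sym->state == SYM_UNDEFINED) {
            sym->state = SYM_COMMON;
            sym->size = size;
        } else if (sym->state == SYM_COMMON && size > sym->size) {
            sym->size = size;          /* keep the bigger size */
        }
        /* If the symbol is already DEFINED, that definition wins. */
        return 0;
    case SYM_DEFINED:
        if (sym->state == SYM_DEFINED)
            return -1;                 /* multiple definitions */
        sym->state = SYM_DEFINED;      /* leaves the common section */
        sym->size = size;              /* and takes the defined size */
        return 0;
    }
    return -1;
}
```

    Feeding it a 256-int and then a 512-int common sighting leaves a common
    symbol of 512 ints; a later initialized definition of 4 ints moves the
    symbol to the defined set with a size of four ints.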

    This can be tested, for instance, with the following two files:
    file t1.c
    int tab[12];

    File t2.c
    int tab[256];
    int main(void){return 0;}

    Linking t1.c and t2.c with MSVC 8 we obtain an executable *without any
    warnings* not even at the highest warning level.

    In the linker of lcc-win I added a warning:
    in t1.obj warning: '_tab' (defined with size 48)
    is redefined in t2.obj with size 1024

    The GNU linker doesn't emit any warning either:
    root@ubuntu:/tmp# gcc -Wall t1.c t2.c
    root@ubuntu:/tmp#

    The explanation that will be commonly given for this behavior is that
    any definition in the "common" section (non initialized data) is a
    "tentative definition" and only valid until another definition is seen
    by the linker.

    Dave Hanson, one of the authors of the original lcc compiler, told me
    this when we discussed this problem:

    jacob:
    >> is char *p;
    >> a "tentative definition"?


    Dave Hanson:

    <<quote>>
    For the record, the declaration for p is indeed a tentative definition,
    but that status persists only until the end of the compilation unit,
    i.e., the end of f1.c. Since there's no subsequent external definition
    of p, the tentative declaration acts exactly as if there was a
    file-scope declaration for p with an initializer equal to 0. (See Sec.
    3.7.2 of the ANSI Standard, which I think is Sec. 6.7.2 of the ISO
    Standard). As a result, p is null at program startup--assuming there are
    no other similar declarations for p.

    This example illustrates nicely a problem with the common storage model:
    You can't determine whether or not a declaration is also a definition by
    examining just the program text, and it's easy to get strange behaviors.
    In this example, there was only one definition, which passes without
    complaint from linkers. In the stricter definition/reference model,
    linkers would complain about multiple definitions when combining the
    object code for f1.c and f2.c. This example also shows why it's best to
    initialize globals, because linkers will usually catch these kinds of
    multiple definitions.

    The common model also permits C's (admittedly weak) type system to be
    violated. I've seen programmers declare "int x[2]" in one file and
    "double x" in another one, for example, just so they can access x as a
    double and as a pair of ints.

    For a good summary of the four models of external
    definitions/declarations, see Sec. 4.8 in Harbison & Steele, C: A
    Reference Manual, 4th ed., Prentice-Hall, 1995.

    <<end quote>>
    ------------------------------------------------------------------------------

    Relocating all symbols
    ----------------------

    Let's come back to our linker, however. I will use lcclnk and
    Windows as examples, but on Unix and many other systems the operations
    done by the linker are very similar.

    The next thing to do is to go through all symbols, and decide whether
    they will go into the final symbol table or not. Many of them are
    discarded, since they are local symbols of each compilation unit.

    Global symbols need to be relocated, i.e. the ‘value’ of the symbol has
    to be set to its final address. This is easy now that the position of
    the section that contains the symbol is exactly known: we just go
    through them setting the value field to the right number.


    The algorithm outline is simple:
    1. Read the relocation information from the object file.

    2. According to the type of relocation, adjust the value of the symbol.
    The relocations supported by lcclnk are just a few: the normal 32-bit
    relocation (code 6), its variant without the image base (code 7), the
    pc-relative relocation (code 20), and two types of relocations for the
    debug information, codes 10 and 11.

    3. Save the position within the executable file where the relocation is
    being done in the case of relocation type 6 (normal 32 bits relocation),
    to later build the .reloc section if this is needed.

    Normally this is needed only when generating a DLL, since executables
    aren’t relocated under Windows.

    The .reloc section of the executable is data for the program loader: it
    tells the loader which addresses it should patch when loading the
    file into memory.
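
    The outline above might look roughly like this in C. It is a simplified
    sketch, not lcclnk's source: the Reloc structure and the apply_relocs
    function are hypothetical, only relocation types 6 and 20 are handled,
    the target address is assumed to be already resolved, and section
    offsets are recorded instead of positions in the executable file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Relocation type codes as listed in the text (COFF, i386). */
enum { REL_DIR32 = 6, REL_REL32 = 20 };

/* Hypothetical, simplified relocation record. */
typedef struct {
    uint32_t offset;   /* position of the 4-byte field in the section */
    uint32_t target;   /* final address of the referenced symbol      */
    int      type;     /* one of the REL_* codes above                */
} Reloc;

/* Read/write a 32-bit little-endian value, no alignment assumptions. */
static uint32_t get32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Apply `n` relocations to a section that will live at address `base`.
   Offsets of type-6 fields are appended to `fixups` (capacity `cap`) so
   that a .reloc section could be built from them later.
   Returns the number of fixups recorded, or -1 on error. */
static int apply_relocs(uint8_t *sec, size_t len, uint32_t base,
                        const Reloc *r, size_t n,
                        uint32_t *fixups, size_t cap)
{
    size_t nfix = 0;
    assert(sec != NULL && r != NULL && fixups != NULL);
    for (size_t i = 0; i < n; i++) {
        if (r[i].offset > len || len - r[i].offset < 4)
            return -1;                      /* field out of bounds */
        uint8_t *field = sec + r[i].offset;
        uint32_t addend = get32(field);     /* value left by the compiler */
        switch (r[i].type) {
        case REL_DIR32:                     /* absolute address */
            if (nfix == cap)
                return -1;
            put32(field, addend + r[i].target);
            fixups[nfix++] = r[i].offset;   /* remember it for .reloc */
            break;
        case REL_REL32:
            /* Relative to the byte after the field, which is the next
               instruction when the field ends the instruction. */
            put32(field, addend + r[i].target - (base + r[i].offset + 4));
            break;
        default:
            return -1;                      /* unsupported type */
        }
    }
    return (int)nfix;
}
```

    Only the absolute (type 6) fields end up in the fixup list: a
    pc-relative field stays correct wherever the image is loaded, as long as
    caller and callee move together.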

    Other linkers, more complicated than lcc's, support fancier features.
    For instance, a symbol can be included only once even if it appears
    several times, among many other things.

    Performing the relocations
    --------------------------
    More specifically, what the linker does is fix up the data/code
    references that each module makes to all the others, patching the code
    with the offsets that the external symbols have, now that the positions
    of all sections are known. For a C source line like:

    foo(5);

    the linker reads the corresponding relocation record emitted by the
    compiler, and looks up the symbol ‘foo’ in the symbol table. It patches
    the zeroes that are stored by the assembler at the position following
    the call opcodes with the relative offset from the point of the call to
    the address of foo. This will allow the processor to make a PC relative
    call instruction: the 4 bytes after the call instruction contain a
    32-bit offset to the address of foo.

    Using the utility pedump, you can see this process. Consider the
    following well-known program:

    #include <stdio.h>
    int main(int argc,char *argv[])
    {

    printf("Hello\n");
    }

    Compile this with:
    lcc -g2 hello.c
    Now, disassemble hello.obj with pedump like this:
    pedump /A hello.obj
    You will see near the end of the long listing that follows, the
    disassembled text section:

    section 00 (.text) size: 00020 file offs: 00220
    --------------------------------------------------------------
    _main: Size 18
    --------------------------------------------------------------
    [0000000] 55 pushl %ebp
    [0000001] 89e5 movl %esp,%ebp
    Line 5
    [0000003] 6800000000 pushl $0 (_$2) (relocation)
    [0000008] e800000000 call _printf (relocation)
    [0000013] 83c404 addl $4,%esp
    Line 6
    [0000016] 5d popl %ebp
    [0000017] c3 ret
    [0000018] 0000 addb %al,(%eax)

    Let’s follow the relocation to the function printf. You will see that
    pedump has a listing of the relocations that looks like this:
    Section 01 (.text) relocations

    Address  Type   Symbol Index  Symbol Name
    -------  -----  ------------  -----------
    4        DIR32  4             _$2
    9        REL32  16            _printf

    The linker will then take the bytes starting at address 4 and put
    there the address of symbol 4 in the symbol table of hello.obj. It will
    look up the address of printf and put the relative address, i.e. the
    difference between the address of printf and the address of main+9, in
    those bytes starting at byte 9.

    As you can see there are several types of relocations, each specifying a
    different way of doing these additions. The compiler emits only three
    types of relocations:
    • Type 6: Direct 32-bit reference to the symbol's virtual address.
    • Type 7: Direct 32-bit reference to the symbol's virtual address, base
    not included.
    • Type 20: PC-relative 32-bit reference to the symbol's virtual address.

    This last one is the one used in the relocation to printf. We also have
    to know that the relative call is relative to the next instruction,
    i.e. to byte 13 and not to byte 9. Happily for us, the linker
    knows this stuff...
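
    The arithmetic can be written down as follows. The function name and the
    addresses used in the comment are assumptions chosen for the example;
    only the offsets 9 and 13 come from the listing above.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement to store in the 4-byte field of a near call.
   `field_addr` is the address of the field itself (main+9 in the
   listing); the processor adds the displacement to the address of the
   next instruction, which is field_addr + 4 (main+13).

   Example with made-up addresses: main at 0x401000, printf at 0x402000.
   The field at 0x401009 receives 0x402000 - 0x40100D = 0xFF3. */
static int32_t call_displacement(uint32_t target, uint32_t field_addr)
{
    uint32_t next_insn = field_addr + 4;
    assert(next_insn > field_addr);          /* no wrap at the top of memory */
    /* Unsigned subtraction wraps; the cast reinterprets the result as a
       signed offset on the usual two's complement machines. */
    return (int32_t)(target - next_insn);
}
```

    A call to a function placed before the call site simply yields a
    negative displacement.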

    --------------------------------------------------------------------
    Next installment will treat the object libraries

    --
    jacob navia
    jacob at jacob point remcomp point fr
    logiciels/informatique
    http://www.cs.virginia.edu/~lcc-win32
    jacob navia, Mar 26, 2008
    #1

  2. jacob navia

    Eric Sosman Guest

    jacob navia wrote:
    > In the last installment we looked into the object files and what they
    > contain.
    > [...]
    > Next installment will treat the object libraries


    Be still, my heart.

    --
    Eric Sosman, Mar 26, 2008
    #2

  3. jacob navia

    Guest

    On Mar 26, 4:13 pm, jacob navia <> wrote:
    <big article about linkers, assembly & others>
    What's the actual intent behind these posts?
    Bring revolution to clc & usenet? Inform poor souls that figured out
    how to browse clc but not other groups? Bring more noise & trolls?
    Annoy the "no stack in C" people?
    You could get a blog if you like to write articles..
    , Mar 26, 2008
    #3
  4. jacob navia

    jacob navia Guest

    wrote:
    > On Mar 26, 4:13 pm, jacob navia <> wrote:
    > <big article about linkers, assembly & others>
    > What's the actual intent behind these posts?
    > Bring revolution to clc & usenet? Inform poor souls that figured out
    > how to browse clc but not other groups? Bring more noise & trolls?
    > Annoy the "no stack in C" people?
    > You could get a blog if you like to write articles..


    file a.c
    int a[12];

    file b.c
    int a[256];
    int main(void){return 0;}

    This is a common error, that provokes no warnings. I wanted to
    discuss this state of affairs.

    Read the article before you say something about it.


    --
    jacob navia
    jacob at jacob point remcomp point fr
    logiciels/informatique
    http://www.cs.virginia.edu/~lcc-win32
    jacob navia, Mar 26, 2008
    #4
  5. jacob navia

    Richard Bos Guest

    jacob navia <> wrote:

    > wrote:
    > > On Mar 26, 4:13 pm, jacob navia <> wrote:
    > > <big article about linkers, assembly & others>
    > > What's the actual intent behind these posts?


    > file a.c
    > int a[12];


    > This is a common error, that provokes no warnings. I wanted to
    > discuss this state of affairs.


    That may, perhaps, explain _that one post_. It doesn't explain the whole
    deluge.

    Richard
    Richard Bos, Mar 26, 2008
    #5
  6. jacob navia

    Kaz Kylheku Guest

    On Mar 26, 7:13 am, jacob navia <> wrote:
    > In the last installment we looked into the object files and what they
    > contain.


    This kind of information is only useful when it is completely precise
    with respect to some particular object format, so that you can
    implement software that handles a particular object format.

    I can't write anything based on your descriptions.

    Readers can be better informed about how linking works from the ANSI C
    Rationale:

    http://www.lysator.liu.se/c/rat/title.html

    Section 3.1.2.2.

    > Those object files in an abstract way contain:
    > (1) A symbol table that specifies which symbols are exported and which
    > symbols are imported


    So, like, you mean that a C translation unit can provide definitions
    of external names, and makes references to external names, and the
    translated unit still does this somehow?

    How could they leave something like that out of the language spec?

    > (2) Several "Sections" containing the data of the program. (Code
    > instructions, initialized tables, and just reserved space)


    You mean that stuff like functions, literal objects and initializers
    have a translated image?

    > (3) A series of relocation records that specify which parts of the data
    > (code or tables) must be patched by the linker to insert the external
    > symbols required by the module


    You mean, there is a way to find where the references are so they can
    be resolved, and it's probably not done by scanning the translated
    image of the code in search of some ambiguous bit pattern?

    Very good! Next you're going to tell us that an image file contains a
    header with the size of the picture and a bunch other parameters, and
    a section full of numbers that specify the color intensity values of
    the pixels.

    > The linking process
    > -------------------
    >
    > The linker opens all object files that it receives, and builds a symbol
    > table. In this table we have several sets of symbols
    >
    > (a) The set of defined symbols, not in the common section. All this
    > symbols have a fixed address already.
    >
    > (b) The set of symbols in the common section
    >
    > This needs some explanation. Suppose you have in the file file1.c the
    > following declaration:
    >
    > int iconst;


    This is actually a tentative definition. If no redefinition for iconst
    is seen by the end of the translation unit, it is treated as ``int
    iconst = 0''.
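
    A small sketch of that rule within a single translation unit (the names
    counter and startup_values are invented for the example):

```c
#include <assert.h>

int iconst;          /* tentative definition */
int iconst;          /* another one: still allowed in the same unit */

int counter;         /* tentative definition ... */
int counter = 53433; /* ... completed by an external definition here */

/* Reports the two values as they are at program startup. */
static void startup_values(int *zeroed, int *initialized)
{
    assert(zeroed != 0 && initialized != 0);
    *zeroed = iconst;        /* no initializer seen: behaves as "= 0" */
    *initialized = counter;  /* the explicit initializer is used */
}
```

    Everything here is decided by the end of the translation unit; what
    happens when a second unit also defines iconst is a separate question.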

    > The symbol 'iconst' will be assigned to the common section that is
    > initialized to zero at program startup.


    You seem to be describing only the Fortran Common model here.

    > But consider what happens if you
    > include 'file2.c' in the link, that contains the declaration:
    >
    > int iconst = 53433;


    Undefined behavior. There must be exactly one definition for an
    external name. Or rather for one that is used; names that are not used
    need not have definitions.

    This is allowed by the ``relaxed ref/def'' model, not allowed by the
    ``strict ref/def'' model.

    > The linker will move the symbol 'iconst', from the common section to the
    > data section.


    There are linkers which will behave this way. But others will do the
    obvious thing and diagnose the multiple definition.

    ISO C leaves it as undefined behavior because there are such crappy
    linkers still in use.

    According to the rationale ``The model adopted in the Standard is a
    combination of features of the strict ref/def model and the
    initialization model.''

    But the undefined behaviors, like not requiring diagnosis of multiple
    definitions, mean that linkage can be implemented over the relaxed
    ref/def model or the common model.

    > And there are worst things that can be done:
    > file1.c:
    > int buf[256];
    >
    > file2.c:
    >
    > int buff[512];
    >
    > The linker will leave 'buf' in the common section, but will set its size
    > to the bigger value, i.e. 512.


    Not any linker that anyone in his right mind would be writing today.
    Not only is it simply a bad idea not to diagnose programming errors
    like this (even if it's not required to do so), but a linker also has
    to support other programming languages than C, such as C++. C++ has
    the one definition rule (ODR), so in that language, the above is a
    diagnosable semantic rule violation.
    Kaz Kylheku, Mar 26, 2008
    #6
  7. jacob navia

    jacob navia Guest

    Kaz Kylheku wrote:
    >> And there are worst things that can be done:
    >> file1.c:
    >> int buf[256];
    >>
    >> file2.c:
    >>
    >> int buff[512];
    >>
    >> The linker will leave 'buf' in the common section, but will set its size
    >> to the bigger value, i.e. 512.

    >
    > Not any linker that anyone in his right mind would be writing today.


    I showed that both MSVC and GCC/GNU linkers accept this without
    warnings. And this with MSVC 2008 and gcc 4.0.2, both relatively
    recent versions.

    > Not only is it simply a bad idea not to diagnose programming errors
    > like this (even if it's not required to do so), but a linker also has
    > to support other programming languages than C, such as C++. C++ has
    > the one definition rule (ODR), so in that language, the above is a
    > diagnosable semantic rule violation.


    I agree that this is bad, but the major versions of those compilers
    do not diagnose anything, as you can see for yourself.

    --
    jacob navia
    jacob at jacob point remcomp point fr
    logiciels/informatique
    http://www.cs.virginia.edu/~lcc-win32
    jacob navia, Mar 26, 2008
    #7
  8. jacob navia <> writes:
    <snip>
    > This can be tested, for instance, with the following two files:
    > file t1.c
    > int tab[12];
    >
    > File t2.c
    > int tab[256];
    > int main(void){return 0;}
    >
    > Linking t1.c and t2.c with MSVC 8 we obtain an executable *without any
    > warnings* not even at the highest warning level.
    >
    > In the linker of lcc-win I added a warning:
    > in t1.obj warning: '_tab' (defined with size 48)
    > is redefined in t2.obj with size 1024
    >
    > The linker of gnu doesn't emit any warning:
    > root@ubuntu:/tmp# gcc -Wall t1.c t2.c


    *Please* post these articles in comp.programming. I'd join in a lot
    more if I could do so and be topical. However, you are dead set on
    knocking gcc without understanding it so...

    <off-topic>
    gcc uses the GNU linker ld. ld merges the common blocks to make tab
    the larger of the two sizes regardless of the linking order. In this
    case, I can't see why you'd want a diagnostic[1]. When a compilation
    unit initialises the table (so it can't be merged) the GNU linker
    *does* produce a warning:

    /usr/bin/ld: Warning: size of symbol `tab' changed from 1024 in t1.o
    to 16 in t2.o
    </off-topic>

    <snip>
    > Next installment will treat the object libraries


    Please post it where it belongs.

    [1] OK, a case can be made for a diagnostic in all such cases, but you
    are suggesting that gcc leads the programmer silently into a trap.

    --
    Ben.
    Ben Bacarisse, Mar 26, 2008
    #8
  9. jacob navia

    Flash Gordon Guest

    jacob navia wrote, On 26/03/08 16:21:
    > wrote:
    >> On Mar 26, 4:13 pm, jacob navia <> wrote:
    >> <big article about linkers, assembly & others>
    >> What's the actual intent behind these posts?
    >> Bring revolution to clc & usenet? Inform poor souls that figured out
    >> how to browse clc but not other groups? Bring more noise & trolls?
    >> Annoy the "no stack in C" people?
    >> You could get a blog if you like to write articles..

    >
    > file a.c
    > int a[12];
    >
    > file b.c
    > int a[256];
    > int main(void){return 0;}
    >
    > This is a common error, that provokes no warnings.


    On some implementations. On others it produces an error. It would be
    more useful to simply tell people how to get their implementation to
    produce an error for it. The implementations that I know for a fact will
    produce an error are gcc/GNU ld under Linux when given specific options;
    I believe you can get the same behaviour on AIX and SCO.

    > I wanted to
    > discuss this state of affairs.


    It does not need masses of information about linkers to discuss it and
    the problem is not specific to C.

    > Read the article before you say something about it.


    Try posting it somewhere it is topical, such as comp.programming. Are
    you fundamentally unable to understand the concept of topicality or are
    you simply trolling?
    --
    Flash Gordon
    Flash Gordon, Mar 26, 2008
    #9
  10. jacob navia

    Kaz Kylheku Guest

    On Mar 26, 10:22 am, jacob navia <> wrote:
    > Kaz Kylheku wrote:
    > >> And there are worst things that can be done:
    > >> file1.c:
    > >> int buf[256];

    >
    > >> file2.c:

    >
    > >> int buff[512];

    >
    > >> The linker will leave 'buf' in the common section, but will set its size
    > >> to the bigger value, i.e. 512.

    >
    > > Not any linker that anyone in his right mind would be writing today.

    >
    > I showed that both MSVC and GCC/GNU linkers accept this without
    > warnings.


    Ah yes. The combination of GNU C and the GNU linker won't accept it
    if the arrays have initializers. This is true even if the initializers
    are { 0 }.

    Unlike what you said, it's not the linker that determines the
    assignment of the symbol to the section. This is done by the compiler,
    which emits pseudo-ops in the assembly output that control sectioning.

    Tentative definitions are placed, by the gcc, into a section
    called .comm, which is subject to special semantics purely for
    backward compatibility with ancient programs which rely on that model.

    However, normal definitions are placed into .bss.

    The GNU linker has an option --warn-common which will diagnose the
    merging of symbols in the common section.

    So if you compile with:

    gcc -Wl,--warn-common

    the link succeeds, but you get a diagnostic.

    Maybe the compiler itself can be coaxed into not doing the common
    allocation in the first place. Aha, yes, the -fno-common option!

    gcc -fno-common ...

    Now the tentative definitions behave just like normal definitions, and
    the link fails.

    This is what C says: by the end of a translation unit, a tentative
    definition becomes a fully fledged definition, as if with a zero
    initializer. It does not remain some kind of second-class citizen.

    It's somewhat braindamaged that -fno-common isn't the default. The
    backward compatibility behavior should be explicitly requested with
    -fcommon. Uninitialized definitions should go into .bss by default,
    not .comm.

    I'm going to patch this in the Linux distro that I maintain, to see
    what breaks. I'm guessing quite a few things will, because any time a
    programmer forgets to use ``extern'' in a header file declaration, and
    includes the header in two or more translation units, you're going to
    run into this.

    It would be reasonable for -ansi -pedantic to imply -fno-common.
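
    For reference, a sketch of the usual way to avoid the problem
    altogether: declare the object with extern in the header and define it
    in exactly one source file. The file names tab.h and tab.c and the helper
    tab_store are hypothetical; the three parts are shown in one block only
    so that it compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* --- what would live in a shared header, e.g. tab.h --- */
extern int tab[256];          /* declaration only: reserves no storage */

/* --- what would live in exactly one source file, e.g. tab.c --- */
int tab[256] = { 0 };         /* the single definition, with an initializer */

/* --- any other file includes the header and just uses the object --- */
static void tab_store(size_t i, int value)
{
    assert(i < sizeof tab / sizeof tab[0]);   /* size comes from the header */
    tab[i] = value;
}
```

    With this layout a second file that defines tab by accident gets a
    multiple-definition error when built with -fno-common, instead of being
    merged silently.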
    Kaz Kylheku, Mar 26, 2008
    #10
  11. jacob navia

    Guest

    On Mar 26, 9:13 am, jacob navia <> wrote:
    > And there are worst things that can be done:
    > file1.c:
    > int buf[256];
    >
    > file2.c:
    >
    > int buff[512];
    >
    > The linker will leave 'buf' in the common section, but will set its size
    > to the bigger value, i.e. 512. This is harmless, but beware that you
    > make a definition in a file3.c
    >
    > int buff[4] = {0,0,0,0};
    >
    > Your table will have a size of just four positions instead of 512!!
    >
    > This can be tested, for instance, with the following two files:
    > file t1.c
    > int tab[12];
    >
    > File t2.c
    > int tab[256];
    > int main(void){return 0;}
    >
    > Linking t1.c and t2.c with MSVC 8 we obtain an executable *without any
    > warnings* not even at the highest warning level.



    This is one of the many good reasons to use lint. And an area where
    C++ is arguably a "better C" than C.
    , Mar 26, 2008
    #11
  12. jacob navia

    Dann Corbit Guest

    "Ben Bacarisse" <> wrote in message
    news:...
    > jacob navia <> writes:
    > <snip>
    >> This can be tested, for instance, with the following two files:
    >> file t1.c
    >> int tab[12];
    >>
    >> File t2.c
    >> int tab[256];
    >> int main(void){return 0;}
    >>
    >> Linking t1.c and t2.c with MSVC 8 we obtain an executable *without any
    >> warnings* not even at the highest warning level.
    >>
    >> In the linker of lcc-win I added a warning:
    >> in t1.obj warning: '_tab' (defined with size 48)
    >> is redefined in t2.obj with size 1024
    >>
    >> The linker of gnu doesn't emit any warning:
    >> root@ubuntu:/tmp# gcc -Wall t1.c t2.c

    >
    > *Please* post these articles in comp.programming. I'd join in a lot
    > more if I could do so and be topical. However, you are dead set on
    > knocking gcc without understanding it so...
    >
    > <off-topic>
    > gcc uses the GNU linker ld. ld merges the common blocks to make tab
    > the larger of the two size regardless of the linking order. In this
    > case, I can't see why you'd want a diagnostic[1]. When a compilation
    > unit initialises the table (so it can't be merged) the GNU linker
    > *does* produce a warning:
    >
    > /usr/bin/ld: Warning: size of symbol `tab' changed from 1024 in t1.o
    > to 16 in t2.o
    > </off-topic>
    >
    > <snip>
    >> Next installment will treat the object libraries

    >
    > Please post it where it belongs.
    >
    > [1] OK, a case can be made for a diagnostic in all such cases, but you
    > are suggesting the gcc leads the programmer silently into a trap.


    C:\tmp>splint t1.c t2.c
    Splint 3.1.1 --- 12 Mar 2007

    t2.c(2,5): Variable tab redefined
    A function or variable is redefined. One of the declarations should use
    extern. (Use -redef to inhibit warning)
    t1.c(2,5): Previous definition of tab

    Finished checking --- 1 code warning

    C:\tmp>lin t1.c t2.c

    C:\tmp>"C:\Lint\Lint-nt" +v -i"C:\Lint" std.lnt -os(_LINT.TMP) t1.c t2.c
    PC-lint for C/C++ (NT) Vers. 8.00u, Copyright Gimpel Software 1985-2006

    --- Module: t1.c (C)

    --- Module: t2.c (C)

    C:\tmp>type _LINT.TMP | more

    --- Module: t1.c (C)

    --- Module: t2.c (C)
    _
    int tab[256];
    t2.c(2) : Error 18: Symbol 'tab' redeclared (size) conflicts with line 2,
    file
    t1.c
    t1.c(2) : Info 830: Location cited in prior message
    _
    int tab[256];
    t2.c(2) : Error 14: Symbol 'tab' previously defined (line 2, file t1.c)
    t1.c(2) : Info 830: Location cited in prior message

    --- Global Wrap-up

    Info 765: external 'tab' (line 2, file t1.c) could be made static
    t1.c(2) : Info 830: Location cited in prior message
    Warning 552: Symbol 'tab' (line 2, file t1.c) not accessed
    t1.c(2) : Info 830: Location cited in prior message

    ---
    output placed in _LINT.TMP

    C:\tmp>type t1.c
    /* file t1.c */
    int tab[12];


    C:\tmp>type t2.c
    /* File t2.c */
    int tab[256];
    int main(void){tab[25] = 0; return 0;}

    P.S.
    There are plenty of instances of undefined behavior not caught by compilers.
    P.P.S.
    I do agree that it would be nice if compilers were omniscient (or closer
    than they are today).



    --
    Posted via a free Usenet account from http://www.teranews.com
    Dann Corbit, Mar 26, 2008
    #12
  13. On 26 Mar 2008 at 15:40, wrote:
    > On Mar 26, 4:13 pm, jacob navia <> wrote:
    ><big article about linkers, assembly & others>
    > What's the actual intent behind these posts?
    > Bring revolution to clc & usenet? Inform poor souls that figured out
    > how to browse clc but not other groups? Bring more noise & trolls?
    > Annoy the "no stack in C" people?


    This post really tells you all you need to know about clc. You're
    expected to justify yourself for posting an informative article about an
    essential part of C programming.

    Will the madness end one day?
    Antoninus Twink, Mar 26, 2008
    #13
  14. jacob navia

    jacob navia Guest

    Kaz Kylheku wrote:
    > On Mar 26, 10:22 am, jacob navia <> wrote:
    >> Kaz Kylheku wrote:
    >>>> And there are worst things that can be done:
    >>>> file1.c:
    >>>> int buf[256];
    >>>> file2.c:
    >>>> int buff[512];
    >>>> The linker will leave 'buf' in the common section, but will set its size
    >>>> to the bigger value, i.e. 512.
    >>> Not any linker that anyone in his right mind would be writing today.

    >> I showed that both MSVC and GCC/GNU linkers accept this without
    >> warnings.

    >
    > Ah yes. The combination of GNU C and the GNU linker won't accept it
    > if the arrays have initializers. This is true even if the initializers
    > are { 0 }.
    >
    > Unlike what you said, it's not the linker that determines the
    > assignment of the symbol to the section.


    I did not say that. The linker just uses the definitions
    in the object file, of course, and that is generated by the
    compiler. Just a misunderstanding.

    > This is done by the compiler,
    > which emits pseudo-ops in the assembly output that control sectioning.
    >


    Yes, and those go into the object file.

    > Tentative definitions are placed, by the gcc, into a section
    > called .comm, which is subject to special semantics purely for
    > backward compatibility with ancient programs which rely on that model.
    >


    Like lcclnk, and MSVC.


    > However, normal definitions are placed into .bss.
    >
    > The GNU linker has an option --warn-common which will diagnose the
    > merging of symbols in the common section.
    >
    > So if you compile with:
    >
    > gcc -Wl,--warn-common
    >
    > the link succeeds, but you get a diagnostic.
    >


    Interesting. I did not know that option; maybe it should
    be made the default?


    > Maybe the compiler itself can be coaxed into not doing the common
    > allocation in the first place. Aha, yes, the -fno-common option!
    >
    > gcc -fno-common ...
    >


    OK.

    > Now the tentative definitions behave just like normal definitions, and
    > the link fails.
    >
    > This is what C says: by the end of a translation unit, a tentative
    > definition becomes a fully fledged definition, as if with a zero
    > initializer. It does not remain some kind of second-class citizen.
    >


    But most linkers do not use that in their default state, as you have
    seen. I think this is a flaw in the language. It should specifically
    forbid that.

    > It's somewhat braindamaged that -fno-common isn't the default.


    Agreed!

    > The
    > backward compatibility behavior should be explicitly requested with
    > -fcommon. Uninitialized definitions should go into .bss by default,
    > not .comm.
    >
    > I'm going to patch this in the Linux distro that I maintain, to see
    > what breaks. I'm guessing quite a few things will, because any time a
    > programmer forgets to use ``extern'' in a header file declaration, and
    > includes the header in two or more translation units, you're going to
    > run into this.
    >
    > It would be reasonable for -ansi -pedantic to imply -fno-common.


    I think that the language should specify this. It is a flaw of the
    language.


    --
    jacob navia
    jacob at jacob point remcomp point fr
    logiciels/informatique
    http://www.cs.virginia.edu/~lcc-win32
    jacob navia, Mar 26, 2008
    #14
  15. jacob navia

    jacob navia Guest

    Dann Corbit wrote:

    [snip example of lint]

    >
    > P.S.
    > There are plenty of instances of undefined behavior not caught by compilers.
    > P.P.S.
    > I do agree that it would be nice if compilers were omniscient (or closer
    > than they are today).


    You are right about lint. It is a useful tool. But the problem is in the
    language, and specifically in the language standard. It doesn't specify
    this, and allows this behavior.

    This should be corrected at the language level, in my opinion.

    --
    jacob navia
    jacob at jacob point remcomp point fr
    logiciels/informatique
    http://www.cs.virginia.edu/~lcc-win32
    jacob navia, Mar 26, 2008
    #15
  16. jacob navia

    jacob navia Guest

    Flash Gordon wrote:
    > jacob navia wrote, On 26/03/08 16:21:
    >> This is a common error, that provokes no warnings.

    >
    > On some implementations. On others it produces an error. It would be
    > more useful to simply tell people how to get their implementation to
    > produce an error for it. The implementations that I know for a fact will
    > produce an error are gcc/GNU ld under Linux when given specific options.
    > I believe you can get the same behaviour on AIX and SCO.
    >


    I think that this is a flaw of the language specifications. This should
    be forbidden. But somehow the standards left this out for political
    (or whatever) reasons. It is a mistake.

    Can you name an implementation that produces an error (without
    any extra obscure options) ?

    >> I wanted to
    >> discuss this state of affairs.

    >
    > It does not need masses of information about linkers to discuss it and
    > the problem is not specific to C.
    >


    Well, I wanted to explain linking and the associated problems.
    This is a group about C and I do not see C without the link
    step, sorry (C interpreters are not the main usage of C)

    >> Read the article before you say something about it.

    >
    > Try posting it to somewhere it is topical such as comp.programming. Are
    > you fundamentally unable to understand the concept of topicality or are
    > you simply trolling?


    To say that this essential step of all C programs is "Off topic"
    is an abomination really. Even the C standard mentions the linker
    so please...


    --
    jacob navia
    jacob at jacob point remcomp point fr
    logiciels/informatique
    http://www.cs.virginia.edu/~lcc-win32
    jacob navia, Mar 26, 2008
    #16
  17. jacob navia

    jacob navia Guest

    wrote:
    >
    > This is one of the many good reasons to use lint. And an area where
    > C++ is arguably a "better C" than C.



    Why not change it and specify the linker model correctly?

    --
    jacob navia
    jacob at jacob point remcomp point fr
    logiciels/informatique
    http://www.cs.virginia.edu/~lcc-win32
    jacob navia, Mar 26, 2008
    #17
  18. jacob navia

    Guest

    jacob navia <> wrote:
    >
    > You are right about lint. It is a useful tool. But the problem is in the
    > language, and specifically in the language standard. It doesn't specify
    > this, and allows this behavior.


    No, it doesn't, it tars and feathers it as "undefined behavior", which
    is hardly allowing it. It just doesn't require the compiler to diagnose
    it (since it can't with separate compilation) and it can't require the
    linker to diagnose it since that's out of scope. (On most systems, the
    linker is a separate product, not tied to any particular language. And
    the choice of which linkage model to use is frequently affected by
    inter-language compatibility concerns.)

    -Larry Jones

    Even if lives DID hang in the balance, it would depend on whose they were.
    -- Calvin
    , Mar 26, 2008
    #18
  19. jacob navia

    Flash Gordon Guest

    jacob navia wrote, On 26/03/08 20:14:
    > Flash Gordon wrote:
    >> jacob navia wrote, On 26/03/08 16:21:
    >>> This is a common error, that provokes no warnings.

    >>
    >> On some implementations. On others it produces an error. It would be
    >> more useful to simply tell people how to get their implementation to
    >> produce an error for it. The implementations that I know for a fact
    >> will produce an error are gcc/GNU ld under Linux when given specific
    >> options. I believe you can get the same behaviour on AIX and SCO.
    >>

    >
    > I think that this is a flaw of the language specifications. This should
    > be forbidden. But somehow the standards left this out for political
    > (or whatever) reasons. It is a mistake.


    Well, changes to the language specification belong in comp.std.c, but in
    any case it does not need going in to details about the linker.

    > Can you name an implementation that produces an error (without
    > any extra obscure options) ?


    Define an obscure option. However, I believe the TI TMS320C2xx
    compiler/assembler/linker would qualify since it does not have (as far
    as I remember or can see in the documentation) a common section.

    >>> I wanted to
    >>> discuss this state of affairs.

    >>
    >> It does not need masses of information about linkers to discuss it and
    >> the problem is not specific to C.

    >
    > Well, I wanted to explain linking and the associated problems.
    > This is a group about C and I do not see C without the link
    > step, sorry (C interpreters are not the main usage of C)


    The details you provide are not needed to discuss the issue and are not
    universally correct even if you ignore interpreters.

    >>> Read the article before you say something about it.

    >>
    >> Try posting it to somewhere it is topical such as comp.programming.
    >> Are you fundamentally unable to understand the concept of topicality
    >> or are you simply trolling?

    >
    > To say that this essential step of all C programs is "Off topic"
    > is an abomination really. Even the C standard mentions the linker
    > so please...


    The level of implementation specific detail you are going in to is
    totally inappropriate for here. The C standard does not require a lot of
    what you describe, and indeed by changing one option on Linux, AIX or
    SCO I radically changed the behaviour so that one problem you talk about
    goes away. On other implementations it is even further away from your
    description.
    --
    Flash Gordon
    Flash Gordon, Mar 26, 2008
    #19
  20. jacob navia

    jacob navia Guest

    Flash Gordon wrote:
    >
    > The level of implementation specific detail you are going in to is
    > totally inappropriate for here. The C standard does not require a lot of
    > what you describe, and indeed by changing one option on Linux, AIX or
    > SCO I radically changed the behaviour so that one problem you talk about
    > goes away. On other implementations it is even further away from your
    > description.


    I have a different view of my trade. I think a programmer

    --
    jacob navia
    jacob at jacob point remcomp point fr
    logiciels/informatique
    http://www.cs.virginia.edu/~lcc-win32
    jacob navia, Mar 26, 2008
    #20