Re: Removing dead code and unused functions

Discussion in 'C++' started by Greg, Jun 21, 2005.

  1. Greg

    Richard Tobin wrote:
    > In article <BEDB6241.5C02%>,
    > Jean-Claude Arbaut <> wrote:
    >
    > >> Got a link ? The GNU linker at least only puts symbols that are included
    > >> into the link map. No mention of it cataloging symbols it excludes.
    >
    > >I'm not sure but "nm" could be useful here.
    >
    > Linkers typically do not exclude functions in the user program that are
    > unused. They only do that with libraries.
    >
    > More useful would be one of the many tools that generate call graphs.
    >
    > -- Richard


    The Metrowerks linker, as an example, strips all unreferenced,
    unexported functions from a build by default, and does so no matter
    where such functions are found. What would be the point of a linker
    leaving unreachable code and inaccessible data in a binary? And why
    would programmers want to perform this tedious chore by hand themselves
    rather than let the linker do it in a few seconds?

    The algorithm to strip unused code is well understood. All the linker
    has to do is calculate the "transitive closure" of references among
    the functions in the object code being linked, starting from main()
    and any exported functions. In fact, calculating the transitive
    closure is no doubt how Apple was able to add the "-dead_strip" switch
    to the ld linker on OS X; and the reason they did so is clear: many
    developers are understandably reluctant to use a linker that bloats
    their final builds.
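The "transitive closure" mentioned above is, in practice, a reachability search over the graph of references, rooted at main() and any exported symbols. A minimal sketch in C, assuming a hypothetical five-function program whose references are already known as a boolean matrix (a real linker would recover them from relocation entries, which this toy skips):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical program: 0=main, 1=parse, 2=report, 3=old_report, 4=helper. */
enum { NFUNC = 5 };

/* calls[f][g] is true when function f references function g. */
static const bool calls[NFUNC][NFUNC] = {
    {false, true,  true,  false, false}, /* main -> parse, report */
    {false, false, false, false, true }, /* parse -> helper       */
    {false, false, false, false, true }, /* report -> helper      */
    {false, false, false, false, true }, /* old_report -> helper  */
    {false, false, false, false, false}, /* helper calls nothing  */
};

/* Depth-first search: mark f and everything reachable from it. */
static void mark(size_t f, bool keep[NFUNC])
{
    assert(f < NFUNC);
    if (keep[f])
        return; /* already visited; also stops on recursive cycles */
    keep[f] = true;
    for (size_t g = 0; g < NFUNC; g++)
        if (calls[f][g])
            mark(g, keep);
}

/* Root the search at main (index 0); return how many functions survive. */
static size_t dead_strip(bool keep[NFUNC])
{
    size_t kept = 0;
    for (size_t f = 0; f < NFUNC; f++)
        keep[f] = false;
    mark(0, keep);
    for (size_t f = 0; f < NFUNC; f++)
        kept += keep[f];
    return kept;
}
```

Anything left unmarked, here old_report(), is dead and could be stripped, even though it still references helper().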

    Greg
    Greg, Jun 21, 2005
    #1

  2. Greg wrote:

    > The Metrowerks linker, as an example, strips all unreferenced,
    > unexported functions from a build by default, and does so no matter
    > where such functions are found. What would be the point of a linker
    > leaving unreachable code and inaccessible data in a binary?


    They just do it because the linker authors don't put sufficient priority
    on dealing with the matter properly. The GNU linker for example only
    garbage collects sections rather than individual functions, so if you
    use one function in a large object file, the whole object will get linked.
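The granularity problem can be modelled the same way, except that liveness is tracked per section instead of per function. The sketch below is a toy, assuming one hypothetical object file with four functions and caller-supplied section assignments. With GNU tools, compiling with GCC's -ffunction-sections puts each function in its own section, which is what lets ld --gc-sections discard unused functions individually:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object file: 0=main, 1=used, 2=unused_a, 3=unused_b. */
enum { NFUNC = 4 };

static const bool calls[NFUNC][NFUNC] = {
    {false, true,  false, false}, /* main -> used         */
    {false, false, false, false}, /* used calls nothing   */
    {false, false, false, true }, /* unused_a -> unused_b */
    {false, false, false, false}, /* unused_b             */
};

/* Section-level GC: a section is live if it holds main or is referenced
   from a live section. Returns how many functions end up retained. */
static size_t retained(const size_t section_of[NFUNC])
{
    bool live[NFUNC] = {false}; /* indexed by section id */
    for (size_t f = 0; f < NFUNC; f++)
        assert(section_of[f] < NFUNC); /* section ids must index live[] */
    live[section_of[0]] = true;

    bool changed = true;
    while (changed) { /* iterate to a fixed point */
        changed = false;
        for (size_t f = 0; f < NFUNC; f++) {
            if (!live[section_of[f]])
                continue;
            for (size_t g = 0; g < NFUNC; g++) {
                if (calls[f][g] && !live[section_of[g]]) {
                    live[section_of[g]] = true;
                    changed = true;
                }
            }
        }
    }

    size_t kept = 0;
    for (size_t f = 0; f < NFUNC; f++)
        kept += live[section_of[f]]; /* whole section kept or dropped */
    return kept;
}
```

With every function in section 0, retained() reports all four functions kept; with one section per function it keeps only main and used.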

    > And why
    > would programmers want to perform this tedious chore by hand themselves
    > rather than let the linker do it in a few seconds?


    If the codebase is full of cruft it is harder to maintain.
    Geronimo W. Christ Esq, Jun 22, 2005
    #2

  3. >> The Metrowerks linker, as an example, strips all unreferenced,
    >> unexported functions from a build by default, and does so no matter
    >> where such functions are found. What would be the point of a linker
    >> leaving unreachable code and inaccessible data in a binary?
    >
    >They just do it because the linker authors don't put sufficient priority
    >on dealing with the matter properly. The GNU linker for example only
    >garbage collects sections rather than individual functions, so if you
    >use one function in a large object file, the whole object will get linked.


    There may be insufficient information to TELL whether a particular
    piece of a compilation is used or not. For example, no law says that
    a particular machine instruction generated by the compiler can be
    identified as being part of exactly one function. Functions might
    share code. And the linker might not be able to TELL that functions
    are sharing code.

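    /* Hypothetical declarations (not in the original post) so the
       example compiles; validate() is assumed to be defined elsewhere. */
    enum { OK, MAYBE, BAD };
    int validate(const char *arg);

    int check1arg(char **argv)
    {
        int i;

        i = validate(argv[1]);
        /* common */
        if (i == OK)
            return 1;
        else if (i == MAYBE)
            return 0;
        else
            return -1;
    }

    int check2arg(char **argv)
    {
        int i;

        i = validate(argv[2]);
        /* common */
        if (i == OK)
            return 1;
        else if (i == MAYBE)
            return 0;
        else
            return -1;
    }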

    For example, the same copy of the code below /* common */ may be
    shared between check1arg() and check2arg(). And possibly, check2arg()
    is unused. Can the code below /* common */ be omitted along with
    check2arg()? No, because check1arg() still uses it. But how does the
    linker know this? By decompiling the compiler output? Possibly, but
    that seems like a lot of extra effort.
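To make the problem concrete, here is a toy model in C. It assumes a made-up instruction encoding (not any real object format) in which check1arg() jumps into a tail it shares with check2arg(), and in which the symbol table records nothing but the two entry points. A linker that guesses each function's extent as "from its entry point to the next one" would delete the shared tail along with check2arg():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction set: do some work and fall through, jump, or return. */
enum op { OP_WORK, OP_JUMP, OP_RET };

struct insn {
    enum op op;
    size_t target; /* meaningful only for OP_JUMP */
};

/* Hypothetical compiled unit: check1arg and check2arg share one tail. */
static const struct insn code[] = {
    {OP_WORK, 0}, /* 0: check1arg: i = validate(argv[1])        */
    {OP_JUMP, 3}, /* 1: check1arg: jump into the shared tail     */
    {OP_WORK, 0}, /* 2: check2arg: i = validate(argv[2])         */
    {OP_WORK, 0}, /* 3: shared tail: compare i with OK and MAYBE */
    {OP_RET, 0},  /* 4: shared tail: return 1, 0 or -1           */
};
enum { CODE_LEN = sizeof code / sizeof code[0] };

/* The only information the symbol table provides: entry points. */
static const size_t entry[] = {0 /* check1arg */, 2 /* check2arg */};
enum { NUM_FUNCS = sizeof entry / sizeof entry[0] };

/* Mark every instruction executed when control enters at 'start'. */
static void mark_live(size_t start, bool live[CODE_LEN])
{
    size_t pc = start;
    while (pc < CODE_LEN && !live[pc]) {
        live[pc] = true;
        if (code[pc].op == OP_RET)
            break;
        pc = (code[pc].op == OP_JUMP) ? code[pc].target : pc + 1;
    }
}

/* Naive rule: function f occupies [entry[f], next entry or end of code).
   Returns true if deleting 'unused' that way leaves 'used' intact. */
static bool naive_delete_is_safe(size_t unused, size_t used)
{
    bool live[CODE_LEN] = {false};
    assert(unused < NUM_FUNCS && used < NUM_FUNCS);
    mark_live(entry[used], live);

    size_t begin = entry[unused];
    size_t end = CODE_LEN;
    for (size_t f = 0; f < NUM_FUNCS; f++)
        if (entry[f] > begin && entry[f] < end)
            end = entry[f];

    for (size_t pc = begin; pc < end; pc++)
        if (live[pc])
            return false; /* would delete code the used function runs */
    return true;
}
```

naive_delete_is_safe(1, 0) returns false here: check2arg()'s guessed extent covers instructions 2 to 4, and check1arg() executes 3 and 4.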

    Gordon L. Burditt
    Gordon Burditt, Jun 22, 2005
    #3
  4. Gordon Burditt wrote:
    >>>The Metrowerks linker, as an example, strips all unreferenced,
    >>>unexported functions from a build by default, and does so no matter
    >>>where such functions are found. What would be the point of a linker
    >>>leaving unreachable code and inaccessible data in a binary?
    >>
    >>They just do it because the linker authors don't put sufficient priority
    >>on dealing with the matter properly. The GNU linker for example only
    >>garbage collects sections rather than individual functions, so if you
    >>use one function in a large object file, the whole object will get linked.
    >
    >
    > There may be insufficient information to TELL whether a particular
    > piece of a compilation is used or not. For example, no law says that
    > a particular machine instruction generated by the compiler can be
    > identified as being part of exactly one function. Functions might
    > share code. And the linker might not be able to TELL that functions
    > are sharing code.


    <snip>

    The example you gave is nothing to do with linking, because a linker
    never examines *within* functions to determine whether they are
    redundant or not. On the other hand, the GCC compiler does (when the
    optimizer is turned on) look for similar pieces of generated machine
    code and "compress" them by replacing them with one copy and some pointers.

    It would be very useful if the GNU linker would remove unused functions,
    but at the moment it doesn't.
    Geronimo W. Christ Esq, Jun 23, 2005
    #4
  5. >>>>The Metrowerks linker, as an example, strips all unreferenced,
    >>>>unexported functions from a build by default, and does so no matter
    >>>>where such functions are found. What would be the point of a linker
    >>>>leaving unreachable code and inaccessible data in a binary?
    >>>
    >>>They just do it because the linker authors don't put sufficient priority
    >>>on dealing with the matter properly. The GNU linker for example only
    >>>garbage collects sections rather than individual functions, so if you
    >>>use one function in a large object file, the whole object will get linked.
    >>
    >>
    >> There may be insufficient information to TELL whether a particular
    >> piece of a compilation is used or not. For example, no law says that
    >> a particular machine instruction generated by the compiler can be
    >> identified as being part of exactly one function. Functions might
    >> share code. And the linker might not be able to TELL that functions
    >> are sharing code.
    >
    ><snip>
    >
    >The example you gave is nothing to do with linking, because a linker
    >never examines *within* functions to determine whether they are
    >redundant or not.


    I didn't say it did. I said that if you have two functions compiled
    into one object file, and one of them isn't needed, there's no guarantee that
    the linker can determine what is part of the needed function (and possibly
    the other one also) to keep, and what is NOT part of the needed function
    (to delete).

    You don't get to conclude that function A starts here, and function
    B starts here, so everything between those two addresses is function
    A, and none of what's between those two is also part of function
    B or C, even if I'm only talking about the so-called code segment of
    both functions.

    >On the other hand, the GCC compiler does (when the
    >optimizer is turned on) look for similar pieces of generated machine
    >code and "compress" them by replacing them with one copy and some pointers.


    So in that situation, you can have functions that share code, and
    object code where there is no contiguous block of code where the
    linker can determine "this is function A, and all of function A, and
    none of any other function".

    >It would be very useful if the GNU linker would remove unused functions,
    >but at the moment it doesn't.


    The point here is that it may not have the information required to
    remove unused functions even if they can be determined to be unused.
    The object format may not even PERMIT passing the information required
    to determine what code is part of what function(s).

    Gordon L. Burditt
    Gordon Burditt, Jun 23, 2005
    #5
  6. Gordon Burditt wrote:

    > You don't get to conclude that function A starts here, and function
    > B starts here, so everything between those two addresses is function
    > A,


    I've difficulty picturing how any of the code inside a function can ever
    be used in any way if the function is never invoked. I don't see how a
    linker would be making an unsafe decision by removing a function that is
    never invoked.

    I imagine that when a compiler spots repetitive sections of code it
    takes them out of the function's object code into a common area of the
    object, and has the function point to them. That way redundant functions
    could be safely removed.
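At the source level, the "common area" imagined here amounts to giving the shared tail its own name. A minimal sketch, assuming a hypothetical validate() and result codes since the thread never defines them: once classify() is a separate function, an ordinary reachability pass can drop an unused check2arg() while keeping classify(), because check1arg() still references it. Whether a compiler actually labels its merged code this way is exactly what the next reply questions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes and validator; the thread never defines them. */
enum { OK, MAYBE, BAD };

static int validate(const char *arg)
{
    assert(arg != NULL);
    if (arg[0] == 'y')
        return OK;
    if (arg[0] == 'm')
        return MAYBE;
    return BAD;
}

/* The formerly duplicated tail, now a separately named "common area". */
static int classify(int i)
{
    if (i == OK)
        return 1;
    else if (i == MAYBE)
        return 0;
    else
        return -1;
}

int check1arg(char **argv) { return classify(validate(argv[1])); }
int check2arg(char **argv) { return classify(validate(argv[2])); }
```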
    Geronimo W. Christ Esq, Jun 23, 2005
    #6
  7. Paul Groke

    []
    >>On the other hand, the GCC compiler does (when the
    >>optimizer is turned on) look for similar pieces of generated machine
    >>code and "compress" them by replacing them with one copy and some pointers.
    >
    >
    > So in that situation, you can have functions that share code, and
    > object code where there is no contiguous block of code where the
    > linker can determine "this is function A, and all of function A, and
    > none of any other function".
    >
    >
    >>It would be very useful if the GNU linker would remove unused functions,
    >>but at the moment it doesn't.
    >
    >
    > The point here is that it may not have the information required to
    > remove unused functions even if they can be determined to be unused.
    > The object format may not even PERMIT passing the information required
    > to determine what code is part of what function(s).
    >
    > Gordon L. Burditt


    In that case the object format should be changed :)
    Paul Groke, Jun 23, 2005
    #7
  8. >> You don't get to conclude that function A starts here, and function
    >> B starts here, so everything between those two addresses is function
    >> A,
    >
    >I've difficulty picturing how any of the code inside a function can ever
    >be used in any way if the function is never invoked. I don't see how a
    >linker would be making an unsafe decision by removing a function that is
    >never invoked.


    Given an object file containing two functions, one used and one
    not, resulting from a single compilation, what makes you think that
    the linker can remove anything and be sure that it has not removed
    a piece of the function that *IS* used? Object file formats that
    I have seen do not have labels that say this byte is part of function
    a, this byte is part of function b and q, and this byte is part of
    functions a, b, j, n, and z.

    >I imagine that when a compiler spots repetitive sections of code it
    >takes them out of the function's object code into a common area of the
    >object, and has the function point to them. That way redundant functions
    >could be safely removed.


    And what makes you think that function1, function2, and "common area"
    are labelled in a way that the linker can identify them? Sure,
    the entry points are labelled. That's likely to be all the info
    available.

    Gordon L. Burditt
    Gordon Burditt, Jun 23, 2005
    #8
  9. >> The point here is that it may not have the information required to
    >> remove unused functions even if they can be determined to be unused.
    >> The object format may not even PERMIT passing the information required
    >> to determine what code is part of what function(s).
    >>
    >> Gordon L. Burditt
    >
    >In that case the object format should be changed :)


    Using that standard, can you name any object format that should
    NOT be changed? One in actual use, with an actual compiler that
    generates it?

    Gordon L. Burditt
    Gordon Burditt, Jun 23, 2005
    #9
  10. In article <>,
    Gordon Burditt <> wrote:

    >I didn't say it did. I said that if you have two functions compiled
    >into one object file, and one of them isn't needed, there's no guarantee that
    >the linker can determine what is part of the needed function (and possibly
    >the other one also) to keep, and what is NOT part of the needed function
    >(to delete).


    How hard can it be?

    I mean, all you have to do is solve the halting problem...


    dave

    --
    Dave Vandervies

    [T]he program's running time will be reduced by ONE WHOLE MILLISECOND! WOW!
    --Eric Sosman in comp.lang.c
    Dave Vandervies, Jun 23, 2005
    #10
  11. On Thu, 23 Jun 2005 21:57:17 -0000, (Gordon
    Burditt) wrote:

    > >> The point here is that it may not have the information required to
    > >> remove unused functions even if they can be determined to be unused.
    > >> The object format may not even PERMIT passing the information required
    > >> to determine what code is part of what function(s).
    > >>
    > >> Gordon L. Burditt
    > >
    > >In that case the object format should be changed :)
    >
    > Using that standard, can you name any object format that should
    > NOT be changed? One in actual use, with an actual compiler that
    > generates it?
    >

    The object file format used on Tandem^WCompaq^WHP NonStop in TNS
    (legacy) mode has completely disjoint code blocks (also data blocks),
    with a copy of interroutine references sorted by target, so you need
    only look at a single field to determine a routine is unreferenced.
    There are still supported and used compilers (and runtimes) for at
    least C, Fortran, and COBOL, and a cfront-based (less than Standard)
    C++; there used to be Pascal, but I don't think it's still supported.
    (The newer 'native' RISC tools are ELF, and full C++. The newest
    Itanium ones I haven't seen yet.)
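A layout like this makes the unreferenced-routine check cheap. As an assumption (the post doesn't spell out the real TNS layout), model the interroutine references as (target, caller) pairs sorted by target; deciding whether a routine is referenced then takes one binary search with the standard bsearch():

```c
#include <assert.h>
#include <stdlib.h>

/* One hypothetical interroutine reference: 'from' calls 'to'. */
struct ref {
    int to;   /* target routine number (sort key) */
    int from; /* calling routine number */
};

/* bsearch comparator: the key is a routine number, the element a ref. */
static int cmp_target(const void *key, const void *elem)
{
    int k = *(const int *)key;
    int t = ((const struct ref *)elem)->to;
    return (k > t) - (k < t);
}

/* Nonzero if any reference targets 'routine'; refs must be sorted by 'to'. */
static int is_referenced(const struct ref *refs, size_t n, int routine)
{
    assert(refs != NULL || n == 0);
    return bsearch(&routine, refs, n, sizeof refs[0], cmp_target) != NULL;
}
```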

    - David.Thompson1 at worldnet.att.net
    Dave Thompson, Jul 4, 2005
    #11
