Removing dead code and unused functions


Geronimo W. Christ Esq

Are there any scripts or tools out there that could look recursively
through a group of C/C++ source files, and allow unreferenced function
calls or values to be easily identified?

LXR is handy for indexing source code, and for a given function or
global variable it can show you all the places where it is referenced.
It would be really nice to have a tool that would simply list all of the
unreferenced functions, so that you could go through and remove them.
 

Greg

Geronimo said:
Are there any scripts or tools out there that could look recursively
through a group of C/C++ source files, and allow unreferenced function
calls or values to be easily identified?

LXR is handy for indexing source code, and for a given function or
global variable it can show you all the places where it is referenced.
It would be really nice to have a tool that would simply list all of the
unreferenced functions, so that you could go through and remove them.

There is in fact such a tool; it's commonly called a "linker." And the
list of unreferenced code and data that it strips from a build is
usually cataloged in a file it can be directed to create. This file is
commonly called a "link map."
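
(With the GNU toolchain, for instance, the map is requested at link
time; the object names here are made up:

  gcc -Wl,-Map=output.map -o prog main.o util.o

ld's --cref option adds a symbol cross-reference table as well.)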

Greg
 

Geronimo W. Christ Esq

Greg said:
There is in fact such a tool; it's commonly called a "linker." And the
list of unreferenced code and data that it strips from a build is
usually cataloged in a file it can be directed to create. This file is
commonly called a "link map."

Got a link? The GNU linker at least only puts symbols that are included
into the link map. No mention of it cataloging symbols it excludes.
 

Jean-Claude Arbaut

On 19/06/2005 at 17:49, in (e-mail address removed),
"Geronimo W. Christ Esq" said:
Got a link? The GNU linker at least only puts symbols that are included
into the link map. No mention of it cataloging symbols it excludes.

I'm not sure but "nm" could be useful here.
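
(A rough sketch of how, assuming GNU nm and a pile of object files:

  nm --extern-only --defined-only *.o    (what each object defines)
  nm --undefined-only *.o                (what each object references)

Any external symbol that shows up in the first listing but never in the
second is at least a candidate for removal.)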
 

Richard Tobin

Geronimo said:
Got a link? The GNU linker at least only puts symbols that are included
into the link map. No mention of it cataloging symbols it excludes.

Jean-Claude Arbaut said:
I'm not sure but "nm" could be useful here.

Linkers typically do not exclude functions in the user program that are
unused. They only do that with libraries.

More useful would be one of the many tools that generate call graphs.

-- Richard
 

Geronimo W. Christ Esq

Jean-Claude Arbaut said:
I'm not sure but "nm" could be useful here.

This problem can't be appropriately solved with a linker, particularly
not the GNU linker. GNU ld can only throw out sections, not unused
functions or global variables; so if you've got a file containing 10
functions, 9 of which are unused, all ten will still get linked.
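
(To be fair, GCC's -ffunction-sections makes the section granularity
match the function granularity, so ld's --gc-sections can then drop the
unused functions from the binary. A minimal sketch, with made-up names,
assuming the GNU toolchain:

/* tenfuncs.c: only used() is reachable from main() */
int used(void)   { return 1; }
int unused(void) { return 2; } /* never referenced anywhere */

int main(void)
{
    return used();
}

Built with

  gcc -ffunction-sections -Wl,--gc-sections -Wl,--print-gc-sections tenfuncs.c

the linker reports each section, i.e. each function, that it removes.
But that only shrinks the binary; it still doesn't tell you which source
lines to delete.)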

Parsing the source code is the answer; it's just surprising that no-one
seems to have done this yet.
 

Geronimo W. Christ Esq

Richard said:
Linkers typically do not exclude functions in the user program that are
unused. They only do that with libraries.

More useful would be one of the many tools that generate call graphs.

Can you think of any examples?

I know of runtime tools which do this (could even use gcov at a pinch)
but in that case you need to come up with a set of test cases which
exercise the code fully.
 

Phlip

Geronimo said:
Are there any scripts or tools out there that could look recursively
through a group of C/C++ source files, and allow unreferenced function
calls or values to be easily identified?

Write unit tests for every feature. Pass all tests after every 1 to 10 edits.

Constantly try to remove parameters, variables, lines, methods, classes, and
modules. If any test fails, hit Undo.

This process is great for growing code to maximize features and minimize
lines.
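
A minimal sketch of the kind of test I mean, in plain C with assert()
and a made-up function under test:

#include <assert.h>

/* the code under test */
static int clamp(int v, int lo, int hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

int main(void)
{
    assert(clamp( 5, 0, 10) ==  5);
    assert(clamp(-1, 0, 10) ==  0);
    assert(clamp(99, 0, 10) == 10);
    return 0; /* aborts before this line if any assertion fails */
}

Once a behavior is pinned down like this, any edit that breaks it is
caught the next time the tests run, and you hit Undo.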
 

Geronimo W. Christ Esq

Phlip said:
Write unit tests for every feature. Pass all tests after every 1 to 10 edits.

That's what I would do if I had a group of developers and six months to
do it in. Unfortunately many of us in this post-dotcom age do not work
to near-infinite budgets.
 

jacob navia

Geronimo said:
Are there any scripts or tools out there that could look recursively
through a group of C/C++ source files, and allow unreferenced function
calls or values to be easily identified?

LXR is handy for indexing source code, and for a given function or
global variable it can show you all the places where it is referenced.
It would be really nice to have a tool that would simply list all of the
unreferenced functions, so that you could go through and remove them.

The lcc-win32 IDE will do that. Select Object file cross-reference in
the analysis menu, then look for symbols that are not referenced anywhere.

A problem with this approach is that the IDE doesn't see references made
from within the same file. For instance:

int foo(void)
{
    // ...
    return 0;
}

int main(void)
{
    foo();
    return 0;
}

Here foo() will appear as not referenced, even though main() calls it.
Besides, the IDE only handles C programs (it is a C IDE).

http://www.cs.virginia.edu/~lcc-win32.
 

Phlip

Geronimo said:

That's what I would do if I had a group of developers and six months to
do it in. Unfortunately many of us in this post-dotcom age do not work
to near-infinite budgets.

Do you have time and resources to debug?

You can leverage tests like that to replace many long hours of debugging
with a few short minutes of writing tests.

The idea that automated testing requires an "infinite budget" is a myth.

(And if you indeed have a short deadline, why bother removing harmless but
unused code?)
 

Tim Prince

Jean-Claude Arbaut said:
I'm not sure but "nm" could be useful here.

In times gone by, the lorder and tsort tools showed which .o files were
not used, as well as finding a single-pass link order, if one exists.
Now that no one cares about single-pass linking, we don't find these
tools installed automatically.
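
(From memory, the classic invocation was:

  lorder *.o | tsort

lorder emits "this object uses that object" pairs, and tsort
topologically sorts them into an order a single-pass linker can consume;
a .o that nothing else refers to is easy to spot in lorder's output.)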
 

Walter Roberson

Phlip said:
Do you have time and resources to debug?

You can leverage tests like that to replace many long hours of debugging
with a few short minutes of writing tests.

The idea that automated testing requires an "infinite budget" is a myth.

(And if you indeed have a short deadline, why bother removing harmless but
unused code?)

If you are handed a large program and told to "make it work",
then the first thing you need to do is bring it under control. Machines
are a lot faster and more accurate about matters such as which functions
are potentially callable, so it makes sense to mechanically
pre-process the code instead of going in and writing tests for
each section under the assumption that the code will be used.
One can spend endless hours trying to "fix" a routine that
isn't even needed. Overview first, -then- ensure that each
function performs its proper role in the design.

A program such as 'cscope' can assist in finding unused functions
and in finding locations from which functions are called.
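
(For anyone who hasn't used it: something like

  cscope -b -R

builds the cross-reference database for a whole tree, and the
interactive "Find functions calling this function:" query then shows,
for any given function, whether it is called at all.)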

Phlip said:
The idea that automated testing requires an "infinite budget" is a myth.

Well, sure it is: there are only a finite number of states that
a program can be in on a given system, so the amount of testing
one has to do has a finite upper bound, not an infinite bound.

There's the small issue that current scientific thought suggests
that the Universe will not last long enough to test even fairly
trivial programs (e.g., it takes 1E21 years to test a program
with merely two 64-bit floating point numbers if the tests can be
done at 10 gigaflops).
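
(The arithmetic, roughly: two 64-bit inputs give 2^128, about 3.4e38,
combinations; at 1e10 tests per second that is about 3.4e28 seconds, or,
at about 3.15e7 seconds per year, about 1.1e21 years.)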

But you are absolutely right that that won't require an infinite budget --
it only requires a budget larger than is likely to be available at
any time before Homo Sapiens Sapiens die off or evolve into something
else.
 

CBFalconer

jacob said:
The lcc-win32 IDE will do that. Select Object file
cross-reference in the analysis menu, then look for symbols that
are not referenced anywhere.

A problem with this approach is that the IDE doesn't see references
made from within the same file. For instance:

Those shouldn't appear in the first place. They should have been
declared static and omitted from the .h file.

--
Some informative links:
http://www.geocities.com/nnqweb/
http://www.catb.org/~esr/faqs/smart-questions.html
http://www.caliburn.nl/topposting.html
http://www.netmeister.org/news/learn2quote.html
 

Jean-Claude Arbaut

Tim Prince said:
In times gone by, the lorder and tsort tools showed which .o files were
not used, as well as finding a single-pass link order, if one exists.
Now that no one cares about single-pass linking, we don't find these
tools installed automatically.

I didn't know they showed this information, but it's true they are not
very useful nowadays. I think they are still part of the binutils package.
 

Ben Pope

Walter said:
Well, sure it is: there are only a finite number of states that
a program can be in on a given system, so the amount of testing
one has to do has a finite upper bound, not an infinite bound.

There's the small issue that current scientific thought suggests
that the Universe will not last long enough to test even fairly
trivial programs (e.g., it takes 1E21 years to test a program
with merely two 64-bit floating point numbers if the tests can be
done at 10 gigaflops).

Then it is clear that you do not understand unit testing.

Ben
 

Phlip

Walter said:
If you are handed a large program and told to "make it work",
then the first thing you need to do is bring it under control.

Read /Working Effectively with Legacy Code/ by Mike Feathers. He's a
consultant who routinely guides teams through that exact situation.

A boss has spent a lot of money to build a codebase, with very little
return. Then a team must make the code valuable, without wasting more time
and effort.

Walter said:
Machines are a lot faster and more accurate about matters such as which
functions are potentially callable, so it makes sense to mechanically
pre-process the code instead of going in and writing tests for each
section under the assumption that the code will be used. One can spend
endless hours trying to "fix" a routine that isn't even needed. Overview
first, -then- ensure that each function performs its proper role in the
design.

A program such as 'cscope' can assist in finding unused functions
and in finding locations from which functions are called.

Yes, automated tools that scan code and interpret it will help. But I don't
see the relation between "Where the bugs are" and "Where control flow is
not". The principle "Ain't broke don't fix it" applies here. Dead code ain't
broke. Bugs will lead to investigation of the live code causing them.

Walter said:
Well, sure it is: there are only a finite number of states that
a program can be in on a given system, so the amount of testing
one has to do has a finite upper bound, not an infinite bound.

The idea that developer tests should be like quality assurance tests is also
a myth. Developer tests are little more than the scaffolding used to support
a building while you build it. Earthquake-proofing the building is an
orthogonal concern.

Walter said:
There's the small issue that current scientific thought suggests
that the Universe will not last long enough to test even fairly
trivial programs (e.g., it takes 1E21 years to test a program
with merely two 64-bit floating point numbers if the tests can be
done at 10 gigaflops).

That's hardly an excuse not to try. The goal is _not_ "prove there are no
bugs". A full proof of correctness is indeed intractable. Tests can get
within 99.9% of a proof with trivial effort. The last 0.1% is what costs
so much.

The goal is "prevent 99.9% of bugs". You can get there by running tests
frequently, and hitting Undo if any test breaks, to back out the most
recent edit. That's infinitely preferable to debugging.
 

Martijn

CBFalconer said:
Those shouldn't appear in the first place. They should have been
declared static and omitted from the .h file.


And if you do so, most decent compilers (GCC, at least) will find
unreferenced static functions for you when the appropriate warnings are
enabled.
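
(A minimal sketch, with made-up names:

/* deadstatic.c */
static int helper(int x) /* static, and never called in this file */
{
    return x * 2;
}

int main(void)
{
    return 0;
}

Compiling with

  gcc -Wall -c deadstatic.c

makes GCC warn that 'helper' is defined but not used; because the
function is static, the compiler alone can prove that nobody references
it.)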
 

Geronimo W. Christ Esq

Phlip said:
Do you have time and resources to debug?

You can leverage tests like that to replace many long hours of debugging
with a few short minutes of writing tests.

I've got just under a million lines of code here that have just come
into my possession. I'd love to believe that a few minutes would allow
me to create a suite of tests proving that the program generated from
that codebase worked the same before and after any changes, but I remain
somewhat cynical.

Phlip said:
The idea that automated testing requires an "infinite budget" is a myth.

Timescales and budgets do not presently permit me to sit down and write
tests for a huge body of code which I am not completely familiar with. I
have no doubts about the wisdom or long term benefits of doing it, but I
don't possess the resources at the moment.

Phlip said:
(And if you indeed have a short deadline, why bother removing harmless but
unused code?)

I don't believe I've mentioned anything about a deadline. What I do have
is a limited resource to work with. I can leverage that resource better
if I can grasp the code more easily. The code can be grasped more easily
if the redundant bits of it are removed.
 

Geronimo W. Christ Esq

Walter said:

A program such as 'cscope' can assist in finding unused functions
and in finding locations from which functions are called.

cscope is very handy (as is LXR as I mentioned before). I can indeed go
through each function manually and determine whether it is needed or
not. But I figure that the computer should be able to do that for me,
automatically. Cscope's (or LXR's) generated database contains all the
information that would be required to do that. It's just odd that no-one
has attempted to do the kind of source code profiling that I am talking
about yet, using those databases to generate lists of redundant
functions (or duplicate code).

The reason why it has to be automated is that you have to make several
passes. For example, you could come to function bar() and not remove it
because it is needed by function foo(). Only later would you find that
function foo() is itself unused, so you would have to make a second pass
to remove bar(). Take that trivial example and scale it up to a source
base with a few tens or hundreds of thousands of functions defined in it
and you can see the scale of the issue.
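
(The trivial case above, spelled out; the names are made up:

static int bar(void) { return 1; }     /* referenced, but only by foo() */
static int foo(void) { return bar(); } /* referenced by nobody */

int main(void)
{
    return 0; /* neither foo() nor bar() is reachable from here */
}

A single "is this symbol referenced anywhere?" pass keeps bar() because
foo() mentions it; bar() only becomes visibly dead once foo() has been
deleted. So the tool either iterates to a fixed point, or computes
reachability from main() and the other entry points in one go.)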
 
