automatically remove unused #includes from C source?

smachin1000

Hi All,

Does anyone know of a tool that can automatically analyze C source to
remove unused #includes?

Thanks,
Sean
 
Walter Roberson

smachin1000 said:
Does anyone know of a tool that can automatically analyze C source to
remove unused #includes?

Tricky.

A #define in include1.h might be used in a #define in include2.h that
might be used to build a type in include3.h that might be needed by a
function declaration brought in by include5.h that is #include'd by
include4.h, and the function name might be in a disguised array
initialization form in include6.h and the analyzer would have to
analyze your source to see whether you refer to that function directly
or if you use the array initialization...

In other words, such a tool would pretty much have to be a C compiler
itself, but one that kept track of all the "influences" that went
into building up every token, and figured out what wasn't used after all.
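
For instance, a made-up sketch of such a chain (every file and name
here is hypothetical):

/* include1.h */
#define BUFLEN 64

/* include2.h -- uses include1.h's macro */
#define TABLE_SIZE (BUFLEN * 2)

/* include3.h -- uses include2.h's macro to build a type */
typedef int table_t[TABLE_SIZE];

/* include5.h (#include'd by include4.h) -- declares a function
   whose prototype needs include3.h's type */
void init_table(table_t t);

/* include6.h -- the function's only appearance is inside an
   array initializer, so a naive text search for init_table()
   calls in the .c file would miss this use entirely */
void (*const handlers[])(table_t) = { init_table };

Whether include1.h is "unused" by a given source file then depends
on whether that file touches handlers[] or calls init_table()
directly -- exactly the sort of question only a full parse can answer.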

It might be easier just to start commenting out #include's and
seeing if any compile problems come up.
 
Roland Pibinger

Walter Roberson said:
It might be easier just to start commenting out #include's and
seeing if any compile problems come up.

Automate that and you have the requested tool!

Best wishes,
Roland Pibinger
 
Walter Roberson

Roland Pibinger said:
Automate that and you have the requested tool!

Including a particular file can end up changing the meaning of
something else, but the code might compile fine without it.

For example, you might have an include file that contained

#define _use_search_heuristics 1

Then the code might have

#if defined(_use_search_heuristics)
/* do it one way */
#else
/* do it a different way */
#endif

where the code is valid either way.

Thus in order to test whether any particular #include is really
needed by checking the compile results, you need to analyze the
compiled object, strip out symbol tables and debug information and
compile timestamps and so on, and compare the generated code.
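
A rough sketch of that kind of check, written in C just for
illustration (the tool names and flags here -- cc, objcopy -g -x to
strip debug information and local symbols, cmp -- assume a GNU-ish
toolchain, and some object formats embed timestamps this won't catch):

/* include_probe.c -- does removing one #include change the code? */
#include <stdio.h>
#include <stdlib.h>

/* Compile src to an object file, then strip debug information and
   local symbols so the comparison sees only the generated code. */
static int build_stripped(const char *src, const char *obj)
{
    char cmd[512];
    snprintf(cmd, sizeof cmd,
             "cc -c %s -o tmp.o && objcopy -g -x tmp.o %s", src, obj);
    return system(cmd);
}

int main(void)
{
    /* module.c is the original source; module_probe.c is assumed to
       be a copy of it with one #include commented out. */
    if (build_stripped("module.c", "base.o") != 0)
        return EXIT_FAILURE;            /* the baseline must compile */

    if (build_stripped("module_probe.c", "probe.o") != 0) {
        puts("compile broke: that #include is needed (directly or not)");
        return EXIT_SUCCESS;
    }

    if (system("cmp -s base.o probe.o") == 0)
        puts("object code identical: that #include looks removable");
    else
        puts("object code differs: that #include affects the code");
    return EXIT_SUCCESS;
}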
 
Roland Pibinger

Walter Roberson said:
Including a particular file can end up changing the meaning of
something else, but the code might compile fine without it.

For example, you might have an include file that contained

#define _use_search_heuristics 1

Then the code might have

#if defined(_use_search_heuristics)
/* do it one way */
#else
/* do it a different way */
#endif

where the code is valid either way.

You are right in theory. But that kind of include file dependency
(an include order dependency) is usually considered bad style.

Walter Roberson said:
Thus in order to test whether any particular #include is really
needed by checking the compile results, you need to analyze the
compiled object, strip out symbol tables and debug information and
compile timestamps and so on, and compare the generated code.

IMO, this is overdone. You have to test your application after code
changes anyway.

Best regards,
Roland Pibinger
 
Al Balmer

Roland Pibinger said:
You are right in theory. But that kind of include file dependency
(an include order dependency) is usually considered bad style.

No, he's right in practice. There's no guarantee that a body of
existing code will conform to your (or anyone's) rules of good style.
 
Walter Roberson

Roland Pibinger said:
You are right in theory. But that kind of include file dependency
(an include order dependency) is usually considered bad style.

It happens often in large projects with automake setups and
system dependencies. The included file that changes the meaning
of the rest is a "hints" file.

For example, on the OS I use most often, for a well-known
large project (perl, as I recall), the autoconfiguration step
detects that the OS has library entries and include entries
for a particular feature. Unfortunately, that particular feature
doesn't work very well in the OS -- broken -and- very inefficient.
So the OS hints file basically says, "Yes, I know you've detected
that, but don't use it." The large project then goes ahead and
compiles in the code that performs the task using more standardized
system calls instead of the newer, less-standardized API.

Roland Pibinger said:
IMO, this is overdone. You have to test your application after code
changes anyway.

Conformance tests can take 3 days per build, and if you
are checking whether a project with 1500 #includes (distributed
over the source) can survive deleting one particular include
out of one particular module, then you need up to pow(2,1500)
complete builds and conformance tests. Even if each *complete*
application conformance test took only 1 second, it'd take
10^444 CPU years to complete the testing. *Much* faster to break
it into chunks (e.g., by source file) and check to see whether
each chunk still produces the same code after removal of a
particular include: the timing then becomes proportional to
the sum of pow(2,includes_in_this_chunk) instead of the product
of those as would be the case with what you propose.
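
(Checking the arithmetic: pow(2,1500) seconds is about 10^451.5
seconds, and a year is roughly 3.15 * 10^7 -- call it 10^7.5 --
seconds, which is where the 10^444 CPU years figure comes from.)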
 
Walter Bright

Walter said:
Conformance tests can take 3 days per build, and if you
are checking whether a project with 1500 #includes (distributed
over the source) can survive deleting one particular include
out of one particular module, then you need up to pow(2,1500)
complete builds and conformance tests. Even if each *complete*
application conformance test took only 1 second, it'd take
10^444 CPU years to complete the testing. *Much* faster to break
it into chunks (e.g., by source file) and check to see whether
each chunk still produces the same code after removal of a
particular include: the timing then becomes proportional to
the sum of pow(2,includes_in_this_chunk) instead of the product
of those as would be the case with what you propose.

That is a good idea: selectively removing #include statements, and then
simply seeing if the resulting object code file changes.

Otherwise, a customized C compiler could absolutely tell if there were
any dependencies on a particular #include file.
 
Don Porges

Walter Roberson said:
Including a particular file can end up changing the meaning of
something else, but the code might compile fine without it.

For example, you might have an include file that contained

#define _use_search_heuristics 1

Then the code might have

#if defined(_use_search_heuristics)
/* do it one way */
#else
/* do it a different way */
#endif

where the code is valid either way.

Thus in order to test whether any particular #include is really
needed by checking the compile results, you need to analyze the
compiled object, strip out symbol tables and debug information and
compile timestamps and so on, and compare the generated code.

Then, analyze it to make sure you don't delete the #include of "seems_unused.h" in this:

seems_unused.h:
-----------------
#define MIGHT_NEED 1

somefile.c:
----------
#ifdef DEFINED_WITH_MINUS_D
int var = MIGHT_NEED;
#endif

-- so that next week, when somebody does gcc -DDEFINED_WITH_MINUS_D, the code still builds.
 
Neil

smachin1000 said:
Hi All,

Does anyone know of a tool that can automatically analyze C source to
remove unused #includes?

Thanks,
Sean

Doesn't PC-LINT give you a list of unused includes?
 
Roland Pibinger

Walter Roberson said:
Conformance tests can take 3 days per build, and if you
are checking whether a project with 1500 #includes (distributed
over the source) can survive deleting one particular include
out of one particular module, then you need up to pow(2,1500)
complete builds and conformance tests. Even if each *complete*
application conformance test took only 1 second, it'd take
10^444 CPU years to complete the testing.

That calculation is quite contrived. I wonder how you would make
changes in your code base besides removing an #include, to say
nothing of refactoring.

Walter Roberson said:
*Much* faster to break
it into chunks (e.g., by source file) and check to see whether
each chunk still produces the same code after removal of a
particular include:

... and if it compiles but produces different object code, then you
have found an include order dependency bug ;-)

Best regards,
Roland Pibinger
 
Roland Pibinger

Walter Bright said:
Otherwise, a customized C compiler could absolutely tell if there were
any dependencies on a particular #include file.

BTW, there is a huge demand for static code analysis tools in C and
C++ (also in a commercial sense). For most of those code analysis
tasks you need to have a fully-fledged (customized) compiler. So, if I
had that compiler ...

Best regards,
Roland Pibinger
 
Walter Bright

Roland said:
BTW, there is a huge demand for static code analysis tools in C and
C++ (also in a commercial sense). For most of those code analysis
tasks you need to have a fully-fledged (customized) compiler. So, if I
had that compiler ...

True, I've seen some amazingly high prices quoted for static code
analysis. There's nothing stopping someone from approaching Digital Mars
or other compiler vendors and offering to purchase a license for the
compiler to get into that business.
 
Roland Pibinger

Walter Bright said:
True, I've seen some amazingly high prices quoted for static code
analysis. There's nothing stopping someone from approaching Digital Mars
or other compiler vendors and offering to purchase a license for the
compiler to get into that business.

What is stopping you?
 
Ian Collins

Roland said:
BTW, there is a huge demand for static code analysis tools in C and
C++ (also in a commercial sense). For most of those code analysis
tasks you need to have a fully-fledged (customized) compiler. So, if I
had that compiler ...

Due to extensions, such a tool can only really be part of the compiler
suite.
 
Richard Heathfield

Roland Pibinger said:
What is stopping you?

I don't think Walter Bright needs to approach /anyone/ to purchase a licence
for the Digital Mars compiler. :)
 
CBFalconer

Walter said:
That is a good idea: selectively removing #include statements, and
then simply seeing if the resulting object code file changes.

Otherwise, a customized C compiler could absolutely tell if there
were any dependencies on a particular #include file.

Such an operation would need C99 specs; otherwise the use of
implicit int would foul the results. It might be enough to tell the
compiler to insist on prototypes.
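
A hypothetical illustration of that hazard under C90 rules:

/* If the math.h line is deleted, a C90 compiler still accepts the
   call -- sqrt() gets an implicit declaration returning int -- but
   the behaviour is then undefined, because the real sqrt() returns
   double.  A C99 compiler, or one told to insist on prototypes,
   rejects the call instead of miscompiling it. */
#include <stdio.h>
#include <math.h>   /* looks removable to a naive checker */

int main(void)
{
    printf("%f\n", sqrt(2.0));
    return 0;
}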
 
Walter Roberson

Roland Pibinger said:
That calculation is quite contrived.

Contrived? Well, yes, in the sense that any large project is likely
to have much *more* than 1500 #include statements. For example, I just
ran a count against the trn4 source (which is less than 1 megabyte
when gzip'd), and it has 1659 #include statements. openssl 0.9.7e
has 4679 #include statements (it's about 3 megabytes gzip'd).
Roland Pibinger said:
I wonder how you would make changes in your code base besides
removing an #include, to say nothing of refactoring.

You seem to have forgotten that you yourself proposed,
"Automate that and you have the requested tool!" in response to my
saying, "It might be easier just to start commenting out #include's".
When I indicated that it is more complex than that and that
comparing object code is necessary (not just looking for compile
errors), you said,
"You have to test your application after code changes anyway."

Taken in context, your remark about testing after code changes
must be considered to apply to the *automated* tool you proposed.
And the difficulty with automated tools along these lines is that they
are necessarily dumb: if removing #include file1.h gives you a
compile error, the tool cannot assume that file1.h is a -necessary-
dependency (an assumption that would let it test in linear time): it
would have to allow for the possibility that removing file1.h
only gave an error because of something in file2.h --- and yes,
there can be backwards dependencies, in which file1.h is needed to
complete something included -before- that point. Thus, in this kind
of automated tool that doesn't know how to parse the C code itself,
full dependency checking can only be done by trying every -possible-
combination of #include files, which is a 2^N process.

Do you feel that 1 second to "test your application after code changes"
is significantly longer than is realistic? It probably takes longer
than that just to compile and link the source each time.

Roland Pibinger said:
I wonder how you would make changes in your code base besides
removing an #include, to say nothing of refactoring.

I don't mechanically automate the code change and test process.

Roland Pibinger said:
... and if it compiles but produces different object code, then you
have found an include order dependency bug ;-)

Include order dependencies are not bugs unless the prevailing
development paradigm for the project has declared them to be so.

Once you get beyond standard C into POSIX or system dependencies,
it is *common* for #include files to be documented as being order
dependent upon something else. Better system developers hide
that by #include'ing the dependencies and ensuring, as far as
is reasonable, that each system include file has guards against
multiple inclusion, but that's a matter of Quality of Implementation,
not part of the standards.
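
For example, a careful system header hides its own ordering needs
along these lines (names invented for illustration):

/* sys_foo.h -- pulls in its own prerequisites and guards against
   multiple inclusion, so users never see the order dependency */
#ifndef SYS_FOO_H
#define SYS_FOO_H

#include <sys/types.h>   /* provides size_t, used below */

struct foo {
    size_t length;
};

#endif /* SYS_FOO_H */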


Still, it is true that in the case of multiple source files that
together have 1500 #includes, you would not need to do pow(2,1500)
application tests, if you are using a compiler that supports
independent compilation and later linking. If you do have independent
compilation, then within each source file it is a 2^N process
to find all the #include combinations that will compile, but most of
the combinations will not. Only the versions that will compile need
to go into the pool for experimental linkage; linkage experiments
would be the product of the number of eligible compilations for each
source. Only the linkages that survived would need to go on for testing.
The number of cases that will make it to testing is not possible to
estimate without statistical information about the probability that any
given #include might turn out to be unneeded.
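
To put invented numbers on the sum-versus-product difference: with
10 source files of 150 #includes each, the per-file compile
experiments total 10 * pow(2,150), about 1.4 * 10^46, whereas
exhaustive whole-application builds would be pow(2,1500), about
3.5 * 10^451 -- and since almost all of the per-file combinations
fail to compile at all, the pool that ever reaches the linkage
stage is far smaller still.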
 
