#include optimization


Ramesh

Hi,

I am currently maintaining a legacy application with a very, very large code base.
I am facing problems with C/C++ files having a lot of unnecessary #includes.
On average, every C/C++ file has 150+ .h files included. I find that 75%
of them are unnecessary and could be removed. Given the huge code base,
I can't fix it manually.

Are there any tools that would report unwanted .h files?

I am not sure if this is the right group to ask this question.
I would appreciate any pointers.

Thanks
Ramesh
 

Dan Pop

Ramesh said:
I am currently maintaining a legacy application with a very, very large code base.
I am facing problems with C/C++ files having a lot of unnecessary #includes.
On average, every C/C++ file has 150+ .h files included. I find that 75%
of them are unnecessary and could be removed. Given the huge code base,
I can't fix it manually.

If it ain't broken, don't fix it.

Dan
 

Thomas Matthews

Ramesh said:
Hi,

I am currently maintaining a legacy application with a very, very large code base.
I am facing problems with C/C++ files having a lot of unnecessary #includes.
On average, every C/C++ file has 150+ .h files included. I find that 75%
of them are unnecessary and could be removed. Given the huge code base,
I can't fix it manually.

Are there any tools that would report unwanted .h files?

I am not sure if this is the right group to ask this question.
I would appreciate any pointers.

Thanks
Ramesh

You could always write your own.

You may want to consider this technique:
#ifndef HEADER_SYMBOL
#include "header_file.h"
#endif
which could speed up compilation by not having to open the
header file, encounter the guard, read until EOF to reach
the end of the guarded region, and close the file.

But this is more of a quality-of-implementation issue.
Some compilers may already be smart enough not to reopen
header files whose guards they have seen.

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.comeaucomputing.com/learn/faq/
Other sites:
http://www.josuttis.com -- C++ STL Library book
 

E. Robert Tisdale

Ramesh said:
I am currently maintaining a legacy application with a very, very large code base.
I am facing problems with C/C++ files having a lot of unnecessary #includes.

You are probably mistaken.
Good programmers don't include header files that aren't necessary.
On average, every C/C++ file has 150+ .h files included.
I find that 75% of them are unnecessary and could be removed.

How did you determine that?
Given the huge code base,
I can't fix it manually.

What, exactly, are you trying to fix?
Are there any tools that would report unwanted .h files?

Of course not. How would such a tool know
what should and shouldn't be in any given header file?


Try this:
> cat file.h
#ifndef GUARD_FILE_H
#define GUARD_FILE_H 1
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <math.h>
#include <string.h>
#include <values.h>
#include <time.h>
> cat file1.c
#include "file.h"
> time gcc -Wall -std=c99 -pedantic -c file1.c
0.133u 0.049s 0:00.18 94.4% 0+0k 0+0io 0pf+0w
> cat file128.c
#include "file.h"
#include "file.h"
.
.
.
#include "file.h"
> time gcc -Wall -std=c99 -pedantic -c file128.c
0.144u 0.039s 0:00.19 89.4% 0+0k 0+0io 0pf+0w

which shows that it takes no more time to process
a header file 128 times than it takes to process it once!
Once the C preprocessor has read an idempotent file,
it doesn't read it again no matter how many times it is included.
 

Alan Balmer

You could always write your own.

You may want to consider this technique:
#ifndef HEADER_SYMBOL
#include "header_file.h"
#endif
which could speed up compilation by not having to open the
header file, encounter the guard, read until EOF to reach
the end of the guarded region, and close the file.

But this is more of a quality-of-implementation issue.
Some compilers may already be smart enough not to reopen
header files whose guards they have seen.

That doesn't really answer the OP's question, though. I don't know of
any such tool, though it would be useful at times. I've done the
equivalent manually, by ifdef'ing out header files and checking
whether they still compile.
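That manual process can be scripted. Below is a minimal sketch (Python, hypothetical; it assumes gcc is on the PATH, so you would substitute your own compiler driver) that comments out one #include at a time, recompiles, and keeps the removal only if the file still compiles. Note the caveat raised elsewhere in this thread: a clean compile alone does not guarantee identical behavior.

```python
import re
import shutil
import subprocess

def comment_out_include(text, header):
    """Comment out the #include of `header`, preserving line numbers."""
    pattern = re.compile(r'^(\s*#\s*include\s*[<"]%s[">].*)$' % re.escape(header),
                         re.MULTILINE)
    return pattern.sub(r'/* \1 */', text)

def still_compiles(path, compiler=("gcc", "-fsyntax-only")):
    """True if the compiler accepts the file (gcc is an assumption here)."""
    return subprocess.run([*compiler, path], capture_output=True).returncode == 0

def prune_includes(path, headers):
    """Try removing each listed header from `path`; keep removals that compile."""
    for header in headers:
        shutil.copy(path, path + ".bak")        # restore point for safety
        original = open(path).read()
        open(path, "w").write(comment_out_include(original, header))
        if not still_compiles(path):            # removal broke the build:
            open(path, "w").write(original)     # put the include back
```

Comparing the object files before and after, as suggested later in the thread, is a stronger check than a bare recompile.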
 

clilley

Hi,

I have used a program called 'lint' before to QA C/C++ code. One of the
errors the Sun version reported was unused header files. This may prove
useful to you.

Regards

Clive
 

Andre Kostur

You are probably mistaken.
Good programmers don't include header files that aren't necessary.

You're assuming that the previous programmers in that project were good.
How did you determine that?

Probably by cursory examination. Keep in mind that the OP isn't talking
about every C/C++ file in existence, only the ones in his project.
Nor is the OP intending the statistics to be exact.
The OP is merely indicating that he has noticed an appreciable number of
header files that do not appear to be contributing anything useful to
certain translation units.
What, exactly, are you trying to fix?

Excessive includes causing the compiler to load too many files.
Of course not. How would such a tool know
what should and shouldn't be in any given header file?

By examining what symbols are used in each given translation unit, and
removing those header files that don't mention any of those symbols.
(OK, that's probably grossly simplified, and in no way am I trying to
imply that this would be an _easy_ task...)
Try this:

[snip some example about including the same file many times in one
translation unit]
which shows that it takes no more time to process
a header file 128 times than it takes to process it once!
Once the C preprocessor has read an idempotent file,
it doesn't read it again no matter how many times it is included.

However, the OP's problem is that there exist many source files which
include "extra" header files, not that one single file includes the
same header file over and over again.
 

Keith Thompson

E. Robert Tisdale said:
You are probably mistaken.
Good programmers don't include header files that aren't necessary.

But I wouldn't be at all surprised to find that the previous
programmers on his project cut-and-pasted some large set of #include
directives into each *.c file by rote, or that the set of #include
directives just grew over the years. Not all programmers are good
programmers (and even good programmers aren't good all the time).
Maintenance programmers often have to deal with the consequences of
other people's sloppiness.

If a single C (or C++) source file has over 150 #include directives,
there's probably a serious problem. If an *average* source file has
that many #include directives, this approaches certainty.

For that matter, if only 75% of them are superfluous, that still
leaves an average of nearly 40 #includes per source file, which is
more than I'd be comfortable with.

If the system is that poorly structured, it might be best to leave it
alone and make only careful incremental changes, but if you want to
try to get a handle on it, deleting the superfluous #includes could be
a good start.

[...]
Try this:
[snip]
time gcc -Wall -std=c99 -pedantic -c file128.c
0.144u 0.039s 0:00.19 89.4% 0+0k 0+0io 0pf+0w

which shows that it takes no more time to process
a header file 128 times than it takes to process it once!
Once the C preprocessor has read an idempotent file,
it doesn't read it again no matter how many times it is included.

That may be true for gcc, but the OP didn't say he's using gcc. In
any case, the OP didn't say that compilation time is what he's worried
about. Programmer time is far more valuable than compilation time;
simplifying the code could save significantly on programmer time.
 

Jack Klein

Hi,

I am currently maintaining a legacy application with a very, very large code base.
I am facing problems with C/C++ files having a lot of unnecessary #includes.
On average, every C/C++ file has 150+ .h files included. I find that 75%
of them are unnecessary and could be removed. Given the huge code base,
I can't fix it manually.

Are there any tools that would report unwanted .h files?

I am not sure if this is the right group to ask this question.
I would appreciate any pointers.

Thanks
Ramesh

PC Lint, http://www.gimpel.com.

I have verified this capability with C files, and have no reason to
doubt that it can do this with C++ source as well.

If the project is not being built on a PC, they make a much more
expensive version for *nix systems. It may be feasible to import the
source tree onto a Windows PC just to run the PC version of the
product, which is a bargain at its price.
 

Suzie

E. Robert Tisdale said:
Good programmers don't include header files that aren't necessary.

How many good programmers do you know who include 150+ header files in
their source files?
 

Gregg

That doesn't really answer the OP's question, though. I don't know of
any such tool, though it would be useful at times. I've done the
equivalent manually, by ifdef'ing out header files and checking
whether they still compile.

That can work as long as you look out for headers containing

#define ENABLE_XYZ_FEATURE

and a CPP file containing

#ifdef ENABLE_XYZ_FEATURE
:
#endif

In this case, removing the header might not prevent the software from
compiling, but it might change what is being generated.

Gregg
 

Ramesh Natarajan

Thanks a lot, guys. Actually, I was thinking about the questions raised.
A lot of things could have been done better, but the reality is that,
with no offence to the developers, the current state of the code is
pretty bad and it needs to be fixed.

The problem with all these unwanted #includes is that my development
platform is Tandem, and as I understand it, file open and close are
among the most expensive operations on that platform. Unfortunately,
we don't use a cross compiler and depend on a native compiler that
has to run on the Tandem.

In general, compilation is pretty slow, and with these header problems
it takes forever.

There are two problems that need to be addressed.

(1) Multiple inclusions

For example:

b.h
----
#include <a.h>
....
....

c.h
----
#include <a.h>
#include <b.h>

Now the include of a.h in c.h is not needed.

Although a.h has macro guards and would not be processed twice,
the compiler still has to open and close the a.h file twice.

As suggested, I can put the guard test around the #include, so my c.h
can be changed to

#ifndef A_H
#include <a.h>
#endif
....

So this is feasible to script: just read all the .h files, extract
each macro guard string, and wrap every occurrence of the corresponding
#include in its guard.
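A minimal sketch of that script (Python, hypothetical; it assumes headers follow the conventional #ifndef/#define guard pattern at the top and will miss anything unusual):

```python
import re

def find_guard(header_text):
    """Extract the include-guard macro from a header: the conventional
    '#ifndef X' immediately followed by '#define X'. Returns None if absent."""
    m = re.search(r'#\s*ifndef\s+(\w+)\s*\n\s*#\s*define\s+\1\b', header_text)
    return m.group(1) if m else None

def wrap_include(include_line, guard):
    """Wrap one '#include' line in an external guard test, so the compiler
    can skip the open/read/close entirely when the guard is already defined."""
    return "#ifndef %s\n%s\n#endif" % (guard, include_line.rstrip())
```

A full pass over the tree would first map each header file name to its guard, then rewrite every #include whose target header has a known guard.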


(2) The second problem is unnecessary #includes. Over time, the
developers have accumulated a set of #includes that they add to files
without thinking.

For example, consider:

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
int main(int argc, char **argv) {
    printf("Hello World\n");
    return 0;
}

Here the includes of sys/types.h and sys/socket.h are unnecessary.

That is two file opens at this level that could be avoided.

I want to fix this as well.

But I haven't been able to come up with a solution.

Does anyone have ideas on how to do this?

Two options were suggested: lint on Sun and PC-lint.
I will give them a shot this week.

Thanks once again for the help

Ramesh
 

Michael Mair

Hi there,

I cannot look into Keith's head, but one possible reason is that from
looking at which file includes which header, you get information about
where to look when changing something.

Especially in huge code bases with many man-years of work in them,
too few man-days of documentation work, and me not being an expert,
this (e.g. via find . -type f -exec grep -l obscureheader.h {} \;)
is -- after browsing around with tags -- one step I might
take to get a feeling for how this connects with that, and whether some
people took "shortcuts" somewhere that need to be fixed.
If you basically have people including some fifty header files for
no reason in every translation unit they get their hands on, so
that they do not have to know where particular functions, definitions
and so on come from, then this approach obviously is not feasible.

Just a guess.


Cheers
Michael
 

John Bode

E. Robert Tisdale said:
You are probably mistaken.
Good programmers don't include header files that aren't necessary.

Yes they do, on occasion. It's possible that the original programmers
weren't very good. It's also possible that the headers were necessary
at one time, but over time the code was hacked beyond recognition and
the symbols in those headers are no longer being used in that source
file. Or it could be the result of a cut-and-paste fest stemming from
unclear requirements and a laughably unrealistic schedule.

I've seen this movie more than once. Hell, I'm *in* the movie.
How did you determine that?

By checking to see if the source file actually *uses* any of the
symbols defined in the header file maybe?
What, exactly, are you trying to fix?

He is trying to remove unnecessary #include directives, thereby making
the code easier to read and maintain (and maybe save some cycles
during builds).
Of course not. How would such a tool know
what should and shouldn't be in any given header file?

The tool could scan each header file for symbols (macros, typedefs,
enums, function declarations, external variable declarations) and then
check the source file to see if any of those symbols are present. If
none of the symbols in the header are present in the source file, then
the tool can mark that header as (probably) superfluous. I don't
personally know of any tool that does that particular job, and I won't
claim that it would be easy to write one (handling nested includes
would be "fun"), but it can be done (lex/flex and yacc/bison would
probably be the best way to go about it).
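A crude first cut of that scan might look like this (Python rather than lex/yacc, and purely illustrative: it treats every identifier in the header as a "symbol", so macro bodies, comments, and parameter names produce false positives, and nested includes are ignored entirely):

```python
import re

IDENT = re.compile(r'\b[A-Za-z_]\w*\b')

# Identifiers that carry no declaration information; a real tool would
# parse the declarations instead of filtering a keyword list.
C_KEYWORDS = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if", "int",
    "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile",
    "while", "include", "define", "ifndef", "ifdef", "endif",
}

def header_symbols(header_text):
    """Every identifier in the header, minus keywords (very crude)."""
    return set(IDENT.findall(header_text)) - C_KEYWORDS

def header_looks_unused(header_text, source_text):
    """True if the source mentions none of the header's identifiers."""
    return not (header_symbols(header_text) & set(IDENT.findall(source_text)))
```

On the "Hello World" example quoted earlier in the thread, a scan like this would likely flag sys/socket.h as superfluous, since none of its identifiers appear in the source.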
Try this:

[snip timing example: compiling file128.c, which includes the same
idempotent header 128 times, takes the same time as compiling file1.c,
which includes it once]

which shows that it takes no more time to process
a header file 128 times than it takes to process it once!
Once the C preprocessor has read an idempotent file,
it doesn't read it again no matter how many times it is included.

But that's not really the OP's problem, is it?
 

Keith Thompson


Because.

Take a look at what's been said so far. The source files have an
average of about 150 #include directives, most of which are
unnecessary. It's a "very very large code base". It seems fairly
obvious to me that the whole thing is a mess, and that cleaning it up
would make it easier to maintain. (I'm tempted to suggest the
possibility of throwing it away and starting from scratch, but that's
probably not feasible.)

Maybe it isn't really a problem, or maybe the unnecessary #includes
are such a small part of the problem that eliminating them wouldn't
really help, which is why I qualified my statement with the word
"probably". But since Ramesh specifically said that it's a problem,
and he's asking for ways to fix it, I'm not going to assume that he's
mistaken about the premise for his question.
 

CBFalconer

Keith said:
Because.

Take a look at what's been said so far. The source files have an
average of about 150 #include directives, most of which are
unnecessary. It's a "very very large code base". It seems fairly
obvious to me that the whole thing is a mess, and that cleaning it
up would make it easier to maintain. (I'm tempted to suggest the
possibility of throwing it away and starting from scratch, but
that's probably not feasible.)

Maybe it isn't really a problem, or maybe the unnecessary #includes
are such a small part of the problem that eliminating them wouldn't
really help, which is why I qualified my statement with the word
"probably". But since Ramesh specifically said that it's a
problem, and he's asking for ways to fix it, I'm not going to
assume that he's mistaken about the premise for his question.

Actually, it is a job that can be attacked and checked piecemeal.
Remove the (presumably) useless includes in one file by commenting
them out (thus retaining line numbers) and compile it to an object
file. Do a binary compare against the original object file. If
identical, all is well. If not, investigate the causes (which may
include a compilation datestamp). This assumes that the system
will not generate external linkages on seeing only a prototype.

Another approach revolves around a cross-reference, or the
interactive equivalent supplied by cscope.
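The compile-and-compare loop can be sketched like this (Python, hypothetical; gcc is assumed as the compiler, and as the follow-up in this thread warns, timestamps or debug info embedded in objects can make the byte-compare fail for the wrong reasons):

```python
import filecmp
import subprocess

def compile_object(src, obj, cc="gcc"):
    """Compile one translation unit to an object file (gcc is an assumption)."""
    subprocess.run([cc, "-c", src, "-o", obj], check=True)

def objects_identical(obj_a, obj_b):
    """Byte-for-byte comparison of two object files."""
    return filecmp.cmp(obj_a, obj_b, shallow=False)
```

Compile the untouched file once, comment out the suspect #includes, compile again to a second object, and accept the edit only when objects_identical() holds; anything else needs manual investigation.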
 

Peter van Merkerk

CBFalconer said:
Actually it is a job that can be attacked and checked piecemeal.
Remove the (presumably) useless includes in one file by commenting
out (thus retaining line numbers) and compile it to an object
file. Do a binary compare against the original object file. If
identical, all is well.

Even if the binary compare shows differences, all may still be well.
I learned this the hard way: compiling the same code twice with a
certain compiler yielded different binaries. Apparently the compiler
wrote the build date or something like that into the binary. When debug
information is stored in object files, this test may also fail for the
wrong reasons.
 
