#include optimization


CBFalconer

Peter said:
Even if a binary compare shows differences, all may still be well. I
learned this the hard way: compiling the same code twice with a
certain compiler yielded different binaries. Apparently the compiler
wrote the build date or something like that into the binary. When
debug information is stored in object files, this test may also fail
for the wrong reasons.

Why did you snip the last sentence or two from my paragraph, which
mentioned precisely this problem?
 

Michael Mair

Actually it is a job that can be attacked and checked piecemeal.
Why did you snip the last sentence or two from my paragraph, which
mentioned precisely this problem?

Probably because you gave a solution to the problem without stating
the problem...
To be honest, I did not really get what you were aiming at, either.

For completeness, here is the left-out part:

Cheers,
Michael
 

Dan Pop

In said:
Because.

Take a look at what's been said so far. The source files have an
average of about 150 #include directives, most of which are
unnecessary. It's a "very very large code base". It seems fairly
obvious to me that the whole thing is a mess, and that cleaning it up
would make it easier to maintain. (I'm tempted to suggest the
possibility of throwing it away and starting from scratch, but that's
probably not feasible.)

One could merge all 150 headers into a single header, and then a
single header would be included instead of 150. Including that header
will also provide some unnecessary definitions and declarations, but
this is no different from including <stdlib.h> and getting more
declarations than your application actually needs (when was the last
time you included <stdlib.h> and used everything declared within?).

So, the fact that the source files include 150 headers, not all of
them necessary to each source file, is not a problem in itself. If
the maintainer is annoyed by seeing such a bunch of includes
everywhere, he can trivially write a new header that includes all the
application headers, and include only this header in each source
file.
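
A rough sketch of that kind of umbrella header (all of the header
names below are invented for illustration, not taken from the actual
code base):

/* all_app_headers.h - hypothetical umbrella header (sketch only) */
#ifndef ALL_APP_HEADERS_H
#define ALL_APP_HEADERS_H

#include "db_access.h"    /* invented names standing in for the */
#include "msg_format.h"   /* ~150 real application headers      */
#include "txn_log.h"
/* ... and so on for the rest ... */

#endif /* ALL_APP_HEADERS_H */

Each source file then needs nothing more than a single
#include "all_app_headers.h".
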
Maybe it isn't really a problem, or maybe the unnecessary #includes
are such a small part of the problem that eliminating them wouldn't
really help, which is why I qualified my statement with the word
"probably". But since Ramesh specifically said that it's a problem,
and he's asking for ways to fix it, I'm not going to assume that he's
mistaken about the premise for his question.

And I'm not going to believe him until provided with *concrete* examples.

I'm tempted to believe that the mistake was creating so many application
specific headers in the first place and the *right* fix would be to
drastically reduce their number. Without knowing the specifics, there
is no way of telling whether merging all of them in a single header
(e.g. by using another header that includes all of them) is the right
fix or if there is a real need for more than one application specific
header. But I'm reasonably convinced that there is no need for 150 header
files.

OTOH, I'm reasonably convinced that their existence is not causing
any maintenance problems, either. It's just that the source files
containing so many include directives look mildly annoying.

Dan
 

CBFalconer

Michael Mair wrote: *** And removed attributions ***
Probably because you gave a solution to the problem without stating
the problem...
To be honest, I did not really get what you were aiming at, either.

For completeness, here is the left-out part:

Wrong left-out part, which was as follows, and was an integral
portion of the quoted paragraph:

Your elimination of attributions leaves the (mistaken) impression
that I was complaining about your actions, while the actual
offender was Peter van Merkerk.
 

Michael Mair

[Me messing up a discussion I did not take part in]
Wrong left-out part, which was as follows, and was an integral
portion of the quoted paragraph:

Argh, sorry, I did so not get it :-(
Your elimination of attributions leaves the (mistaken) impression
that I was complaining about your actions, while the actual
offender was Peter van Merkerk.

Er, right. Was a quick one when compiling my crap...
More thinking before sending next time.
Once again, sorry for messing it up!

--Michael
 

Michael Wojcik

Even if a binary compare shows differences, all may still be well. I
learned this the hard way: compiling the same code twice with a
certain compiler yielded different binaries. Apparently the compiler
wrote the build date or something like that into the binary.

All it takes is one reference to __DATE__ or __TIME__.

I have projects where many of the TUs explicitly include build
timestamps using __DATE__ and __TIME__, since that information can be
useful in identifying precisely which build is being used. (Yes, the
projects also have explicit version information, and the modules have
version information that's updated automatically by the SCM system,
but it doesn't hurt to have confirmation.)
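
A minimal sketch of what such a translation unit might contain (the
identifier name is made up, not taken from those projects):

/* build_stamp.c - sketch: embed the build date/time in the binary.
   Recompiling at a different time changes these bytes, which is why
   a binary comparison of two otherwise identical builds reports
   differences. */
const char build_stamp[] = "built " __DATE__ " " __TIME__;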

With that sort of project, binary comparison is basically useless.
Chuck's method certainly can work in some projects, but for something
the size that the OP is describing I suspect you'd spend a great deal
of time investigating false positives.

--
Michael Wojcik (e-mail address removed)

You brung in them two expert birdwatchers ... sayin' it was to keep us from
makin' dern fools of ourselfs ... whereas it's the inherent right of all to
make dern fools of theirselfs ... it ain't a right held by you official types
alone. -- Walt Kelly
 

CBFalconer

Michael said:
All it takes is one reference to __DATE__ or __TIME__.

I have projects where many of the TUs explicitly include build
timestamps using __DATE__ and __TIME__, since that information can be
useful in identifying precisely which build is being used. (Yes, the
projects also have explicit version information, and the modules have
version information that's updated automatically by the SCM system,
but it doesn't hurt to have confirmation.)

With that sort of project, binary comparison is basically useless.
Chuck's method certainly can work in some projects, but for something
the size that the OP is describing I suspect you'd spend a great deal
of time investigating false positives.

No, it isn't useless. The differing areas will generally be of the
same size, if you have taken the precautions I mentioned. A decent
binary difference utility will show the differing bytes and carry on.
Mine shows the offset, the numerical difference, the xor, the hex
values from each file, and, if printable, the char values. Something
whose only difference is a datestamp will normally show 5 to 10 bytes
of difference, and nothing else appears in the output.

c:>fdiff
usage: FDIFF [/q] [/nnn] file1 file2
options /q prevents page pauses
/nnn (hex digits) sets addr display.
binary comparison between file1 and file2

Ex: FDIFF /q/100 fdiff.com fdiff.bin
no page pauses, addr starts as 0100h
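
For anyone without such a tool to hand, here is a minimal sketch of
the same kind of byte-by-byte compare (this is not FDIFF itself; the
output format, and the absence of paging and the /nnn option, are my
own simplifications):

/* bdiff.c - minimal sketch of a byte-by-byte binary compare.
   Prints the offset, the two byte values in hex, their xor, and the
   characters if printable. */
#include <stdio.h>
#include <ctype.h>

int main(int argc, char **argv)
{
    FILE *f1, *f2;
    int c1, c2;
    unsigned long offset = 0;

    if (argc != 3) {
        fprintf(stderr, "usage: bdiff file1 file2\n");
        return 1;
    }
    f1 = fopen(argv[1], "rb");
    f2 = fopen(argv[2], "rb");
    if (f1 == NULL || f2 == NULL) {
        fprintf(stderr, "bdiff: cannot open an input file\n");
        return 1;
    }
    for (;;) {
        c1 = getc(f1);
        c2 = getc(f2);
        if (c1 == EOF && c2 == EOF)
            break;
        if (c1 != c2)
            printf("%08lx: %02x %02x  xor %02x  %c %c\n",
                   offset,
                   c1 == EOF ? 0u : (unsigned)c1,
                   c2 == EOF ? 0u : (unsigned)c2,
                   (unsigned)((c1 == EOF ? 0 : c1) ^ (c2 == EOF ? 0 : c2)),
                   (c1 != EOF && isprint(c1)) ? c1 : '.',
                   (c2 != EOF && isprint(c2)) ? c2 : '.');
        offset++;
    }
    fclose(f1);
    fclose(f2);
    return 0;
}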
 

Dave Thompson

On 29 Sep 2004 05:34:35 -0700, "Ramesh Natarajan" wrote (a Tandem
group is probably more appropriate for most of this and may have
better ideas):

The problem with all these unwanted #includes is that my development
platform is Tandem and, as I understand it, file open and close are
the most expensive operations on the platform. Unfortunately we don't
use a cross compiler and depend on a native compiler that needs to be
run on the Tandem!!

In general the compilation is pretty slow, and with these header
problems it takes forever to compile!!

From your example names I guess you are using the POSIX "personality"
or "subsystem" OSS. It's been a while since I've done so but as I
recall OSS file opens (or more precisely lookups) are unusually
expensive because the Unix-like filesystem must be emulated on the
real (Guardian) filesystem. Plus on Tandem almost all I/O is somewhat
more expensive because the OS is (and must be) message-based.

However, I would still expect opening and reading a few hundred files
(as long as none of them are absurdly large) to take only a few
seconds, maybe 5-10 at worst. If you have many -I directories that
need to be checked, maybe a few times that. If you are seeing worse,
it might be that the system is not well configured for what you're
doing -- there are a _lot_ of tuning "knobs" on Tandem and very few
of them are automatic. You might ask your system manager whether s/he
has measured and tuned for your type of workload. (Unless you are
compiling on a production system; then the response will probably be
that the system is tuned for production and if development suffers,
tough noogies.)

In fact if you have a large number of -I directories and can just
reduce them significantly, it will probably help. Maybe just create
one directory that contains a link or (I believe now) symlink to each
real file; you can do that with a few shell commands.

If a substantial number, perhaps most, of your included files are (or
can be) in one (single-level) directory or a few such, with names of
only up to 7 alphanumeric characters plus the .h, you might try
putting them in a Guardian subvolume and putting /G/somevol/mysubvol
early in your include path -- that _may_ bypass the emulation and go
direct to the disc process, but I'm not sure.

Even more kludgily, the Tandem compilers have a nonstandard option to
#include only named sections of a file. You could combine all or at
least many of your current files into a single file with sections, and
change from a list of #include's to a single #include with a list of
sections; then you only have one open. But this won't help if one
"file" (section) #include's another, so you have to do the "all
includes at top level" style to really benefit. Also, regardless of
the order you specify the section names in the #include, they are
included in the order they appear in the file; you must make sure that
is consistent with any dependencies -- and if there are circular
dependencies that may be impossible. And of course this isn't
portable, although you could easily write a few lines of awk or
similar to convert it back when needed, or #if it.

All of these address only the symptom; if many of the #include's are
in fact unneeded as you said, it's obviously preferable to eliminate
them, for other systems and human readers as well. In addition to the
generic suggestions from others, I have one Tandem idea that _might_
help. The Tandem compilers used to have options to generate
cross-reference listings, although I can't find them in the current
manuals (AFAICT). If that option still exists, and possibly only if
it includes macros (#define's), which I don't recall, you could write
a simple program to go through such a listing, tally used symbols by
file id, and report any file ids not having any such symbol.
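
A sketch of such a program is below. It assumes (purely for
illustration; the real listing format would have to be checked and
the parsing adjusted) that each line of the cross-reference listing
can be reduced to "symbol file-id reference-count":

/* xref_tally.c - sketch: report file ids from which no symbol is
   ever referenced.  The input format is an assumption, not the
   actual Tandem listing format. */
#include <stdio.h>
#include <string.h>

#define MAXFILES 500

static char files[MAXFILES][64];
static int used[MAXFILES];
static int nfiles;

static int find_or_add(const char *id)
{
    int i;
    for (i = 0; i < nfiles; i++)
        if (strcmp(files[i], id) == 0)
            return i;
    if (nfiles < MAXFILES) {
        strcpy(files[nfiles], id);
        used[nfiles] = 0;
        return nfiles++;
    }
    return -1;
}

int main(void)
{
    char sym[64], id[64];
    int refs, i;

    /* one listing entry per line on stdin */
    while (scanf("%63s %63s %d", sym, id, &refs) == 3) {
        i = find_or_add(id);
        if (i >= 0 && refs > 0)
            used[i] = 1;
    }
    for (i = 0; i < nfiles; i++)
        if (!used[i])
            printf("no symbols used from: %s\n", files[i]);
    return 0;
}

Any file id with no referenced symbols is then a candidate for
dropping from the #include lists.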

- David.Thompson1 at worldnet.att.net
 
