A way to decrease executable sizes?

Filipe Martins

Hello.

I've read somewhere that the executable is smaller if we use a source file
for each function!
So, I tested this with gcc and it seems to confirm it! What seems to happen is
that if we call a function from a source file that defines 3 others, the
linker includes the code of all 4 functions, even if the one we call
doesn't rely on the others!

What do you people think about this?
Is there any way to make the linker reject all the code that isn't needed?

If there isn't any other answer to this, I may in the future create a small
app that could be used in release mode to separate all the functions in
their own files and compile all of it.


Please state your opinions and theories about this.

PS: If someone wants it I can make the test project available.
 
Rolf Magnus

Filipe said:
Hello.

I've read somewhere that the executable is smaller if we use a source
file for each function!

This might be the case or it might not, depending on your
compiler/linker.
So, I tested this with gcc and it seems to confirm it! What seems to
happen is that if we call a function from a source file that defines
3 others, the linker includes the code of all 4 functions, even
if the one we call doesn't rely on the others!
Yes.

What do you people think about this?

I think that it doesn't matter much. First, executable size doesn't
really matter much on most platforms. Second, why would you write
functions that are never used?
Is there any way to make the linker reject all the code that isn't
needed?

That depends on the linker and/or compiler.
 
osmium

Rolf said:
This might be the case or it might not, depending on your
compiler/linker.


I think that it doesn't matter much. First, executable size doesn't
really matter much on most platforms. Second, why would you write
functions that are never used?

Is it a given that a program that calls sin() will also call tanh()? ISTM
that is the kind of thing the OP is talking about.
 
Leor Zolman

Hello.

I've read somewhere that the executable is smaller if we use a source file
for each function!
So, I tested this with gcc and it seems to confirm it! What seems to happen is
that if we call a function from a source file that defines 3 others, the
linker includes the code of all 4 functions, even if the one we call
doesn't rely on the others!

What do you people think about this?
Is there any way to make the linker reject all the code that isn't needed?

If there isn't any other answer to this, I may in the future create a small
app that could be used in release mode to separate all the functions in
their own files and compile all of it.


Please state your opinions and theories about this.

PS: If someone wants it I can make the test project available.

First of all, this is borderline off-topic because it really isn't a
language issue. But it is something that comes up and folks who create
projects should be aware of the general approach that the tools take.

My experience is not incredibly up-to-date with respect to the latest
tools and/or project configurations being used, but here's how I see it:

When you create a project that is composed of several primary source files,
the usual scenario is that all the functions in all the source files are
important parts of your program. If you use command line tools and give a
command such as this:

cl app.cpp more.cpp more2.cpp more3.cpp

then the compiler/linker driver (in this case it would happen to be MSVC's)
compiles all the cpp files into obj files, and then links them all into a
single executable. IOW, it doesn't bother checking for dependencies;
evidently, this is the same as the behavior you were seeing with gcc.

To create object files composed of many general-purpose functions, you'd
typically use some sort of library manager utility to create a special kind
of object file: a "library" file. On Win32/etc., these would have the
".LIB" extension, on Unix they'd be .a files, etc. When these library files
are provided on the command line:

cl app.obj more.obj lib1.lib lib2.lib

then the extension clues in the linker that each and every function within
the libraries is /not/ necessarily one we want, and it only selects those
functions that are actually needed. IOW, functions are loaded based on
dependencies. That's why only the functions you actually /use/ out of the
Standard Library get loaded: they come from library files, not plain old
object files.

So to make a long story stay long, if you want the linker to pick and
choose from an object file, make it a library rather than a plain old
object file.

Disclaimer: I'm sure there are all sorts of special cases, different
extensions, etc., that would make some or most of what I've just said wrong
in some context, but I hope the basic idea is apropos.
-leor
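
For anyone who wants to try the library route, here is a minimal sketch, assuming a GNU toolchain. The file names (f1.c, f2.c, libdemo.a) and the functions are made up for illustration. As far as I know, a traditional linker pulls whole members (object files) out of an archive rather than individual functions, which is why each function sits in its own source file here. The listing shows both files in one piece so it can be compiled as-is.

```c
/* Hypothetical layout: one function per source file, so that each
   function becomes its own member of the archive. */

/* ---- f1.c ---- */
#include <stdio.h>

int f1(void)
{
    return puts("test1");
}

/* ---- f2.c ---- (as a separate file it needs its own #include <stdio.h>) */
int f2(void)
{
    return puts("test2");
}

/* Assumed commands, not taken from the thread:
 *   gcc -c f1.c f2.c              compile each file to its own .o
 *   ar rcs libdemo.a f1.o f2.o    bundle the objects into a static library
 *   gcc main.c -L. -ldemo -o app  only members that resolve a symbol are linked
 */
```

If main.c calls only f1(), the expectation is that f2.o is left out of app, but that is worth checking with your own linker rather than taking on faith.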
 
E. Robert Tisdale

Filipe said:
I've read somewhere that the executable is smaller
if we use a source file for each function!
So, I tested this with gcc and it seems to confirm it!
What seems to happen is that,
if we call a function from a source file that defines 3 others,
the [link editor] includes the code of all the 4 functions,
even if the one we call doesn't rely on the others!

What do you people think about this?
Is there any way to make the linker reject all the code
that isn't needed?

If there isn't any other answer to this, I may, in the future,
create a small app that could be used in release mode
to separate all the functions in their own files

You mean like csplit?
and compile all of it.

Please state your opinions and theories about this.

PS: If someone wants it I can make the test project available.

Make sure that each file includes all of the headers that it needs.

Once you have split the files up, you may notice that
it takes longer, maybe a lot longer, to compile each time
that you make changes to one of the header files.
This is because each file will include the header file
in each translation unit and must re-parse it every time.
You can avoid this during program development
by simply creating another source file
which includes each of the separated source files.
If the headers are idempotent,
only the first one will be included and parsed by the compiler.
 
Kevin Goodsell

Filipe said:
Hello.

I've read somewhere that the executable is smaller if we use a source file
for each function!

<snip>

<off-topic>
If you want to reduce executable size, you might be better off looking
into some of the tools that are available for that purpose. There's a
GNU tool called 'strip' that can remove information that's not required
for execution. I'm not sure if this is based on a standard UNIX tool or not.

UPX is a compressor for executables. I don't know much about it, but it
seems to be able to compress several different executable formats.
Execution speed is apparently impacted somewhat. The output is still an
executable file, and can be run just like the original I believe.
</off-topic>

-Kevin
 
Rolf Magnus

osmium said:
Is it a given that a program that calls sin() will also call tanh()?
ISTM that is the kind of thing the OP is talking about.

To my knowledge, gcc (and that's what he was asking specifically about)
has functions like sin and tanh now built into the compiler itself,
since those are on many CPUs directly available on assembler level.
Other non-builtin library functions are usually linked as shared
library, where a concept of leaving out single functions doesn't make
much sense.
 
Karl Heinz Buchegger

To my knowledge, gcc (and that's what he was asking specifically about)
has functions like sin and tanh now built into the compiler itself,
since those are on many CPUs directly available on assembler level.
Other non-builtin library functions are usually linked as shared
library, where a concept of leaving out single functions doesn't make
much sense.

But the principle is still the same. Every one of us programmers has some
sort of common code which is just linked into the executable and does not
reside in some library. I just use those source files and don't care much
about functions in that source which are not needed in this specific project.
 
Gary Labowitz

To create object files composed of many general-purpose functions, you'd
typically use some sort of library manager utility to create a special kind
of object file: a "library" file. On Win32/etc., these would have the
".LIB" extension, on Unix they'd be .a files, etc. When these library files
are provided on the command line:

cl app.obj more.obj lib1.lib lib2.lib

then the extension clues in the linker that each and every function within
the libraries is /not/ necessarily one we want, and it only selects those
functions that are actually needed. IOW, functions are loaded based on
dependencies. That's why only the functions you actually /use/ out of the
Standard Library get loaded: they come from library files, not plain old
object files.

So to make a long story stay long, if you want the linker to pick and
choose from an object file, make it a library rather than a plain old
object file.

I'd like to think that linkers did what you said, Leor, but I am still
under the impression that when a linker searches a library (such as
lib1.lib) it looks at the header and finds the name of a function it
is trying to resolve and then includes the entire library. Can it, in
fact, only pull out the function and place it in the exe? I don't
remember us doing that in the OS/360 linker, and the fact that
compiling the functions separately results in smaller exe would seem
to confirm that the entire module is included otherwise.
Of course, this could (and probably is) different with each linker,
making the whole conversation a troll.
Can one of the current developers here confirm that some linker
somewhere does in fact selectively pull function code from libraries
when linking?
I'll try some tests with g++ and report back.
 
Leor Zolman

I'd like to think that linkers did what you said, Leor, but I am still
under the impression that when a linker searches a library (such as
lib1.lib) it looks at the header and finds the name of a function it
is trying to resolve and then includes the entire library. Can it, in
fact, only pull out the function and place it in the exe? I don't
remember us doing that in the OS/360 linker, and the fact that
compiling the functions separately results in smaller exe would seem
to confirm that the entire module is included otherwise.
Of course, this could (and probably is) different with each linker,
making the whole conversation a troll.
Can one of the current developers here confirm that some linker
somewhere does in fact selectively pull function code from libraries
when linking?
I'll try some tests with g++ and report back.

Yeah, I wasn't sure exactly how to go about testing this. I'm pretty sure
that at some point in the past I've verified this behavior, but I haven't
really delved into it in ages.

I figured I'd have a better chance at getting accurate size measurements
under Unix, but unfortunately I have no C/C++ development tools in my
Cygwin installation.

So I resorted to good ole' MSVC to see what I could come up with. The
executable size doesn't mean squat with MSVC, because it always seems to
round up for some reason, but I figured there ought to be an option to
print out a symbol table I could search for the usual suspects. I created
the following program, which I compiled once with the log10 call in there
and once without:

//
// does using one fn out of a lib drag in all the others?
//

#include <iostream>
#include <cmath>
using namespace std;

int main()
{
    double d = sqrt(5.0);
    cout << "chi = (" << d << " + 1 ) / 2" << endl;

    // Compared with and without the following line:
    // d = log10(d);

    cout << "log10(d) = " << d << endl;

    return 0;
}

In the resulting map file for compiling as shown, there was exactly one
occurrence of the pattern "log10" in the generated map file (MSVC 7.1, /Fm
option), shown with surrounding context (3 lines total, I've wrapped them
manually and put a space between):

0002:00002278 ??_C@_04COOMCNPB@sinh?$AA@
00414278 LIBC:fpexcept.obj

0002:00002280 ??_C@_05HGHHAHAP@log10?$AA@
00414280 LIBC:fpexcept.obj

0002:00002288 ??_C@_03MGHMBJCF@log?$AA@
00414288 LIBC:fpexcept.obj


This looks to be some sort of master symbol dispatch table, but I'd guess
it does not represent address of actual code. But I don't know for sure.

In the /other/ map file (with the log10 call uncommented), I still get the
same entries as above (different addresses), PLUS:

0001:00005610 _log10 00406610 f LIBC:log10.obj
0001:00005650 __CIlog10 00406650 f LIBC:log10.obj
0001:0000568b __CIlog10_default 0040668b f LIBC:log10.obj
0001:0000569f __log10_default 0040669f LIBC:log10.obj

0001:00008a50 __CIlog10_pentium4 00409a50 f
LIBC:log10_pentium4.obj
0001:00008a68 __log10_pentium4 00409a68
LIBC:log10_pentium4.obj

0001:0000564f _$$$00002 0040664f f LIBC:log10.obj


So my guess would be that all those additional lines represent stuff that
got dragged into the executable for version #2 that was not in version #1.

Or I may be totally in dreamland. I don't know.
-leor
 
Peter van Merkerk

Gary said:
I'd like to think that linkers did what you said, Leor, but I am still
under the impression that when a linker searches a library (such as
lib1.lib) it looks at the header and finds the name of a function it
is trying to resolve and then includes the entire library. Can it, in
fact, only pull out the function and place it in the exe?

With MSVC you can enable the "function level linking" (/Gy) option so only
referenced functions are put in the executable. The last time I checked
with G++ it always included the whole .obj. In that case it would help to
put every function in a separate compilation unit. IOW the answer to the
original question: it depends on the tools you use.
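
For reference, a minimal sketch of how function-level linking can be requested, assuming either GCC with GNU ld or MSVC. The file name funcs.c and both function names are hypothetical. On the GCC side the idea is to place every function in its own section (-ffunction-sections) and then let the linker discard unreferenced sections (--gc-sections); on the MSVC side /Gy is paired with the linker option /OPT:REF. Results vary by toolchain and version, so treat this as something to experiment with.

```c
#include <stdio.h>

/* Referenced from main() in another file (main.c in this sketch). */
int used_function(void)
{
    return puts("used");
}

/* Never referenced anywhere; a candidate for removal at link time. */
int unused_function(void)
{
    return puts("unused");
}

/* Assumed build commands (file names are hypothetical):
 *
 * GCC with GNU ld:
 *   gcc -Os -ffunction-sections -fdata-sections -c funcs.c main.c
 *   gcc -Wl,--gc-sections funcs.o main.o -o app
 *
 * MSVC:
 *   cl /O1 /Gy funcs.c main.c /link /OPT:REF
 */
```

Comparing the linker's map file with and without these options is probably the most reliable way to see whether unused_function was actually dropped.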
 
Leor Zolman

With MSVC you can enable the "function level linking" (/Gy) option so only
referenced functions are put in the executable. The last time I checked
with G++ it always included the whole .obj. In that case it would help to
put every function in a separate compilation unit. IOW the answer to the
original question: it depends on the tools you use.

Absolutely true, "tools and options" even, and I neglected to restate that
in my original response. What I was trying to say (and I still tend to
stray off the major point of a thread sometimes while getting distracted by
specific details...) was that there may be platform-specific solutions to
the problem of unwanted code being dragged into an executable that do not
necessarily involve separating all the independent pieces into separate
TU's. Based on my experience, creating a "library" would be one possible
path to explore. In the case of MSVC, Peter's /Gy would definitely be
another...
-leor
 
Thomas Matthews

Filipe said:
Hello.

I've read somewhere that the executable is smaller if we use a source file
for each function!
So, I tested this with gcc and it seems to confirm it! What seems to happen is
that if we call a function from a source file that defines 3 others, the
linker includes the code of all 4 functions, even if the one we call
doesn't rely on the others!

What do you people think about this?
Is there any way to make the linker reject all the code that isn't needed?

Good luck. I've been in the embedded systems arena for over 20 years
and have been wanting a linker that will remove unused functions.
It seems easier (and cheaper) to make a linker that removes at the
file level rather than the function level.

If there isn't any other answer to this, I may in the future create a small
app that could be used in release mode to separate all the functions in
their own files and compile all of it.

As others have stated, place your modules into a library and use the
library.

Please state your opinions and theories about this.

PS: If someone wants it I can make the test project available.

In today's programming world, executable size is seldom a high concern.
In many embedded systems with small memory footprints,
executable size is a concern, but not the highest one.

On my current project, the following are the priorities, due to the
fact that the executable will be masked into a Read Only Memory
device and changes after the mask will be very expensive (and
time consuming):
1. Correctness: Is it behaving as designed or required?
2. Robustness: Are all conditions considered?
   Will it run into a deadlock or unknown state?
3. Code Size: Will it fit into the allocated space?
4. Speed: Will the events be serviced in the required
   time?
5. Schedule: Will the executable be delivered on time?

{Although my associates would like to raise the priority of
the schedule.}

Some tricks to shrink your code:
1. Remove dead code, functions, requirements and other stuff.
2. Consolidate code and data. If it is used more than once,
   share it.
3. Refrain from using system library functions. These
   functions often have hidden dependencies and call out to
   more functions. For example, write a "hello world"
   program using puts, printf and fwrite. Note the
   size differences in the executable. The printf function
   may include a floating point library even though your
   program does not use floating point.
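
The "hello world" comparison in trick 3 can be sketched roughly as follows. The macro names USE_PRINTF and USE_FWRITE and the function say_hello are made up for this example; the idea is to build the same file three times and compare the resulting executables. Be aware that an optimizing compiler may rewrite a simple printf call into a cheaper one, so the differences can be smaller than expected.

```c
#include <stdio.h>

static const char msg[] = "hello world\n";

/* Prints the greeting using the output function selected at compile time.
   Returns 0 on success and -1 on failure. */
int say_hello(void)
{
#if defined(USE_PRINTF)
    /* Formatted output: may drag in formatting (and floating point) support. */
    return printf("%s", msg) < 0 ? -1 : 0;
#elif defined(USE_FWRITE)
    /* Raw block write: no format parsing at all. */
    return fwrite(msg, 1, sizeof msg - 1, stdout) == sizeof msg - 1 ? 0 : -1;
#else
    /* Default: puts appends the newline itself. */
    return puts("hello world") < 0 ? -1 : 0;
#endif
}
```

Call say_hello() from main, build once with no define, once with -DUSE_PRINTF and once with -DUSE_FWRITE, preferably linking statically, and compare the three file sizes.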

By the way, before you optimize for space, record how much
time you spend optimizing. Then subtract this time from
your schedule to find out how early you could have finished
your project.


--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book
 
osmium

Gary said:
I'd like to think that linkers did what you said, Leor, but I am still
under the impression that when a linker searches a library (such as
lib1.lib) it looks at the header and finds the name of a function it
is trying to resolve and then includes the entire library. Can it, in
fact, only pull out the function and place it in the exe? I don't
remember us doing that in the OS/360 linker, and the fact that
compiling the functions separately results in smaller exe would seem
to confirm that the entire module is included otherwise.
Of course, this could (and probably is) different with each linker,
making the whole conversation a troll.
Can one of the current developers here confirm that some linker
somewhere does in fact selectively pull function code from libraries
when linking?
I'll try some tests with g++ and report back.

You probably remember this conversation from last December. After a lot of
to and fro it turned out that Microsoft, with their advanced linker was able
to cram this code:

fabs(1.23);

into only 24,576 bytes!

And there was a long harangue about how complicated that code really was and
so on .....

ISTM that the exponent of smart linkers was trusting Microsoft's glib
promises rather than their end results.

http://www.google.com/groups?hl=en&lr=lang_en&ie=UTF-8&selm=br0ag9$270udj%241%40ID-179017.news.uni-berlin.de
 
Peter van Merkerk

Yeah, I wasn't sure exactly how to go about testing this. I'm pretty
sure that at some point in the past I've verified this behavior, but
I haven't really delved into it in ages.

The way I tested is as follows:

---------- test.c----------
#include <stdio.h>

void f1()
{
    puts("test1");
}

void f2()
{
    puts("test2");
}

---------- main.c ----------
void f1();
void f2();

int main(int argc, char* argv[])
{
    f1();
    return 0;
}
------------------------

Since f2() isn't referenced it doesn't have to be dragged into the
executable. You can use a hex editor to see if the "test2" string appears
in the executable. If the linker puts everything in the .obj into the
executable, the string "test2" will be in the executable as well. If "test2"
is not in the executable, only what is referenced (which is not necessarily
the same as what is used!) is put into the executable. To perform this test
it is best to compile with optimization enabled and debug symbols disabled.
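
If you would rather not open a hex editor, the same check can be done with a few lines of C. This is only a sketch: the helper name file_contains is made up, and it reads the whole file into memory, which is fine for small test executables but not for large ones.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the bytes of `needle` occur somewhere in the file at `path`,
   0 if they do not, and -1 if the file cannot be read. */
int file_contains(const char *path, const char *needle)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* Determine the file size so the whole file can be read at once. */
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return -1; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return -1; }
    rewind(fp);

    unsigned char *buf = (unsigned char *)malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) { fclose(fp); return -1; }
    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);

    /* Plain byte-by-byte search; an executable may contain NUL bytes,
       so strstr() cannot be used on the buffer. */
    size_t n = strlen(needle);
    int found = 0;
    for (size_t i = 0; n > 0 && i + n <= got; ++i) {
        if (memcmp(buf + i, needle, n) == 0) { found = 1; break; }
    }
    free(buf);
    return found;
}
```

Calling file_contains on the linked executable with "test2" as the needle then tells you whether the string from f2() made it in.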

<OT>
With MSVC it is possible to reduce size of the executable by replacing the
standard runtime library. With a replacement runtime library executables
as small as 2.5 Kbyte can be created (with the standard runtime library the
size of the executable starts at 28 Kbyte).
</OT>
 
Gary Labowitz

Peter van Merkerk said:
With MSVC you can enable the "function level linking" (/Gy) option so only
referenced functions are put in the executable. The last time I checked
with G++ it always included the whole .obj. In that case it would help to
put every function in a separate compilation unit. IOW the answer to the
original question: it depends on the tools you use.
I never heard of the /Gy in MSVC, but then I don't use MSVC very often
(too many bugs!).
I did try g++ with two functions, each with a large footprint and
linked them separately and then together and got the following
results:

Calling module using    functions separate    functions together
both funcs              872kb                 872kb
one func                482kb                 872kb

It would appear that when the two functions are in one .o they are
both linked in regardless of whether one or both functions are called
by the base module. With them separate (in two .o's) it allows for
only including what you need.
It's a lot of bother, really, and I suspect that it doesn't matter
nowadays.
 
Michiel Salters

E. Robert Tisdale said:
Make sure that each file includes all of the headers that it needs.

Once you have split the files up, you may notice that
it takes longer, maybe a lot longer, to compile each time
that you make changes to one of the header files.
This is because each file will include the header file
in each translation unit and must re-parse it every time.

Unless your development environment supports pre-compiled headers.
You can avoid this during program development
by simply creating another source file
which includes each of the separated source files.
If the headers are idempotent,
only the first one will be included and parsed by the compiler.

Except that such an approach breaks anonymous namespaces,
statics, and may lead to subtle bugs like missing headers
because they were "inherited" from another .cpp

Regards,
Michiel Salters
 
Peter van Merkerk

You probably remember this conversation from last December. After a lot of
to and fro it turned out that Microsoft, with their advanced linker was able
to cram this code:

fabs(1.23);

into only 24,576 bytes!

2048 bytes, to be exact.
And there was a long harangue about how complicated that code really was and
so on .....

For small programs like this it is often the run-time library being
linked in that causes the bloat. If you want to, you can make executables as
small as 2 KBytes (which mostly consist of obligatory headers for the OS and
padding due to section alignment requirements), even with Microsoft tools.
Eventually it boils down to the tools being used and the skills of the one
using the tools. In all fairness, often much more is linked into an executable
than you would expect at first. It requires quite a bit of digging around to
figure out why (which can lead to quite surprising insights).

The question remains whether it is worth the effort. In most cases not, but
sometimes you have no choice if it has to fit in a memory constrained
device. Either way, this is way beyond what is topical on comp.lang.c++.
 
