sammy said:
Word up!
If there are any gcc users here, maybe you could help me out. I have a
program and I've tried compiling it with -O2 and -O3 optimization
settings.
The weird thing is that it actually runs faster with -O2 than with -O3,
even though -O3 is a higher optimization setting.
Have I found a bug in gcc? Could I be doing something wrong?
Cheers.
In my experience with gcc, all optimization levels beyond -O2 are
a waste of time.
The problem with gcc is that every person interested in compiler
algorithms has hacked gcc to add his or her contribution, making
the whole thing quite messy.
Within lcc-win, I have targeted only ONE optimization strategy:
Code size.
There is nothing that runs faster than a deleted instruction. Lcc-win
features a very simple peephole optimizer with a single
goal: to delete redundant loads/stores and, in general, to
reduce the code size as much as possible.
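To give an idea of what "delete redundant loads/stores" means, here is a minimal sketch of one such peephole rule. The Insn type, the opcodes and the peephole() function are made up for illustration and are not lcc-win's actual data structures; the only rule shown drops a load that directly follows a store of the same register to the same address.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy instruction set, NOT lcc-win's real representation. */
typedef enum { OP_LOAD, OP_STORE, OP_ADD } Opcode;

typedef struct {
    Opcode op;
    int reg;           /* register number */
    const char *addr;  /* memory operand for LOAD/STORE, NULL otherwise */
} Insn;

/* Drops every LOAD that directly follows a STORE of the same register
   to the same address: the value is already in the register.
   Compacts the array in place and returns the new instruction count. */
size_t peephole(Insn *code, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (out > 0 &&
            code[i].op == OP_LOAD &&
            code[out - 1].op == OP_STORE &&
            code[i].reg == code[out - 1].reg &&
            strcmp(code[i].addr, code[out - 1].addr) == 0) {
            continue;  /* redundant load: delete it */
        }
        code[out++] = code[i];
    }
    return out;
}
```

A real peephole pass would have many more patterns and would have to respect things like volatile accesses; this only shows the shape of the idea.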
No other optimizations are done (besides the obvious ones done at
compile time, like constant folding, division by constants, etc.).
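As an illustration of the "division by constants" trick, here is a sketch of how an unsigned 32-bit division by 10 can be replaced by a multiplication and a shift. The function names are mine, and this is the well-known general technique, not necessarily the exact code sequence lcc-win emits.

```c
#include <stdint.h>

/* What the programmer writes. */
uint32_t div10_plain(uint32_t x)
{
    return x / 10u;
}

/* What a compiler can generate instead: multiply by ceil(2^35 / 10)
   and shift right by 35. The 64-bit intermediate keeps the high bits
   of the product. Gives the same result for every 32-bit x. */
uint32_t div10_mul(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

The point is that a multiply plus a shift is usually much cheaper than a hardware divide, and the compiler can work out the magic constant at compile time.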
Gcc tries it all. I think there is no optimization that exists
somewhere in compiler books that hasn't been tried in gcc.
Code motion, stack alignment, global CSE,
aggressive inlining, and a VERY long ETC!
The result is not really impressive. The compiler is very slow
and the generated program is not very fast.
A matrix multiplication program, for instance (time in seconds):
lcc-win  -O    1.851
gcc      -O2   1.690
gcc      -O3   1.802
gcc      -O9   1.766
MSVC     -Ox   1.427
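For anyone who wants to try this kind of comparison, here is a rough sketch of a matrix multiplication timing routine. It is not the program behind the numbers above: the matrix size N, the element type and all the function names are assumptions of mine.

```c
#include <time.h>

#define N 256   /* assumed matrix size; the post does not give one */

double a[N][N], b[N][N], c[N][N];

/* Fill a with simple values and make b the identity matrix,
   so that the product c can be checked against a. */
void fill(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = (double)(i + j);
            b[i][j] = (i == j) ? 1.0 : 0.0;
        }
}

/* Naive triple-loop multiplication: c = a * b. */
void matmul(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

/* Returns the CPU time in seconds used by one call to matmul(). */
double time_matmul(void)
{
    clock_t start = clock();
    matmul();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Add a main() that calls fill() and prints time_matmul(), then build the same file once with gcc -O2 and once with gcc -O3 and compare. Run it several times; a single measurement of a short run is noisy.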
With -O3 gcc is almost as slow as lcc-win (which is obviously an excellent
result). And the delta between gcc and lcc-win in the best case
for gcc (-O2) is only about 9% (1.690 s vs. 1.851 s).
If you look at the compilation speed of lcc-win vs gcc (a factor
of 5 or more) and the size of the source code (11MB of C for gcc,
1MB of C for lcc-win) things look clearer.
What is worse for optimizing compilers is that CPUs are now
so complex that optimizations that used to be fine, like inlining
(which makes the code bigger), have lost much of their justification
now that a processor can wait up to 50 cycles doing nothing while
the RAM delivers the information.
In this context, optimizing for SIZE is a winning strategy, and it
allows lcc-win to have almost the same speed as gcc with a FRACTION
of the effort.
Just my $0.02