Compiler optimizations

sammy · Jan 15, 2008

Word up!

If there are any gcc users here, maybe you could help me out. I have a
program and I've tried compiling it with -O2 and -O3 optimization
settings.

The weird thing is that it actually runs faster with -O2 than with -O3,
even though -O3 is a higher optimization setting.

Have I found a bug in gcc? Could I be doing something wrong?

Cheers.

jameskuyper · Jan 16, 2008

sammy wrote:
....

If there are any gcc users here, maybe you could help me out. I have a
program and I've tried compiling it with -O2 and -O3 optimization
settings.

The weird thing is that it actually runs faster with -O2 than with -O3,
even though -O3 is a higher optimization setting.

Have I found a bug in gcc? Could I be doing something wrong?

The answer to both questions is "possibly". Without knowing the
details of your code AND the details of gcc's optimization strategies,
we can't be sure. Something that is in general an optimization could
easily, for a specific program using specific inputs, be a
pessimization instead.

For gcc issues, I'd recommend using a forum specialized for gcc; this
isn't it.

user923005 · Jan 16, 2008

Word up!

If there are any gcc users here, maybe you could help me out. I have a
program and I've tried compiling it with -O2 and -O3 optimization
settings.

The weird thing is that it actually runs faster with -O2 than with -O3,
even though -O3 is a higher optimization setting.

Have I found a bug in gcc? Could I be doing something wrong?

Higher optimization levels just mean that compilers are requested to
use more esoteric types of optimization tricks. There is no guarantee
that {for instance} -O1 is faster than -O0 for that matter.
Take the example of inlining... It may make the code run faster due to
reduced function calls or it may make the code so large that stuff
that used to fit in the cache no longer does.
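
To make the inlining trade-off concrete, here is a minimal sketch, assuming GCC or a compatible compiler (the noinline and always_inline attributes are GCC extensions, and all function names are made up for illustration). Both variants compute the same result; the difference is whether a call is kept or the body is copied into the call site.

```c
#include <stddef.h>

/* Hypothetical helper that GCC is told never to inline: every use costs
   a call, but the body exists only once in the binary. */
__attribute__((noinline))
static long square_called(long x)
{
    return x * x;
}

/* Same helper, forced inline: no call overhead, but the body is copied
   into every call site, which grows the code. */
__attribute__((always_inline))
static inline long square_inlined(long x)
{
    return x * x;
}

long sum_squares_called(const long *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += square_called(v[i]);
    return total;
}

long sum_squares_inlined(const long *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += square_inlined(v[i]);
    return total;
}
```

With a helper this small, inlining almost certainly wins. The cache problem only shows up when large bodies are duplicated at many call sites, so which variant is faster has to be measured on the real program.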

I suggest you direct specific performance problems with the GCC
compiler to the GCC compiler newsgroups.

P.S.
You can get good results with recent versions of GCC by using profile
guided optimization.

CBFalconer · Jan 16, 2008

sammy wrote:
...

The answer to both questions is "possibly". Without knowing the
details of your code AND the details of gcc's optimization strategies,
we can't be sure. Something that is in general an optimization could
easily, for a specific program using specific inputs, be a
pessimization instead.

For gcc issues, I'd recommend using a forum specialized for gcc; this
isn't it.

And he may be getting confused by the quantum effects on the system
timer.
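
If timer granularity is the worry, one common hedge is to repeat the work often enough that the total run time is far above the timer's resolution, then divide. A minimal sketch using the standard clock() function follows; the workload and the function names are hypothetical stand-ins for whatever is being measured.

```c
#include <time.h>

/* Hypothetical workload standing in for the code being measured. */
static unsigned long workload(unsigned long n)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < n; i++)
        acc += i % 7;
    return acc;
}

/* Run the workload `reps` times and return the average CPU seconds per run.
   Repeating keeps the total well above the granularity of clock(). */
double seconds_per_run(unsigned long n, int reps, unsigned long *sink)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        /* Vary the argument and keep the result, so the compiler cannot
           hoist the call out of the loop or drop it entirely. */
        *sink += workload(n + (unsigned long)r);
    }
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC / reps;
}
```

Comparing -O2 against -O3 this way only means something if the difference between the two builds is clearly larger than the run-to-run variation.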

Kelsey Bjarnason · Jan 16, 2008

Word up!

If there are any gcc users here, maybe you could help me out. I have a
program and I've tried compiling it with -O2 and -O3 optimization
settings.

The weird thing is that it actually runs faster with -O2 than with -O3,
even though -O3 is a higher optimization setting.

Have I found a bug in gcc? Could I be doing something wrong?

One point to ponder, which isn't specific to gcc but to optimisation in
general...

Many optimisations consume more space than the less optimised code. Loop
unrolling, for example, can do this. While this _can_ result in faster
code, it can _also_ potentially result in side effects such as exhausting
the cache memory. The net result can be a significant slowdown.
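
For illustration, here is roughly what unrolling does, written out by hand. This is only a sketch with made-up function names; a compiler would perform the transformation on its own intermediate code, not on the source.

```c
#include <stddef.h>

/* Plain loop: one add, one compare and one branch per element. */
long sum_rolled(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Unrolled by four: fewer compares and branches per element,
   but noticeably more instructions in the binary. */
long sum_unrolled4(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        total += v[i];
    return total;
}
```

One such loop is harmless. Apply the same expansion to every loop in a large program and the extra code can start pushing hot paths out of the instruction cache.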

This sort of thing isn't really a bug; the optimiser has no way to know
what machines the code will run on.

Joachim Schmitz · Jan 16, 2008

Kelsey said:
One point to ponder, which isn't specific to gcc but to optimisation
in general...

Many optimisations consume more space than the less optimised code.
Loop unrolling, for example, can do this. While this _can_ result in
faster code, it can _also_ potentially result in side effects such as
exhausting the cache memory. The net result can be a significant
slowdown.

This sort of thing isn't really a bug; the optimiser has no way to
know what machines the code will run on.

If not the compiler/optimizer, who else?

Bye, Jojo

Rui Maciel · Jan 16, 2008

Kelsey said:
This sort of thing isn't really a bug; the optimiser has no way to know
what machines the code will run on.

If the compiler writers think that this is relevant and they need to figure
out a way to know any property of the target machine, couldn't they simply
add an option that enables the user to specify those target properties?

Rui Maciel

Malcolm McLean · Jan 16, 2008

Kelsey Bjarnason said:
This sort of thing isn't really a bug; the optimiser has no way to know
what machines the code will run on.

Which input the program gets is probably more significant.
The length of an unbounded string is almost certainly a few tens of bytes or
less, for instance, so it makes sense to run a byte-by-byte strlen().
However if the string happens to be a DNA sequence then it may be hundreds
of kilobytes long, and so aligning to a word boundary and doing 32 or 64 bit
fetches will speed up code considerably. It is extremely difficult to tell a
compiler the difference between a sequence and a username, they are both
just strings of arbitrary length, to it.
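
A rough sketch of the two strategies, with made-up function names. The word-wise version ASSUMES that the buffer is readable up to the next 8-byte boundary after the terminator. Real library implementations rely on platform guarantees for that, and strictly portable C does not promise it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-by-byte: no setup cost, fine for short strings. */
size_t strlen_bytewise(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Word-at-a-time: walk bytes up to an 8-byte boundary, then test
   8 bytes per step.
   ASSUMPTION: memory up to the next 8-byte boundary after the
   terminator is readable (see the note above). */
size_t strlen_wordwise(const char *s)
{
    const char *p = s;

    while (((uintptr_t)p % sizeof(uint64_t)) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);    /* aligned 8-byte fetch */
        /* Classic bit trick: the expression is nonzero exactly when
           at least one byte of w is zero. */
        if ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL)
            break;
        p += sizeof w;
    }
    while (*p != '\0')              /* locate the terminator inside the word */
        p++;
    return (size_t)(p - s);
}
```

For a username the alignment prologue costs more than it saves; for a sequence of hundreds of kilobytes the 8-byte steps dominate. The compiler sees the same char pointer in both cases.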

jacob navia · Jan 16, 2008

sammy said:
Word up!

If there are any gcc users here, maybe you could help me out. I have a
program and I've tried compiling it with -O2 and -O3 optimization
settings.

The weird thing is that it actually runs faster with -O2 than with -O3,
even though -O3 is a higher optimization setting.

Have I found a bug in gcc? Could I be doing something wrong?

Cheers.

In my experience with gcc, all optimization levels beyond -O2 are
a waste of time.

The problem with gcc is that every person interested in compiler
algorithms has hacked gcc to add his/her contribution, making
the whole thing quite messy.

Within lcc-win, I have targeted only ONE optimization strategy:

Code size.

There is nothing that runs faster than a deleted instruction. Lcc-win
features a very simple peephole optimizer, that is after a single
goal: delete redundant loads/stores, and in general to try to
reduce the code size as much as possible.

No other optimizations are done (besides the obvious ones done at
compile time like constant folding, division by constants, etc)
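
As an illustration of what "division by constants" usually means, here is the general technique in C. It is a sketch of the idea, not a claim about the code lcc-win or gcc actually emits, and the function names are made up.

```c
#include <stdint.h>

/* What the programmer writes. */
uint32_t div10_plain(uint32_t x)
{
    return x / 10;
}

/* What a compiler may emit instead of a divide instruction: multiply by
   a precomputed reciprocal, then shift.
   0xCCCCCCCD is ceil(2^35 / 10); the result is exact for every 32-bit x. */
uint32_t div10_mulshift(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

A multiply and a shift are typically much cheaper than a hardware divide, and the transformation does not grow the code, which is why it fits a size-oriented strategy.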

Gcc tries it all. I think there is no optimization that exists
somewhere in compiler books that hasn't been tried in gcc.
Code movement/aligning of the stack/global CSE/
aggressive inlining/ and a VERY long ETC!

The result is not really impressive. The compiler is very slow
and the program is not very fast:

A matrix multiplication program for instance: (time in seconds)

lcc-win -O   1.851
gcc -O2      1.690
gcc -O3      1.802
gcc -O9      1.766
MSVC -Ox     1.427

With -O3 gcc is as slow as lcc-win (which is obviously an excellent
result). And the delta between gcc and lcc-win in the best case
for gcc is just... 3.1%

If you look at the compilation speed of lcc-win vs gcc (a factor
of 5 or more) and the size of the source code (11MB of C for gcc,
1MB of C for lcc-win) things look clearer.

What is worse for optimizing compilers is that CPUs are now
so complex that optimizations that used to be fine, like inlining,
have lost all their justification now that a processor
can sit idle for up to 50 cycles waiting for RAM
to deliver the data.

In this context optimizing for SIZE is a winning strategy. And
allows lcc-win to have almost the same speed as gcc with a FRACTION
of the effort.

Just my $0.02

Stephen Sprunk · Jan 16, 2008

sammy said:
If there are any gcc users here, maybe you could help me out. I have a
program and I've tried compiling it with -O2 and -O3 optimization
settings.

The weird thing is that it actually runs faster with -O2 than with -O3,
even though -O3 is a higher optimization setting.

That sometimes happens. Some optimizations, particularly at GCC's higher
levels, are not guaranteed: they pay off most of the time, but sometimes
they hurt you. Also, many of the optimizations depend on the compiler
knowing the exact characteristics of the machine you'll run the code on; if
you tell GCC you have an i386 or a P4, but run the code on an Opteron, you
may get slower execution than if you told it you had an Opteron or used a
lower optimization level.

Depending on the code, using profile-guided optimization can provide a
significant performance boost as the compiler has more data on your specific
program (and your input data) versus static predictions that are tuned for
the "average" program.
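
A minimal sketch of the profile-guided workflow with GCC. The -fprofile-generate and -fprofile-use options are real GCC flags; the file name, the program name and the function are hypothetical.

```c
#include <stddef.h>

/* Hypothetical hot function with a data-dependent branch.
   Without a profile the compiler has to guess which side is common. */
long clamp_sum(const int *v, size_t n, int limit)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > limit)       /* rare or common? only a real run can tell */
            total += limit;
        else
            total += v[i];
    }
    return total;
}

/*
 * Build steps, assuming this function lives in prog.c together with a
 * main() that feeds it representative input (names are made up):
 *
 *   gcc -O2 -fprofile-generate prog.c -o prog    instrumented build
 *   ./prog                                       training run, records profile data
 *   gcc -O2 -fprofile-use prog.c -o prog         rebuild using the recorded profile
 */
```

The result is only as good as the training run. If the training input does not resemble real input, the compiler will tune for the wrong case.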

Have I found a bug in gcc? Could I be doing something wrong?

If you flip a coin and guess the wrong result, is the coin buggy? No.

A compiler bug is when it doesn't properly translate a correct program.
Unless you're an expert, the most likely cause of improper results is that
your program isn't as correct as you think it is. Many advanced
optimizations cause odd results in C's undefined corners that compiling
with simpler optimizations (or none at all) won't expose.
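
A small example of such an undefined corner, with made-up function names. Signed integer overflow is undefined behaviour in C, so a compiler is entitled to assume it never happens; whether a given gcc version and optimization level actually exploits that here is not guaranteed either way.

```c
#include <limits.h>

/* Looks like an overflow check, but x + 1 overflowing is undefined
   behaviour for signed int. An optimizer may assume it never wraps
   and reduce the whole function to "return 0". */
int next_overflows_fragile(int x)
{
    return x + 1 < x;
}

/* Well-defined version: compare against the limit before doing
   any arithmetic. */
int next_overflows_safe(int x)
{
    return x == INT_MAX;
}
```

Code like the fragile version can appear to work without optimization and then change behaviour at a higher level, which looks like a compiler bug but is not one.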

S

Randy Howard · Jan 16, 2008

If not the compiler/optimizer, who else?

How can it possibly know which computer(s) you will install and run it
on after it is compiled?

Not every program is something for you to play with for a bit in ~/src
then forget about. ;-)

Randy Howard · Jan 16, 2008

If the compiler writers think that this is relevant and they need to figure
out a way to know any property of the target machine, couldn't they simply
add an option that enables the user to specify those target properties?

Some provide switches to optimize for different "families" of
processors. The problem is, it doesn't allow for any new hardware that
comes along after the compile is completed or after the compiler was
written. Also, it doesn't allow for a binary to be used on multiple
hardware platforms from the same build while enjoying this special
attention.
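
One partial workaround is to decide at run time which code path to use. The sketch below assumes a reasonably recent GCC (or a compatible compiler) on x86, where the target attribute and the __builtin_cpu_init and __builtin_cpu_supports builtins are available; on anything else it falls back to the generic path. The function names and the HAVE_X86_DISPATCH macro are made up.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Baseline version: built for the default target, runs anywhere. */
static long sum_generic(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

#if HAVE_X86_DISPATCH
/* Same source, but the compiler may use AVX2 instructions in this
   function only. It must not be called on CPUs without AVX2. */
__attribute__((target("avx2")))
static long sum_avx2(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
#endif

/* Pick an implementation based on the CPU the binary is running on. */
long sum_dispatch(const int *v, size_t n)
{
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(v, n);
#endif
    return sum_generic(v, n);
}
```

This still cannot know about hardware that appears after the build; it only lets one binary choose among the variants that were compiled into it.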

Randy Howard · Jan 16, 2008

Within lcc-win, I have targeted only ONE optimization strategy:

Code size.

There is nothing that runs faster than a deleted instruction.

<shakes head>

#include "examples of loop unrolling improving performance"

I wonder if everyone using lcc-win today realizes just how narrow your
view on optimization is?

jacob navia · Jan 16, 2008

Randy said:
<shakes head>

#include "examples of loop unrolling improving performance"

I wonder if everyone using lcc-win today realizes just how narrow your
view on optimization is?

Can you explain the results?

Of course it is narrow. It is a Reduced Optimization Set Compiler
(ROSC).

Jokes aside, obviously for you, the results aren't important but...
what?

Rui Maciel · Jan 16, 2008

Randy said:
Some provide switches to optimize for different "families" of
processors. The problem is, it doesn't allow for any new hardware that
comes along after the compile is completed or after the compiler was
written. Also, it doesn't allow for a binary to be used on multiple
hardware platforms from the same build while enjoying this special
attention.

That isn't exactly a compiler problem, is it?

Rui Maciel

user923005 · Jan 16, 2008

How can it possibly know which computer(s) you will install and run it
on after it is compiled?

GCC aside:
-march is supposed to be a promise of that. If you run it on
something else then you won't get the sort of performance you were
hoping for. Many other compilers have this same sort of effect (even
producing code that will only run on certain CPUs in some instances).

Not every program is something for you to play with for a bit in ~/src
then forget about. ;-)

Rats. How deflating.

Chris Dollin · Jan 16, 2008

Randy said:
How can it possibly know which computer(s) you will install and run it
on after it is compiled?

The optimizer could run on the target machine, so "this one"
would be the appropriate answer.

[Warning: mere possibility isn't evidence of implementation.]

jacob navia · Jan 16, 2008

Chris said:
Randy said:

How can it possibly know which computer(s) you will install and run it
on after it is compiled?

The optimizer could run on the target machine, so "this one"
would be the appropriate answer.

[Warning: mere possibility isn't evidence of implementation.]

Shipping the optimizer with your application?

Randy Howard · Jan 16, 2008

GCC aside:
-march is supposed to be a promise of that. If you run it on
something else then you won't get the sort of performance you were
hoping for. Many other compilers have this same sort of effect (even
producing code that will only run on certain CPUs in some instances).

I think you missed what I was saying there. You can optimize with
something like -march for a specific hardware type, but when you move
that binary to other machines it isn't a promise of anything, it may
not even run properly.

CJ · Jan 16, 2008

jacob navia said:
Chris said:

The optimizer could run on the target machine, so "this one"
would be the appropriate answer.

[Warning: mere possibility isn't evidence of implementation.]

Shipping the optimizer with your application?

Yes, though that rather depends on shipping the source code with your
application too, so it wouldn't be much use for closed-source programs
like lcc-win for example...
