C Compiler and "Profile Guided Optimizations"

llothar

Does anybody have benchmarks or links to articles that compare
this across different compiler implementations?
I would especially like to see whether it is useful on MSVC, Intel 9.0 C
and gcc.
Also, what about the effect of "interprocedural optimization"?

All my use cases are 98% dominated by integer performance. Currently I
only use -O2 or -O3 for MSVC and gcc, but I would really like to know if
it is worth spending time on these optimizations (by which I mean that I
would see a 20% improvement from these two kinds of optimization).
 
llothar

Also, what about the effect of "interprocedural optimization"?

Let me clarify this: I'm already using inline for functions that I
think are good to inline (everything that is just a few statements
long and does not have locally declared variables or conditionals).
 
Ian Collins

llothar said:
Let me clarify this: I'm already using inline for functions that I
think are good to inline (everything that is just a few statements
long and does not have locally declared variables or conditionals).

Don't bother; a decent compiler will take care of inlining for you.
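
To make the inlining point concrete, here is a minimal sketch. The helper and function names are hypothetical, not taken from anyone's code in this thread. The helper has a local variable and a conditional and is not marked inline, yet an optimizing compiler such as gcc at -O2 will typically inline a small file-local function like this into the loop on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: small, file-local, and NOT marked inline.
 * It has a local variable and a conditional on purpose. */
static int add_saturating(int a, int b, int limit)
{
    int sum = a + b;
    return sum > limit ? limit : sum;
}

/* Hot loop that calls the helper once per element. With optimization
 * enabled, the call is usually replaced by the helper's body. */
int sum_saturating(const int *values, size_t count, int limit)
{
    assert(values != NULL || count == 0);
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total = add_saturating(total, values[i], limit);
    return total;
}
```

Compiling with something like gcc -O2 -S and reading the assembly for sum_saturating is a quick way to check what your own compiler actually did.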
 
Ian Collins

llothar said:
Does anybody have benchmarks or links to articles that compare
this across different compiler implementations?
I would especially like to see whether it is useful on MSVC, Intel 9.0 C
and gcc.
Also, what about the effect of "interprocedural optimization"?

All my use cases are 98% dominated by integer performance. Currently I
only use -O2 or -O3 for MSVC and gcc, but I would really like to know if
it is worth spending time on these optimizations (by which I mean that I
would see a 20% improvement from these two kinds of optimization).

You really should try this with your own code. All code is different, and
what works well for one author may not work at all for you. I always
experiment with profile-driven optimisations for each new application to
find the best combination for it.
 
websnarf

Does anybody have benchmarks or links to articles that compare
this across different compiler implementations?

PGO is usually most useful for feeding runtime branch statistics back
to the compiler. The compiler can then use hinted branch instructions,
or flip the sense of a branch so that it falls through more of the time
(which is easier on the decoders and the trace cache). However, this
tended to make more of a difference on the deeply pipelined P4s than on
the relatively shorter pipelines of the Athlon/Opteron and Core
architectures.
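
The branch-layout effect described above can be approximated by hand, which may help show what PGO is doing. The sketch below uses the GCC/Clang builtin __builtin_expect to tell the compiler which way a branch usually goes; PGO would derive the same kind of hint from measured counts instead of a programmer's guess. The UNLIKELY macro and the function name are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* __builtin_expect is a GCC/Clang extension; fall back to a plain
 * expression on other compilers. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Sums the non-negative entries and counts the rejected (negative) ones.
 * The negative case is assumed to be rare, so the compiler is free to
 * keep the common path as straight-line fall-through code. */
long sum_nonnegative(const int *values, size_t count, size_t *rejected)
{
    assert(rejected != NULL);
    long total = 0;
    *rejected = 0;
    for (size_t i = 0; i < count; i++) {
        if (UNLIKELY(values[i] < 0)) {
            (*rejected)++;
            continue;
        }
        total += values[i];
    }
    return total;
}
```

The hint changes only code layout, never the result, so a wrong guess costs speed rather than correctness.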

I would especially like to see whether it is useful on MSVC, Intel 9.0 C
and gcc. Also, what about the effect of "interprocedural optimization"?

I don't remember. I usually just turned it on and saw no difference.
But that's because my code tends to lean on inner loops, not call
overhead.

All my use cases are 98% dominated by integer performance. Currently I
only use -O2 or -O3 for MSVC and gcc, but I would really like to know if
it is worth spending time on these optimizations (by which I mean that I
would see a 20% improvement from these two kinds of optimization).

Truly integer limited? As in cryptography or something of that
nature? If so, then your best bet is to try for SIMD or just general
parallelism. If that doesn't buy you anything, then there's not much
you can do from the "micro-optimization" angle.
 
llothar

Truly integer limited? As in cryptography or something of that
nature? If so, then your best bet is to try for SIMD or just general

As in data movement and script interpreter execution.
I really can't see any real use for SIMD in this case, but it has a
lot of calls/jumps; that's why I asked about PGO.

But it seems that nobody has a real success story that would make me
curious enough to see if there is a speed boost.
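
For readers unfamiliar with this kind of workload, a script interpreter's core is usually a dispatch loop along the lines of the sketch below. This is a toy stack machine with made-up opcodes, not the actual interpreter being discussed. Almost all of its time goes into the switch and its branches, which is the kind of code where branch statistics from a profile might matter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine. */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Executes `code` and returns the value on top of the stack at OP_HALT
 * (or at the end of the program). OP_PUSH is followed by its operand. */
int run_program(const int *code, size_t length)
{
    int stack[64];
    size_t sp = 0; /* number of values currently on the stack */
    size_t pc = 0; /* index of the next instruction */

    while (pc < length) {
        switch (code[pc++]) {
        case OP_PUSH:
            assert(pc < length && sp < 64);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
    return sp > 0 ? stack[sp - 1] : 0;
}
```

Whether a profile helps a loop like this depends on how skewed the opcode mix is in real scripts, so it has to be measured rather than assumed.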
 
Ian Collins

llothar said:
As in data movement and script interpreter execution.
I really can't see any real use for SIMD in this case, but it has a
lot of calls/jumps; that's why I asked about PGO.

But it seems that nobody has a real success story that would make me
curious enough to see if there is a speed boost.

That depends on how you judge success. The best I have seen is about a
10% speed-up with one of my applications, where I had a good set of
representative data to feed the profiler's training runs.

It shouldn't take you long to try it out for yourself.
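
As a rough sketch of what trying it out involves with gcc, the comment below lists the usual three steps (the file and input names are placeholders), and the small helper gives one way to compare CPU time before and after. The helper and the sample workload are hypothetical, and other compilers use different options for the same steps.

```c
#include <assert.h>
#include <time.h>

/*
 * Typical gcc sequence (app.c and training-input are placeholders):
 *   gcc -O2 -fprofile-generate app.c -o app   instrumented build
 *   ./app training-input                      training run, writes profile data
 *   gcc -O2 -fprofile-use app.c -o app        rebuild using the profile
 */

/* Counter touched by the sample workload; volatile so the loop below is
 * not optimized away. */
static volatile unsigned long sink;

/* Stand-in workload; replace with a call into the real application. */
static void sample_workload(void)
{
    sink = sink + 1;
}

/* Returns the CPU seconds consumed by `iterations` calls of `workload`. */
double cpu_seconds(void (*workload)(void), unsigned long iterations)
{
    assert(workload != NULL);
    clock_t start = clock();
    for (unsigned long i = 0; i < iterations; i++)
        workload();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Run the same measurement on the plain -O2 build and on the profile-optimized build, using input that is not identical to the training data, to see whether the difference is worth the extra build step.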
 
Tim Prince

Ian said:
That depends on how you judge success. The best I have seen is about a
10% speed-up with one of my applications, where I had a good set of
representative data to feed the profiler's training runs.

It shouldn't take you long to try it out for yourself.

Successful compilers have had to improve their handling of the case
without PGO, leaving less improvement to be gained. There are too many
applications where PGO is impractical, so good optimization without it
could also be taken as a success.
 
