How to force 'inline' with GCC or ICC

Chris Theis

Patrick Laurent said:
I found nothing on the GCC mailing list.
I posted a mail...waiting...

As you're obviously making use of sophisticated template mechanisms you
should be careful comparing the results of different compilers. You should
be aware that Intel and GCC follow different approaches on how these things
are treated and this naturally has an impact on the generated code. Hence,
as was mentioned before, the lack of speed might not necessarily be
related to the inlining only! You would have to compare the whole
generated machine code to look deeper into this.

However, the question of to inline or not to inline should be left to the
compiler as it is closely connected to the other optimizations. For example,
inlining might cause or even prevent thrashing and the same is true for
cache misses. The decision is based on many things, e.g., does the function
call another inlined function, are there loops, does it recurse?

BTW have you tried your code with GCC 4.0 which is based on a totally
different design and also optimization approach?

Cheers
Chris
 
Patrick Laurent

As you're obviously making use of sophisticated template mechanisms you
should be careful comparing the results of different compilers.

I agree, but I still can compare the execution speed on the same computer,
with a chronometer.
The winner is the quickest.
I don't really care how the compiler does its job, but it must give a quick
executable.
And in this case GCC is between 3 times and 20 times slower.
However, the question of to inline or not to inline should be left to the
compiler as it is closely connected to the other optimizations.

I'd love not to be concerned with inlining and to leave this difficulty to the
compiler, but I have many reasons to think that the slowness is due to bad
inlining by GCC.
BTW have you tried your code with GCC 4.0 which is based on a totally
different design and also optimization approach?
I could not test with GCC 4.0 yet, because the version is very new, and our
administrator thinks (I don't know why, he must have his reasons) that it is
not stable enough.

Pat
 
Chris Theis

Patrick said:
I agree, but I still can compare the execution speed on the same computer,
with a chronometer.
The winner is the quickest.
I don't really care how the compiler does its job, but it must give a quick
executable.
And in this case GCC is between 3 times and 20 times slower.

Ouch, then something seems to be going quite wrong here. GCC is not normally
that bad. We frequently use it here at CERN, also for speed sensitive
applications & simulations. I somehow have the impression that this
might be closely related to your code.

I just took a quick look at your code and you're swinging the keyword
inline like a hammer. Most of the non-class functions that you declare
inline won't be inlined anyway because of their structure. Inlining
those would presumably have more negative than positive effects
regarding cache misses. Furthermore, you have quite a lot of redundant
code, which could be optimized (and in most cases will be) by the
compiler. However, it strikes me as a little odd, as you're obviously very
concerned about speed.
I'd love not to be concerned with inlining and to leave this difficulty to the
compiler, but I have many reasons to think that the slowness is due to bad
inlining by GCC.



I could not test with GCC 4.0 yet, because the version is very new, and our
administrator thinks (I don't know why, he must have his reasons) that it is
not stable enough.

Pat

Okay GCC 4.0 might not be that stable, however it might be worth a shot
trying it with a local installation on your private machine.

Cheers
Chris
 
Ioannis Vranos

Patrick said:
I agree, but I still can compare the execution speed on the same computer,
with a chronometer.
The winner is the quickest.


Unless you did not express yourself well, I think that this one is original. :)
 
Paul Schneider

Patrick said:
I only wrote the parameters names.

Yes, I did supply a value (in fact I tried many values, most of the time big
values).

But GCC still doesn't inline many important functions, whereas ICL on
Windows does.



You are right, GCC tells which parameter is exceeded, so I always supplied a
bigger value to every corresponding parameter (up to astronomic values). But
it did not work: a few more functions were inlined, but the execution speed
is still very much slower than on Windows.



Is there no way to force inlining?



Pat
I had a similar issue not too long ago. It was a numerical math problem
that could be parametrized with function templates. With gcc 3.4 I
cranked the parameters up (I don't recall the actual numbers and the
parameters, but I think I tried something well in the range of 100000)
until everything was inlined. It gave me a factor of ten in execution
speed. Also, make sure you change the right parameter. I am not kidding,
there are lots of parameters that have 'inline' in their name.
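
For reference, the GCC options with 'inline' in their name include `-finline-limit` and several `--param` settings. The sketch below shows one way they might be combined. The file name, the function names and all numeric values are assumptions made up for illustration, and which parameters exist depends on the GCC version, so the manual for the release in use is the authority here:

```cpp
// inline_params.cpp (hypothetical file name used only for this sketch)
//
// One possible GCC invocation, with illustrative and untuned values:
//   g++ -O3 -finline-limit=100000
//       --param max-inline-insns-single=100000
//       --param max-inline-insns-auto=100000
//       --param inline-unit-growth=1000
//       --param large-function-growth=1000
//       inline_params.cpp
// (written on several lines here for readability; it is a single command)

#include <cstddef>

// Small helper that we would like to see inlined into dot().
template <typename T>
inline T multiply_add(T a, T b, T c) { return a * b + c; }

// Caller built from the helper; with templates, such call chains can
// become deep, which is where the inlining limits start to matter.
template <typename T>
inline T dot(const T* x, const T* y, std::size_t n) {
    T sum = T();
    for (std::size_t i = 0; i < n; ++i)
        sum = multiply_add(x[i], y[i], sum);
    return sum;
}
```

Raising these limits only changes the heuristics. The compiler may still decline to inline a particular call.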

I have never programmed for a Windows platform, but on Linux gcc
produces very fast code in my experience, comparable in speed to the
code produced with icc or the Portland compiler.

p
 
Patrick Laurent

I just took a quick look at your code and you're swinging the keyword
inline like a hammer. Most of the non-class functions that you declare
inline won't be inlined anyway because of their structure. Inlining
those would presumably have more negative than positive effects
regarding cache misses. Furthermore, you have quite a lot of redundant
code, which could be optimized (and in most cases will be) by the
compiler. However, it strikes me as a little odd, as you're obviously very
concerned about speed.

I declared many inline functions, but they are small functions.
I admit that the combination of many functions can become quite big.
But as far as I know the 'O2' or 'O3' options consider any function (inline
or not) as potentially inlinable.
GCC might be good for C programs, but not for generic C++ (in comparison to
ICL).
If you think that my program is redundant (or bad), contributions are
welcome, but not critics.
You could have a look at MTL, Pooma, Blitz++, Newmat: these libraries use
equivalent principles.
My philosophy was never to adapt my code to the compilers, I want to have my
code as clean as possible.
I had a similar issue not too long ago. It was a numerical math problem
that could be parametrized with function templates. With gcc 3.4 I
cranked the parameters up (I don't recall the actual numbers and the
parameters, but I think I tried something well in the range of 100000)
until everything was inlined. It gave me a factor of ten in execution
speed. Also, make sure you change the right parameter. I am not kidding,
there are lots of parameters that have 'inline' in their name.

I have never programmed for a Windows platform, but on Linux gcc
produces very fast code in my experience, comparable in speed to the
code produced with icc or the Portland compiler.

I am glad to see that someone had a similar experience to me.
I already tried many parameters (with astronomic values as well), but nothing
was satisfying.

In my case, ICC is about as slow as GCC; I cannot explain why it does not
compile like ICL. I strongly suspect that ICC does not inline the functions
the way I would like, but I don't know how to verify it.
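
One way to check whether a given call was inlined, at least with GCC, is to ask for warnings with `-Winline` and to inspect the generated assembly. The following is only a sketch: the file name and both function names are hypothetical, and the exact assembly text depends on the target and the compiler version.

```cpp
// verify_inline.cpp (hypothetical file name for this sketch)
//
// Steps one might try with GCC:
//   g++ -O2 -Winline -S verify_inline.cpp -o verify_inline.s
//   grep call verify_inline.s
// -Winline warns when a function declared inline could not be inlined.
// If square() was inlined, the code generated for sum_of_squares should
// not contain a call instruction that targets square.

inline int square(int v) { return v * v; }

// extern "C" keeps the symbol name unmangled, so the function is easy
// to locate in the assembly listing.
extern "C" int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}
```

The same listing approach should work with any compiler that can emit assembly, although the option that produces the listing may differ.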

Pat
 
Chris Theis

Patrick Laurent said:
I declared many inline functions, but they are small functions.
I admit that the combination of many functions can become quite big.
But as far as I know the 'O2' or 'O3' options consider any function
(inline or not) as potentially inlinable.
GCC might be good for C programs, but not for generic C++ (in comparison
to ICL).
If you think that my program is redundant (or bad), contributions are
welcome, but not critics.
You could have a look at MTL, Pooma, Blitz++, Newmat: these libraries use
equivalent principles.
My philosophy was never to adapt my code to the compilers, I want to have
my code as clean as possible.

And your philosophy is a good one, I absolutely agree on that. However, if
you really wanna drive it to the edge then you will have to resort to some
compiler specific things at some time. The aforementioned libs make
extensive use of template metaprogramming, which is why they are fast. If I
remember correctly there should be an FFT implementation based on this
technique in Todd's Blitz++.

Anyway, first of all relax & take a deep breath. Second, if you put your
code on the web and make it public you will have to face criticism (it
happens and happened to the best), but you should realize that criticism is
not necessarily a bad thing! Speaking of this I never said that your code is
bad, but I said that after a quick glance I saw that there are redundant
parts and I'll stick to this statement.

In FFT for example you declare variables of value_type very often just to
assign a value of an array to it. Afterwards you call a new function with
the sum or the difference of these newly declared variables. Why not just
use the array itself? Of course, the compiler can and most certainly will
optimize this away after data-flow analysis, but you're making the life of
the compiler harder than necessary.

Regarding potential inlining candidates I'll have to clarify some things
here. The compiler sees every function which is explicitly declared inline
or defined within the class definition as a potential inline candidate. But
this is only true if certain requirements are fulfilled, and this is where
the problem is hidden and numerous issues are to be considered. I'll only
cover a few to give you an idea:

# Virtual functions:
Virtual functions were said to be out of the game regarding inlining,
however this is not necessarily true. The important thing is that for
virtual functions polymorphism must work - so this is the main condition.
However, if the compiler has a way to figure out the actual type of the
object which will take care of the function call, then even virtual
functions can be inlined. In some cases this is easy, whereas in others this
proves quite tricky and most compilers won't go into detailed analysis here.
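
To make the distinction concrete, here is a minimal sketch with made-up types (`Shape`, `Square`) and function names. It does not come from the code under discussion. Whether a particular compiler really resolves the second call statically is something to verify in the generated code.

```cpp
// Hypothetical class hierarchy used only for illustration.
struct Shape {
    virtual ~Shape() {}
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const { return side * side; }
    double side;
};

// Dynamic type unknown: the object arrives through a base reference,
// so the call normally has to go through the virtual table.
double area_via_base(const Shape& s) { return s.area(); }

// Dynamic type known: the object is created right here as a Square,
// so the compiler may bind the call statically and is then free to
// inline it like any non-virtual function.
double area_known_type(double side) {
    Square sq(side);
    return sq.area();
}
```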

# Large functions:
Large functions can very often result in cache misses and are thus not a
good candidate for inlining. In your code you declare quite a large number of
large functions inline. This won't have any effect as the compiler is
free to choose and thus, in most cases, will choose not to inline. The same
is true for recursive calls of inlined functions or inlined
functions calling other functions. This is a construct one sees quite often
in your implementation.
If you really want to inline large functions and recursive function calls
etc. then you should resort to template metaprogramming.
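
As an illustration of that last suggestion, a run-time recursion can be turned into a compile-time one by recursing over a template parameter. In the sketch below, `Dot` is a hypothetical name and the dot product is just a stand-in for a real kernel. Each `Dot<N>::eval` is a separate, non-recursive function, which gives the inliner a chance to flatten the whole chain, though it remains subject to the compiler's heuristics.

```cpp
#include <cstddef>

// Primary template: handles element N-1, then "recurses" into the
// instantiation for N-1, which is a different function.
template <std::size_t N>
struct Dot {
    template <typename T>
    static T eval(const T* x, const T* y) {
        return x[N - 1] * y[N - 1] + Dot<N - 1>::eval(x, y);
    }
};

// Explicit specialisation that terminates the compile-time recursion.
template <>
struct Dot<0> {
    template <typename T>
    static T eval(const T*, const T*) { return T(); }
};
```

A call such as `Dot<3>::eval(x, y)` then expands, at least conceptually, into three multiply-add steps with no loop and no run-time recursion.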

[SNIP]

Cheers
Chris
 
Patrick Laurent

And your philosophy is a good one, I absolutely agree on that. However, if
you really wanna drive it to the edge then you will have to resort to some
compiler specific things at some time. The aforementioned libs make
extensive use of template metaprogramming, which is why they are fast. If
I remember correctly there should be an FFT implementation based on this
technique in Todd's Blitz++.

Anyway, first of all relax & take a deep breath. Second, if you put your
code on the web and make it public you will have to face criticism (it
happens and happened to the best), but you should realize that criticism
is not necessarily a bad thing! Speaking of this I never said that your
code is bad, but I said that after a quick glance I saw that there are
redundant parts and I'll stick to this statement.

In FFT for example you declare variables of value_type very often just to
assign a value of an array to it. Afterwards you call a new function with
the sum or the difference of these newly declared variables. Why not just
use the array itself? Of course, the compiler can and most certainly will
optimize this away after data-flow analysis, but you're making the life of
the compiler harder than necessary.

My FFT is somewhat quicker than FFTW (both compiled with ICL on Windows; I made
many speed tests with and without SIMD).
My FFT code is much smaller than FFTW; the only problem is that the
demands on the compiler are much greater.
That is where GCC and ICC fail (so far).

I accept constructive critic, but since the beginning, you critic my
programming style without understanding the reason why I programmed it so.
Here I just cite you from memory
-GCC inlines very well
-virtual functions bla bla bla
-too many inline in the library
-code redundant.
-...

So just know that your remark about all the variables for the FFT is totally
unfounded.
If you had understood the code, you would have understood that 'value_type'
is a very variable type that can describe complex<float>, complex<double>,
SIMD registers, and various other types.
Storing them in a variable makes the program much quicker because the
compiler can store them in a register once and for all.
Using only the array itself results in a catastrophic loss of time because
the compilers are not good enough to understand it, and the program loses its
time with read/write accesses to memory. I know this effect, and I could
debate it for many hours. You obviously did not even give it a try.
So I am helping the compiler very much, not the opposite.
By the way, FFTW uses this technique too.
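
The effect described here can be illustrated with a single butterfly step. This is only a sketch and not code from the library under discussion: `butterfly`, `in`, `out` and `w` are hypothetical names. Because `out` might alias `in`, a version that reads `in[i]` and `in[j]` again after the first store may force the compiler to reload them from memory, whereas copying them into locals first makes the single load explicit.

```cpp
#include <cstddef>

// One butterfly step: combines elements i and j of 'in' using the
// factor 'w' and writes the sum and the difference to 'out'.
template <typename T>
void butterfly(const T* in, T* out, std::size_t i, std::size_t j, const T& w) {
    // Load each operand exactly once into a local. The stores below
    // cannot change 'a' or 'b', even if 'out' overlaps 'in', so the
    // compiler is free to keep both in registers.
    const T a = in[i];
    const T b = in[j] * w;

    // The array-only alternative would be:
    //   out[i] = in[i] + in[j] * w;
    //   out[j] = in[i] - in[j] * w;   // in[i], in[j] may need reloading
    out[i] = a + b;
    out[j] = a - b;
}
```

How much this matters in practice depends on the compiler's alias analysis and on the element type, so it is worth measuring both variants.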

I know the effect of large functions.
By the way in my FFT, large function instances are only called once but
always in a loop.

Don't tell me how virtual functions work, I know it and it is not the point.
Did you see dynamic polymorphism in my FFT? No.

So please don't say that my code is redundant, but say "I don't understand
this and this" or "I would have done this differently".
You should learn a little bit of modesty, but instead of this you say:
We frequently use it here at CERN, also for speed sensitive applications &
simulations
I honestly do not feel at ease about your speed sensitive applications...

My question was "how force inlining", not "Should I inline".
Until yet, you did not tell a single thing that would help me a little bit.

Cheers
Patrick
 
Chris Theis

Patrick Laurent wrote:
[SNIP stuff that I won't start arguing about as it seems to be senseless
anyway]
I honestly do not feel at ease about your speed sensitive applications...

My question was "how force inlining", not "Should I inline".
Until yet, you did not tell a single thing that would help me a little bit.

I'm not gonna go into an argument regarding your coding style and
whatever you think I understand or I don't. However, you seem to be
rather ignorant of all the comments that some people (very
experienced ones like Ioannis and others) here gave you (such as asking in
a GCC group, which was mentioned more than once!). Therefore, I'll just
resort to answering your original question again:

Read the standard! Bottom line: you cannot force inlining. End of story.

Regarding the issue of modesty - you might treat yourself to
reconsidering your attitude towards people who are spending time to help
you.

Best regards
Chris
 
Patrick Laurent

Did you ask in a GCC mailing list?

I did at gcc-help.
I was advised by someone in charge to post my problem as a bug.
It is not really a bug; nevertheless I posted my problem 2 days ago.
I am still waiting for an answer...

Pat
 
Chris Theis

Patrick Laurent said:
I did at gcc-help.
I was advised by someone in charge to post my problem as a bug.
It is not really a bug; nevertheless I posted my problem 2 days ago.
I am still waiting for an answer...

What do you expect? Do you really think that GCC folks are just waiting to
look into your problem - probably they have the same attitude towards
criticism (note that it is not spelled critic, which is something completely
different) like you. Patience is a virtue.

Chris
 
Karl Heinz Buchegger

[snip a lot of ranting]
My question was "how force inlining", not "Should I inline".

Then the answer is:
There is no way to 'force the compiler to inline'.
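
For what it's worth, this holds for standard C++, where `inline` is only a request. Several compilers offer non-standard extensions that go further: GCC documents an `always_inline` function attribute, and Microsoft-compatible compilers a `__forceinline` keyword. The sketch below assumes those extensions are available. `FORCE_INLINE`, `add3` and `sum_twice` are names made up for illustration, and whether ICC and ICL treat these extensions exactly like GCC and MSVC is an assumption to check against their manuals.

```cpp
// FORCE_INLINE is a hypothetical macro name chosen for this sketch.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline                        // MSVC-style keyword
#elif defined(__GNUC__)
#  define FORCE_INLINE inline __attribute__((always_inline)) // GCC-style attribute
#else
#  define FORCE_INLINE inline                                // plain request only
#endif

// Small function we insist on having expanded at every call site.
FORCE_INLINE int add3(int a, int b, int c) { return a + b + c; }

// Ordinary caller; with the extensions above, both calls to add3 are
// expected to be expanded in place.
int sum_twice(int a, int b, int c) {
    return add3(a, b, c) + add3(c, b, a);
}
```

Even these extensions have limits. A recursive function, for example, cannot be fully expanded into itself, and a compiler may report a diagnostic instead of inlining.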

If the capabilities of one compiler are not good enough
for you, you are free to choose a different compiler.
Until yet, you did not tell a single thing that would help me a little bit.

Good luck with that attitude. After all it is *your* problem.
 
Ioannis Vranos

Patrick said:
I did at gcc-help.
I was advised by someone in charge to post my problem as a bug.
It is not really a bug; nevertheless I posted my problem 2 days ago.
I am still waiting for an answer...


OK, I would also try the plain "gcc" mailing list. If their answer doesn't satisfy you,
then perhaps you should pick another compiler.
 
Patrick Laurent

OK, I would also try the plain "gcc" mailing list. If their answer doesn't
satisfy you, then perhaps you should pick another compiler.

I do, I use ICL on Windows.
But I want other people to use my library on other systems.
They can, but a similar speed to ICL is in some cases not available.

Pat
 
Ioannis Vranos

Patrick said:
I do, I use ICL on Windows.
But I want other people to use my library on other systems.
They can, but a similar speed to ICL is in some cases not available.


Well, if it compiles, they can use it. You can't do much about compiler deficiencies (if
there is one in this case).

If by ICL you mean Intel C++ compiler, then if I recall well, there is also a Linux
version of it.


There are also two other ways to inline. Using macros, or convert your run-time
computations to compile-time (template meta-programming). Both are an entire world of
their own (for advanced uses). Or use assembly, but that one is not portable. :)
 
Patrick Laurent

If by ICL you mean Intel C++ compiler, then if I recall well, there is
also a Linux version of it.
Yes, Intel C++ Compiler for Linux, is called ICC.
But ICC is as slow as GCC (in comparison to ICL), I don't know why ICC does
not inline like ICL.
There are also two other ways to inline. Using macros, or convert your
run-time computations to compile-time (template meta-programming). Both
are an entire world of their own (for advanced uses). Or use assembly, but
that one is not portable. :)
I can exclude the macro suggestion.
I can exclude assembly too!
I think I know what you mean with 'meta programming'. I can exclude it too.
I use meta programming for type handling. I cannot imagine any way to
program a FFT, or any other signal processing functions with meta
programming. It seems to be theoretically possible, but anyway I think the
compilers are not quite ready yet.

Pat
 
Lionel B

Patrick Laurent said:
[...]
There are also two other ways to inline. Using macros, or convert your
run-time computations to compile-time (template meta-programming). Both
are an entire world of their own (for advanced uses). Or use assembly, but
that one is not portable. :)
I can exclude the macro suggestion.

On what grounds?
I can exclude assembly too!

On what grounds?
I think I know what you mean with 'meta programming'. I can exclude it too.

On what grounds?
I use meta programming for type handling. I cannot imagine any way to
program a FFT, or any other signal processing functions with meta
programming.

Because you can't imagine any way doesn't mean it's not possible.
It seems to be theoretically possible,

.... so you *can* imagine a way to do it, then ...?
but anyway I think the
compilers are not quite ready yet.

How do you know that?

Patrick: Ioannis has offered a range of (to my mind reasonable and helpful) suggestions which you have summarily trashed
without any hint of justification. How do we help you? You seem to be after some magic quick fix - has it dawned on you
yet that perhaps there isn't one?
 
Ioannis Vranos

Patrick said:
I can exclude the macro suggestion.


Actually this is what I would begin with. You could make macro functions (any effort to
force a compiler through source code is going to be ugly anyway).
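
A minimal sketch of such a macro function follows, using a butterfly step as the example. `BUTTERFLY` and `butterfly_pair` are hypothetical names, not code from the library under discussion. The macro body is pasted into the caller by the preprocessor, so no inlining decision is left to the compiler, at the price of the usual macro pitfalls.

```cpp
// BUTTERFLY is a hypothetical macro name. The arguments x and y are
// expected to be plain variables: each appears twice in the expansion
// (one read, one write), so expressions with side effects would be
// evaluated more than once.
#define BUTTERFLY(T, x, y)          \
    do {                            \
        const T tmp_a_ = (x);       \
        const T tmp_b_ = (y);       \
        (x) = tmp_a_ + tmp_b_;      \
        (y) = tmp_a_ - tmp_b_;      \
    } while (0)

// Example use inside an ordinary function: the macro text is expanded
// in place here, with no function call involved.
void butterfly_pair(double& p, double& q) {
    BUTTERFLY(double, p, q);
}
```

The `do { ... } while (0)` wrapper makes the macro behave like a single statement, so it can be followed by a semicolon and used safely inside `if`/`else`.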
 
