Proposed Standard Change: inline this


W Karas

An instance x of class X could be declared with this syntax:

inline this X x;

This would potentially affect the object code generated for a call x.mem(...), where mem() is a member function of X that is defined inline. The compiler could simply inline the function call. But if it didn't, the above declaration would cause the compiler to "consider" generating and calling object code for mem() with the hard-coded address of x in place of the (eliminated) implied "this" parameter. The compiler could further "consider" cascading this behavior to member functions, defined inline, that are directly or indirectly called by mem().
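
Since "inline this" is not valid C++ today, the intended code generation can only be sketched by hand. In the sketch below, X_mem and X_mem_this_is_x are illustrative stand-ins for what a compiler might emit (not real mangled names), and the counter member is an assumption made up for the example.

```cpp
#include <cassert>

// Hypothetical class for illustration; only the shape matters.
struct X {
    int counter = 0;
    int mem(int delta) { counter += delta; return counter; }
};

X x;  // the single instance that would be declared "inline this X x;"

// Roughly what an out-of-line call x.mem(d) lowers to today:
// "this" travels as a hidden first argument.
int X_mem(X* self, int delta) {
    self->counter += delta;
    return self->counter;
}

// What the proposal asks the compiler to consider instead:
// a clone of the body with the address of x baked in, so no
// "this" argument is passed at the call site.
int X_mem_this_is_x(int delta) {
    x.counter += delta;
    return x.counter;
}

// Both forms act on the same object and give the same results.
int demo_both_forms() {
    int a = X_mem(&x, 2);        // counter: 0 -> 2
    int b = X_mem_this_is_x(3);  // counter: 2 -> 5
    assert(a == 2 && b == 5);
    return x.counter;
}
```

The two functions differ only in where the object's address comes from, which is the whole of the proposed optimization.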

It's not unusual for classes to have only one instance of the class in a program. In this case, it seems wasteful to be passing the "this" parameter over and over with a never-changing value. One can make all class members static, or use a namespace "pseudo-class". But that seems akin to using macros as alternatives to inline functions. Object code optimization should (ideally) either be automatic, or only require a minimal hint added to source code describing logical behavior. Also, this facility might be a useful optimization in cases where there are a small number of instances (greater than 1) of X in the program.
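
For comparison, the two workarounds mentioned above look roughly like this. Both remove the "this" parameter by construction, at the cost of fixing the single-instance decision in the design itself. The names Counter, StaticCounter, counter_ns and bump are made up for the example.

```cpp
#include <cassert>

// Ordinary class: every member call receives a hidden "this".
class Counter {
public:
    int bump() { return ++count_; }
private:
    int count_ = 0;
};

// Workaround 1: all members static, so there is no "this" at all.
class StaticCounter {
public:
    static int bump() { return ++count_; }
private:
    static int count_;
};
int StaticCounter::count_ = 0;

// Workaround 2: a namespace used as a "pseudo-class".
namespace counter_ns {
    namespace detail { int count = 0; }  // plays the role of private data
    inline int bump() { return ++detail::count; }
}

int demo_workarounds() {
    Counter c;
    int a = c.bump();               // passes &c implicitly
    int b = StaticCounter::bump();  // no object, no "this"
    int d = counter_ns::bump();     // free function, no "this"
    assert(a == 1 && b == 1 && d == 1);
    return a + b + d;
}
```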

If you find this syntax unaesthetic, I agree; I'm very open to suggestions for alternatives.

On the other hand, I suspect that, for typical contemporary architectures, generated object code may use a "dedicated" register for the "this" pointer. In that case, this optimization would save the execution of few or no instructions.

Another point against is that this change, like templates, would drive more Java-esque class declarations with all the member functions defined inline.
 

Öö Tiib

An instance x of class X could be declared with this syntax:

inline this X x;

This would potentially affect the object code generated for a call
x.mem(...), mem() a member function of X which was defined inline.
The compiler could simply inline the function call. But if it didn't,
the above declaration would cause the compiler to "consider" generating
and calling object code for mem() with the hard-coded address of x in
place of the (eliminated) implied "this" parameter. The compiler
could further "consider" cascading this behavior to member functions,
defined inline, that are directly or indirectly called by mem().

So the effect, in case the compiler does not inline, is that it considers
generating several variants of the same function? What prevents it from doing
that already without that whole thing?

It's not unusual for classes to have only one instance of the class in
a program.

You mean only one instance with static storage duration? The linker can detect
that relatively easily without any hints needed.

In this case, it seems wasteful to be passing the "this" parameter over
and over with a never-changing value. One can make all class members
static, or use a namespace "pseudo-class". But that seems akin to
using macros as alternatives to inline functions.

No, those feel like premature optimizations. Orchestration of
initialization and destruction of such global statics (when the order
matters) causes a lot more issues. Also, in multi-threaded applications
a lot more resources go to synchronizing access to such singletons
than to passing the 'this' parameter and the additional indirection it causes.

Object code optimization should (ideally) either be automatic, or only
require a minimal hint added to source code describing logical behavior.
Also, this facility might be a useful optimization in cases where there
are a small number of instances (greater than 1) of X in the program.

What can prevent you from writing a fully conforming optimizing linker
that already does what you describe without hints? Such an optimization
just turns classes with all static members and namespace pseudo-classes
into even rarer nonsense.

If you find this syntax unaesthetic, I agree; I'm very open to
suggestions for alternatives.

I think the outright opposite. Compiler switches and hints (like
'inline', 'constexpr' and 'register') should be attributes and/or
pragmas.

The current 'inline' can be removed from the language by stating that all
function declarations in an included file that are not marked 'static'
are implicitly 'inline' (in the sense that they don't violate the ODR). Same with
'constexpr': just allow a function call wherever a compile-time constant
expression is required, and if it does not evaluate to one, then
issue a diagnostic. To say nothing of 'register', which all compilers
ignore anyway.

On the other hand, I suspect, for typical contemporary
architectures, generated object code may use a "dedicated" register
for the "this" pointer. In which case, this optimization would save
the execution of few/no instructions.

We should leave such tinkering with performance optimizations
to the late stages of development. It just distracts when applied early.
Therefore attributes and pragmas are more suitable for it.

Another point against is that this change, like templates, would drive
more Java-esque class declarations with all the member functions
defined inline.

That is because you imposed the unneeded condition that only the
members that are declared inline will be inlined. If it is decided
at link time anyway, then there is no need for such constraints.
 
W Karas

Most compilers consider inlining short member function calls in any
event (at least with optimization on), and when that happens the this
pointer is usually effectively eliminated. You can usually also
convince a compiler to inline longer member functions; typically the
compiler will allow you to configure how aggressive inlining will be,
and many compilers can even inline across translation unit boundaries
if you turn on link time code generation.

So basically what you're asking for, usually already happens.

Suppose mbr(void) is an inline-defined but very long member function of class X, and x is the singleton instance of X. The call x.mbr() appears, say, 10,000 times in the program. Presumably the compiler is going to generate (non-inlined) calls (expressed C-like as) _X_mbr_void(&x). What I'm aiming for is that the compiler generate calls to _X_mbr_void_this_is_x(), which is object code for X::mbr() that has &x hardcoded as the value of "this". Can I really expect current compilers to do this under the current Standard?
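
One way to approximate this by hand under the current Standard is to pass the object's address as a non-type template argument. This is a swapped-in technique rather than what the proposal specifies, and it says nothing about whether a given compiler performs the same cloning automatically. The names mbr_generic, mbr_bound and the total member are hypothetical, and the one-line body stands in for the "very long" function.

```cpp
#include <cassert>

struct X {
    int total = 0;
};

X x;  // singleton with static storage duration

// Generic form: the object's address arrives as a run-time argument,
// like the C-style call _X_mbr_void(&x) in the text.
int mbr_generic(X* self) {
    self->total += 1;
    return self->total;
}

// Bound form: the address is a template argument, so each
// instantiation is a distinct function whose "this" is a
// compile-time constant.
template <X* This>
int mbr_bound() {
    This->total += 1;
    return This->total;
}

int demo_bound_call() {
    int a = mbr_generic(&x);  // address passed at run time
    int b = mbr_bound<&x>();  // address fixed at compile time
    assert(a == 1 && b == 2);
    return x.total;
}
```

With a small number of instances, each one simply gets its own instantiation, for example mbr_bound<&y>() for a second object y.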
 
W Karas

So the effect, in case the compiler does not inline, is that it considers
generating several variants of the same function? What prevents it from doing
that already without that whole thing?

You mean only one instance with static storage duration? The linker can detect
that relatively easily without any hints needed.

No, those feel like premature optimizations. Orchestration of
initialization and destruction of such global statics (when the order
matters) causes a lot more issues. Also, in multi-threaded applications
a lot more resources go to synchronizing access to such singletons
than to passing the 'this' parameter and the additional indirection it causes.

What can prevent you from writing a fully conforming optimizing linker
that already does what you describe without hints? Such an optimization
just turns classes with all static members and namespace pseudo-classes
into even rarer nonsense.

I think the outright opposite. Compiler switches and hints (like
'inline', 'constexpr' and 'register') should be attributes and/or
pragmas.

The current 'inline' can be removed from the language by stating that all
function declarations in an included file that are not marked 'static'
are implicitly 'inline' (in the sense that they don't violate the ODR). Same with
'constexpr': just allow a function call wherever a compile-time constant
expression is required, and if it does not evaluate to one, then
issue a diagnostic. To say nothing of 'register', which all compilers
ignore anyway.

We should leave such tinkering with performance optimizations
to the late stages of development. It just distracts when applied early.
Therefore attributes and pragmas are more suitable for it.

That is because you imposed the unneeded condition that only the
members that are declared inline will be inlined. If it is decided
at link time anyway, then there is no need for such constraints.

I think what you are proposing is that we have tools that can build executables where compilation units have no relevance to the final machine code. With good optimization, inline can then be deprecated.

Sounds good to me. It's X-mas time, the time of year for wishing big!

In that case I would want a std::regflush() function to guarantee that register-mapped variables were flushed as needed and marked as invalid. I often rely on calling a function in another compilation unit for this purpose in multithreaded code.
 
Ian Collins

W said:
Suppose mbr(void) is an inline-defined but very long member function
of class X, and x is the singleton instance of X. The call x.mbr()
appears, say, 10,000 times in the program. Presumably the compiler
is going to generate (non-inlined) calls (expressed C-like as)
_X_mbr_void(&x). What I'm aiming for is that the compiler generate
calls to _X_mbr_void_this_is_x(), which is object code for X::mbr()
that has &x hardcoded as the value of "this". Can I really expect
current compilers to do this under the current Standard?

* Please Please clean up the mess that shite google interface makes of
your quotes! *

If mbr() is a long function, do you really think the overhead of passing
a parameter is significant?
 
W Karas

* Please Please clean up the mess that shite google interface makes of
your quotes! *

Sorry, in spite of their slogan, Google does more evil than I personally have time to undo.

If mbr() is a long function, do you really think the overhead of passing
a parameter is significant?

It boils down to whether "this" is passed and kept in a register. If it is, hardcoding it would be slower. If it isn't, there is a small speed advantage to hardcoding it (as the address of the singleton).

small advantage * much repetition = significant advantage
 
Öö Tiib

I think what you are proposing is that we have tools that can build
executables where compilation units have no relevance to the final
machine code. With good optimization, inline can then be deprecated.

It currently is already like that. Compilers use 'inline' merely as
"does not violate the ODR when included in several compilation units".
That takes link-time analysis anyway. So linkers already inline functions
not marked as inline across compilation units.

Sounds good to me. It's X-mas time, the time of year for wishing big!

Right.

In that case I would want an std::regflush() function to guarantee
that reg-mapped vars were flushed as needed and marked as invalid.
I often rely on calling a function in another compilation unit for
this purpose in multithread code.

I perhaps didn't understand your problem, but C++ is for describing
run-time behavior of a program. If you want instead to instruct
a particular microcontroller, then why not instruct it in its
own assembly language?
 
W Karas

It currently is already like that. Compilers use 'inline' merely as
"does not violate the ODR when included in several compilation units".
That takes link-time analysis anyway. So linkers already inline functions
not marked as inline across compilation units.

My guess is that the object code for each inline function is put in a special "segment". If the linker sees more than one of these special segments with the same content identification (perhaps with checking that the contents are identical), it discards all but one copy (rather than indicating an error). If I'm right, compilation units would still matter.

I perhaps didn't understand your problem, but C++ is for describing
run-time behavior of a program. If you want instead to instruct
a particular microcontroller, then why not instruct it in its
own assembly language?

I think what I (flippantly) suggested is roughly the same as the call:

#include <atomic>
..
..
..

std::atomic_thread_fence(std::memory_order_seq_cst);

in C++11. Except this would work, I think, with SMP even on a weakly-ordered architecture. Mine probably wouldn't.
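
As a minimal sketch of how that fence is commonly paired with a flag to hand data from one thread to another, assuming a single writer and a single reader: the names payload, ready, publish and consume are invented for the example, and the demo function runs both sides on one thread only to keep it self-contained.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // flag used to publish the data

// Writer side: store the data, fence, then raise the flag.
void publish(int value) {
    payload = value;
    std::atomic_thread_fence(std::memory_order_seq_cst);
    ready.store(true, std::memory_order_relaxed);
}

// Reader side: wait for the flag, fence, then read the data.
int consume() {
    while (!ready.load(std::memory_order_relaxed)) {
        // spin until the writer has published
    }
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return payload;
}

// Single-threaded walk-through of the two halves.
int demo_single_thread() {
    publish(42);
    int v = consume();
    assert(v == 42);
    return v;
}
```

In real use publish() and consume() would run on different threads; the fence on each side is what orders the plain access to payload relative to the flag.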
 
W Karas

W Karas wrote in

10,000 times pushing a pointer to the stack? The stack memory is probably well
cached by the CPU; in this case storing a pointer would probably take
roughly 1 CPU cycle. Current CPUs are ca. 2 GHz, so this makes 10^4/(2*10^9)
= 5*10^-6 s. So we are talking about 5 microseconds here. Do you really
notice if your program completes 5 microseconds earlier? I have not had a
customer complaint "this takes 5 microseconds too long".

So, next time you make such examples, please come up with some more
measurable numbers, like "assume this function is called a billion
times" ;-)

Cheers
Paavo

In an embedded environment, you can be dealing with comm links that run at 1Gbit/s+. The CPUs may run much slower because of power/heat/I-temp requirements, or need to be embedded in an FPGA.
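
The estimate quoted above is just calls times cycles per call divided by clock rate, so it is easy to redo with other figures, such as a slower embedded clock. In this sketch the function name is made up, and the 1-cycle cost and 2 GHz clock are the assumptions from the quoted post rather than measurements.

```cpp
#include <cassert>
#include <cmath>

// Seconds spent passing "this", assuming a fixed per-call cost in
// cycles and a fixed clock rate. All three inputs are estimates.
constexpr double this_overhead_seconds(double calls,
                                       double cycles_per_call,
                                       double clock_hz) {
    return calls * cycles_per_call / clock_hz;
}

bool demo_estimates() {
    double ten_thousand = this_overhead_seconds(1e4, 1.0, 2e9);  // the quoted case
    double one_billion = this_overhead_seconds(1e9, 1.0, 2e9);   // "a billion times"
    assert(std::fabs(ten_thousand - 5e-6) < 1e-12);  // about 5 microseconds
    assert(std::fabs(one_billion - 0.5) < 1e-9);     // about half a second
    return true;
}
```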
 
