# When to use std::pow(x, n) instead of multiplying x n times?

Discussion in 'C++' started by Peng Yu, Sep 10, 2008.

1. ### Peng Yu (Guest)

Hi,

I'm wondering if there is any general guideline on when to use
something like
std::pow(x, n)
rather than
x * x * x * ... * x (n x's).

Thanks,
Peng

Peng Yu, Sep 10, 2008

2. ### jellybean stonerfish (Guest)

On Tue, 09 Sep 2008 17:30:04 -0700, Peng Yu wrote:

> Hi,
>
> I'm wondering if there is any general guideline on when to use
> something like
> std::pow(x, n)
> rather than
> x * x * x * ... * x (n x's).
>
> Thanks,
> Peng

It can be hard to represent fractional powers in x*x notation.
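To make that concrete, here is a minimal sketch of exponents that cannot be written as a finite chain of multiplications. The helper names `cube_root` and `pow_2_5` are made up for illustration and are not from the thread.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers. A non-integer exponent has no "x * x * ... * x" form,
// so the library function has to do the work.
double cube_root(double x) {
    assert(x >= 0.0);  // std::pow yields NaN for a negative base with a non-integer exponent
    return std::pow(x, 1.0 / 3.0);
}

double pow_2_5(double x) {
    assert(x >= 0.0);
    return std::pow(x, 2.5);  // mathematically x^2 * sqrt(x)
}
```

For the cube root specifically, C++11 also provides `std::cbrt`, which accepts negative arguments.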

sf

jellybean stonerfish, Sep 10, 2008

3. ### Juha Nieminen (Guest)

Peng Yu wrote:
> I'm wondering if there is any general guideline on when to use
> something like
> std::pow(x, n)
> rather than
> x * x * x * ... * x (n x's).

If you need to calculate that function millions of times per second,
then using the latter form can be considerably faster up to a certain n
(after which std::pow() becomes faster). The maximum n for which the
latter form is faster than std::pow() can be surprisingly large,
depending on the code (e.g. something like n=8 might not be far-fetched).

Of course this is heavily system-dependent so there's no rule.
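Since the crossover point is system-dependent, the only way to know is to measure. The following is a rough timing sketch, not a rigorous benchmark. The names `pow8_mul`, `pow8_lib` and `seconds_for` are hypothetical, and n = 8 is just the example exponent mentioned above.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Hypothetical helpers for a rough timing comparison.
inline double pow8_mul(double x) { return x * x * x * x * x * x * x * x; }
inline double pow8_lib(double x) { return std::pow(x, 8.0); }

// Calls f on `reps` slightly different inputs and returns the elapsed seconds.
// The running sum is written to `sink` so the results are observable and the
// loop is less likely to be discarded as dead code.
template <typename F>
double seconds_for(F f, int reps, double& sink) {
    assert(reps > 0);
    const auto start = std::chrono::steady_clock::now();
    double acc = 0.0;
    for (int i = 0; i < reps; ++i) {
        acc += f(1.0 + i * 1e-9);
    }
    sink = acc;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Comparing `seconds_for(pow8_mul, reps, sink)` with `seconds_for(pow8_lib, reps, sink)` at the optimization level you actually ship with gives a first impression; the numbers will differ between compilers, flags and machines.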

Juha Nieminen, Sep 10, 2008
4. ### gpderetta (Guest)

On Sep 10, 4:24 pm, Juha Nieminen <> wrote:
> Peng Yu wrote:
> > I'm wondering if there is any general guideline on when to use
> > something like
> > std::pow(x, n)
> > rather than
> > x * x * x * ... * x (n x's).

>
>   If you need to calculate that function millions of times per second,
> then using the latter form can be considerably faster up to a certain n
> (after which std::pow() becomes faster).

Why? There is no reason for the compiler not to transform pow(x,
<integral-constant>) to the latter form if it were actually faster
(and in fact some compilers do).

--
gpd

gpderetta, Sep 10, 2008
5. ### Michael DOUBEZ (Guest)

Juha Nieminen a écrit :
> Peng Yu wrote:
>> I'm wondering if there is any general guideline on when to use
>> something like
>> std::pow(x, n)
>> rather than
>> x * x * x * ... * x (n x's).

>
> If you need to calculate that function millions of times per second,
> then using the latter form can be considerably faster up to a certain n
> (after which std::pow() becomes faster). The maximum n for which the
> latter form is faster than std::pow() can be surprisingly large,
> depending on the code (e.g. something like n=8 might not be far-fetched).
>
> Of course this is heavily system-dependent so there's no rule.

In fact, some algorithms compute powers with a natural-number exponent
faster than others (for example, those derived from Russian peasant
multiplication).

I expect C++ libraries to have a specialization with a natural number as
the second argument that gives better results in general.

If n is known at compile time, it is not hard to implement a template
with the Russian peasant algorithm (the depth of instantiation recursion
is bounded by the number of bits in the exponent type, i.e.
sizeof(long) * CHAR_BIT).
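A minimal sketch of such a template, assuming a `double` base and an unsigned compile-time exponent. The names `Power` and `power_self_check` are made up for illustration. Each level halves the exponent, so the instantiation depth grows by one per bit of N.

```cpp
#include <cassert>

// Power<N>::eval(x) computes x^N for a compile-time N by square-and-multiply:
// x^N = (x^(N/2))^2, times one extra x when N is odd.
template <unsigned long N>
struct Power {
    static double eval(double x) {
        const double half = Power<N / 2>::eval(x);
        return (N % 2 == 0) ? half * half : half * half * x;
    }
};

// Base case ends the recursion: x^0 == 1.
template <>
struct Power<0> {
    static double eval(double) { return 1.0; }
};

// Sanity check on values that are exactly representable in double.
inline void power_self_check() {
    assert(Power<10>::eval(2.0) == 1024.0);
    assert(Power<5>::eval(3.0) == 243.0);
}
```

Whether this beats `std::pow` on a given compiler is exactly the kind of question that has to be measured rather than assumed.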

--
Michael

Michael DOUBEZ, Sep 11, 2008
6. ### Juha Nieminen (Guest)

gpderetta wrote:
> Why? There is no reason for the compiler not to transform pow(x,
> <integral-constant>) to the latter form if it were actually faster
> (and in fact some compilers do).

Some compilers might be able to do that optimization; others can't.
And if n is a variable, then the compiler cannot optimize it. (At most the pow()
function itself might have optimizations in it, but in my experience it
doesn't: With most compilers it just generates the FPU opcodes necessary
to calculate the result.)

test in practice. Go ahead and try it.

Juha Nieminen, Sep 11, 2008
7. ### gpderetta (Guest)

On Sep 11, 6:39 pm, Juha Nieminen <> wrote:
> gpderetta wrote:
> > Why? There is no reason for the compiler not to transform pow(x,
> > <integral-constant>) to the latter form if it were actually faster
> > (and in fact some compilers do).

>
> Some compilers might be able to do that optimization; others can't.
> And if n is a variable, then the compiler cannot optimize it.

And if it is a variable, you can't write an explicit expression either.
You could use a for loop.
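For a run-time exponent, the loop could look like either of the sketches below. The function names are hypothetical, and both assume a non-negative integer exponent. The second one is the square-and-multiply variant, which needs far fewer multiplications for large n.

```cpp
#include <cassert>

// Straightforward loop: n multiplications for exponent n.
double pow_loop(double x, int n) {
    assert(n >= 0);  // negative exponents are out of scope for this sketch
    double result = 1.0;
    for (int i = 0; i < n; ++i) {
        result *= x;
    }
    return result;
}

// Square-and-multiply: about log2(n) squarings plus one multiply per set bit.
double pow_squaring(double x, int n) {
    assert(n >= 0);
    double result = 1.0;
    while (n > 0) {
        if (n % 2 == 1) {
            result *= x;  // this bit of the exponent is set
        }
        x *= x;           // move on to the next power of two
        n /= 2;
    }
    return result;
}
```

The two functions can differ in the last bits of the result for some inputs, because they multiply in a different order.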

> (At most the pow()
> function itself might have optimizations in it, but in my experience it
> doesn't: With most compilers it just generates the FPU opcodes necessary
> to calculate the result.)

Today's hand optimizations are tomorrow's pessimizations. Let the compiler
do its job.

The usual rule applies: use pow, and only if the profiler tells you it is a
bottleneck, try to optimize it by hand.

>
> test in practice. Go ahead and try it.

I had already tried. 'pow(x, 16)' is inlined exactly as four
multiplies, at least with a recent gcc.
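Four multiplies for an exponent of 16 is what repeated squaring gives. The following is a hand-written equivalent for illustration, not actual compiler output, and the function names are made up.

```cpp
#include <cassert>

// x^16 through repeated squaring: four multiplications instead of fifteen.
inline double pow16_by_squaring(double x) {
    const double x2 = x * x;    // x^2
    const double x4 = x2 * x2;  // x^4
    const double x8 = x4 * x4;  // x^8
    return x8 * x8;             // x^16
}

// Sanity check: 2^16 is exactly representable in double.
inline void pow16_self_check() {
    assert(pow16_by_squaring(2.0) == 65536.0);
}
```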

--
gpd

gpderetta, Sep 11, 2008
8. ### Juha Nieminen (Guest)

gpderetta wrote:
>> Some compilers might be able to do that optimization; others can't.
>> And if n is a variable, then the compiler cannot optimize it.

>
> And if it is a variable, you can't write an explicit expression either.
> You could use a for loop.

In my experience even performing a set of multiplications in a loop
while interpreting bytecode can be faster than a single std::pow() call,
up to a certain exponent.

I have written a function parser/interpreter, and in practice, e.g.,
interpreting the function "x*x*x*x" (which it byte-compiles to three
multiplications) is faster than "x^4" (which it byte-compiles to one
std::pow() call). std::pow() can be incredibly slow.

Juha Nieminen, Sep 12, 2008
9. ### Peng Yu (Guest)

On Sep 11, 4:56 pm, gpderetta <> wrote:
> On Sep 11, 6:39 pm, Juha Nieminen <> wrote:
>
> > gpderetta wrote:
> > > Why? There is no reason for the compiler not to transform pow(x,
> > > <integral-constant>) to the latter form if it were actually faster
> > > (and in fact some compilers do).

>
> > Some compilers might be able to do that optimization; others can't.
> > And if n is a variable, then the compiler cannot optimize it.

>
> And if it is a variable, you can't write an explicit expression either.
> You could use a for loop.
>
> > (At most the pow()
> > function itself might have optimizations in it, but in my experience it
> > doesn't: With most compilers it just generates the FPU opcodes necessary
> > to calculate the result.)

>
> Today's hand optimizations are tomorrow's pessimizations. Let the compiler
> do its job.
>
> The usual rule applies: use pow, and only if the profiler tells you it is a
> bottleneck, try to optimize it by hand.
>
>
>
> > test in practice. Go ahead and try it.

>
> I had already tried. 'pow(x, 16)' is inlined exactly as four
> multiplies, at least with a recent gcc.

Would you please let me know the details of the procedure you used to
figure this out? Sometimes I want to know what the compiler compiles
the code to.

Thanks,
Peng

Peng Yu, Sep 13, 2008
10. ### Peng Yu (Guest)

On Sep 13, 4:09 pm, "Alf P. Steinbach" <> wrote:
> * Peng Yu:
>
> > Sometimes I want to know what the compiler compile
> > the code to.

>
> e.g.
>
> g++ -S -masm=intel x.cpp

It is pretty hard to figure out which part of the assembly code is
associated with a given portion of the source code. For example, take the
following C++ and assembly code. How do I figure out where the pow
calls are in the code?

Thanks,
Peng

$ cat main.cc main.s
#include <cmath>
#include <iostream>

int main() {
    double x = 1;
    std::cout << "pox(x, 1) = " << std::pow(x, 1) << std::endl;
    std::cout << "pox(x, 2) = " << std::pow(x, 2) << std::endl;
    std::cout << "pox(x, 3) = " << std::pow(x, 3) << std::endl;
    std::cout << "pox(x, 4) = " << std::pow(x, 4) << std::endl;
    std::cout << "pox(x, 5) = " << std::pow(x, 5) << std::endl;
    std::cout << "pox(x, 6) = " << std::pow(x, 6) << std::endl;
    std::cout << "pox(x, 7) = " << std::pow(x, 7) << std::endl;
    std::cout << "pox(x, 8) = " << std::pow(x, 8) << std::endl;
    std::cout << "pox(x, 9) = " << std::pow(x, 9) << std::endl;
    std::cout << "pox(x, 10) = " << std::pow(x, 10) << std::endl;
    std::cout << "pox(x, 11) = " << std::pow(x, 11) << std::endl;
    std::cout << "pox(x, 12) = " << std::pow(x, 12) << std::endl;
    std::cout << "pox(x, 13) = " << std::pow(x, 13) << std::endl;
    std::cout << "pox(x, 14) = " << std::pow(x, 14) << std::endl;
    std::cout << "pox(x, 15) = " << std::pow(x, 15) << std::endl;
    std::cout << "pox(x, 16) = " << std::pow(x, 16) << std::endl;
}
.file "main.cc"
.intel_syntax
.section .ctors,"aw",@progbits
.align 8
.text
.align 2
.type _Z41__static_initialization_and_destruction_0ii,
@function
_Z41__static_initialization_and_destruction_0ii:
..LFB1504:
push %rbp
..LCFI0:
mov %rbp, %rsp
..LCFI1:
sub %rsp, 16
..LCFI2:
mov DWORD PTR [%rbp-4], %edi
mov DWORD PTR [%rbp-8], %esi
cmp DWORD PTR [%rbp-4], 1
jne .L5
cmp DWORD PTR [%rbp-8], 65535
jne .L5
mov %edi, OFFSET FLAT:_ZSt8__ioinit
call _ZNSt8ios_base4InitC1Ev
mov %edx, OFFSET FLAT:__dso_handle
mov %esi, 0
mov %edi, OFFSET FLAT:__tcf_0
call __cxa_atexit
..L5:
leave
ret
..LFE1504:
.size _Z41__static_initialization_and_destruction_0ii, .-
_Z41__static_initialization_and_destruction_0ii
..globl __gxx_personality_v0
.align 2
.type _GLOBAL__I_main, @function
_GLOBAL__I_main:
..LFB1506:
push %rbp
..LCFI3:
mov %rbp, %rsp
..LCFI4:
mov %esi, 65535
mov %edi, 1
call _Z41__static_initialization_and_destruction_0ii
leave
ret
..LFE1506:
.size _GLOBAL__I_main, .-_GLOBAL__I_main
.align 2
.type __tcf_0, @function
__tcf_0:
..LFB1505:
push %rbp
..LCFI5:
mov %rbp, %rsp
..LCFI6:
sub %rsp, 16
..LCFI7:
mov QWORD PTR [%rbp-8], %rdi
mov %edi, OFFSET FLAT:_ZSt8__ioinit
call _ZNSt8ios_base4InitD1Ev
leave
ret
..LFE1505:
.size __tcf_0, .-__tcf_0
..globl __powidf2
.section .text._ZSt3powdi,"axG",@progbits,_ZSt3powdi,comdat
.align 2
.weak _ZSt3powdi
.type _ZSt3powdi, @function
_ZSt3powdi:
..LFB54:
push %rbp
..LCFI8:
mov %rbp, %rsp
..LCFI9:
sub %rsp, 32
..LCFI10:
movsd QWORD PTR [%rbp-8], %xmm0
mov DWORD PTR [%rbp-12], %edi
mov %edi, DWORD PTR [%rbp-12]
movlpd %xmm0, QWORD PTR [%rbp-8]
call __powidf2
movsd QWORD PTR [%rbp-24], %xmm0
mov %rax, QWORD PTR [%rbp-24]
mov QWORD PTR [%rbp-24], %rax
movlpd %xmm0, QWORD PTR [%rbp-24]
leave
ret
..LFE54:
.size _ZSt3powdi, .-_ZSt3powdi
.section .rodata
..LC1:
.string "pox(x, 1) = "
..LC2:
.string "pox(x, 2) = "
..LC3:
.string "pox(x, 3) = "
..LC4:
.string "pox(x, 4) = "
..LC5:
.string "pox(x, 5) = "
..LC6:
.string "pox(x, 6) = "
..LC7:
.string "pox(x, 7) = "
..LC8:
.string "pox(x, 8) = "
..LC9:
.string "pox(x, 9) = "
..LC10:
.string "pox(x, 10) = "
..LC11:
.string "pox(x, 11) = "
..LC12:
.string "pox(x, 12) = "
..LC13:
.string "pox(x, 13) = "
..LC14:
.string "pox(x, 14) = "
..LC15:
.string "pox(x, 15) = "
..LC16:
.string "pox(x, 16) = "
.text
.align 2
..globl main
.type main, @function
main:
..LFB1496:
push %rbp
..LCFI11:
mov %rbp, %rsp
..LCFI12:
push %rbx
..LCFI13:
sub %rsp, 24
..LCFI14:
movabs %rax, 4607182418800017408
mov QWORD PTR [%rbp-16], %rax
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 1
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC1
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 2
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC2
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 3
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC3
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 4
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC4
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 5
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC5
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 6
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC6
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 7
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC7
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 8
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC8
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 9
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC9
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 10
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC10
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 11
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC11
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 12
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC12
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 13
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC13
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 14
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC14
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 15
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC15
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %rax, QWORD PTR [%rbp-16]
mov %edi, 16
mov QWORD PTR [%rbp-32], %rax
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZSt3powdi
movsd QWORD PTR [%rbp-32], %xmm0
mov %rbx, QWORD PTR [%rbp-32]
mov %esi, OFFSET FLAT:.LC16
mov %edi, OFFSET FLAT:_ZSt4cout
call
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
mov %rdi, %rax
mov QWORD PTR [%rbp-32], %rbx
movlpd %xmm0, QWORD PTR [%rbp-32]
call _ZNSolsEd
mov %rdi, %rax
mov %esi, OFFSET
FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
call _ZNSolsEPFRSoS_E
mov %eax, 0
pop %rbx
leave
ret
..LFE1496:
.size main, .-main
.local _ZSt8__ioinit
.comm _ZSt8__ioinit,1,1
.weakref
.weakref
.weakref
.weakref
.weakref
.weakref
.weakref
.weakref
.weakref
.weakref
.weakref
.weakref
.section .eh_frame,"a",@progbits
..Lframe1:
.long .LECIE1-.LSCIE1
..LSCIE1:
.long 0x0
.byte 0x1
.string "zPR"
.uleb128 0x1
.sleb128 -8
.byte 0x10
.uleb128 0x6
.byte 0x3
.long __gxx_personality_v0
.byte 0x3
.byte 0xc
.uleb128 0x7
.uleb128 0x8
.byte 0x90
.uleb128 0x1
.align 8
..LECIE1:
..LSFDE1:
.long .LEFDE1-.LASFDE1
..LASFDE1:
.long .LASFDE1-.Lframe1
.long .LFB1504
.long .LFE1504-.LFB1504
.uleb128 0x0
.byte 0x4
.long .LCFI0-.LFB1504
.byte 0xe
.uleb128 0x10
.byte 0x86
.uleb128 0x2
.byte 0x4
.long .LCFI1-.LCFI0
.byte 0xd
.uleb128 0x6
.align 8
..LEFDE1:
..LSFDE3:
.long .LEFDE3-.LASFDE3
..LASFDE3:
.long .LASFDE3-.Lframe1
.long .LFB1506
.long .LFE1506-.LFB1506
.uleb128 0x0
.byte 0x4
.long .LCFI3-.LFB1506
.byte 0xe
.uleb128 0x10
.byte 0x86
.uleb128 0x2
.byte 0x4
.long .LCFI4-.LCFI3
.byte 0xd
.uleb128 0x6
.align 8
..LEFDE3:
..LSFDE5:
.long .LEFDE5-.LASFDE5
..LASFDE5:
.long .LASFDE5-.Lframe1
.long .LFB1505
.long .LFE1505-.LFB1505
.uleb128 0x0
.byte 0x4
.long .LCFI5-.LFB1505
.byte 0xe
.uleb128 0x10
.byte 0x86
.uleb128 0x2
.byte 0x4
.long .LCFI6-.LCFI5
.byte 0xd
.uleb128 0x6
.align 8
..LEFDE5:
..LSFDE7:
.long .LEFDE7-.LASFDE7
..LASFDE7:
.long .LASFDE7-.Lframe1
.long .LFB54
.long .LFE54-.LFB54
.uleb128 0x0
.byte 0x4
.long .LCFI8-.LFB54
.byte 0xe
.uleb128 0x10
.byte 0x86
.uleb128 0x2
.byte 0x4
.long .LCFI9-.LCFI8
.byte 0xd
.uleb128 0x6
.align 8
..LEFDE7:
..LSFDE9:
.long .LEFDE9-.LASFDE9
..LASFDE9:
.long .LASFDE9-.Lframe1
.long .LFB1496
.long .LFE1496-.LFB1496
.uleb128 0x0
.byte 0x4
.long .LCFI11-.LFB1496
.byte 0xe
.uleb128 0x10
.byte 0x86
.uleb128 0x2
.byte 0x4
.long .LCFI12-.LCFI11
.byte 0xd
.uleb128 0x6
.byte 0x4
.long .LCFI14-.LCFI12
.byte 0x83
.uleb128 0x3
.align 8
..LEFDE9:
.ident "GCC: (GNU) 4.1.2 20061115 (prerelease) (Debian
4.1.1-21)"
.section .note.GNU-stack,"",@progbits

Peng Yu, Sep 14, 2008
11. ### Kai-Uwe Bux (Guest)

Peng Yu wrote:

> On Sep 13, 4:09 pm, "Alf P. Steinbach" <> wrote:
>> * Peng Yu:
>>
>> > Sometimes I want to know what the compiler compile
>> > the code to.

>>
>> e.g.
>>
>> g++ -S -masm=intel x.cpp

>
> It is pretty hard to figure out what part of assembly code is
> associated with a give portion of source code.

Nobody said it would be easy.

> For example, the
> following C++ and assembly code. How do I figure out where the pow
> functions are at in the code?

[snip]

Did you try modifying one line of code and seeing which portion of the
assembly changed?

Best

Kai-Uwe Bux

Kai-Uwe Bux, Sep 14, 2008
12. ### Peng Yu (Guest)

On Sep 13, 7:50 pm, Kai-Uwe Bux <> wrote:
> Peng Yu wrote:
> > On Sep 13, 4:09 pm, "Alf P. Steinbach" <> wrote:
> >> * Peng Yu:

>
> >> > Sometimes I want to know what the compiler compile
> >> > the code to.

>
> >> e.g.

>
> >> g++ -S -masm=intel x.cpp

>
> > It is pretty hard to figure out what part of assembly code is
> > associated with a give portion of source code.

>
> Nobody said, it would be easy.
>
> > For example, the
> > following C++ and assembly code. How do I figure out where the pow
> > functions are at in the code?

>
> [snip]
>
> Did you try modifying one line of code and seeing which portion of the
> assembly changed?

g++ -O3 -S -masm=intel main.cc

I compiled the code and a variant of it, and got the difference with
diff. But I still have difficulty understanding what it does. Is there
a way to annotate the assembly code with the C++ source?

Thanks,
Peng

Peng Yu, Sep 14, 2008
13. ### Erik Wikström (Guest)

On 2008-09-13 23:06, Peng Yu wrote:
> On Sep 11, 4:56 pm, gpderetta <> wrote:
>> On Sep 11, 6:39 pm, Juha Nieminen <> wrote:
>>
>> > gpderetta wrote:
>> > > Why? There is no reason for the compiler not to transform pow(x,
>> > > <integral-constant>) to the latter form if it were actually faster
>> > > (and in fact some compilers do).

>>
>> > Some compilers might be able to do that optimization; others can't.
>> > And if n is a variable, then the compiler cannot optimize it.

>>
>> And if it is a variable, you can't write an explicit expression either.
>> You could use a for loop.
>>
>> > (At most the pow()
>> > function itself might have optimizations in it, but in my experience it
>> > doesn't: With most compilers it just generates the FPU opcodes necessary
>> > to calculate the result.)

>>
>> Today's hand optimizations are tomorrow's pessimizations. Let the compiler
>> do its job.
>>
>> The usual rule applies: use pow, and only if the profiler tells you it is a
>> bottleneck, try to optimize it by hand.
>>
>>
>>
>> > test in practice. Go ahead and try it.

>>
>> I had already tried. 'pow(x, 16)' is inlined exactly as four
>> multiplies, at least with a recent gcc.

>
> Would you please let me know the details the procedure on how you
> figure this out? Sometimes I want to know what the compiler compile
> the code to.

In Visual Studio you can run the program in the debugger and then bring
up the assembly code; you can step through it and switch back and forth
between the source code and the assembly code. I would imagine you
can do similar things in other IDEs and in gdb.

--
Erik Wikström

Erik Wikström, Sep 14, 2008
14. ### Ian Collins (Guest)

Peng Yu wrote:
> On Sep 13, 7:50 pm, Kai-Uwe Bux <> wrote:
>> Peng Yu wrote:
>>> On Sep 13, 4:09 pm, "Alf P. Steinbach" <> wrote:
>>>> * Peng Yu:
>>>>> Sometimes I want to know what the compiler compile
>>>>> the code to.
>>>> e.g.
>>>> g++ -S -masm=intel x.cpp
>>> It is pretty hard to figure out what part of assembly code is
>>> associated with a give portion of source code.

>> Nobody said, it would be easy.
>>
>>> For example, the
>>> following C++ and assembly code. How do I figure out where the pow
>>> functions are at in the code?

>> [snip]
>>
>> Did you try modifying one line of code and seeing which portion of the
>> assembly changed?

>
> g++ -O3 -S -masm=intel main.cc
>
> I tried to compile the code and the variant of it. And I got the
> difference by diff. But I still have difficulty to understand what it
> does. Is there a way to annotate the C++ code in the assembly code?
>

Some compilers (Sun CC for example) do so by default. If you are
working on Solaris or Linux, give it a try.

--
Ian Collins.

Ian Collins, Sep 14, 2008
15. ### Peng Yu (Guest)

On Sep 14, 4:37 am, Erik Wikström <> wrote:
> On 2008-09-13 23:06, Peng Yu wrote:
>
>
>
> > On Sep 11, 4:56 pm, gpderetta <> wrote:
> >> On Sep 11, 6:39 pm, Juha Nieminen <> wrote:

>
> >> > gpderetta wrote:
> >> > > Why? There is no reason for the compiler not to transform pow(x,
> >> > > <integral-constant>) to the latter form if it were actually faster
> >> > > (and in fact some compilers do).

>
> >> > Some compilers might be able to do that optimization; others can't.
> >> > And if n is a variable, then the compiler cannot optimize it.

>
> >> And if it is a variable, you can't write an explicit expression either.
> >> You could use a for loop.

>
> >> > (At most the pow()
> >> > function itself might have optimizations in it, but in my experience it
> >> > doesn't: With most compilers it just generates the FPU opcodes necessary
> >> > to calculate the result.)

>
> >> Today's hand optimizations are tomorrow's pessimizations. Let the compiler
> >> do its job.

>
> >> The usual rule applies: use pow, and only if the profiler tells you it is a
> >> bottleneck, try to optimize it by hand.

>
> >> > test in practice. Go ahead and try it.

>
> >> I had already tried. 'pow(x, 16)' is inlined exactly as four
> >> multiplies, at least with a recent gcc.

>
> > Would you please let me know the details the procedure on how you
> > figure this out? Sometimes I want to know what the compiler compile
> > the code to.

>
> In Visual Studio you can run the program in the debugger and then bring
> up the assembly code and it will show you can step through it and switch
> back and forth between the code and assembly code. I would imagine you
> can do similar things in other IDEs and in gdb.

Will there be a problem if I use the -O3 option? The source code and the
assembly code might not have a one-to-one relationship.

Thanks,
Peng

Peng Yu, Sep 14, 2008
16. ### Erik Wikström (Guest)

On 2008-09-14 15:47, Peng Yu wrote:
> On Sep 14, 4:37 am, Erik Wikström <> wrote:
>> On 2008-09-13 23:06, Peng Yu wrote:

>> > On Sep 11, 4:56 pm, gpderetta <> wrote:
>> >> On Sep 11, 6:39 pm, Juha Nieminen <> wrote:

>> >> Today's hand optimizations are tomorrow's pessimizations. Let the compiler
>> >> do its job.

>>
>> >> The usual rule applies: use pow, and only if the profiler tells you it is a
>> >> bottleneck, try to optimize it by hand.

>>
>> >> > test in practice. Go ahead and try it.

>>
>> >> I had already tried. 'pow(x, 16)' is inlined exactly as four
>> >> multiplies, at least with a recent gcc.

>>
>> > Would you please let me know the details the procedure on how you
>> > figure this out? Sometimes I want to know what the compiler compile
>> > the code to.

>>
>> In Visual Studio you can run the program in the debugger and then bring
>> up the assembly code and it will show you can step through it and switch
>> back and forth between the code and assembly code. I would imagine you
>> can do similar things in other IDEs and in gdb.

>
> Shall there be a problem if I use -O3 option? The source code and the
> assembly code might have one-one relationship.

There might be a problem if the compiler decides to remove some code
completely, otherwise no.
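One case where code can disappear is a test program whose input is a literal, such as the `double x = 1;` in main.cc: an optimizer is allowed to evaluate the power at compile time. The sketch below shows one common way to keep the computation alive, assuming you only want to inspect the generated code. The helper names `opaque` and `pow16_of_runtime_value` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the value is read back through a volatile object, so the
// compiler must perform a real load and cannot treat it as a known constant.
inline double opaque(double value) {
    volatile double v = value;
    return v;
}

// With a literal input the optimizer may fold std::pow(x, 16) into a constant,
// leaving no pow-related instructions to inspect. Routing the input through
// opaque() keeps the computation in the generated code.
double pow16_of_runtime_value(double value) {
    const double x = opaque(value);
    assert(x == value);  // the round trip keeps the value (for non-NaN input)
    return std::pow(x, 16);
}
```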

--
Erik Wikström

Erik Wikström, Sep 14, 2008
17. ### Sherm Pendley (Guest)

Peng Yu <> writes:

> g++ -O3 -S -masm=intel main.cc
>
> I tried to compile the code and the variant of it. And I got the
> difference by diff. But I still have difficulty to understand what it
> does. Is there a way to annotate the C++ code in the assembly code?

Yes, and I'd also disable optimization, which can make the generated
asm code more difficult to follow.

g++ -S -masm=intel -fverbose-asm main.cc

sherm--

--
My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net

Sherm Pendley, Sep 14, 2008
18. ### Peng Yu (Guest)

On Sep 14, 10:49 am, Sherm Pendley <> wrote:
> Peng Yu <> writes:
> > g++ -O3 -S -masm=intel main.cc

>
> > I tried to compile the code and the variant of it. And I got the
> > difference by diff. But I still have difficulty to understand what it
> > does. Is there a way to annotate the C++ code in the assembly code?

>
> Yes, and I'd also disable optimization, which can make the generated
> asm code more difficult to follow.
>
> g++ -S -masm=intel -fverbose-asm main.cc

But I have to enable optimization, because I want to know how the
compiler optimizes std::pow(x, n).

Thanks,
Peng

Peng Yu, Sep 14, 2008
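One way to keep optimization enabled and still find the relevant instructions is to isolate each expression in a tiny function with an easily searchable symbol name. This is only a sketch: the file name probe.cc and the function names are assumptions, while the compiler flags are the ones already mentioned in this thread.

```cpp
#include <cmath>

// Hypothetical probe functions. extern "C" turns off C++ name mangling, so the
// labels in the generated assembly are literally pow16_probe and mul16_probe
// and can be found with a plain text search.
extern "C" double pow16_probe(double x) {
    return std::pow(x, 16);
}

extern "C" double mul16_probe(double x) {
    return x * x * x * x * x * x * x * x
         * x * x * x * x * x * x * x * x;
}

// Compile with, for example:
//   g++ -O3 -S -masm=intel -fverbose-asm probe.cc
// and compare the bodies under the two labels in probe.s.
```

Because x is a function parameter, the compiler cannot fold the result into a constant, and the body of each probe shows how that particular expression was compiled.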