Which code performs better?


aryan

Hi,

1. which book would you recommend to study performance tuning in c++?

2. which of the functions f1() and f2() perform better?
struct xxx
{
    int a;
    int b;
};

struct yyy
{
    xxx xList[10];
};

void f1()
{
    yyy yList1;
    yyy yList2;
    for (int j =0 ; j = 10 ; j++)
    {
        if (yList1.xList[j].a < yList2.xList[j].a
            and
            yList1.xList[j].b > yList2.xList[j].b)

            doSomething(yList1.xList[j].a, yList2.xList
            [j].b);
    }
}

//OR

void f2()
{
    yyy yList1;
    yyy yList2;

    for (int j =0 ; j = 10 ; j++)
    {
        xxx &x1=yList1.xList[j],
            &x2=yList2.xList[j];

        if (x1.a < x2.a
            and
            x1.b > x2.b)

            doSomething(x1.a, x2.b)
    }
}

thanks.
 

Jonathan Lee

aryan said:
1. which book would you recommend to study performance tuning in c++?

No real opinion on this. If I'm optimizing I look up books
on the processor, not the language. Or generic concepts
like cache optimization. Even with this I basically hit
the University library and just pull stuff off the shelves.
aryan said:
2. which of the functions f1() and f2() perform better?

I think a compiler will automatically turn the first example
into the second. This hinges on the fact that in your
example xList is an array. If xList were a class and
operator[] was overloaded, then the first version will
likely (surely?) generate the extra calls.

In any event, I don't see how the second version could be
slower. It might be a little odd to read for some, but
where I've used it myself it was because it was easier
to read.

--Jonathan
 

Carlo Milanesi

aryan said:
1. which book would you recommend to study performance tuning in c++?

http://en.wikibooks.org/wiki/Optimizing_C++

aryan said:
2. which of the functions f1() and f2() perform better?
for (int j =0 ; j = 10 ; j++)

I assume you meant:
for (int j = 0; j < 10; j++)

aryan said:
doSomething(yList1.xList[j].a, yList2.xList
[j].b);

I assume you meant:
doSomething(yList1.xList[j].a, yList2.xList[j].b);

Then I think there should be little or no difference, though it is not
really predictable.
 

joseph cook

Carlo Milanesi wrote:
Then I think there should be little or no difference, though it is not
really predictable.

Interestingly enough, g++ 3.4.6 on my platform (a flavor of Linux)
spits out assembly that is considerably worse for the second example.
I tend to trust the compiler in these cases, and feel that it must be
slower for some reason. I just haven't figured out why yet..

Joe
 

Balog Pal

aryan said:
if(yList1.xList[j].a < yList2.xList[j].a
and
yList1.xList[j].b > yList2.xList[j].b)

doSomething(yList1.xList[j].a, yList2.xList
[j].b);

//OR

xxx &x1=yList1.xList[j],
&x2=yList2.xList[j];

if(x1.a < x2.a and x1.b > x2.b)
doSomething(x1.a, x2.b)

The second definitely reads better, and will perform either better or the
same.

Making common subexpressions explicit is a good thing if you can give them a
meaningful name. Even if you can't, it is still often the better choice.

Unfortunately, with current C++ the price is that you must repeat and
maintain the type. In the next version that problem goes away, as you can
write auto.

Don't forget to add const unless you actually want mutation.
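
For illustration, a minimal sketch of the extraction with const, using the
OP's struct xxx / yyy and assuming a doSomething(int, int) declared
somewhere; f2c and the auto comment are illustrative, not from the original
code:

void doSomething(int a, int b);    // assumed to exist, as in the OP's code

void f2c()
{
    yyy yList1;
    yyy yList2;

    for (int j = 0; j < 10; j++)
    {
        // Named common subexpressions; const documents (and enforces)
        // that the loop body does not modify the elements.
        const xxx &x1 = yList1.xList[j];
        const xxx &x2 = yList2.xList[j];

        // With the next standard's 'auto' the type would not need to be
        // repeated:
        //     const auto &x1 = yList1.xList[j];

        if (x1.a < x2.a and x1.b > x2.b)
            doSomething(x1.a, x2.b);
    }
}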
 

Balog Pal

Jonathan Lee said:
I think a compiler will automatically turn the first example
into the second.

It will, if:
- it recognizes all the common subexpressions
- it can prove that the accessed elements cannot change

In this particular case that will likely happen, but in a more general
situation it is too easy to have function calls in between. Or the
expression can be a call itself: if the array were a collection with
operator[], the compiler may not know that it returns the same thing when
passed the same i or j value...
Jonathan Lee said:
This hinges on the fact that in your
example xList is an array. If xList were a class and
operator[] was overloaded, then the first version will
likely (surely?) generate the extra calls.

If the calls are inline, or the class is in std::, the knowledge may be
there.

But if the coder extracts it, there is no doubt left for either the compiler
or the next reader. So it has good value. DRY.
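
For illustration, a hedged sketch of that operator[] case; Bag and its
out-of-line operator[] are made up for this example, not from the original
post:

void doSomething(int a, int b);   // as in the OP's code

struct Bag
{
    xxx &operator[](int i);   // defined in another translation unit,
                              // so the compiler cannot see its body
};

void g(Bag &b1, Bag &b2, int j)
{
    // Written inline, b1[j] and b2[j] may each be called twice, because
    // the compiler cannot prove the second call returns the same element:
    //
    //     if (b1[j].a < b2[j].a and b1[j].b > b2[j].b)
    //         doSomething(b1[j].a, b2[j].b);

    // Extracting the common subexpressions removes that doubt for the
    // compiler and for the next reader alike:
    xxx &x1 = b1[j];
    xxx &x2 = b2[j];
    if (x1.a < x2.a and x1.b > x2.b)
        doSomething(x1.a, x2.b);
}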
 

joseph cook

Balog Pal said:
xxx &x1=yList1.xList[j],
    &x2=yList2.xList[j];
if(x1.a < x2.a and x1.b > x2.b)
    doSomething(x1.a, x2.b)

The second definitely reads better, and will perform either better or the
same.

That's a rather bold statement to make after my posting showing an
example where the second performs worse.
Joe
 

Balog Pal

joseph cook said:
Interestingly enough, g++ 3.4.6 on my platform (a flavor of Linux)
spits out assembly that is considerably worse for the second example.
I tend to trust the compiler in these cases, and feel that it must be
slower for some reason. I just haven't figured out why yet..

With optimizations turned on? Then it must be seriously broken.

Without optimization it may aim for debugger-friendly code that keeps
temporary variables around and lays out the code just like the source.
 

joseph cook

Balog Pal said:
With optimizations turned on? Then it must be seriously broken.

Without optimization it may aim for debugger-friendly code that keeps
temporary variables around and lays out the code just like the source.

Yes, with optimization, -O3. I think it's too quick to call g++ 3.4.6
"seriously broken", but OK.

Joe
 

Balog Pal

joseph cook said:
That's a rather bold statement to make after my posting showing an
example where the second performs worse.

Could you post the compiler settings and assembly output please?

I certainly meant it with a compiler that does the job in optimization -- if
it generally can't optimize, then you won't have well-performing code
anyway, and if that is an actual aim, the fix is to get a good compiler
instead of shuffling code around and hoping.
 

joseph cook

Balog Pal said:
Could you post the compiler settings and assembly output please?

Sure. It looks like it is making copies in f2. As I said, I don't
understand why at this point, so perhaps it is a compiler issue.

Linux i686
g++ 3.4.6
g++ -O3 -S

F1:

.LFB2:
        pushl   %ebp
.LCFI0:
        movl    %esp, %ebp
.LCFI1:
        pushl   %ebx
.LCFI2:
        subl    $164, %esp
.LCFI3:
        xorl    %ebx, %ebx
        jmp     .L6
        .p2align 2,,3
.L4:
        incl    %ebx
        cmpl    $9, %ebx
        jg      .L12
        cmpl    $9, %ebx
        jg      .L12
.L6:
        movl    -88(%ebp,%ebx,8), %eax
        cmpl    -168(%ebp,%ebx,8), %eax
        jge     .L4
        movl    -164(%ebp,%ebx,8), %edx
        cmpl    %edx, -84(%ebp,%ebx,8)
        jle     .L4
        subl    $8, %esp
        pushl   %edx
        pushl   %eax
        incl    %ebx
.LCFI4:
        call    _Z3fooii
        addl    $16, %esp
        cmpl    $9, %ebx
        jle     .L6
.L12:
        movl    -4(%ebp), %ebx
        leave
        ret


F2:
.LFB3:
        pushl   %ebp
.LCFI5:
        movl    %esp, %ebp
.LCFI6:
        pushl   %edi
.LCFI7:
        pushl   %esi
.LCFI8:
        pushl   %ebx
.LCFI9:
        subl    $172, %esp
.LCFI10:
        xorl    %ebx, %ebx
        leal    -104(%ebp), %edi
        xorl    %ebx, %ebx
        leal    -104(%ebp), %edi
        leal    -184(%ebp), %esi
        jmp     .L18
        .p2align 2,,3
.L16:
        incl    %ebx
        cmpl    $9, %ebx
        jg      .L23
.L18:
        leal    0(,%ebx,8), %eax
        leal    (%edi,%eax), %edx
        movl    (%edx), %ecx
        leal    (%esi,%eax), %eax
        cmpl    (%eax), %ecx
        jge     .L16
        movl    4(%eax), %eax
        cmpl    %eax, 4(%edx)
        jle     .L16
        subl    $8, %esp
        pushl   %eax
        pushl   %ecx
        incl    %ebx
        pushl   %ecx
        incl    %ebx
.LCFI11:
        call    _Z3fooii
        addl    $16, %esp
        cmpl    $9, %ebx
        jle     .L18
.L23:
        leal    -12(%ebp), %esp
        popl    %ebx
        popl    %esi
        popl    %edi
        leave
        ret
 

Balog Pal

joseph cook said:
Yes, with optimization, -O3. I think it's too quick to call g++ 3.4.6
"seriously broken", but OK.

I didn't look too much at gcc(-based) output, but on those few occasions
it didn't impress me (version 2.96 for x86 and 3.4.6 for x64 -> mediocre,
MPLAB C30 for PIC24 -> utter shit I'd never call optimized).

Also we recently discussed the code here:

http://shape-of-code.coding-guidelines.com/2009/06/searching-for-the-source-line-implementing-3n1/

expressions
n = n + n + n + 1 ;
n += n + n + 1;
n = (n << 1) + n + 1;
n += (n << 1) + 1;
n *= 3; n++;
t = (n << 1) ; n = t + n + 1;
n = (n << 2) - n + 1;

gcc 4.x produced different code for the last line, which a healthy optimizer
shouldn't do.

"seriously broken" may not be a good term, feel free to exchange to anything
you feel politically correct... ;-)
 

Balog Pal

joseph cook said:
Sure. It looks like it is making copies in f2. As I said, I don't
understand why at this point, so perhaps it is a compiler issue.

[code in original]

Interesting case... The first version uses the fact that the two source
objects live close together at a known offset. In the second it assigned
two variables and lost that information. Since references were defined, it
could have gone with a single register and addressed the other with an
offset.

Also look at this portion:

xorl %ebx, %ebx
leal -104(%ebp), %edi
xorl %ebx, %ebx
leal -104(%ebp), %edi
leal -184(%ebp), %esi

This is utter nonsense. (I see a ton of similar in my PIC output :-( )
The first two instructions are excess and should not be there at all in any
case.

This is the final setting of esi and edi -- the optimizer could remember the
original expressions and "inline" them too.

This section:

leal 0(,%ebx,8), %eax
leal (%edi,%eax), %edx
movl (%edx), %ecx
leal (%esi,%eax), %eax
cmpl (%eax), %ecx

is utter nuts too -- with the code above it should just be something like

movl (%edi), %eax
cmpl (%esi), %eax

and then the next occurrence is the same with a +4 offset.

Err, no, sorry, it is too late ;-) The issue is worse -- this nut did NOT
use esi and edi to hold the values of references x1 and x2; they hold the
addresses of yList1.xList and yList2.xList. Why on earth? So the code would
be

movl (%edi,%ebx,8), %eax
cmpl (%esi,%ebx,8), %eax

even without inlining the original ebp expression.

Or, if it is obsessed with calculating the offset in eax:

leal 0(,%ebx,8), %eax
movl (%edi,%eax), %ecx
cmpl (%esi,%eax), %ecx

Generating those separate leal-s just to trash a register and waste an
instruction is nuts.

This unnecessary juggling of registers seems to be prevalent in gcc output,
and I'd rather stay with my opinion of "seriously broken", be it bold or not.
 

joseph cook

<snip>

The link you provided (and this analysis) have proven very
interesting. I'll have to think a bit more about the impact of the
"doSomething()" function, as it may change the values in the (global)
xxx and yyy objects. (I called doSomething() "foo()" in the assembly
output.) So the compiler does need to reload those values into
registers frequently.
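
For illustration, a rough sketch of that situation -- global arrays plus an
opaque call; f1_globals and the declarations here are illustrative, not the
actual test source:

yyy yList1;   // globals: any outside function could modify them
yyy yList2;

void doSomething(int a, int b);   // opaque: defined in another file

void f1_globals()
{
    for (int j = 0; j < 10; j++)
    {
        if (yList1.xList[j].a < yList2.xList[j].a
            and yList1.xList[j].b > yList2.xList[j].b)
            // The compiler must assume this call can change yList1 and
            // yList2, so values read before it cannot stay cached in
            // registers across the call.
            doSomething(yList1.xList[j].a, yList2.xList[j].b);
    }
}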

Thank you,
Joe Cook
 
