Help in optimizing branches

John Malek · Sep 30, 2003

Hi,

I ran a profiler against this complex app that I'm trying to optimize.
This is an application I'm writing to test image processing. Even
though it does a lot of computation, two simple lines take 30% of the
running time! Both of these lines are from Intel's OpenCV library.
Note that mhi, mask8u, and mask are arrays with one entry per pixel in
a 640x480 image.

If anyone has any hints on how to optimize this, it would be greatly
appreciated.

Thanks,
John

const int cts = (int&)ts;
for( y = 0; y < mhi->rows; y++ )
{
int* mhi_row = (int*)(mhi->data.ptr + y*mhi->step);
uchar* mask8u_row = mask8u->data.ptr + (y+1)*mask8u->step + 1;

for( x = 0; x < mhi->cols; x++ )
if( mhi_row[x] == cts && mask8u_row[x] == 0 )

THE LINE ABOVE TAKES 20% of the time

uchar* mask_row = mask->data.ptr + mask->step;

for( i = 1; i <= size.height; i++, mask_row += mask->step )
mask_row[0] = mask_row[size.width+1] = (uchar)1;

THE LINE ABOVE TAKES 10% of the time

Gianni Mariani · Oct 1, 2003

John said:
Hi,

I ran a profiler against this complex app that I'm trying to optimize.
This is an application I'm writing to test image processing. Even
though it does a lot of computation, two simple lines take 30% of the
running time! Both of these lines are from Intel's OpenCV library.
Note that mhi, mask8u, and mask are arrays with one entry per pixel in
a 640x480 image.

If anyone has any hints on how to optimize this, it would be greatly
appreciated.

This question is off-topic for comp.std.c++ - try comp.programming.

However, start by posting the whole routine or send us a pointer to the
library.

As for some possible answers:

You could try folding some constants out of the loop but it's possible
that the optimizer has already done this. The other thing is loop
unrolling. Finally you need to look at cache.

const int cts = (int&)ts;
for( y = 0; y < mhi->rows; y++ )
{
int* mhi_row = (int*)(mhi->data.ptr + y*mhi->step);
uchar* mask8u_row = mask8u->data.ptr + (y+1)*mask8u->step + 1;

for( x = 0; x < mhi->cols; x++ )
if( mhi_row[x] == cts && mask8u_row[x] == 0 )

THE LINE ABOVE TAKES 20% of the time

uchar* mask_row = mask->data.ptr + mask->step;

add:
uchar* mask_row_plus_width_plus_1 = mask_row + size.width+1;
int step = mask->step;

for( i = 1; i <= size.height; i++, mask_row += mask->step )

// comparison with 0 is faster - i is only used to limit the loop
// count so you can reverse the loop.
replace:

for(
i = size.height;
i > 0; // i > 0, not i >= 0, so the loop still runs size.height times
i--,
mask_row += step,
mask_row_plus_width_plus_1 += step
)
// you could theoretically also unroll this loop easily

{

mask_row[0] = mask_row[size.width+1] = (uchar)1;

replace:
mask_row_plus_width_plus_1[0] = (uchar)1;
mask_row[0] = (uchar)1;

THE LINE ABOVE TAKES 10% of the time

Both of these look like cache thrashers, depending on the value of
"step". The usual cache optimization is to restructure the algorithm so
that it works on a limited memory footprint, one "block" at a time,
hence the name "blocking". This may require a change in the data
structure, which is why many image algorithms work with "tiles".
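
A minimal sketch of the blocking idea, assuming a plain row-major 8-bit image rather than OpenCV's own matrix type. The function name `count_zero_tiled` and the default tile size of 64 are hypothetical, chosen only for illustration. The loops visit the image one tile at a time, so everything touched in the inner loops lies within a single tile:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: counts zero pixels, visiting the image in
// tile x tile blocks instead of full rows.
// `step` is the distance in bytes between the starts of consecutive rows.
std::size_t count_zero_tiled(const unsigned char* data, int rows, int cols,
                             std::size_t step, int tile = 64)
{
    std::size_t zeros = 0;
    for (int ty = 0; ty < rows; ty += tile) {
        const int y_end = std::min(ty + tile, rows);
        for (int tx = 0; tx < cols; tx += tile) {
            const int x_end = std::min(tx + tile, cols);
            // Everything touched in here lies within one tile.
            for (int y = ty; y < y_end; ++y) {
                const unsigned char* row = data + y * step;
                for (int x = tx; x < x_end; ++x)
                    zeros += (row[x] == 0);
            }
        }
    }
    return zeros;
}
```

Whether this helps depends on the access pattern. A single pass over row-major data is already sequential, so blocking tends to pay off mainly when the same region is visited several times or walked column-wise; measuring is the only way to know.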

Nick Savoiu · Oct 1, 2003

John Malek said:
Hi,

I ran a profiler against this complex app that I'm trying to optimize.
This is an application I'm writing to test image processing. Even
though it does a lot of computation, two simple lines take 30% of the
running time! Both of these lines are from Intel's OpenCV library.
Note that mhi, mask8u, and mask are arrays with one entry per pixel in
a 640x480 image.

If anyone has any hints on how to optimize this, it would be greatly
appreciated.

John,

I don't think you can expect much improvement.

Those lines are in nested loops and will get executed many times. You can
try some constant propagation and loop-invariant code motion by hand, but
the compiler can probably do that too if the code's not too complicated.

Nick

Dan McLeran · Oct 1, 2003

John Malek said:
If anyone has any hints on how to optimize this, it would be greatly
appreciated.

Thanks,
John

const int cts = (int&)ts;
for( y = 0; y < mhi->rows; y++ )
{
int* mhi_row = (int*)(mhi->data.ptr + y*mhi->step);
uchar* mask8u_row = mask8u->data.ptr + (y+1)*mask8u->step + 1;

for( x = 0; x < mhi->cols; x++ )
if( mhi_row[x] == cts && mask8u_row[x] == 0 )

THE LINE ABOVE TAKES 20% of the time

uchar* mask_row = mask->data.ptr + mask->step;

for( i = 1; i <= size.height; i++, mask_row += mask->step )
mask_row[0] = mask_row[size.width+1] = (uchar)1;

THE LINE ABOVE TAKES 10% of the time

I'm wondering if moving the pointer dereference out of the loop logic
would help. I assume that mhi->cols is not changing within the loop?
Try moving this out of the for loop. The compiler may already be doing
this for you. A quick look at the asm output would tell whether this
could buy you some time.

Ron Natalie · Oct 1, 2003

John Malek said:
int* mhi_row = (int*)(mhi->data.ptr + y*mhi->step);

Just what is mhi->data.ptr? Hopefully, it's something that converts
to int* nicely.

if( mhi_row[x] == cts && mask8u_row[x] == 0 )

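Depending on the platform, using pointers might be faster:
if (*mhi_row++ == cts && *mask8u_row++ == 0)
Because && short-circuits, the second increment is skipped whenever the
first comparison fails, so the increments probably belong in the loop
header; where exactly depends on what the rest of the code does with
these values.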
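
A minimal sketch of the pointer form with both increments placed in the loop header, so the two pointers always stay in step. `count_row` is a hypothetical helper, and counting matches stands in for the loop body that was not posted:

```cpp
// Hypothetical helper: counts positions in one row where the mhi value
// equals cts and the mask byte is 0.
int count_row(const int* mhi_row, const unsigned char* mask_row,
              int cols, int cts)
{
    int count = 0;
    const int* const end = mhi_row + cols;
    // Both pointers advance in the loop header. Writing
    // `*mhi_row++ == cts && *mask_row++ == 0` instead would skip the
    // second increment whenever the first comparison fails.
    for (; mhi_row != end; ++mhi_row, ++mask_row)
        if (*mhi_row == cts && *mask_row == 0)
            ++count;
    return count;
}
```

On many current compilers the indexed and pointer forms compile to the same code, so any gain here is platform-dependent.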

John Malek said:
uchar* mask_row = mask->data.ptr + mask->step;

for( i = 1; i <= size.height; i++, mask_row += mask->step )
mask_row[0] = mask_row[size.width+1] = (uchar)1;

You might precompute size.width+1 into a local variable rather than performing
the addition each time; it looks invariant to the loop. There's no reason to cast
the literal 1.
