# Optimizing a function?


#### carl

I have a function that gets called 512*512*350 times. In this function
there is a loop that runs each time the function is called:

void myFun(VectorType point) {
    std::vector<MyObj>& myobjs = this->FindMyObjs(point);
    int num = myobjs.size();

    for (int i = 0; i < num; i++) {
        // Do stuff with each neighbor
    }
}

Where FindMyObjs(point) is a function that looks up the point in:

std::map<int, std::vector<MyObj>> m_myMap;

which has been created before the program is run:

std::vector<MyObj> FindMyObjs(VectorType point) {
    int key = computeKey(point);
    return m_myMap[key];
}

The computeKey(point) looks like this:

unsigned int computeKey(VectorType location) {
    unsigned int key = 0;
    int x_dir = floor(location[0] / m_cellSize[0]);
    int y_dir = floor(location[1] / m_cellSize[1]) * m_split;
    int z_dir = floor(location[2] / m_cellSize[2]) * m_split * m_split;
    key = x_dir + y_dir + z_dir;
    return key;
}

The number of elements in each std::vector<MyObj> is fixed. When the
vectors contain 216 elements the program takes around 50 seconds to
complete (on a 3 GHz machine with 6 GB RAM, with code compiled for
release using Visual Studio 2008).

I have tried to remove the code from the loop body, but it has almost no
effect on the computation time.

Am I missing some very basic C++ optimization here, or is the program
not meant to run faster on this machine?

I have made sure that only references to the already existing data
structures are used. Could it be computeKey (3 floor operations) that is
expensive when called so many times?


#### Alf P. Steinbach

* carl:
[snip]

Am I missing some very basic C++ optimization here, or is the program
not meant to run faster on this machine?

Try an array.

Cheers & hth.,

- Alf


#### carl

Alf P. Steinbach said:
[snip]

Try an array.


Why should that be a solution? I only create the data once before running
the actual algorithm and all lookups are made using references (pointers).


#### carl

Eric "Böse-Wolf" said:
carl said:
Alf P. Steinbach said:
* carl:
return m_myMap[key];

This operation has a complexity of ln(number of elements in the map) /
ln(2).

The number of elements (keys) in the map is never larger than 100.
lg(100) ≈ 7 operations, so I don't think the overhead comes from using the map.

the map is at most 216
The same operation applied to a vector is constant time.

Find is linear in a vector and O(log n) for a map; that's why I use the
map.


#### Jonathan Lee

I have made sure that only references to the already existing data
structures are used. Could it be the computeKey (3 floor operations) that
is expensive when called so many times?

For testing purposes you could try extending VectorType to include the
key field, and store the key for each point with the rest of the data.
You should only need to calculate the key once for each object in that
case. If your timing goes down significantly, there you go.

If not, then you probably should reconsider how you've organized the
data. Is it possible to iterate over your map instead of "find"-ing
points one by one?

In any event you should run it through Valgrind or similar to find out
where the performance hit is. Otherwise it's all speculation.

--Jonathan


#### feverzsj

carl said:
[snip]

It's time to profile your program. A profiler can find the bottleneck
on a real machine, while a complexity analysis only works on an
abstract one. For example, floor() could be expensive on some machines
while cheap on others. Furthermore, the cache performance of your code
can affect the run time by roughly 20% ~ 200%; it depends on the layout
of your program's data and the pattern in which your program accesses
that data in memory.

Since you can't offer compilable source code, I can't say anything more
specific.


#### Carlo Milanesi

carl wrote:
std::vector<MyObj> FindMyObjs(VectorType point) {
    int key = computeKey(point);
    return m_myMap[key];
}

Add an ampersand to the first line, and you'll be happy:
std::vector<MyObj>& FindMyObjs(VectorType point) {


#### Pavel

carl said:
[snip]
Profiling was good advice. If I had to guess, I would try caching the
key in the vector (the function does look expensive, but again, it is
only a guess) and see if this helps. An extra int per element does not
cost much.

Just my 2c

-Pavel


#### Rune Allnor

[snip]

From what I can see, the code contains a lot of vector accesses. If
so, make sure your code is compiled with the pre-processor definition
_SECURE_SCL=0, no buffer security check (flag /GS-), and the enhanced
instruction set enabled (flag /arch:SSE2).

Rune


#### joseph cook

I have a function that gets called for 512*512*350 times. In this function
there is a loop that gets run each time the function is called:
Am I missing something very basic c++ optimization skills there or is the
program not meant to run faster on this machine?

I have made sure that only references to the already existing data
structures are used. Could it be the computeKey  (3 floor operations) that
is expensive when called so many times?

You should have timed at least this much to know this.

1. Are the values passed to floor() always non-negative? If so, using
int() instead of floor() may give you a large speed-up.
2. On many systems, divide is (much) slower than multiply, so consider
precomputing 1/m_cellSize[x] for each m_cellSize[x] and multiplying by
that value instead of dividing.
3. A nit, not an optimization: don't pre-declare unsigned int key;
declare it where you define it, i.e. "unsigned int key = ...".
4. Why aren't you pre-computing m_split*m_split? Nothing in the code
you posted prevents it.
5. Heck, why not precompute m_split*m_split / m_cellSize?
6. Biggest possibility by far: pass VectorType location / VectorType
point by reference!
7. Repost after you time and find your bottlenecks.

hth,
Joe