Optimizing a function?

carl · Dec 12, 2009

I have a function that gets called for 512*512*350 times. In this function
there is a loop that gets run each time the function is called:

myFun(VectorType point) {
std::vector<MyObj> & myobjs =this->FindMyObjs(point);
int num = myobjs.size();

for (int i=0; i<num; i++) {
// Do stuff with each neighbor
}
}

Where FindMyObjs(point) is a function that looks up the point in:

std::map<int, std::vector<MyObj>> m_myMap;

which has been created before the program is run:

std::vector<MyObj> FindMyObjs(VectorType point) {
int key = computeKey(point);
return m_myMap[key];
}

The computeKey(point) looks like this:

unsigned int computeKey(VectorType location ) {
unsigned int key = 0;
int x_dir = floor((location[0]/m_cellSize[0]));
int y_dir = floor((location[1]/m_cellSize[1]))*m_split;
int z_dir = floor((location[2]/m_cellSize[2]))*m_split*m_split;
key = x_dir + y_dir + z_dir;
return key;
}

The number of elements in the std::vector<MyObj> is fixed. When the vectors
contains 216 elements the program takes around 50 seconds to complete (on a
3 GHZ machine with 6GB RAM and code compiled for release, using Visual
Studio 2008).

I have tried to remove the code from the while body but is has almost no
effect on the computation time.

Am I missing something very basic c++ optimization skills there or is the
program not meant to run faster on this machine?

I have made sure that only references to the already existing data
structures are used. Could it be the computeKey (3 floor operations) that
is expensive when called so many times?

Alf P. Steinbach · Dec 12, 2009

* carl:

I have a function that gets called for 512*512*350 times. In this
function there is a loop that gets run each time the function is called:

myFun(VectorType point) {
std::vector<MyObj> & myobjs =this->FindMyObjs(point);
int num = myobjs.size();

for (int i=0; i<num; i++) {
// Do stuff with each neighbor
}
}

Where FindMyObjs(point) is a function that looks up the point in:

std::map<int, std::vector<MyObj>> m_myMap;

which has been created before the program is run:

std::vector<MyObj> FindMyObjs(VectorType point) {
int key = computeKey(point);
return m_myMap[key];
}

The computeKey(point) looks like this:

unsigned int computeKey(VectorType location ) {
unsigned int key = 0;
int x_dir = floor((location[0]/m_cellSize[0]));
int y_dir = floor((location[1]/m_cellSize[1]))*m_split;
int z_dir = floor((location[2]/m_cellSize[2]))*m_split*m_split;
key = x_dir + y_dir + z_dir;
return key;
}

The number of elements in the std::vector<MyObj> is fixed. When the
vectors contains 216 elements the program takes around 50 seconds to
complete (on a 3 GHZ machine with 6GB RAM and code compiled for release,
using Visual Studio 2008).

I have tried to remove the code from the while body but is has almost no
effect on the computation time.

Am I missing something very basic c++ optimization skills there or is
the program not meant to run faster on this machine?

I have made sure that only references to the already existing data
structures are used. Could it be the computeKey (3 floor operations)
that is expensive when called so many times?

It's all very roundabout.

Try an array.

Cheers & hth.,

- Alf

carl · Dec 12, 2009

Alf P. Steinbach said:
* carl:

I have a function that gets called for 512*512*350 times. In this
function there is a loop that gets run each time the function is called:

myFun(VectorType point) {
std::vector<MyObj> & myobjs =this->FindMyObjs(point);
int num = myobjs.size();

for (int i=0; i<num; i++) {
// Do stuff with each neighbor
}
}

Where FindMyObjs(point) is a function that looks up the point in:

std::map<int, std::vector<MyObj>> m_myMap;

which has been created before the program is run:

std::vector<MyObj> FindMyObjs(VectorType point) {
int key = computeKey(point);
return m_myMap[key];
}

The computeKey(point) looks like this:

unsigned int computeKey(VectorType location ) {
unsigned int key = 0;
int x_dir = floor((location[0]/m_cellSize[0]));
int y_dir = floor((location[1]/m_cellSize[1]))*m_split;
int z_dir = floor((location[2]/m_cellSize[2]))*m_split*m_split;
key = x_dir + y_dir + z_dir;
return key;
}

The number of elements in the std::vector<MyObj> is fixed. When the
vectors contains 216 elements the program takes around 50 seconds to
complete (on a 3 GHZ machine with 6GB RAM and code compiled for release,
using Visual Studio 2008).

I have tried to remove the code from the while body but is has almost no
effect on the computation time.

Am I missing something very basic c++ optimization skills there or is the
program not meant to run faster on this machine?

I have made sure that only references to the already existing data
structures are used. Could it be the computeKey (3 floor operations)
that is expensive when called so many times?

Click to expand...

It's all very roundabout.

Try an array.

Cheers & hth.,

- Alf

Why should that be a solution? I only create the data once before running
the actual algorithm and all lookups are made using references (pointers).

Eric BÃ¶se-Wolf · Dec 12, 2009

carl said:
Alf P. Steinbach said:

* carl:

return m_myMap[key];

Click to expand...

Click to expand...

This operation has a complexity of ln( number of elements in map ) / ln ( 2 )

Why should that be a solution? I only create the data once before

The same operation applied to a vector is constant time.

carl · Dec 12, 2009

Eric "Böse-Wolf" said:
carl said:

Alf P. Steinbach said:

* carl:
return m_myMap[key];

Click to expand...

Click to expand...

This operation has a complexity of ln( number of elements in map ) / ln
( 2 )

The number of elements (keys) in the map are never larger than 100. lg(100)
approx= 7 operations so I don't think the overhead comes from using the map.

the map is at most 216

The same operation applied to a vector is constant time.

Find is linear in a vector and find is lgn for a map, thats why I use the
map.

Jonathan Lee · Dec 12, 2009

I have made sure that only references to the already existing data
structures are used. Could it be the computeKey (3 floor operations) that
is expensive when called so many times?

For testing purposes you could try extending VectorType to include the
key field, and store the key for each point with the rest of the data.
You should only need to calculate the key once for each object in that
case. If your timing goes down significantly, there you go.

If not, then you probably should reconsider how you've organized the
data. Is it possible to iterate over your map instead of "find"-ing
points one by one?

In any event you should run it through Valgrind or similar to find out
where the performance hit is. Otherwise it's all speculation.

--Jonathan

feverzsj · Dec 12, 2009

carl said:
I have a function that gets called for 512*512*350 times. In this function
there is a loop that gets run each time the function is called:

myFun(VectorType point) {
std::vector<MyObj> & myobjs =this->FindMyObjs(point);
int num = myobjs.size();

for (int i=0; i<num; i++) {
// Do stuff with each neighbor
}

}

Where FindMyObjs(point) is a function that looks up the point in:

std::map<int, std::vector<MyObj>> m_myMap;

which has been created before the program is run:

std::vector<MyObj> FindMyObjs(VectorType point) {
int key = computeKey(point);
return m_myMap[key];

}

The computeKey(point) looks like this:

unsigned int computeKey(VectorType location ) {
unsigned int key = 0;
int x_dir = floor((location[0]/m_cellSize[0]));
int y_dir = floor((location[1]/m_cellSize[1]))*m_split;
int z_dir = floor((location[2]/m_cellSize[2]))*m_split*m_split;
key = x_dir + y_dir + z_dir;
return key;
}

The number of elements in the std::vector<MyObj> is fixed. When the vectors
contains 216 elements the program takes around 50 seconds to complete (on a
3 GHZ machine with 6GB RAM and code compiled for release, using Visual
Studio 2008).

I have tried to remove the code from the while body but is has almost no
effect on the computation time.

Am I missing something very basic c++ optimization skills there or is the
program not meant to run faster on this machine?

I have made sure that only references to the already existing data
structures are used. Could it be the computeKey (3 floor operations) that
is expensive when called so many times?

It's time to profile your program. A profiler can find out the
bottleneck on a real machine, while a complexity analysis works on an
abstract one(and is a very old one).
For example, floor() could be expensive on some machine while cheap on
others. The current system loads also affects your program.
Furthermore, cache performance of your code affects the run time for
about 20% ~ 200%. It's relates to the layout of your program's data
and the pattern your program access the data in the memory.
The optimization setting of your compiler is also important. Read the
manual carefully.

Since you can't offer a compilable source code, I can't say nothing
for your code

Carlo Milanesi · Dec 12, 2009

carl ha scritto:

std::vector<MyObj> FindMyObjs(VectorType point) {
int key = computeKey(point);
return m_myMap[key];
}

Add an ampersand to the first line, and you'll be happy:
std::vector<MyObj>& FindMyObjs(VectorType point) {

Pavel · Dec 13, 2009

carl said:
I have a function that gets called for 512*512*350 times. In this
function there is a loop that gets run each time the function is called:

myFun(VectorType point) {
std::vector<MyObj> & myobjs =this->FindMyObjs(point);
int num = myobjs.size();

for (int i=0; i<num; i++) {
// Do stuff with each neighbor
}
}

Where FindMyObjs(point) is a function that looks up the point in:

std::map<int, std::vector<MyObj>> m_myMap;

which has been created before the program is run:

std::vector<MyObj> FindMyObjs(VectorType point) {
int key = computeKey(point);
return m_myMap[key];
}

The computeKey(point) looks like this:

unsigned int computeKey(VectorType location ) {
unsigned int key = 0;
int x_dir = floor((location[0]/m_cellSize[0]));
int y_dir = floor((location[1]/m_cellSize[1]))*m_split;
int z_dir = floor((location[2]/m_cellSize[2]))*m_split*m_split;
key = x_dir + y_dir + z_dir;
return key;
}

The number of elements in the std::vector<MyObj> is fixed. When the
vectors contains 216 elements the program takes around 50 seconds to
complete (on a 3 GHZ machine with 6GB RAM and code compiled for release,
using Visual Studio 2008).

I have tried to remove the code from the while body but is has almost no
effect on the computation time.

Am I missing something very basic c++ optimization skills there or is
the program not meant to run faster on this machine?

I have made sure that only references to the already existing data
structures are used. Could it be the computeKey (3 floor operations)
that is expensive when called so many times?

To profile was a good advice. If I had to guess, I would have tried to
cache the key in the vector (the function does look expensive but,
again, it is only a guess) and see if this helps. An extra int does not
seem to overload your object as it seem to contain few floats already.

Just my 2c

-Pavel

Rune Allnor · Dec 13, 2009

I have a function that gets called for 512*512*350 times. In this function
there is a loop that gets run each time the function is called:

myFun(VectorType point) {
std::vector<MyObj> & myobjs =this->FindMyObjs(point);
int num = myobjs.size();

for (int i=0; i<num; i++) {
// Do stuff with each neighbor
}

}

Where FindMyObjs(point) is a function that looks up the point in:

std::map<int, std::vector<MyObj>> m_myMap;

which has been created before the program is run:

std::vector<MyObj> FindMyObjs(VectorType point) {
int key = computeKey(point);
return m_myMap[key];

}

The computeKey(point) looks like this:

unsigned int computeKey(VectorType location ) {
unsigned int key = 0;
int x_dir = floor((location[0]/m_cellSize[0]));
int y_dir = floor((location[1]/m_cellSize[1]))*m_split;
int z_dir = floor((location[2]/m_cellSize[2]))*m_split*m_split;
key = x_dir + y_dir + z_dir;
return key;
}

The number of elements in the std::vector<MyObj> is fixed. When the vectors
contains 216 elements the program takes around 50 seconds to complete (on a
3 GHZ machine with 6GB RAM and code compiled for release, using Visual
Studio 2008).

From what I can see, the code contains a lot of vector
accesses. If correct, make sure your code is compiled
with pre-processor directive SECURE_SCL=0, no buffer
security check (flag /GS-) and with the enhanced
instruction set enabled (flag /arch:SSE2).

Rune

joseph cook · Dec 13, 2009

I have a function that gets called for 512*512*350 times. In this function
there is a loop that gets run each time the function is called:

Am I missing something very basic c++ optimization skills there or is the
program not meant to run faster on this machine?

I have made sure that only references to the already existing data
structures are used. Could it be the computeKey (3 floor operations) that
is expensive when called so many times?

You should have timed at least this much to know this.

1. Are the values into the floor() function always non-negative? If
so, doing int() instead of floor() may give you a large speed-up.
2. On many systems, divide is (much) slower than multiply, so you
could consider precomputing 1/m_cellSize[x] for each m_cellSize[x],
and multiply by that value instead of dividing.
3. A nit, not an optimization, but you shouldn't pre-declare unsigned
int key, just declare it when you define it. i.e. "unsigned int key ="
4. Why aren't you pre-computing m_split*m_split. Nothing in your code
you posted prevents it.
5. Heck, why not precompute m_split*m_split / m_cellSize[2] ?
6. Biggest possibility by far: PASS VectorType location/VectorType
point by reference!!!!

7. Repost after you time and find your bottlenecks
hth,
Joe

Implementing a Q-Learning Algorithm with Logistic Regression Normalization in C++	0	Jun 4, 2025
Optimizing pow() function	45	Apr 22, 2013
Creating a vector of queues	2	Jan 23, 2025
Database Manager: A C++ Console Application	14	May 12, 2025
I am using 3D map generation with noise, the texture does not match the model.	1	Feb 11, 2026
optimizing function	0	Feb 26, 2009
optimizing function	0	Feb 26, 2009
Universal BMP Steganography Tool (AES-128-CTR + SP800-90A CSPRNG) Full Encoder/Decoder with 3LSB Payload, PasswordDerived Key & External Key File	4	Mar 26, 2026

Optimizing a function?

carl

Alf P. Steinbach

carl

Eric BÃ¶se-Wolf

carl

Jonathan Lee

feverzsj

Carlo Milanesi

Pavel

Rune Allnor

joseph cook

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads