Optimizing a function?

Discussion in 'C++' started by carl, Dec 12, 2009.

  1. carl

    carl Guest

    I have a function that gets called 512*512*350 times. In this function
    there is a loop that runs each time the function is called:

    myFun(VectorType point) {
    std::vector<MyObj> & myobjs =this->FindMyObjs(point);
    int num = myobjs.size();

    for (int i=0; i<num; i++) {
    // Do stuff with each neighbor
    }
    }


    Where FindMyObjs(point) is a function that looks up the point in:

    std::map<int, std::vector<MyObj>> m_myMap;


    which has been created before the program is run:

    std::vector<MyObj> FindMyObjs(VectorType point) {
    int key = computeKey(point);
    return m_myMap[key];
    }

    The computeKey(point) looks like this:


    unsigned int computeKey(VectorType location ) {
    unsigned int key = 0;
    int x_dir = floor((location[0]/m_cellSize[0]));
    int y_dir = floor((location[1]/m_cellSize[1]))*m_split;
    int z_dir = floor((location[2]/m_cellSize[2]))*m_split*m_split;
    key = x_dir + y_dir + z_dir;
    return key;
    }



    The number of elements in each std::vector<MyObj> is fixed. When the
    vectors contain 216 elements the program takes around 50 seconds to
    complete (on a 3 GHz machine with 6 GB RAM, code compiled for release
    with Visual Studio 2008).

    I have tried to remove the code from the loop body, but it has almost no
    effect on the computation time.

    Am I missing some very basic C++ optimization skill here, or is the
    program simply not meant to run faster on this machine?

    I have made sure that only references to the already existing data
    structures are used. Could it be computeKey (three floor operations) that
    is expensive when called so many times?
     
    carl, Dec 12, 2009
    #1

  2. * carl:
    > I have a function that gets called for 512*512*350 times. In this
    > function there is a loop that gets run each time the function is called:
    >
    > <snip>
    >
    > std::map<int, std::vector<MyObj>> m_myMap;
    >
    > <snip>


    It's all very roundabout.

    Try an array.


    Cheers & hth.,

    - Alf
     
    Alf P. Steinbach, Dec 12, 2009
    #2

  3. carl

    carl Guest

    "Alf P. Steinbach" <> wrote in message
    news:hg0c1j$dke$-september.org...
    >* carl:
    >> I have a function that gets called for 512*512*350 times. In this
    >> function there is a loop that gets run each time the function is called:
    >>
    >> <snip>
    >
    > It's all very roundabout.
    >
    > Try an array.
    >
    >
    > Cheers & hth.,
    >
    > - Alf


    Why should that be a solution? I only create the data once before running
    the actual algorithm and all lookups are made using references (pointers).
     
    carl, Dec 12, 2009
    #3
  4. "carl" <carl@.com> writes:

    > "Alf P. Steinbach" <> wrote in message
    > news:hg0c1j$dke$-september.org...
    >>* carl:


    >>> return m_myMap[key];


    This operation costs on the order of log2(number of elements in the map)
    comparisons.

    > Why should that be a solution? I only create the data once before


    The same operation applied to a vector is constant time.
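
    A minimal sketch of what that suggestion could look like (MyObj and the
    split-based key are taken from the thread; the Grid class itself is
    hypothetical). Since computeKey() already produces
    x + y*split + z*split*split, the key can index a flat vector directly,
    making each lookup constant time:

    ```cpp
    #include <cassert>
    #include <vector>

    // Stand-in for the poster's object type.
    struct MyObj { float x, y, z; };

    // Flat bucket grid: the dense key from computeKey() indexes directly,
    // so a lookup is one array access instead of a tree walk.
    class Grid {
    public:
        explicit Grid(unsigned split)
            : m_buckets(split * split * split) {}

        // O(1) lookup instead of the map's O(log n).
        std::vector<MyObj>& bucket(unsigned key) { return m_buckets[key]; }

    private:
        std::vector<std::vector<MyObj> > m_buckets;
    };
    ```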
     
    Eric Böse-Wolf, Dec 12, 2009
    #4
  5. carl

    carl Guest

    "Eric "Böse-Wolf"" <> wrote in message
    news:...
    > "carl" <carl@.com> writes:
    >
    >> "Alf P. Steinbach" <> wrote in message
    >> news:hg0c1j$dke$-september.org...
    >>>* carl:

    >
    >>>> return m_myMap[key];

    >
    > This operation has a complexity of ln( number of elements in map ) / ln
    > ( 2 )



    The number of elements (keys) in the map is never larger than 100, and
    lg(100) is approximately 7 comparisons, so I don't think the overhead
    comes from using the map.



    The size of the map is at most 216.
    >
    >> Why should that be a solution? I only create the data once before

    >
    > The same operation applied to a vector is constant time.



    Lookup is linear in a vector and logarithmic in a map; that's why I use
    the map.
     
    carl, Dec 12, 2009
    #5
  6. carl

    Jonathan Lee Guest

    On Dec 12, 10:05 am, "carl" <carl@.com> wrote:
    > I have made sure that only references to the already existing data
    > structures are used. Could it be the computeKey  (3 floor operations) that
    > is expensive when called so many times?


    For testing purposes you could try extending VectorType to include the
    key field, and store the key for each point with the rest of the data.
    You should only need to calculate the key once for each object in that
    case. If your timing goes down significantly, there you go.
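
    A sketch of that caching idea, with hypothetical stand-ins for
    VectorType and the grid parameters (cellSize, split), assuming points
    are constructed once and looked up many times:

    ```cpp
    #include <cassert>
    #include <cmath>

    // Hypothetical grid parameters standing in for m_cellSize / m_split.
    const double cellSize[3] = { 1.0, 1.0, 1.0 };
    const unsigned split = 8;

    // Hypothetical stand-in for VectorType, extended with a cached key.
    struct Point {
        double coord[3];
        unsigned key;   // grid key, computed once at construction
    };

    unsigned computeKey(const double* c) {
        unsigned x = (unsigned)std::floor(c[0] / cellSize[0]);
        unsigned y = (unsigned)std::floor(c[1] / cellSize[1]);
        unsigned z = (unsigned)std::floor(c[2] / cellSize[2]);
        return x + y * split + z * split * split;
    }

    Point makePoint(double x, double y, double z) {
        Point p = { { x, y, z }, 0 };
        p.key = computeKey(p.coord);  // pay for the three floor() calls once
        return p;
    }
    ```

    Lookups then use p.key directly instead of recomputing it 512*512*350
    times.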

    If not, then you probably should reconsider how you've organized the
    data. Is it possible to iterate over your map instead of "find"-ing
    points one by one?

    In any event you should run it through Valgrind or similar to find out
    where the performance hit is. Otherwise it's all speculation.

    --Jonathan
     
    Jonathan Lee, Dec 12, 2009
    #6
  7. carl

    feverzsj Guest

    carl wrote:
    > I have a function that gets called for 512*512*350 times. In this function
    > there is a loop that gets run each time the function is called:
    >
    > <snip>
    >
    > I have made sure that only references to the already existing data
    > structures are used. Could it be the computeKey (3 floor operations) that
    > is expensive when called so many times?


    It's time to profile your program. A profiler can find the bottleneck on
    the real machine, while a complexity analysis only works on an abstract
    one. For example, floor() can be expensive on some machines and cheap on
    others. The current system load also affects your program, and the cache
    behaviour of your code can change the run time by something like 20% to
    200%, depending on the layout of your data and the pattern in which your
    program accesses memory. The optimization settings of your compiler are
    also important; read the manual carefully.

    Since you haven't offered compilable source code, I can't say anything
    more specific about it.
     
    feverzsj, Dec 12, 2009
    #7
  8. carl wrote:
    >
    > std::vector<MyObj> FindMyObjs(VectorType point) {
    > int key = computeKey(point);
    > return m_myMap[key];
    > }


    Add an ampersand to the first line, and you'll be happy:
    std::vector<MyObj>& FindMyObjs(VectorType point) {
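
    To spell out the difference (MyObj and the map mirror the thread; the
    free functions are a sketch): the by-value version copies all 216
    elements on every one of the ~91 million calls, while the reference
    version copies nothing. Note also that binding the by-value result to a
    non-const reference, as the original myFun does, is non-standard C++
    that MSVC accepts as an extension:

    ```cpp
    #include <cassert>
    #include <map>
    #include <vector>

    struct MyObj { float x, y, z; };  // stand-in for the poster's type

    std::map<int, std::vector<MyObj> > m_myMap;

    // Before: returns a copy -- the whole bucket duplicated on every call.
    std::vector<MyObj> FindMyObjsByValue(int key) { return m_myMap[key]; }

    // After: returns a reference to the vector stored in the map -- no copy.
    std::vector<MyObj>& FindMyObjs(int key) { return m_myMap[key]; }
    ```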

    --

    Carlo Milanesi
    http://digilander.libero.it/carlmila
     
    Carlo Milanesi, Dec 12, 2009
    #8
  9. carl

    Pavel Guest

    carl wrote:
    > I have a function that gets called for 512*512*350 times. In this
    > function there is a loop that gets run each time the function is called:
    >
    > <snip>
    >
    > I have made sure that only references to the already existing data
    > structures are used. Could it be the computeKey (3 floor operations)
    > that is expensive when called so many times?

    Profiling was good advice. If I had to guess, I would try caching the
    key in the vector (the function does look expensive but, again, it is
    only a guess) and see if that helps. An extra int does not seem likely
    to bloat your object, as it already seems to contain a few floats.

    Just my 2c

    -Pavel
     
    Pavel, Dec 13, 2009
    #9
  10. carl

    Rune Allnor Guest

    On 12 Des, 16:05, "carl" <carl@.com> wrote:
    > I have a function that gets called for 512*512*350 times. In this function
    > there is a loop that gets run each time the function is called:
    >
    > <snip>
    >
    > The number of elements in the std::vector<MyObj>  is fixed. When the vectors
    > contains 216 elements the program takes around 50 seconds to complete (on a
    > 3 GHZ machine with 6GB RAM and code compiled for release, using Visual
    > Studio 2008).


    From what I can see, the code contains a lot of vector accesses. If so,
    make sure your code is compiled with the pre-processor directive
    _SECURE_SCL=0 (disabling checked iterators), with buffer security checks
    off (flag /GS-), and with the enhanced instruction set enabled (flag
    /arch:SSE2).

    Rune
     
    Rune Allnor, Dec 13, 2009
    #10
  11. carl

    joseph cook Guest

    On Dec 12, 10:05 am, "carl" <carl@.com> wrote:
    > I have a function that gets called for 512*512*350 times. In this function
    > there is a loop that gets run each time the function is called:
    >

    <snip>
    > Am I missing something very basic c++ optimization skills there or is the
    > program not meant to run faster on this machine?
    >
    > I have made sure that only references to the already existing data
    > structures are used. Could it be the computeKey  (3 floor operations) that
    > is expensive when called so many times?


    You should have timed at least this much to know this.

    1. Are the values passed to floor() always non-negative? If so, a plain
    int cast instead of floor() may give you a large speed-up.
    2. On many systems, division is (much) slower than multiplication, so
    consider precomputing 1/m_cellSize[x] for each m_cellSize[x] and
    multiplying by that value instead of dividing.
    3. A nit, not an optimization, but you shouldn't pre-declare unsigned
    int key; just declare it where you first assign it, i.e. "unsigned int
    key = ...".
    4. Why aren't you pre-computing m_split*m_split? Nothing in the code you
    posted prevents it.
    5. Heck, why not precompute m_split*m_split / m_cellSize[2]?
    6. Biggest possibility by far: pass VectorType location / VectorType
    point by reference!

    7. Repost after you time and find your bottlenecks
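
    Points 1, 2, 4 and 6 combined into one sketch (the class and member
    names are hypothetical; truncating instead of calling floor() assumes
    all coordinates are non-negative):

    ```cpp
    #include <cassert>

    // Hypothetical stand-in for VectorType.
    struct Vec3 {
        double v[3];
        double operator[](int i) const { return v[i]; }
    };

    class KeyComputer {
    public:
        KeyComputer(double cx, double cy, double cz, unsigned split)
            : m_split(split), m_splitSq(split * split) {  // (4) split^2 once
            m_inv[0] = 1.0 / cx;   // (2) divide once here,
            m_inv[1] = 1.0 / cy;   //     multiply per call below
            m_inv[2] = 1.0 / cz;
        }

        // (6) pass by const reference; (1) truncation replaces floor()
        unsigned computeKey(const Vec3& p) const {
            unsigned x = (unsigned)(p[0] * m_inv[0]);
            unsigned y = (unsigned)(p[1] * m_inv[1]);
            unsigned z = (unsigned)(p[2] * m_inv[2]);
            return x + y * m_split + z * m_splitSq;
        }

    private:
        unsigned m_split, m_splitSq;
        double m_inv[3];
    };
    ```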
    hth,
    Joe
     
    joseph cook, Dec 13, 2009
    #11
