overhead of using std::sort

Discussion in 'C++' started by Ireneusz SZCZESNIAK, Jul 29, 2004.

  1. I want to sort a vector with the std::sort function. There are two
    functions: one with two arguments, the other with three arguments. I
    am using the one with three arguments. I noticed that there is a huge
    overhead of using this function, which comes from copying the function
    object, i.e., the object that compares the elements of the vector,
    which is passed to the function not by reference but by value.

    Now, at the end of this mail there is a source code. It's not the
    actual code of the problem I have, but only a distilled version of it.
    Here the SO class has no fields, so copying an object of this class
    is not a big deal. However, in my actual problem the function object
    has a very large vector inside, and this function object should not be
    copied even once. If you run the program, you'll see the function
    object is copied seven times for sorting a vector with three
    elements!

    So, the problem is that when I use the std::sort function the function
    object is copied many times! The larger the sequence to sort, the
    greater the number of copies of the function object. It's very bad
    for me because I sort small number of elements (up to four), but I do
    the sorting many times (millions).

    I want to use the sort function in such a way that the function object
    is not coppied. I tried to call the sort function this way:

    sort<vector<irek *>::iterator, SO &>(v.begin(), v.end(), so);

    but it reduced the number of copies by one only.

    I am convinced that I am doing something wrong. I would appreciate
    your advice on how to solve my problem.


    Thanks & best,
    Irek


    ****************************************
    Ireneusz SZCZESNIAK - research assistant
    http://www.iitis.gliwice.pl/~iszczesniak
    ****************************************

    *********************************************************************

    #include <algorithm>
    #include <vector>
    #include <iostream>
    #include <functional>

    using namespace std;

    class irek
    {
    int i;

    public:
    irek(int _i) {i = _i;}
    int get_i() const {return i;}
    };

    struct SO : binary_function<irek, irek, bool>
    {
    public:
    SO() {}
    SO(const SO&)
    {
    cout << "coppied!" << endl;
    }

    result_type operator()(const first_argument_type *a,
    const second_argument_type *b)
    {
    return a->get_i() < b->get_i();
    }
    };


    main()
    {
    // this is our function object
    SO so;

    irek a(2), b(3), c(1);

    vector<irek *> v;
    v.push_back(&b);
    v.push_back(&c);
    v.push_back(&a);

    sort(v.begin(), v.end(), so);

    for(int i = 0; i < v.size(); i++)
    cout << v->get_i() << endl;
    }
     
    Ireneusz SZCZESNIAK, Jul 29, 2004
    #1
    1. Advertising

  2. Ireneusz SZCZESNIAK, UNEXPECTED_DATA_AFTER_ADDRESS@.SYNTAX-ERROR. wrote:
    >
    > Now, at the end of this mail there is a source code. It's not the
    > actual code of the problem I have, but only a distilled version of it.
    > Here the SO class has no fields, so copying an object of this class
    > is not a big deal. However, in my actual problem the function object
    > has a very large vector inside,


    What for?
    The purpose of the functor is to compare 2 objects. Nothing more,
    nothing less.

    > and this function object should not be
    > copied even once. If you run the program, you'll see the function
    > object is copied seven times for sorting a vector with three
    > elements!


    If you really need that vector inside the functor, well, you can
    always have the vector outside the class and just give a pointer
    to it to the functor object. This way copying the functor object
    stays cheap.

    --
    Karl Heinz Buchegger
     
    Karl Heinz Buchegger, Jul 29, 2004
    #2
    1. Advertising

  3. "Ireneusz SZCZESNIAK" <>;
    <UNEXPECTED_DATA_AFTER_ADDRESS@.SYNTAX-ERROR.> wrote in message
    news:p...
    > I want to sort a vector with the std::sort function. There are two
    > functions: one with two arguments, the other with three arguments. I
    > am using the one with three arguments. I noticed that there is a huge
    > overhead of using this function, which comes from copying the function
    > object, i.e., the object that compares the elements of the vector,
    > which is passed to the function not by reference but by value.
    >
    > Now, at the end of this mail there is a source code. It's not the
    > actual code of the problem I have, but only a distilled version of it.
    > Here the SO class has no fields, so copying an object of this class
    > is not a big deal. However, in my actual problem the function object
    > has a very large vector inside, and this function object should not be
    > copied even once. If you run the program, you'll see the function
    > object is copied seven times for sorting a vector with three
    > elements!
    >
    > So, the problem is that when I use the std::sort function the function
    > object is copied many times! The larger the sequence to sort, the
    > greater the number of copies of the function object. It's very bad
    > for me because I sort small number of elements (up to four), but I do
    > the sorting many times (millions).
    >
    > I want to use the sort function in such a way that the function object
    > is not coppied. I tried to call the sort function this way:
    >
    > sort<vector<irek *>::iterator, SO &>(v.begin(), v.end(), so);
    >
    > but it reduced the number of copies by one only.
    >
    > I am convinced that I am doing something wrong. I would appreciate
    > your advice on how to solve my problem.
    >


    Recode your function object so that it holds a pointer to the vector data
    instead of holding the vector itself. This will make it cheap to copy. You
    could use an ordinary pointer for this, but a better choice might be the
    shared array class from boost. See
    http://www.boost.org/libs/smart_ptr/smart_ptr.htm

    john
     
    John Harrison, Jul 29, 2004
    #3
  4. Ireneusz SZCZESNIAK

    tom_usenet Guest

    On Thu, 29 Jul 2004 13:42:24 +0200, Ireneusz SZCZESNIAK
    <>, UNEXPECTED_DATA_AFTER_ADDRESS@.SYNTAX-ERROR.
    wrote:

    >I want to sort a vector with the std::sort function. There are two
    >functions: one with two arguments, the other with three arguments. I
    >am using the one with three arguments. I noticed that there is a huge
    >overhead of using this function, which comes from copying the function
    >object, i.e., the object that compares the elements of the vector,
    >which is passed to the function not by reference but by value.


    "huge overhead"!? Most function objects are stateless, and the
    overhead of copying them is precisely zero.

    Obviously taking the function object by reference is not a good
    option:

    sort(v.begin(), v.end(), functor()); //error!

    since you can't bind a temporary to a non-const refence. Equally,
    passing by const reference requires your operator() to be a const
    member function, which might be annoying (requiring mutable for any
    members you want to change).

    So passing by value is the best option.

    However, I don't see any reason why the internals of std::sort need to
    copy the functor - they could certainly pass it around by non-const
    reference; that's just a quality of implementation issue though - the
    standard allows the function object to be copied an arbitrary number
    of times by any algorithm.

    >Now, at the end of this mail there is a source code. It's not the
    >actual code of the problem I have, but only a distilled version of it.
    >Here the SO class has no fields, so copying an object of this class
    >is not a big deal. However, in my actual problem the function object
    >has a very large vector inside, and this function object should not be
    >copied even once. If you run the program, you'll see the function
    >object is copied seven times for sorting a vector with three
    >elements!


    The solution is to use reference semantics. For example, your function
    object should contain a reference to the vector, not the vector
    itself. Then the compiler generated copy constructor will be efficient
    and fast, and with inlining, the overhead may well be zero.

    >So, the problem is that when I use the std::sort function the function
    >object is copied many times! The larger the sequence to sort, the
    >greater the number of copies of the function object. It's very bad
    >for me because I sort small number of elements (up to four), but I do
    >the sorting many times (millions).


    If you are only sorting up to 4 elements, you could come up with a
    faster algorithm than std::sort in any case. An explicit if ladder
    with no loop will be fast - in effect unrolling the entire sorting
    operation.

    >I want to use the sort function in such a way that the function object
    >is not coppied. I tried to call the sort function this way:
    >
    >sort<vector<irek *>::iterator, SO &>(v.begin(), v.end(), so);
    >
    >but it reduced the number of copies by one only.


    Yes, that is not a solution.

    >I am convinced that I am doing something wrong. I would appreciate
    >your advice on how to solve my problem.


    If performance is important, make sure your function objects are cheap
    to copy.

    Tom
     
    tom_usenet, Jul 29, 2004
    #4
  5. On Thu, 29 Jul 2004, John Harrison wrote:

    > Recode your function object so that it holds a pointer to the vector
    > data instead of holding the vector itself.


    This is the way I have it now implemented.

    > This will make it cheap to copy.


    But it still costs to copy the function object.

    > You could use an ordinary pointer for this, but a better choice
    > might be the shared array class from boost. See
    > http://www.boost.org/libs/smart_ptr/smart_ptr.htm


    Thanks, I will have a look.


    Irek

    ****************************************
    http://www.iitis.gliwice.pl/~iszczesniak
    ****************************************
     
    Ireneusz SZCZESNIAK, Jul 29, 2004
    #5
  6. "Ireneusz SZCZESNIAK" <> wrote in message
    news:p...
    > On Thu, 29 Jul 2004, John Harrison wrote:
    >
    > > Recode your function object so that it holds a pointer to the vector
    > > data instead of holding the vector itself.

    >
    > This is the way I have it now implemented.
    >
    > > This will make it cheap to copy.

    >
    > But it still costs to copy the function object.
    >


    IIRC the standard allows the std::sort to make copies of the supplied
    function object. Function objects must be effectively stateless, so that if
    the sort algorithm uses one copy half the time and another copy the other
    half of the time it should not matter.

    I would imagine that your function object is being copied to other helper
    algorithms internal to std::sort. You are just going to have to live with
    this behaviour.

    john
     
    John Harrison, Jul 29, 2004
    #6
  7. > "huge overhead"!? Most function objects are stateless, and the
    > overhead of copying them is precisely zero.


    My functor is not stateless, because it needs some extra data. It
    used to have a vector inside, but now it has just a pointer to this
    vector. Coping my functor requires at least coping a pointer.

    I believe it's huge. For sorting 3 elements the functor was copied 7
    times, for 1000 elements - 1328 times, and for 1000000 elements -
    1312948 times. These numbers slightly change with different element
    arrangements in the vector.

    > However, I don't see any reason why the internals of std::sort need
    > to copy the functor - they could certainly pass it around by
    > non-const reference; (...)


    When you implement templated std::sort this way, then you can pass
    both a functor and a pointer to a function. If std::sort took a
    functor by reference, then you would not be able to pass a pointer to
    a function. All the std::sort function want to do with this that you
    pass is apply to it the "()" operator.

    > If you are only sorting up to 4 elements, you could come up with a
    > faster algorithm than std::sort in any case. An explicit if ladder
    > with no loop will be fast - in effect unrolling the entire sorting
    > operation.


    Now it's up to four elements, but soon I will sort 16, 64, 128 and
    more elements. I need to stick to std::sort.

    I didn't realize that the C++ standard imposes no restrictions on
    copying functors. This is why implementations are free to make those
    copies of my functor.


    Thanks,
    Irek
     
    Ireneusz SZCZESNIAK, Jul 29, 2004
    #7
  8. On Thu, 29 Jul 2004, John Harrison wrote:

    > Function objects must be effectively stateless, so that if the sort
    > algorithm uses one copy half the time and another copy the other
    > half of the time it should not matter.


    When I create my functor I give it some state, and this state will not
    change later. But different functors have different states. Well, a
    state for me is just the value of a pointer field in my functor. When
    I create a functor I pass a some pointer to its constructor.

    > I would imagine that your function object is being copied to other
    > helper algorithms internal to std::sort.


    Yes, this is what happens.

    > You are just going to have to live with this behaviour.


    I guess so.


    Thanks,
    Irek

    ****************************************
    http://www.iitis.gliwice.pl/~iszczesniak
    ****************************************
     
    Ireneusz SZCZESNIAK, Jul 29, 2004
    #8
  9. Ireneusz SZCZESNIAK

    tom_usenet Guest

    On Thu, 29 Jul 2004 15:54:39 +0200, Ireneusz SZCZESNIAK
    <> wrote:

    >> "huge overhead"!? Most function objects are stateless, and the
    >> overhead of copying them is precisely zero.

    >
    >My functor is not stateless, because it needs some extra data. It
    >used to have a vector inside, but now it has just a pointer to this
    >vector. Coping my functor requires at least coping a pointer.
    >
    >I believe it's huge. For sorting 3 elements the functor was copied 7
    >times, for 1000 elements - 1328 times, and for 1000000 elements -
    >1312948 times. These numbers slightly change with different element
    >arrangements in the vector.


    That is a surprisingly large number of copies (O(n) by the look of
    things). Clearly, if copying the functor is significantly more
    expensive than, say, calling the comparator, it will become the
    performance limiting factor until n is really huge.

    >> However, I don't see any reason why the internals of std::sort need
    >> to copy the functor - they could certainly pass it around by
    >> non-const reference; (...)

    >
    >When you implement templated std::sort this way, then you can pass
    >both a functor and a pointer to a function. If std::sort took a
    >functor by reference, then you would not be able to pass a pointer to
    >a function.


    You would, just not an rvalue one. If you pass just the function name,
    it would work fine, since you can form a reference to a function.

    But I wasn't suggesting that sort took its parameter by reference (I
    posted details of why not), but that the internals might. e.g.

    template...
    void sort_aux(It begin, It end, Comp& comp);

    template...
    void sort(It begin, It end, Comp comp)
    {
    sort_aux(begin, end, comp);
    }

    That would work fine for function pointers, since inside sort, comp is
    an lvalue. However, it would break if you explicitly made Comp a
    reference type (due to trying to form a reference to reference).
    Obviously an implementation could get around this, using
    boost::add_reference or similar.

    >> If you are only sorting up to 4 elements, you could come up with a
    >> faster algorithm than std::sort in any case. An explicit if ladder
    >> with no loop will be fast - in effect unrolling the entire sorting
    >> operation.

    >
    >Now it's up to four elements, but soon I will sort 16, 64, 128 and
    >more elements. I need to stick to std::sort.
    >
    >I didn't realize that the C++ standard imposes no restrictions on
    >copying functors. This is why implementations are free to make those
    >copies of my functor.


    Personally I'm quite surprised at the huge number of copies - I
    imagined there would be O(log n) copies at most (one for each divide
    and conquer, if you see what I mean), but I'm sure if I think about it
    the reason that it's O(n) copies may become clear. But, as you say
    (and I said), it is conforming in any case.

    Tom
     
    tom_usenet, Jul 29, 2004
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Peter Jansson
    Replies:
    5
    Views:
    6,316
    Ivan Vecerina
    Mar 17, 2005
  2. Jeffrey Walton
    Replies:
    10
    Views:
    942
    Mathias Gaunard
    Nov 26, 2006
  3. Replies:
    1
    Views:
    489
    Marcus Kwok
    Apr 6, 2007
  4. Stephen Horne
    Replies:
    3
    Views:
    602
    Stephen Horne
    Aug 5, 2009
  5. Navin
    Replies:
    1
    Views:
    702
    Ken Schaefer
    Sep 9, 2003
Loading...

Share This Page