Containers and sorting objects vs. pointers

Discussion in 'C++' started by Matthias Kaeppler, Jul 15, 2005.

  1. Hi,

    in my program, I have to sort containers of objects which can be 2000
    items big in some cases. Since STL containers are based around copying
    and since I need to sort these containers quite frequently, I thought
    it'd be a better idea to manage additional containers which are
    initialized with pointers to the objects in the primary containers and
    sort those (only pointers have to be copied around then).

    However, that also means if I have 2000 objects, I have to create *two*
    containers holding 2000 objects each, the first with the real objects
    and the second with pointers to those.

    My question is:
    If those objects I'm talking about are between 8 and 16 bytes, would it
    be faster to simply perform the sort algos on the objects themselves or
    is it better to create that additional container with pointers and
    continue working with that?

    --
    Matthias Kaeppler
     
    Matthias Kaeppler, Jul 15, 2005
    #1
    1. Advertising

  2. Matthias Kaeppler

    pven Guest

    Hi,

    I think the best approach is to sort with the primary container itself
    with your object structures.

    Also, if you are planning to use an additional container that stores
    the pointer to the objects in the primary container, ensure that the
    objects in the primary container are on the heap.

    The reason for this is, When a container rearranges itself (lets say
    you keep adding objects and the container needs to move the objects to
    a new memory location which is bigger), it will shift objects around by
    calling the copy constructor of your object.

    So after such an operation, your pointers in the other container might
    become invalid.
     
    pven, Jul 15, 2005
    #2
    1. Advertising

  3. Matthias Kaeppler

    pven Guest

    > Also, if you are planning to use an additional container that stores
    > the pointer to the objects in the primary container, ensure that the
    > objects in the primary container are on the heap.
    >
    > The reason for this is, When a container rearranges itself (lets say
    > you keep adding objects and the container needs to move the objects to
    > a new memory location which is bigger), it will shift objects around by
    > calling the copy constructor of your object.
    >
    > So after such an operation, your pointers in the other container might
    > become invalid.


    My bad,

    The above logic applies only if you are using a vector container.

    After a little thought, I suggest you use a list container for storing
    the objects, Sorting and re-arranging will be efficent. List is similar
    to linked list datastructure..
     
    pven, Jul 15, 2005
    #3
  4. Matthias Kaeppler

    Mercator Guest

    Matthias Kaeppler wrote:
    > My question is:
    > If those objects I'm talking about are between 8 and 16 bytes, would it
    > be faster to simply perform the sort algos on the objects themselves or
    > is it better to create that additional container with pointers and
    > continue working with that?


    C++ has time functions which help you finding the answer. Performance
    assertions without measurement are moot. See also:
    http://www.informit.com/guides/content.asp?g=cplusplus&seqNum=156
     
    Mercator, Jul 15, 2005
    #4
  5. pven wrote:
    > My bad,
    >
    > The above logic applies only if you are using a vector container.
    >
    > After a little thought, I suggest you use a list container for storing
    > the objects, Sorting and re-arranging will be efficent. List is similar
    > to linked list datastructure..


    Yep, that's exactly how I do it right now; objects
    in a linked list, pointers in an std::vector (the
    latter mostly because IIRC std::partition doesn't
    work on lists).
     
    Matthias Kaeppler, Jul 15, 2005
    #5
  6. Mercator wrote:
    > Matthias Kaeppler wrote:
    >
    >>My question is:
    >>If those objects I'm talking about are between 8 and 16 bytes, would it
    >>be faster to simply perform the sort algos on the objects themselves or
    >>is it better to create that additional container with pointers and
    >>continue working with that?

    >
    >
    > C++ has time functions which help you finding the answer. Performance
    > assertions without measurement are moot. See also:
    > http://www.informit.com/guides/content.asp?g=cplusplus&seqNum=156
    >


    That's great, thanks. I have been looking for
    something like this for quite some time.
    Probably that'll answer my question most precisely
    (whereas I tend to agree with the other poster
    that it's probably less overhead to sort the real
    objects, since they can be copied with a trivial
    copy constructor and are quite lean).

    Regards,
    Matthias
     
    Matthias Kaeppler, Jul 15, 2005
    #6
  7. Matthias Kaeppler

    Bart Guest

    Matthias Kaeppler wrote:
    > Hi,
    >
    > in my program, I have to sort containers of objects which can be 2000
    > items big in some cases. Since STL containers are based around copying
    > and since I need to sort these containers quite frequently, I thought
    > it'd be a better idea to manage additional containers which are
    > initialized with pointers to the objects in the primary containers and
    > sort those (only pointers have to be copied around then).
    >
    > However, that also means if I have 2000 objects, I have to create *two*
    > containers holding 2000 objects each, the first with the real objects
    > and the second with pointers to those.
    >
    > My question is:
    > If those objects I'm talking about are between 8 and 16 bytes, would it
    > be faster to simply perform the sort algos on the objects themselves or
    > is it better to create that additional container with pointers and
    > continue working with that?


    As with anything performance-related, there's no point in discussing
    this without performing some tests, but I suspect the sorting algorithm
    would only copy the objects when required. Why do you think you would
    incur any penalty over copying them yourself?
     
    Bart, Jul 15, 2005
    #7
  8. Sorry, I'm not impressed with the answers so far.

    2000 is a small number. Try just sorting the container (use std::sort
    and provide either an operator< or a predicate). If this is not a
    performance problem, don't worry about it.

    If it is a performance problem, first check that it's implemented
    right:

    1. Make sure a < b < c ==> a < c and !(b < a). f your < operator or
    predicate is at all complicated you should add debugging tests to be
    sure. Otherwise std::sort() might underrun and give the impression of
    performance problem.

    2. Make sure you aren't running on Windows with condition of a sorted
    list + one unsorted item at the end. With only 2000 items this
    shouldn't be an issue, but if you had 100X this amount you may find
    yourself waiting minutes or hours for the sort to finish.

    3. If the implementation is correct and you still have a problem, there
    are two places your sort will be spending time:
    a) comparing
    b) swapping.

    (b) can be significant with large objects, especially if the object
    owns lots of additional data. There can be lots of swapping while you
    sort. As a general rule I never sort anything other than vectors of
    pointers or trivial objects, but I tend to deal with large numbers of
    objects (10s of thousands to millions). Here's where that heap comment
    someone else made came into play:

    If you add an item to a vector, every item in the vector can be moved.
    If you're going to have an auxiliary list of pointers, you must do one
    of four things:

    1) Allocate all items and never add another, or at least reserve() the
    required space. Now the pointers are fixed.
    2) Every time you add an item, toss your sorted list, and create it
    again as needed
    3) Use a linked list or other structure to hold your master list of
    objects. This keeps your objects from moving around
    4) Allocate objects on the heap, and only have vectors of pointers.

    I assume you don't have a problem copying objects in general because
    you're willing to have a vector of them by value. But this is the one
    time when you might consider using a linked list over some other data
    structure.

    Stuart
     
    Stuart MacMartin, Jul 15, 2005
    #8
  9. Three other little points:

    1. Sorting a singly linked list is really inefficent.
    Depending on the sort algorithm, sorting a
    doubly linked list can be inefficient

    2. If the objects are tight enough that you haven't used a heap
    but just have them in a vector, chances are sorting the vector
    will be good enough. If not, revisit your other decisions.

    3. The comparison operator makes a big difference
    the runtime of your sort:
    a) Make it as tight as possible
    b) Use sort() with an inlined operator< or a function object
    to make sure the comparison is inlined
    c) Never treat two objects as equal in your ordering.
    If you can have several equivalent objects, like "red" vs.
    "black",
    your comparison should return &obj1 < &obj2 if they are
    both red or both black.

    Stuart
     
    Stuart MacMartin, Jul 15, 2005
    #9
  10. Thanks Stuart, that was quite insightful.
    I am now simply holding a vector with the objects directly. Previously I
    had used a doubly linked list to hold the objects and a vector to hold
    the pointers.

    Here is a predicate function for sorting File objects by name ascending:

    struct FileLessNameA
    : std::binary_function<const File&,const File&,bool> {
    bool operator() (const File& lhs, const File& rhs) const {
    std::string path1( lhs.get_name().c_str() );
    std::string path2( rhs.get_name().c_str() );
    boost::algorithm::to_lower( path1 );
    boost::algorithm::to_lower( path2 );
    return path1 < path2;
    }
    };

    I am converting to all lowercase because I don't want FOO to be less
    than foo.

    PS: A File object is actually nothing more than a
    boost::filesystem::path and a reference counted smart pointer to a
    Gnome::Vfs::FileInfo object. I think it's maybe 8-16 bytes in size.

    --
    Matthias Kaeppler
     
    Matthias Kaeppler, Jul 16, 2005
    #10
  11. Matthias Kaeppler

    Mercator Guest

    Matthias Kaeppler wrote:
    > Here is a predicate function for sorting File objects by name ascending:
    >
    > struct FileLessNameA
    > : std::binary_function<const File&,const File&,bool> {
    > bool operator() (const File& lhs, const File& rhs) const {
    > std::string path1( lhs.get_name().c_str() );
    > std::string path2( rhs.get_name().c_str() );


    Oops, you copy (duplicate!) strings here all the time! Certainly a
    performance bottleneck. Better use something like stricmp:

    return stricmp (lhs.get_name().c_str(),rhs.get_name().c_str()) < 0;

    (assuming File::get_name() returns a reference to string)

    > boost::algorithm::to_lower( path1 );
    > boost::algorithm::to_lower( path2 );
    > return path1 < path2;
    > }
    > };
    >
    > I am converting to all lowercase because I don't want FOO to be less
    > than foo.
    >
    > PS: A File object is actually nothing more than a
    > boost::filesystem::path and a reference counted smart pointer to a
    > Gnome::Vfs::FileInfo object. I think it's maybe 8-16 bytes in size.


    BTW, do you really need _that_ complexity: Gnome, smart pointer, Boost?
     
    Mercator, Jul 16, 2005
    #11
  12. Mercator wrote:
    > Matthias Kaeppler wrote:
    >
    >>Here is a predicate function for sorting File objects by name ascending:
    >>
    >> struct FileLessNameA
    >> : std::binary_function<const File&,const File&,bool> {
    >> bool operator() (const File& lhs, const File& rhs) const {
    >> std::string path1( lhs.get_name().c_str() );
    >> std::string path2( rhs.get_name().c_str() );

    >
    >
    > Oops, you copy (duplicate!) strings here all the time! Certainly a
    > performance bottleneck. Better use something like stricmp:
    >
    > return stricmp (lhs.get_name().c_str(),rhs.get_name().c_str()) < 0;
    >


    What do you mean? I only want the lowercase conversion for comparison
    purposes, I don't want to modify the original strings (I am viewing
    directory contents, no way I would modify the filenames!).
    I /have/ to work with copies here.

    And where does that stricmp call take into account converting to all
    lowercase?

    > (assuming File::get_name() returns a reference to string)


    It returns by value.

    > BTW, do you really need _that_ complexity: Gnome, smart pointer, Boost?


    Yes. What should I say? I am developing a filemanager. I want an
    abstraction from the real filesystem, and that's what Gnome::Vfs does
    for me.
    I also want lambda expressions, smart pointers, easy working with paths
    and other convenience stuff, so I need Boost. For example:
    Boost.filesystem has functionality for determining root paths, easy
    concatenation of paths etc, which I found to be lacking in Gnome::Vfs.
    They combine quite nicely. Boost.Filesystem however has no decent
    functionality whatsoever for actually working with files (concurrent
    transfers with means to listen for progress, determining file types and
    so on).

    --
    Matthias Kaeppler
     
    Matthias Kaeppler, Jul 16, 2005
    #12
  13. Matthias Kaeppler

    Mercator Guest

    Matthias Kaeppler wrote:
    > Mercator wrote:
    > > Oops, you copy (duplicate!) strings here all the time! Certainly a
    > > performance bottleneck. Better use something like stricmp:
    > >
    > > return stricmp (lhs.get_name().c_str(),rhs.get_name().c_str()) < 0;

    >
    > What do you mean? I only want the lowercase conversion for comparison
    > purposes, I don't want to modify the original strings (I am viewing
    > directory contents, no way I would modify the filenames!).
    > I /have/ to work with copies here.


    stricmp doesn't change the string. It just compares strings case
    insensitive (of course, without creating/allocating new strings). BTW,
    you can easily write your own stricmp (with std::tolower) if your
    platform doesn't have one.

    > And where does that stricmp call take into account converting to all
    > lowercase?
    >
    > > (assuming File::get_name() returns a reference to string)

    >
    > It returns by value.


    Oops.

    > > BTW, do you really need _that_ complexity: Gnome, smart pointer, Boost?

    >
    > Yes. What should I say? I am developing a filemanager. I want an
    > abstraction from the real filesystem, and that's what Gnome::Vfs does
    > for me.
    > I also want lambda expressions, smart pointers, easy working with paths
    > and other convenience stuff, so I need Boost. For example:
    > Boost.filesystem has functionality for determining root paths, easy
    > concatenation of paths etc, which I found to be lacking in Gnome::Vfs.
    > They combine quite nicely. Boost.Filesystem however has no decent
    > functionality whatsoever for actually working with files (concurrent
    > transfers with means to listen for progress, determining file types and
    > so on).


    http://c2.com/cgi/wiki?KeepItSimple
    http://c2.com/cgi/wiki?DoTheSimplestThingThatCouldPossiblyWork
     
    Mercator, Jul 16, 2005
    #13
  14. Mercator wrote:
    > stricmp doesn't change the string. It just compares strings case
    > insensitive (of course, without creating/allocating new strings). BTW,
    > you can easily write your own stricmp (with std::tolower) if your
    > platform doesn't have one.


    Is stricmp platform independent? I don't think so! But if you know a
    platform independent wrapper for stricmp, let me know.

    > http://c2.com/cgi/wiki?KeepItSimple
    > http://c2.com/cgi/wiki?DoTheSimplestThingThatCouldPossiblyWork


    Honestly, I don't have the impression you even read what I wrote :-/
    Those "patterns", despite their poor expressiveness quite correct, are
    mostly bladdering about things which should be pretty obvious to anybody.

    So why not give concrete examples where you think my code could be made
    simpler without losing its level of abstraction and ease of use? Your
    suggestion for using stricmp may result in more efficient, but less
    portable code, so it's not a serious alternative.

    I'm also eager to hear what you suggest as a replacement for Gnome::Vfs
    and boost such that your solution is a) more efficient and b) still
    provides the same functionality.

    I don't have the impression that my code is overly complex, but I'm
    always open for /constructive/ feedback (those two articles were
    certainly not) which proves me wrong.

    Regards,
    Matthias

    --
    Matthias Kaeppler
     
    Matthias Kaeppler, Jul 16, 2005
    #14
  15. I did say keep your comparison function lean.
    In particular, anything allocating memory on the heap (like string
    copies) is a really bad idea.

    Isn't stricmp available everywhere?
    Whether it is or not, you're using std::string and I assume there's a
    standard method or function for comparing strings case insensitive.
    (Sorry, I don't use it so I don't know). Since you need it now, find
    out and you'll always know.

    BTW: using extra utility classes can make things simpler or more
    complex, depending on the situation. I didn't read those links either,
    but I assume they're a more long-winded way of quoting Einstein:
    "Everything should be made as simple as possible, but not simpler"

    Stuart
     
    Stuart MacMartin, Jul 16, 2005
    #15
  16. Stuart MacMartin wrote:
    > I did say keep your comparison function lean.
    > In particular, anything allocating memory on the heap (like string
    > copies) is a really bad idea.


    How do you know that std::string allocates memory on the heap? I can't
    remember standard C++ demanding that. AFAIK, this is up to the
    implementation.

    >
    > Isn't stricmp available everywhere?


    That's not the point.

    You may want to read this article by Matt Austern dealing with the
    problems of string comparison:
    http://lafstern.org/matt/col2_new.pdf

    > Whether it is or not, you're using std::string and I assume there's a
    > standard method or function for comparing strings case insensitive.


    No, there's not. Strings compare case sensitive. Read the article to
    learn why lexicographical_compare has its subtle problems.

    > BTW: using extra utility classes can make things simpler or more
    > complex, depending on the situation. I didn't read those links either,
    > but I assume they're a more long-winded way of quoting Einstein:
    > "Everything should be made as simple as possible, but not simpler"


    Let me answer this with a quote from Austerns paper:
    "For every complex problem there is a solution that is simple, neat, and
    wrong."
    Just like the suggestion using stricmp or std::lexicographical_compare.
    boost::algorithm::to_lower doesn't suffer from this problem, because it
    takes locales into account.

    However, instead of implementing the approach presented in Austerns
    paper on my own, I rather ask on the boost list if it's not already in
    their library (or something equivalent). Though I couldn't find it yet,
    I'd be surprised if it's not in there (if I'm not mistaken, Matt Austern
    is a long time contributor to the boost libraries).

    --
    Matthias Kaeppler
     
    Matthias Kaeppler, Jul 16, 2005
    #16
  17. Matthias Kaeppler wrote:
    > Stuart MacMartin wrote:
    >
    >> I did say keep your comparison function lean.
    >> In particular, anything allocating memory on the heap (like string
    >> copies) is a really bad idea.

    >
    >
    > How do you know that std::string allocates memory on the heap? I can't
    > remember standard C++ demanding that. AFAIK, this is up to the
    > implementation.


    .... and my C++ implementation indeed seems to allocate the string data
    on the heap. So much about that :D

    --
    Matthias Kaeppler
     
    Matthias Kaeppler, Jul 16, 2005
    #17
  18. Matthias Kaeppler

    Mercator Guest

    Matthias Kaeppler wrote:
    > Stuart MacMartin wrote:
    > > I did say keep your comparison function lean.
    > > In particular, anything allocating memory on the heap (like string
    > > copies) is a really bad idea.

    >
    > How do you know that std::string allocates memory on the heap? I can't
    > remember standard C++ demanding that. AFAIK, this is up to the
    > implementation.


    How would you implement a string template the holds (almost) arbitrary
    long strings without using the heap? (BTW, yes, know about 'small
    string optimization').

    > > Isn't stricmp available everywhere?

    >
    > That's not the point.
    >
    > You may want to read this article by Matt Austern dealing with the
    > problems of string comparison:
    > http://lafstern.org/matt/col2_new.pdf


    The point is that you need not allocate any new strings in order to
    compare two strings case insensitively. Things get a little more
    complicated if (and only if) you want to be 'locale aware'. But that's
    not the point ...
     
    Mercator, Jul 17, 2005
    #18
  19. Mercator wrote:
    > The point is that you need not allocate any new strings in order to
    > compare two strings case insensitively. Things get a little more
    > complicated if (and only if) you want to be 'locale aware'. But that's
    > not the point ...


    Yes, that's correct.
    I have already raised the point on boost users mailing list. We came to
    the conclusion that a functor similar to is_iequal for comparing using
    "less than" semantics would be a useful addition.

    I have slightly modified compare.hpp to feature this functor. Pavol
    Droba, the man behind the string algo library, has already agreed to add
    such a functor to the library if he finds the time. It will probably
    look something like this:

    struct is_iless
    {
    //! Constructor
    /*!
    \param Loc locales used for comparison
    */
    is_iless( const std::locale& Loc=std::locale() ) :
    m_Loc( Loc ) {}

    //! Function operator
    /*!
    Compare two operands. Case is ignored.
    */
    template< typename T1, typename T2 >
    bool operator ()( const T1& Arg1, const T2& Arg2 ) const
    {
    return std::toupper(Arg1,m_Loc)<std::toupper(Arg2,m_Loc);
    }

    private:
    std::locale m_Loc;
    };

    It works just fine and is exactly what I need. I can then use it as a
    predicate for std::lexicographical_compare to test two strings for a
    'less than' relation. Thus I can also get rid of the two copies without
    losing any of the functionality.

    Regards,
    Matthias

    --
    Matthias Kaeppler
     
    Matthias Kaeppler, Jul 17, 2005
    #19
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Koen
    Replies:
    1
    Views:
    500
  2. Marc Thrun
    Replies:
    15
    Views:
    861
    Tim Rentsch
    Oct 4, 2005
  3. Replies:
    7
    Views:
    554
    Pete Becker
    Jan 25, 2008
  4. cerr

    pointers, pointers, pointers...

    cerr, Apr 7, 2011, in forum: C Programming
    Replies:
    12
    Views:
    680
  5. Sebastian Mach
    Replies:
    5
    Views:
    313
Loading...

Share This Page