std::sort

J

Jeff Schwab

Would std::sort ever compare an object with itself? I'm not talking
about two distinct, equal-valued objects, but rather this == &that.
The container being sorted is a std::vector.

I've never seen this, but a coworker says he is. NB: I can't post
sample code that reproduces the issue, nor do I claim any bug in the
STL implementation (GCC 3.4.2). I'm just hoping a definitive answer
resides in one of the brains who frequent this group.
 
V

Victor Bazarov

Jeff said:
Would std::sort ever compare an object with itself?

I don't believe the Standard prohibits that.
I'm not talking
about two distinct, equal-valued objects, but rather this == &that.
The container being sorted is a std::vector.

Objects are never compared using '=='. They are compared using '<'
or the functor provided.
I've never seen this, but a coworker says he is. NB: I can't post
sample code that reproduces the issue, nor do I claim any bug in the
STL implementation (GCC 3.4.2). I'm just hoping a definitive answer
resides in one of the brains who frequent this group.

Tell your coworker to provide you with proof.

V
 
M

Mark P

Jeff said:
Would std::sort ever compare an object with itself? I'm not talking
about two distinct, equal-valued objects, but rather this == &that.
The container being sorted is a std::vector.

I've never seen this, but a coworker says he is. NB: I can't post
sample code that reproduces the issue, nor do I claim any bug in the
STL implementation (GCC 3.4.2). I'm just hoping a definitive answer
resides in one of the brains who frequent this group.

Well, first of all note that std:sort doesn't use == for comparison. It
either uses operator< or, optionally, a user-specified comparison
functor. Now with the understanding that, for sort, "a == b" means both
a < b and b < a are false...

I can't think of any reason why it *would* compare an object with
itself, but I don't know any reason why it *can't*. This ought to be
easy to check for your specific implementation by either looking at the
header file or by writing a comparison functor that will report if this
happens.
 
J

Jeff Schwab

I don't believe the Standard prohibits that.


Objects are never compared using '=='. They are compared using '<'
or the functor provided.

The standard sort algorithm defaults to using std::less. Since we are
sorting a container of pointers, we provide our own comparison
routine, which internally compares the results of the dereferenced
pointers. The pointers themselves are first checked for equality,
since an object is always equal to itself. This is very common, basic
stuff, and I assume you already know it.
Tell your coworker to provide you with proof.

No need. The following example took ~ 2 min. to write and debug.
Note the explicit operator== to which I referred earlier. I'm using a
POD object type here (int) for simplicity.

#include <iostream>

template<typename T>
bool
deref_less(T* p, T* q)
{
if (p == q) {
std::clog << "warning: object at " << p << " compared to
itself.\n";
}

return *p < *q;
}

#include <algorithm>
#include <vector>

int
main()
{
typedef int Object;

std::vector<Object*> v;

for (int i = 0; i < 1000; ++i) {
v.push_back(new Object(i));
}

std::sort(v.begin(), v.end(), deref_less<Object>);

// cleanup ...

return 0;
}
 
J

Jeff Schwab

I can't think of any reason why it *would* compare an object with
itself, but I don't know any reason why it *can't*.

That's what I told the coworker. Not very satisfying, is it? :)
This ought to be
easy to check for your specific implementation by either looking at the
header file or by writing a comparison functor that will report if this
happens.

We know it's happening, because our comparison function is telling us
so. Just "looking" at the header file -- I assume you mean the
template implementation file #included by <algorithm> -- is one thing,
but understanding it is quite another. My (mis?)understanding of
std::sort is that it's a modified form of qsort that switches to
something like shellsort for particular sorts (no pun) of input.
 
M

Mark P

Jeff said:
That's what I told the coworker. Not very satisfying, is it? :)

May I ask why this matters? Is the code unstable against these sorts of
comparisons, is it an efficiency concern, or something else?
We know it's happening, because our comparison function is telling us
so. Just "looking" at the header file -- I assume you mean the
template implementation file #included by <algorithm> -- is one thing,
but understanding it is quite another. My (mis?)understanding of
std::sort is that it's a modified form of qsort that switches to
something like shellsort for particular sorts (no pun) of input.

Agreed that this is probably not a trivial task. The details will vary
by implementation but I took a quick glance at mine. It appears to be
only a couple hundred lines of code to digest so, if it's really
important to know, I think it's doable in fairly short order. If you
have a good debugger that will let you step through the header file
implementation then you may find it even easier.
 
V

Victor Bazarov

Jeff said:
The standard sort algorithm defaults to using std::less. Since we are
sorting a container of pointers, we provide our own comparison
routine, which internally compares the results of the dereferenced
pointers. The pointers themselves are first checked for equality,
since an object is always equal to itself. This is very common, basic
stuff, and I assume you already know it.

Actually, not all objects are equal to themselves. For example, the
floating point "Not-A-Number" (NaN) yields false when compared with
itself using '=='. This is very basic stuff and I assume you already
know it.

The point is not that somewhere in the guts of your functor you use
'==' but that the 'std::sort' never uses '==' on the objects.
No need. The following example took ~ 2 min. to write and debug.
Note the explicit operator== to which I referred earlier. I'm using a
POD object type here (int) for simplicity.

And?... What do you expect I should get from your example? What do
you expect yourself to derive from your example? You're testing one
particular implementation of 'std::sort'.

Nothing in the Standard says that 'std::sort' never compares the object
to itself (IOW, passes the same reference to the provided functor). It
is your responsibility to provide the functor that functions correctly
(according to your model) when an object is compared to itself.

Nothing in the Standard says what algorithm is used in 'std::sort'.
#include <iostream>

template<typename T>
bool
deref_less(T* p, T* q)
{
if (p == q) {
std::clog << "warning: object at " << p << " compared to
itself.\n";
}

return *p < *q;
}

#include <algorithm>
#include <vector>

int
main()
{
typedef int Object;

std::vector<Object*> v;

for (int i = 0; i < 1000; ++i) {
v.push_back(new Object(i));
}

std::sort(v.begin(), v.end(), deref_less<Object>);

// cleanup ...

return 0;
}

V
 
J

Jeff Schwab

May I ask why this matters? Is the code unstable against these sorts of
comparisons, is it an efficiency concern, or something else?

It's purely a matter of curiosity.
Agreed that this is probably not a trivial task. The details will vary
by implementation but I took a quick glance at mine. It appears to be
only a couple hundred lines of code to digest so, if it's really
important to know, I think it's doable in fairly short order. If you
have a good debugger that will let you step through the header file
implementation then you may find it even easier.

Thanks for doing the reconn; you've inspired me.

A little further digging into GCC's bits/stl_algo.h (which apparently
is a direct descendant of the original, SGI STL implementation) shows
that std::sort mainly uses introsort. Introsort is supposed to be
like quicksort for most inputs, but like heapsort for others.

The implementation used for GCC 4.1 is mostly a loop around introsort,
followed by an insertionsort. The introsort recurses to a finite
depth, then just calls std::partial_sort, which looks like a pretty-
straightforward heapsort; it calls std::make_heap and std::sort_heap.
I wonder whether the compare-to-self operations are happening in
make_heap...
 
J

Jeff Schwab

Actually, not all objects are equal to themselves. For example, the
floating point "Not-A-Number" (NaN) yields false when compared with
itself using '=='. This is very basic stuff and I assume you already
know it.

Not all objects in the great wide world. All objects of the type
we're storing. One could always code bool operator==(T* other)
{ return false; }, but it would not be germane to this discussion.
The point is not that somewhere in the guts of your functor you use
'==' but that the 'std::sort' never uses '==' on the objects.

I did not mean to imply that std::sort was directly invoking
operator==. I said the condition my coworker was seeing, and which I
could not explain, was that two objects were identical in the sense
that this == &that.
And?... What do you expect I should get from your example? What do
you expect yourself to derive from your example? You're testing one
particular implementation of 'std::sort'.

Nothing in the Standard says that 'std::sort' never compares the object
to itself (IOW, passes the same reference to the provided functor). It
is your responsibility to provide the functor that functions correctly
(according to your model) when an object is compared to itself.

Nothing in the Standard says what algorithm is used in 'std::sort'.

Victor, I'm not claiming anything is wrong with std::sort, nor the
Standard, nor am I whining about any kind of problem. I'm asking a
question about behavior I had not expected. Could you please back off
the vitriol?
 
D

Daniel T.

Jeff Schwab said:
Would std::sort ever compare an object with itself? I'm not talking
about two distinct, equal-valued objects, but rather this == &that.
The container being sorted is a std::vector.

I've never seen this, but a coworker says he is. NB: I can't post
sample code that reproduces the issue, nor do I claim any bug in the
STL implementation (GCC 3.4.2). I'm just hoping a definitive answer
resides in one of the brains who frequent this group.

It seems your question has been answered. Apparently, there is nothing
in the standard that prohibits comparing an object to itself.

Using the sample code you provided, my implementation compares objects
to themselves 126 out of 7203 comparisons (less than 2%.) Using more
random input (so that the sort routine actually has to do something) it
was 121 out of 11839 calls (barely over 1%.)

If your comparison is very expensive, then I could see you wanting to do
a "if ( a == b ) return false;" at the beginning of the compare
function, but it would have to be pretty expensive before I bothered.
 
J

Jeff Schwab

It seems your question has been answered. Apparently, there is nothing
in the standard that prohibits comparing an object to itself.

Using the sample code you provided, my implementation compares objects
to themselves 126 out of 7203 comparisons (less than 2%.) Using more
random input (so that the sort routine actually has to do something) it
was 121 out of 11839 calls (barely over 1%.)

If your comparison is very expensive, then I could see you wanting to do
a "if ( a == b ) return false;" at the beginning of the compare
function, but it would have to be pretty expensive before I bothered.

Roughly 100x more expensive, I take it. Any idea why the comparisons
happen at all?
 
J

Juha Nieminen

Jeff said:
Roughly 100x more expensive, I take it. Any idea why the comparisons
happen at all?

My guess: Quicksort uses a pivot value for partitioning. The pivot
value is one of the elements in the array being sorted. The pivot value
is being compared to all the values in the array being sorted. Hence it
would seem natural that the pivot value would at some point be compared
to itself.
(It might be possible to implement quicksort in a way that the pivot
value is never compared with itself, but perhaps they simply didn't
bother? Perhaps making such an extra check would make the inner loop of
the partitioning slower?)
 
A

Andrey Tarasevich

Jeff said:
The standard sort algorithm defaults to using std::less. Since we are
sorting a container of pointers, we provide our own comparison
routine, which internally compares the results of the dereferenced
pointers. The pointers themselves are first checked for equality,
since an object is always equal to itself. This is very common, basic
stuff, and I assume you already know it.
...

Err... You are saying that what you are sorting is actually a container
of _pointers_ and that you provide a custom comparator that compares the
objects by dereferencing these pointers. In this case the "objects"
'std::sort' actually works with are the _pointers_. A quality
implementation probably will never compare an object with itself (even
though it is not directly prohibited). And quite possibly your
implementation never does.

What you seem to be having issues with here is the comparison of the
objects obtained by dereferencing pointers. This is, of course, always
possible. Just fill your container with identical pointers and call
'std::sort'. For 'std::sort' these _pointers_ are still different
objects, so it will try to compare them. Inside your comparator you will
end up comparing the same objects with themselves, of course, but that
has absolutely nothing to do with 'std::sort'. It is _you_, who's
comparing objects with themselves in this case, not 'std::sort'.
 
A

Andrey Tarasevich

Jeff said:
...
No need. The following example took ~ 2 min. to write and debug.
Note the explicit operator== to which I referred earlier. I'm using a
POD object type here (int) for simplicity.

#include <iostream>

template<typename T>
bool
deref_less(T* p, T* q)
{
if (p == q) {
std::clog << "warning: object at " << p << " compared to
itself.\n";
}

return *p < *q;
}

#include <algorithm>
#include <vector>

int
main()
{
typedef int Object;

std::vector<Object*> v;

for (int i = 0; i < 1000; ++i) {
v.push_back(new Object(i));
}

std::sort(v.begin(), v.end(), deref_less<Object>);

// cleanup ...

return 0;
}

This code does not demonstrate 'std::sort' comparing objects to itself.
'std::sort' in this case works with elements of 'v', which are pointers
of type 'Object*'. If you could demonstrate that 'std::sort' compared
say, 'v[5]' to 'v[5]' - that would be an attempt to "compare an object
to itself" that can be tied to 'std::sort's implementation. However,
when 'std::sort' compares 'v[3]' to 'v[7]' these are _two_ _different_
_objects_. Whether in the end they point to the same 'Object' object is
completely outside the limits of 'std::sort's jurisdiction and outside
the limits of the original question of "Would std::sort ever compare an
object with itself?".
 
A

Andrey Tarasevich

Andrey said:
This code does not demonstrate 'std::sort' comparing objects to itself.

Oops... Sorry. I haven't noticed that all values in 'v' are unique. In
that case, yes, this code does demonstrate what you want it to
demonstrate, assuming that it triggers the "warning".
 
J

Jeff Schwab

My guess: Quicksort uses a pivot value for partitioning. The
pivot value is one of the elements in the array being sorted. The
pivot value is being compared to all the values in the array being
sorted. Hence it would seem natural that the pivot value would at
some point be compared to itself.
(It might be possible to implement quicksort in a way that the
pivot value is never compared with itself, but perhaps they simply
didn't bother? Perhaps making such an extra check would make the
inner loop of the partitioning slower?)

Thanks, you nailed it; confirmed by inserting puts("HERE") in the
library header. Here is the (GPL) code from bits/stl_algo.h, sans
puts:

/**
* @if maint
* This is a helper function...
* @endif
*/
template<typename _RandomAccessIterator, typename _Tp, typename
_Compare>
_RandomAccessIterator
__unguarded_partition(_RandomAccessIterator __first,
_RandomAccessIterator __last,
_Tp __pivot, _Compare __comp)
{
while (true)
{
while (__comp(*__first, __pivot))
++__first;
--__last;
while (__comp(__pivot, *__last))
--__last;
if (!(__first < __last))
return __first;
std::iter_swap(__first, __last);
++__first;
}
}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,431
Messages
2,571,679
Members
48,796
Latest member
Greg L.

Latest Threads

Top