questions about the STL sort()

feverzsj · Jul 8, 2008

STL sort() as an implementation of introsort, below is the code
snippet from the gnu stl.

template<typename _RandomAccessIterator, typename _Size>
void
__introsort_loop(_RandomAccessIterator __first,
_RandomAccessIterator __last,
_Size __depth_limit)
{
typedef typename iterator_traits<_RandomAccessIterator>::value_type
_ValueType;

while (__last - __first > _S_threshold)
{
if (__depth_limit == 0)
{
std:

artial_sort(__first, __last, __last);
return;
}
--__depth_limit;
_RandomAccessIterator __cut =
std::__unguarded_partition(__first, __last,
_ValueType(std::__median(*__first,
*(__first
+ (__last
- __first)
/ 2),
*(__last
- 1))));
std::__introsort_loop(__cut, __last, __depth_limit);
__last = __cut;
}
}

my question is : why not use two recursive calls to __introsort_loop,
it will have a clear view to present a divide-and-conquer algorithm.
Yes, it is a optimization,but tail recursion elimination is now a
common technique of today's cpp compiler.

another question is about the __final_insertion_sort() placed after
the call to __introsort_loop()
below is the code snippet.

template<typename _RandomAccessIterator>
void
__final_insertion_sort(_RandomAccessIterator __first,
_RandomAccessIterator __last)
{
if (__last - __first > _S_threshold)
{
std::__insertion_sort(__first, __first + _S_threshold);
std::__unguarded_insertion_sort(__first + _S_threshold, __last);
}
else
std::__insertion_sort(__first, __last);
}

why not just call one __insertion_sort in the if-condition?

Bernd Strieder · Jul 8, 2008

Hello,

STL sort() as an implementation of introsort, below is the code
snippet from the gnu stl.

my question is : why not use two recursive calls to __introsort_loop,
it will have a clear view to present a divide-and-conquer algorithm.
Yes, it is a optimization,but tail recursion elimination is now a
common technique of today's cpp compiler.

I'm not sure tail recursion does apply here. Since the code is there,
you could try out what the compilers do. I'm pretty sure that saving
one recursive call saves enough, e.g. initializing the iterators for
every recursive instance, to justify doing it that way. Automatically
transforming non-tail recursion away, even partially, is a pretty
complicated operation with too many problematic cases. I would not
expect any compiler sacrificing lots of time in fruitless analysis in
that area.

another question is about the __final_insertion_sort() placed after
the call to __introsort_loop()
below is the code snippet.

template<typename _RandomAccessIterator>
void
__final_insertion_sort(_RandomAccessIterator __first,
_RandomAccessIterator __last)
{
if (__last - __first > _S_threshold)
{
std::__insertion_sort(__first, __first + _S_threshold);
std::__unguarded_insertion_sort(__first + _S_threshold, __last);
}
else
std::__insertion_sort(__first, __last);
}

why not just call one __insertion_sort in the if-condition?

__unguarded_insertion_sort saves bounds-checking, therefore it needs
some elements less or equal in the area before first + _S_threshold. It
will always read at least at position __first + _S_threshold - 1, and
it will sort arbitrarily far before __first + _S_threshold. The big
question is, why there must be an element less then all in range
[__first + _S_threshold, __last) in the range [__first, __first +
_S_threshold). An that can only be found when looking at the places
calling this function.

I think this splitting should make an easily measurable difference when
sorting containers of small elements like integer.

Bernd Strieder

Juha Nieminen · Jul 8, 2008

feverzsj said:
my question is : why not use two recursive calls to __introsort_loop,
it will have a clear view to present a divide-and-conquer algorithm.
Yes, it is a optimization,but tail recursion elimination is now a
common technique of today's cpp compiler.

What would be the advantage of not removing the second recursive call?
The only effect is that it will start relying on the tail recursion
optimization to exist and succeed, for no good reason.

The code itself is not even intended to be looked at.

James Kanze · Jul 9, 2008

STL sort() as an implementation of introsort, below is the
code snippet from the gnu stl.

template<typename _RandomAccessIterator, typename _Size>
void
__introsort_loop(_RandomAccessIterator __first,
_RandomAccessIterator __last,
_Size __depth_limit)
{
typedef typename iterator_traits<_RandomAccessIterator>::value_type
_ValueType;

while (__last - __first > _S_threshold)
{
if (__depth_limit == 0)
{
std:artial_sort(__first, __last, __last);
return;
}
--__depth_limit;
_RandomAccessIterator __cut =
std::__unguarded_partition(__first, __last,
_ValueType(std::__median(*__first,
*(__first
+ (__last
- __first)
/ 2),
*(__last
- 1))));
std::__introsort_loop(__cut, __last, __depth_limit);
__last = __cut;
}
}

my question is : why not use two recursive calls to __introsort_loop,

Probably because the code was originally a quick sort (without
the __depth_limit), and they didn't want it to core dump due to
stack overflow.

it will have a clear view to present a divide-and-conquer
algorithm. Yes, it is a optimization, but tail recursion
elimination is now a common technique of today's cpp compiler.

First, I'm not sure that it's that common. Compilers optimize
for real code, and tail recursion isn't that frequent in C++.
And second, I'm not sure that the compiler can do the
optimization; if _RandomAccessIterator has a non-trivial
destructor, it certainly can't.

another question is about the __final_insertion_sort() placed after
the call to __introsort_loop()
below is the code snippet.

template<typename _RandomAccessIterator>
void
__final_insertion_sort(_RandomAccessIterator __first,
_RandomAccessIterator __last)
{
if (__last - __first > _S_threshold)
{
std::__insertion_sort(__first, __first + _S_threshold);
std::__unguarded_insertion_sort(__first + _S_threshold, __last);
}
else
std::__insertion_sort(__first, __last);
}

why not just call one __insertion_sort in the if-condition?

Probably optimization. Without seeing where this function is
called, and what the differences are between the two insertion
sorts, it's hard to say, but just guessing... this function is
called when the quick sort has partitionned the total range into
blocks of _S_threshold size, and all of the values in one block
are greater than the values in all previous blocks, and less
then the values in all following blocks. The standard insertion
sort will have to check for the bottom of the array in the inner
loop, something like (i and j are iterators):

for ( i = first + 1 ; i != last ; ++ i ) {
T tmp = *i ;
j = i ;
while ( j != first && *(j-1) > tmp ) {
*j = *(j-1) ;
-- j ;
}
*j = tmp ;
}

If you're sorting anything but the first block after the
partitionning, you don't need the j != first test in the inner
loop.

feverzsj · Jul 9, 2008

Hello,

Bernd said:
I'm not sure tail recursion does apply here. Since the code is there,
you could try out what the compilers do. I'm pretty sure that saving
one recursive call saves enough, e.g. initializing the iterators for
every recursive instance, to justify doing it that way. Automatically
transforming non-tail recursion away, even partially, is a pretty
complicated operation with too many problematic cases. I would not
expect any compiler sacrificing lots of time in fruitless analysis in
that area.

i see what u mean. STL is designed for both good portability and high
performance.
We can study a lot from it by comparing and rewriting.
If the __introsort_loop() was written like this:

__introsort_loop(/*...*/)
{
//...

if(__last - __first > _S_threshold)
{
//....
}
std::__introsort_loop(__cut, __last, __depth_limit);
std::__introsort_loop(__first, __cut, __depth_limit);
}

Then, a tail recursion elimination could just simply replace the
params by the new one ,and goto the beginning of procedure.
The compiler may also identify only __last has changed , and just
replace the _last by __cut.
Finally ,we get the same effect as the GNU STL __introsort_loop does.

__introsort_loop(/*...*/)
{
//...

recu: if(__last - __first > _S_threshold)
{
//....
}

std::__introsort_loop(__cut, __last, __depth_limit);
// tail recursion elimination could be just simply like this
__first=__first;
__last=__cut;
__depth_limit=__depth_limit;
goto recu;
}

__unguarded_insertion_sort saves bounds-checking, therefore it needs
some elements less or equal in the area before first + _S_threshold. It
will always read at least at position __first + _S_threshold - 1, and
it will sort arbitrarily far before __first + _S_threshold. The big
question is, why there must be an element less then all in range
[__first + _S_threshold, __last) in the range [__first, __first +
_S_threshold). An that can only be found when looking at the places
calling this function.

I think this splitting should make an easily measurable difference when
sorting containers of small elements like integer.

yeah ,that's a speed-up trick, but I have the same question as you
mentioned.
The place calling this function makes me doubt ,how could there must
be an element less then all in range [__first + _S_threshold, __last)
in the range [__first, __first + _S_threshold]

sort(_RandomAccessIterator __first, _RandomAccessIterator __last)
{

if (__first != __last)
{
std::__introsort_loop(__first, __last, __lg(__last - __first) * 2);
std::__final_insertion_sort(__first, __last); // the only palce it
calls
}
}

or the __introsort_loop() has assured such thing?

Furthermore ,the __unguarded_partition() uses a simple 2-way
partitioning with no bound check, becuz the median-of-three method has
assured so. I think a 3-way partitioning could be a better partition
routine for common use with a slight overhead

feverzsj · Jul 10, 2008

hello:

Probably because the code was originally a quick sort (without
the __depth_limit), and they didn't want it to core dump due to
stack overflow.

First, I'm not sure that it's that common. Compilers optimize
for real code, and tail recursion isn't that frequent in C++.
And second, I'm not sure that the compiler can do the
optimization; if _RandomAccessIterator has a non-trivial
destructor, it certainly can't.

thx,you remind me of cpp's RAII feature, and i'm still on the road
from "c+" to "c++", a long way to pass through.

Probably optimization. Without seeing where this function is
called, and what the differences are between the two insertion
sorts, it's hard to say, but just guessing... this function is
called when the quick sort has partitionned the total range into
blocks of _S_threshold size, and all of the values in one block
are greater than the values in all previous blocks, and less
then the values in all following blocks. The standard insertion
sort will have to check for the bottom of the array in the inner
loop, something like (i and j are iterators):

for ( i = first + 1 ; i != last ; ++ i ) {
T tmp = *i ;
j = i ;
while ( j != first && *(j-1) > tmp ) {
*j = *(j-1) ;
-- j ;
}
*j = tmp ;
}

If you're sorting anything but the first block after the
partitionning, you don't need the j != first test in the inner
loop.

below r the routines of __insertion_sort() and the
__unguarded_insertion_sort().
looks like __insertion_sort() is just a common implementation of
insertion sort,
and __unguarded_insertion_sort() is some kind insertion sort without
bound check
they'll be called right after the __introsort_loop(/*...*/).

__insertion_sort(_RandomAccessIterator __first,
_RandomAccessIterator __last)
{
if (__first == __last)
return;

for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)
{
typename iterator_traits<_RandomAccessIterator>::value_type
__val = *__i;
if (__val < *__first)
{
std::copy_backward(__first, __i, __i + 1);
*__first = __val;
}
else
std::__unguarded_linear_insert(__i, __val);
}
}

__unguarded_insertion_sort(_RandomAccessIterator __first,
_RandomAccessIterator __last)
{
typedef typename iterator_traits<_RandomAccessIterator>::value_type
_ValueType;

for (_RandomAccessIterator __i = __first; __i != __last; ++__i)
std::__unguarded_linear_insert(__i, _ValueType(*__i));
}

as you mentioned:

...the total range into blocks of _S_threshold size, and all of the values in one block are greater than the values in all previous blocks, and less then the values in all following blocks

it could be possibile, but i think the __unguarded_partition() may
fail to partition the array with reduplicate keys.

__unguarded_partition(_RandomAccessIterator __first,
_RandomAccessIterator __last, _Tp __pivot)
{
while (true)
{
while (*__first < __pivot)
++__first;
--__last;
while (__pivot < *__last)
--__last;
if (!(__first < __last))
return __first;
std::iter_swap(__first, __last);
++__first;
}
}

compile error trying to call STL sort	7	Jan 10, 2006
About matching a template function	6	Jul 11, 2010
Whether to choose an overload of template function or a partialspecialization of a functor(function	0	Aug 12, 2013
what is the default constructor for the POD type	5	Feb 19, 2012
Recursion crash in STL on linux	14	Apr 27, 2010
problem compiling functor with sort	2	Oct 8, 2004
custom iterator	13	Sep 12, 2005
Discard qualifiers	2	Aug 24, 2007

questions about the STL sort()

feverzsj

Bernd Strieder

Juha Nieminen

James Kanze

feverzsj

feverzsj

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads