Benchmarks


James Kanze

There is a published procedure that is very general and that is on
the web somewhere, though I can't find it at the moment.  The
essence, though, is that you use an oracle.  It works like this:  At
each step you tell me how you're going to pick your pivot, i.e.,
which O(1) elements you are going to look at to choose the pivot.  I
get to choose the values of all of the rest.  I choose them
so that they are all bigger (or smaller) than your pivot.  I don't
actually have to set their values until you look at them; I just
have to place bounds on them.

I'd vaguely thought of something like this. But it doesn't
really help for creating a test case. Especially if you use
rand() to choose the pivot.
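
The published procedure being half-remembered here is almost
certainly M. D. McIlroy's "A Killer Adversary for Quicksort"
(Software Practice & Experience, 1999).  A rough sketch of the idea,
with my own names, assuming the quicksort under test accepts a
user-supplied comparator:

    #include <climits>
    #include <cstddef>
    #include <vector>

    const int gas = INT_MAX;    // marker for "value not yet decided"

    // The sequence handed to the sort holds the indices 0..n-1;
    // value[] is their shadow.  Whenever the sort compares two still
    // undecided ("gas") elements, the current pivot candidate is
    // pinned to the next small value, so nearly everything ends up
    // on one side of every partition.
    class Adversary {
    public:
        explicit Adversary(std::size_t n)
            : value(n, gas), comparisons(0), candidate(0), solid(0) {}

        bool operator()(int x, int y)       // comparator for the sort
        {
            ++comparisons;
            if (value[x] == gas && value[y] == gas) {
                if (x == candidate) value[x] = solid++;
                else                value[y] = solid++;
            }
            if (value[x] == gas)      candidate = x;
            else if (value[y] == gas) candidate = y;
            return value[x] < value[y];
        }

        std::vector<int> value;     // afterwards: a worst-case input
        long comparisons;
    private:
        int candidate;
        int solid;
    };

To use it, sort a vector containing 0..n-1 with the quicksort under
test, passing the adversary by reference as the comparator.
adv.comparisons then shows the damage, and adv.value (with any
entries still equal to gas replaced by large distinct values) is a
concrete input which, fed back to the same quicksort with an
ordinary <, reproduces the bad behaviour -- which is exactly the
test case being asked for.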
 

Antoninus Twink

O(N log N) is much faster than O(N * N).

Worthy to be a sig, I think...

Despite the patient explanations of half a dozen different people in
this thread, it's obvious that CBF is never going to grok the point
being made. As in so many other things, he is so pig-headed and
obstinate in his ignorance that he's simply unteachable.
 

CBFalconer

Jerry said:
(e-mail address removed) says...

[ ... ]
Not so. Extending Quicksort to include multiple fields does not
count. Mergesort remains simple. For an example, look at the
usage examples in hashlib (on my site).

I'm not talking about using multiple fields. With an array,
quicksort is unstable because you partition by scanning from the
beginning and from the end toward the middle, and when you find two
elements out of order with respect to the pivot element, you swap
them.

With linked lists, you pick your pivot value and separate your
items into two linked lists (then recursively sort those). Unlike
an array, however, it's trivial to splice new elements to the
ends of your linked lists. Having done that, you get a stable sort.

I see what you are talking about, but I don't think it is
worthwhile. The point of quicksort is the high speed of the
innermost loop. Here you are making it relink several list
areas, rather than just jamming a single value into place. I think
you also need to make all your lists bidirectional, which is a
further complication (and slowdown).

Imagine operating on the sublist bagdcfe, after picking element d
as the marker. You will start by scanning e, then f, then c. The c
requires removal from the sublist dcfe and insertion after bag. Now
you have bagc and dfe. Now you scan bagc, and find g. I daresay
you want to insert it after e, getting dfeg. But what about a
slightly different input order, such as bafdcge? I think the
insertion points need revision (haven't really checked).
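
For reference, here is a minimal sketch of the scheme Jerry
describes (hypothetical Node type and helper names, not code from
anyone in the thread): partition into a "less" list and a "greater
or equal" list by appending at the tails, recurse, then concatenate.
Because nodes are appended in the order they are encountered, equal
keys keep their relative order, so the sort is stable, and no
backward links are needed.

    struct Node {
        int   value;
        Node* next;
    };

    // Append list b after list a; returns the new head.  (A real
    // version would carry tail pointers instead of re-walking a.)
    Node* concat(Node* a, Node* b) {
        if (a == 0) return b;
        Node* t = a;
        while (t->next != 0) t = t->next;
        t->next = b;
        return a;
    }

    Node* quicksortList(Node* head) {
        if (head == 0 || head->next == 0) return head;
        Node* pivot = head;               // first node as the pivot
        Node* rest  = head->next;
        pivot->next = 0;

        Node *ltHead = 0, *ltTail = 0;    // strictly less than pivot
        Node *geHead = 0, *geTail = 0;    // greater than or equal
        while (rest != 0) {
            Node* n = rest;
            rest = rest->next;
            n->next = 0;
            if (n->value < pivot->value) {
                if (ltTail) ltTail->next = n; else ltHead = n;
                ltTail = n;
            } else {
                if (geTail) geTail->next = n; else geHead = n;
                geTail = n;
            }
        }
        return concat(quicksortList(ltHead),
                      concat(pivot, quicksortList(geHead)));
    }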
 

CBFalconer

James said:
Really? If I post a quicksort implementation here, could you
give me an algorithm which would generate the worst case?
(I guess there should be a smiley here, because I don't think that
that's really what you meant. But just half of one, because
I'd really like to find such an algorithm. There are times when
you do want to test the worst case.)

It exists and is available. I thought I had it here, but I can't
find it. I did try it out, and it works. Hopefully someone else
will come up with the right reference.
 

CBFalconer

James said:
.... snip ...


As I said, I'm interested, because I've been collecting
benchmarks of various hashing functions. Post a link to the
algorithm, and I'll add it in.

One of the original purposes of hashlib was to make testing hash
functions easy. It is much more than just that now, but still
fills the purpose. hashlib includes a couple of very simple, and
fast, hashes for text strings as a convenience for users. All
available on my site.
 

James Kanze

On Nov 11, 8:02 am, James Kanze <[email protected]> wrote:

[...]
There is an updated version of Bob Jenkins' hash that is faster.

Now he tells me :-). I've just finished adding Paul's algorithm
(and I'll quote the results and comment on them below).
If you are hashing big keys, UMAC is marvelous.
http://fastcrypto.org/umac/2000/perf00.html

Two more to add. Your mention of big keys is significant,
however; see below.

[...]
I use frog.cpp as my testing harness.  Is your hash testing
harness code available?

It's at my site, in the Test subsystem. While I won't swear to
it, I don't think anything has changed in it since I put it up.
See http://kanze.james.neuf.fr/code-en.html. (The easiest
solution might be to simply download the lot, and browse around
in it. If all you're interested in is the harness, however, it
is in Test, and it only depends on Global---no other subsystems,
and no other components in Test.)

[...]
FNV is not as good as some of the others.  It cascades a bit.

I'm not sure. It seems to give very good results with my test
data.

[...]
Is your Mersenne prime hash based on the Mersenne twister RNG,
or are you just using big Mersenne primes as a large prime for
modulus operations with a very long period?

No. It's actually based on linear congruential random number
generation: x = (a * x[i-1] + b) % m, where m is 2^n (not a
good idea for random number generation, but it doesn't seem to
be a problem here), and b is actually the next character, rather
than a constant. (FNV uses more or less the same principle,
except that it xor's in the next character.) The Mersenne prime
part is that a is a Mersenne prime. If you think about it, it's
important for a to be relatively prime to the size of your
table, so a prime number is indicated. And a Mersenne prime has
the advantage that the multiplication can be done with a simple
shift and subtraction if multiplication is expensive.
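
In other words (a minimal sketch of my reading of the description
above, with my own names, not the actual code), with a = 2^7-1 = 127
and the modulo 2^32 happening implicitly through unsigned overflow:

    #include <stdint.h>
    #include <string>

    uint32_t mersenneHash(std::string const& key)
    {
        uint32_t h = 0;
        for (std::string::const_iterator it = key.begin();
                it != key.end(); ++it) {
            // h = 127 * h + c; with a Mersenne prime multiplier this
            // is just (h << 7) - h + c if multiplication is expensive.
            h = (h << 7) - h + static_cast<unsigned char>(*it);
        }
        return h;      // reduced modulo the table size by the caller
    }

(The Java hash in the measurements below is the same scheme with
a = 2^5-1 = 31.)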

[...]
I think it is a good idea to test everything, and then later
on retest it all because assumptions are based on models that
can change over time.

Several things are clear, and one is that the behavior will
depend on hardware. Which evolves, so yesterday's answer might
not hold today.

OK, for the results of my tests. Two comments to begin with,
though:

1. I did my tests using std::string as a key, which definitely
favors character by character access. At least for clarity,
I used std::string::const_iterator throughout, reading four
bytes and assembling them into a uint32_t when the algorithms
called for it. In the case of Paul's and Bob's algorithms,
which are based on reading four bytes at a time, I also
implemented a hacked version (marked (opt.), for optimized,
below), which called std::string::data() to get the pointer
and reinterpret_cast it to uint32_t (or uint16_t, in the case
of Paul's), to see what difference that made.

2. Which leads me to the second remark: both Paul's and Bob's
algorithms core dump for certain input on my usual platform
(Sun Sparc), because they don't ensure correct alignment;
incorrect alignment will also cause them to run considerably
slower on an Intel or AMD platform. In my case, the actual
implementations of std::string I use (g++, Sun CC, and once
I get the library back up and running with it, VC++) all
happen (by chance) to ensure that the pointer returned by
data() will be aligned, but I can easily imagine very viable
implementations where this is NOT the case. As such, I
would consider either implementation as unusable except in
certain cases, unless you do as I did above, and reassemble
the bytes.
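
As an illustration of the work-around (my own hypothetical helper,
not the harness's code): the four bytes can be assembled through
the iterators, so that nothing is assumed about the alignment of
the underlying buffer.

    #include <stdint.h>
    #include <string>

    // Reads up to four bytes at *it, advancing it; a short tail is
    // zero padded.  Little-endian assembly, chosen arbitrarily.
    inline uint32_t get32(std::string::const_iterator& it,
                          std::string::const_iterator end)
    {
        uint32_t result = 0;
        for (int i = 0; i < 4 && it != end; ++i, ++it) {
            result |= static_cast<uint32_t>(
                          static_cast<unsigned char>(*it)) << (8 * i);
        }
        return result;
    }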

Anyway, with excuses for the formatting (my scripts generate the
tables as HTML, meant to be incorporated into pages with a CSS
header, so they probably won't look that good here), here are the
complete results, for a Linux based machine with an AMD 64 X2
5200+ processor, compiled with g++ 4.2.1, -O3:

                           75URLs  1275URLs  ProgSyms  FrWords  Constructed  Dense
    sorted vector           0.227     0.430     0.379    0.976        2.054  0.351
    std::map                0.249     0.467     0.511    1.319        2.784  0.465
    FNVHash                 0.150     0.163     0.147    0.296        0.339  0.106
    FNVHash (opt.)          0.149     0.167     0.147    0.295        0.338  0.106
    Java hash               0.147     0.168     0.145    0.296        0.329  0.131
    Mersenne 2^7-1          0.147     0.167     0.150    0.297        0.375  0.106
    Mersenne 2^7-1 (opt.)   0.147     0.166     0.147    0.296        0.375  0.105
    Mersenne 2^9-1          0.147     0.167     0.142    0.296        0.471  0.107
    Mersenne 2^11-1         0.147     0.167     0.143    0.297        3.613  0.106
    Sedgwick                0.181     0.182     0.152    0.299        0.346  0.107
    Sobel                   0.163     0.182     0.151    0.300        0.360  0.128
    Weinberger              0.246     0.262     0.165    0.313        0.372  0.184
    K&R                     0.178     0.195     0.151    0.300        0.329  0.105
    Open SDBM               0.165     0.182     0.152    0.301        0.368  0.107
    Bernstein               0.147     0.166     0.147    0.296        0.366  0.129
    Knuth                   0.134     0.153     0.148    0.296        0.361  0.131
    Partow                  0.181     0.197     0.155    0.303        0.360  0.110
    Pearson                 0.434     0.456     0.224    0.384        0.413  0.153
    Jenkins                 0.149     0.168     0.157    0.309        0.362  0.147
    Jenkins (opt.)          0.135     0.153     0.156    0.307        0.359  0.147
    Hsieh                   0.129     0.150     0.156    0.299        0.870  0.140
    Hsieh (opt.)            0.121     0.143     0.157    0.296        0.868  0.138
    CRC                     0.249     0.270     0.181    0.325        0.381  0.143
    CRC (opt.)              0.250     0.270     0.178    0.325        0.383  0.142

The times are in microseconds per access.

There's a lot more that could be said, but at the least, it's
important to describe the data sets:

75URLs: 73 URL's, taken from the history file of my browser at
one particular occasion.

1275URLs: 1259 URL's, from all of the files in my browser's
cache, at the same occasion.

(The numbers in the names of
these two represent the original number of entries, before I
realized that there were duplicates, and eliminated them.)

ProgSyms: The set of all symbols and numeric literals in my code
base (the one posted at my site, I think), 8554 entries.

FrWords: A set of French words, extracted from the sources for
the French ispell tables, 47451 entries. (Note that this is the
only table with anything other than standard ASCII; it's in
ISO 8859-1, with accented characters.)

Constructed: An artificially constructed set of all of the words
from x000000 to x999999 (for exactly one million entries---I
wanted something big, with a lot of close matches).

Dense: Another artificially constructed set, with all two
character strings consisting of printable ASCII characters,
so 94*94 = 8836 entries. (This one was constructed
intentionally to display a theoretical weakness of the Java
hash code, which is in fact my Mersenne prime algorithm
using 2^5-1.)

About the only comment I'll make on the results is that, with the
exception of the two sets of URL's, most of my strings are
fairly short; I'll write up an AWK script to calculate the
average, median, max and min lengths (although for the last two
sets, they're trivial), but off hand, ProgSyms is the only other
one which contains a significant number of entries with more
than, say, 10 characters. (Some of the URL's are also pretty
short, however.) This alone could account for the differences
between my measurements and Paul's; obviously, for very short
strings, there is nothing to be gained by reading four characters
at a time. At any rate, it's precisely the two sets of URL's
on which Paul's and Bob Jenkins' algorithms come out best.

Which brings me back to what is probably the most important
point: which hash code is best will depend on your data sets,
the underlying hardware and the compiler, and (one thing we've
not mentioned) how the hash table works. (The above
measurements were done using my own AssocArrayOf class, also
present at my site. A class using a different growth strategy
or collision handling may perform differently.) IMHO, the
Mersenne 2^7-1 algorithm has the advantage of being among the
best pretty much everywhere, and is dead simple to implement, so
I'd stick with it in general. But if your profiler says that
hash table accesses are an issue, it's definitely worth trying
some of the others.
 

James Kanze

... snip ...
One of the original purposes of hashlib was to make testing hash
functions easy.  It is much more than just that now, but still
fills the purpose.  hashlib includes a couple of very simple, and
fast, hashes for text strings as a convenience for users.  All
available on my site.

If they're already widely known hashing algorithms, I've
probably already gotten them. But it might be interesting to
compare our results---I'll just make a guess :)-)) that your
library is written in C, so it will definitely have a different
implementation from mine.

(Just curious about one thing: how do you benchmark. I use
virtual functions to impose a firewall on optimization; do you
use pointers to functions, or some other technique?)
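
For what it's worth, the sort of thing I mean by a firewall (a
rough sketch with made-up names, not my actual harness): the hash
under test is invoked through an abstract interface, so the
compiler cannot see through the call and optimize the measured
work away.

    #include <cstddef>
    #include <ctime>
    #include <string>
    #include <vector>

    struct HashTester {
        virtual ~HashTester() {}
        virtual unsigned operator()(std::string const& key) const = 0;
    };

    // Times the given number of passes over the keys, in seconds.
    double timeHash(HashTester const& hash,
                    std::vector<std::string> const& keys,
                    int iterations)
    {
        unsigned sink = 0;                 // keeps the results "live"
        std::clock_t start = std::clock();
        for (int i = 0; i < iterations; ++i) {
            for (std::size_t k = 0; k != keys.size(); ++k) {
                sink += hash(keys[k]);
            }
        }
        std::clock_t stop = std::clock();
        volatile unsigned dummy = sink;    // defeat dead-code removal
        (void)dummy;
        return double(stop - start) / CLOCKS_PER_SEC;
    }

Each concrete hash then derives from HashTester and forwards to the
function being measured.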
 

robertwessel2

Really? If I post a quicksort implementation here, could you
give me an algorithm which would generate the worst case?
(I guess there should be a smiley here, because I don't think that
that's really what you meant. But just half of one, because
I'd really like to find such an algorithm. There are times when
you do want to test the worst case.)


David Musser's original paper on Introsort ("Introspective Sorting and
Selection Algorithms") provides a surprisingly simple algorithm that
produces worst case (or at least quadratic) behavior in
median-of-three Quicksort. The paper is worth reading just for that.

http://www.cs.rpi.edu/~musser/gp/introsort.ps
 

Phil Carmody

CBFalconer said:
It was no joke. This being c.l.c, we tend to use C notation.

It was a joke. You were called on it. You didn't man up to it.
Bad show.

(Or you've flipped, that's the alternative.)
Yes, I do dispute it.

In that case, you're misinformed.
The speed of mergesort is O(N log N).

Correct! Hoorah, a nugget of correctness in a swamp of stuff
that could be otherwise described.

That of course means it's also O(N^2), O(N**2), O(pow(N,2)), O(N*N),
and of course O(exp(2*log(N))).
You can test it yourself. Try the wdfreq.c exercise in the hashlib
distribution.

That library's becoming a broken record. (Should I have used the
past tense rather than the present progressive, I dunno?)
O(N log N) is much faster than O(N * N).

Nope, O(N * N) contains O(N log N). If you wish to dabble in the
field of mathematics, then at least be mathematical.

If you can do one thing before you next post, please may it be to
learn the difference between O() and Theta().

Phil
 

CBFalconer

James said:
.... snip ...

If they're already widely known hashing algorithms, I've
probably already gotten them. But it might be interesting to
compare our results---I'll just make a guess :)-)) that your
library is written in C, so it will definitely have a different
implementation from mine.

(Just curious about one thing: how do you benchmark. I use
virtual functions to impose a firewall on optimization; do you
use pointers to functions, or some other technique?)

I suspect you do have them. However, the installation is dead
simple. Just open the hashtable with pointers to the hash
functions (and a few other things). The documentation explains
all.
 

Ben Bacarisse

CBFalconer said:
It was no joke. This being c.l.c, we tend to use C notation.
OK.

Yes, I do dispute it. The speed of mergesort is O(N log N).

Are any of these identifiers macros? If not, then this is not C
notation either. Maybe you meant O(N * log(N))?
You
can test it yourself. Try the wdfreq.c exercise in the hashlib
distribution. That does two mergesorts of the output list, just to
demonstrate the advantage of a stable sort. Time it on various
input files.

Tests tell you what happens for that particular data set on some
particular system. You can only guess at the asymptotic bounds of an
algorithm from such tests. However, based on analysis of merge sort I
am as sure as you are that its time complexity is O(N * log(N)). That
means it is also O(N * N) as I said. If you still dispute it, maybe
you could say why rather than just stating that I am wrong (I often am,
but I am pretty sure about this).
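
Spelled out with the standard definition (nothing deeper than this):

    f \in O(g)  \iff  \exists c, N_0 > 0 : |f(N)| \le c\,|g(N)|
                      \text{ for all } N \ge N_0

    N \log N \le N^2 \text{ for } N \ge 1
        \implies (f(N) \le c\,N \log N \implies f(N) \le c\,N^2)
        \implies O(N \log N) \subseteq O(N^2)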

O(N log N) is much faster than O(N * N).

This is so informal as to be virtually meaningless. By the way, is
this all not off-topic? I am more than happy to drop it (or move it
to comp.programming or comp.theory).
 

Keith H Duggar

Juha said:
However, apparently the "in average" in this particular context means
something slightly different. More precisely, the function f(n) is
defined as:

f(n) = the average number of steps the algorithm performs for an input
of size n (averaged over all possible different inputs of that size)

In this case the f(n) function can indeed have a smaller upper bound
than the worst case for the algorithm in question.

I think the "in average" is a bit confusing.
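
Written out, with I_n the set of all inputs of size n and T(x) the
number of steps taken on input x, that definition is

    \bar{T}(n) = \frac{1}{|I_n|} \sum_{x \in I_n} T(x)

and this average can indeed obey a smaller bound than the worst
case: for quicksort, \max_{x \in I_n} T(x) = \Theta(n^2), while
\bar{T}(n) = \Theta(n \log n).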

That is one purpose of education: to formalize and normalize
vocabulary and understanding. It exposes students to common
conventions that enable them to communicate on a common ground.
Unfortunately many internet folk mistakenly believe wikipeducation
is a substitute for education and that vociferous noise is a
substitute for listening and critical thinking.

They defend their misinformed ignorant view ad nauseam. They claim
vast experience when in truth they have none. In the end if some
modicum of intelligence finally motivates them to capitulate, their
pride does not allow them to admit "I was wrong. I learned
something. Thank you!". No. Instead they attribute their
"arguments" to a language barrier or to "confusing" conventions,
etc.

KHD
 

Old Wolf

But, as others have pointed out, it's clear that the reason for the
difference in performance is the searching mechanism.

Another reason for slowdown may be that you
malloc for every single word read. If the C++
program is using the small string optimization
then it won't need to malloc anything except
the new set node, for short words.

I suspect you could improve performance
(and memory consumption!) a lot with improved
allocation strategies too. Two mallocs per word
is a lot of work.
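
For instance, a rough sketch (my own names, nothing from the actual
programs) of one such strategy, assuming fixed-size nodes that are
all released together: carve them out of large blocks instead of
doing one malloc per word.

    #include <cstddef>
    #include <vector>

    // Hands out nodeSize-byte chunks carved from big blocks; nodeSize
    // is assumed to be a multiple of the strictest alignment needed.
    class NodeArena {
    public:
        explicit NodeArena(std::size_t nodeSize,
                           std::size_t nodesPerBlock = 4096)
            : nodeSize_(nodeSize), perBlock_(nodesPerBlock),
              used_(nodesPerBlock) {}
        ~NodeArena() {
            for (std::size_t i = 0; i != blocks_.size(); ++i)
                delete[] blocks_[i];
        }
        void* allocate() {
            if (used_ == perBlock_) {          // current block is full
                blocks_.push_back(new char[nodeSize_ * perBlock_]);
                used_ = 0;
            }
            return blocks_.back() + nodeSize_ * used_++;
        }
        // Individual nodes are never freed; everything goes away when
        // the arena is destroyed.
    private:
        std::size_t nodeSize_, perBlock_, used_;
        std::vector<char*> blocks_;
    };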
 

Kai-Uwe Bux

Jerry said:
[ ... ]
Leaving out the scanning forward and backward part (swapping offending
pairs), you are missing the most important point about quicksort, namely
the one that explains the speed. In terms of number of comparisons,
quicksort is not doing better than, say, heapsort. However, when it comes
to moving elements around in memory, quicksort has a huge advantage.
Experimentally, the number of moves is one place quicksort beats the
other algorithms whereas the number of comparisons is not. Declaring the
very feature of quicksort that makes this happen to be a minor issue
seems weird.

Of course, that only applies to sorting sequences. For linked structures,
the number of pointer operations would be relevant.

I'll tell you what: if you're really convinced that scanning forward and
backward, and swapping elements in the list will reduce the number of
pointer operations, why don't you implement both and find out?
[snip]

I think, I did not write that. I am sorry if the second paragraph is taken
to say that statements, which pertain to quicksort, transfer to sort
algorithms for linked lists. This was not intended.


Best

Kai-Uwe Bux
 

Kai-Uwe Bux

Juha said:
Why? Although it's a surprisingly common misconception that quicksort
requires random access, it doesn't. Quicksort is implemented in terms of
purely linear traversals of the array. Thus it's perfectly possible to
implement the *exact* same algorithm for a doubly-linked list (which can
be traversed forwards and backwards).

Right. I missed the case of a doubly linked list. I was thinking singly
linked lists where there is an obvious partition-recurse scheme that would
not be quicksort. (Although it would be quick on average and quadratic in
the worst case).


Best

Kai-Uwe Bux
 

Kai-Uwe Bux

James said:
Jerry said:
[ ... ]
The result is that for most cases quicksort will be the
fastest sort.  Right behind it are other O(nLOGn) methods.
I prefer mergesort and data input in lists, because then I
don't have to worry about the size to be sorted.  In
addition, mergesort is stable (quicksort is not).
The most common implementation of Quicksort for arrays is
unstable -- but a Quicksort on an array _can_ be written to
be stable if you want to badly enough, and for a linked
list, it's quite easy to make it stable.
My linguistic intuitions differ: Quicksort _is_ the algorithm
published as Algorithm 64 (and 63) by C.A.R. Hoare in the
Communications of the ACM. That algorithm sorts a sequence in
place and is not stable. It makes 2n*log(n) comparisons on
average and (1/3)n*log(n) exchanges on average. Anything that
is stable or deals with linked lists is an implementation of a
_different_ algorithm (and probably has different average
complexities).

The algorithm presented by Hoare in Algorithm 64 is very
general; it only refers to algorithm 63, and presumably,
anything that achieves a partition with similar properties would
be acceptable.
I am not saying that there are no refinements or other
algorithms based on similar ideas. Some of these algorithms
are stable and some apply to linked lists. Also, I am not a
native speaker of English. Nonetheless, I feel that algorithms
have a certain identity (which makes it meaningful to say
that a certain sorting algorithm is stable); and that
implementations of an algorithm must preserve that identity
and the associated characteristics to count as implementations
_of_ that algorithm.

So you're saying that the implementation proposed by Jon Bentley
(section 10.2 of _Programming_ _Pearls_, which is just a reprint
of one of his columns in the CACM) isn't a quick sort, although
he claims it to be one. (I think his algorithm is actually
stable, although I haven't verified; it may have problems if
there are objects equal to the pivot. But I think it could
easily be fixed.)

I don't have access to this one, so I cannot tell. I will hit a library
shortly and hunt it down -- it sounds interesting: I have a hard time
seeing a stable and fast version of the partition-recurse idea.

My understanding is that the term quick sort is applied to all
algorithms that partition, then sort each partition recursively.

I only gave my linguistic intuitions, and they differ. The way I see
it, the literature is much more specific about quicksort, and
analyses of that algorithm have been put forth which do not apply to
other algorithms that also exploit the partition-recurse idea.

Of course, sometimes algorithms are so closely related that you would call
one a variation of the other. In particular, sometimes the analysis for one
algorithm can be used to start the analysis of another. Nonetheless, I feel
that saying some sorting procedure is stable rules out the existence of
unstable implementations and vice versa.

Of course, it could be that most people have a wider notion of quicksort
and, as you pointed out, might consider any partition-recurse algorithm a
variant of quicksort.

(Also, of course, it's trivial to implement quick sort for a
list; just implement [] in terms of std::advance. Of course,
the adjective "quick" might not be so appropriate then, but the
algorithm stands.)

Nonetheless, the underlying data structure does matter. For instance, the
median-of-three refinement is hard to do on lists. Also, choosing the pivot
randomly is hard on lists, essentially making the average runtime
quadratic. So, at least performance characteristics of a sorting routine
can vary with underlying data structure.


Best

Kai-Uwe Bux
 
