Iteration vs. Recursion

P

pete

I've read about that, but never seen real C code for it.
One approach I've seen is to take the median of the first, middle, and
last elements, and use that as the pivot.

That goes quadratic with the "ramp" pattern,
which has ascending element values until the middle
and then descending until the end.

void ramp(long *a, size_t n)
{
    const size_t N = n;
    const size_t mid = N / 2;

    while (n != mid) {
        *a++ = N - n--;
    }
    while (n != 0) {
        *a++ = n--;
    }
}
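
For reference, a minimal sketch of the median-of-three pivot selection
mentioned above (the function name and interface here are illustrative,
not taken from any posted code):

static size_t median3(const long *a, size_t lo, size_t mid, size_t hi)
{
    /* Return the index of the median of a[lo], a[mid], a[hi]. */
    if (a[lo] < a[mid]) {
        if (a[mid] < a[hi])
            return mid;                       /* a[lo] < a[mid] < a[hi] */
        return (a[lo] < a[hi]) ? hi : lo;     /* a[hi] <= a[mid] */
    } else {
        if (a[lo] < a[hi])
            return lo;                        /* a[mid] <= a[lo] < a[hi] */
        return (a[mid] < a[hi]) ? hi : mid;   /* a[hi] <= a[lo] */
    }
}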
 
P

pete

Keith Thompson wrote:
The qsort() function does impose a significant performance overhead
compared to a hand-written sorting routine. It has to perform an
indirect function call for each comparison. If you re-implement the
exact same algorithm used by your implementation, but using inline
comparisons, you'll probably get a constant factor speedup.

A constant factor speedup of the comparisons
won't give you a constant speedup factor of the algorithm
unless the algorithm spends 100% of its time doing comparisons.

If you hand-code a sort for arrays of a specific element type,
then you can use an assignment operation to move the data
(unless the element type is an array),
which is probably going to be faster than what qsort does
when the type is larger than int.
I recall seeing on this newsgroup a URL to a paper which
described a highly, if not totally,
portable way for qsort to tell whether the elements
are aligned for type int.
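
To make the data-movement point concrete, here is a rough sketch (not
code from any post) contrasting the assignment-based swap a
type-specific sort can use with the byte-wise swap a generic qsort is
stuck with, since qsort only knows the element size:

#include <string.h>

/* Type-specific swap: one assignment per move. */
static void swap_long(long *x, long *y)
{
    long tmp = *x;
    *x = *y;
    *y = tmp;
}

/* The kind of swap a generic qsort must use: it knows only the
   element size, so it moves raw bytes through a buffer.
   (The 64-byte buffer is an assumption of this sketch, and
   x and y must point at distinct elements.) */
static void swap_generic(void *x, void *y, size_t size)
{
    unsigned char tmp[64];
    memcpy(tmp, x, size);
    memcpy(x, y, size);
    memcpy(y, tmp, size);
}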
 
D

David Wagner

pete said:
A constant factor speedup of the comparisons
won't give you a constant speedup factor of the algorithm
unless the algorithm spends 100% of its time doing comparisons.

Huh? If the sorting algorithm spends a constant fraction of its time
doing comparisons (which it does), then speeding up the time to do
a comparison by a constant factor will speed up the algorithm by a
constant factor.

Example: If 50% of the time is spent on comparisons, then speeding up
comparisons by a factor of 10x will speed up the overall computation
time by a factor of 1.82x. To put it another way, if the original
implementation spends 0.5 N cycles on comparisons and 0.5 N cycles on
other operations, and you find some way to reduce the total cost of the
comparisons to 0.05 N cycles, then you'll have reduced the total cost
to 0.55 N cycles. Reducing the runtime from N to 0.55 N is a speedup
by a constant factor.
 
P

pete

David said:
Huh? If the sorting algorithm spends a constant fraction of its time
doing comparisons (which it does),

It doesn't.
The tendency is for the ratio of comparisons to data movements
to increase as the number of elements increases.
 
P

pete

pete said:
It doesn't.
The tendency is for the ratio of comparisons to data movements
to increase as the number of elements increases.

I just made some measurements
on a quicksort with Sedgewick style loops.

The average comparisons/swap ratio with 1999 elements was 5.23
The average comparisons/swap ratio with 19999 elements was 5.53

... which is a small enough increase
for me to say that you guys are right,
and I was wrong.
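
For anyone who wants to repeat the experiment, a rough sketch of an
instrumented quicksort follows. It uses a plain Lomuto partition
rather than Sedgewick-style loops, counts every comparison and every
call to swap (including moving the pivot into place), and is not the
code the measurements above came from:

#include <stdio.h>
#include <stdlib.h>

static unsigned long ncmp, nswap;     /* counters for this sketch */

static void swap(long *x, long *y)
{
    long tmp = *x;
    *x = *y;
    *y = tmp;
    nswap++;
}

/* Quicksort with a Lomuto partition, middle element as pivot. */
static void quicksort_counted(long *a, size_t n)
{
    if (n < 2)
        return;
    swap(&a[n / 2], &a[n - 1]);       /* move the pivot to the end */
    long pivot = a[n - 1];
    size_t store = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        ncmp++;
        if (a[i] < pivot)
            swap(&a[i], &a[store++]);
    }
    swap(&a[store], &a[n - 1]);       /* pivot into its final place */
    quicksort_counted(a, store);
    quicksort_counted(a + store + 1, n - store - 1);
}

int main(void)
{
    const size_t n = 19999;
    long *a = malloc(n * sizeof *a);
    if (a == NULL)
        return EXIT_FAILURE;
    for (size_t i = 0; i < n; i++)
        a[i] = rand();
    quicksort_counted(a, n);
    printf("comparisons/swaps = %.2f\n", (double)ncmp / (double)nswap);
    free(a);
    return 0;
}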
 
T

Torben Ægidius Mogensen

Ben Pfaff said:

Yes, but the deterministic O(n) algorithms for median finding are so
slow that it isn't worth the effort. It is faster to first randomize
the array and then use the first element of each subarray as the
pivot. While the worst case is still quadratic, the probability of
that is very low and no fixed pattern (ascending, descending, ramp,
etc.) can defeat it.

When I want to find the median of a long sequence, I usually use a
modified quicksort algorithm:

median xs
  = let (len, xs1) = randomize xs   -- shuffle and find length
    in
      nth (len `div` 2) xs1

nth 1 (x:xs) = x
nth n (x:xs)
  = let (m, ys, zs) = splitAt x xs 0 [] [x]
    in
      if n <= m then nth n ys
                else nth (n - m) zs

splitAt x [] m ys zs = (m, ys, zs)
splitAt x (v:vs) m ys zs
  = if v < x then splitAt x vs (m+1) (v:ys) zs
             else splitAt x vs m ys (v:zs)



Torben
 
G

Googmeister

pete said:
I just made some measurements
on a quicksort with Sedgewick style loops.

The average comparisons/swap ratio with 1999 elements was 5.23
The average comparisons/swap ratio with 19999 elements was 5.53

This seems consistent with a more rigorous analysis of
(randomized) quicksort:
Avg # comparisons ~ 2 n ln n
Avg # swaps ~ (1/3) n ln n
The ratio converges to 6 as n gets large.
 
R

rpost

Keith said:
[...] But I wouldn't criticize computer scientists for not being
programmers, as long as they don't claim to be.

It reminds me of an Edsger Dijkstra quotation: "Computer science is no
more about computers than astronomy is about telescopes."

On the other hand, he was no stranger to computers,
claiming to be his country's first professional programmer

http://www.cs.utexas.edu/users/EWD/transcriptions/EWD03xx/EWD340.html

and having developed a multitasking operating system in the 1960s.
 
R

Robert Maas, see http://tinyurl.com/uh3t

Jonathan Bartlett said:
You might be interested in an article I wrote for IBM
DeveloperWorks on recursive programming:
http://www-128.ibm.com/developerworks/linux/library/l-recurs.html

I dislike that article, because it starts right off with the
wrongheaded way of computing factorial(N) by building up N stack
frames, with no actual computation done all the way down, then
actually doing the computation on the way back up. At least if
you're going to compute factorial, implement it in a true
"recursive spirit" divide-and-conquor way. Don't mention factorial
at all at the beginning. Instead ask how you would compute the
product of all the integers from M to N. Answer: Divide the problem
in half, from M to the midpoint, and then from after the midpoint
to N. Then after you've demonstrated this really fine algorithm
which takes only log(n) stack depth to compute a product of n
integers (n = N-M), finally mention in passing that prod(1,N) =
factorial(N).

And use Common Lisp or Java instead of C, so you have big integers
built in instead of being limited by word size, where you hit
arithmetic overflow before you hit stack overflow, thereby masking
the really important issue.

Or compute a modular factorial, where fixed-precision numbers aren't a
problem.
Impress somebody by recursively computing factorial(one billion)
modulo some prime slightly larger than one billion.
Then show how you get stack overflow if you try it the wrongheaded
way described in the cited article.
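
Here is a minimal C sketch of the divide-and-conquer product described
above, done modulo a prime so that fixed-width integers are enough; the
prime and the function name are illustrative, not from the article:

#include <stdio.h>
#include <stdint.h>

/* Product of the integers m..n modulo p, computed by splitting the
   range in half.  Stack depth is O(log(n - m)) rather than O(n - m),
   although the total number of multiplications is still linear. */
static uint64_t prod_mod(uint64_t m, uint64_t n, uint64_t p)
{
    if (m > n)
        return 1 % p;                 /* empty product */
    if (m == n)
        return m % p;
    uint64_t mid = m + (n - m) / 2;
    return (prod_mod(m, mid, p) * prod_mod(mid + 1, n, p)) % p;
}

int main(void)
{
    const uint64_t p = 1000000007u;   /* an illustrative prime */
    /* prod_mod(1, n, p) is factorial(n) mod p */
    printf("%llu\n", (unsigned long long)prod_mod(1, 20, p));
    return 0;
}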
 
R

Robert Maas, see http://tinyurl.com/uh3t

Keith Thompson said:
I would criticize any programmer who *bothers* to implement a
sub-quadratic sorting algorithm without consulting a textbook or
using a library routine.

There's a difference between proving you're competent at writing
algorithms, by programming a famous old task without looking in a
book, vs. proving you're a wise manager of software who knows when
to use libraries and when to write your own code. Both should be
taught to any prospective software programmer.
I seem to recall that there was a gap of a decade or so between
the first implementation of a binary search algorithm, and the
first *correct* implementation of a binary search algorithm.

Assuming that's really true, not an "urban legend", I don't
understand why they had so much trouble getting the algorithm
correct. I did my first binary search so long ago I don't remember
which language I used, but I remember the algorithm was rather
obvious:
You have an array, as a global or passed as an argument.
Elements in that array are in ascending sequence already.
You have two indexes, for the first and last elements in the
array where you want to search. I called them LO and HI.
If LO>HI then your range is empty, which means you've already
bottomed out without finding the target, so you immediately
return a not-found indication, for example -1.
Otherwise you compute MID = (LO+HI)/2 (doesn't matter which way
you round, you can even be inconsistent from moment to moment,
the algorithm works in any case).
Test target against element at MID.
- If target smaller, tail-recurse with LO, MID-1, or if you're
doing it iteratively then do assignment HI := MID-1.
- If target larger, tail-recurse with MID+1, HI, or if you're
doing it iteratively then do assignment LO := MID+1.
- Else (target exactly matches) you've found it, return MID.
The nice thing about my algorithm is it's symmetric, MID-1 or
MID+1, so it's easy at a glance to check you got it right.
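
Rendered in C, the algorithm just described looks roughly like this (an
iterative sketch of my own; the -1 not-found convention is the one from
the description, and the midpoint is computed as lo + (hi - lo) / 2,
which rounds the same way as (lo + hi) / 2 but cannot overflow):

/* Binary search over a sorted array of longs, with inclusive
   bounds lo and hi as in the description above.
   Returns the index of the target, or -1 if it is absent. */
static long bsearch_long(const long *a, long lo, long hi, long target)
{
    while (lo <= hi) {                    /* lo > hi: range is empty */
        long mid = lo + (hi - lo) / 2;    /* same result as (lo + hi) / 2,
                                             without the overflow risk */
        if (target < a[mid])
            hi = mid - 1;
        else if (target > a[mid])
            lo = mid + 1;
        else
            return mid;                   /* exact match */
    }
    return -1;                            /* bottomed out: not found */
}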
 
R

Robert Maas, see http://tinyurl.com/uh3t

Nowadays, that's good advice. When I started, it wasn't so true.
I agree completely. In addition, if after doing it you find your
library's quicksort implementation isn't fast enough, the best
thing to do is see if you can find a better one that someone else
has implemented and use that.

Nowadays you have such a choice, with a whole Internet and Web for
efficiently locating alternatives to the library that was
bundled with your programming system. But long ago it would have
been prohibitive to search for alternatives that you didn't code
yourself.

When I started programming in 1964 (IBM 1620, with Fortran with
Format, later Fortran IId, and with SPS = assembler), and still in
1969 (IBM 1130 with Fortran IV), the bundled FORTRAN library was
cruddy; at least the one routine I needed, the random-number
generator, was horrible.

In 1969 I became suspicious, and decided to test it. I didn't trust
numerical tests such as chi-squared where you're supposed to
compute this meaningless statistic and then look in a book to see
if it's good enough. So I devised a graphic presentation of
randomness. If the data was sufficiently random, I'd see a
statistically uniform filling of a rectangle. If not, I'd see some
apparent pattern to the filling.

I tested the built-in FORTRAN random-number generator. What I got
out was horribly non-random: images like chess pieces (rook or
bishop), or like that two-horned electric monster on the Mexican
beach in the sci-fi movie (or equivalently that ugly black
skyscraper in Chicago).

Then I wrote my own crock of a random number generator. I didn't
know anything about linear congruential or CRC generators, so I
invented my own messy algorithm for essentially hashing the
contents of memory on the computer and reading out partial hash
results. When I tested it, I got statistically uniform-density
rectangles, no apparent pattern, like snow on a TV screen. Success!

The punchcards containing my software burned up in a warehouse fire
in 1972, but I think I still have the CalComp 565 plotter output in
storage, if anybody wants to see the comparison.
 
D

Dr A. N. Walker

Assuming that's really true, not an "urban legend",

It's certainly an urban legend. Scarcely any of the code
in those days was ever published, hundreds of people must have
written that algorithm during that decade, many of them were
quite bright .... Basically, there's *no way* of knowing when
the first "correct" implementation was written, only when the
first correct implementation was published.

In any case, correctness is somewhat in the eye of the
beholder. If in a particular application it is known that the
array is not empty, or that the sought-for element is present,
then you can write slightly simpler code that would fail some
of the stress testing that a more general code needs. People
in those days got jobs done, they weren't concerned with formal
correctness according to some abstract spec. [I don't claim
that as good, bad or indifferent, it was just the way things
were.]
I don't
understand why they had so much trouble getting the algorithm
correct.

You're talking about a period that was about a decade
before the first undergraduate CS courses, and long before
anyone but the odd fanatic worried about "correctness", as
opposed to "working". Even the notion of an algorithm was
little discussed. You learned programming by attending a
two-day course on the Autocode of a particular computer, and
then just doing it. There were no books about computing, no
published code, no standards; and professors got very uppity
if some spotty oik from the Computing Service tried to tell
them how to write programs.

Nowadays, that particular algorithm is one of the
common early tasks in an algorithms module. You tell the
class what you want their code to do, and give them a few
minutes to write it. Then you find out whether their code
works if the array is empty, if there is only one element,
if the target is larger/smaller than the last/first element,
and so on. It doesn't, so that gives you the chance to talk
about invariants, pre- and post-conditions, program structure
in general, blah-de-blah, *after* which:
[...] I remember the algorithm was rather
obvious: [...]

Quite so! Lots of science in general, and maths/CS in
particular, is rather obvious. The job of education is to make
it so. *Before* you've been taught how to make things obvious,
even the simplest concepts are obscure.
If LO>HI then your range is empty, which means you've already
bottomed out without finding the target, so you immediately
return a not-found indication, for example -1.

You've already hit a design problem! What will you do if
-1 is a permitted index? Do you want the complexity of returning
a structure consisting of a Boolean and an integer [if your chosen
language permits that]? Or do you want the caller to have to test
whether the returned index is in range?
Otherwise you compute MID = (LO+HI)/2 (doesn't matter which way
you round, you can even be inconsistent from moment to moment,
the algorithm works in any case).

[The algorithm works, of course, as long as LO <= MID <= HI,
it's just more efficient if MID is somewhere around the middle. But
if we're picking nits, then you're in trouble if LO+HI > Maxint, or
in various other similar potential overflow cases.]
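
The usual way around that particular overflow nit, as a one-function
sketch (my own, not from the post):

#include <stddef.h>

/* Midpoint of lo..hi computed without ever forming lo + hi,
   which could exceed the range of the index type.
   Assumes lo <= hi. */
static size_t midpoint(size_t lo, size_t hi)
{
    return lo + (hi - lo) / 2;
}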
Test target against element at MID.
- If target smaller, tail-recurse with LO, MID-1, or if you're
doing it iteratively then do assignment HI := MID-1. [...]

In the old days, this wasn't necessarily as easy as you
make it seem. For integer elements, OK. For reals, there was the
permitted error to worry about. Few languages had proper strings
before the mid-'60s, so if you had an application like trying to
see if a word was in a list, there was a lot of encoding and
decoding to do here. Many languages didn't permit recursion;
and even "iteration" was discouraged -- "while (lo >= hi) {...}"
is a relatively late invention -- so you got spaghetti code using
"goto" to emulate the iteration.
 
S

sillybanter

In comp.edu Keith Thompson said:
I would criticize any programmer who *bothers* to implement a
sub-quadratic sorting algorithm without consulting a textbook or using
a library routine.

In fairness to Torben, he was talking about the *ability* to do this,
not whether it was a good idea from a software development sense.

Sorting is a nice, clean, well-defined problem, and efficient
algorithms for it (either mergesort or quicksort) are very
straightforward uses of the basic concept of divide-and-conquer.

It just so happens that this *specific* problem is such a useful thing
that there are many good implementations that can be used, so a good
programmer should take advantage of that. However, if they *can't*
roll their own on such a clean problem, what are the odds that they
can come up with good solutions to other, not-so-clean, and
not-so-well-studied problems that they might encounter for which they
*can't* rely on pre-written libraries?
I don't remember where I read this, but I seem to recall that there
was a gap of a decade or so between the first implementation of a
binary search algorithm, and the first *correct* implementation of a
binary search algorithm.

I seem to remember reading that too. If I had to bet on a source, I'd
suggest Bentley's "Programming Pearls" - he has quite the obsession
with binary search... :) Unfortunately, I'm at home, and my copy is
at work, so I can't verify this now.
 
C

Chris Smith

Sorting is a nice, clean, well-defined problem, and efficient
algorithms for it (either mergesort or quicksort) are very
straightforward uses of the basic concept of divide-and-conquer.

I'd expect someone with a fair practical knowledge of programming to
come up with merge sort or even heap sort on their own; but quicksort is
completely non-obvious to me. In particular, it's a little non-obvious
that partitioning can be done in linear time, and completely
non-obvious that choosing an appropriate pivot can be done well without
prior knowledge about the distribution of values and without negatively
impacting the asymptotic complexity of the routine.
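
For the record, the linear-time, in-place partition step can be written
as follows (a sketch using the Lomuto scheme with the last element as
pivot; not code from this thread):

/* Partition a[0..n-1] around a[n-1] (Lomuto scheme), n >= 1.
   On return the pivot is at the returned index; everything to its
   left is smaller, everything to its right is not smaller.
   A single pass over the array, hence linear time. */
static size_t partition(long *a, size_t n)
{
    long pivot = a[n - 1];
    size_t store = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (a[i] < pivot) {
            long tmp = a[i];
            a[i] = a[store];
            a[store] = tmp;
            store++;
        }
    }
    long tmp = a[store];
    a[store] = a[n - 1];
    a[n - 1] = tmp;
    return store;
}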
I seem to remember reading that too. If I had to bet on a source, I'd
suggest Bentley's "Programming Pearls" - he has quite the obsession
with binary search... :)

I do. On page 34, Bentley quotes Knuth as saying that the first binary
search was published in 1946, but the first correct binary search was
published in 1962. If you're as pedantic as me, note the word
"published" rather than "devised" or "implemented".
 
J

Jon Harrop

Chris said:
I do. On page 34, Bentley quotes Knuth as saying that the first binary
search was published in 1946, but the first correct binary search was
published in 1962. If you're as pedantic as me, note the word
"published" rather than "devised" or "implemented".

Binary search has probably been used for thousands of years. I expect it was
used before calculus-based optimisation methods like Newton-Raphson. I'd be
amazed if it wasn't first published hundreds of years ago as well. It seems
like an odd thing to claim to be so modern, given how old much more
sophisticated concepts like calculus are...
 
