How to improve this sort?

P

pete

Stephen said:
LS

Yes. That is what David Musser's Introsort does.
Primarily it has been used for implenting C++'s sort() but there is no
reason why it cannot be used for C or even qsort()
can be used for it. It
uses quicksort, but if it discovers a portion of the array which is
partitioning badly it switches to heapsort for that
portion of the array.
The net result is that this hybrid sort is always guranteed
O(N * log N) in
the worse case and gives quicksort like performance in
the average case.

See
http://en.wikipedia.org/wiki/Introsort
for details.

David Musser also talked about Introselect which is a hybrid version of
quickselect - the algorithm used to find the kth element in an unsorted
array and place it into the kth position. quickselect can also go "bad" and
so Introselect is used to guarantee O(N) behaviour.

In this file,
http://www.mindspring.com/~pfilandr/C/e_driver/e_sort.c

the following sorts, are all introspective quicksorts:
itsort
tisort
ti7qsort
qrsort
qisort
 
A

At_sea_with_C

At_sea_with_C said:
Hello all,
{snipped}

Big thanks to everyone in the thread. You guys have given me much food
for thought. I decided to go with Shell sort, for now.

I also looked up the book recommended to me. I did'nt know it was
written by the founders of C! I got a copy. It does look anachronic
compared with the usual 500+ pages computer books, with eye-catching
covers, bundled DVDs and all.

:)
 
L

Lane Straatman

pete said:
Stephen Howe wrote:

In this file,
http://www.mindspring.com/~pfilandr/C/e_driver/e_sort.c

the following sorts, are all introspective quicksorts:
itsort
tisort
ti7qsort
qrsort
qisort
Interesting. To permute an entire set randomly is quite dramatic. Do these
sorts admit of a subset whose order is sufficiently "nice" that one can
select it to survive? If so, then I would derange the other elements
randomly as opposed to permute. The kth element would very likely be either
nice or different. LS
 
M

mark_bluemel

I also looked up the book recommended to me. I did'nt know it was
written by the founders of C! I got a copy. It does look anachronic
compared with the usual 500+ pages computer books, with eye-catching
covers, bundled DVDs and all.

K & R is an example of how a good textbook doesn't need all that
jazz :)

I've generally found anything with Brian Kernighan's name on it to be
well-written, concise and worth reading.
 
C

Chris Dollin

David said:
Quicksort is considered the best general-purpose sorting algorithm. Unless
the data is ordered very antagonistically, it performs well. As others have
pointed out, it is supported directly by the library.

The Standard C library routine `qsort` need not be a Quicksort implementation.
One just has to hope that QoI issues make it a /decent/ sort implementation,
Tony-Quick or no.
 
P

pete

Lane said:
Interesting. To permute an entire set randomly is quite dramatic.
Do these sorts admit of a subset whose order is sufficiently
"nice" that one can select it to survive?
If so, then I would derange the other elements
randomly as opposed to permute.
The kth element would very likely be either nice or different.
LS

itsort, tisort and qisort are foward biased.
First the value of the middle element is swapped with e0.
Then they use an if-else tree to select the median from among
e0, e1, and e(n-1),
and to sort them into place with respect to one another
into this order {1,0,2},
then the first element becomes the pivot
and the other two elements become sorted into their partitions.

ti7qsort and qrsort both use a prng to select
a random element, eR, and to swap it with e0,
before using the if-else tree.
The initial value of eR will then either become the pivot, or
it will be stored in e1 or in e(n-1)
so that the value will be considered during the next partitioning.

If the depth of partitioning becomes twice what
an optimaly ordered array would require for sorting,
then that sets off the heapsort function.
 
R

Richard Tobin

Anyway, what's wrong with using static variables?
[/QUOTE]
Try re-entering that routine. System routines should be
re-entrant. C doesn't guarantee that.

In the case of qsort(), does the standard anywhere say that the
comparison function can't call qsort()?

-- Richard
 
C

CBFalconer

Richard said:
In the case of qsort(), does the standard anywhere say that the
comparison function can't call qsort()?

The standard says that library routines are not guaranteed to be
re-entrant. That suffices.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
 
B

Ben Pfaff

Try re-entering that routine. System routines should be
re-entrant. C doesn't guarantee that.

In the case of qsort(), does the standard anywhere say that the
comparison function can't call qsort()?[/QUOTE]

Yes. It's a QoI, not a conformance issue. See C99 7.1.4:

4 The functions in the standard library are not guaranteed to be
reentrant and may modify objects with static storage
duration.158)
 
R

Richard Tobin

In the case of qsort(), does the standard anywhere say that the
comparison function can't call qsort()?

Yes. It's a QoI, not a conformance issue. See C99 7.1.4:

4 The functions in the standard library are not guaranteed to be
reentrant and may modify objects with static storage
duration.158)[/QUOTE]

I don't have C99 to hand, but in the C89 standard that sentence occurs
in a section titled "Signals and Interrupts". Has it been explicitly
determined that that sentence is intended to apply to system library
functions simply called recursively, rather than from signal handlers?

-- Richard
 
B

Ben Pfaff

Yes. It's a QoI, not a conformance issue. See C99 7.1.4:

4 The functions in the standard library are not guaranteed to be
reentrant and may modify objects with static storage
duration.158)

I don't have C99 to hand, but in the C89 standard that sentence occurs
in a section titled "Signals and Interrupts". Has it been explicitly
determined that that sentence is intended to apply to system library
functions simply called recursively, rather than from signal handlers?[/QUOTE]

In C99 it is nested inside:

7. Library
7.1 Introduction
7.1.4 Use of library functions
 
R

Richard Tobin

In the case of qsort(), does the standard anywhere say that the
comparison function can't call qsort()?
[/QUOTE]
The standard says that library routines are not guaranteed to be
re-entrant. That suffices.

But was it intended to have that effect? Or was it jsut a mistake?

-- Richard
 
U

user923005

To sort N floats (if float is 4 bytes) takes exactly 5*N passes using
radix sort with radix 256 and 3*N passes with radix 65536.

You make one counting pass, that counts the buckets for each float,
and then one distribution pass that distributes the data.
It is simplified if you transform the data first.

Here are some (architecture specific) sample transformations you can
use in order to use radix on IEEE floating point:
/*
** The proper usage and copyright information for
** this software is covered in DSCRLic.TXT
** This code is Copyright 1999 by Dann Corbit
*/


/*
** This file is INTEL specific. It should be pretty easy to generate
one
** for any sort of architecture.
**
** The whole thing *can* be done generically by using frexp().
** The most important bit is the sign of the number.
** The next most important bit is the sign of the exponent.
** Then comes the exponent magnitude, followed by the mantissa.
** You will need to add the bias to the exponent, so that it becomes
unsigned.
**
** Finally, it would actually be a lot faster to transform the whole
** vector *first* then sort it, then transform it back. Some people
** might be a bit nervous about that, but there is no reason it should
** not work.
**
** Assume typedef's for uint32 and uint64, of the same size and endian-
ness
** as float and double parameters!
** These floating point mapping functions are due to Terje Mathisen.
** They are reprinted with his permission.
*/
#include <assert.h>
#include "inteltyp.h"

#ifdef ASSERT
#undef ASSERT
#endif

#ifdef _DEBUG
#define ASSERT(x) assert((x))
#else
#define ASSERT(x)
#endif


uint32
float2key(float f)
{
uint32 sign,
mant,
mask;

ASSERT(sizeof(float) == sizeof(uint32));
mant = *(uint32 *) & f; /* Load float as array of bits */
sign = mant & SB_MASK32; /* Isolate the leading sign bit */
mant ^= SB_MASK32; /* Invert the sign bit, making + > -
*/
mask = sign - (sign >> 31); /* Either 0 or 0x7fffffff */
mant ^= mask; /* Invert exp and mant if negative */
return mant;
}

uint64
double2key(double d)
{
uint64 sign,
mant,
mask;

ASSERT(sizeof(double) == sizeof(uint64));
mant = *(uint64 *) & d; /* Load float as array of bits */
sign = mant & SB_MASK64; /* Isolate the leading sign bit */
mant ^= SB_MASK64; /* Invert the sign bit, making + > -
*/
mask = sign - (sign >> 63); /* Either 0 or 0x7fffffffffffffff */
mant ^= mask; /* Invert exp and mant if negative */
return mant;
}


The general idea can be used against any floating point type.

You can write your own also, by simply packing the bits in order of
importance for any data type.
E.g., for float, the sign bit of the number is the most important
thing, then the exponent bits and then the mantissa bits.
 
U

user923005

To sort N floats (if float is 4 bytes) takes exactly 5*N passes using
radix sort with radix 256 and 3*N passes with radix 65536.

You make one counting pass, that counts the buckets for each float,
and then one distribution pass that distributes the data.
That should read:
and then one distribution pass for each radix bucket that distributes
the data.
It is simplified if you transform the data first.

[snip]
 
C

CBFalconer

Richard said:
But was it intended to have that effect? Or was it jsut a mistake?

It has to do with not invalidating existing code. And such
routines as strtok.

--
If you want to post a followup via groups.google.com, ensure
you quote enough for the article to make sense. Google is only
an interface to Usenet; it's not Usenet itself. Don't assume
your readers can, or ever will, see any previous articles.
More details at: <http://cfaj.freeshell.org/google/>
 
R

Richard Tobin

But was it intended to have that effect? Or was it jsut a mistake?

It has to do with not invalidating existing code. And such
routines as strtok.[/QUOTE]

Do you mean existing library implementations? Allowing the comparison
function in qsort() to call qsort() couldn't break any user code.

-- Richard
 
C

CBFalconer

Richard said:
Do you mean existing library implementations? Allowing the comparison
function in qsort() to call qsort() couldn't break any user code.

If qsort is using static memory, it would.

Don't snip attribution lines for material you quote. The
attribution lines are those that say "Foo wrote:" at the beginning.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
 
U

user923005

On Jan 31, 4:25 pm, (e-mail address removed) (Richard Tobin) wrote:
[snip]
Do you mean existing library implementations? Allowing the comparison
function in qsort() to call qsort() couldn't break any user code.

If qsort() calls compare() and compare() calls qsort(), then it will
break things if qsort() is not reentrant.
Not sure why you would want to call qsort() from compare, but that's
neither here nor there.

If (for instance) a lame library qsort() had static read/write
variables in it, all heck would break loose if it was called in a
manner as described above.
 
C

Christopher Layne

Ben said:
In C99 it is nested inside:

7. Library
7.1 Introduction
7.1.4 Use of library functions

I cannot possibly take that as being across the board. But then again, it
might just be another case where we've to refer to OS specific manpages. I
thought the rule is that they were reentrant unless otherwise stated as not
required to be reentrant and/or was specifically NOT reentrant (strtok, etc)?

A non-reentrant qsort() would quality as a POS in my book.

Like I said, I wonder if this was applied across the board because of the
outliers - rather than deciding to state it as required unless otherwise
explictly documented.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,780
Messages
2,569,608
Members
45,241
Latest member
Lisa1997

Latest Threads

Top