NaN, Null, and Sorting

E

Ethan Furman

With NaN, it is possible to get a list that will not properly sort:

--> NaN = float('nan')
--> spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN]
--> sorted(spam)
[1, 2, nan, 3, nan, 4, 5, 7, nan]

I'm constructing a Null object with the semantics that if the returned
object is Null, it's actual value is unknown.

From a purist point of view if it is unknown then comparison results
are also unknown since the actual value might be greater, lesser, or the
same as the value being compared against.

From a practical point of view a list with Nulls scattered throughout
is a pain in the backside.

So I am strongly leaning towards implementing the comparisons such that
Null objects are less than other objects so they will always sort together.

Thoughts/advice/criticisms/etc?

~Ethan~
 
S

Steven D'Aprano

With NaN, it is possible to get a list that will not properly sort:

--> NaN = float('nan')
--> spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN] --> sorted(spam)
[1, 2, nan, 3, nan, 4, 5, 7, nan]

I'm constructing a Null object with the semantics that if the returned
object is Null, it's actual value is unknown.

From a purist point of view if it is unknown then comparison results
are also unknown since the actual value might be greater, lesser, or the
same as the value being compared against.

From a purist point of view, NANs are unordered with respect to numbers,
and so one of two behaviours should occur:

(1) nan OP x should raise an exception, for all comparison operators
except == and !=

(2) nan OP x should return False for all OPs except !=

I believe the current version of the standard supports operators for both
sets of behaviour; the 1990s version of Apple's numeric framework (SANE)
included both.

I think Python chooses the second behaviour, although it may be version
and platform dependent. This is from Python 2.6:
False


I would expect the same behaviour for your Null objects. But as you say:

From a practical point of view a list with Nulls scattered throughout
is a pain in the backside.

And this is why sorting should be defined in terms of a separate sorting
operator, not < or >, so that lists containing unordered values like NANs,
Nulls, and complex numbers, can be sorted. "Sorted" is a property of the
list, not the values within the list.

So I am strongly leaning towards implementing the comparisons such that
Null objects are less than other objects so they will always sort
together.

Thoughts/advice/criticisms/etc?

Possibly the least-worst solution.
 
J

jmfauth

With NaN, it is possible to get a list that will not properly sort:

--> NaN = float('nan')
--> spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN]
--> sorted(spam)
[1, 2, nan, 3, nan, 4, 5, 7, nan]

I'm constructing a Null object with the semantics that if the returned
object is Null, it's actual value is unknown.

Short answer.

- NaN != NA()

- I find the actual implementation (Py3.2) quite satisfying. (M.
Dickinson's work)

jmf
 
E

Eelco

With NaN, it is possible to get a list that will not properly sort:

--> NaN = float('nan')
--> spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN]
--> sorted(spam)
[1, 2, nan, 3, nan, 4, 5, 7, nan]

I'm constructing a Null object with the semantics that if the returned
object is Null, it's actual value is unknown.

 From a purist point of view if it is unknown then comparison results
are also unknown since the actual value might be greater, lesser, or the
same as the value being compared against.

 From a practical point of view a list with Nulls scattered throughout
is a pain in the backside.

So I am strongly leaning towards implementing the comparisons such that
Null objects are less than other objects so they will always sort together.

Thoughts/advice/criticisms/etc?

~Ethan~

My suggestion would be thus: nans/nulls are unordered; sorting them is
fundamentally an ill defined notion. What you want, conceptually, is a
sorted list of the sortable entries, and a seperate list of the
unsorted entries. Translated into code, the most pure solution would
be to filter out the nanas/nulls in their own list first, and then
sort the rest. If the interface demands it, you can concatenate the
lists afterwards, but probably it is most convenient to keep them in
seperate lists.

Perhaps arbitrarily defining the ordering of nulls/nans is slightly
more efficient than the above, but it should not make a big
difference, and in terms of purity its no contest.
 
C

Chris Angelico

What you want, conceptually, is a
sorted list of the sortable entries, and a seperate list of the
unsorted entries. Translated into code, the most pure solution would
be to filter out the nanas/nulls in their own list first, and then
sort the rest. If the interface demands it, you can concatenate the
lists afterwards, but probably it is most convenient to keep them in
seperate lists.

So... you split it into two lists, sort the two lists (one of which
can't be sorted), and then concatenate them. Sounds like the quicksort
algorithm.

ChrisA
 
R

Robert Kern

So... you split it into two lists, sort the two lists (one of which
can't be sorted), and then concatenate them. Sounds like the quicksort
algorithm.

Not at all. The "split it into two lists" steps are entirely different in what
Eelco suggested and quicksort. It's misleading to attempt to describe both using
the same words.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,780
Messages
2,569,611
Members
45,265
Latest member
TodLarocca

Latest Threads

Top