NaN, Null, and Sorting

Ethan Furman · Jan 13, 2012

With NaN, it is possible to get a list that will not properly sort:

--> NaN = float('nan')
--> spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN]
--> sorted(spam)
[1, 2, nan, 3, nan, 4, 5, 7, nan]

I'm constructing a Null object with the semantics that if the returned
object is Null, its actual value is unknown.

From a purist point of view if it is unknown then comparison results
are also unknown since the actual value might be greater, lesser, or the
same as the value being compared against.

From a practical point of view a list with Nulls scattered throughout
is a pain in the backside.

So I am strongly leaning towards implementing the comparisons such that
Null objects are less than other objects so they will always sort together.
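
A minimal sketch of that leaning, assuming a hypothetical NullType singleton (the names NullType and Null are illustrative and do not come from any existing library). It defines the ordering methods so that Null compares less than every other object, which deliberately gives up the purist "unknown" result:

```python
class NullType:
    """Hypothetical singleton standing in for an unknown value."""

    _instance = None

    def __new__(cls):
        # Assumption: a single shared Null object is enough.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "Null"

    # Ordering: Null is less than everything except itself.
    def __lt__(self, other):
        return other is not self

    def __le__(self, other):
        return True

    def __gt__(self, other):
        return False

    def __ge__(self, other):
        return other is self


Null = NullType()

data = [3, Null, 1, Null, 2]
result = sorted(data)
print(result)  # [Null, Null, 1, 2, 3]
```

list.sort and sorted use only < between items. For 3 < Null, int returns NotImplemented and Python falls back to the reflected Null.__gt__(3), so the Nulls end up grouped at the front.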

Thoughts/advice/criticisms/etc?

~Ethan~

Steven D'Aprano · Jan 14, 2012

With NaN, it is possible to get a list that will not properly sort:

--> NaN = float('nan')
--> spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN]
--> sorted(spam)
[1, 2, nan, 3, nan, 4, 5, 7, nan]

I'm constructing a Null object with the semantics that if the returned
object is Null, its actual value is unknown.

From a purist point of view if it is unknown then comparison results
are also unknown since the actual value might be greater, lesser, or the
same as the value being compared against.

From a purist point of view, NANs are unordered with respect to numbers,
and so one of two behaviours should occur:

(1) nan OP x should raise an exception, for all comparison operators
except == and !=

(2) nan OP x should return False for all OPs except !=

I believe the current version of the standard supports operators for both
sets of behaviour; the 1990s version of Apple's numeric framework (SANE)
included both.

I think Python chooses the second behaviour, although it may be version
and platform dependent. This is from Python 2.6:
False
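
A quick check of behaviour (2), assuming a recent CPython 3 on an IEEE 754 platform. This is an illustrative sketch, not the Python 2.6 session referred to above:

```python
nan = float("nan")

# Every ordering comparison and == against a NaN is False.
ordering = [nan < 1.0, nan > 1.0, nan <= 1.0, nan >= 1.0, nan == 1.0]

# != is the one operator that returns True.
not_equal = nan != 1.0

# A NaN does not even compare equal to itself.
self_equal = nan == nan

print(ordering, not_equal, self_equal)
```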

I would expect the same behaviour for your Null objects. But as you say:

From a practical point of view a list with Nulls scattered throughout
is a pain in the backside.

And this is why sorting should be defined in terms of a separate sorting
operator, not < or >, so that lists containing unordered values like NANs,
Nulls, and complex numbers, can be sorted. "Sorted" is a property of the
list, not the values within the list.
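
Python has no separate sorting operator, but a key function comes close. The sketch below assumes a hypothetical helper, nan_last_key, that maps each value to a tuple so the sort never compares a NaN directly:

```python
import math


def nan_last_key(x):
    """Hypothetical sort key: group NaNs after all ordered values."""
    is_nan = isinstance(x, float) and math.isnan(x)
    # The first tuple element decides the group; the placeholder 0.0
    # keeps NaN itself out of the comparison.
    return (is_nan, 0.0 if is_nan else x)


spam = [7, float("nan"), 2, 5, float("nan"), 1]
ordered = sorted(spam, key=nan_last_key)
print(ordered)  # [1, 2, 5, 7, nan, nan]
```

Putting NaNs last is an arbitrary choice here; negating the first tuple element would put them first instead.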

So I am strongly leaning towards implementing the comparisons such that
Null objects are less than other objects so they will always sort
together.

Thoughts/advice/criticisms/etc?

Possibly the least-worst solution.

jmfauth · Jan 14, 2012

With NaN, it is possible to get a list that will not properly sort:

--> NaN = float('nan')
--> spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN]
--> sorted(spam)
[1, 2, nan, 3, nan, 4, 5, 7, nan]

I'm constructing a Null object with the semantics that if the returned
object is Null, its actual value is unknown.

Short answer.

- NaN != NA()

- I find the current implementation (Py3.2) quite satisfying. (M.
Dickinson's work)

jmf

Eelco · Jan 16, 2012

With NaN, it is possible to get a list that will not properly sort:

--> NaN = float('nan')
--> spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN]
--> sorted(spam)
[1, 2, nan, 3, nan, 4, 5, 7, nan]

I'm constructing a Null object with the semantics that if the returned
object is Null, its actual value is unknown.

From a purist point of view if it is unknown then comparison results
are also unknown since the actual value might be greater, lesser, or the
same as the value being compared against.

From a practical point of view a list with Nulls scattered throughout
is a pain in the backside.

So I am strongly leaning towards implementing the comparisons such that
Null objects are less than other objects so they will always sort together.

Thoughts/advice/criticisms/etc?

~Ethan~

My suggestion would be thus: nans/nulls are unordered; sorting them is
fundamentally an ill-defined notion. What you want, conceptually, is a
sorted list of the sortable entries, and a separate list of the
unsorted entries. Translated into code, the most pure solution would
be to filter out the nans/nulls into their own list first, and then
sort the rest. If the interface demands it, you can concatenate the
lists afterwards, but probably it is most convenient to keep them in
separate lists.
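
The filtering step could look roughly like this. It is a sketch only: split_unordered is a made-up name, and it treats float NaN as the sole unordered value (a Null check would slot into the same test):

```python
import math


def split_unordered(values):
    """Return (sorted ordered values, unordered values in input order)."""
    ordered, unordered = [], []
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            unordered.append(v)
        else:
            ordered.append(v)
    return sorted(ordered), unordered


spam = [5, float("nan"), 1, 7, float("nan"), 3]
ordered, unordered = split_unordered(spam)
print(ordered)         # [1, 3, 5, 7]
print(len(unordered))  # 2

# Only if the interface demands a single list:
combined = ordered + unordered
```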

Perhaps arbitrarily defining the ordering of nulls/nans is slightly
more efficient than the above, but it should not make a big
difference, and in terms of purity it's no contest.

Chris Angelico · Jan 16, 2012

What you want, conceptually, is a
sorted list of the sortable entries, and a separate list of the
unsorted entries. Translated into code, the most pure solution would
be to filter out the nans/nulls into their own list first, and then
sort the rest. If the interface demands it, you can concatenate the
lists afterwards, but probably it is most convenient to keep them in
separate lists.

So... you split it into two lists, sort the two lists (one of which
can't be sorted), and then concatenate them. Sounds like the quicksort
algorithm.

ChrisA

Robert Kern · Jan 16, 2012

So... you split it into two lists, sort the two lists (one of which
can't be sorted), and then concatenate them. Sounds like the quicksort
algorithm.

Not at all. The "split it into two lists" steps are entirely different in what
Eelco suggested and quicksort. It's misleading to attempt to describe both using
the same words.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco