Efficient Rank Ordering of Nested Lists

pablo.mitchell · Aug 2, 2007

A naive approach to rank ordering (handling ties as well) of nested
lists may be accomplished via:

def rankLists(nestedList):
def rankList(singleList):
sortedList = list(singleList)
sortedList.sort()
return map(sortedList.index, singleList)
return map(rankList, nestedList)

>>> unranked = [ [ 1, 2, 3, 4, 5 ], [ 3, 1, 5, 2, 4 ], [ -1.1, 2.2, 0, -1.1, 13 ] ]
>>> print rankLists(unranked)

Click to expand...

Click to expand...

[[0, 1, 2, 3, 4], [2, 0, 4, 1, 3], [0, 3, 2, 0, 4]]

This works nicely when the dimensions of the nested list are small.
It is slow when they are big. Can someone suggest a clever way to
speed it up?

beginner · Aug 2, 2007

A naive approach to rank ordering (handling ties as well) of nested
lists may be accomplished via:

def rankLists(nestedList):
def rankList(singleList):
sortedList = list(singleList)
sortedList.sort()
return map(sortedList.index, singleList)
return map(rankList, nestedList)

unranked = [ [ 1, 2, 3, 4, 5 ], [ 3, 1, 5, 2, 4 ], [ -1.1, 2.2, 0, -1.1, 13 ] ]
print rankLists(unranked)

Click to expand...

Click to expand...

[[0, 1, 2, 3, 4], [2, 0, 4, 1, 3], [0, 3, 2, 0, 4]]

This works nicely when the dimensions of the nested list are small.
It is slow when they are big. Can someone suggest a clever way to
speed it up?

Indexing the sorted list with a dictionary will speed it up a little.

Neil Cerutti · Aug 3, 2007

A naive approach to rank ordering (handling ties as well) of nested
lists may be accomplished via:

def rankLists(nestedList):
def rankList(singleList):
sortedList = list(singleList)
sortedList.sort()
return map(sortedList.index, singleList)
return map(rankList, nestedList)

unranked = [ [ 1, 2, 3, 4, 5 ], [ 3, 1, 5, 2, 4 ], [ -1.1, 2.2, 0, -1.1, 13 ] ]
print rankLists(unranked)

Click to expand...

Click to expand...

[[0, 1, 2, 3, 4], [2, 0, 4, 1, 3], [0, 3, 2, 0, 4]]

This works nicely when the dimensions of the nested list are
small. It is slow when they are big. Can someone suggest a
clever way to speed it up?

Use binary search instead of linear.

import bisect
import functools

def rankLists(nestedList):
def rankList(singleList):
sortedList = list(sorted(singleList))
return map(functools.partial(bisect.bisect_left, sortedList), singleList)
return map(rankList, nestedList)

Alex Martelli · Aug 3, 2007

A naive approach to rank ordering (handling ties as well) of nested
lists may be accomplished via:

def rankLists(nestedList):
def rankList(singleList):
sortedList = list(singleList)
sortedList.sort()
return map(sortedList.index, singleList)
return map(rankList, nestedList)

unranked = [ [ 1, 2, 3, 4, 5 ], [ 3, 1, 5, 2, 4 ], [ -1.1, 2.2, 0, -1.1, 13 ] ]
print rankLists(unranked)

Click to expand...

Click to expand...

[[0, 1, 2, 3, 4], [2, 0, 4, 1, 3], [0, 3, 2, 0, 4]]

This works nicely when the dimensions of the nested list are small.
It is slow when they are big. Can someone suggest a clever way to
speed it up?

Each use of sortedList.index is O(N) [where N is len(singleList)], and
you have N such uses in the map in the inner function, so this approach
is O(N squared). Neil's suggestion to use bisect replaces the O(N)
..index with an O(log N) search, so the overall performance is O(N log N)
[[and you can't do better than that, big-O wise, because the sort step
is also O(N log N)]].

"beginner"'s advice to use a dictionary is also good and may turn out to
be faster, just because dicts are SO fast in Python -- but you need to
try and measure both alternatives. One way to use a dict (warning,
untested code):

def rankList(singleList):
d = {}
for i, item in reversed(enumerate(sorted(singleList))):
d[item] = i
return [d[item] for item in singleList]

If you find the for-clause too rich in functionality, you can of course
split it up a bit; but note that you do need the 'reversed' to deal with
the corner case of duplicate items (without it, you'd end up with 1s
instead of 0s for the third one of the sublists in your example). If
speed is of the essence you might try to measure what happens if you
replace the returned expression with map(d.__getitem__, singleList), but
I suspect the list comprehension is faster as well as clearer.

Another potential small speedup is to replace the first 3 statements
with just one:

d = dict((item,i) for i,item in reversed(enumerate(sorted(singleList))))

but THIS density of functionality is a bit above my personal threshold
of comfort ("sparse is better than dense"

.

Alex

pyscottishguy · Aug 3, 2007

A naive approach to rank ordering (handling ties as well) of nested
lists may be accomplished via:

def rankLists(nestedList):
def rankList(singleList):
sortedList = list(singleList)
sortedList.sort()
return map(sortedList.index, singleList)
return map(rankList, nestedList)

unranked = [ [ 1, 2, 3, 4, 5 ], [ 3, 1, 5, 2, 4 ], [ -1.1, 2.2, 0, -1.1, 13 ] ]
print rankLists(unranked)

Click to expand...

Click to expand...

[[0, 1, 2, 3, 4], [2, 0, 4, 1, 3], [0, 3, 2, 0, 4]]

This works nicely when the dimensions of the nested list are small.
It is slow when they are big. Can someone suggest a clever way to
speed it up?

Isn't there something wrong with the ordering?

Pablo's answers are:
[ 1, 2, 3, 4, 5 ] == [0, 1, 2, 3, 4] correct

[ 3, 1, 5, 2, 4 ] == [2, 0, 4, 1, 3] wrong?

[ -1.1, 2.2, 0, -1.1, 13 ] == [0, 3, 2, 0, 4] wrong?

Doing it in my head I get:
[ 3, 1, 5, 2, 4 ] == [ 1, 3, 0, 4, 2 ]

[ -1.1, 2.2, 0, -1.1, 13 ] == [0, 3, 2, 1, 4]

What gives?

Did I misunderstand what "rank ordering (handling ties as well)" means?

pablo.mitchell · Aug 4, 2007

[email protected] said:
[email protected] said:

A naive approach to rank ordering (handling ties as well) of nested
lists may be accomplished via:

Click to expand...

def rankLists(nestedList):
def rankList(singleList):
sortedList = list(singleList)
sortedList.sort()
return map(sortedList.index, singleList)
return map(rankList, nestedList)

unranked = [ [ 1, 2, 3, 4, 5 ], [ 3, 1, 5, 2, 4 ], [ -1.1, 2.2, 0, -1.1, 13 ] ]
print rankLists(unranked)

Click to expand...

Click to expand...

[[0, 1, 2, 3, 4], [2, 0, 4, 1, 3], [0, 3, 2, 0, 4]]

Click to expand...

This works nicely when the dimensions of the nested list are small.
It is slow when they are big. Can someone suggest a clever way to
speed it up?

Click to expand...

Each use of sortedList.index is O(N) [where N is len(singleList)], and
you have N such uses in the map in the inner function, so this approach
is O(N squared). Neil's suggestion to use bisect replaces the O(N)
.index with an O(log N) search, so the overall performance is O(N log N)
[[and you can't do better than that, big-O wise, because the sort step
is also O(N log N)]].

"beginner"'s advice to use a dictionary is also good and may turn out to
be faster, just because dicts are SO fast in Python -- but you need to
try and measure both alternatives. One way to use a dict (warning,
untested code):

def rankList(singleList):
d = {}
for i, item in reversed(enumerate(sorted(singleList))):
d[item] = i
return [d[item] for item in singleList]

If you find the for-clause too rich in functionality, you can of course
split it up a bit; but note that you do need the 'reversed' to deal with
the corner case of duplicate items (without it, you'd end up with 1s
instead of 0s for the third one of the sublists in your example). If
speed is of the essence you might try to measure what happens if you
replace the returned expression with map(d.__getitem__, singleList), but
I suspect the list comprehension is faster as well as clearer.

Another potential small speedup is to replace the first 3 statements
with just one:

d = dict((item,i) for i,item in reversed(enumerate(sorted(singleList))))

but THIS density of functionality is a bit above my personal threshold
of comfort ("sparse is better than dense".

Alex

Thanks for breaking it down so thoroughly. I try the different
recipes and see which gives the best trade offs between readability
and performance. Agreed that dict() approach looks promising.

pablo.mitchell · Aug 4, 2007

A naive approach to rank ordering (handling ties as well) of nested
lists may be accomplished via:

Click to expand...

def rankLists(nestedList):
def rankList(singleList):
sortedList = list(singleList)
sortedList.sort()
return map(sortedList.index, singleList)
return map(rankList, nestedList)

unranked = [ [ 1, 2, 3, 4, 5 ], [ 3, 1, 5, 2, 4 ], [ -1.1, 2.2, 0, -1.1, 13 ] ]
print rankLists(unranked)

Click to expand...

Click to expand...

[[0, 1, 2, 3, 4], [2, 0, 4, 1, 3], [0, 3, 2, 0, 4]]

Click to expand...

This works nicely when the dimensions of the nested list are small.
It is slow when they are big. Can someone suggest a clever way to
speed it up?

Click to expand...

Isn't there something wrong with the ordering?

Pablo's answers are:
[ 1, 2, 3, 4, 5 ] == [0, 1, 2, 3, 4] correct

[ 3, 1, 5, 2, 4 ] == [2, 0, 4, 1, 3] wrong?

[ -1.1, 2.2, 0, -1.1, 13 ] == [0, 3, 2, 0, 4] wrong?

Doing it in my head I get:
[ 3, 1, 5, 2, 4 ] == [ 1, 3, 0, 4, 2 ]

[ -1.1, 2.2, 0, -1.1, 13 ] == [0, 3, 2, 1, 4]

What gives?

Did I misunderstand what "rank ordering (handling ties as well)" means?

There are many ways to skin this cat. I am considering the following
approach:

http://en.wikipedia.org/wiki/Rank_order#Standard_competition_ranking_.28.221224.22_ranking.29

pablo.mitchell · Aug 4, 2007

A naive approach to rank ordering (handling ties as well) of nested
lists may be accomplished via:

Click to expand...

def rankLists(nestedList):
def rankList(singleList):
sortedList = list(singleList)
sortedList.sort()
return map(sortedList.index, singleList)
return map(rankList, nestedList)

unranked = [ [ 1, 2, 3, 4, 5 ], [ 3, 1, 5, 2, 4 ], [ -1.1, 2.2, 0, -1.1, 13 ] ]
print rankLists(unranked)

Click to expand...

Click to expand...

[[0, 1, 2, 3, 4], [2, 0, 4, 1, 3], [0, 3, 2, 0, 4]]

Click to expand...

This works nicely when the dimensions of the nested list are
small. It is slow when they are big. Can someone suggest a
clever way to speed it up?

Click to expand...

Use binary search instead of linear.

import bisect
import functools

def rankLists(nestedList):
def rankList(singleList):
sortedList = list(sorted(singleList))
return map(functools.partial(bisect.bisect_left, sortedList), singleList)
return map(rankList, nestedList)

Very clean and precisely what I was looking for. Thanks for expanding
my knowledge of the language.

Cousin Stanley · Aug 4, 2007

....
"beginner"'s advice to use a dictionary is also good
and may turn out to be faster, just because dicts are
SO fast in Python -- but you need to try and measure
both alternatives.

One way to use a dict ( warning, untested code ) :

def rank_list( single_list ) :

d = { }

for i , item in reversed( enumerate( sorted( single_list ) ) ) :

d[ item ] = i

return [ d[ item ] for item in single_list ]

Alex ....

I tested it but I don't know how to fix it ....

Both Pablo's original version and Neil's version work
but I get the following traceback from your version ....

Traceback (most recent call last):
File "list_nested_rank.py", line 100, in <module>
test_01( list_source )
File "list_nested_rank.py", line 87, in test_01
list_print( this_func( list_source ) )
File "list_nested_rank.py", line 62, in rank_lists_02
return map( rank_list , nested_list )
File "list_nested_rank.py", line 56, in rank_list
for i , item in reversed( enumerate( sorted( single_list ) ) ) :
TypeError: argument to reversed() must be a sequence

Alex Martelli · Aug 4, 2007

Cousin Stanley said:
for i , item in reversed( enumerate( sorted( single_list ) ) ) : ...
TypeError: argument to reversed() must be a sequence

Oops, right. Well then,

aux_seq = list(enumerate(sorted(single_list)))
for i, item in reversed(aux_seq): ...

or the like.

Alex

Cousin Stanley · Aug 4, 2007

Oops, right. Well then,

aux_seq = list(enumerate(sorted(single_list)))
for i, item in reversed(aux_seq): ...

Alex ....

That fixed it.

Thanks ....

pablo.mitchell · Aug 5, 2007

...

Oops, right. Well then,

aux_seq = list(enumerate(sorted(single_list)))
for i, item in reversed(aux_seq): ...

or the like.

Alex

The particular embedded list I have been tinkering with is 43 x 3884
and the dictionary approach (as you can probably guess) is a big
improvement over the linear one. It also has the added benefit of
being compatible with Python24. If I'm not mistaken the binary
approach suggested by Neil requires Python25. Since in some
situations I don't have control over the version available I will go
with a variant of the dictionary recipe.

Calculate rang and derang of ordering of subsets	1	Feb 6, 2024
Calculate rang and derang of ordering of subsets	0	Feb 6, 2024
How to return list of lists	9	Apr 25, 2013
Walking deeply nested lists	11	Aug 27, 2010
fetchall to python lists	2	Sep 30, 2013
index nested lists	3	Jul 28, 2009
Minimum Total Difficulty	0	Nov 15, 2023
Count each unique element in list of lists	1	Nov 8, 2013

Efficient Rank Ordering of Nested Lists

pablo.mitchell

beginner

Neil Cerutti

Alex Martelli

pyscottishguy

pablo.mitchell

pablo.mitchell

pablo.mitchell

Cousin Stanley

Alex Martelli

Cousin Stanley

pablo.mitchell

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads