[2.5.1.1/dictionary] Change sorting order?

S

Steven D'Aprano

O(N*Log N) + O(N) == O(N**2)?

Oops! :(

Of course, the sort is in fast C, and the linear search is in relatively
slow Python, so it is quite conceivable that for realistic amounts of
data, the time could be dominated by the searching.

Or the OP might just change his requirements and allow starting the
display in the middle of the letter.

Besides, the idea was to avoid sorting altogether.

I don't see why you think that's necessary. Python's sort is very fast
when the list is nearly sorted, so if you re-sort after any addition of a
username, it might even be faster than trying to keep the list sorted all
the time. That's the sort of thing that we'd need to do some timing tests
to see which was faster.
 
N

Neil Cerutti

Oops! :(

Of course, the sort is in fast C, and the linear search is in
relatively slow Python, so it is quite conceivable that for
realistic amounts of data, the time could be dominated by the
searching.

Or the OP might just change his requirements and allow starting
the display in the middle of the letter.

That's what I would advocate. The scheme of starting at a random
(or cycling through) first letters still grants an arbitrary
advantage to some user names. A random starting position is thus
more fair *and* more efficient. ;)
 
D

Dave Angel

Gilles said:
Seems to me the other solutions I've seen so far are more complex than
needed. I figure you either want an unordered list, in which case you
could use random.shuffle(), or you want a list that's sorted, but starts
somewhere in the middle, at an arbitrary place, goes to the end, and
wraps back to the beginning. So use random.randint() to choose an index
within the list, and concatenate two slices of the list, based on that
index in reverse order.

Yes, this is exactly what I need: Start listing items from a given
index (actually, using a character since the list contains names) all
the way to the end of the list; If the character wasn't the very
first, go back to the first time and display the list until we get to
the current character.

Python is so feature-rich, I'm sure there's a much simpler way to do
this than this crappy code of mine:

=============
connected = []
connected.append("0dummy")
connected.append("aa")
connected.append("bb")
connected.append("cc")

index = 0
for item in connected:
#For testing purposes;
#Final code will read/increment character from file
if item[index] == "b":
break
else:
index = index + 1

#Print items between current character and end of list
tempindex = index
while(tempindex < len(connected)):
print connected[tempindex]
tempindex = tempindex + 1

#if current letter not first character,
#display beginning of list up to current character
if index != 0:
tempindex = 0
while tempindex < index:
print connected[tempindex]
tempindex = tempindex + 1
=============

Thank you for any help
Your code would fail if you picked a letter for which there were no
entries. That's easily fixed, just use >= instead of == in your if test.

Jean-Michel Pichavant post implements my second option. So I'll borrow
his code here.

But apparently you want my third option: random starting place, but
with all the names starting with the same character together. To get
that, but with the distribution proportional to the number of names that
start with the given character, you need to get the index in two
stages. First use the random.randint(). Then do a search for the first
name that starts with that same character. For that, I borrow from
Duncan Booth, and use bisect.bisect().

import random, bisect

connected = sorted( ['jj', '0test', '3test', 'aa', 'aj', 'bb', 'bz',
'dc', 'df'] )

def getNewOrder(myList):
index = random.randint(0,len(myList)-1)
startch = myList[index][0]
pos = bisect.bisect_left(myList, startch)
print index, myList[index], pos, myList[pos]
return myList[pos:] + myList[:pos] # using slicing instead of list
comprehension (suggested by DaveA)

print connected
for i in range(6):
print getNewOrder(connected)

I have the extra print inside the function to show (for example) that if
'aj' is picked, it actually will start with'aa'

Note that with your constraints, there's no way it can be fair. But
experiment a bit, and see if it meets your needs.

DaveA
 
D

Dennis Lee Bieber

Hello

I use a dictionary to keep a list of users connected to a web site.
Why not a database...
To avoid users from creating login names that start with digits in
order to be listed at the top, I'd like to sort the list differently
every minute so that it'll start with the next letter, eg. display the
list from A...Zdigits the first time, then B...ZAdigits, etc.

That way, users have no incentive to create login names that start
with either a digit or letter A.

I see that dictionaries can be sorted using the... sort() method, but
is it possible to have Python start sorting from a different letter?
Database SQL makes it relatively simple... Only catch is I don't
recall if "order by" can be applied to parts of a "union" -- if not,
just do a pair of selects...

#assuming order by is valid in parts and 'c' represents the current
# break point character (note: this is likely a case sensitive
# result set
select uname from users
where uname >= 'c'
order by uname
union
select uname from users
where uname < 'c'
order by uname
#gather results into Python list for display

#if not valid, use two selects
select uname from users
where uname >= 'c'
order by uname
#gather results into Python list
select uname from users
where uname < 'c'
order by uname
#append results to previous list; display results


Which operation is more costly (that is, performed more often) in
your application? Retrieving data for ONE user by uname, or updating the
display order of ALL uname's?

If you do more display updating/sorting, maybe a dictionary is not
the best data structure. Instead of a dictionary, use a list of tuples
(uname, user-data). Sort the list once, and keep it sorted. Use the
bisect module (I believe) to do uname lookups and inserts. Use list
slicing to order the results for display (you might manage a side list
or dictionary consisting of the starting indices for each break point
character). Using the side-list, or a bisect search for the break point
index, you'd then use slicing to order the display

unames[side['c']:] + unames[:side['c']]

If you need fast access to both individual uname's and to the break
list, maybe define a class that internally keeps both the dictionary and
the list, with methods to retrieve a desired break list, and methods for
dictionary key retrieval -- insert will have to update both the
dictionary and the list.
 
G

Gilles Ganault

Here's another:

Thanks for the sample. It work great, except that it also runs when
the header character doesn't match any item in the list:

=======
import bisect

connected = []
connected.append("_test")
connected.append("-test")
connected.append("0test")
connected.append("1test")
connected.append("Aa")
connected.append("Bb")
connected.append("Cc")

def rotated_sort(data, startch):
data = sorted(data)
pos = bisect.bisect_left(data, startch)
return data[pos:] + data[:pos]

for ch in "_-0123456789ABCDEFGHIJKLMNOPQRSTUVWYZ":
print ch, rotated_sort(connected, ch)
=======
C:\>test.py
_ ['_test', '-test', '0test', '1test', 'Aa', 'Bb', 'Cc']
- ['-test', '0test', '1test', 'Aa', 'Bb', 'Cc', '_test']
0 ['0test', '1test', 'Aa', 'Bb', 'Cc', '_test', '-test']
1 ['1test', 'Aa', 'Bb', 'Cc', '_test', '-test', '0test']
2 ['Aa', 'Bb', 'Cc', '_test', '-test', '0test', '1test']
3 ['Aa', 'Bb', 'Cc', '_test', '-test', '0test', '1test']
4 ['Aa', 'Bb', 'Cc', '_test', '-test', '0test', '1test']
=======

Should I add an "if" block in the "for ch in"?
 
G

Gilles Ganault

Ok I realized that picking up a random index prevent from grouping names
starting with the same letter (to ease visual lookup).
Then go for the random char, and use char comparison (my first example).

Yup, I think it's a good enough solution. Thanks much for the help.
 
D

Dave Angel

Steven said:
According to the OP's stated requirements, he needs it every minute.
I read it as he wants viewers of the page that are more than a minute
apart to see a different order.
Personally, from a usability perspective, I think a refresh of a
(potentially huge) list of users every minute would be horrible. I think
the whole design needs to be rethought, but that's not my decision.
Certainly if there are more than 100 users or so, the idea of presenting
a web page with a single list of users is unwieldy. So I assumed this
problem did not have to scale. Once the list goes on multiple pages,
entirely different algorithms are appropriate, along with the display
mechanisms to support them.
I doubt if the cost of generating it is much
different than the cost of checking the time.

Checking the time is quite fast -- it's a simple system call. Slicing and
copying a list containing potentially tens or hundreds of thousands of
names is anything but cheap!

from timeit import Timer
t1 = Timer('x = time.time()', 'import time')
t2 = Timer('y = x[:1000] + x[1000:]', 'x = [None]*100000')

t1.timeit(1000)
0.0025529861450195312
t2.timeit(1000)
3.0900199413299561

Slicing is more than 1000 times slower than checking the time. If you
think I'm being unfair, that the OP's application will never have 100,000
names, try it with 1000 names:

t3 = Timer('y = x[:1000] + x[1000:]', 'x = [None]*1000')
t3.timeit(1000)
0.041487932205200195

That's 20 times slower than checking the time.

That's the advantage of my suggestion: the only slicing done is the tiny
master list, which means copying 27 pointers. Easy and fast, and it
doesn't matter whether you have one user or one million, that operation
will take exactly the same amount of time. Your suggestion to keep all
the users in one giant list, and slice it repeatedly, scales terribly.


I guess there's a third possibility, that you don't want some of the G
names at the beginning, and some at the end. In that case, I'd generate
the random index as above, then increment it till the first character of
the item changes. Use that index as your split point.

Another solution that doesn't scale terrible well. (Although not as bad
as the previous one.)

And what's with the random index thing? Why are you throwing away
perfectly good information every time you want to shift letters? Your
solution is something like this:

You have a book opened at the start of Chapter 11, and you now want to
jump to the start of Chapter 12.

(1) Open the book at a random page.
(2) Is it the start of Chapter 12?
Wrong. We're checking whether it's the start of whatever the next
chapter is, so if we randomly jumped into chapter 5, we'd be going a
page at a time till chapter 6. After I thought about it, I realized we
should be going backwards, not forward, but the idea is the same. We're
only walking through a half chapter, typically.
If yes, you're done.
If no, turn to the next page. If you reach the end of
the book, start again at the beginning.
(3) Go to step 2.

There are search techniques that start with a random index, but they
don't advance one position at a time! (E.g. binary search, or hunt-and-
bisect search, or similar.) But why search each time when you can keep an
index to the start of each chapter and jump straight there in constant
time?
First, the time that will be spent turning this list into a web page
will far exceed any of these numbers.

And if we assume he's using CGI, then the time spent reloading and
unmarshaling the list will exceed by even more.

Next, when it comes to an actual implementation, I used bisect(), which
is O(logn).

And finally, it was more important to get the concepts together, as the
spec wasn't well-thought out, than it was to get some optimal solution.
That's why I used slice instead of doing the index-len() trick. In real
code, he'll be looping over the list, calling some logic to add <p> and
</p> for example. By running that loop from index-len() to index, it
can operate on the original list.

In my opinion, the only "fair" answer is to choose the starting point
randomly, or sequentially, but without regard to starting letter. And
random has the distinct advantage of not needing to store state between
CGI invocations.

The rest are premature optimizations, in my opinion. But once a final
spec is decided, if the letters wanted to be equally weighted, then a
dictionary of sorted lists might very well be the best storage
mechanism. And once each starting letter gets its own web page, it also
makes sense.

DaveA
 
J

John Posner

Hello

I use a dictionary to keep a list of users connected to a web site.

To avoid users from creating login names that start with digits in
order to be listed at the top, I'd like to sort the list differently
every minute so that it'll start with the next letter, eg. display the
list from A...Zdigits the first time, then B...ZAdigits, etc.

I don't believe anyone has suggested customizing the way that the
dictionary's keys are sorted. How about using a string-like object
("RotaKey"), with a changeable sort order:

#--------------------------
import string, time

class RotaKey(object):
"""
string with rotating sort order
"""
LETTERS = string.ascii_uppercase

def __init__(self, instring):
self.string = instring

@staticmethod
def RotateSortOrder():
"""
rotate the letter sequence for sorting RotaKey objects
"""
RotaKey.LETTERS = RotaKey.LETTERS[1:] + RotaKey.LETTERS[0]

def __repr__(self): return "<%s>" % self.string

def __cmp__(self, other):
self_first, self_rest = self.string[0], self.string[1:]
other_first, other_rest = other.string[0], other.string[1:]
result = cmp(self.LETTERS.index(self_first.upper()),
self.LETTERS.index(other_first.upper()))
if result != 0:
# first letters differ
return result
else:
# first letters same, compare rest of strings
return cmp(self_rest, other_rest)

if __name__ == "__main__":
# create dictionary: keys are RotaKey's, values in range 1001 - ...
names = map(RotaKey,
"Ant Zebra Bob Cat2 Ant2 Eagle Bob3 Bob2 Cat Dog".split())
datadict = dict(zip(names, range(1001, 1001+len(names))))

# TEST: print, rotate, print, rotate, ...
for i in range(7):
if i > 0: RotaKey.RotateSortOrder()
print "=" * 15, i
for k in sorted(datadict): print k, datadict[k]
#--------------------------

NOTE: I tried implementing RotaKey as a subclass of "str", but found
that the redefinition of __cmp__() was ignored.

You can invoke RotaKey.RotateSortOrder() every minute (or whatever) to
change the sort order.

Program output:

=============== 0
<Ant> 1001
<Ant2> 1005
<Bob> 1003
<Bob2> 1008
<Bob3> 1007
<Cat> 1009
<Cat2> 1004
<Dog> 1010
<Eagle> 1006
<Zebra> 1002
=============== 1
<Bob> 1003
<Bob2> 1008
<Bob3> 1007
<Cat> 1009
<Cat2> 1004
<Dog> 1010
<Eagle> 1006
<Zebra> 1002
<Ant> 1001
<Ant2> 1005
=============== 2
<Cat> 1009
<Cat2> 1004
<Dog> 1010
<Eagle> 1006
<Zebra> 1002

... etc.

-John
 
G

Gilles Ganault

To avoid users from creating login names that start with digits in
order to be listed at the top, I'd like to sort the list differently
every minute so that it'll start with the next letter, eg. display the
list from A...Zdigits the first time, then B...ZAdigits, etc.

Thanks everyone for the great feedback. I ended up using
- a list instead of a dictionary to hold the list of names
- use the random library to pick a character from which to start
listing names (simpler than saving the next letter into a file, and
starting from this the next time the loop is gone through)

For those interested, here's some working code:

==========
import random

connected =
["_test","-test","0test","1test","Aa","Aab","Bb","Bbbbb","Cc","Ccaaaaaa"]
connected.sort()

#Fill list with non-duplicate first letter of all items
characters=[]
for name in connected:
char = name[0]
#if this character not in list, add to array
if not char in characters:
characters.append(char)

#Pick a random character from which to start iterating
index = random.randint(0,len(characters)-1)
startch = characters[index]
print "Will start from %s" % startch

#Go through list starting with items that start with 'starch', and
rotate from beginning
result = [name for name in connected if name[0] >= startch] + [name
for name in connected if name[0] < startch]
print result
==========

Thanks again.
 
A

Arnaud Delobelle

Gilles Ganault said:
To avoid users from creating login names that start with digits in
order to be listed at the top, I'd like to sort the list differently
every minute so that it'll start with the next letter, eg. display the
list from A...Zdigits the first time, then B...ZAdigits, etc.

Thanks everyone for the great feedback. I ended up using
- a list instead of a dictionary to hold the list of names
- use the random library to pick a character from which to start
listing names (simpler than saving the next letter into a file, and
starting from this the next time the loop is gone through)

For those interested, here's some working code:

==========
import random

connected =
["_test","-test","0test","1test","Aa","Aab","Bb","Bbbbb","Cc","Ccaaaaaa"]
connected.sort()

#Fill list with non-duplicate first letter of all items
characters=[]
for name in connected:
char = name[0]
#if this character not in list, add to array
if not char in characters:
characters.append(char)

characters = list(set(name[0] for name in connected))
#Pick a random character from which to start iterating
index = random.randint(0,len(characters)-1)
startch = characters[index]

startch = random.choice(characters)
print "Will start from %s" % startch

#Go through list starting with items that start with 'starch', and
rotate from beginning
result = [name for name in connected if name[0] >= startch] + [name
for name in connected if name[0] < startch]

It would make sense to keep track of the indices where the first word
starting with a certain letter starts in the earlier part of the code
and to use it now to 'rotate' the list.
 
A

Arnaud Delobelle

Arnaud Delobelle said:
Gilles Ganault said:
To avoid users from creating login names that start with digits in
order to be listed at the top, I'd like to sort the list differently
every minute so that it'll start with the next letter, eg. display the
list from A...Zdigits the first time, then B...ZAdigits, etc.

Thanks everyone for the great feedback. I ended up using
- a list instead of a dictionary to hold the list of names
- use the random library to pick a character from which to start
listing names (simpler than saving the next letter into a file, and
starting from this the next time the loop is gone through)

For those interested, here's some working code:

==========
import random

connected =
["_test","-test","0test","1test","Aa","Aab","Bb","Bbbbb","Cc","Ccaaaaaa"]
connected.sort()

#Fill list with non-duplicate first letter of all items
characters=[]
for name in connected:
char = name[0]
#if this character not in list, add to array
if not char in characters:
characters.append(char)

characters = list(set(name[0] for name in connected))
#Pick a random character from which to start iterating
index = random.randint(0,len(characters)-1)
startch = characters[index]

startch = random.choice(characters)
print "Will start from %s" % startch

#Go through list starting with items that start with 'starch', and
rotate from beginning
result = [name for name in connected if name[0] >= startch] + [name
for name in connected if name[0] < startch]

I forgot this line:

result = sorted(connected, key=lambda name: name[0] < startch)

This takes advantage of the fact that timsort is stable.

So you could do it like this, assuming connected is sorted:

characters = list(set(name[0] for name in connected))
startch = random.choice(characters)
result = sorted(connected, key=lambda name: name[0] < startch)

If connected is not sorted, you could replace the last line with:

result = sorted(connected, key=lambda name: (name[0] < startch, name))
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,778
Messages
2,569,605
Members
45,237
Latest member
AvivMNS

Latest Threads

Top