sorting question

K

Ksenia Marasanova

Hi,

I have a list that contains nodes from a tree. Each node is a class
instance, but I'll use dictionary here to simplify the example.
So the list looks like this:
[
{'id': 1,
'name': 'Parent node',
'ord_number': 1,
'parent_id': 0,
'url': '/parentnode/'},
{'id': 2,
'name': 'My node',
'ord_number': 1,
'parent_id': 1,
'url': '/parentnode/mynode/'}
]

Where 'ord_number' is the "sibling index". It's not always properly
filled in; sometimes it's the same for several siblings, or doesn't
starts with 1.

I want to sort this list with the following rules:
1. The parent must always come before the children in the list
2. Nodes with the same parent must be sorted by 'ord_number'

The first rule is easy, cause I can use 'url' for it. List with nodes
is coming from the database, so I just do "ORDER BY url".
The second rule is kind of tricky to do in the database. I probably
would need to do something like "ORDER BY
cut_off_lastpart_from_url(url), ord_number". But there seems to be no
native string function in Postgres to do it easily, so I desided to
sort it in Python.

So I've come up with this:

def cmp_tree(x, y):
if x['parent_id'] == y['parent_id']:
return cmp(x['ord_number'], y['ord_number'])
else:
return cmp(x['url'], y['url'])

nodes.sort(cmp_tree)

but it doesn't work as expected. Apparently I don't have a clue about
how sorting function work :(

Can anybody help?
 
D

Devan L

Ksenia said:
Hi,

I have a list that contains nodes from a tree. Each node is a class
instance, but I'll use dictionary here to simplify the example.
So the list looks like this:
[
{'id': 1,
'name': 'Parent node',
'ord_number': 1,
'parent_id': 0,
'url': '/parentnode/'},
{'id': 2,
'name': 'My node',
'ord_number': 1,
'parent_id': 1,
'url': '/parentnode/mynode/'}
]

Where 'ord_number' is the "sibling index". It's not always properly
filled in; sometimes it's the same for several siblings, or doesn't
starts with 1.

I want to sort this list with the following rules:
1. The parent must always come before the children in the list
2. Nodes with the same parent must be sorted by 'ord_number'

The first rule is easy, cause I can use 'url' for it. List with nodes
is coming from the database, so I just do "ORDER BY url".
The second rule is kind of tricky to do in the database. I probably
would need to do something like "ORDER BY
cut_off_lastpart_from_url(url), ord_number". But there seems to be no
native string function in Postgres to do it easily, so I desided to
sort it in Python.

So I've come up with this:

def cmp_tree(x, y):
if x['parent_id'] == y['parent_id']:
return cmp(x['ord_number'], y['ord_number'])
else:
return cmp(x['url'], y['url'])

nodes.sort(cmp_tree)

but it doesn't work as expected. Apparently I don't have a clue about
how sorting function work :(

Can anybody help?

class Node:
def __init__(self, name, url, order, pid, id):
self.name = name
self.url = url
self.order = order
self.pid = pid
self.id = id
def __repr__(self):
return self.url
def mycmp(x,y):
if x.pid == y.pid:
if x.order != y.order:
return cmp(x.order,y.order)
return cmp(x.url,y.url)
return cmp(x.pid,y.pid)
a = Node('ham','/test/ham/',1,0,1)
b = Node('eggs','/test/ham/eggs/',1,1,2)
c = Node('bacon','/test/ham/bacon/',1,1,3)
d = Node('spam','/test/ham/bacon/spam/',1,3,4)

Does this work for you? I haven't tested it much.
 
K

Ksenia Marasanova

class Node:
def __init__(self, name, url, order, pid, id):
self.name = name
self.url = url
self.order = order
self.pid = pid
self.id = id
def __repr__(self):
return self.url
def mycmp(x,y):
if x.pid == y.pid:
if x.order != y.order:
return cmp(x.order,y.order)
return cmp(x.url,y.url)
return cmp(x.pid,y.pid)
a = Node('ham','/test/ham/',1,0,1)
b = Node('eggs','/test/ham/eggs/',1,1,2)
c = Node('bacon','/test/ham/bacon/',1,1,3)
d = Node('spam','/test/ham/bacon/spam/',1,3,4)

Does this work for you? I haven't tested it much.

Thank you for help :)

It doesn't work yet. The id's are not sequentional, so it fails when
id of the child is lower than id of the parent.
If I change the last line in mycmp from
return cmp(x.pid,y.pid)
to
return cmp(x.url,y.url) # almost the same cmp function I use
than your example work and I can't break it. But the real code
(hunderds of nodes) doesn't, and I can't yet find the node(s) that
cause it :(
 
P

Peter Otten

Ksenia said:
I want to sort this list with the following rules:
1. The parent must always come before the children in the list
2. Nodes with the same parent must be sorted by 'ord_number'

The first rule is easy, cause I can use 'url' for it. List with nodes
is coming from the database, so I just do "ORDER BY url".

I think you cannot get away with your first rule, but have to operate on the
full path instead. Otherwise the position of inner nodes would sometimes be
determined by their url and sometimes by their ord_number *during* *the*
*same* *sort*.

class Node(object):
nodes = {} # defeats garbage collection
def __init__(self, id, name, ord_number, parent_id, url):
self.id = id
self.name = name
self.ord_number = ord_number
self.parent_id = parent_id
self.url = url
assert id not in self.nodes
self.nodes[id] = self
def get_path(self):
node = self
path = []
while node is not None:
path.append(node)
node = self.nodes.get(node.parent_id)
path.reverse()
return path
def get_key(self):
return [node.ord_number for node in self.get_path()]
def __cmp__(self, other):
return cmp(self.get_key(), other.get_key())
def __str__(self):
return "%s %s" % (self.name, self.get_key())

nodes = [Node(**row) for row in data]

# key argument not necessary, but should be faster
nodes.sort(key=Node.get_key)

I'm too lazy to do the test cases, so you have to check it yourself.

Peter
 
K

Ksenia Marasanova

Example of the wrong sort:



class Node:

def __init__(self, name, url, order, pid, id):
self.name = name
self.url = url
self.order = order
self.pid = pid
self.id = id

def __repr__(self):
return '%s [order: %s]' % (self.url, self.order)

def mycmp(x,y):
print "COMPARING", x, y
if x.pid == y.pid:
if x.order != y.order:
return cmp(x.order,y.order)
return cmp(x.url,y.url)
#return cmp(x.pid,y.pid)
return cmp(x.url,y.url)

list = [
Node('top','/', order=1, pid=0, id=1000),
Node('ham','/test/ham/', order=1, pid=400, id=21),
Node('ham','/test2/spam/', order=1, pid=300, id=32),
Node('ham','/test2/spam/spam/', order=1, pid=32, id=320),
Node('ham','/test2/ham/', order=2, pid=300, id=31),
Node('eggs','/test/ham/eggs/', order=1, pid=21, id=121),
Node('eggs','/test2/ham/eggs/', order=1, pid=21, id=1210),
Node('spam','/test2/', order=10, pid=1000, id=300),
Node('spam','/test/', order=1, pid=1000, id=400)
]

print "++++++++++++++ BEFORE SORTING +++++++++++++"
for item in list:
print item

list.sort(mycmp)

print "++++++++++++++ AFTER SORTING +++++++++++++"
for item in list:
print item
~
 
K

Ksenia Marasanova

2005/8/10 said:
I think you cannot get away with your first rule, but have to operate on the
full path instead. Otherwise the position of inner nodes would sometimes be
determined by their url and sometimes by their ord_number *during* *the*
*same* *sort*.


Rrr.. of course, you're right! Thanks.

Your code works great, I only made one slightly change:

def get_key(self):
return [(node.id, node.ord_number) for node in self.get_path()]

Because of errors in my tree some siblings has the same ord_number.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,770
Messages
2,569,584
Members
45,077
Latest member
SangMoor21

Latest Threads

Top