heapq question

G

Giampaolo Rodola'

Hi,
this is related to what I'm trying to implement here:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/20796724c1daf1e1#

My question is the following: is it safe to avoid to re-heapify() a
heap when I remove or move an element which is not the first one?
Example:
from heapq import *
heap = [2,4,6,7,1,2,3]
heapify(heap)
del heap[4]
# Am I forced to heapify() the heap here?

Thanks in advance.


--- Giampaolo
http://code.google.com/p/pyftpdlib/
 
G

Giampaolo Rodola'

Giampaolo Rodola' said:
My question is the following: is it safe to avoid to re-heapify() a
heap when I remove or move an element which is not the first one?
Example:
from heapq import *
heap = [2,4,6,7,1,2,3]
heapify(heap)
del heap[4]
# Am I forced to heapify() the heap here?
Thanks in advance.

No, it is not safe to avoid re-heapifying.

After you call heapify the list is ordered such that for any 'n' in the
range 1..len(heap)//2 it is the case that heap[n-1] <= heap[2*n-1] and when
heap[2*n] exists heap[n-1] <= heap[2*n].

So:
heap = [0, 100, 1, 101, 102, 2, 3]
heapify(heap)
heap

[0, 100, 1, 101, 102, 2, 3]>>> del(heap[4])
[0, 100, 1, 101, 2, 3]>>> heapify(heap)
[0, 2, 1, 101, 100, 3]

after deleting heap[4] it is no longer the case that heap[1] >= heap[4] as
the old heap[5] was a child of heap[2] not of heap[1].

Even if I avoid to re-heapify() it seems that the first element
returned by heappop() is always the smaller one.
heap = [0, 100, 1, 101, 102, 2, 3]
heapify(heap)
heap [0, 100, 1, 101, 102, 2, 3]
del heap[4]
heap [0, 100, 1, 101, 2, 3]
heappop(heap) 0
heappop(heap) 1
heappop(heap) 2
heappop(heap) 3
heappop(heap) 100
heappop(heap) 101
heappop(heap)
Traceback (most recent call last):


That would be ok for me, as long as heappop() always returns the
smallest item.
Having said that I'd like to understand if there are cases where
deleting or moving an element of the heap, causes heappop() to return
an element which is not the smallest one.


--- Giampaolo
http://code.google.com/p/pyftpdlib/
 
B

bearophileHUGS

Giampaolo Rodola':
Even if I avoid to re-heapify() it seems that the first element
returned by heappop() is always the smaller one.

Yes, the heappop() function keeps the heap invariant, so it will keep
giving you the smallest item.

I'd like to understand if there are cases where
deleting or moving an element of the heap, causes heappop() to return
an element which is not the smallest one.

To be an heap it must keep its heap invariant true. The heap functions
keep that invariant. Generally any time you move/delete an item
yourself, you break the invariant, so you don't have an heap anymore
and you need to re-heapify to restore the heap invariant :)

Bye,
bearophile
 
G

Giampaolo Rodola'

Giampaolo Rodola' said:
Having said that I'd like to understand if there are cases where
deleting or moving an element of the heap, causes heappop() to return
an element which is not the smallest one.

Yes, of course there are: any time you delete element 0 of the heap you can
do that.
heap = [0, 2, 1, 4, 5, 6, 7, 8, 9]
heapify(heap)
heap

[0, 2, 1, 4, 5, 6, 7, 8, 9]
del heap[0]
heappop(heap)
2

By definition, element 0 is always the smallest but the next two elements
could be in any order. If you delete an element other than 0 then the next
pop won't be guaranteed to work.

If you delete an element other than 0 the very next pop will work, but the
heap conditions won't necessarily be restored so subsequent elements may
come out in the wrong order. A simple example:
heap = [0, 1, 3, 4, 2, 5]
heapify(heap)
heap [0, 1, 3, 4, 2, 5]
del heap[1]
heappop(heap), heappop(heap), heappop(heap)

(0, 3, 2)

Thanks, that's what I wanted to know.
I understand that heapq is not that efficient to implement timeouts as
I thought at first.
It would have been perfect if there were functions to remove arbitrary
elements withouth needing to re-heapify() the heap every time.
Thanks for your help anyway.


--- Giampaolo
http://code.google.com/p/pyftpdlib/
 
M

Martin v. Löwis

I understand that heapq is not that efficient to implement timeouts as
I thought at first.
It would have been perfect if there were functions to remove arbitrary
elements withouth needing to re-heapify() the heap every time.

It is efficient for that - you just need to use it correctly.

To remove the nth element from a heap, replace it with the last element,
and then _siftup that element:

def remove_at_index_n(heap, n):
if n == len(heap)-1:
return heap.pop()
result = heap[n]
heap[n] = heap.pop()
heapq._siftup(heap, n)
return result

The time complexity for that operation is O(log(len(heap))).

HTH,
Martin
 
G

Giampaolo Rodola'

I understand that heapq is not that efficient to implement timeouts as
I thought at first.
It would have been perfect if there were functions to remove arbitrary
elements withouth needing to re-heapify() the heap every time.

It is efficient for that - you just need to use it correctly.

To remove the nth element from a heap, replace it with the last element,
and then _siftup that element:

def remove_at_index_n(heap, n):
    if n == len(heap)-1:
        return heap.pop()
    result = heap[n]
    heap[n] = heap.pop()
    heapq._siftup(heap, n)
    return result

The time complexity for that operation is O(log(len(heap))).

HTH,
Martin


Thanks, by doing some quick benchamrks it seems 20% faster than using
heapify().
And if instead of removing an element I'd want to change its value?
E.g.:
>>> heap = [0, 2, 1, 4, 5, 6, 7, 8, 9]
>>> heapify(heap)
>>> heap [0, 2, 1, 4, 5, 6, 7, 8, 9]
>>> heap[4] = 12



--- Giampaolo
http://code.google.com/p/pyftpdlib/
 
M

Miles

There could be suitable functions, but there aren't any.
...
Bottom line though is that heaps aren't really suitable for timeouts.

I would argue that heaps in general are ideally suited for timeouts;
it's just that the Python heapq implementation is lacking if you ever
need to cancel a timeout before it expires.

-Miles
 
G

Giampaolo Rodola'

The problem is that in order to remove an arbitrary element from a
heap, you usually have to do an O(n) linear search in order to find it
first, since you can't know ahead of time which index an arbitrary
element is at.  (You can, actually, but only if your heap
implementation somehow notifies the elements of their new index when
it moves them in the heap, which the Python heapq module doesn't do,
so you'd have to write your own heap implementation.)
And if instead of removing an element I'd want to change its value?
E.g.:
 >>> heap = [0, 2, 1, 4, 5, 6, 7, 8, 9]
 >>> heapify(heap)
 >>> heap
 [0, 2, 1, 4, 5, 6, 7, 8, 9]
 >>> heap[4] = 12

Don't do that; you must first remove the element and then reinsert it.

-Miles

That seems to be slower than re-heapifying() the heap.
The code I used (which is probably wrong):

def reset(self):
"""Reschedule this call resetting the current countdown."""
assert not self.cancelled, "Already cancelled"
self.timeout = time.time() + self.__delay
n = heap.index(self)
if n == len(heap) - 1:
heap.pop()
else:
heap[n] = heap.pop()
heapq._siftup(heap, n)
heapq.heappush(heap, self)


Moreover I have the feeling that doing such thing requires a different
code whether the new value I use as replacement is lower or higher in
comparison to the older one.
Am I right?


--- Giampaolo
http://code.google.com/p/pyftpdlib/
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,781
Messages
2,569,615
Members
45,294
Latest member
LandonPigo

Latest Threads

Top