Self reordering list in Python

Laszlo Zsolt Nagy

Hello,

Do you know how to implement a really efficient self reordering list in
Python? (List with a maximum length. When an item is processed, it
becomes the first element in the list.) I would like to use this for
caching of rendered images. Of course I could implement this in pure
Python, I just wonder if there is a faster implementation that uses some
cool feature of the standard library. (Maybe a C implementation could be
added to the collections module?)
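
To make the requirement concrete, a naive pure-Python sketch (the class
name, size limit and method name here are just placeholders) would be
something like this:

class MoveToFrontList:
    def __init__(self, maxlen=100):
        self.maxlen = maxlen
        self.items = []               # most recently used item is items[0]

    def touch(self, item):
        """Mark item as used: move it to the front, inserting it if new."""
        try:
            self.items.remove(item)   # O(n) scan and shift
        except ValueError:
            pass
        self.items.insert(0, item)    # O(n) shift
        del self.items[self.maxlen:]  # enforce the maximum length

cache = MoveToFrontList(maxlen=3)
for name in ["a", "b", "c", "b", "d"]:
    cache.touch(name)
print(cache.items)                    # ['d', 'b', 'c'] ('a' was evicted)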

Les
 
Thomas Guettler

On Thu, 15 Sep 2005 15:14:09 +0200, Laszlo Zsolt Nagy wrote:
Hello,

Do you know how to implement a really efficient self reordering list in
Python? (List with a maximum length. When an item is processed, it
becomes the first element in the list.) I would like to use this for
caching of rendered images. Of course I could implement this in pure
Python, I just wonder if there is a faster implementation that uses some
cool feature of the standard library. (Maybe a C implementation could be
added to the collections module?)

Hi,

Maybe the bisect module is what you need:

"This module provides support for maintaining a list in sorted order
without having to sort the list after each insertion."
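
For example, a tiny illustration (not taken from the docs):

import bisect

values = []
for n in (30, 10, 20):
    bisect.insort(values, n)              # insert while keeping the list sorted

print(values)                             # [10, 20, 30]
print(bisect.bisect_left(values, 20))     # 1: index where 20 is found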

HTH,
Thomas
 
Jules Dubois

Do you know how to implement a really efficient self reordering list in
Python?

Yes.

(List with a maximum length. When an item is processed, it
becomes the first element in the list.)

Search for "move to front list". If you want a much bigger improvement in
speed -- O(N * log N) instead of O(N**2) -- search for "splay tree"; of
course, the algorithm is more complex and the optimized algorithm is even
more complex.
Of course I could implement this in pure Python, I just wonder if there is
a faster implementation that uses some cool feature of the standard
library. (Maybe a C implementation could be added to the collections
module?)

Yes, you could write a C-language extension for more speed.
 
Kay Schluehr

Laszlo said:
Hello,

Do you know how to implement a really efficient self reordering list in
Python? (List with a maximum length. When an item is processed, it
becomes the first element in the list.) I would like to use this for
caching of rendered images.

I wonder why you don't use a dictionary? The only time I used a
move-to-front algorithm I was storing algorithms and had to evaluate a
condition to select the correct one, so no constant search key was
available for the lookup. For an image list I would only implement a
self-ordering list if I had to do some pattern recognition to select the
correct image. Holding a reference as a search key, a Python hash table
will always have a better average time complexity, no matter which
language is used to implement move-to-front. Either way, I'd use Python
as the implementation language.


Kay
 
Laszlo Zsolt Nagy

I wonder why you don't use a dictionary? The only time I used a
move-to-front algorithm I was storing algorithms and had to evaluate a
condition to select the correct one, so no constant search key was
available for the lookup. For an image list I would only implement a
self-ordering list if I had to do some pattern recognition to select the
correct image. Holding a reference as a search key, a Python hash table
will always have a better average time complexity, no matter which
language is used to implement move-to-front.

You are right in that holding a reference will have a better time
complexity. But holding a reference makes it impossible to free the
object. :) As I said, my list has a maximum length. I just can't store
any number of images in memory. I need to keep only the most frequently
used ones. I do not know which ones will be used the most frequently,
this is why I need a self reordering list. Accessing an element makes it
"more imporant" than the others. I already implemented this in Python
and it was ten times faster compared to the original version. (200
frames per sec instead of 20 fps)

Probably my problem was a non-problem. I realized that it does not
matter how fast my list is. The most time consuming part is still the
rendering of the images that are not in the cache. I need to speed up
rendering and have more RAM, of course. :)

Les
 
Terry Hancock

You are right in that holding a reference will have a better time
complexity. But holding a reference makes it impossible to free the
object. :) As I said, my list has a maximum length. I just can't store
any number of images in memory. I need to keep only the most frequently
used ones. I do not know which ones will be used the most frequently,
this is why I need a self reordering list. Accessing an element makes it
"more imporant" than the others. I already implemented this in Python
and it was ten times faster compared to the original version. (200
frames per sec instead of 20 fps)

This is actually the use-case for an "LRU cache"="least recently used
cache", which is probably already implemented in Python somewhere (or
even as a fast extension). I'd do a google search for it. It almost
certainly would use a dictionary interface within Python, ISTM.

Cheers,
Terry
 
Fredrik Lundh

Terry said:
This is actually the use-case for an "LRU cache"="least recently used
cache", which is probably already implemented in Python somewhere (or
even as a fast extension). I'd do a google search for it.

reposted, in case there are more people who cannot be bothered
to read other replies before posting:

http://www.python.org/pypi/lrucache/0.2

(it's a straight-forward and rather flexible implementation, which uses
a dictionary to quickly find existing objects, and a heap to keep track
of access times. if you have more control over the stuff that goes into
the cache, you can strip things down to 10-20 lines of code and code
it in about the same time it took me to write this post)
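
something along these lines, perhaps (just a rough sketch with made-up
names, not the lrucache package; it assumes hashable keys and does an
O(n) scan to find the oldest entry on eviction):

class TinyCache:
    def __init__(self, maxsize=100):
        self.maxsize = maxsize
        self.tick = 0                 # monotonically increasing "access time"
        self.data = {}                # key -> [last_access, value]

    def get(self, key):
        entry = self.data[key]        # a miss raises KeyError
        self.tick += 1
        entry[0] = self.tick
        return entry[1]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.maxsize:
            # evict the least recently used key (O(n) scan; fine for small caches)
            oldest = min(self.data, key=lambda k: self.data[k][0])
            del self.data[oldest]
        self.tick += 1
        self.data[key] = [self.tick, value]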

</F>
 
ABO

LRU caches are nice and simple, but if you want something fancier, with
support for squid-like expiry models (ie, using mtime and atime to
estimate a "stale time", and IMS fetches), you can have a look at my
GCache;

http://minkirri.apana.org.au/~abo/projects/GCache

Even if you don't want something that fancy, it uses a PQueue priority
queue to achieve exactly what you want. I provide a pure-python
implementation using bisect, but recommend a C extension module with
the same name by Andrew Snare which uses a Fibonacci heap (python-pqueue
is the Debian package).

Note that Python 2.3 introduced a heapq module that does for heaps what
bisect does for sorted lists. I am planning to modify my PQueue to use
it instead of bisect.
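
The bisect approach amounts to roughly this (just a sketch, not the
actual PQueue code):

import bisect

class SortedQueue:
    """Sketch of a priority queue kept sorted with bisect."""
    def __init__(self):
        self.entries = []                 # sorted list of (priority, item) pairs

    def push(self, priority, item):
        bisect.insort(self.entries, (priority, item))   # O(n) insert

    def pop(self):
        return self.entries.pop(0)        # smallest priority first

    def remove(self, item):
        # deleting or re-setting only needs a scan and a del; no re-sort required
        for i, (_, existing) in enumerate(self.entries):
            if existing == item:
                del self.entries[i]
                return
        raise KeyError(item)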
 
ABO

Actually, after posting this I did some more work on the PQueue modules
I had, implementing both bisect and heapq versions. It turns out the
bisect version is heaps faster if you ever delete or re-set values in
the queue.

The problem is heapq is O(N) for finding a particular entry in the
Queue, and any time you change or remove something from it you need to
heapify it again, which is also O(N).
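
For what it's worth, a common workaround (the same trick the heapq
documentation now suggests, not something from this thread) is to keep a
dict pointing at the heap entries and mark stale ones instead of removing
them, so a delete or re-set becomes an O(log N) push plus a lazy skip on
pop:

import heapq
import itertools

REMOVED = object()                    # sentinel marking a stale entry
heap = []                             # entries are [priority, tie_breaker, key]
finder = {}                           # key -> its live entry in the heap
counter = itertools.count()           # tie breaker keeps comparisons well-defined

def push(key, priority):
    if key in finder:                 # a "re-set": invalidate the old entry
        finder.pop(key)[-1] = REMOVED
    entry = [priority, next(counter), key]
    finder[key] = entry
    heapq.heappush(heap, entry)       # O(log n); no re-heapify needed

def pop_smallest():
    while heap:                       # skip over invalidated entries
        priority, _, key = heapq.heappop(heap)
        if key is not REMOVED:
            del finder[key]
            return key, priority
    raise KeyError("pop from an empty priority queue")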

Andrew Snare has a C PQueue extension module that is heaps faster from
all angles. It uses a Fibonacci heap and gets O(lg N) deletes and
re-sets. I think it does this by using the dict to speed up finding
entries in the heap, and uses some properties of the heap to "assign
lower" efficiently.

The queue used in the lrucache will also suffer from the O(N) problem
when deleting or resetting values in the cache.
 
zooko

I've implemented such an LRU Cache in Python. My technique was to
weave a doubly-linked list into the dict, so that it is O(dict) for all
LRU operations. I benchmarked it against someone's Python-list-based
implementation from the ActiveState cookbook and noted that on my
machine the better constant factors of the Python list win out when the
cache contains fewer than about 16000 elements. Of course,
once you exceed that cross-over point, the asymptotically worse
behavior of the list-based implementation becomes a big factor. If you
have more than 16000 or so elements then you really oughtn't use a
list-based LRU cache.

http://zooko.com/repos/pyutil/pyutil/pyutil/cache.py
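
In sketch form the idea is roughly this (not the pyutil code, just an
outline; every dict value is a node in a circular doubly-linked list, so
a hit is unlinked and re-linked at the front in constant time):

class LinkedLRU:
    """Outline of a dict woven with a doubly-linked list (not the pyutil code)."""

    def __init__(self, maxsize=1000):
        self.maxsize = maxsize
        self.nodes = {}                       # key -> [prev, next, key, value]
        self.head = []                        # sentinel node closing the ring
        self.head[:] = [self.head, self.head, None, None]

    def _unlink(self, node):
        prev, nxt = node[0], node[1]
        prev[1], nxt[0] = nxt, prev

    def _link_front(self, node):
        first = self.head[1]
        node[0], node[1] = self.head, first
        self.head[1] = first[0] = node

    def get(self, key):
        node = self.nodes[key]                # a miss raises KeyError
        self._unlink(node)                    # O(1) move-to-front
        self._link_front(node)
        return node[3]

    def put(self, key, value):
        if key in self.nodes:
            node = self.nodes[key]
            node[3] = value
            self._unlink(node)
        else:
            if len(self.nodes) >= self.maxsize:
                oldest = self.head[0]         # least recently used node
                self._unlink(oldest)
                del self.nodes[oldest[2]]
            node = [None, None, key, value]
            self.nodes[key] = node
        self._link_front(node)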

I haven't benchmarked it against Evan Prodromou's heap implementation
yet, but obviously inserting and removing things from a heapq heap is
O(N).

You can find unit tests and benchmarking tools in the pyutil/test
directory.

Regards,

Zooko

P.S. I read this list sporadically, so if you want me to read your
response, please Cc: (e-mail address removed). Thanks.
 
Paul Rubin

zooko said:
I haven't benchmarked it against Evan Prodromou's heap implementation
yet, but obviously inserting and removing things from a heapq heap is
O(N).

Good heavens, I should hope not. The whole point of heaps is that
those operations are O(log(N)).
 
zooko

Paul said:
Good heavens, I should hope not. The whole point of heaps is that
those operations are O(log(N)).

Oh, you are right -- Python's heapq implementation does not naively do
list.pop(0).

How silly of me to think that Kevin O'Connor, Tim Peters, and Raymond
Hettinger would be so silly.

Regards,

Zooko
 
