dictionary that discards old items

W

Will McGugan

Hi folks,

I need a collection class that behaves like a dictionary but when it
reaches 'n' items it discards the oldest item so that the length never
goes above 'n'. (Its for caching search results)

I have a vague memory of a module that does this, but I cant remember
where I read about it. Can anyone enlighten me?


Regards,

Will McGugan
 
M

Michael Hoffman

Will said:
I need a collection class that behaves like a dictionary but when it
reaches 'n' items it discards the oldest item so that the length never
goes above 'n'. (Its for caching search results)

I have a vague memory of a module that does this, but I cant remember
where I read about it. Can anyone enlighten me?

You want a Least Recently Used, or LRU, cache. Here's one:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/252524

Google for <LRU python> to find others.
 
R

Raymond Hettinger

[Will McGugan]
I need a collection class that behaves like a dictionary but when it
reaches 'n' items it discards the oldest item so that the length never
goes above 'n'. (Its for caching search results)


import collections

class Cache(dict):
def __init__(self, n, *args, **kwds):
self.n = n
self.queue = collections.deque()
dict.__init__(self, *args, **kwds)
def __setitem__(self, k, v):
self.queue.append(k)
dict.__setitem__(self, k, v)
if len(self) > self.n:
oldk = self.queue.popleft()
del self[oldk]

# . . .
# make similar modifications to setdefault, __delitem__, fromkeys
# and other mutating methods as needed

# Example
c = Cache(3)
for w in 'the quick brown fox jumped over the lazy dog'.split():
c[w] = w[:1].upper()
print repr(c)
 
G

gene tani

not clear if you want a FIFO, or a LRU on read or write, so for the
non-dictionary component of this, you could also ( (search Cookbook)
|| google) for terms like "ring buffer", "fixed-size cache", um,
"bounded queue", "circular buffer", "deque", there's probably a bunch
of others.
 
M

Michael Hoffman

Will said:
Thanks. I found the one I saw originally in the Python Cookbook. Only
they call it a FIFO cache.

A FIFO cache is different, as gene tani points out. You need to consider
which one it is you want.
 
B

Bengt Richter

[Will McGugan]
I need a collection class that behaves like a dictionary but when it
reaches 'n' items it discards the oldest item so that the length never
goes above 'n'. (Its for caching search results)


import collections

class Cache(dict):
def __init__(self, n, *args, **kwds):
self.n = n
self.queue = collections.deque()
dict.__init__(self, *args, **kwds)
def __setitem__(self, k, v):
self.queue.append(k)
dict.__setitem__(self, k, v)
if len(self) > self.n:
oldk = self.queue.popleft()
del self[oldk]

# . . .
# make similar modifications to setdefault, __delitem__, fromkeys
# and other mutating methods as needed

# Example
c = Cache(3)
for w in 'the quick brown fox jumped over the lazy dog'.split():
c[w] = w[:1].upper()
print repr(c)

Minor comment: There is a potential name collision problem for keyword n=something,
so what is considered best practice to avoid that? __n or such as the n arg?
Traceback (most recent call last):

Regards,
Bengt Richter
 
S

Steven Bethard

[Raymond Hettinger]
[Bengt Richter]
Minor comment: There is a potential name collision problem for keyword n=something,
so what is considered best practice to avoid that? __n or such as the n arg?

I don't know what best practice is, but if you want to guarantee to
avoid the name collision, you can write:

def __init__(*args, **kwargs):
self = args[0]
self.n = args[1]
self.queue = collections.deque()
dict.__init__(self, *args[2:], **kwargs)

It's not pretty though. ;)

STeVe
 
B

Bengt Richter

[Raymond Hettinger]
[Bengt Richter]
Minor comment: There is a potential name collision problem for keyword n=something,
so what is considered best practice to avoid that? __n or such as the n arg?

I don't know what best practice is, but if you want to guarantee to
avoid the name collision, you can write:

def __init__(*args, **kwargs):
self = args[0]
self.n = args[1]
self.queue = collections.deque()
dict.__init__(self, *args[2:], **kwargs)

It's not pretty though. ;)
Bullet proofing doesn't have to be ;-)
Best solution I've seen yet, thanks.

Regards,
Bengt Richter
 
R

Raymond Hettinger

[Raymond Hettinger]

[Bengt Richter]
Minor comment: There is a potential name collision problem for keyword n=something,
so what is considered best practice to avoid that? __n or such as the n arg?

One solution is to drop the *args and **kwds part of the __init__() API
for the subclass. For a cache class, it doesn't make sense that you
would know all of the values when the instance is first created. If
the need arises, the update() method will suffice:

class Cache(dict):
def __init__(self, n):
self.n = n
dict.__init__(self)
. . .


The __n form doesn't get name mangled so its only advantage is in being
a less likely key.


Raymond
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,055
Latest member
SlimSparkKetoACVReview

Latest Threads

Top