how to keep a collection of existing instances and return one on instantiation

marduk

I couldn't think of a good subject..

Basically, say I have a class

class Spam:
    def __init__(self, x):
        self.x = x


then if I create two instances:

a = Spam('foo')
b = Spam('foo')

a == b # False

What I *really* want is to keep a collection of all the Spam instances,
and if I try to create a new Spam instance with the same constructor
parameters, then return the existing Spam instance. I thought new-style
classes would do it:

class Spam(object):
    cache = {}
    def __new__(cls, x):
        if cls.cache.has_key(x):
            return cls.cache[x]
    def __init__(self, x):
        self.x = x
        self.cache[x] = self

a = Spam('foo')
b = Spam('foo')

Well, in this case a and b are identical... to None! I assume this is
because the test in __new__ fails so it returns None, I need to then
create a new Spam.. but how do I do that without calling __new__ again?
I can't call __init__ because there's no self...

So what is the best/preferred way to do this?
 
Diez B. Roggisch

Jonathan LaCour

class Spam(object):
    cache = {}
    def __new__(cls, x):
        if cls.cache.has_key(x):
            return cls.cache[x]
    def __init__(self, x):
        self.x = x
        self.cache[x] = self

a = Spam('foo')
b = Spam('foo')

Well, in this case a and b are identical... to None! I assume this is
because the test in __new__ fails so it returns None, I need to then
create a new Spam.. but how do I do that without calling __new__
again?
I can't call __init__ because there's no self...

Oops, you forgot to return object.__new__(cls, x) in the case the
object isn't in the cache. That should fix it.

Jonathan
http://cleverdevil.org
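
For readers on Python 3, the fix suggested above could be sketched as follows. This is only an illustration under a few assumptions: Python 3 rejects extra arguments to object.__new__ once __new__ is overridden, so the constructor argument is not forwarded; has_key is replaced by the `in` operator; and the attribute name `_cache` is chosen here, not taken from the thread.

```python
class Spam:
    _cache = {}  # maps constructor argument -> existing instance

    def __new__(cls, x):
        if x in cls._cache:
            return cls._cache[x]          # cache hit: reuse the instance
        instance = super().__new__(cls)   # cache miss: allocate a new one
        cls._cache[x] = instance
        return instance

    def __init__(self, x):
        # Runs on every Spam(x) call, including cache hits.
        self.x = x


a = Spam('foo')
b = Spam('foo')
print(a is b)
```

Because __new__ now always returns a Spam instance, __init__ is called each time and `a is b` holds for equal arguments.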
 
Peter Otten

marduk said:
What I *really* want is to keep a collection of all the Spam instances,
and if I try to create a new Spam instance with the same constructor
parameters, then return the existing Spam instance. I thought new-style
classes would do it:

class Spam(object):
    cache = {}
    def __new__(cls, x):
        if cls.cache.has_key(x):
            return cls.cache[x]

On cache misses you implicitly return None. But your __init__() method will
only be called if __new__() returns a Spam instance.

    def __init__(self, x):
        self.x = x
        self.cache[x] = self

a = Spam('foo')
b = Spam('foo')

Well, in this case a and b are identical... to None! I assume this is
because the test in __new__ fails so it returns None, I need to then
create a new Spam.. but how do I do that without calling __new__ again?
I can't call __init__ because there's no self...

So what is the best/preferred way to do this?

class Spam(object):
    cache = {}
    def __new__(cls, x):
        try:
            inst = cls.cache[x]
            print "from cache"
        except KeyError:
            cls.cache[x] = inst = object.__new__(cls)
            print "new instance"
        return inst # always return a Spam instance

    def __init__(self, x):
        # put one-off initialization into __new__() because __init__()
        # will be called with instances from cache hits, too.
        print "init", x

a = Spam('foo')
b = Spam('foo')

print a, b, a is b

Peter
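
The point about __init__() running on cache hits can be made visible with a small Python 3 sketch. The counter attribute `init_calls` and the name `_cache` are hypothetical, added only to show the effect; the structure otherwise follows the try/except version above.

```python
class Spam:
    _cache = {}

    def __new__(cls, x):
        try:
            return cls._cache[x]
        except KeyError:
            inst = cls._cache[x] = super().__new__(cls)
            # One-off initialization belongs here, not in __init__().
            inst.x = x
            inst.init_calls = 0
            return inst

    def __init__(self, x):
        # Called after every Spam(x), cached or not, so it must not
        # reset any state that should survive a cache hit.
        self.init_calls += 1


first = Spam('foo')
second = Spam('foo')
print(first is second, first.init_calls)  # True 2
```

If __init__() had assigned fresh state instead of incrementing a counter, the second call would have silently overwritten whatever the first instance had accumulated.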
 
marduk

Oops, you forgot to return object.__new__(cls, x) in the case the
object isn't in the cache. That should fix it.

Ahh, that did it. I didn't even think of calling object...

so the new class looks like:

class Spam(object):
    cache = {}
    def __new__(cls, x):
        if cls.cache.has_key(x):
            return cls.cache[x]
        else:
            new_Spam = object.__new__(cls, x)
            cls.cache[x] = new_Spam
            return new_Spam
    def __init__(self, x):
        self.x = x

a = Spam(2)
b = Spam(2)

a == b # => True
id(a) == id(b) # => True

Thanks for all your help.
 
marduk

Use the BORG-pattern. See


http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66531

Together with your caching, that should do the trick.

I looked at the Borg Pattern, but I don't think it was exactly what I
want.

The Borg pattern appears to be for when you want multiple instances that
point to the same "data".

What I wanted is for multiple calls that create an object with the same
parameters to return the "original" object instead of creating a new
one.
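
The distinction drawn here can be illustrated with a minimal Python 3 sketch in the spirit of the Borg recipe. It is written from the general idea of the pattern (all instances share one attribute dictionary) and is not copied from the linked recipe; the name `_shared_state` is an assumption.

```python
class Borg:
    _shared_state = {}

    def __init__(self):
        # Every instance uses the same attribute dictionary.
        self.__dict__ = self._shared_state


a = Borg()
b = Borg()
a.x = 'foo'
print(b.x)     # the state is shared
print(a is b)  # but the instances are distinct objects
```

So Borg gives shared data with separate identities, whereas the cache in __new__ gives a single identity per constructor argument.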
 
marduk

class Spam(object):
    cache = {}
    def __new__(cls, x):
        if cls.cache.has_key(x):
            return cls.cache[x]
    def __init__(self, x):
        self.x = x
        self.cache[x] = self

a = Spam('foo')
b = Spam('foo')

Well, in this case a and b are identical... to None! I assume this is
because the test in __new__ fails so it returns None, I need to then
create a new Spam.. but how do I do that without calling __new__
again?
I can't call __init__ because there's no self...

Oops, you forgot to return object.__new__(cls, x) in the case the
object isn't in the cache. That should fix it.

Okay, one more question... say I then

c = Spam('bar')
del a
del b

I've removed all references to the object, except for the cache. Do I
have to implement my own garbage collecting, or is there some "magical"
way of doing this within Python? I pretty much want to get rid of the
cache entry as soon as there are no other references (other than the cache).
 
Peter Otten

marduk said:
class Spam(object):
    cache = {}
    def __new__(cls, x):
        if cls.cache.has_key(x):
            return cls.cache[x]
    def __init__(self, x):
        self.x = x
        self.cache[x] = self

a = Spam('foo')
b = Spam('foo')

Well, in this case a and b are identical... to None! I assume this is
because the test in __new__ fails so it returns None, I need to then
create a new Spam.. but how do I do that without calling __new__
again?
I can't call __init__ because there's no self...

Oops, you forgot to return object.__new__(cls, x) in the case the
object isn't in the cache. That should fix it.

Okay, one more question... say I then

c = Spam('bar')
del a
del b

I've removed all references to the object, except for the cache. Do I
have to implement my own garbage collecting, or is there some "magical"
way of doing this within Python? I pretty much want to get rid of the
cache entry as soon as there are no other references (other than the cache).

Use a weakref.WeakValueDictionary as the cache instead of a normal dict.

Peter
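
A rough Python 3 sketch of the WeakValueDictionary suggestion, for illustration only. The attribute name `_cache` is an assumption, and initialization is done in __new__ so that cache hits do not re-run it. The gc.collect() call is there because only CPython's reference counting frees the object immediately; other implementations may need a collection pass.

```python
import gc
import weakref


class Spam:
    # Entries vanish automatically once no strong reference remains.
    _cache = weakref.WeakValueDictionary()

    def __new__(cls, x):
        instance = cls._cache.get(x)
        if instance is None:
            instance = super().__new__(cls)
            instance.x = x
            cls._cache[x] = instance
        return instance


a = Spam('foo')
b = Spam('foo')
c = Spam('bar')
print(a is b, len(Spam._cache))

del a, b
gc.collect()
print(list(Spam._cache.keys()))
```

After `del a, b` only the 'bar' entry should remain, because `c` is the last strong reference to any Spam instance.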
 
Laszlo Zsolt Nagy

I've removed all references to the object, except for the cache. Do I
have to implement my own garbage collecting, or is there some "magical"
way of doing this within Python? I pretty much want to get rid of the
cache entry as soon as there are no other references (other than the cache).

Store weak references to instances.

from weakref import ref

class Spam(object):
    cache = {}
    def __new__(cls, x):
        instance = None
        if cls.cache.has_key(x):
            instance = cls.cache[x]()
        if instance is None:
            instance = object.__new__(cls, x)
            cls.cache[x] = ref(instance)
        return instance
    def __init__(self, x):
        self.x = x

Then:

Well, of course this is still not thread safe, and the dead weak references
left in the dict will use some memory (though that can be much cheaper than
keeping the instances alive).
You can garbage collect dead weak references periodically, if you wish.

Best,

Les
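
The two caveats above (thread safety and leftover dead references) could be handled along these lines in Python 3. This is a sketch under stated assumptions, not the poster's code: the lock, the `purge` class method, and the name `_cache` are all hypothetical additions.

```python
import gc
import threading
from weakref import ref


class Spam:
    _cache = {}                # maps argument -> weak reference
    _lock = threading.Lock()   # serializes lookups and insertions

    def __new__(cls, x):
        with cls._lock:
            wr = cls._cache.get(x)
            instance = wr() if wr is not None else None
            if instance is None:
                instance = super().__new__(cls)
                instance.x = x
                cls._cache[x] = ref(instance)
            return instance

    @classmethod
    def purge(cls):
        """Drop entries whose instance has been collected; return the count."""
        with cls._lock:
            dead = [key for key, wr in cls._cache.items() if wr() is None]
            for key in dead:
                del cls._cache[key]
            return len(dead)


a = Spam(1)
b = Spam(1)
same = a is b
del a, b
gc.collect()
removed = Spam.purge()
print(same, removed, Spam._cache)
```

Calling purge() from a timer or at convenient points keeps the dict from accumulating dead references; how often to call it is a design choice that depends on how many distinct arguments are seen.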
 
Diez B. Roggisch

I looked at the Borg Pattern, but I don't think it was exactly what I
want.

The Borg pattern appears to be for when you want multiple instances that
point to the same "data".

What I wanted is for multiple calls that create an object with the same
parameters to return the "original" object instead of creating a new
one.

Read the comments. What you say is essentially the same - the data
matters, after all. What do you care if there are several instances around?

Diez
 
marduk

Read the comments. What you say is essentially the same - the data
matters, after all. What do you care if there are several instances
around?

Diez

In my case it matters more that the objects are the same.

For example I want set([Spam(1), Spam(2),
Spam(3)]).intersection(set([Spam(1), Spam(2)])) to contain two items instead
of 0.

For this and many other reasons it's important that Spam(n) is Spam(n).
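
As an illustration of that requirement, here is a small Python 3 sketch showing that set intersection behaves as wanted once Spam(n) is Spam(n). It assumes a simple dict cache named `_cache` and an attribute `n`, both chosen here; the default identity-based hash and equality are left untouched.

```python
class Spam:
    _cache = {}

    def __new__(cls, n):
        if n not in cls._cache:
            instance = super().__new__(cls)
            instance.n = n
            cls._cache[n] = instance
        return cls._cache[n]


left = {Spam(1), Spam(2), Spam(3)}
right = {Spam(1), Spam(2)}
common = left & right
print(len(common))  # 2
```

Without the cache, each Spam(1) would be a different object with a different default hash, and the intersection would be empty.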
 
marduk

Use a weakref.WeakValueDictionary as the cache instead of a normal
dict.

Peter

Thanks for the reference to the weakref module. Until now I've never
had a use for it, but it sounds like what I'm looking for.

-m
 
Diez B. Roggisch

Read the comments. What you say is essentially the same - the data
matters, after all. What do you care if there are several instances
around?
In my case it matters more that the objects are the same.

For example I want set([Spam(1), Spam(2),
Spam(3)]).intersection(set([Spam(1), Spam(2)])) to contain two items instead
of 0.

For this and many other reasons it's important that Spam(n) is Spam(n).

Ah, ok. Well, you could always use the __hash__ method to ensure that -
might be better anyway, because then _you_ define what equality means.
But YMMV.

Diez
 
Diez B. Roggisch

Diez said:
Read the comments. What you say is essentially the same - the data
matters, after all. What do you care if there are several instances
around?
In my case it matters more that the objects are the same.

For example I want set([Spam(1), Spam(2),
Spam(3)]).intersection(set([Spam(1), Spam(2)])) to contain two items instead
of 0.

For this and many other reasons it's important that Spam(n) is Spam(n).


Ah, ok. Well, you could always use the __hash__ method to ensure that -
might be better anyway, because then _you_ define what equality means.
But YMMV.

And the __cmp__ or __eq__/__ne__ methods of course....

Diez
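
The __hash__ plus __eq__ alternative might look like this in Python 3, as a minimal sketch. It assumes equality should mean "same constructor argument"; in Python 3 there is no __cmp__, and __ne__ is derived from __eq__ automatically, so only two methods are needed.

```python
class Spam:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, Spam):
            return NotImplemented
        return self.x == other.x

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash((Spam, self.x))


a = Spam(1)
b = Spam(1)
common = {Spam(1), Spam(2), Spam(3)} & {Spam(1), Spam(2)}
print(a == b, a is b, len(common))  # True False 2
```

Here the instances stay distinct, so `a is b` is false, but sets and dicts treat them as the same element. Since the hash depends on `x`, that attribute should not be changed while an instance is stored in a set or used as a dict key.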
 
