Postpone creation of attributes until needed


Frank Millman

Hi all

I have a small problem. I have come up with a solution, but I don't
know if it is a) safe, and b) optimal.

I have a class with a number of attributes, but for various reasons I
cannot assign values to all the attributes at __init__ time, as the
values depend on attributes of other linked classes which may not have
been created yet. I can be sure that by the time any values are
requested, all the other classes have been created, so it is then
possible to compute the missing values.

At first I initialised the values to None, and then when I needed a
value I would check if it was None, and if so, call a method which
would compute all the missing values. However, there are a number of
attributes, so it got tedious. I was looking for one trigger point
that would work in any situation. This is what I came up with.

>>> class A(object):
...     __slots__ = ('x','y','z')
...     def __init__(self,x,y):
...         self.x = x
...         self.y = y
...     def __getattr__(self,name):
...         print 'getattr',name
...         if name not in self.__class__.__slots__:
...             raise AttributeError,name
...         self.z = self.x * self.y
...         return getattr(self,name)
...
>>> a = A(3,4)
>>> a.z
getattr z
12
>>> a.q
getattr q
AttributeError: q

In other words, I do not declare the unknown attributes at all. This
causes __getattr__ to be called when any of their values are
requested, and __getattr__ calls the method that sets up the
attributes and computes the values.

I use __slots__ to catch any invalid attributes, otherwise I would get
a 'maximum recursion depth exceeded' error.
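
The guard does not have to come from `__slots__`. A minimal sketch, written for Python 3 rather than the Python 2 of the session above, and assuming a hypothetical class-level tuple `_lazy` that lists the deferred names. Any name outside that tuple raises AttributeError before `getattr` can re-enter `__getattr__`, which is where the recursion would otherwise come from:

```python
class A:
    # Hypothetical tuple naming the attributes that are filled in lazily.
    _lazy = ('z',)

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getattr__(self, name):
        # Python calls this only after normal attribute lookup has failed.
        if name not in self._lazy:
            raise AttributeError(name)
        # Compute every lazy attribute; here there is just one.
        self.z = self.x * self.y
        return getattr(self, name)


a = A(3, 4)
```

With this sketch `a.z` is computed on first access and `a.q` raises AttributeError. It still relies on the computation assigning every name listed in `_lazy`; a listed name that is never assigned would recurse.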

Is this ok, or is there a better way?

Thanks

Frank Millman
 

Phil Thompson

Frank Millman said:
Is this ok, or is there a better way?

Properties...

    @property
    def z(self):
        return self.x * self.y

Phil
 

Frank Millman

Phil Thompson said:
Properties...

    @property
    def z(self):
        return self.x * self.y

In my simple example I showed only one missing attribute - 'z'. In
real life I have a number of them, so I would have to set up a
separate property definition for each of them.

With my approach, __getattr__ is called if *any* of the missing
attributes are referenced, which seems easier and requires less
maintenance if I add additional attributes.

Another point - the property definition is called every time the
attribute is referenced, whereas __getattr__ is only called if the
attribute does not exist in the class __dict__, and this only happens
once. Therefore I think my approach should be slightly quicker.
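
The difference in call counts can be checked directly. A small sketch in Python 3, where the `calls` counter and both class names are hypothetical additions for illustration. Note that `__getattr__` is triggered whenever normal lookup fails, so the saving depends on the computed value being stored on the instance:

```python
class WithProperty:
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.calls = 0

    @property
    def z(self):
        self.calls += 1            # the getter runs on every access
        return self.x * self.y


class WithGetattr:
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.calls = 0

    def __getattr__(self, name):
        # Reached only when normal lookup fails.
        if name != 'z':
            raise AttributeError(name)
        self.calls += 1
        self.z = self.x * self.y   # stored on the instance from now on
        return self.z


p = WithProperty(3, 4)
g = WithGetattr(3, 4)
for _ in range(3):
    p.z
    g.z
```

After the loop the property getter has run three times and `__getattr__` once.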

Frank
 

Steven D'Aprano

Frank Millman said:
I have a class with a number of attributes, but for various reasons I
cannot assign values to all the attributes at __init__ time, as the
values depend on attributes of other linked classes which may not have
been created yet. I can be sure that by the time any values are
requested, all the other classes have been created, so it is then
possible to compute the missing values.

Unless you're doing something like creating classes in one thread while
another thread initiates your instance, I don't understand how this is
possible.

Unless... you're doing something like this?


class MyClass(object):
    def __init__(self):
        self.x = Parrot.plumage  # copy attributes of classes
        self.y = Shrubbery.leaves


Maybe you should force the creation of the classes?

class MyClass(object):
    def __init__(self):
        try:
            Parrot
        except Some_Error_Or_Other:  # NameError?
            # do something to create the Parrot class
            pass
        self.x = Parrot.plumage
        # etc.

Frank Millman said:
...     __slots__ = ('x','y','z')

By using slots, you're telling Python not to reserve space for a __dict__,
which means that your class cannot create attributes on the fly.
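
A short illustration of that point, written for Python 3. The class name `Slotted` is made up for the example:

```python
class Slotted:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


s = Slotted(3, 4)

# Instances of a class that declares __slots__ (with no __dict__ slot and
# no dict-bearing base class) have no per-instance dictionary.
has_dict = hasattr(s, '__dict__')

# Assigning a name that is not listed in __slots__ is rejected.
try:
    s.z = 12
except AttributeError:
    rejected = True
else:
    rejected = False
```

Here `has_dict` is False and `rejected` is True.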


[snip]

Frank Millman said:
I use __slots__ to catch any invalid attributes, otherwise I would get
a 'maximum recursion depth exceeded' error.

That's the wrong solution to that problem. To avoid that problem,
__getattr__ should write directly to self.__dict__.

Frank Millman said:
Is this ok, or is there a better way?

At the interactive Python prompt:

help(property)
 

Frank Millman

Steven D'Aprano said:
Unless you're doing something like creating classes in one thread while
another thread initiates your instance, I don't understand how this is
possible.

I was hoping not to have to explain this, as it gets a bit complicated
(yes, I have read The Zen of Python ;-), but I will try.

I have a class that represents a database table, and another class
that represents a database column. When the application 'opens' a
table, I create an instance for the table and separate instances for
each column.

If there are foreign keys, I used to automatically open the foreign
table with its columns, and build cross-references between the foreign
key column on the first table and the primary key column on the second
table.

I found that as the database grew, I was building an increasing number
of links, most of which would never be used during that run of the
program, so I stopped doing it that way. Now I only open the foreign
table if the application requests it, but then I have to find the
original table and update it with attributes representing the link to
the new table.

It gets more complicated than that, but that is the gist of it.

Steven D'Aprano said:
By using slots, you're telling Python not to reserve space for a __dict__,
which means that your class cannot create attributes on the fly.

I understand that. In fact I was already using slots, as I was
concerned about the number of 'column' instances that could be created
in any one program, and wanted to minimise the footprint. I have since
read some of the caveats regarding slots, but I am not doing anything out
of the ordinary so I feel comfortable with them so far.

Steven D'Aprano said:
That's the wrong solution to that problem. To avoid that problem,
__getattr__ should write directly to self.__dict__.

Are you saying that instead of

    self.z = self.x * self.y
    return getattr(self,name)

I should have

    self.__dict__['z'] = self.x * self.y
    return self.__dict__[name]

I tried that, but I get AttributeError: 'A' object has no attribute
'__dict__'.

Also, how does this solve the problem that 'name' may not be one of
the attributes that my 'compute' method sets up? Or are you saying
that, if I fixed the previous problem, it would just raise
AttributeError anyway, which is what I would want to happen?

Steven D'Aprano said:
At the interactive Python prompt:

help(property)

See my reply to Phil - I would use property if there was only one
attribute, but there are several.

Thanks

Frank
 

Marc 'BlackJack' Rintsch

Frank Millman said:
I tried that, but I get AttributeError: 'A' object has no attribute
'__dict__'.

That's because you used `__slots__`. One of the drawbacks of `__slots__`.

Ciao,
Marc 'BlackJack' Rintsch
 

Peter Otten

Frank said:
I tried that, but I get AttributeError: 'A' object has no attribute
'__dict__'.

That's what you get for (ab)using __slots__ without understanding the
implications ;)

You can instead invoke the __getattr__() method of the superclass:

    super(A, self).__getattr__(name)

Peter
 

Giles Brown

Frank Millman said:
In my simple example I showed only one missing attribute - 'z'. In
real life I have a number of them, so I would have to set up a
separate property definition for each of them.

With my approach, __getattr__ is called if *any* of the missing
attributes are referenced, which seems easier and requires less
maintenance if I add additional attributes.

Another point - the property definition is called every time the
attribute is referenced, whereas __getattr__ is only called if the
attribute does not exist in the class __dict__, and this only happens
once. Therefore I think my approach should be slightly quicker.

Frank

You could treat the property access like a __getattr__ and use it
to trigger the assignment of instance variables. This would mean that
all future access would pick up the instance variables, following a
kind of "class variable access causes instance variable creation"
pattern (anyone know a better name for that?).

You may want to construct a little mechanism that sets up these
properties (a loop, a list of attribute names, and a setattr on the
class?).
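
One way such a mechanism might look, sketched for Python 3. A built-in `property` is a data descriptor and would keep taking precedence over an instance variable of the same name, so this sketch swaps in a small non-data descriptor instead. `LazyAttr`, the second attribute `w`, and the `compute_calls` counter are all hypothetical names for illustration:

```python
class LazyAttr:
    """Non-data descriptor: it defines only __get__, so an instance
    attribute of the same name takes precedence once it exists."""

    def __init__(self, name):
        self.name = name

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        obj.compute()                  # fills in every lazy attribute
        try:
            return obj.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name)


class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.compute_calls = 0

    def compute(self):
        self.compute_calls += 1
        self.z = self.x * self.y
        self.w = self.x + self.y


# The "little mechanism": a loop, a list of names, and setattr on the class.
for _name in ('z', 'w'):
    setattr(A, _name, LazyAttr(_name))

a = A(3, 4)
```

The first access to either `a.z` or `a.w` runs `compute()` once; later accesses read the instance variables directly and never reach the descriptor.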

If you've got to allow access from multiple threads and aren't happy
that the calculations being idempotent is going to be sufficient (e.g.
if the calculations are really expensive) then you need some kind of
threading lock in your (one and only?) lazy loading function.
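
A rough sketch of that locking idea in Python 3, assuming a single lazy loading function. The names `_lock`, `_computed`, `_compute` and `compute_calls` are hypothetical:

```python
import threading


class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self._lock = threading.Lock()
        self._computed = False
        self.compute_calls = 0

    def _compute(self):
        # Stand-in for the expensive calculation.
        self.compute_calls += 1
        self.z = self.x * self.y

    def __getattr__(self, name):
        # Private names are never lazy; this also avoids recursion if
        # _lock itself is missing.
        if name.startswith('_'):
            raise AttributeError(name)
        with self._lock:
            # Check again under the lock: another thread may have
            # finished the computation while this one was waiting.
            if not self._computed:
                self._compute()
                self._computed = True
        try:
            return self.__dict__[name]
        except KeyError:
            raise AttributeError(name)
```

The plain `Lock` is not re-entrant, so `_compute()` must not itself touch a missing attribute; `threading.RLock` would be the alternative if it might.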

Ok. Enough lunchtime diversion (I should get some fresh air).

Giles
 

Steven D'Aprano

Frank Millman said:
I understand that. In fact I was already using slots, as I was
concerned about the number of 'column' instances that could be created
in any one program, and wanted to minimise the footprint.

Unless you have thousands and thousands of instances, __slots__ is almost
certainly not the answer. __slots__ is an optimization to minimize the
size of each instance. The fact that it prevents the creation of new
attributes is a side-effect.

Frank Millman said:
I tried that, but I get AttributeError: 'A' object has no attribute
'__dict__'.

Of course you do, because you are using __slots__ and so there is no
__dict__ attribute.

I really think you need to lose the __slots__. I don't see that it really
gives you any advantage.


Frank Millman said:
Also, how does this solve the problem that 'name' may not be one of
the attributes that my 'compute' method sets up? Or are you saying
that, if I fixed the previous problem, it would just raise
AttributeError anyway, which is what I would want to happen?

You haven't told us what the 'compute' method is.

Or if you have, I missed it.

Frank Millman said:
See my reply to Phil - I would use property if there was only one
attribute, but there are several.

Writing "several" properties isn't that big a chore, especially if they
have any common code that can be factored out.

Another approach might be to create a factory-function that creates the
properties for you, so you just need to call it like this:

class MyClass(object):
    x = property_maker(database1, tableX, 'x', other_args)
    y = property_maker(database2, tableY, 'y', other_args)
    # blah blah blah

def property_maker(database, table, name, args):
    def getx(self):
        return getattr(database, name)  # or whatever...
    def setx(self, value):
        setattr(database, name, value)
    return property(getx, setx, None, "Some doc string")
 

Frank Millman

Steven D'Aprano said:
Unless you have thousands and thousands of instances, __slots__ is almost
certainly not the answer. __slots__ is an optimization to minimize the
size of each instance. The fact that it prevents the creation of new
attributes is a side-effect.

Understood - I am getting there slowly.

I now have the following -

>>> class A(object):
...     def __init__(self,x,y):
...         self.x = x
...         self.y = y
...     def __getattr__(self,name):
...         print 'getattr',name
...         self.compute()
...         return self.__dict__[name]
...     def compute(self):  # compute all missing attributes
...         self.__dict__['z'] = self.x * self.y
...         [there could be many of these]
...
>>> a = A(3,4)
>>> a.x
3
>>> a.y
4
>>> a.z
getattr z
12
>>> a.z
12
>>> a.q
KeyError: 'q'

The only problem with this is that it raises KeyError instead of the
expected AttributeError.

Steven D'Aprano said:
You haven't told us what the 'compute' method is.

Or if you have, I missed it.

Sorry - I made it more explicit above. It is the method that sets up
all the missing attributes. No matter which attribute is referenced
first, 'compute' sets up all of them, so they are all available for
any future reference.

To be honest, it feels neater than setting up a property for each
attribute.

I would prefer it if there was a way of raising AttributeError instead
of KeyError. I suppose I could do it manually -

    try:
        return self.__dict__[name]
    except KeyError:
        raise AttributeError,name
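
Put together, the class might look roughly like this in Python 3 syntax (the thread itself uses Python 2). This is only a sketch; the body of `compute()` is the toy calculation from the example, standing in for the real one:

```python
class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def compute(self):
        # Sets up all the missing attributes in one go.
        self.__dict__['z'] = self.x * self.y

    def __getattr__(self, name):
        # Reached only when normal lookup fails.
        self.compute()
        try:
            return self.__dict__[name]
        except KeyError:
            # Translate the dictionary miss into the exception that
            # callers of attribute access (and hasattr) expect.
            raise AttributeError(name) from None


a = A(3, 4)
```

Raising AttributeError also makes `hasattr(a, 'q')` and `getattr(a, 'q', default)` behave normally. One cost of this shape is that `compute()` runs again on every lookup of a name that does not exist.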

Frank
 

George Sakkis

Frank Millman said:
Sorry - I made it more explicit above. It is the method that sets up
all the missing attributes. No matter which attribute is referenced
first, 'compute' sets up all of them, so they are all available for
any future reference.

To be honest, it feels neater than setting up a property for each
attribute.

I don't see why this all-or-nothing approach is neater; what if you
have a hundred expensive computed attributes but you just need one?
Unless you know this never happens in your specific situation because
all missing attributes are tightly coupled, properties are a better
way to go. The boilerplate code can be minimal too with an appropriate
decorator, something like:

class A(object):

    def __init__(self,x,y):
        self.x = x
        self.y = y

    @cachedproperty
    def z(self):
        return self.x * self.y


where cachedproperty is

def cachedproperty(func):
    name = '__' + func.__name__
    def wrapper(self):
        try: return getattr(self, name)
        except AttributeError: # raised only the first time
            value = func(self)
            setattr(self, name, value)
            return value
    return property(wrapper)
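
For readers on a current interpreter: Python 3.8 and later ship `functools.cached_property`, which follows the same compute-once idea. It is not what the thread used; this is just a sketch of the equivalent, with a hypothetical `calls` counter added to show that the function body runs once:

```python
from functools import cached_property


class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.calls = 0

    @cached_property
    def z(self):
        self.calls += 1
        return self.x * self.y


a = A(3, 4)
```

The first `a.z` runs the function and stores the result in the instance `__dict__` under `'z'`; later accesses read that stored value. It needs a per-instance `__dict__`, so it does not combine with a `__slots__` declaration that leaves `__dict__` out.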


HTH,

George
 

Frank Millman

George Sakkis said:
I don't see why this all-or-nothing approach is neater; what if you
have a hundred expensive computed attributes but you just need one?
Unless you know this never happens in your specific situation because
all missing attributes are tightly coupled, properties are a better
way to go.

It so happens that this is my specific situation. I can have a foreign
key column in one table with a reference to a primary key column in
another table. I have for some time now had the ability to set up a
pseudo-column in the first table with a reference to an alternate key
column in the second table, and this requires various attributes to be
set up. I have recently extended this concept where the first table
can have a pseudo-column pointing to a column in the second table,
which is in turn a pseudo-column pointing to a column in a third
table. This can chain indefinitely provided that the end of the chain
is a real column in the final table.

My problem is that, when I create the first pseudo-column, the target
column, also pseudo, does not exist yet. I cannot call it recursively
due to various other complications. Therefore my solution was to wait
until I need it. Then the first one makes a reference to the second
one, which in turn realises that it needs a reference to the third
one, and so on. So it is recursive, but at execution-time, not at
instantiation-time.

Hope this makes sense.

George Sakkis said:
The boilerplate code can be minimal too with an appropriate
decorator, something like:

[snip]

This is very neat, George. I will have to read it a few more times
before I understand it properly - I still have not fully grasped
decorators, as I have not yet had a need for them.

Actually I did spend a bit of time trying to understand it before
posting, and I have a question.

It seems that this is now a 'read-only' attribute, whose value is
computed by the function the first time, and after that cannot be
changed. It would probably suffice for my needs, but how easy would it
be to convert it to read/write?

Thanks

Frank
 

George Sakkis

Frank Millman said:
This is very neat, George. I will have to read it a few more times
before I understand it properly - I still have not fully grasped
decorators, as I have not yet had a need for them.

You never *need* decorators, in the sense it's just syntax sugar for
things you might do without them, but they're handy once you get your
head around them.

Frank Millman said:
Actually I did spend a bit of time trying to understand it before
posting, and I have a question.

It seems that this is now a 'read-only' attribute, whose value is
computed by the function the first time, and after that cannot be
changed. It would probably suffice for my needs, but how easy would it
be to convert it to read/write?

It's straightforward, just define a setter wrapper and pass it in the
property along with the getter:

def cachedproperty(func):
    name = '__' + func.__name__
    def getter(self):
        try: return getattr(self, name)
        except AttributeError: # raised only the first time
            value = func(self)
            setattr(self, name, value)
            return value
    def setter(self, value):
        setattr(self, name, value)
    return property(getter,setter)
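
A usage sketch of the read/write version, restated in Python 3 so it runs on its own. The class `A` is the toy example from earlier in the thread, and the variable names at the bottom are made up for the illustration:

```python
def cachedproperty(func):
    name = '__' + func.__name__       # key under which the value is cached

    def getter(self):
        try:
            return getattr(self, name)
        except AttributeError:        # nothing cached yet
            value = func(self)
            setattr(self, name, value)
            return value

    def setter(self, value):
        setattr(self, name, value)

    return property(getter, setter)


class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @cachedproperty
    def z(self):
        return self.x * self.y


a = A(3, 4)
first = a.z      # computed from x and y, then cached
a.z = 100        # the setter overwrites the cached value
second = a.z

b = A(2, 5)
b.z = 7          # assigning before any read skips the computation
```

Because the cache key is built as a plain string and stored with `setattr`, no name mangling applies: the value sits in the instance `__dict__` under the literal key `'__z'`.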


HTH,

George
 

Frank Millman

George Sakkis said:
It's straightforward, just define a setter wrapper and pass it in the
property along with the getter:

[snip]

Wonderful - this is very educational for me :)

Thanks very much

Frank
 

Steven Bethard

George said:
The boilerplate code can be minimal too with an appropriate
decorator, something like:

[snip]

And, if you don't want to go through the property machinery every time,
you can use a descriptor that only calls the function the first time:

>>> class Once(object):
...     def __init__(self, func):
...         self.func = func
...     def __get__(self, obj, cls=None):
...         if obj is None:
...             return self
...         else:
...             value = self.func(obj)
...             setattr(obj, self.func.__name__, value)
...             return value
...
>>> class A(object):
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...     @Once
...     def z(self):
...         print 'calculating z'
...         return self.x * self.y
...
>>> a.z
calculating z
6
>>> a.z
6

With this approach, the first time 'z' is accessed, there is no
instance-level 'z', so the descriptor's __get__ method is invoked. That
method creates an instance-level 'z' so that every other time, the
instance-level attribute is used (and the __get__ method is no longer
invoked).
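
The hand-off from descriptor to instance attribute can be observed by inspecting the instance `__dict__`. A sketch in Python 3; the arguments `A(2, 3)` are an assumption chosen only so that the product matches the 6 shown above, and the `calls` counter is a hypothetical addition:

```python
class Once:
    """Non-data descriptor: runs func on first access, then stores the
    result on the instance under the same name."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        value = self.func(obj)
        setattr(obj, self.func.__name__, value)
        return value


class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.calls = 0

    @Once
    def z(self):
        self.calls += 1
        return self.x * self.y


a = A(2, 3)                    # assumed values, see the note above
before = 'z' in a.__dict__     # no instance-level 'z' yet
first = a.z                    # descriptor's __get__ runs and stores 'z'
after = 'z' in a.__dict__      # now present on the instance
second = a.z                   # read straight from the instance
```

This works because `Once` defines no `__set__`: the `setattr` call lands in the instance `__dict__`, and an instance attribute wins over a non-data descriptor on later lookups.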

STeVe
 

Steven D'Aprano

Frank Millman said:
The only problem with this is that it raises KeyError instead of the
expected AttributeError.


Yes, because you never assign __dict__['q'].

Frank Millman said:
Sorry - I made it more explicit above. It is the method that sets up
all the missing attributes. No matter which attribute is referenced
first, 'compute' sets up all of them, so they are all available for
any future reference.


If you're going to do that, why not call compute() from your __init__ code
so that initializing an instance sets up all the attributes? That way you
can remove all the __getattr__ code. Sometimes avoiding the problem is
better than solving the problem.
 

Frank Millman

Steven D'Aprano said:
If you're going to do that, why not call compute() from your __init__ code
so that initializing an instance sets up all the attributes?

Because, as I have tried to explain elsewhere (probably not very
clearly), not all the information required to perform compute() is
available at __init__ time.

I have gained a lot of valuable advice from this thread, but I do have
a final question.

Every respondent has tried to nudge me away from __getattr__() and
towards property(), but no-one has explained why. What is the downside
of my approach? And if this is not a good case for using
__getattr__(), what is? What kind of situation is it intended to
address?

Thanks

Frank
 

Steven D'Aprano

Frank Millman said:
Because, as I have tried to explain elsewhere (probably not very
clearly), not all the information required to perform compute() is
available at __init__ time.

I'm sorry, but this explanation doesn't make sense to me.

Currently, something like this happens:

(1) the caller initializes an instance
    => instance.x = some known value
    => instance.y is undefined
(2) the caller tries to retrieve instance.y
(3) which calls instance.__getattr__('y')
(4) which calls instance.compute()
    => which forces the necessary information to be available
    => instance.__dict__['y'] = some value
(5) finally returns a value for instance.y

Since, as far as I can tell, there is no minimum time between creating the
instance at (1) and trying to access instance.y at (2), there is no
minimum time between (1) and calling compute() at (4), except for the
execution time of the steps between them. So why not just make compute()
the very last thing that __init__ does?


Frank Millman said:
I have gained a lot of valuable advice from this thread, but I do have
a final question.

Every respondent has tried to nudge me away from __getattr__() and
towards property(), but no-one has explained why.

Not me! I'm trying to nudge you away from the entire approach!


Frank Millman said:
What is the downside of my approach?

It is hard to do at all, harder to do right, more lines of code, more bugs
to fix, slower to write and slower to execute.

Frank Millman said:
And if this is not a good case for using
__getattr__(), what is? What kind of situation is it intended to
address?


Delegation is probably the poster-child for the use of __getattr__. Here's
a toy example: a list-like object that returns itself when you append to
it, without sub-classing.

class MyList:
    def __init__(self, *args):
        self.__dict__['data'] = list(args)
    def __getattr__(self, attr):
        return getattr(self.data, attr)
    def __setattr__(self, attr, value):
        return setattr(self.data, attr, value)
    def append(self, value):
        self.data.append(value)
        return self
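
A short usage sketch of the toy class, restated so it runs on its own under Python 3. The variable names at the bottom are made up for the illustration:

```python
class MyList:
    def __init__(self, *args):
        # Write through __dict__ so that __setattr__ below is bypassed.
        self.__dict__['data'] = list(args)

    def __getattr__(self, attr):
        # Called only for names MyList itself lacks: forward to the list.
        return getattr(self.data, attr)

    def __setattr__(self, attr, value):
        setattr(self.data, attr, value)

    def append(self, value):
        self.data.append(value)
        return self


# append() returns the wrapper, so calls can be chained.
items = MyList(1, 2).append(3).append(4)

# count() and index() are not defined on MyList; __getattr__ forwards them.
count_of_3 = items.count(3)
position_of_4 = items.index(4)
```

One caveat when running this on Python 3: implicit special-method lookups such as `len(items)` or `items[0]` go to the type and do not pass through `__getattr__`, so those would need explicit methods on the wrapper.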
 

Gabriel Genellina

On Tue, 12 Jun 2007 08:18:40 -0300, Steven D'Aprano wrote:
Currently, something like this happens:

(1) the caller initializes an instance
    => instance.x = some known value
    => instance.y is undefined
(2) the caller tries to retrieve instance.y
(3) which calls instance.__getattr__('y')
(4) which calls instance.compute()
    => which forces the necessary information to be available
    => instance.__dict__['y'] = some value
(5) finally returns a value for instance.y

Since, as far as I can tell, there is no minimum time between creating
the instance at (1) and trying to access instance.y at (2), there is no
minimum time between (1) and calling compute() at (4), except for the
execution time of the steps between them. So why not just make compute()
the very last thing that __init__ does?

As far as I understand what the OP said, (2) may never happen. And since
(4) is expensive, it is avoided until it is actually required.
 