Magic function


dg.google.groups

Hi all,

I'm part of a small team writing a Python package for a scientific
computing project. The idea is to make it easy to use for relatively
inexperienced programmers. As part of that aim, we're using what we're
calling 'magic functions', and I'm a little bit concerned that they
are dangerous code. I'm looking for advice on what the risks are (e.g.
possibility of introducing subtle bugs, code won't be compatible with
future versions of Python, etc.).

Quick background: Part of the way our package works is that you create
a lot of objects, and then you create a new object which collects
together these objects and operates on them. We originally were
writing things like:

obj1 = Obj(params1)
obj2 = Obj(params2)
....
bigobj = Bigobj(objects=[obj1,obj2])
bigobj.run()

This is fine, but we decided that for clarity of these programs, and
to make it easier for inexperienced programmers, we would like to be
able to write something like:

obj1 = Obj(params1)
obj2 = Obj(params2)
....
run()

The idea is that the run() function inspects the stack, looks for
objects which are instances of class Obj, creates a Bigobj with those
objects, and calls its run() method.

So, any comments on that approach?

I'm including the code I've written to do this, and if you have time
to look through it, I'd also be very grateful for any more specific
comments about the way I've implemented it (in particular, can it be
made faster, is my program creating cycles that stop the garbage
collection from working, etc.). I hope the code will be formatted
correctly:

from inspect import stack
from itertools import repeat

def getInstances(instancetype, level=1, includeglobals=True,
                 containersearchdepth=1, exclude={}, predicate=lambda x: True):
    """Find all instances of a given class at a given level in the stack"""
    vars = {}
    # Note: we use level+1 because level refers to the level relative to
    # the function calling this one
    if includeglobals: vars.update(stack()[level+1][0].f_globals)
    vars.update(stack()[level+1][0].f_locals)
    # Note that you can't extract the names from vars.itervalues(), so we
    # provide the names vars.iterkeys() via knownnames.
    # containersearchdepth+1 is used because vars.itervalues() is the
    # initial container from the point of view of this function, but not
    # from the point of view of the person calling getInstances
    objs, names = extractInstances(instancetype, vars.itervalues(),
                                   containersearchdepth+1,
                                   knownnames=vars.iterkeys(),
                                   exclude=exclude, predicate=predicate)
    return (objs, names)

def extractInstances(instancetype, container, depth, containingname='vars()',
                     knownnames=None, exclude={}, predicate=lambda x: True):
    if depth <= 0: return ([], [])
    # Assumption: no need to search through strings
    if isinstance(container, str): return ([], [])
    # Ideally, this check wouldn't be here, but without it some programs
    # seem to crash, probably because some of the simulator objects are
    # iterable but shouldn't be iterated over normally
    # TODO: Investigate what is causing this to crash, and possibly put
    # in a global preference to turn this check off?
    if not isinstance(container, (list, tuple, dict, type({}.itervalues()))):
        return ([], [])
    # Note that knownnames is only provided by the initial call of
    # extractInstances, and the known names are from the dictionary of
    # variables. After the initial call, names can only come from the
    # __name__ attribute of a variable if it has one, and that is checked
    # explicitly below
    if knownnames is None:
        knewnames = False
        knownnames = repeat(containingname)
    else:
        knewnames = True
    objs = []
    names = []
    try: # container may not be a container; if it isn't, we'll encounter a TypeError
        for x, name in zip(container, knownnames):
            # Note that we always have a name variable defined, but if
            # knewnames=False then this is just a copy of containingname, so
            # the name we want to give it in this instance is redefined in
            # this case. We have to use this nasty check because we want to
            # iterate over the pair (x, name), as variables in the same
            # position in the container have the same name, and we can't
            # necessarily use __getitem__
            if hasattr(x, '__name__'):
                name = x.__name__
            elif not knewnames:
                name = 'Unnamed object, id = '+str(id(x))+', contained in: '+containingname
            if isinstance(x, instancetype):
                if x not in exclude and predicate(x):
                    objs.append(x)
                    names.append(name)
            else: # Assumption: an object of the instancetype is not also a container we want to search in.
                # Note that x may not be a container, but then
                # extractInstances will just return an empty pair
                newobjs, newnames = extractInstances(instancetype, x, depth-1,
                                                     containingname=name,
                                                     exclude=exclude,
                                                     predicate=predicate)
                objs += newobjs
                names += newnames
        return (objs, names)
    except TypeError: # container wasn't a container after all
        return ([], [])
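For completeness, run() itself is then just a thin wrapper over
getInstances (sketched here with the hypothetical Obj and Bigobj from
the example above, not our real class names):

def run():
    # getInstances' default level=1 searches the frame of whoever
    # called run(), i.e. the user's script or function
    objs, names = getInstances(Obj)
    bigobj = Bigobj(objects=objs)
    bigobj.run()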

 

Mike Meyer

Hi all,

I'm part of a small team writing a Python package for a scientific
computing project. The idea is to make it easy to use for relatively
inexperienced programmers. As part of that aim, we're using what we're
calling 'magic functions', and I'm a little bit concerned that they
are dangerous code. I'm looking for advice on what the risks are (e.g.
possibility of introducing subtle bugs, code won't be compatible with
future versions of Python, etc.).

Quick background: Part of the way our package works is that you create
a lot of objects, and then you create a new object which collects
together these objects and operates on them. We originally were
writing things like:

obj1 = Obj(params1)
obj2 = Obj(params2)
...
bigobj = Bigobj(objects=[obj1,obj2])
bigobj.run()

This is fine, but we decided that for clarity of these programs, and
to make it easier for inexperienced programmers, we would like to be
able to write something like:

obj1 = Obj(params1)
obj2 = Obj(params2)
...
run()

The idea is that the run() function inspects the stack, looks for
objects which are instances of class Obj, creates a Bigobj with those
objects, and calls its run() method.

So, any comments on that approach?

The basic idea is ok, but looking at the stack makes me a bit
nervous. That makes the code complicated, and probably fragile in the
face of changing python versions.

The unittest module does much the same thing - you run unittest.main,
and it runs all the tests in any TestCase subclass in your module
(assuming you didn't do something to limit it). However, it does it by
examining the module, not the stack. The real difference is that your
"magic" classes have to be global to your module. On the other hand,
it provides some nice tools to let you partition things, so you can
easily run subsets of the classes from the command line.

It's probably worth a look.
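Roughly like this, say (just a sketch, reusing the Obj and Bigobj
names from your example):

import sys

def run(module_name='__main__'):
    # examine a module's namespace, unittest-style, instead of the stack
    module = sys.modules[module_name]
    objs = [v for v in vars(module).itervalues() if isinstance(v, Obj)]
    Bigobj(objects=objs).run()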

<mike
 

oj

Hi all,

I'm part of a small team writing a Python package for a scientific
computing project. The idea is to make it easy to use for relatively
inexperienced programmers. As part of that aim, we're using what we're
calling 'magic functions', and I'm a little bit concerned that they
are dangerous code. I'm looking for advice on what the risks are (e.g.
possibility of introducing subtle bugs, code won't be compatible with
future versions of Python, etc.).

Quick background: Part of the way our package works is that you create
a lot of objects, and then you create a new object which collects
together these objects and operates on them. We originally were
writing things like:

obj1 = Obj(params1)
obj2 = Obj(params2)
...
bigobj = Bigobj(objects=[obj1,obj2])
bigobj.run()

This is fine, but we decided that for clarity of these programs, and
to make it easier for inexperienced programmers, we would like to be
able to write something like:

obj1 = Obj(params1)
obj2 = Obj(params2)
...
run()

The idea is that the run() function inspects the stack, looks for
objects which are instances of class Obj, creates a Bigobj with those
objects, and calls its run() method.

So, any comments on that approach?

[snip code]

If you are the author of class Obj, then why not just make the class
maintain a record of any objects that have been instantiated?

That way, run could simply call a class method to obtain a list of all
the objects it needs.
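Something like this (just a sketch, reusing your Obj/Bigobj names):

class Obj(object):
    _instances = []                 # class-level record of instances
    def __init__(self, params):
        Obj._instances.append(self)
        self.params = params
    @classmethod
    def all_instances(cls):
        return list(cls._instances)

def run():
    Bigobj(objects=Obj.all_instances()).run()

(Note the list keeps every object alive forever; weak references
would avoid that, if it matters.)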
 

Ruediger

Hi all,

I'm part of a small team writing a Python package for a scientific
computing project. The idea is to make it easy to use for relatively
inexperienced programmers. As part of that aim, we're using what we're
calling 'magic functions', and I'm a little bit concerned that they
are dangerous code. I'm looking for advice on what the risks are (e.g.
possibility of introducing subtle bugs, code won't be compatible with
future versions of Python, etc.).

Quick background: Part of the way our package works is that you create
a lot of objects, and then you create a new object which collects
together these objects and operates on them. We originally were
writing things like:

obj1 = Obj(params1)
obj2 = Obj(params2)
...
bigobj = Bigobj(objects=[obj1,obj2])
bigobj.run()

This is fine, but we decided that for clarity of these programs, and
to make it easier for inexperienced programmers, we would like to be
able to write something like:

obj1 = Obj(params1)
obj2 = Obj(params2)
...
run()

The idea is that the run() function inspects the stack, looks for
objects which are instances of class Obj, creates a Bigobj with those
objects, and calls its run() method.


Well, I would do it this way:
no fancy stuff, all standard and fast.

from weakref import ref

class bigobject(set):
    def __iter__(self):
        for obj in set.__iter__(self):
            yield obj()
    def run(self):
        for obj in self:
            print obj.value

class foo(object):
    """ weakref doesn't prevent garbage collection if the last
    instance is destroyed """
    __instances__ = bigobject()
    def __init__(self, value):
        foo.__instances__.add(ref(self, foo.__instances__.remove))
        self.value = value

if __name__ == "__main__":
    obj1 = foo("obj1")
    obj2 = foo("obj2")
    obj3 = foo("obj3")
    obj4 = foo("obj4")
    foo.__instances__.run()
    print "test garbage collection."
    del obj1, obj2, obj3, obj4
    foo.__instances__.run()
 

Paul Rubin

obj1 = Obj(params1)
obj2 = Obj(params2)
...
run()

The idea is that the run() function inspects the stack, looks for
objects which are instances of class Obj, creates a Bigobj with those
objects, and calls its run() method.

So, any comments on that approach?

Bleeearrrrrggggh!!!! Just make the object initializer remember where
the instances are. Or, write something like:

newobj = Bigobj()
# give Bigobj a __call__ method to create and record an object

obj1 = newobj(params1)
obj2 = newobj(params2)
...
newobj.run()
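Fleshed out, that might look something like this (a sketch of the
intent, with Obj standing in for the real class):

class Bigobj(object):
    def __init__(self):
        self.objects = []
    def __call__(self, *params):
        # create an Obj, record it, hand it back to the caller
        obj = Obj(*params)
        self.objects.append(obj)
        return obj
    def run(self):
        for obj in self.objects:
            print obj    # stand-in for the real processing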
 

Steven D'Aprano

Hi all,

I'm part of a small team writing a Python package for a scientific
computing project. The idea is to make it easy to use for relatively
inexperienced programmers.
....

This is fine, but we decided that for clarity of these programs, and to
make it easier for inexperienced programmers, we would like to be able
to write something like:

obj1 = Obj(params1)
obj2 = Obj(params2)
...
run()

The idea is that the run() function inspects the stack, looks for
objects which are instances of class Obj, creates a Bigobj with those
objects, and calls its run() method.

So, any comments on that approach?

Your users are *scientists*, and you don't trust their intellectual
ability to learn a programming language as simple as Python?

Instead of spending time and effort writing, debugging and maintaining
such a fragile approach, why not invest in a couple of introductory books
on Python programming and require your scientists to go through the first
few chapters? Or write out a one-page "cheat sheet" showing them simple
examples. Or, and probably most effectively, make sure all your classes
have doc strings with lots of examples, and teach them how to use help().

Some people-problems are best dealt with by a technical solution, and
some are not.
 

Michael Tobis

Your users are *scientists*, and you don't trust their intellectual
ability to learn a programming language as simple as Python?

Instead of spending time and effort writing, debugging and maintaining
such a fragile approach, why not invest in a couple of introductory books
on Python programming and require your scientists to go through the first
few chapters? Or write out a one-page "cheat sheet" showing them simple
examples. Or, and probably most effectively, make sure all your classes
have doc strings with lots of examples, and teach them how to use help().

Some people-problems are best dealt with by a technical solution, and
some are not.

I am currently talking very similar trash on my blog; see
http://initforthegold.blogspot.com/2008/01/staying-geeky.html and
http://initforthegold.blogspot.com/2007/12/why-is-climate-modeling-stuck.html

You seem to think that learning the simple language is equivalent to
grasping the expressive power that the language provides.

Yes, users are scientists. Therefore they do not have the time or
interest to gain the depth of skill to identify the right abstractions
to do their work.

There are many abstractions that could be useful in science that are
currently provided with awkward libraries or messy one-off codes.

The idea that a scientist should be expected to be able to write
correct and useful Python is reasonable. I and the OP are relying on
it.

The idea that a scientist should be expected to identify and build
clever and elegant abstractions is not. If you think every scientist
can be a really good programmer, you underestimate at least one of:
what good scientists do, what good programmers do, or what existing
high-performance scientific codes are called upon to do.

mt
 

Steven D'Aprano

I am currently talking very similar trash on my blog; see
http://initforthegold.blogspot.com/2008/01/staying-geeky.html and
http://initforthegold.blogspot.com/2007/12/why-is-climate-modeling-stuck.html

You seem to think that learning the simple language is equivalent to
grasping the expressive power that the language provides.


I do? What did I say that led you to that conclusion?

Yes, users are scientists. Therefore they do not have the time or
interest to gain the depth of skill to identify the right abstractions
to do their work.

I don't follow you. If they aren't learning the skills they need to do
their work, what are they doing? Hammering screws in with a hacksaw?
(Metaphorically speaking.)


There are many abstractions that could be useful in science that are
currently provided with awkward libraries or messy one-off codes.

I'm sure you're right. Attempts to make elegant libraries and re-usable
code should be encouraged. The OP's attempt to dumb-down his library
strikes me as a step in the wrong direction.

The idea that a scientist should be expected to be able to write correct
and useful Python is reasonable. I and the OP are relying on it.

Please go back and look at the example the OP gave. According to the
example given, his users would find this too difficult to deal with:

obj1 = Obj(params1)
obj2 = Obj(params2)
....
bigobj = Bigobj(objects=[obj1,obj2])
bigobj.run()


That's not terribly complex code, thanks to Python's easy-to-use object
model. Dropping the explicit construction of the Bigobj in favour of a
mysterious, implicit auto-magic run() is a serious step in the wrong
direction. Any scientist working with this can see exactly what is being
run(), and not have to rely on hunting through the entire source code
looking for Obj() calls he might have missed.

As simple as the above is, it could be made simpler. Judging from the
example given, the Bigobj constructor doesn't need a keyword argument, it
could just as easily take an arbitrary number of arguments:

bigobj = Bigobj(obj1, obj2, obj3, obj4...)
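That is a one-line change to the constructor (a sketch, with
everything else elided):

class Bigobj(object):
    def __init__(self, *objects):
        # Bigobj(obj1, obj2, ...) instead of Bigobj(objects=[obj1, obj2])
        self.objects = list(objects)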

The idea that a scientist should be expected to identify and build
clever and elegant abstractions is not.

But that's their job. That's what scientists do: identify and build
clever and elegant abstractions, such as Newton's Laws of Motion, Special
Relativity, Evolution by Natural Selection, the Ideal Gas Laws, and so on.

Even climate change models are abstractions, and we would hope they are
clever and elegant rather than stupid and ugly.


If you think every scientist can be a really good programmer, you
underestimate at least one of: what good scientists do, what good
programmers do, or what existing high-performance scientific codes are
called upon to do.

Read the OP's post again. His (her?) users aren't expected to create the
toolkit, merely to use it. To create good toolkits you need both a master
programmer and an expert in the field. It is an advantage if they are the
same person. But to use such a good toolkit, you shouldn't need to be a
master programmer.
 

Michael Tobis

Read the OP's post again. His (her?) users aren't expected to create the
toolkit, merely to use it. To create good toolkits you need both a master
programmer and an expert in the field. It is an advantage if they are the
same person. But to use such a good toolkit, you shouldn't need to be a
master programmer.

It appears we are in agreement, then.

But that leaves me in a position where I can't understand your
complaint. There's no reason I can see for the sort of compromise you
ask for.

Clean abstractions benefit from their cleanliness.

Of course the users will have a lot to learn regardless, but that's
the point. A user has to decide whether to take on a new tool.

If that learning is about meaningless incantations (the way beginning
programmers are currently taught to say "abracadabra public static
void main") users will be less impressed with the advantage of the
abstractions and be less likely to engage the new methods on offer. If
the learning exposes new potential, that makes your tool more
attractive.

What's more, the next higher layer of abstraction will be easier to
compose if the composer of that abstraction doesn't have to make the
sort of compromise you suggest. Abstractions that stay out of the way
until you need to expand on them are a big part of what Python is all
about.

It's not clear that this is the sort of application where cutting
corners makes sense, so I don't see how your advice is justified.

mt
 

Steven D'Aprano

It appears we are in agreement, then.

But that leaves me in a position where I can't understand your
complaint. There's no reason I can see for the sort of compromise you
ask for.

What compromise do you think I'm asking for?

I'm suggesting that the scientists be given a brief, introductory
education in *how to use their tool*, namely, Python.

Instead of creating some sort of magic function that "just works" (except
when it doesn't) by doing some sort of implicit "grab every object of
type Obj() you can find and do processing on that", stick to the more
reliable and safer technique of having the programmer explicitly provide
the objects she wants to work with.

Clean abstractions benefit from their cleanliness.

An automatic "run()" that uses a bunch of stuff you can't see as input is
not a clean abstraction. "Do What I Mean" functions have a long and
inglorious history of not doing what the user meant.

There's a fundamental difference between (say) Python's automatic garbage
collection and what the OP is suggesting. Explicitly deleting variables
is almost always the sort of trivial incantation you rightly decry. The
computer can tell when a variable is no longer reachable, and therefore
is safe to delete. But the computer can't safely tell when the user wants
to use a variable as input to a function. The user needs to explicitly
tell the computer what is input and what isn't.

The OP is suggesting taking that decision out of the hands of the user,
and making every variable of type Obj automatically input. If you think
that's a good idea, consider a programming tool kit with a function sum()
which inspects every variable you have and adds up every one that is a
number.


It's not clear that this is the sort of application where cutting
corners makes sense, so I don't see how your advice is justified.

Sorry, are you suggesting that training the scientists to use their tools
is cutting corners? Because I'd call the OP's suggestion to use magic
functions a dangerous, ill-conceived cut corner.
 

Carl Banks

Hi all,

I'm part of a small team writing a Python package for a scientific
computing project. The idea is to make it easy to use for relatively
inexperienced programmers. As part of that aim, we're using what we're
calling 'magic functions', and I'm a little bit concerned that they are
dangerous code. I'm looking for advice on what the risks are (e.g.
possibility of introducing subtle bugs, code won't be compatible with
future versions of Python, etc.).

Quick background: Part of the way our package works is that you create a
lot of objects, and then you create a new object which collects together
these objects and operates on them. We originally were writing things
like:

obj1 = Obj(params1)
obj2 = Obj(params2)
...
bigobj = Bigobj(objects=[obj1,obj2])
bigobj.run()

This is fine, but we decided that for clarity of these programs, and to
make it easier for inexperienced programmers, we would like to be able
to write something like:

obj1 = Obj(params1)
obj2 = Obj(params2)
...
run()

The idea is that the run() function inspects the stack, looks for
objects which are instances of class Obj, creates a Bigobj with those
objects, and calls its run() method.

So, any comments on that approach?


1. Even if you implement magic functions, don't get rid of the
straightforward "hard way".

Magic functions should be for convenience only. The user should be free
to choose to do it the straightforward, explicit "hard way", and not rely
on the magic. In your example, Bigobj should still be available to
users, and should be documented at least as well as the magic run()
function.

The main reason for this (aside from the philosophical question) is that
users often have different needs than you can anticipate, and your magic
might not meet those unanticipated needs, forcing the user to resort to
hacks and workarounds.


2. If your intention is to perform this operation on all Objs, then it
might be a good idea to arrange your code so that Objs are already
registered by the time the user gets them.

One way to do this has already been mentioned: by having the Obj class
track all its instances.

Another way that might be preferable is to have Bigobj create Objs on
behalf of the user. Here's a stripped down example:


class Bigobj(object):
    def __init__(self):
        self.tracked_objs = set()
    def create_object(self, *args):
        obj = Obj(*args)
        self.tracked_objs.add(obj)
        return obj
    def run(self):
        for obj in self.tracked_objs:
            # do something with obj
            pass

bigobj = Bigobj()

obj1 = bigobj.create_object(params1)
obj2 = bigobj.create_object(params2)

# maybe do something with obj1 and obj2 here

bigobj.run()


Carl Banks
 

bearophileHUGS

Steven D'Aprano:
As simple as the above is, it could be made simpler. Judging from the
example given, the Bigobj constructor doesn't need a keyword argument,
it could just as easily take an arbitrary number of arguments:
bigobj = Bigobj(obj1, obj2, obj3, obj4...)

I agree; "Things should be made as simple as possible, but not any
simpler". Hiding those important details from the user is too simple,
and in the long run it will cause problems.
They don't need to become expert (Python) programmers, but it's
good for them to learn how to use their tools a bit, and that
example is simple.

For them Python is a good choice as a shell/interface/glue language
(despite the lack of built-in multiprecision floating-point numbers
and TONS of other things that come built into Mathematica, which is
rather less easy to program than Python), and you can add MatPlotLib
to visualize data on the fly. In the future the Fortress language (by
Sun) may be good for them to make use of multi-core CPUs too, etc. At
the moment Java/D/Cython/C may be used to write the numerically
intensive routines (and to interface to the ones already written in
Fortran/C/C++, etc.) (note: Cython is an improved version of Pyrex).

Bye,
bearophile
 

dg.google.groups

Thanks everyone for the comments.

I had previously thought about the possibility of the classes keeping
track of their instances. I guess this could probably be done quite
transparently with a decorator too (as we have many different types of
objects being collected together). The only issue is that this
approach forces you to use what are essentially global variables,
whereas the searching through the stack method allows you to use the
structure of the program to organise what objects each 'magic'
function sees. Is this a good idea or not? I'm not entirely sure. I
think that personally I would lean towards using this method of
classes keeping track of their instances. It's not entirely my
decision so I'll see what the others say about it.
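The decorator I have in mind would be something along these lines (a
sketch only, with invented names; a WeakValueDictionary stops the
registry keeping every object alive):

from weakref import WeakValueDictionary

def tracked(cls):
    # give the class a registry and an __init__ that records instances
    cls.__instances__ = WeakValueDictionary()
    original_init = cls.__init__
    def __init__(self, *args, **kwds):
        original_init(self, *args, **kwds)
        cls.__instances__[id(self)] = self
    cls.__init__ = __init__
    return cls

class Obj(object):
    def __init__(self, params):
        self.params = params
Obj = tracked(Obj)   # '@tracked' above the class works on Python >= 2.6

def run():
    Bigobj(objects=Obj.__instances__.values()).run()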

Any comments on this possibility: classes could keep track of their
instances, and also keep track of which function or module the
instances were defined in. Then, the magic functions could pick out
objects defined in the same function or module rather than looking at
the stack. This would achieve a similar thing, but is there any great
advantage in doing it this way? My first thought is that you'd still
have to go digging around in the stack to do this, but just not as
much.

Also, does anyone know of any specific things I should be aware of in
taking this stack searching approach? I'm thinking of, for example,
any planned changes in the execution model of Python or the
inspect.stack() function in the next version of Python.

Steven,

"Your users are *scientists*, and you don't trust their intellectual
ability to learn a programming language as simple as Python?"

Well, it's not quite as simple as that. One thing is that we're not
going to be able to force people to use our package. We believe it's
going to be considerably better - particularly in terms of ease of use
and extensibility - than the existing alternatives, but one of the
factors that will affect how many people start using it is how simple
we can make it to do basic things that they'll be familiar with. Many
scientists are using Python now, but it's not yet quite well known
enough that we can just assume that people will know it, and having to
learn the details of a new programming language is a considerable
disincentive for someone thinking about switching to a new piece of
software (even if, as you say, Python is not the most difficult
language to learn). Although the difference between the two pieces of
hypothetical code I presented seems quite trivial to an experienced
programmer, I think that the clarity and simplicity of the version
that uses the magic functions might make a difference. The difference
between being able to define and run a model with 10 lines or 20-30
lines of code might, somewhat perversely, be a significant factor.
(The example I gave was simplified to illustrate what was going on,
but the actual situation is more like you have 5 or 6 different types
of object, each of which uses other types of object to initialise
themselves, so that the magic function approach really reduces the
length of the program considerably.)

So, there's an aspect of PR about our wanting to have something like
the magic functions, but it's not entirely about self promotion,
because we think that in the long term it will be better for the users
if they switch to using our package (or something like it). The reason
being that the alternatives available at the moment all use their own
custom made programming languages which have nothing like the power of
a well developed general purpose language like Python, and are much
more difficult to use and extend. One of them is a stack based
language of all things!

Carl,

"Even if you implement magic functions, don't get rid of the
straightforward "hard way"."

Absolutely not! A very good point. In fact, the magic functions don't
actually do any work themselves, they just create and call the 'hard
way' functions (which are still visible to the user). They're an
additional layer of abstraction which you can choose to use or not
use. And actually, there will be situations where there is no
alternative but to use the 'hard way'. We already learnt this lesson:
a couple of our magic functions were behaving differently and causing
some odd behaviour, so we changed them and now we're working on
building a more consistent and explicit interface (and checking that
it works as expected with the unit testing module, a tedious but
hopefully very useful exercise in the long run).
 

Rüdiger Werner

Well, as I understand your problem now, you don't want all instances
of a specific class that are still alive, but rather all references to
an object (created somewhere, at some time) in a local context (stack
frame) that are accessible from 'that' context (but also from many
others).

However, in Python a stack frame does not 'contain' an object. It only
contains a reference to an object. You may delete this reference
within this frame, but the object may still be alive.

So you can do the following:

def run(att):
    for k, v in att.iteritems():
        if isinstance(v, dict):
            print k, v, id(v)

def foo(bar):
    x = list()
    y = object()
    run(locals())
    del bar
    run(locals())

bazz = dict()
print "bazz has id ", id(bazz)
foo(bazz)
print "bazz has id ", id(bazz)

pythonw -u "console_play.py"
bazz has id 11068592
bar {} 11068592
bazz has id 11068592
Exit code: 0

Note that bar {} is printed only once, since the reference 'bar'
defined in foo has been deleted. The object itself is still alive
because the reference 'bazz' still exists. You should consider that
inspecting the stack will not tell you whether an object is alive or
not. It also doesn't tell you whether an object can be used by your
users. If you come from a C++ background, then consider that Python is
different. Creating an object in a local context will not destroy this
object when you leave that context. There is no such thing as a
'destructor' in Python. You should also consider that frame objects
are not destroyed if they are used by a generator or if there is still
a reference to them. A frame object may live forever. Read the manual
about the inspect module!

Inspecting the stack may give you wrong and difficult-to-debug
results. I just wouldn't do that.
Keeping track of instances isn't that difficult.

However, if you need the instances (not references to them!) that have
been created within a specific stack frame, you may use my example
below. It extends the weakref with the id of the stack frame that
created it. Note, though, that the instance may still be alive while
the frame was destroyed long ago!

Remember:
Inspecting the stack will not tell you whether a user can use a
specific object, nor will it tell you if the object is alive or not.


from weakref import ref
from inspect import getouterframes, currentframe

class ExtendedRef(ref):
    def __init__(self, ob, callback=None, **annotations):
        super(ExtendedRef, self).__init__(ob, callback)
        self.__id = 0

class WeakSet(set):
    def add(self, value, id=0):
        wr = ExtendedRef(value, self.remove)
        wr.__id = id
        set.add(self, wr)
    def get(self, id):
        return [_() for _ in self if _.__id == id]

class bigobject(WeakSet):
    def run(self):
        outer_frame = id(getouterframes(currentframe())[1][0])
        for obj in self.get(outer_frame):
            # process objects
            print obj.value

class foo(object):
    __instances__ = bigobject()
    def __init__(self, value):
        outer_frame = id(getouterframes(currentframe())[1][0])
        foo.__instances__.add(self, outer_frame)
        self.value = value

def main(depth):
    obj1 = foo("obj1 at depth %s" % depth)
    obj2 = foo("obj2 at depth %s" % depth)
    foo.__instances__.run()
    print "processed objects created at %s" % id(currentframe())
    if depth == 0:
        return
    else:
        main(depth-1)

if __name__ == "__main__":
    obj1 = foo("obj1 at depth root")
    main(3)
    foo.__instances__.run()
    print "processed objects created at %s" % id(currentframe())

pythonw -u "test12.py"
obj1 at depth 3
obj2 at depth 3
processed objects created at 11519672
obj2 at depth 2
obj1 at depth 2
processed objects created at 11496496
obj2 at depth 1
obj1 at depth 1
processed objects created at 11813904
obj2 at depth 0
obj1 at depth 0
processed objects created at 11814272
obj1 at depth root
processed objects created at 11443120
 

dg.google.groups

Hi Rüdiger,

Thanks for your message. I liked your approach and I've been trying
something along exactly these sorts of lines, but I have a few
problems and queries.

The first problem is that the id of the frame object can be re-used,
so for example this code (where I haven't defined InstanceTracker and
getInstances, but they are very closely based on the ideas in your
message):

class A(InstanceTracker):
    gval = 0
    def __init__(self):
        self.value = A.gval  # each time you make a new object, give
        A.gval += 1          # it a value one larger
    def __repr__(self):
        return str(self.value)

def f2():
    a = A()  # objects 0 and 2
    return getInstances(A)

def f3():
    a = A()  # object 1
    return f2()

inst2 = f2()
inst3 = f3()
print inst2
print inst3

The output is:

[0]
[0, 2]

The A-variable with value 0 is not being garbage collected because
it's saved in the variable inst2, but it's also being returned by the
second call to getInstances because the frame of f2 is the same each
time (which makes sense, but may be implementation specific?). The
same problem doesn't exist when you use the stack searching method
because from f2's point of view, the only bound instance of A is the
one in that particular call of f2. If you had at the end instead of
the inst2, inst3 stuff:

print f2()
print f3()

The output is:

[0]
[2]

Again, I guess this is because A with value 0 is being garbage
collected between print f2() and print f3(), but again I think this is
implementation specific? You don't have a guarantee that this object
will be garbage collected straight away, do you?

So my concern here is that this approach is actually less safe than
the stack based approach because it depends on implementation specific
details in a non-straightforward way. That said, I very much like the
fact that this approach works if I write:

a = [A()]
a = [[A()]]
etc.

To achieve the same thing with the stack based approach you have to
search through all containers to (perhaps arbitrary) depth.

I also have another problem which is that I have a function decorator
which returns a callable object (a class instance not a function).
Unfortunately, the frame in which the callable object is created is
the frame of the decorator, not the place where the definition is.
I've written something to get round this, but it seems like a bit of a
hack.
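The sort of thing I mean is this (a simplified sketch, not my actual
code; CallableWrapper is an illustrative stand-in for the callable
object the decorator returns):

from inspect import stack

class CallableWrapper(object):
    def __init__(self, func):
        self.func = func
    def __call__(self, *args, **kwds):
        return self.func(*args, **kwds)

def magic_decorator(func):
    wrapper = CallableWrapper(func)
    # stack()[0] is magic_decorator itself; stack()[1] is the frame
    # containing the decorated definition, so record that frame as the
    # place where the object 'really' lives
    wrapper._definition_frame_id = id(stack()[1][0])
    return wrapper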

Can anyone suggest an approach that combines the best of both worlds,
the instance tracking approach and the stack searching approach? Or do
I need to just make a tradeoff here?

Thanks again for all your help everyone,
Dan Goodman
 

Ruediger

Hi Rüdiger,

Thanks for your message. I liked your approach and I've been trying
something along exactly these sorts of lines, but I have a few
problems and queries.

The first problem is that the id of the frame object can be re-used,
so for example this code (where I haven't defined InstanceTracker and
getInstances, but they are very closely based on the ideas in your
message):

class A(InstanceTracker):
    gval = 0
    def __init__(self):
        self.value = A.gval  # each time you make a new object, give
        A.gval += 1          # it a value one larger
    def __repr__(self):
        return str(self.value)

def f2():
    a = A()  # objects 0 and 2
    return getInstances(A)

def f3():
    a = A()  # object 1
    return f2()

inst2 = f2()
inst3 = f3()
print inst2
print inst3

The output is:

[0]
[0, 2]

The A-variable with value 0 is not being garbage collected because
it's saved in the variable inst2, but it's also being returned by the
second call to getInstances because the frame of f2 is the same each
time (which makes sense, but may be implementation specific?).

Yes and no. id basically returns the memory address of an object, and
yes, this is implementation specific. To my knowledge a stack frame is
of constant size in CPython, so you always get the same id for the
same call level, just as you would always get the same number from
your instance tracker.

No: the A-variable with value 0 is reported the second time because it
had been created at the same call level __and__ it is still accessible
from that call level.

If you do want such objects to be destroyed, you must not create hard
references to them. This may be hard for your users.

However you could still do something like:

def f2():
    InstanceTracker.prepare() # <-- delete previously created entries
                              #     here, or calculate some magic hash
                              #     value or random number
    a = A() # objects 0 and 2
    return getInstances(A)

or

@managedInstance # <-- see above
def f2():
    a = A() # objects 0 and 2
    return getInstances(A)

The same problem doesn't exist when you use the stack searching method
because from f2's point of view, the only bound instance of A is the
one in that particular call of f2. If you had at the end instead of
the inst2, inst3 stuff:

print f2()
print f3()

The output is:

[0]
[2]

You are basically guessing here how a user would write his program.

What if your user writes code like this?

my_global_dict["a"] = object()
my_global_list.append(object())
print locals()

You would not find such references by inspecting the stack.

Again, I guess this is because A with value 0 is being garbage
collected between print f2() and print f3(), but again I think this is
implementation specific? You don't have a guarantee that this object
will be garbage collected straight away, do you?

Yes, inspecting the stack is pure guesswork.
You don't know anything about your user's program structure, and
inspecting the stack won't tell you.

So my concern here is that this approach is actually less safe than
the stack based approach because it depends on implementation specific
details in a non-straightforward way. That said, I very much like the
fact that this approach works if I write:

a = [A()]
a = [[A()]]
etc.

To achieve the same thing with the stack based approach you have to
search through all containers to (perhaps arbitrary) depth.


Yes and as pointed out above you will also have to search the global
namespace and all available memory because an instance could have been
created by psyco, ctypes, Swig, Assembly code .....

I also have another problem which is that I have a function decorator
which returns a callable object (a class instance not a function).
Unfortunately, the frame in which the callable object is created is
the frame of the decorator, not the place where the definition is.
I've written something to get round this, but it seems like a bit of a
hack.

Can anyone suggest an approach that combines the best of both worlds,
the instance tracking approach and the stack searching approach? Or do
I need to just make a tradeoff here?

Well, that's my last example. I hope it will help.


from weakref import ref
from random import seed, randint
seed()

class ExtendedRef(ref):
    def __init__(self, ob, callback=None, **annotations):
        super(ExtendedRef, self).__init__(ob, callback)
        self.__id = 0

class WeakSet(set):
    __inst__ = 0
    def add(self, value):
        wr = ExtendedRef(value, self.remove)
        wr.__id = WeakSet.__inst__
        set.add(self, wr)
    def get(self, _id=None):
        _id = _id if _id else WeakSet.__inst__
        return [_() for _ in self if _.__id == _id]
    @classmethod
    def prepare(self):
        WeakSet.__inst__ = randint(0, 2**32-1)
        return WeakSet.__inst__

class bigobject(WeakSet):
    def run(self, _id=None):
        for obj in self.get(_id):
            # process objects
            print obj.value

class foo(object):
    __instances__ = bigobject()
    def __init__(self, value):
        foo.__instances__.add(self)
        self.value = value

def managed(fun):
    def new(*att, **katt):
        _id = WeakSet.prepare()
        _result = fun(*att, **katt)
        foo.__instances__.run(_id)
        return _result
    return new

@managed
def main(depth, txt):
    obj1 = foo("%s obj1 at depth %s" % (txt, depth))
    obj2 = foo("%s obj2 at depth %s" % (txt, depth))
    print "processing objects created in %s at depth %s" % (txt, depth)
    foo.__instances__.run()
    if depth == 0:
        return
    else:
        main(depth-1, "foo")
        main(depth-1, "bar")

if __name__ == "__main__":
    _id = WeakSet.prepare()
    obj1 = foo("obj1 at __main__")
    main(3, "root")
    print "processing objects created in __main__"
    foo.__instances__.run(_id)


ruediger@linux-ehvh:~/tmp> python test12.py
processing objects created in root at depth 3
root obj1 at depth 3
root obj2 at depth 3
processing objects created in foo at depth 2
foo obj1 at depth 2
foo obj2 at depth 2
processing objects created in foo at depth 1
foo obj2 at depth 1
foo obj1 at depth 1
processing objects created in foo at depth 0
foo obj1 at depth 0
foo obj2 at depth 0
processing objects created in bar at depth 0
bar obj1 at depth 0
bar obj2 at depth 0
processing objects created in bar at depth 1
bar obj1 at depth 1
bar obj2 at depth 1
processing objects created in foo at depth 0
foo obj2 at depth 0
foo obj1 at depth 0
processing objects created in bar at depth 0
bar obj1 at depth 0
bar obj2 at depth 0
processing objects created in bar at depth 2
bar obj1 at depth 2
bar obj2 at depth 2
processing objects created in foo at depth 1
foo obj1 at depth 1
foo obj2 at depth 1
processing objects created in foo at depth 0
foo obj1 at depth 0
foo obj2 at depth 0
processing objects created in bar at depth 0
bar obj1 at depth 0
bar obj2 at depth 0
processing objects created in bar at depth 1
bar obj2 at depth 1
bar obj1 at depth 1
processing objects created in foo at depth 0
foo obj2 at depth 0
foo obj1 at depth 0
processing objects created in bar at depth 0
bar obj2 at depth 0
bar obj1 at depth 0
processing objects created in __main__
obj1 at __main__
ruediger@linux-ehvh:~/tmp>
 
