Conflicting needs for __init__ method


dickinsm

Here's an example of a problem that I've recently come up against for
the umpteenth time. It's not difficult to solve, but my previous
solutions have never seemed quite right, so I'm writing to ask whether
others have encountered this problem, and if so what solutions they've
come up with.

Suppose you're writing a class "Rational" for rational numbers. The
__init__ function of such a class has two quite different roles to
play. First, it's supposed to allow users of the class to create
Rational instances; in this role, __init__ is quite a complex beast.
It needs to allow arguments of various types---a pair of integers, a
single integer, another Rational instance, and perhaps floats, Decimal
instances, and suitably formatted strings. It has to validate the
input and/or make sure that suitable exceptions are raised on invalid
input. And when initializing from a pair of integers---a numerator
and denominator---it makes sense to normalize: divide both the
numerator and denominator by their greatest common divisor and make
sure that the denominator is positive.
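A minimal sketch of the normalization step just described, assuming integer inputs and Python 3's `math.gcd`. The name `normalize` is a hypothetical helper, not part of any code posted in this thread.

```python
from math import gcd


def normalize(numerator, denominator):
    """Return (numerator, denominator) in lowest terms with denominator > 0.

    Hypothetical helper; assumes both arguments are ints.
    """
    if denominator == 0:
        raise ZeroDivisionError("denominator must be nonzero")
    g = gcd(numerator, denominator)  # math.gcd never returns a negative value
    if denominator < 0:
        g = -g  # dividing by a negative g makes the denominator positive
    # g divides both values exactly, so floor division is exact here
    return numerator // g, denominator // g
```

Negating `g` when the denominator is negative lets a single division per component both reduce the fraction and fix the sign.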

But __init__ also plays another role: it's going to be used by the
other Rational arithmetic methods, like __add__ and __mul__, to return
new Rational instances. For this use, there's essentially no need for
any of the above complications: it's easy and natural to arrange that
the input to __init__ is always a valid, normalized pair of integers.
(You could include the normalization in __init__, but that's wasteful
when gcd computations are relatively expensive and some operations,
like negation or raising to a positive integer power, aren't going to
require it.) So for this use __init__ can be as simple as:

    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator
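A short justification (not part of the original post) of why negation and positive integer powers can skip the gcd: if $n/d$ is already normalized, the result is too.

```latex
\begin{aligned}
\gcd(n, d) = 1,\; d > 0 &\implies \gcd(-n, d) = 1,\; d > 0,\\
\gcd(n, d) = 1,\; d > 0 &\implies \gcd(n^k, d^k) = 1,\; d^k > 0 \qquad (k \ge 1).
\end{aligned}
```

The second line holds because any prime dividing both $n^k$ and $d^k$ would have to divide both $n$ and $d$.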

So the question is: (how) do people reconcile these two quite
different needs in one function? I have two possible solutions, but
neither seems particularly satisfactory, and I wonder whether I'm
missing an obvious third way. The first solution is to add an
optional keyword argument "internal = False" to the __init__ routine,
and have all internal uses specify "internal = True"; then the
__init__ function can do all the complicated stuff when internal
is False, and just the quick initialization otherwise. But this seems
rather messy.
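A minimal sketch of this `internal` keyword approach (Python 3), assuming the public path accepts only ints; the real constructor would accept many more types. The last line shows that an outside caller can reach the fast path just as easily.

```python
from math import gcd


class Rational(object):
    def __init__(self, numerator, denominator=1, internal=False):
        if internal:
            # Fast path: the caller promises an already-normalized int pair.
            self.numerator = numerator
            self.denominator = denominator
            return
        # Slow path (simplified to ints only): validate, then normalize.
        if not (isinstance(numerator, int) and isinstance(denominator, int)):
            raise TypeError("numerator and denominator must be ints")
        if denominator == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        g = gcd(numerator, denominator)
        if denominator < 0:
            g = -g
        self.numerator = numerator // g
        self.denominator = denominator // g

    def __neg__(self):
        # Negating a normalized fraction keeps it normalized, so skip the gcd.
        return Rational(-self.numerator, self.denominator, internal=True)


r = Rational(10, 30)                    # normalized to 1/3
n = -r                                  # -1/3 via the fast path
oops = Rational(10, 30, internal=True)  # stays 10/30: nothing prevents this
```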

The other solution is to ask the users of the class not to use
Rational() to instantiate, but to use some other function
(createRational(), say) instead. Then __init__ is just the simple
method above, and createRational does all the complicated stuff to
figure out what the numerator and denominator should be and eventually
calls Rational(numerator, denominator) to create the instance. But
asking users not to call Rational() seems unnatural. Perhaps with
some metaclass magic one can ensure that "external" calls to
Rational() actually go through createRational() instead?
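One way the "metaclass magic" could look, sketched in Python 3 syntax. A hypothetical metaclass `_RationalMeta` overrides `__call__`, so every external `Rational(...)` call runs the validation that `createRational()` would have done, while internal methods build instances through a hypothetical `_raw()` classmethod that never goes through the metaclass. Only ints are handled, as a simplifying assumption.

```python
from math import gcd


class _RationalMeta(type):
    """Hypothetical metaclass: every external Rational(...) call lands here."""

    def __call__(cls, numerator, denominator=1):
        # The work createRational() would do: validate and normalize.
        if not (isinstance(numerator, int) and isinstance(denominator, int)):
            raise TypeError("numerator and denominator must be ints")
        if denominator == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        g = gcd(numerator, denominator)
        if denominator < 0:
            g = -g
        # Build the instance directly instead of calling type.__call__.
        return cls._raw(numerator // g, denominator // g)


class Rational(metaclass=_RationalMeta):
    __slots__ = ("numerator", "denominator")

    @classmethod
    def _raw(cls, numerator, denominator):
        # Internal fast path: no validation, no gcd, no __init__.
        obj = cls.__new__(cls)
        obj.numerator = numerator
        obj.denominator = denominator
        return obj

    def __neg__(self):
        return self._raw(-self.numerator, self.denominator)
```

Users keep writing `Rational(10, -30)`; the cheap path is only reachable through the underscore-prefixed `_raw()`.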

Of course, none of this really has anything to do with rational
numbers. There must be many examples of classes for which internal
calls to __init__, from other methods of the same class, require
minimal argument processing, while external calls require heavier and
possibly computationally expensive processing. What's the usual way
to solve this sort of problem?

Mark
 

Ziga Seilnacht

Mark wrote:

[a lot of valid, but long concerns about types that return
an object of their own type from some of their methods]

I think that the best solution is to use an alternative constructor
in your arithmetic methods. That way users don't have to learn about
two different factories for the same type of objects. It also helps
with subclassing, because users have to override only a single method
if they want the results of arithmetic operations to be of their own
type.

For example, if your current implementation looks something like
this:

class Rational(object):

    # a long __init__ or __new__ method

    def __add__(self, other):
        # compute new numerator and denominator
        return Rational(numerator, denominator)

    # other similar arithmetic methods


then you could use something like this instead:

class Rational(object):

    # a long __init__ or __new__ method

    def __add__(self, other):
        # compute new numerator and denominator
        return self.result(numerator, denominator)

    # other similar arithmetic methods

    @staticmethod
    def result(numerator, denominator):
        """
        we don't use a classmethod, because users should
        explicitly override this method if they want to
        change the return type of arithmetic operations.
        """
        result = object.__new__(Rational)
        result.numerator = numerator
        result.denominator = denominator
        return result
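A runnable, filled-in version of the pattern above, as a sketch (Python 3, `math.gcd`). The normalizing `__init__` and the `init_calls` counter are assumptions added for illustration; the counter shows that `__add__` and `__neg__` build their results through `result()` without re-running `__init__`.

```python
from math import gcd


class Rational(object):
    init_calls = 0  # illustrative counter, not part of the posted design

    def __init__(self, numerator, denominator=1):
        # Stand-in for "a long __init__": validate and normalize.
        Rational.init_calls += 1
        if denominator == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        g = gcd(numerator, denominator)
        if denominator < 0:
            g = -g
        self.numerator = numerator // g
        self.denominator = denominator // g

    def __add__(self, other):
        n = self.numerator * other.denominator + other.numerator * self.denominator
        d = self.denominator * other.denominator
        g = gcd(n, d)  # d > 0, so g >= 1
        return self.result(n // g, d // g)

    def __neg__(self):
        # Already normalized: no gcd needed.
        return self.result(-self.numerator, self.denominator)

    @staticmethod
    def result(numerator, denominator):
        result = object.__new__(Rational)  # skips __init__ entirely
        result.numerator = numerator
        result.denominator = denominator
        return result
```

Because `result()` hard-codes `Rational`, a subclass gets plain `Rational` results from arithmetic until it overrides `result()`, which is the trade-off the docstring describes.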


Hope this helps,
Ziga
 

Gabriel Genellina

At said:
Of course, none of this really has anything to do with rational
numbers. There must be many examples of classes for which internal
calls to __init__, from other methods of the same class, require
minimal argument processing, while external calls require heavier and
possibly computationally expensive processing. What's the usual way
to solve this sort of problem?

In some cases you can differentiate by the type or number of
arguments, so __init__ is the only constructor used.
In other cases, where this can't be done, you can provide different
constructors (usually class methods or static methods) with different
names, of course. See the datetime class, for example. It has many
constructors (today(), fromtimestamp(), fromordinal()...), all of them
class methods; it is a C module.
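For reference, the real `datetime` alternate constructors in use (Python 3; `date.fromisoformat` needs 3.7 or later). The hypothetical `MyDate` subclass shows one reason to make such constructors class methods: they build an instance of whichever class they are called on.

```python
from datetime import date, datetime, timezone

d1 = date(2007, 1, 16)                   # the ordinary constructor
d2 = date.fromordinal(d1.toordinal())    # alternate constructor (classmethod)
d3 = date.fromisoformat("2007-01-16")    # another one, Python 3.7+
epoch = datetime.fromtimestamp(0, tz=timezone.utc)


class MyDate(date):
    """Hypothetical subclass; inherits the classmethod constructors."""


m = MyDate.fromordinal(1)  # January 1 of year 1 has ordinal 1
```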

For a slightly different approach, see the TarFile class (this is a
Python module). It has many constructors (classmethods) like taropen,
gzopen, etc. but there is a single public constructor, the open()
classmethod. open() is a factory, dispatching to other constructors
depending on the combination of arguments used.
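A hypothetical sketch of that single-public-factory shape applied to a rational type (Python 3). It mirrors the structure of `TarFile.open()`, one public class method dispatching to private constructors, not its actual code; `create`, `_from_pair`, `_from_int` and `_from_string` are invented names.

```python
import operator
from math import gcd


class Rational(object):
    def __init__(self, numerator, denominator):
        # Simple initializer: assumes an already-normalized pair of ints.
        self.numerator = numerator
        self.denominator = denominator

    # Private constructors, one per kind of input.
    @classmethod
    def _from_pair(cls, numerator, denominator):
        numerator = operator.index(numerator)
        denominator = operator.index(denominator)
        if denominator == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        g = gcd(numerator, denominator)
        if denominator < 0:
            g = -g
        return cls(numerator // g, denominator // g)

    @classmethod
    def _from_int(cls, value):
        return cls(operator.index(value), 1)

    @classmethod
    def _from_string(cls, text):
        num, sep, den = text.partition("/")
        return cls._from_pair(int(num), int(den) if sep else 1)

    # The single public factory, in the spirit of TarFile.open().
    @classmethod
    def create(cls, *args):
        if len(args) == 2:
            return cls._from_pair(*args)
        if len(args) == 1:
            (value,) = args
            if isinstance(value, str):
                return cls._from_string(value)
            return cls._from_int(value)
        raise TypeError("create() takes 1 or 2 arguments")
```

Here the dispatch is on the number and kind of arguments; `TarFile.open()` itself dispatches mainly on its `mode` string.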


--
Gabriel Genellina
Softlab SRL






 

Steven D'Aprano

Suppose you're writing a class "Rational" for rational numbers. The
__init__ function of such a class has two quite different roles to
play. First, it's supposed to allow users of the class to create
Rational instances; in this role, __init__ is quite a complex beast.
It needs to allow arguments of various types---a pair of integers, a
single integer, another Rational instance, and perhaps floats, Decimal
instances, and suitably formatted strings. It has to validate the
input and/or make sure that suitable exceptions are raised on invalid
input. And when initializing from a pair of integers---a numerator
and denominator---it makes sense to normalize: divide both the
numerator and denominator by their greatest common divisor and make
sure that the denominator is positive.

But __init__ also plays another role: it's going to be used by the
other Rational arithmetic methods, like __add__ and __mul__, to return
new Rational instances. For this use, there's essentially no need for
any of the above complications: it's easy and natural to arrange that
the input to __init__ is always a valid, normalized pair of integers.
(You could include the normalization in __init__, but that's wasteful

Is it really? Have you measured it or are you guessing? Is it more or less
wasteful than any other solution?

when gcd computations are relatively expensive and some operations,
like negation or raising to a positive integer power, aren't going to
require it.) So for this use __init__ can be as simple as:

    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

So the question is: (how) do people reconcile these two quite
different needs in one function? I have two possible solutions, but
neither seems particularly satisfactory, and I wonder whether I'm
missing an obvious third way. The first solution is to add an
optional keyword argument "internal = False" to the __init__ routine,
and have all internal uses specify "internal = True"; then the
__init__ function can do all the complicated stuff when internal
is False, and just the quick initialization otherwise. But this seems
rather messy.

Worse than messy. I guarantee you that your class' users will,
deliberately or accidentally, end up calling Rational(10,30,internal=True)
and you'll spend time debugging mysterious cases of instances not being
normalised when they should be.

The other solution is to ask the users of the class not to use
Rational() to instantiate, but to use some other function
(createRational(), say) instead.

That's ugly! And they won't listen.

Of course, none of this really has anything to do with rational
numbers. There must be many examples of classes for which internal
calls to __init__, from other methods of the same class, require
minimal argument processing, while external calls require heavier and
possibly computationally expensive processing. What's the usual way
to solve this sort of problem?

class Rational(object):
    def __init__(self, numerator, denominator):
        print("lots of heavy processing here...")
        # processing ints, floats, strings, special case arguments,
        # blah blah blah...
        self.numerator = numerator
        self.denominator = denominator
    def __copy__(self):
        cls = self.__class__
        obj = cls.__new__(cls)
        obj.numerator = self.numerator
        obj.denominator = self.denominator
        return obj
    def __neg__(self):
        obj = self.__copy__()
        obj.numerator *= -1
        return obj

I use __copy__ rather than copy for the method name, so that the copy
module will do the right thing.
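A runnable Python 3 version of the `__copy__` approach, with an illustrative `init_calls` counter standing in for the print statement, to check two claims: `copy.copy()` picks up the custom `__copy__`, and neither copying nor negation re-runs `__init__`.

```python
import copy


class Rational(object):
    init_calls = 0  # illustrative counter replacing the print statement

    def __init__(self, numerator, denominator):
        Rational.init_calls += 1
        # ...lots of heavy processing would go here...
        self.numerator = numerator
        self.denominator = denominator

    def __copy__(self):
        cls = self.__class__
        obj = cls.__new__(cls)  # allocate without calling __init__
        obj.numerator = self.numerator
        obj.denominator = self.denominator
        return obj

    def __neg__(self):
        obj = self.__copy__()
        obj.numerator *= -1
        return obj
```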
 

Steven D'Aprano

class Rational(object):
    def __init__(self, numerator, denominator):
        print("lots of heavy processing here...")
        # processing ints, floats, strings, special case arguments,
        # blah blah blah...
        self.numerator = numerator
        self.denominator = denominator
    def __copy__(self):
        cls = self.__class__
        obj = cls.__new__(cls)
        obj.numerator = self.numerator
        obj.denominator = self.denominator
        return obj
    def __neg__(self):
        obj = self.__copy__()
        obj.numerator *= -1
        return obj


Here's a variation on that which is perhaps better suited for objects with
lots of attributes:

    def __copy__(self):
        cls = self.__class__
        obj = cls.__new__(cls)
        obj.__dict__.update(self.__dict__)  # copy everything quickly
        return obj
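Two related observations, sketched with hypothetical classes `Plain` and `Slotted` (Python 3). For an ordinary class, `copy.copy()` without any custom `__copy__` already avoids calling `__init__`, because the default reconstruction creates the object with `__new__` and then restores its state. And the `__dict__.update` variant assumes instances have a `__dict__`, which instances of classes using `__slots__` do not.

```python
import copy


class Plain(object):
    """Hypothetical class with no __copy__ of its own."""
    init_calls = 0  # illustrative counter

    def __init__(self, numerator, denominator):
        Plain.init_calls += 1
        self.numerator = numerator
        self.denominator = denominator


class Slotted(object):
    """Hypothetical class using __slots__: instances have no __dict__."""
    __slots__ = ("numerator", "denominator")

    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator
```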
 

Mark Dickinson

Mark wrote:

[a lot of valid, but long concerns about types that return
an object of their own type from some of their methods]

I think that the best solution is to use an alternative constructor
in your arithmetic methods. That way users don't have to learn about
two different factories for the same type of objects. It also helps
with subclassing, because users have to override only a single method
if they want the results of arithmetic operations to be of their own
type.

Aha. I was wondering whether __new__ might appear in the solution
somewhere, but couldn't figure out how that would work; I'd previously
only ever used it for its advertised purpose of subclassing immutable
types.

Hope this helps,

It helps a lot. Thank you.

Mark
 

Mark Dickinson

Is it really? Have you measured it or are you guessing? Is it more or less
wasteful than any other solution?

Just guessing :). But when summing the reciprocals of the first 2000
positive integers, for example, with:

sum((Rational(1, n) for n in range(1, 2001)), Rational(0))

the profile module tells me that the whole calculation takes 8.537
seconds, 8.142 of which are spent in my gcd() function. So it seemed
sensible to eliminate unnecessary calls to gcd() when there's an easy
way to do so.

    def __copy__(self):
        cls = self.__class__
        obj = cls.__new__(cls)
        obj.numerator = self.numerator
        obj.denominator = self.denominator
        return obj

Thank you for this.

Mark
 

Ben Finney

Suppose you're writing a class "Rational" for rational numbers. The
__init__ function of such a class has two quite different roles to
play.

That should be your first clue to question whether you're actually
needing separate functions, rather than trying to force one function
to do many different things.

First, it's supposed to allow users of the class to create Rational
instances; in this role, __init__ is quite a complex beast.

The __init__ function isn't the "constructor" you find in other
languages. Its only purpose is to initialise an already-created
instance, not make a new one.

It needs to allow arguments of various types---a pair of integers, a
single integer, another Rational instance, and perhaps floats, Decimal
instances, and suitably formatted strings. It has to validate the
input and/or make sure that suitable exceptions are raised on invalid
input. And when initializing from a pair of integers---a numerator
and denominator---it makes sense to normalize: divide both the
numerator and denominator by their greatest common divisor and make
sure that the denominator is positive.

All of this points to having a separate constructor function for each
of the inputs you want to handle.

But __init__ also plays another role: it's going to be used by the
other Rational arithmetic methods, like __add__ and __mul__, to
return new Rational instances.

No, it won't; those methods won't "use" the __init__ method. They will
use a constructor, and __init__ is not a constructor (though it does
get *called by* the construction process).

For this use, there's essentially no need for any of the above
complications: it's easy and natural to arrange that the input to
__init__ is always a valid, normalized pair of integers.

Therefore, make your __init__ handle just the default, natural case
you identify.

class Rational(object):
    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

So the question is: (how) do people reconcile these two quite
different needs in one function?

By avoiding the tendency to crowd a single function with disparate
functionality. Every function should do one narrowly-defined task and
no more.

    @classmethod
    def from_string(cls, text):
        (n, d) = parse_elements_of_string_input(text)
        return cls(n, d)

    @classmethod
    def from_int(cls, value):
        return cls(value, 1)

    @classmethod
    def from_rational(cls, other):
        (n, d) = (other.numerator, other.denominator)
        return cls(n, d)

    def __add__(self, other):
        result = perform_addition(self, other)
        return result

    def __sub__(self, other):
        result = perform_subtraction(self, other)
        return result

Put whatever you need to for 'parse_elements_of_string_input',
'perform_addition', 'perform_subtraction', etc; either the calculation
itself, if simple, or a call to a function that can contain the
complexity.

Use Python's exception system to avoid error-checking all over the
place; if there's a problem with the subtraction, for instance, let
the exception propagate up to the code that gave bad input.

The alternate constructors are decorated as '@classmethod' since they
won't be called as instance methods, but rather:

foo = Rational.from_string("355/113")
bar = Rational.from_int(17)
baz = Rational.from_rational(foo)
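A runnable sketch of these alternate constructors (Python 3), with hypothetical stand-ins for `parse_elements_of_string_input` and `perform_addition` (`_lowest_terms` is also invented here). Both normalize via `math.gcd`, and bad input is simply left to raise from `int()`, as the post suggests.

```python
from math import gcd


class Rational(object):
    def __init__(self, numerator, denominator):
        # Handles only the natural case: a normalized pair of ints.
        self.numerator = numerator
        self.denominator = denominator

    @classmethod
    def from_string(cls, text):
        n, d = parse_elements_of_string_input(text)
        return cls(n, d)

    @classmethod
    def from_int(cls, value):
        return cls(value, 1)

    @classmethod
    def from_rational(cls, other):
        return cls(other.numerator, other.denominator)

    def __add__(self, other):
        return perform_addition(self, other)


def _lowest_terms(n, d):
    if d == 0:
        raise ZeroDivisionError("denominator must be nonzero")
    g = gcd(n, d)
    if d < 0:
        g = -g
    return n // g, d // g


def parse_elements_of_string_input(text):
    """Hypothetical parser for 'n/d' or 'n'; int() raises ValueError on junk."""
    num, sep, den = text.partition("/")
    return _lowest_terms(int(num), int(den) if sep else 1)


def perform_addition(a, b):
    n, d = _lowest_terms(a.numerator * b.denominator + b.numerator * a.denominator,
                         a.denominator * b.denominator)
    return type(a)(n, d)
```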
 

fumanchu

Steven said:
Here's a variation on that which is perhaps better suited for objects with
lots of attributes:

    def __copy__(self):
        cls = self.__class__
        obj = cls.__new__(cls)
        obj.__dict__.update(self.__dict__)  # copy everything quickly
        return obj

I recently had to do something similar for my ORM, where a
user-instantiated object gets expensive default values, but the back
end just overwrites those defaults when "resurrecting" objects, so it
shouldn't pay the price. However (and this is the tricky part), I also
wanted to allow subclasses to extend the __init__ method, so just using
cls.__new__(cls) didn't quite go far enough. Here's what I ended up
with [1]:

    def __init__(self, **kwargs):
        self.sandbox = None

        cls = self.__class__
        if self._zombie:
            # This is pretty tricky, and deserves some detailed explanation.
            # When normal code creates an instance of this class, then the
            # expensive setting of defaults below is performed automatically.
            # However, when a DB recalls a Unit, we have its entire properties
            # dict already and should skip defaults in the interest of speed.
            # Therefore, a DB which recalls a Unit can write:
            #     unit = UnitSubClass.__new__(UnitSubClass)
            #     unit._zombie = True
            #     unit.__init__()
            #     unit._properties = {...}
            # instead of:
            #     unit = UnitSubClass()
            #     unit._properties = {...}
            # If done this way, the caller must make CERTAIN that all of
            # the values in _properties are set, and must call cleanse().
            self._properties = dict.fromkeys(cls.properties, None)
        else:
            # Copy the class properties into self._properties,
            # setting each value to the UnitProperty.default.
            self._properties = dict([(k, getattr(cls, k).default)
                                     for k in cls.properties])

            # Make sure we cleanse before assigning properties from kwargs,
            # or the new unit won't get saved if there are no further changes.
            self.cleanse()

        for k, v in kwargs.iteritems():
            setattr(self, k, v)

The _zombie attribute is therefore a flag which allows you to keep the
initialization code inside __init__ (rather than repeating it inside
every method).
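A much-simplified, hypothetical stand-in for the zombie pattern (Python 3), not the dejavu code itself: `Unit`, `expensive_default`, `AuditedUnit`, `recall` and the `default_computations` counter are invented for illustration, and plain dict assignment replaces the real property machinery. The subclass shows the point of keeping everything in `__init__`: its extension still runs on the zombie path.

```python
class Unit(object):
    properties = ("name", "size")
    _zombie = False            # class-level default: normal construction
    default_computations = 0   # counts calls to the "expensive" default path

    @classmethod
    def expensive_default(cls, key):
        Unit.default_computations += 1
        return "default-" + key

    def __init__(self, **kwargs):
        cls = self.__class__
        if self._zombie:
            # Recalled from storage: the caller fills _properties afterwards.
            self._properties = dict.fromkeys(cls.properties, None)
        else:
            self._properties = {k: cls.expensive_default(k)
                                for k in cls.properties}
        for k, v in kwargs.items():
            self._properties[k] = v


class AuditedUnit(Unit):
    """Hypothetical subclass that extends __init__."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.audited = True


def recall(cls, stored):
    """Hypothetical DB-side helper following the recipe in the comment above."""
    unit = cls.__new__(cls)
    unit._zombie = True
    unit.__init__()
    unit._properties = dict(stored)
    return unit
```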


Robert Brewer
System Architect
Amor Ministries

[1] http://projects.amor.org/dejavu/browser/trunk/units.py#l552
 

Steven D'Aprano

Suppose you're writing a class "Rational" for rational numbers. The
__init__ function of such a class has two quite different roles to
play.

That should be your first clue to question whether you're actually
needing separate functions, rather than trying to force one function
to do many different things.
[snip]

All of this points to having a separate constructor function for each
of the inputs you want to handle.
[snip]

The alternate constructors are decorated as '@classmethod' since they
won't be called as instance methods, but rather:

foo = Rational.from_string("355/113")
bar = Rational.from_int(17)
baz = Rational.from_rational(foo)

That's one way of looking at it. Another way is to consider that __init__
has one function: it turns something else into a Rational. Why should the
public interface of "make a Rational" depend on what you are making it
from?

Think of built-ins like str() and int(). I suggest that people would be
*really* unhappy if we needed to do this:

str.from_int(45)
str.from_float(45.0)
str.from_list([45, 45.5])
etc.

Why do you consider that Rationals are different from built-ins in this
regard?

    def __add__(self, other):
        result = perform_addition(self, other)
        return result

But that could just as easily be written as:

    def __add__(self, other):
        return perform_addition(self, other)

which then raises the question, why delegate the addition out of __add__
to perform_addition? There are at least three distinct costs: a larger
namespace, an extra function to write tests for, and an extra function
call for every addition. What benefit do you gain? Why not put the
perform_addition code directly in __add__?

Just creating an extra layer to contain the complexity of rational
addition doesn't gain you anything -- you haven't done anything to reduce
the complexity of the problem, but you have an extra layer to deal with.

And you still haven't dealt with another problem: coercions from other
types. If you want to be able to add Rationals to (say) floats, ints and
Rationals without having to explicitly convert them then you need some
method of dispatching to different initialiser methods. (You should be
asking whether you really do need this, but let's assume you do.)
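A sketch of coercion at the arithmetic level (Python 3), under the assumption that ints and floats are the only foreign types to accept; `_coerce` is a hypothetical helper. Returning `NotImplemented` for anything else lets Python try the other operand and finally raise `TypeError`, and `__radd__` makes `1 + r` work as well as `r + 1`.

```python
from math import gcd


class Rational(object):
    def __init__(self, numerator, denominator=1):
        if denominator == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        g = gcd(numerator, denominator)
        if denominator < 0:
            g = -g
        self.numerator = numerator // g
        self.denominator = denominator // g

    @staticmethod
    def _coerce(other):
        # Convert supported foreign types; None means "unsupported".
        if isinstance(other, Rational):
            return other
        if isinstance(other, int):
            return Rational(other)
        if isinstance(other, float):
            return Rational(*other.as_integer_ratio())  # exact conversion
        return None

    def __add__(self, other):
        other = self._coerce(other)
        if other is None:
            return NotImplemented  # let Python try the other operand
        return Rational(self.numerator * other.denominator
                        + other.numerator * self.denominator,
                        self.denominator * other.denominator)

    __radd__ = __add__  # addition is commutative
```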

Presumably you create a method Rational.dispatch_to_initialisers that
takes any object and tries each initialiser in turn until one succeeds,
then returns the resultant Rational. Or you could just call it
Rational.__init__.

This doesn't mean that __init__ must or even should contain all the
initialisation logic -- it could dispatch to from_string, from_float and
other methods. But the caller doesn't need to call the individual
initialisers -- although of course they are public methods and can be
called if you want -- since __init__ will do the right thing.
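A sketch of an `__init__` that tries each initialiser in turn (Python 3). The four `parse_*` static methods are hypothetical names; each returns a (numerator, denominator) pair and signals "not my kind of input" by raising `TypeError`, `ValueError` or `AttributeError`. Because they probe behaviour (`__index__`, `.numerator`, `as_integer_ratio()`, `.partition()`) rather than checking types, int-like, Fraction-like and float-like objects also work.

```python
import operator
from math import gcd


class Rational(object):
    def __init__(self, value=0, denominator=None):
        if denominator is not None:
            n, d = operator.index(value), operator.index(denominator)
        else:
            n, d = self._convert(value)
        if d == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        g = gcd(n, d)
        if d < 0:
            g = -g
        self.numerator = n // g
        self.denominator = d // g

    # Public parsers; each returns a (numerator, denominator) pair.
    @staticmethod
    def parse_int(value):
        return operator.index(value), 1

    @staticmethod
    def parse_rational(value):
        return operator.index(value.numerator), operator.index(value.denominator)

    @staticmethod
    def parse_float(value):
        return value.as_integer_ratio()

    @staticmethod
    def parse_string(value):
        num, sep, den = value.partition("/")
        return int(num), (int(den) if sep else 1)

    @classmethod
    def _convert(cls, value):
        for parse in (cls.parse_int, cls.parse_rational,
                      cls.parse_float, cls.parse_string):
            try:
                return parse(value)
            except (TypeError, ValueError, AttributeError):
                continue
        raise TypeError("cannot make a Rational from %r" % (value,))
```

Catching `AttributeError` broadly is a design trade-off: it keeps the dispatch short, but it can also hide a genuine bug inside one of the parsers, and a malformed string ends up reported as `TypeError` rather than `ValueError`.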
 

Ben Finney

Steven D'Aprano said:
But that could just as easily be written as:

    def __add__(self, other):
        return perform_addition(self, other)

Yes. I was merely suggesting that there would probably be more steps
involved than "return some_simple_expression", without actually
detailing those steps.
 

Mark Dickinson

That should be your first clue to question whether you're actually
needing separate functions, rather than trying to force one function
to do many different things.

Agreed. It was clear that I wanted two separate functions, but it
seemed that the system was effectively forcing me to use just one;
that is, I was working from the following two assumptions:

(1) *Every* time a Rational is created, __init__ must eventually be
called, and
(2) The user of the class expects to call Rational() to create
rationals.

(1) argues for __init__ being small, simple and efficient, while (2)
wants it to be large and user friendly (possibly dispatching to other
methods to do most of the real work). But as has been pointed out,
thanks to the existence of __new__, (1) is simply false. Time to be
thankful for new-style classes.

No, it won't; those methods won't "use" the __init__ method. They will
use a constructor, and __init__ is not a constructor (though it does
get *called by* the construction process).

Sorry---I'm well aware that __init__ isn't a constructor; I wasn't
being precise enough in my use of `used'.

Mark
 

Mark Dickinson

that is, I was working from the following two assumptions:

(1) *Every* time a Rational is created, __init__ must eventually be
called, and
(2) The user of the class expects to call Rational() to create
rationals.

(with apologies for replying to myself)

I'm still not being careful. The assumptions should be these:

(1) Any creation of a Rational instance must eventually go through
Rational().
(2) A call to Rational() eventually results in a call to __init__.
(3) The user of the class expects to call Rational() to create
rationals.

There you are: three flawed assumptions for the price of two! (1)
fails because __new__ provides an alternative, (2) could easily become
untrue by changing the metaclass, and (3)---well, who knows what the
user expects?

Right. Now that I've thoroughly analyzed my own stupidity, things are
much clearer. Thank you to all who replied.

Mark
 

BJörn Lindqvist

The alternate constructors are decorated as '@classmethod' since they
won't be called as instance methods, but rather:

foo = Rational.from_string("355/113")
bar = Rational.from_int(17)
baz = Rational.from_rational(foo)

I agree with you that that method is the right approach. But you can
also use module level functions, and sometimes that is even better:

def from_string(text):
    (n, d) = parse_elements_of_string_input(text)
    return Rational(n, d)

That way, you do not even have to expose the class at all to users of
the module. I think it depends on how you want users to use your
module. If you prefer:

import rational
rat = rational.from_string("123/456")

Then module-level functions are best. But if you prefer:

from rational import Rational
rat = Rational.from_string("123/456")

class methods are better.
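One further difference between the two styles, sketched with hypothetical names (Python 3; normalization omitted for brevity): a class method builds an instance of whatever class it is called on, so subclasses inherit a working `from_string`, while a module-level function hard-codes `Rational`.

```python
class Rational(object):
    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

    @classmethod
    def from_string(cls, text):
        n, d = _parse(text)
        return cls(n, d)  # cls is whatever class the call was made on


def from_string(text):
    """Module-level factory, as in the rational.from_string() style."""
    n, d = _parse(text)
    return Rational(n, d)  # always a plain Rational


def _parse(text):
    # Hypothetical minimal parser for 'n/d' or 'n'.
    num, sep, den = text.partition("/")
    return int(num), (int(den) if sep else 1)


class MyRational(Rational):
    """Hypothetical user subclass."""
```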
 

BJörn Lindqvist

On Tue, 16 Jan 2007 08:54:09 +1100, Ben Finney wrote:
Think of built-ins like str() and int(). I suggest that people would be
*really* unhappy if we needed to do this:

str.from_int(45)
str.from_float(45.0)
str.from_list([45, 45.5])
etc.

Why do you consider that Rationals are different from built-ins in this
regard?

I would solve that with:

import rational
rational.rational(45)
rational.rational(45.0)
rational.rational([45, 45.5])

def rational(obj):
    initers = [(int, from_int), (basestring, from_str), (list, from_list)]
    for obj_type, initer in initers:
        if isinstance(obj, obj_type):
            return initer(obj)
    raise ValueError("Can not create a rational from a %r" % type(obj).__name__)
 
B

Ben Finney

BJörn Lindqvist said:
import rational
rational.rational(45)
rational.rational(45.0)
rational.rational([45, 45.5])

def rational(obj):
    initers = [(int, from_int), (basestring, from_str), (list, from_list)]
    for obj_type, initer in initers:
        if isinstance(obj, obj_type):
            return initer(obj)
    raise ValueError("Can not create a rational from a %r" % type(obj).__name__)

You've just broken polymorphism. I can't use your factory function
with an instance of my custom type that behaves like a list, but is
not derived from list (or a non-'int' int, or a non-'basestring'
string).

Use the supplied value as you expect to be able to use it, and catch
the exception (somewhere) if it doesn't work. That will allow *any*
type that exhibits the correct behaviour, without needlessly
restricting it to a particular inheritance.
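A small sketch of the difference (Python 3), using a hypothetical `Seventeen` class that behaves like an int by defining `__index__` but does not inherit from `int`: an `isinstance` check rejects it, while `operator.index()`, which accepts any object implementing `__index__`, does not. Both function names are invented for illustration.

```python
import operator


class Seventeen(object):
    """Hypothetical int-like type: implements __index__, does not subclass int."""

    def __index__(self):
        return 17


def pair_by_isinstance(obj):
    # Type-check style, as in the rational() factory above.
    if isinstance(obj, int):
        return (obj, 1)
    raise ValueError("Can not create a rational from a %r" % type(obj).__name__)


def pair_by_behaviour(obj):
    # Duck-typed style: use the value as an integer and let TypeError
    # propagate if it cannot be used that way.
    return (operator.index(obj), 1)
```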
 

BJörn Lindqvist

BJörn Lindqvist said:
import rational
rational.rational(45)
rational.rational(45.0)
rational.rational([45, 45.5])

def rational(obj):
    initers = [(int, from_int), (basestring, from_str), (list, from_list)]
    for obj_type, initer in initers:
        if isinstance(obj, obj_type):
            return initer(obj)
    raise ValueError("Can not create a rational from a %r" % type(obj).__name__)

You've just broken polymorphism. I can't use your factory function
with an instance of my custom type that behaves like a list, but is
not derived from list (or a non-'int' int, or a non-'basestring'
string).

Indeed, but I do not think that is fatal. There are many functions in
Python's stdlib that break duck typing exactly like that.

Use the supplied value as you expect to be able to use it, and catch
the exception (somewhere) if it doesn't work. That will allow *any*
type that exhibits the correct behaviour, without needlessly
restricting it to a particular inheritance.

Can you show an example of that? It seems like that approach would
lead to some very convoluted code.
 

Ben Finney

BJörn Lindqvist said:
Can you show an example of that? It seems like that approach would
lead to some very convoluted code.

Perhaps so. I don't recommend either of those approaches; I recommend,
instead, separate factory functions for separate input types.
 

Chuck Rhode

Ben Finney wrote this on Wed, Jan 17, 2007 at 08:27:54PM +1100. My reply is below.
I recommend, instead, separate factory functions for separate input
types.

Uh, how 'bout separate subclasses for separate input types?
 

Ben Finney

Chuck Rhode said:
Uh, how 'bout separate subclasses for separate input types?

The resulting object in each case would be the same type ('Rational'),
with the same behaviour. Subclassing would make sense only if the
resulting objects needed to have different behaviour in each case.
 
