Copying objects style questions

B

Bob Halley

In dnspython I have a set class, SimpleSet. (I don't use Python 2.3's
sets.Set class so I can keep supporting Python 2.2, and because the
objects in my sets are mutable). The SimpleSet class has a single
attribute, "items", which is a list. (I know a list is not going to
lead to fast set operations in general, but my typical set has only
one or two elements in it, so the potential performance issues don't
really matter for my needs.)

I then subclass SimpleSet to make other kinds of sets, e.g. RRset
subclasses Rdataset which subclasses SimpleSet. RRsets and Rdatasets
each add additional attributes.

I want to have a copy operation which is an "almost shallow" copy.
Specifically, all of the attributes of the object may be shallow
copied except for one, the 'items' list of the SimpleSet, for which I
want a new list containing references to the same elements, so that
the user of the copy may add or remove elements subsequently without
affecting the original.

I can't use copy.copy()'s default behavior, because it is too shallow.
I don't want to use copy.deepcopy() because it's too deep. I
contemplated __copy__, __initargs__, __getstate__, and __setstate__,
but they didn't seem to fit the bill, or seemed more complicated than
the solution I ended up with (see below).

I can, of course, write my own copy() method, but I don't want to
require each subclass of Set have to make a copy() method which
implements the entire copying effort. Rather I'd like cooperating
superclasses; I'd like RRset to copy the name, and then let Rdataset
copy its attributes, and then let SimpleSet do the copy of the items
attribute.

My first solution was like clone() in Java:

In SimpleSet:

def copy(self):
"""Make a (shallow) copy of the set.

There is a 'copy protocol' that subclasses of
this class should use. To make a copy, first
call your super's copy() method, and use the
object returned as the new instance. Then
make shallow copies of the attributes defined
in the subclass.

This protocol allows us to write the set
algorithms that return new instances
(e.g. union) once, and keep using them in
subclasses.
"""

cls = self.__class__
# I cannot call self.__class__() because the
# __init__ method of the subclasses cannot be
# called meaningfully with no arguments
obj = cls.__new__(cls)
obj.items = list(self.items)
return obj

In Rdataset, which subclasses SimpleSet:

def copy(self):
obj = super(Rdataset, self).copy()
obj.rdclass = self.rdclass
obj.rdtype = self.rdtype
obj.covers = self.covers
obj.ttl = self.ttl
return obj

I've also noticed that if I just make SimpleSet subclass list instead
of having an "items" element, then "the right thing" happens with
copy.copy(). I'm a little leery of subclassing the base types,
because although I've done it to good effect sometimes, I've also had
it cause odd problems because the built-in types behave a little
differently than new-style classes in some cases. Also, at least in
this case, it fails the is-a test. A set is *not* a list; the fact
that I'm using a list is an implementation detail that I might not
want to expose.

So, what advice do Python experts have for this kind of situation?
Should I keep the first solution? Should I subclass list in spite of
my misgivings? Is there some other, more elegant solution I've
missed?

Thanks in advance!

/Bob
 
A

Alex Martelli

Bob Halley wrote:
...
I can't use copy.copy()'s default behavior, because it is too shallow.
I don't want to use copy.deepcopy() because it's too deep. I

So far, so perfectly clear.
contemplated __copy__, __initargs__, __getstate__, and __setstate__,
but they didn't seem to fit the bill, or seemed more complicated than
the solution I ended up with (see below).

I don't understand this. What's wrong with, e.g.:

def __copy__(self):
class EmptyClass: pass
obj = EmptyClass()
obj.__class__ = self.__class__
obj.__dict__.update(self.__dict__)
obj.items = list(self.items)
return obj

??? It seems simpler and more automatic than your 'copy protocol';
subclasses don't need to do anything special unless they need to
"deepen" the copy of some of their attributes. Btw, if you're
using new-style classes, you'll want to use object instead of
EmptyClass, or start with obj = self.__class__.new(self.__class__)
as you're doing in your protocol, of course -- but the key idea
is "bulk copy all that's in self's __dict__, then special-case
only what little needs to be specialcased". And doing it in a
method called __copy__ means any user of your class needs not
learn about a new copy protocol but rather just uses copy.copy.

It may be that I'm not correctly understanding your issues, of
course, but I hope these suggestions can help.


Alex
 
B

Bengt Richter

Bob Halley wrote:
...

So far, so perfectly clear.


I don't understand this. What's wrong with, e.g.:

def __copy__(self):
class EmptyClass: pass
obj = EmptyClass()
obj.__class__ = self.__class__
obj.__dict__.update(self.__dict__)
obj.items = list(self.items)
return obj

??? It seems simpler and more automatic than your 'copy protocol';
subclasses don't need to do anything special unless they need to
"deepen" the copy of some of their attributes. Btw, if you're
using new-style classes, you'll want to use object instead of
EmptyClass, or start with obj = self.__class__.new(self.__class__)

<nits>
I don't think you meant object() as a direct substitute for EmptyClass() above, right?
And you meant to put tails on that "new," presumably.
as you're doing in your protocol, of course -- but the key idea
is "bulk copy all that's in self's __dict__, then special-case
only what little needs to be specialcased". And doing it in a
method called __copy__ means any user of your class needs not
learn about a new copy protocol but rather just uses copy.copy.

It may be that I'm not correctly understanding your issues, of
course, but I hope these suggestions can help.


Alex

Regards,
Bengt Richter
 
A

Alex Martelli

Bengt Richter wrote:
...
<nits>
I don't think you meant object() as a direct substitute for EmptyClass()

Right -- you couldn't assign __class__ on an object() result, for example.
above, right? And you meant to put tails on that "new," presumably.
</nits>

Right again. So, for completeness, with a new-style class you could
do, e.g. (probably optimal or close to it):

def __copy__(self):
obj = self.__class__.__new__(self.__class__)
obj.__dict__.update(self.__dict__)
obj.items = list(self.items)
return obj

Sorry for the sloppy expression in those hurriedly-written two lines;-).

To forestall further misunderstandings -- this would not work for
objects with __slots__ (no __dict__) or inheriting from classes with
__slots__ (wouldn't copy attributes not in __dict__) -- there is
always more work needed for such cases. But if one uses __slots__
one *deserves* to have to work a bit harder as a result;-).


Alex
 
A

Alex Martelli

Bob said:
I use new-style classes with __slots__ :), because it makes a really
noticeable difference in memory usage if you use dnspython to work
with a good-sized zone (i.e. one with tens of thousands of records).

If you have huge numbers of instances of a class, it sure may well make
sense to give the class __slots__ in order to save memory -- that's what it
was introduced for. But then, giving that class a __copy__, or the like,
which knows about its slots (or using the deep dark magic of _reduce_ex,
I guess...), if you need to perform fancier copies than ordinary copy.copy
or copy.deepcopy would do for you, seems fair enough to me:).


Alex
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,756
Messages
2,569,534
Members
45,007
Latest member
OrderFitnessKetoCapsules

Latest Threads

Top