Why aren't copy and deepcopy in __builtins__?

J

John Ladasky

Simple question. I use these functions much more frequently than many
others which are included in __builtins__. I don't know if my
programming needs are atypical, but my experience has led me to wonder
why I have to import these functions.
 
C

Carl Banks

Simple question.  I use these functions much more frequently than many
others which are included in __builtins__.  I don't know if my
programming needs are atypical, but my experience has led me to wonder
why I have to import these functions.

I rarely use them (for things like lists I use list() constructor to
copy, and for most class instances I usually don't want a straight
copy of all members), but I wouldn't have a problem if they were
builtin. They make more sense than a lot of builtins.

I'd guess the main reason they're not builtin is that they aren't
really that simple. The functions make use of a lot of knowledge
about Python types. Builtins tend to be for straightforward, simple,
building-block type functions.


Carl Banks
 
A

Andrea Crotti

John Ladasky said:
Simple question. I use these functions much more frequently than many
others which are included in __builtins__. I don't know if my
programming needs are atypical, but my experience has led me to wonder
why I have to import these functions.

I almost never use them either, maybe also in many cases you could avoid
using them...
When for example you use them?

I noticed some time ago in a program that needed speed that deepcopy in
particular is incredibly slow, but I guess is normal since it has to
copy every bit of the data structure.
 
D

Dave Angel

I almost never use them either, maybe also in many cases you could avoid
using them...
When for example you use them?

I noticed some time ago in a program that needed speed that deepcopy in
particular is incredibly slow, but I guess is normal since it has to
copy every bit of the data structure.

I'd expect it to be very slow. I presume it not only has to visit and
duplicate every bit of the data structure, but also has to detect loops,
and avoid infinite loops recreating them.

This loop detection is probably an n-squared algorithm, so it grows much
faster than the number of items being deep-copied.

DaveA
 
J

John Ladasky

I almost never use them either, maybe also in many cases you could avoid
using them...
When for example you use them?

To take one example: neural network programming. I construct a
network object (which is quite complex), test it, and keep it. Next,
I copy the network object, apply a slight variation to it, and test
that. If the variant performs less well than the original, I discard
the variant, and try another slightly-varied copy of the original. If
the variant is superior, I discard the original. And so on.

Another use: When I'm profiling code for speed, I generate a sequence
of function calls in a specific order. I would like to retain that
ordered sequence for when I print out the results of my speed test,
but also generate shuffled variations of that sequence. But
random.shuffle() alters the sequence in place, it does not return a
copy. If shuffle() did return a copy, I would not need to invoke copy
myself, but that's how the library function is written.
I noticed some time ago in a program that needed speed that deepcopy in
particular is incredibly slow, but I guess is normal since it has to
copy every bit of the data structure.

That may be, but when it already takes several seconds for my neural
network object to compute the output from a 100-element input
sequence, the time that the occasional deepcopy call takes is
insignificant to me.
 
I

Ian Kelly

I'd expect it to be very slow.  I presume it not only has to visit and
duplicate every bit of the data structure, but also has to detect loops, and
avoid infinite loops recreating them.

This loop detection is probably an n-squared algorithm, so it grows much
faster than the number of items being deep-copied.

I'm not certain how it's implemented, but I would presume that it just
builds a dict cache of the copied objects, keyed by the original
object ids, to prevent recurring into objects that have already been
copied. Adding an entry and checking for existence are both O(1)
average-case, so it should just be O(n) overall.
 
E

Ethan Furman

John said:
That may be, but when it already takes several seconds for my neural
network object to compute the output from a 100-element input
sequence, the time that the occasional deepcopy call takes is
insignificant to me.

A random thought that I haven't had time to test -- perhaps it would be
quicker if you could save the existing object to a config object, modify
the config object, then create a new instance of your neural net using
that config object... obviously, it would take some work to get the
initialization code going, but I would expect a speed increase from the
result.

Of course, if that's not where your bottlenext is, we'll never know...
(unless you're the curious type ;).

Oh, and as for the original question, I think I've used the copy module
once.

~Ethan~
 
S

Steven D'Aprano

To take one example: neural network programming. I construct a network
object (which is quite complex), test it, and keep it. Next, I copy the
network object, apply a slight variation to it, and test that. If the
variant performs less well than the original, I discard the variant, and
try another slightly-varied copy of the original. If the variant is
superior, I discard the original. And so on.

Without knowing the details of your neural network code, it isn't clear
that *copying* is the right way to do this. Presumably you initialize
your network object with some set of input parameters, as in:

obj = neural_network(a, b, c, d, e, f)

Those input params are probably far less complex than the internal
details of the object, particularly after it is trained. It is probably
faster and simpler to create a new variant directly from the modified
parameters:

obj = neural_network(a, b, c, d+1, e, f) # (say)

but of course your mileage may vary.

Another use: When I'm profiling code for speed, I generate a sequence of
function calls in a specific order. I would like to retain that ordered
sequence for when I print out the results of my speed test, but also
generate shuffled variations of that sequence. But random.shuffle()
alters the sequence in place, it does not return a copy. If shuffle()
did return a copy, I would not need to invoke copy myself, but that's
how the library function is written.

Just make sure you're using a standard list (as opposed to some arbitrary
sequence), and then use string-slicing to make a copy:

func_calls = [a, b, c, d, e, f]
another = func_calls[:]
shuffle(another)

If the string-slicing notation is too mysteriously Perl-like for you, you
can write that line as:

another = list(func_calls)

There's usually no need for a deep copy, particularly if you're writing
code in a semi-functional style with a minimum of in-place modifications
-- a shallow copy will do.

I don't believe I have ever needed to use either copy.deepcopy or
copy.copy in production code. Normally techniques like list slicing, list
comprehensions, {}.copy() and similar are more than enough. It's nice to
have the copy module there as a fall-back if I should ever need it, but I
haven't needed it yet.
 
I

Ian Kelly

error:
Traceback (most recent call last):
 File "Score_8.py", line 38, in <module>
   tokens = lines.split(",")
AttributeError: 'list' object has no attribute 'split'

So, what am I doing wrong?

'lines' is a list of strings.
'split' is a string method, not a list method, hence the error.

You want to call split on each line individually, not on the list.
 
R

Raymond Hettinger

Simple question.  I use these functions much more frequently than many
others which are included in __builtins__.  I don't know if my
programming needs are atypical, but my experience has led me to wonder
why I have to import these functions.

I asked Guido about this once and he said that he didn't
consider them to be part of the core. He worried that
they would be overused by beginners and they would be
a distraction from learning plain, simple Python which
doesn't often need either copy() or deepcopy().

AFAICT, he was right. I've seen large projects where
deepcopy and copy where not used even once.

Raymond
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top