Improvement for copy.deepcopy: no memo for immutable types


Inquisitive Scientist

I am having problems with running copy.deepcopy on very large data
structures containing lots of numeric data:

1. copy.deepcopy can be very slow
2. copy.deepcopy can cause memory errors even when I have plenty of
memory

I think the problem is that the current implementation keeps a memo
entry for everything it copies, even immutable types. In addition to
being slow, this makes the memo dict grow very large when there is
lots of simple numeric data to be copied. For long-running programs,
large memo dicts seem to cause memory fragmentation and result in
memory errors.

It seems like this could be easily fixed by adding the following lines
at the very start of the deepcopy function:

if isinstance(x, (type(None), int, long, float, bool, str)):
    return x

This seems perfectly safe, should speed things up, keep the memo dict
smaller, and be easy to add. Can someone add this to copy.py or point
me to the proper procedure for requesting this change in copy.py?
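
To make the memo growth concrete, here is a minimal sketch of the kind
of experiment behind this report (the exact counts depend on the
interpreter version; later CPython releases avoid memoizing objects
that are their own copy, whereas the implementation described above
memoizes everything):

import copy
import sys

# A list of lists holding nothing but floats -- all immutable, so a
# deep copy cannot usefully record them in the memo anyway.
data = [[float(i) for i in range(1000)] for _ in range(100)]

memo = {}                      # pass our own memo so we can inspect it
copied = copy.deepcopy(data, memo)

print("objects in structure:", 1 + len(data) + sum(len(row) for row in data))
print("memo entries        :", len(memo))
print("memo size (bytes)   :", sys.getsizeof(memo))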

Thanks,
-I.S.
 

Stefan Behnel

Inquisitive Scientist, 16.07.2010 14:45:
I am having problems with running copy.deepcopy on very large data
structures containing lots of numeric data: [...]

This seems perfectly safe, should speed things up, keep the memo dict
smaller, and be easy to add.

and - have you tried it?

Stefan
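
For what it's worth, one low-effort way to try it is to keep a
privately patched copy of Lib/copy.py next to the script (called
mycopy.py below -- the name and location are only an assumption for
this sketch), add the proposed isinstance() early-out at the top of
its deepcopy(), and compare timings:

import copy
import timeit

try:
    import mycopy              # local, hand-patched copy of Lib/copy.py
except ImportError:
    mycopy = None              # fall back to timing only the stock version

data = [[float(i) for i in range(1000)] for _ in range(100)]

print("stock copy.deepcopy    :",
      timeit.timeit(lambda: copy.deepcopy(data), number=5))
if mycopy is not None:
    print("patched mycopy.deepcopy:",
          timeit.timeit(lambda: mycopy.deepcopy(data), number=5))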
 

Steven D'Aprano

I am having problems with running copy.deepcopy on very large data
structures containing lots of numeric data: [...]
This seems perfectly safe, should speed things up, keep the memo dict
smaller, and be easy to add. Can someone add this to copy.py or point me
to the proper procedure for requesting this change in copy.py?

These are the minimum steps you can take:

(1) Go to the Python bug tracker: http://bugs.python.org/

(2) If you don't already have one, create an account.

(3) Create a new bug report, explaining why you think deepcopy is buggy,
the nature of the bug, and your suggested fix.

If you do so, it might be a good idea to post a link to the bug here, for
interested people to follow up.

However, doing the minimum isn't likely to be very useful. Python is
maintained by volunteers, and there are more bugs than person-hours
available to fix them. Consequently, unless a bug is serious, high-
profile, or affects a developer personally, it is likely to be ignored.
Sometimes for years. Sad but true.

You can improve the odds of having the bug (assuming you are right that
it is a bug) fixed by doing more than the minimum. The more of these you
can do, the better the chances:

(4) Create a test that fails with the current code, following the
examples in the standard library tests. Confirm that it fails with the
existing module.

(5) Patch the copy module to fix the bug. Confirm that the new test
passes with your patch, and that you don't cause any regressions (failed
tests).

(6) Create a patch file that contains both the new test and the fix. Upload
it to the bug tracker.
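
For step (4), a minimal sketch of such a test, loosely in the style of
Lib/test/test_copy.py (the test name and the threshold are illustrative
assumptions, not the actual stdlib test):

import copy
import unittest

class DeepCopyAtomicMemoTest(unittest.TestCase):
    def test_memo_not_filled_by_atomic_values(self):
        data = [float(i) for i in range(10000)]
        memo = {}
        copy.deepcopy(data, memo)
        # If immutable numbers are short-circuited, the memo should stay
        # small rather than growing with the amount of numeric data.
        self.assertLess(len(memo), 100)

if __name__ == "__main__":
    unittest.main()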

There's no point in writing the patch for Python 2.5 or 3.0; don't waste
your time. A patch against 2.6 *might* be accepted; one against 2.7 and/or
3.1 should be, provided people agree that it is a bug.

If you do all these things -- demonstrate successfully that this is a
genuine bug, create a test for it, and fix it without breaking anything
else -- then you have a good chance of having the fix accepted.

Good luck! Your first patch is always the hardest.
 

Mark Lawrence

On 16/07/2010 14:59, Steven D'Aprano wrote:

[snip]
However doing the minimum isn't likely to be very useful. Python is
maintained by volunteers, and there are more bugs than person-hours
available to fix them. Consequently, unless a bug is serious, high-
profile, or affects a developer personally, it is likely to be ignored.
Sometimes for years. Sad but true.

To give people an idea, here's the weekly Summary of Python tracker
Issues posted to python-dev, timestamped 17:07 today.

"
2807 open (+44) / 18285 closed (+18) / 21092 total (+62)

Open issues with patches: 1144

Average duration of open issues: 703 days.
Median duration of open issues: 497 days.

Open Issues Breakdown
open 2765 (+42)
languishing 14 ( +0)
pending 27 ( +2)

Issues Created Or Reopened (64)
"

I've spent a lot of time helping out on the issue tracker over the last
few weeks. The oldest open issue I've come across was dated 2001, and
there could be older ones. Unless more volunteers come forward,
particularly to do patch reviews or similar, the situation as I see it
can only get worse.

Kindest regards.

Mark Lawrence.
 
