"RuntimeError: dictionary changed size during iteration" ; Good atomiccopy operations?


Alex Martelli

Jean-Paul Calderone said:
This is vaguely possible using sys.setcheckinterval() now, although one
has to pick a ridiculously large number and hope that the atomic operation
takes fewer than that many opcodes.

Spelling "do not switch threads" as sys.setcheckinterval(None) seems
somewhat natural, though. Perhaps that would be a fruitful direction to
explore.

Indeed, it's a simple enough idea that it's worth proposing to
python-dev for consideration for 2.5, since the patch needed to
implement it should be tiny. The only issue is whether Python
(meaning Guido) WANTS to change in order to support this rough-and-ready
approach to multithreading; I hope he doesn't, but if he does we might
as well get it over with now.


Alex
 

fumanchu

You can also *almost* do it with a tracehook that blocks until released
by another thread. See http://projects.amor.org/misc/wiki/PyConquer for
the tool I'm sporadically working on that does that (in an effort to
test all possible execution paths). The only limitation is that trace
functions aren't called on every bytecode.


Robert Brewer
System Architect
Amor Ministries
(e-mail address removed)
 

robert

Raymond said:
No. It is non-atomic.

It seems that your application design intrinsically incorporates a race
condition -- even if deepcopying and pickling were atomic, there would
be no guarantee whether the pickle dump occurs before or after another
thread modifies the structure. While that design smells of a rat, it
may be that your apps can accept a dump of any consistent state and
that possibly concurrent transactions may be randomly included or
excluded without affecting the result.

Yes, it is designed that way, with a discipline to stay consistent and
to allow many threads without much locking; and the .dump is an
autosave/backup (=> OK at any time).

The requirement is weaker than atomicity. "Not crashing" would be OK. In
that case the file was stored half-written => a corrupt pickle.
( In that case the app used an auto-multi-backup strategy, so the
crashed app recovered automatically from the next backup on
UnpicklingError, but a real workaround is not possible without
rewriting dump or deepcopy - I use this multi-try on RuntimeError so
far, but that's not "legal Python code" )
Python's traditional recommendation is to put all access to a resource
in one thread and to have other threads communicate their transaction
requests via the Queue module. Getting results back was either done
through other Queues or by passing data through a memory location
unique to each thread. The latter approach has become trivially simple
with the advent of Py2.4's thread-local variables.
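The single-owner-thread pattern described above can be sketched like this (a minimal sketch in modern Python -- the thread predates Python 3 -- with all names illustrative, not from the thread):

```python
import queue
import threading

shared = {}                  # only the owner thread ever touches this
requests = queue.Queue()

def owner():
    """The one thread that owns the dict; others submit transactions."""
    while True:
        func, args, reply = requests.get()
        if func is None:     # sentinel: shut the owner down
            break
        reply.put(func(shared, *args))

def set_key(d, k, v):
    d[k] = v
    return v

t = threading.Thread(target=owner)
t.start()

# Another thread submits a transaction and reads the result back
# through its own reply queue.
reply = queue.Queue()
requests.put((set_key, ('answer', 42), reply))
result = reply.get()
requests.put((None, None, None))
t.join()
```

Because only the owner thread ever mutates `shared`, no operation on it can race with a pickle dump performed by that same thread.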

(passing through TLS? TLS is usually used for *not* passing, or?)

That queue/passing-through-only-an-extra-global-var communication is
acceptable for thin thread interaction.
( hope this extra global var stays thread-safe in future Pythons :) )

But "real" thread-programming should also be possible in Python - and it
is with the usual discipline in thread programming. This RuntimeError in
iterations is the (necessary) only compromise, I know of. (Maybe this
RuntimeError must not even be thrown from Python, when walking through
variable sequences is done smartly - but smart practice may cost speed,
so a compromise.)

It can be handled commonly with keys() and some error catching. Key
functions like deepcopy and dump (which cannot easily be subclassed)
should fit into that "highest common factor" and not "judge" for
themselves about _how_ thread programming has to be done.

Thinking about future directions for Python threading, I wonder if
there is a way to expose the GIL (or simply impose a temporary
moratorium on thread switches) so that it becomes easy to introduce
atomicity when needed:

gil.acquire(BLOCK=True)
try:
    # do some transaction that needs to be atomic
finally:
    gil.release()

That's exactly what I requested here:

<[email protected]>

and here:

<[email protected]>

That "practical hammer" (little ugly, but very practical) would enable
to keep big threaded code VHL pythonic and keep us from putting
thousands of trivial locks into the code in low level language manner.
Some OS-functions like those of the socket module (on UNIX) do so anyway
( often unwanted :-( )

In addition, Python should define its time atoms, and thus also the
definite sources of this (unavoidable?) RuntimeError - as explained in
the latter link.
Since the app doesn't seem to care when the dump occurs, it might be
natural to put it in a while-loop that continuously retries until it
succeeds; however, you still run the risk that other threads may never
leave the object alone long enough to dump completely.

I have 5 trials max as of now. The error happened about once in 3 months
in my case; that should solve the problem for the rest of the universe ...
If not, there is another bug going on.
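robert's multi-try workaround isn't shown in the thread; a minimal sketch (hypothetical names, modern Python) of a bounded-retry dump that also avoids leaving a half-written pickle behind:

```python
import os
import pickle

def dump_with_retries(obj, path, max_trials=5):
    """Retry pickling when a concurrent mutation raises RuntimeError.

    Dumping to a temporary file and renaming it into place only on
    success keeps a half-written (corrupt) pickle from ever replacing
    the last good backup.
    """
    for trial in range(1, max_trials + 1):
        try:
            with open(path + '.tmp', 'wb') as f:
                pickle.dump(obj, f)
        except RuntimeError:
            continue      # another thread resized a dict mid-iteration
        os.replace(path + '.tmp', path)
        return trial
    raise RuntimeError('object never held still for %d trials' % max_trials)
```

As robert notes, retries only shrink the window; they cannot remove the race itself.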

I may switch to a solution with a subclassed deepcopy without
.iteritems(). But it's a lot of work to ensure that it is really OK - and
it consumes another few megs of memory and a frequent CPU peak load. So
I may keep the loop and probably not switch at all ...

Robert
 

anamax

robert said:
Meanwhile I think this is a bug of cPickle.dump: It should use .keys()
instead of free iteration internally, when pickling elementary dicts.
I'd file a bug if no objection.

What should happen if there's a delete between the time the .keys()
runs and the time that the deleted element is processed by
cPickle.dump?
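Exactly this race is why a keys() snapshot only trades one error for another: a key deleted after the snapshot raises KeyError instead of RuntimeError, so it has to be caught. A minimal sketch (modern Python, hypothetical names):

```python
def snapshot(d):
    """Copy a dict that other threads may be mutating concurrently.

    list(d.keys()) takes a snapshot of the keys up front, so this loop
    never sees the dict change size mid-iteration -- but a key deleted
    by another thread after the snapshot raises KeyError on lookup,
    and we simply leave it out of the copy.
    """
    result = {}
    for k in list(d.keys()):
        try:
            result[k] = d[k]
        except KeyError:
            pass             # deleted between snapshot and lookup
    return result
```

Whether dropping such a key is acceptable depends on the app; for robert's "any consistent state is fine" backups it arguably is.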

-andy
 

Terry Reedy

Though mostly ignorant of threading issues, I wonder if the following would
work. Derive a class from dict. Define a backup method that sets and
unsets a private self.lock. Define setitem and delitem methods that wrap
calls to the real methods with while self.lock: sleep(1 second).

tjr
 

Marc 'BlackJack' Rintsch

* Ruby, without refcounts, provides no deterministic __del__ for
non-circular refs ==> you end up typing finally ... finally ... .close
... .close all the time

Which is what you should type in Python too, as there's no guarantee
that `__del__()` will be called immediately when the file object goes
out of scope or isn't referenced anymore. The reference-counting
memory management is an implementation detail.

Ciao,
Marc 'BlackJack' Rintsch
 

Raymond Hettinger

[robert]
That queue/passing-through-only-an-extra-global-var communication is
acceptable for thin thread interaction.
( hope this extra global var is thread-safe in future Python's :) )

But "real" thread-programming should also be possible in Python - and it
is with the usual discipline in thread programming.

LOL, I'm glad you put "real" in quotes, and I'm glad that you recognize
that apps with intrinsic race conditions are not following "the usual
discipline in thread programming."

Embedded in this discussion is a plausible suggestion for Py2.5 to
offer a way for a thread to temporarily block thread switches while it
does something that needs to be atomic; however, the given use case is
on thin ice as a motivating example (because of the intrinsic race
condition, avoidance of locks, and avoidance of queues).

This RuntimeError during
iteration is the only (necessary) compromise I know of. (Maybe this
RuntimeError need not even be thrown by Python, if walking through
changing sequences were done smartly - but smart practice may cost
speed, so it's a compromise.)

It can be handled commonly with keys() and some error catching. Key
functions like deepcopy and dump (which cannot easily be subclassed)
should fit into that "highest common factor" and not "judge" for
themselves about _how_ thread programming has to be done. . . .
In addition, Python should define its time atoms, and thus also the
definite sources of this (unavoidable?) RuntimeError - as explained in
the latter link.

Since others have responded directly to these thoughts, I'll aim at the
bigger picture and make an observation on Python sociology. Most users
are aware that Python is not cast in stone and that good ideas are
always welcome. That is usually a good thing, but it sometimes creates
a pitfall. When someone programs themselves into a corner, they
usually get a cue that something is wrong with their design concept or
that they are not working harmoniously with the language; however, in
the Python world, it is tempting to avoid questioning one's own design
and instead start to assume that the language itself is misconceived.

A good language suggestion should be general purpose,
easy-to-understand, universal across implementations, and solve more
than one use case. It is bad sign if you have to propose multiple,
simultaneous language changes just to get your script to work.
Likewise, it is a bad sign if the use case is somewhat unusual (i.e.
supporting an app with an intrinsic race-condition). Also, it is a bad
sign if the proposal is over-specific and ad-hoc (i.e. imposing
memory-intensive requirements on random pieces of library code about
how the code is allowed to loop over dictionaries).

Those criteria for good language proposals are not met by requests to
start making pieces of pure Python code try to fake atomicity.
However, there is some chance for Py2.5 to introduce a thread-switch
lock (as I proposed in my previous post). If that is what you want,
feel free to log a SourceForge feature request (preferably with a
sensible use case and no troll phrases like "real" thread programming).

I have 5 trials max as of now. The error was about once in 3 months in
my case: that should solve the problem for the rest of the universe ...
If not, there is another bug going on.

I sure hope your code isn't being used in mission-critical apps like
air traffic control :-0


Raymond
 

Alex Martelli

Marc 'BlackJack' Rintsch said:
Which is what you should type in Python too, as there's no guarantee
that `__del__()` will be called immediately when the file object goes
out of scope or isn't referenced anymore. The reference-counting
memory management is an implementation detail.

Absolutely true -- which is why Python 2.5 adds a new `with` statement
that allows using the powerful idiom "resource acquisition is
initialization" without relying on any implementation details (yay!).


Alex
 

Dennis Lee Bieber

a real workaround is not possible without rewriting dump or deepcopy -
I use this multi-try on RuntimeError so far, but thats not "legal Python
code" )
Why rewrite deepcopy?

Given how much time has gone into this thread, I've reached the
conclusion one could have coded a solution faster...

Let's see... One complaint is that all accesses to the dictionary
object being archived would need to be wrapped by locks...

Has anyone suggested that, instead of using the plain Python
dictionary, one replace (subclass/extend) dictionary to create a
protected dictionary -- that is, one in which all the "magic" access
methods are overridden to include the lock operation (and an instance
specific lock). Thereby, the rest of the code never sees the locks --
and only the code that initially creates the dictionary changes:

shared_copy = {}

replaced by:

shared_copy = Protected_Dict()


THEN, for purposes of the archival dump, give the protected
dictionary a deepcopy method, which locks at the start, makes the copy
(a regular dictionary is good enough, but to properly deepcopy it should
also be a protected version) using the base class methods, unlocks, and
returns the copy. The thread that creates the archival dump would only
need to invoke something like:

dump_copy = shared_copy.deepcopy()

and then perform the pickle or whatever on "dump_copy" -- that will be
static while the other threads may continue to process. The lock during
the "deepcopy" operation will only block those threads actually trying
to update the object -- whereas the proposed global threading lock would
affect ALL threads, and you'd still have to remember to wrap the
copy/pickle operation with those calls.

Maybe parallel this with a Protected_List... And if you have lists
inside the dictionary, a bit more code modification to ensure they are
initialized as protected types.

If I understand the system, unpickling may or may not be affected --
if you can unpickle on a node by node basis, and use regular (protected)
operations to recreate the main object...
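A minimal sketch of this Protected_Dict in modern Python (the class and method names follow the suggestion above; the locking details here are assumptions):

```python
import copy
import threading

class Protected_Dict(dict):
    """dict whose mutating methods hold an instance-specific lock."""

    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self._lock = threading.RLock()

    def __setitem__(self, key, value):
        with self._lock:
            dict.__setitem__(self, key, value)

    def __delitem__(self, key):
        with self._lock:
            dict.__delitem__(self, key)

    def deepcopy(self):
        # Block writers only for the duration of the copy; the caller
        # then pickles the static copy outside the lock. dict(self)
        # first, so the (uncopyable) lock itself is never deep-copied.
        with self._lock:
            return copy.deepcopy(dict(self))
```

As noted above, nested lists and dicts would need protected counterparts too, since this lock only guards the top-level mappings.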

That "practical hammer" (little ugly, but very practical) would enable
to keep big threaded code VHL pythonic and keep us from putting
thousands of trivial locks into the code in low level language manner.
Some OS-functions like those of the socket module (on UNIX) do so anyway
( often unwanted :-( )
That "hammer" sounds like Windows 95, having to drop into some near
"real mode" to do I/O, then back to VM mode for processing.
I may switch to a solution with a subclassed deepcopy without
.iteritems(). But it's a lot of work to ensure that it is really OK - and
it consumes another few megs of memory and a frequent CPU peak load. So
I may keep the loop and probably not switch at all ...
As mentioned above, I think you may be subclassing the wrong item...
You should subclass the dictionary/list that is giving you the problem
and make /it/ behave safely by adding a deepcopy operation to it.
--
 

Raymond Hettinger

[robert]
In very rare cases a program crashes (hard to reproduce):

* several threads work on an object tree with dicts etc. in it. Items
are added, deleted, iterated over via .keys() ... . The threads are "good"
in the sense that this core data structure is changed only by atomic
operations, so that the data structure is always consistent as far as
the application is concerned. Only the change operations on the dicts
and lists themselves seem to cause problems at the Python level ..

* one thread periodically pickle-dumps the tree to a file:

"RuntimeError: dictionary changed size during iteration" is raised by
.dump ( or a similar "..list changed ..." )

What can I do about this to get a stable pickle-dump without risking
an execution error or - even worse - errors in the pickled file?

See if this fixes the problem for you:

try:
    sys.setcheckinterval(sys.maxint)
    cPickle.dump(obj, f)   # now runs atomically
finally:
    sys.setcheckinterval(100)


Be careful where you use this technique. In addition to suspending
other threads, it has the side-effect of suspending control-break
checks. IOW, you won't be able to break out of the dump().



Raymond
 
