"RuntimeError: dictionary changed size during iteration" ; Good atomiccopy operations?

Discussion in 'Python' started by robert, Mar 11, 2006.

  1. robert

    robert Guest

    In very rare cases (hard to reproduce) a program crashes:

    * Several threads work on an object tree containing dicts etc. Items
    are added and deleted, .keys() is iterated over, and so on. The threads
    are "good" in the sense that this core data structure is changed only by
    atomic operations, so the data structure is always consistent as far as
    the application is concerned. Only the change operations on the dicts
    and lists themselves seem to cause problems at the Python level.

    * one thread periodically pickle-dumps the tree to a file:
    >>> cPickle.dump(obj, f)


    "RuntimeError: dictionary changed size during iteration" is raised by
    ..dump ( or a similar "..list changed ..." )

    What can I do to get a stable pickle dump without risking an
    execution error or, even worse, errors in the pickled file?

    Is a copy.deepcopy ( -> "cPickle.dump(copy.deepcopy(obj),f)" ) an
    atomic operation with a guarantee not to fail?

    Or can I only retry several times in case of a RuntimeError? (That
    would appear to me as odd gambling; retry how often?)

    Robert


    PS: Zope dumps thread-exposed data structures regularly. How does the
    ZODB in Zope handle dict/list changes during its pickling operations?
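
    The error itself can be reproduced without threads. The following is a
    minimal sketch in Python 3 (not the Python 2.4 of this post); the names
    tree, grow_while_iterating and grow_over_snapshot are made up for
    illustration. It only shows the mechanism: a dict that grows while it is
    being iterated directly raises the RuntimeError, while iterating over a
    snapshot of its keys does not.

```python
# Python 3 sketch; all names here are illustrative only.
tree = {"a": 1, "b": 2}

def grow_while_iterating(d):
    """Add a key during direct iteration; the size change is detected."""
    for key in d:
        d["new_" + key] = 0  # mutation while the dict iterator is live

def grow_over_snapshot(d):
    """Iterate over a list snapshot of the keys instead of the live dict."""
    for key in list(d):
        d["new_" + key] = 0

try:
    grow_while_iterating(dict(tree))
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

snapshot_result = dict(tree)
grow_over_snapshot(snapshot_result)

print(error_message)
print(sorted(snapshot_result))
```

    In the threaded case the mutation comes from another thread rather than
    from the loop body, which is why the failure shows up only rarely.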


    ---
    Python 2.4.1 (#2, May 5 2005, 11:32:06)
    [GCC 3.3.5 (Debian 1:3.3.5-12)] on linux2
     
    robert, Mar 11, 2006
    #1

  2. robert

    robert Guest

    Re: "RuntimeError: dictionary changed size during iteration" ; Good atomic copy operations?


    > Is a copy.deepcopy ( -> "cPickle.dump(copy.deepcopy(obj),f)" ) an
    > atomic opertion with a guarantee to not fail?
    >
    > Or can I only retry several times in case of RuntimeError? (which would
    > apears to me as odd gambling; retry how often?)


    For an intermediate solution, I'm playing roulette:

    for i in 1,2,3:
        try:
            cPickle.dump(obj, f)
            break
        except RuntimeError,v:
            pass


    I hope this works for some million years ...



    > PS: Zope dumps thread exposed data structes regularly. How does the ZODB
    > in Zope handle dict/list changes during its pickling operations?
     
    robert, Mar 11, 2006
    #2

  3. robert

    robert Guest

    Re: "RuntimeError: dictionary changed size during iteration" ; Good atomic copy operations?

    robert wrote:

    >
    >> Is a copy.deepcopy ( -> "cPickle.dump(copy.deepcopy(obj),f)" ) an
    >> atomic opertion with a guarantee to not fail?
    >>
    >> Or can I only retry several times in case of RuntimeError? (which
    >> would apears to me as odd gambling; retry how often?)

    >
    >
    > For an intermediate solution, I'm playing roulette:
    >
    > for i in 1,2,3:
    > try:
    > cPickle.dump(obj, f)
    > break
    > except RuntimeError,v:
    > pass
    >


    hmm..

    for i in 1,2,3:
        try:
            cPickle.dump(obj, f)
            break
        except RuntimeError,v:
            f.seek(0); f.truncate(0)


    Meanwhile I think this is a bug in cPickle.dump: it should use .keys()
    internally instead of free iteration when pickling elementary dicts.
    I'd file a bug if there is no objection.
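
    A variation on the retry loop, sketched in Python 3 under the assumption
    that the whole pickle fits in memory: serialize to a bytes object first
    and touch the file only once a complete pickle exists, so no seek/truncate
    is needed. The function name dump_with_retry and its attempts parameter
    are hypothetical. Like the loop above, this only retries; it does not make
    the dump consistent with respect to concurrent writers.

```python
import os
import pickle
import tempfile

def dump_with_retry(obj, path, attempts=3):
    """Pickle obj to path, retrying if a container changes size mid-dump.

    Returns True if a complete pickle was written, False otherwise.
    """
    for _ in range(attempts):
        try:
            data = pickle.dumps(obj)   # serialize fully in memory first
        except RuntimeError:
            continue                   # a dict was resized meanwhile; try again
        with open(path, "wb") as f:
            f.write(data)              # the file only ever receives a whole pickle
        return True
    return False

tree = {"users": ["ann", "bob"], "count": 2}
path = os.path.join(tempfile.mkdtemp(), "autosave.pkl")
saved = dump_with_retry(tree, path)
with open(path, "rb") as f:
    restored = pickle.load(f)
```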

    Robert


    > I hope this works for some million years ...
    >
    >
    >
    >> PS: Zope dumps thread exposed data structes regularly. How does the
    >> ZODB in Zope handle dict/list changes during its pickling operations?
     
    robert, Mar 11, 2006
    #3
  4. Re: "RuntimeError: dictionary changed size during iteration" ; Good atomic copy operations?

    On Sat, 2006-03-11 at 12:49 +0100, robert wrote:
    > Meanwhile I think this is a bug of cPickle.dump: It should use .keys()
    > instead of free iteration internally, when pickling elementary dicts.
    > I'd file a bug if no objection.


    AFAICS, it's a problem with your code. You should lock your object while
    using it. That's what threading.Lock is for. If you want to use threads,
    you have to know which parts of your code need locks.
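
    A minimal sketch of that approach in Python 3, assuming one coarse lock
    for the whole tree; tree, tree_lock, add_item and autosave are
    illustrative names, not anything from the original program. Every mutation
    and every dump takes the same lock, so the dump never sees a dict change
    size under it.

```python
import pickle
import threading

tree = {"items": {}}
tree_lock = threading.Lock()   # hypothetical single lock guarding the whole tree

def add_item(key, value):
    with tree_lock:            # writers hold the lock while mutating
        tree["items"][key] = value

def autosave():
    with tree_lock:            # the dump holds the same lock
        return pickle.dumps(tree)

def worker(start):
    for i in range(start, start + 500):
        add_item(i, str(i))

threads = [threading.Thread(target=worker, args=(n * 500,)) for n in range(4)]
for t in threads:
    t.start()
snapshots = [autosave() for _ in range(20)]   # dump while the workers are running
for t in threads:
    t.join()
final = pickle.loads(autosave())
```

    The cost is that writers wait while a dump is in progress, which is the
    trade-off being argued about in the rest of the thread.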

    Cya,
    Felipe.

    --
    "One who excels at employing military force subjugates other peoples'
    armies without engaging in battle, captures other peoples' fortified
    cities without attacking them, and destroys other peoples' states without
    prolonged fighting. He must fight under Heaven with the paramount aim of
    'preservation'. Thus his weapons will not become dull, and the gains
    can be preserved. This is the strategy for planning offensives."

    -- Sun Tzu, in "The Art of War"
     
    Felipe Almeida Lessa, Mar 11, 2006
    #4
  5. EleSSaR^

    EleSSaR^ Guest

    Re: "RuntimeError: dictionary changed size during iteration" ; Good atomic copy operations?

    robert took the trouble to write all these musings on comp.lang.python:

    [cut]

    I don't know what your code is like, but a similar error occurred in some
    of my software, and it was indeed my fault. I think you should either use
    a lock or implement a deepcopy method of your own.

    --
    EleSSaR^ <>
    --
    Remove .xyz from my email address to contact me.
     
    EleSSaR^, Mar 11, 2006
    #5
  6. robert

    robert Guest

    Re: "RuntimeError: dictionary changed size during iteration" ; Good atomic copy operations?

    Felipe Almeida Lessa wrote:

    > On Sat, 2006-03-11 at 12:49 +0100, robert wrote:
    >
    >>Meanwhile I think this is a bug of cPickle.dump: It should use .keys()
    >>instead of free iteration internally, when pickling elementary dicts.
    >>I'd file a bug if no objection.

    >
    >
    > AFAICS, it's a problem with your code. You should lock your object while
    > using it. That's what Threading.Lock is supposed to work for. If you
    > want to use threads, you have to know in what parts of your code there
    > should be locks.


    99.99% no. I would have to use a lock everywhere I add something to or
    remove something from a dict or list of the struct. That's not the
    purpose of big thread locks. Such simple operations are already atomic
    by the definition of Python, and thanks to the global interpreter lock.
    (Otherwise I would leave the Python language, God forbid ... :) )

    I'm of course aware of where to use locks for reasons of the application.
    But this is an issue at the Python level. And it can be solved gracefully
    and simply in Python, I guess:

    If cPickle.dump (and maybe also copy/deepcopy?) is corrected to work
    atomically on dicts (use .keys()) and lists (use copies, or lock Python
    threads), the problem is solved gracefully and generally.

    Robert
     
    robert, Mar 11, 2006
    #6
  7. robert

    robert Guest

    Re: "RuntimeError: dictionary changed size during iteration" ; Good atomic copy operations?

    EleSSaR^ wrote:

    > robert took the trouble to write all these musings on
    > comp.lang.python:
    >
    > [cut]
    >
    > I don't know what's your code like, but a similar error occurred in some of
    > my software and it was my fault indeed. I think you should either use a
    > lock, or implement a deepcopy method of your own.


    100s of locks? no (see other message). It should be

    own deepcopy: so, do you already know whether the existing deepcopy has
    the same problem as cPickle.dump? (As the problem arises rarely, it is
    difficult for me to test.)

    Robert

    PS: how does ZODB work with this kind of problem? I thought it uses cPickle?
     
    robert, Mar 11, 2006
    #7
  8. Re: "RuntimeError: dictionary changed size during iteration" ; Good atomic copy operations?

    robert <> wrote:
    ...
    > 99.99% no. I would have to use a lock everywhere, where I add or remove
    > something into a dict or list of the struct. Thats not the purpose of
    > big thread locks. Such simple operations are already atomic by the
    > definition of Python - and thanks to the global interpreter lock.
    > (Otherwise I would leave the Python language, God beware ... :) )


    You have misread the Python Language Reference -- if you can give the
    URL on which you have read any such promise of atomicity, I will be glad
    to fix the docs to make that unambiguous.

    There is no such promise (there may be implementation accidents in some
    specific implementation which happen to make some operation atomic, but
    NO guarantee even there that the next bugfix won't break that).

    Farewell and best of luck in finding other languages which support
    threads in a way that is more to your liking than Python -- maybe Ruby
    suits you, I don't know for sure though.


    Alex
     
    Alex Martelli, Mar 11, 2006
    #8
  9. EleSSaR^

    EleSSaR^ Guest

    Re: "RuntimeError: dictionary changed size during iteration" ; Good atomic copy operations?

    robert took the trouble to write all these musings on comp.lang.python:

    > own deepcopy: thus, do you already know if the existing deepcopy has the
    > same problem as cPickle.dump ? (as the problem araises rarely, it is
    > difficult for me to test it out)


    I don't know the exact specs of your object, and I don't know what
    operations you are performing on that object, nor in what way they're
    atomic.

    It seems like you're trying to periodically save the state of such an
    object while it is being modified (a sort of backup?), and Python
    complains about that. A self-implemented deepcopy might produce anomalies
    as well (i.e. your dumped object may be partly a 'before' object and
    partly an 'after' object).

    By the way, you could try employing locks from other threads to dump the
    object as well... this would prevent additional locking.

    > PS: how does ZODB work with this kind of problem? I thought is uses cPickle?


    I have no idea about this.


    --
    EleSSaR^ <>
    --
    Remove .xyz from my email address to contact me.
     
    EleSSaR^, Mar 11, 2006
    #9
  10. EleSSaR^

    EleSSaR^ Guest

    Re: "RuntimeError: dictionary changed size during iteration" ; Good atomic copy operations?

    robert took the trouble to write all these musings on comp.lang.python:

    [cut]

    P.S.
    I'm very bad at threaded programming. Please verify any of my suggestions
    ^_^


    --
    EleSSaR^ <>
    --
    Remove .xyz from my email address to contact me.
     
    EleSSaR^, Mar 11, 2006
    #10
  11. Tim Peters

    Tim Peters Guest

    Re: "RuntimeError: dictionary changed size during iteration" ; Good atomic copy operations?

    [robert]
    > ...
    > PS: how does ZODB work with this kind of problem? I thought is uses cPickle?


    It does. Each thread in a ZODB application typically uses its own
    connection to a database. As a result, each thread gets its own
    consistent view of database objects, which can (and routinely does)
    vary across threads. No app-level synchronization is necessary
    because no sharing of in-memory objects occurs. When N threads each
    load a single persistent object from its own connection, N distinct
    in-memory revisions of that object are created (one per connection ==
    one per thread). If more than one thread modifies the same persistent
    object, the first thread to commit its changes "wins", and later
    threads that try to commit changes to the same object may suffer a
    ConflictError exception at commit time. Between transaction
    boundaries, each thread has an independent view of database state.
    Pragmatically, it's much more like programming with multiple processes
    than with multiple threads.
     
    Tim Peters, Mar 11, 2006
    #11
  12. robert

    robert Guest

    Re: "RuntimeError: dictionary changed ... & Ruby

    Alex Martelli wrote:

    > robert <> wrote:
    > ...
    >
    >>99.99% no. I would have to use a lock everywhere, where I add or remove
    >>something into a dict or list of the struct. Thats not the purpose of
    >>big thread locks. Such simple operations are already atomic by the
    >>definition of Python - and thanks to the global interpreter lock.
    >>(Otherwise I would leave the Python language, God beware ... :) )

    >
    > You have misread the Python Language Reference -- if you can give the
    > URL on which you have read any such promise of atomicity, I will be glad
    > to fix the docs to make that unambiguous.
    >
    > There is no such promise (there may be implementation accidents in some
    > specific implementation which happen to make some operation atomic, but
    > NO guarantee even there that the next bugfix won't break that).


    What? When I add/del an item to a dict or list, this is not an atomic
    thread-safe operation?
    E.g.:
    One thread does things like d['x']='y'
    Another thread reads d['z'] or sets d['z']='w' or dels something.

    If those operations are not atomic, then you'd have to use locks all the
    time to avoid RuntimeErrors and worse!?

    In fact I rely on that all the time, and standard Python modules do so
    too, AFAIK.

    The only problem I know of is that when iterating over dicts/lists you
    get this type of error, and that is understandable. But usually one
    solves such situations with .keys().

    I think cPickle does not necessarily have to iterate freely over native
    dicts. What do copy/deepcopy/[:] do?

    > Farwell and best of luck in finding other languages which support
    > threads in a way that is more to your liking than Python -- maybe Ruby
    > suits you, I don't know for sure though.


    I have looked at Ruby several times, but I am staying with Python. Ruby
    is feature-rich, but ill-designed.

    * Ruby code is very, very ugly: @!{}&%$||endendend ... egyptology.
    Nearly back to Perl.

    * Try to translate this into Ruby:

    def f(): return 1
    def g(x): return x()
    g(f)

    => Then you'll receive a doctor's hat in the OO paradigm and the famous
    "Ruby way". But you'll know why functional programming is a stronger
    religion. Translating OO to Python, you'll often not even notice that
    Python's OO is attached to funcs and dicts. OO is naturally attached!
    The Ruby paradigm is more stilted.

    * Ruby doesn't lead to disciplined code. There are so many names for
    loops and everything => you spend twice the time thinking and choosing
    and get double the mud. With Python you write happily and without
    choices, yet have all the power and more.

    * Ruby, lacking refcounts, provides no deterministic __del__ for
    non-circular refs ==> you type finally finally finally .close .close
    .close all the time.

    * Ruby's module and class namespaces can be rewritten from anywhere
    without any barriers. That's mostly negative for serious apps. 'require'
    is as random as C's #include. You scribble and change something here, and
    a bomb explodes in another module. That kills big projects.
    Modularization and consistency of modular code are 3x better in Python
    with its local module objects and other shielding stuff.

    * Ruby threads are not real OS threads; the Ruby interpreter itself
    switches between them, AFAIK. That is a pro or a con depending on the
    requirements. The Python method is more powerful for bigger apps.

    * Ruby so far has no real (simple) generators, only block callbacks
    (which also read badly). In Ruby there is no framework for delayed
    execution, only a rudimentary, error-prone 'callcc'. Thus they
    don't/can't have real iterators. So they also don't know these kinds
    of problems :). Python is more powerful in that, but things like
    cPickle.dump and deepcopy should be written with discipline so as not
    to break code unnecessarily when Python evolves.

    * Ruby code executes 2x..4x slower (but startup of very small scripts
    is 30% faster ;)

    * etc. etc. ...

    Robert
     
    robert, Mar 11, 2006
    #12
  13. Re: "RuntimeError: dictionary changed ... & Ruby

    On Sat, 2006-03-11 at 23:44 +0100, robert wrote:
    > > Farwell and best of luck in finding other languages which support
    > > threads in a way that is more to your liking than Python -- maybe Ruby
    > > suits you, I don't know for sure though.

    >
    > I looked several times on Ruby, but stay with Python. Ruby is featured,
    > but ill designed.

    [snip]

    Oh noes! Another rant of Ruby vs. Python! *Please*, no flamewars!
     
    Felipe Almeida Lessa, Mar 11, 2006
    #13
  14. robert

    robert Guest

    Re: "RuntimeError: dictionary changed size during iteration" ; Good atomic copy operations?

    EleSSaR^ wrote:
    > robert took the trouble to write all these musings on
    > comp.lang.python:
    >
    >
    >>own deepcopy: thus, do you already know if the existing deepcopy has the
    >>same problem as cPickle.dump ? (as the problem araises rarely, it is
    >>difficult for me to test it out)

    >
    > I don't know the exact specs of your object, and I don't know what
    > operations are you performing on that object, nor the way they're atomic.


    There is not much to know. Python object trees consist only of dicts and
    lists as far as variable non-atomic data structures are concerned
    (unless you use advanced extension libs like NumPy).

    Thus the RuntimeError problem is only about dicts/lists modified during
    iteration in pickle/copy.


    > It seems like you're trying to save periodically the state of such object
    > while it is being modified (a sort of backup?), and Python complains about
    > that. A self-implemented deepcopy might raise anomalies (i.e. your dumped
    > object may be partly a 'before' object and partly an 'after' object ) as
    > well.


    Yes, a "backup" / autosave while all threads are running. It doesn't
    matter whether it happens 'before' or 'after' another item has been
    added/deleted atomically.


    > By the way, you could try employing locks from other threads to dump the
    > object as well... this would prevent additional locking.


    I don't understand.
    The threads all work simultaneously on the object tree; they atomically
    add and detach only valid sub-trees.

    Regarding what AM said, I would have to lock _each_ dict/list operation
    on the tree, thus almost every change, because even a single attribute
    change "subobj.x='y'" is a dictionary operation. That would make
    threaded programming very arduous.

    AFAIK, in the current Python implementation this RuntimeError is only
    thrown "during true iteration over a dict/list, when the number of items
    changes" (and not when e.g. a single item is changed). Thus a

    def rt_save_dict_copy():
        # fromd is the shared dict; .keys() returns a list copy in Python 2
        tod = {}
        for k in fromd.keys():
            try:
                tod[k] = fromd[k]
            except KeyError:  # key was deleted by another thread meanwhile
                pass
        return tod

    without true iteration over the original dict would copy without a
    RuntimeError.

    (or maybe equivalently: "dict(fromd.items())"?)

    I don't know whether dict.copy() works that way, but I think so, as
    dict.keys() and dict.items() have the same footprint.

    The algorithm in cPickle.dump does not work that way. I guess it does
    something like "for k in fromd: ..."(!) internally. This might be a
    "90% bug"?

    I will have to see what copy/deepcopy does ...

    Robert
     
    robert, Mar 11, 2006
    #14
  15. robert

    robert Guest

    Re: "RuntimeError: dictionary changed size during iteration" ; Good atomic copy operations?

    Tim Peters wrote:

    > [robert]
    >
    >>...
    >>PS: how does ZODB work with this kind of problem? I thought is uses cPickle?

    >
    >
    > It does. Each thread in a ZODB application typically uses its own
    > connection to a database. As a result, each thread gets its own
    > consistent view of database objects, which can (and routinely does)
    > vary across threads. No app-level synchronization is necessary
    > because no sharing of in-memory objects occurs. When N threads each
    > load a single persistent object from its own connection, N distinct
    > in-memory revisions of that object are created (one per connection ==
    > one per thread). If more than one thread modifies the same persistent
    > object, the first thread to commit its changes "wins", and later
    > threads that try to commit changes to the same object may suffer a
    > ConflictError exception at commit time. Between transaction
    > boundaries, each thread has an independent view of database state.
    > Pragmatically, it's much more like programming with multiple processes
    > than with multiple threads.


    Thanks for those details.
    So when committing objects with multithreaded changes on a complex
    object into ZODB, it would raise the same RuntimeError on altered
    dicts/lists...

    ---

    Looked up copy.py meanwhile:

    copy and deepcopy use :

    def _copy_dict(x):
        return x.copy()
    d[types.DictionaryType] = _copy_dict

    .....
    def _deepcopy_dict(x, memo):
        y = {}
        memo[id(x)] = y
        for key, value in x.iteritems():
            y[deepcopy(key, memo)] = deepcopy(value, memo)
        return y
    d[types.DictionaryType] = _deepcopy_dict


    Thus deepcopy (but not copy) also seems to expose itself to this
    RuntimeError, as .iteritems() will iterate over the original dict!
    (It would maybe be better to use x.items() here, as it maybe was before
    py2.2.)

    It's the same problem as with cPickle.dump. Thus there seems to be no
    RuntimeError-safe way in the standard Python lib to get a "current view"
    of an object tree in threaded applications.

    I guess it would be wiser not to expose deepcopy, cPickle.dump etc. to
    this kind of RuntimeError unnecessarily.
    The speed gain of the iterator method, if any, is minor compared to
    the app crash problems, which are not easy to discover and work around
    (because they happen rarely on fast computers).
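
    A rough Python 3 sketch of the kind of copy meant here, assuming a tree
    made only of dicts, lists and leaf values, with no cycles. The function
    name snapshot is hypothetical. Each container is first captured with
    list(...) and only that private list is iterated, so the loop never walks
    a live dict. Whether the list(...) call itself can be interrupted by
    another thread is an implementation detail of GIL-based CPython, not a
    language guarantee, and the result may still mix 'before' and 'after'
    states of different sub-trees.

```python
def snapshot(obj):
    """Recursively copy dicts and lists without iterating the live containers.

    Assumes no reference cycles; leaf objects are shared, not copied.
    """
    if isinstance(obj, dict):
        # list(obj.items()) captures the pairs in one call; we then iterate
        # over that private list rather than over the shared dict.
        return {k: snapshot(v) for k, v in list(obj.items())}
    if isinstance(obj, list):
        return [snapshot(v) for v in list(obj)]
    return obj

tree = {"a": [1, 2, {"b": 3}], "c": {"d": []}}
copy_of_tree = snapshot(tree)
tree["a"][2]["b"] = 99      # later changes do not reach the copy
tree["c"]["new"] = 1
```

    The copy could then be pickled at leisure, since no other thread holds a
    reference to it.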

    Robert
     
    robert, Mar 11, 2006
    #15
  16. robert

    robert Guest

    Global Lock for Python Threading ? - Re: "RuntimeError: dictionary changed ...

    robert wrote:
    >
    > Guess it would be more wise to not expose deepcopy, cPickle.dump etc. to
    > this kind of RuntimeError unnecessarily.
    > The speed gain of the iterator-method - if any - is minor, compared to
    > the app crash problems, which are not easy to discover and work-around
    > (because they happen rarely on fast computers).



    I searched the thread and threading modules for a function to generally
    lock/unlock all other Python threads from execution. I did not find
    anything like that.

    (That would be very useful in some threading applications to protect
    critical sections without forcing the whole application to be populated
    with lock objects.
    Of course, such a function should be used with care (and "finally"), but
    it should be there to make thread programming easier...)

    Robert
     
    robert, Mar 12, 2006
    #16
  17. Re: "RuntimeError: dictionary changed ... & Ruby

    robert <> wrote:
    ...
    > What? When I add/del an item to a dict or list, this is not an atomic
    > thread-safe operation?


    Exactly: there is no such guarantee in the Python language.

    > E.g.:
    > One thread does things like d['x']='y'
    > Another thread reads d['z'] or sets d['z']='w' or dels something.
    >
    > If those operations are not atomic, then you'd have to use locks all the
    > time to not get RuntimeErrors and worse !?


    If you want to be writing correct Python, yes. A preferred approach is
    to simply avoid sharing objects among threads, except for objects
    designed to be thread-safe (chiefly Queue.Queue).
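
    One way to read that advice, sketched in Python 3 (where the module is
    spelled queue rather than Queue): a single owner thread holds the tree and
    is the only one that ever touches it; other threads send it commands
    through a queue, including the request to save. The command tuples and
    the names owner, producer and commands are invented for this sketch.

```python
import pickle
import queue
import threading

commands = queue.Queue()   # the only object shared between threads
snapshots = []             # pickled states produced by the owner thread

def owner():
    """The only thread that ever reads or writes the tree."""
    tree = {}
    while True:
        cmd = commands.get()
        if cmd[0] == "set":
            _, key, value = cmd
            tree[key] = value
        elif cmd[0] == "save":
            snapshots.append(pickle.dumps(tree))   # no writer can interfere
        elif cmd[0] == "stop":
            break

def producer(start):
    for i in range(start, start + 100):
        commands.put(("set", i, i * i))

owner_thread = threading.Thread(target=owner)
owner_thread.start()
producers = [threading.Thread(target=producer, args=(n * 100,)) for n in range(3)]
for t in producers:
    t.start()
for t in producers:
    t.join()
commands.put(("save",))
commands.put(("stop",))
owner_thread.join()
final_state = pickle.loads(snapshots[-1])
```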

    > Infact I rely on that all the time and standard python modules also do
    > so AFAIK


    You're relying on an accident of a specific, particular implementation;
    if any Python-coded standard library module does likewise, and I'm not
    aware of any, that's somewhat different (since that module is PART of
    the implementation, it may rely on all kinds of implementation details,
    correctly if maybe not wisely). The situation is quite different for
    C-coded modules in the CPython implementation, Java-coded ones in the
    Jython one, C#-coded ones in the IronPython one; each of these is subject
    to specific constraints that it's perfectly wise to rely on (since each
    implementation, as the language specification fully allows it to do,
    adopts a different locking strategy at these low levels).

    > I think cPickle has not necessarily to iterate free over native dicts.


    It's not forced to by language specification, but neither is it
    forbidden. It would be an absurd implementation strategy to waste time
    and space to extract a dict's keys() first, as it would NOT buy
    "atomicity" anyway -- what if some other thread deletes keys while
    you're looping, or calls any nonatomic method on the very value you're
    in the process of serializing?!

    In some Python implementations, a C-coded module may count on some
    atomicity as long as it doesn't explicitly allow other threads nor ever
    call back into ANY Python-coded part, but obviously cPickle cannot avoid
    doing that, so even in those implementations it will never be atomic.

    > Whats does copy/deepcopy/[:] ?


    Roughly the same situation.


    If as you indicate you want to stick with a Python-like language but do
    not want to change your style to make it correct Python, you could
    perhaps fork the implementation into an "AtomicPython" in which you
    somehow fix all nonatomicities (not sure how that would even be possible
    if pickling, deepcopying or anything else ever needs to fork into Python
    coded parts, but perhaps you might make the GIL into a reentrant lock
    and somehow hack it to work, with some constraints). Or perhaps you
    might be able to write an extension containing atomicset, atomicget,
    atomicpickle, and other operations you feel you need to be atomic (again
    barring the difficulties due to possible callbacks into Python) and use
    those instead of bare Python primitives.


    Alex
     
    Alex Martelli, Mar 12, 2006
    #17
  18. EleSSaR^

    EleSSaR^ Guest

    Re: "RuntimeError: dictionary changed size during iteration" ; Good atomic copy operations?

    robert took the trouble to write all these musings on comp.lang.python:

    > Yes, a "backup" / autosave while all threads are running. It doesn't
    > matter if 'before' of 'after' another item has been added/deleted
    > atomically.


    But it does matter if the autosave happens *while* an item is being
    updated, I suppose. E.g. if a single 'atomic' operation would change two
    dictionaries, and an autosave triggers after the first has been changed and
    the second hasn't, this would be an unwanted autosave, right?

    >> By the way, you could try employing locks from other threads to dump the
    >> object as well... this would prevent additional locking.

    >
    > Don't understand.
    > The threads work all simulatniously on the object tree, add and detach
    > atomically only valid sub-trees.


    You're never using any lock, then? Isn't it possible that two threads try
    changing the very same dict/list at the same time? Just one more question:
    are you running your software on a single-cpu machine?

    > change "subobj.x='y'" is a dictionary operation. That would make
    > threaded programming very arduous.


    Well... threaded programming usually is a hard task. No surprise so many
    people prefer async programming nowadays. It makes many things simpler.

    > def rt_save_dict_copy()
    > tod={}
    > for k in fromd.keys():
    > try: tod[k]=fromd[k]
    > except: pass
    > return tod
    >
    > without true iteration over the original dict whould copy without
    > RuntimeError.


    But with no guarantee of data consistency. It will keep newly added
    values out of the copy, but if one value from the dict is modified during
    iteration, the copy may be left in a state that never existed:

    import random
    random.seed()
    fromd = {1:1, 2:2, 3:3, 4:4, 5:5}

    print "dict before iteration:", fromd
    def rt_save_dict_copy():
        tod={}
        for k in fromd.keys():
            try:
                tod[k]=fromd[k]
            except:
                pass
            fromd[random.choice(xrange(1,6))] = random.choice(xrange(1,10))
        return tod

    print "copied dict:", rt_save_dict_copy()
    print "dict after copy:", fromd
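
    The same point as a deterministic Python 3 sketch, with the interleaving
    written out by hand instead of left to chance. The invariant (a + b == 10)
    and the names state, transfer and copy_with_interleaving are made up for
    illustration; the call to transfer() in the middle of the copy stands in
    for another thread running at that moment.

```python
# Invariant the writer maintains between its updates: a + b == 10
state = {"a": 4, "b": 6}

def transfer(amount):
    """Writer: two separate assignments, so the pair is not atomic."""
    state["a"] -= amount
    state["b"] += amount

def copy_with_interleaving():
    """Key-by-key copy with a writer simulated between the two reads."""
    copied = {}
    copied["a"] = state["a"]   # reads a == 4 (before the transfer)
    transfer(3)                # simulated: another thread runs here
    copied["b"] = state["b"]   # reads b == 9 (after the transfer)
    return copied

copied = copy_with_interleaving()
print(copied, sum(copied.values()))   # {'a': 4, 'b': 9} 13
print(state, sum(state.values()))     # {'a': 1, 'b': 9} 10
```

    No RuntimeError is raised, yet the copy holds a combination of values that
    the live dict never had.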




    --
    EleSSaR^ <>
    --
    Remove .xyz from my email address to contact me.
     
    EleSSaR^, Mar 12, 2006
    #18
  19. robert

    robert Guest

    Pythons (undefined) Practical Atoms ? - Re: "RuntimeError: dictionary changed ...

    Alex Martelli wrote:

    > robert <> wrote:
    > ...
    >
    >>What? When I add/del an item to a dict or list, this is not an atomic
    >>thread-safe operation?

    >
    > Exactly: there is no such guarantee in the Python language.
    >
    >>E.g.:
    >>One thread does things like d['x']='y'
    >>Another thread reads d['z'] or sets d['z']='w' or dels something.
    >>
    >>If those operations are not atomic, then you'd have to use locks all the
    >>time to not get RuntimeErrors and worse !?

    >
    > If you want to be writing correct Python, yes. A preferred approach is
    > to simply avoid sharing objects among threads, except for objects
    > designed to be thread-safe (chiefly Queue.Queue).


    I don't know the Python language (non-?)definition about this. But the
    day that becomes a requirement in the Python implementation, I'll put
    the last Python in a safe :) ( ..and rethink my bad opinion about Ruby )

    For example, hundreds of things like sre._cache and tens of thousands of
    common global variables are shared "thread safe" in the standard lib
    without locks.

    ;-) They will never change any Python implementation and do the work of
    putting millions of lock.acquire()'s into the standard lib...


    >>Infact I rely on that all the time and standard python modules also do
    >>so AFAIK

    >
    > You're relying on an accident of a specific, particular implementation;
    > if any Python-coded standard library module does likewise, and I'm not
    > aware of any, that's somewhat different (since that module is PART of
    > the implementation, it may rely on all kinds of implementation details,
    > correctly if maybe not wisely). The situation is quite different for
    > C-coded modules in the CPython implementation, Java-coded ones in the
    > Jython one, C#-coded one in the IronPython one; each of these is subject
    > to specific constraints that it's perfectly wise to rely on (since each
    > implementation, as the language specification fully allows it to do,
    > adopts a different locking strategy at these low levels).


    The other implementations would also have a hard time rewriting the
    standard lib.
    Python byte code is kind of "defined" and always interpreted similarly, and..

    >>> import dis
    >>> d={'a':1}
    >>> def f():
    ...     print d['a']
    ...     print d.keys()
    ...     d['b']=2
    ...
    >>> dis.disassemble(f.func_code)
      2           0 LOAD_GLOBAL              0 (d)
                  3 LOAD_CONST               1 ('a')
                  6 BINARY_SUBSCR
                  7 PRINT_ITEM
                  8 PRINT_NEWLINE

      3           9 LOAD_GLOBAL              0 (d)
                 12 LOAD_ATTR                1 (keys)
                 15 CALL_FUNCTION            0
                 18 PRINT_ITEM
                 19 PRINT_NEWLINE

      4          20 LOAD_CONST               2 (2)
                 23 LOAD_GLOBAL              0 (d)
                 26 LOAD_CONST               3 ('b')
                 29 STORE_SUBSCR
                 30 LOAD_CONST               0 (None)
                 33 RETURN_VALUE
    >>>



    ...things like LOAD_CONST / STORE_SUBSCR will be atomic as long as there
    is a GIL or at least a GIL during execution of one byte code. No
    threaded script language can reasonably afford to have thread-switching
    open down to native microprocessor execution atoms.
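
    For comparison, a small Python 3 sketch of the same kind of inspection.
    It assumes nothing beyond the standard dis module; the helper opnames and
    the two sample functions are invented here. Exact opcode names differ
    between Python versions, so the sketch only compares how many
    instructions each statement compiles to. The point it illustrates is
    limited: a plain item store is one store instruction, while an augmented
    assignment such as d["n"] += 1 is a load, an add and a store, with room
    for a thread switch in between.

```python
import dis

def plain_store(d):
    d["b"] = 2          # a single subscript store

def read_modify_write(d):
    d["n"] += 1         # load the item, add, then store it back

def opnames(func):
    """Return the bytecode instruction names of func, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

store_ops = opnames(plain_store)
rmw_ops = opnames(read_modify_write)
print(store_ops)
print(rmw_ops)
```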


    >>I think cPickle has not necessarily to iterate free over native dicts.

    >
    > It's not forced to by language specification, but neither is it
    > forbidden. It would be an absurd implementation strategy to waste time
    > and space to extract a dict's keys() first, as it would NOT buy
    > "atomicity" anyway -- what if some other thread deletes keys while
    > you're looping, or calls any nonatomic method on the very value you're
    > in the process of serializing?!


    First I look at the practical requirement: I have the threads and the
    fast object tree and the need to autosave in this process. And I don't
    want to lock every dict access to the tree just because of the
    autosave (for app reasons I need only a handful of locks so far). And I
    don't want to use a slow & overkill method like ZODB.

    One clean method (A), in order to stay practical, would be possible if a
    global lock for Python threading were offered by the thread lib, as
    described in <duvp5e$2rpm$>

    The other method (B), which I believed I was using so far, is to have a
    little discipline, but no need for massive trivial locking:
    * add complex objects to the hot tree only when the objects are
    complete/consistent.
    * don't do nasty things to removed complex objects; just forget them as
    usual
    * most attribute changes are already app-atomic; the overwhelming majority of
    operations is read-only access anyway - both regarding the number of
    executions and the amount of source code (that is 99% of the reason for
    using this "disciplined" method in threaded apps)
    * ..in case of non-atomic attribute changes, create a new appropriate
    top object with the compound changes (the same logic you in fact have when
    you use ZODB) and replace the whole top object in one step. Or use locks
    in rare cases.
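
    The last point of method (B) could look roughly like this. It is
    only a sketch in Python 3 syntax; the names tree, write_lock,
    update_settings and read_endpoint are invented for illustration.
    A writer builds the complete replacement object off to the side and
    publishes it with one item assignment, so a reader sees either the
    old or the new object, never a half-updated one.

```python
import threading

# Hypothetical shared tree; readers use it without taking any lock.
tree = {"settings": {"host": "a.example", "port": 1}}

# Writers serialise among themselves; readers never touch this lock.
write_lock = threading.Lock()

def update_settings(**changes):
    """Apply several changes as one compound, app-atomic step."""
    with write_lock:
        # Build the new top object privately...
        new_settings = dict(tree["settings"])
        new_settings.update(changes)
        # ...then swap it in with a single item assignment.
        tree["settings"] = new_settings

def read_endpoint():
    """Fetch the reference once, then read only from that object."""
    settings = tree["settings"]
    return settings["host"], settings["port"]

before = tree["settings"]
update_settings(host="b.example", port=2)
```

    The old object is never modified after publication, which is what
    makes lock-free reading workable here. The single-assignment swap
    relies on CPython behaviour rather than on a language guarantee.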

    Now a practical "autosave" method in my app could fit such a practical
    disciplined method ( or not :-( ).

    And there is a reason why the Python standard lib should offer at least
    one "disciplined" method to sample-copy such an object tree, in spite of
    threads. There is no need to get an "arbitrary-Python-atomic" copy of the
    whole tree (which would require (A)). The app itself respects the
    discipline for "app-atomicity" if I use things in this way.

    It's a true practical standard requirement: just a method with no
    "RuntimeError".

    Thus it is useful to have a deepcopy and/or cPickle.dump which does not
    break.

    Is the current dict.copy() exposed to this RuntimeError? AFAIK: no.
    In that case copy.copy() respects "the discipline", but deepcopy & dump
    do not, because of their use of .iteritems(). And most probably it worked
    before py2.2, as I got no such errors with that app in those times.

    As for the cost of dict.keys(): I guess avoiding it hardly even pays
    off in terms of speed. It would be better to stay disciplined in
    those few critical places like dump and deepcopy, which are
    supposed to duplicate much more, and more expensive, things anyway. It's
    a weak O(1) issue.

    So far, it's simple: I'd have to write my own dump or deepcopy after
    Python changes, because of new "fun" with RuntimeErrors !

    At least an alternate runtime-safe version of deepcopy/dump would fit
    into a practical Python, if the defaults cannot be kept flat.
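
    A minimal sketch of what such a runtime-safe copy might look like,
    in Python 3 syntax. The name snapshot_copy is invented here, and the
    sketch assumes a tree built only from dicts, lists and immutable
    leaves. It also assumes, as a CPython implementation detail and not
    a language guarantee, that dict.copy() and list slicing complete
    without another thread changing the container underneath them.
    Unlike copy.deepcopy it does not handle cycles or shared references.

```python
import pickle

def snapshot_copy(obj):
    """Recursively copy a tree of dicts and lists via private snapshots."""
    if isinstance(obj, dict):
        # Take the snapshot in one call, then iterate only over the
        # private copy, never over the shared dict itself.
        frozen = obj.copy()
        return {key: snapshot_copy(value) for key, value in frozen.items()}
    if isinstance(obj, list):
        return [snapshot_copy(item) for item in obj[:]]
    return obj  # leaves are shared, not copied

shared = {"a": [1, 2], "b": {"c": 3}}

# Iterating the live dict while it grows raises the error in question,
# even in a single thread.
error = None
try:
    for key in shared:
        shared[key + "_new"] = None
except RuntimeError as exc:
    error = str(exc)

# Copy first, then pickle the private copy.
copy_of_tree = snapshot_copy(shared)
payload = pickle.dumps(copy_of_tree)
```

    The result is a sample of the tree in the sense of method (B): each
    container is consistent on its own, but different branches may have
    been sampled at slightly different moments.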


    > In some Python implementations, a C-coded module may count on some
    > atomicity as long as it doesn't explicitly allow other threads nor ever
    > call back into ANY python-coded part, but obviously cpickle cannot avoid
    > doing that, so even in those implementations it will never be atomic.


    The degree of atomicity defines the degree of usability for programming
    ideas. And that should not be lowered as it makes thread programming in
    VHL script languages so practical, when you just can do most things like
    d['a']='b' without thread worries.

    There is no theoretical threshold in a practical (=connected) world:

    Each OS and ASM/C level relies on CPU-time- & memory-atoms. In fact,
    otherwise, without such atoms, digital "platonic" computers could not
    interfere with the "threaded" reality at all. That is a
    _natural_requirement_ of "ideas" to _definitely_ be "atomic".
    The CPU bit x clocktick is defined now by ~1.8eV/kT in the real world =>
    no computing error within 10^20 years.

    ( Only something like biological neuro-brains or analog computers
    can offer more of "free threading". But believe me, even the neuro- and
    quantum-threading respects the Planck-h-quantum: the time-resolution of
    "thread-interaction" is limited by energy restrictions and light speed )

    If Python really has not yet defined its time-atoms, that should go on
    the To-Do list ASAP. At worst, its atoms are those of ASM - I hope it's
    better...


    >>What does copy/deepcopy/[:] do?

    >
    > Roughly the same situation.
    >
    > If as you indicate you want to stick with a Python-like language but do
    > not want to change your style to make it correct Python, you could
    > perhaps fork the implementation into an "AtomicPython" in which you
    > somehow fix all nonatomicities (not sure how that would even be possible
    > if pickling, deepcopying or anything else ever needs to fork into Python
    > coded parts, but perhaps you might make the GIL into a reentrant lock
    > and somehow hack it to work, with some constraints). Or perhaps you
    > might be able to write an extension containing atomicset, atomicget,
    > atomicpickle, and other operations you feel you need to be atomic (again
    > barring the difficulties due to possible callbacks into Python) and use
    > those instead of bare Python primitives.


    Each Python _is_ an AtomicPython. And that sets its VHL value. Maybe it
    just doesn't know itself so far?

    I asked maybe for a _big_ practical atom, and now have to make it
    myself, because Python became smaller :)

    The practical problem is: you must rely on and deeply know the real
    atoms of Python and the lib in order to know how many atoms/locks you
    have to ensure on your own.
    At a certain level, Python would lose its value. This
    disciplined-threading/dump/deepcopy issue here is of course somewhat
    debatable in the upper region of VHL. (I certainly like that direction)

    But what you sketch here on the far other side about
    take-care-about-all-dicts-in-future-Python-threads is the door to the
    underworld and is certainly a plan for a Non-Python.

    "take care of everything" is in fact an excuse for heading towards low
    level in the future. (Ruby for example has in many ways more "greed"
    towards the low level and to "arbitrary fun" - too much for me - and
    without offering more real power)

    Python evolution was mostly ok and careful. But care should be taken
    that not too many negative ghosts break VHL atoms:

    One such ghost went into _deepcopy_dict/dump (for little LL speed
    issues). Other ghosts broke rexec ( a nice "Atom" which I miss very much
    now ), ...

    A function for locking python threading globally in the thread module
    would be kind of a practical "hammer" to maintain VHL code in deliberate
    key cases. (Similar to what you do with 'cli' in device driver code, but
    for a python process.)
    For example: "I know as programmer, if I can afford to lock app threads
    during a small multi-attribute change or even during .deepcopy etc.; The
    cost for this is less than the need to spread individual locks massively
    over the whole threaded app."
    For example socket.getaddrinfo does this unpredictably to all
    app threads for minutes (!) in bad cases internally on OS-level anyway.

    Python should name its Atoms.


    Robert
     
    robert, Mar 12, 2006
    #19
  20. Re: "RuntimeError: dictionary changed size during iteration" ; Good atomic copy operations?

    [robert]
    > In very rare cases a program crashes (hard to reproduce) :
    >
    > * several threads work on an object tree with dict's etc. in it. Items
    > are added, deleted, iteration over .keys() ... ). The threads are "good"
    > in such terms, that this core data structure is changed only by atomic
    > operations, so that the data structure is always consistent regarding
    > the application. Only the change-operations on the dicts and lists
    > itself seem to cause problems on a Python level ..
    >
    > * one thread periodically pickle-dumps the tree to a file:
    > >>> cPickle.dump(obj, f)

    >
    > "RuntimeError: dictionary changed size during iteration" is raised by
    > .dump ( or a similar "..list changed ..." )
    >
    > What can I do about this to get a stable pickle-dump without risiking
    > execution error or even worse - errors in the pickled file ?
    >
    > Is a copy.deepcopy ( -> "cPickle.dump(copy.deepcopy(obj),f)" ) an
    > atomic opertion with a guarantee to not fail?


    No. It is non-atomic.

    It seems that your application design intrinsically incorporates a race
    condition -- even if deepcopying and pickling were atomic, there would
    be no guarantee whether the pickle dump occurs before or after another
    thread modifies the structure. While that design smells of a rat, it
    may be that your apps can accept a dump of any consistent state and
    that possibly concurrent transactions may be randomly included or
    excluded without affecting the result.

    Python's traditional recommendation is to put all access to a resource
    in one thread and to have other threads communicate their transaction
    requests via the Queue module. Getting results back was either done
    through other Queues or by passing data through a memory location
    unique to each thread. The latter approach has become trivially simple
    with the advent of Py2.4's thread-local variables.
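
    A small sketch of that single-owner arrangement, written for
    Python 3 (where the Queue module is spelled queue). The owner
    function and the tuple-based request format are made up for this
    example; only the owner thread ever touches the tree, so the pickle
    dump cannot collide with a modification.

```python
import pickle
import queue
import threading

def owner(requests):
    """Sole thread that touches `tree`; all others send requests."""
    tree = {}
    while True:
        op, args, reply = requests.get()
        if op == "set":
            key, value = args
            tree[key] = value
        elif op == "delete":
            tree.pop(args, None)
        elif op == "dump":
            # Safe: no other thread can modify tree during the dump.
            reply.put(pickle.dumps(tree))
        elif op == "stop":
            break

requests = queue.Queue()
worker = threading.Thread(target=owner, args=(requests,))
worker.start()

# Any thread may queue transactions; they are applied in order.
requests.put(("set", ("a", 1), None))
requests.put(("set", ("b", 2), None))
requests.put(("delete", "a", None))

# Results come back through a second queue.
reply = queue.Queue()
requests.put(("dump", None, reply))
snapshot = pickle.loads(reply.get())

requests.put(("stop", None, None))
worker.join()
```

    The cost is that reads also have to go through the queue (or work
    from a dumped copy), which is the trade-off against lock-free
    reading of a shared tree.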

    Thinking about future directions for Python threading, I wonder if
    there is a way to expose the GIL (or simply impose a temporary
    moratorium on thread switches) so that it becomes easy to introduce
    atomicity when needed:

    gil.acquire(BLOCK=True)
    try:
        # do some transaction that needs to be atomic
    finally:
        gil.release()



    > Or can I only retry several times in case of RuntimeError? (which would
    > apears to me as odd gambling; retry how often?)


    Since the app doesn't seem to care when the dump occurs, it might be
    natural to put it in a while-loop that continuously retries until it
    succeeds; however, you still run the risk that other threads may never
    leave the object alone long enough to dump completely.
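
    One way to bound that risk is to cap the number of attempts. The
    following is only a sketch in Python 3 syntax; dump_with_retry and
    its attempts and delay parameters are invented for illustration. It
    pickles to memory first, so a failed attempt never leaves a
    half-written file behind.

```python
import pickle
import time

def dump_with_retry(obj, attempts=5, delay=0.01):
    """Return the pickled bytes, retrying when a concurrent change is detected."""
    for _ in range(attempts):
        try:
            # Serialise in memory; write to disk only after this succeeds.
            return pickle.dumps(obj)
        except RuntimeError:
            # "dictionary changed size during iteration" or similar:
            # back off briefly and try again.
            time.sleep(delay)
    raise RuntimeError("object kept changing; gave up after %d attempts" % attempts)
```

    A caller would write the returned bytes to the file in one step.
    Note that a successful attempt only means no size change was
    detected during iteration; it does not by itself guarantee an
    application-consistent snapshot.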


    Raymond
     
    Raymond Hettinger, Mar 13, 2006
    #20