garbage collection / cyclic references

Discussion in 'Python' started by Aaron Brady, Mar 21, 2009.

  1. Aaron Brady

    Aaron Brady Guest

    Hello,

    I was reading and Googling about garbage collection, reference
    counting, and the problem of cyclic references.

    Python's garbage collection module claims to be able to detect and
    break cyclic garbage. Some other languages merely prohibit it. Is
    this the place to ask about its technique?

    I understand that the disadvantage is a non-deterministic order of
    deletion/finalization. It's an acceptable cost for the system I am
    considering. I may even impose an arbitrary but consistent order on
    it, such as by class name, by id, etc. (tm!).

    After all the hullabaloo, I don't see that Python's purportedly
    successful strategy amounts to anything more than a breadth-first
    search, plus some optimizations. Did I miss something? What are its
    caveats and fragilities? If free software can do it, why isn't it all
    over the industry? What disqualifies it from solved-problem status?

    Finally, what are the costs and benefits of an additional
    '__gc_clear__' special method, for example, that is the equivalent of
    the 'tp_clear' method on extension types? Or, perhaps, a
    '__gc_cycle__' method, that is called with the attribute names of
    attributes that lead to a cycle, that would be called prior to the
    '__del__' method?
     
    Aaron Brady, Mar 21, 2009
    #1
    1. Advertising

  2. Aaron Brady

    Paul Rubin Guest

    "andrew cooke" <> writes:
    > the two dominant virtual machines - .net and the jvm both handle circular
    > references with no problem whatever.


    AFAIK, they also don't guarantee that finalizers ever run, much less
    run in deterministic order.
     
    Paul Rubin, Mar 21, 2009
    #2
    1. Advertising

  3. Aaron Brady

    Aaron Brady Guest

    On Mar 20, 8:12 pm, "andrew cooke" <> wrote:
    > Aaron Brady wrote:
    >
    > [...]
    >
    > > caveats and fragilities?  If free software can do it, why isn't it all
    > > over the industry?  What disqualifies it from solved-problem status?

    >
    > the two dominant virtual machines - .net and the jvm both handle circular
    > references with no problem whatever.  this is standard in modern garbage
    > collection - go read a book on the subject (personally i like grune et
    > al's modern compiler design).  it *is* a solved problem.  if anything,
    > python is behind the curve, not ahead of it, but this may change with the
    > next generation of python implementations (pypy targets a variety of vms,
    > i think).
    >
    > as for the extra methods you suggest - why do you want to expose
    > implementation details in an api?  that is not the normal aim of good
    > design.
    >
    > andrew


    "Circular references ...can only be cleaned up if there are no Python-
    level __del__() methods involved." __del__ doc.

    "Python doesn’t collect ... cycles automatically because, in general,
    it isn’t possible for Python to guess a safe order in which to run the
    __del__() methods." gc.garbage doc.

    "Errors should never pass silently." -The Zen of Python

    I advance that cyclic objects should be notified when their external
    references go to zero, but their usual '__del__' is inappropriate. If
    objects implement a __del__ method, they can choose to implement a
    '__gc_cycle__' method, and then just drop the specified attribute. It
    needn't be called on every object in the cycle, either; once it's
    called on one object, another object's normal __del__ may be safely
    called. Output, unproduced:

    >>> del x

    In X.__gc_cycle__, 'other' attribute. Deleting...
    In Y.__del__.
    In X.__del__.
    >>>


    The actual backend of CPython requires garbage-collected container
    types to implement tp_inquiry and tp_clear methods, but user-defined
    types apparently aren't required to conform.

    Supporting Cyclic Garbage Collection
    http://docs.python.org/3.0/c-api/gcsupport.html
     
    Aaron Brady, Mar 21, 2009
    #3
  4. > The actual backend of CPython requires garbage-collected container
    > types to implement tp_inquiry and tp_clear methods, but user-defined
    > types apparently aren't required to conform.


    tp_inquiry doesn't exist, you probably mean tp_traverse. tp_traverse
    is completely irrelevant for python-defined types; the VM can traverse
    a user-defined type just fine even without the help of tp_traverse.
    If a C-defined type fails to implement tp_traverse when it should,
    then garbage collection breaks entirely.

    tp_clear isn't invoked for an object at all if the object is in a
    cycle with finalizers, so it's not something that you can use to
    detect that you are in a cycle with finalizers.

    Cycles with finalizers are considered a bug; application programmers
    should check gc.garbage at the end of the program to determine whether
    they have this bug. There is an easy design pattern around it, so I'm
    -1 on complicating the GC protocol.

    Regards,
    Martin
     
    Martin v. Löwis, Mar 21, 2009
    #4
  5. Aaron Brady

    Aaron Brady Guest

    On Mar 21, 7:54 am, "andrew cooke" <> wrote:
    > Paul Rubin wrote:
    > > "andrew cooke" <> writes:
    > >> the two dominant virtual machines - .net and the jvm both handle
    > >> circular
    > >> references with no problem whatever.

    >
    > > AFAIK, they also don't guarantee that finalizers ever run, much less
    > > run in deterministic order.

    >
    > i think you're right, but i'm missing your point - perhaps there was some
    > sub-context to the original post that i didn't understand?
    >
    > finalizers should not be considered part of a public resource management
    > api - they should not be used to do things like flushing and closing
    > files, for example.   i think this was a minor "issue" early in java's
    > adoption (i guess because of incorrect assumptions made by c++
    > programmers) (in python the with context is a much better mechanism for
    > this kind of thing - the best java has is the finally statement).  but
    > it's one of those things that (afaik) isn't an issue once you fully
    > embrace the language (rather like, say, semantically meaningful
    > indentation).
    >
    > but i'm sure you know all that, so i'm still wondering what i've missed.
    >
    > andrew


    Theoretically, my object should be able to maintain an open resource
    for its lifetime; and its clients shouldn't need to know what its
    lifetime is. Therefore, it needs a callback when that is over.

    If finalization methods could be called in a structurally sound
    manner, they could be relied on to handle flushing and closing files.

    > they should not be used to do things like flushing and closing
    > files, for example.


    What is your basis for this claim, if it's not the mere unreliability
    of finalization? IOW, are you not merely begging the question?
     
    Aaron Brady, Mar 21, 2009
    #5
  6. Aaron Brady

    Aaron Brady Guest

    On Mar 21, 9:50 am, "andrew cooke" <> wrote:
    > Aaron Brady wrote:
    > > On Mar 21, 7:54 am, "andrew cooke" <> wrote:
    > >> they should not be used to do things like flushing and closing
    > >> files, for example.

    > > What is your basis for this claim, if it's not the mere unreliability
    > > of finalization?  IOW, are you not merely begging the question?

    >
    > I'm not sure it's clear, but I was talking about Java.
    >
    > As Paul implied, a consequence of completely automated garbage management
    > is that it is (from a programmer's POV) deterministic.  So it's a

    [indeterministic]
    > programming error to rely on the finalizer to free resources that don't
    > follow that model (ie any resource that's anything other that

    [than]
    > reasonable
    > amounts of memory).
    >
    > That's pretty much an unavoidable consequence of fully automated garbage
    > collection.  You can pretend it's not, and try using finalizers for other
    > work if you want.  That's fine - it's your code, not mine.  I'm just
    > explaining how life is.
    >
    > Andrew


    Hi, nice to talk to you this early. Sorry you're in a bad mood.
    You've sure come to the right place to find friends though. </tongue
    in cheek>

    My point is, that garbage collection is able to detect when there are
    no program-reachable references to an object. Why not notify the
    programmer (the programmer's objects) when that happens? If the
    object does still have other unreachable references, s/he should be
    informed of that too.

    I advanced an additional method to this end. Do you argue that there
    aren't any cases in which the class could make use of the information;
    or there aren't /enough/ cases so in which?

    Perhaps it would help to handle a contrary case by hand. Two objects
    need to make write operations each to the other when they are closed.
    Would it be sufficient in general, knowing nothing further about them,
    to queue some information, and close? Do they always know at design-
    time their references will be cyclic? Would a mere
    '__leave_reachability__' method be more generally informative or
    robust? Would it constitute a two-step destruction, to notify objects
    when they're unreachable, and then finalize? The two objects' write
    operations could execute in such a method, without risking prior
    destruction.
     
    Aaron Brady, Mar 21, 2009
    #6
  7. Aaron Brady

    Aaron Brady Guest

    On Mar 21, 10:28 am, Aaron Brady <> wrote:
    > On Mar 21, 9:50 am, "andrew cooke" <> wrote:
    >
    >
    >
    > > Aaron Brady wrote:
    > > > On Mar 21, 7:54 am, "andrew cooke" <> wrote:
    > > >> they should not be used to do things like flushing and closing
    > > >> files, for example.
    > > > What is your basis for this claim, if it's not the mere unreliability
    > > > of finalization?  IOW, are you not merely begging the question?

    >
    > > I'm not sure it's clear, but I was talking about Java.

    >
    > > As Paul implied, a consequence of completely automated garbage management
    > > is that it is (from a programmer's POV) deterministic.  So it's a

    > [indeterministic]
    > > programming error to rely on the finalizer to free resources that don't
    > > follow that model (ie any resource that's anything other that

    > [than]
    > > reasonable
    > > amounts of memory).

    >
    > > That's pretty much an unavoidable consequence of fully automated garbage
    > > collection.  You can pretend it's not, and try using finalizers for other
    > > work if you want.  That's fine - it's your code, not mine.  I'm just
    > > explaining how life is.

    >
    > > Andrew

    >
    > My point is, that garbage collection is able to detect when there are
    > no program-reachable references to an object.  Why not notify the
    > programmer (the programmer's objects) when that happens?  If the
    > object does still have other unreachable references, s/he should be
    > informed of that too.

    snip

    I took the liberty of composing a sample cyclic reference detector. I
    will post the class definition later on in the discussion (when and
    if).

    The 'run' method resets the globals to a sample graph, as
    illustrated. 'p' and 's' start out with one simulated program-visible
    reference each. As you see, the details are already long and boring
    (yum). I added comments post-facto.

    >>> run() #only decref 'p'

    p: (q), q: (pr), r: (q), s: (p)
    >>>
    >>> p.decref() #not safe to delete

    {<p,2>: 1, <q,2>: 0, <r,1>: 0}
    >>>
    >>>
    >>> run() #decref 'p' then 's'

    p: (q), q: (pr), r: (q), s: (p)
    >>>
    >>> p.decref()

    {<p,2>: 1, <q,2>: 0, <r,1>: 0}
    >>>
    >>> s.decref()

    {<s,0>: 0, <p,2>: 0, <r,1>: 0, <q,2>: 0}
    <s,0> ALL zero #'s' safe to delete
    {<p,1>: 0, <q,2>: 0, <r,1>: 0}
    <p,1> ALL zero #also deletes 'p', also safe
    finalizing <s,0>
    >>>
    >>>
    >>> run()

    p: (q), q: (pr), r: (q), s: (p)
    >>>
    >>> s.decref()

    {<s,0>: 0, <p,3>: 1, <r,1>: 0, <q,2>: 0}
    {<p,2>: 1, <q,2>: 0, <r,1>: 0}
    finalizing <s,0> #deletion
    >>>
    >>> p.decref()

    {<p,1>: 0, <q,2>: 0, <r,1>: 0}
    <p,1> ALL zero #'p' safe to delete
    >>>
    >>>
    >>> run()

    p: (q), q: (pr), r: (q), s: (p)
    >>>
    >>> s.decref()

    {<s,0>: 0, <p,3>: 1, <q,2>: 0, <r,1>: 0}
    {<p,2>: 1, <q,2>: 0, <r,1>: 0}
    finalizing <s,0> #'p' not safe, reference still visible

    We notice the duplicate 'all zero' indicator on run #2. The cycle
    detector ran on 's.decref', then 's' called 'p.decref', then the cycle
    detector ran on that. 'q' and 'r' are safe to delete on runs 2 and 3.

    Here is the implementation of 'final':

    def final( self ):
    for _, v in self.__dict__.items( ):
    if not isinstance( v, G ):
    continue
    v.decref( )
    print( 'finalizing', self )

    The object should be asked to finish its references (cyclic only?),
    but remain alive. The programmer should see that the state is
    consistent. Later, its __del__ will be called.

    We can decide that '__leave_reachability__' will be called without
    nesting; and/or that '__del__' will be called without nesting, by
    breaking finalization in to two steps.

    FTR, this makes __leave_reachability__ about the equivalent of
    tp_clear, since tp_traverse is prior defined for user-defined types.
     
    Aaron Brady, Mar 21, 2009
    #7
  8. Aaron Brady

    John Nagle Guest

    Aaron Brady wrote:
    > Hello,
    >
    > I was reading and Googling about garbage collection, reference
    > counting, and the problem of cyclic references.
    >
    > Python's garbage collection module claims to be able to detect and
    > break cyclic garbage. Some other languages merely prohibit it. Is
    > this the place to ask about its technique?
    >
    > I understand that the disadvantage is a non-deterministic order of
    > deletion/finalization.


    Garbage collection and destructors or "finalizers" don't play well
    together. It's a fundamental problem. Calling finalizers from the
    garbage collector is painful. It introduces concurrency where the
    user may not have expected it. Consider what happens if a finalizer
    tries to lock something. What if GC runs while that lock is locked?
    This can create a deadlock situation. Calling finalizers from the
    garbage collector can result in intermittent, very hard to find bugs.

    C++ takes destructors seriously; objects are supposed to be destructed
    exactly once, and if they're of "auto" scope (a local object in the
    Python sense) they will reliably be cleaned up at block exit.
    Microsoft's "Managed C++" broke those rules; in Managed C++,
    destructors can be called more than once. (Look up "re-animation"
    in Microsoft Managed C++ literature. It's not pretty.)

    Python actually has reference counting backed up by garbage collection.
    Most objects are destroyed as soon as all references to them disappear.
    Garbage collection is only needed to deal with cycles.

    Python has "weak references", which won't keep an object around
    once all the regular references are deleted. These are useful in
    some situations. In a tree, for example, pointers towards the leaves
    should be strong pointers, while back-pointers towards the root should
    be weak pointers.

    I once modified BeautifulSoup, the HTML parser, to use weak pointers
    that way. BeautifulSoup trees are big and don't go away immediately when
    no longer needed, because they have backpointers. They hang around until
    the next GC cycle. With the version that uses weak pointers, they go away
    as soon as they're no longer needed. We've found this useful in a web
    crawler; the data space used drops and actual GC runs are no longer
    necessary.

    Personally, I'd argue that the right answer is to prohibit cycles of
    strong pointers. That should be considered a programming error, and
    detected at run time, at least by debugging tools. With weak pointers,
    you don't really need cycles of strong pointers.

    The advantage of this is a clean order of destruction. This is useful
    in window widget systems, where you have objects with pointers going in many
    directions, yet object destruction has substantial side effects.

    Python originally had only reference counting, and didn't have weak pointers.
    If weak pointers had gone in before the garbage collector, Python might have
    gone in this direction.

    John Nagle
     
    John Nagle, Mar 21, 2009
    #8
  9. Aaron Brady

    Aaron Brady Guest

    On Mar 21, 1:04 pm, John Nagle <> wrote:
    > Aaron Brady wrote:
    > > Hello,

    >
    > > I was reading and Googling about garbage collection, reference
    > > counting, and the problem of cyclic references.

    >
    > > Python's garbage collection module claims to be able to detect and
    > > break cyclic garbage.  Some other languages merely prohibit it.  Is
    > > this the place to ask about its technique?

    >
    > > I understand that the disadvantage is a non-deterministic order of
    > > deletion/finalization.  

    >
    >      Garbage collection and destructors or "finalizers" don't play well
    > together.  It's a fundamental problem.  Calling finalizers from the
    > garbage collector is painful.  It introduces concurrency where the
    > user may not have expected it.  Consider what happens if a finalizer
    > tries to lock something.  What if GC runs while that lock is locked?
    > This can create a deadlock situation.  Calling finalizers from the
    > garbage collector can result in intermittent, very hard to find bugs.


    As I understand it, 'obj.decref()' can call 'other.decref()', which
    can try to access its reference to 'obj', which has already begun
    cleanup. At that point, 'obj' is in an inconsistent state. Its own
    methods are secretly called during its '__del__'.

    One step would be to serialize this process, so that 'other.decref()'
    gets deferred until 'obj.decref()' has completed. Then attributes are
    in an all-or-nothing state, which is at least consistent. However,
    that means every external reference in a '__del__' method has to be
    wrapped in an exception handler, one at a time, because the object /
    might/ already be in a reference cycle. (Non-container types are
    excepted.) The remaining reference merely needs to change its class
    to a ReclaimedObject type. That's acceptable if documented. I also
    believe it solves the potential for deadlock.

    > (Look up "re-animation"
    > in Microsoft Managed C++ literature.  It's not pretty.)


    Pass!

    >      Python actually has reference counting backed up by garbage collection.
    > Most objects are destroyed as soon as all references to them disappear.
    > Garbage collection is only needed to deal with cycles.
    >
    >      Python has "weak references", which won't keep an object around
    > once all the regular references are deleted.  These are useful in
    > some situations.  In a tree, for example, pointers towards the leaves
    > should be strong pointers, while back-pointers towards the root should
    > be weak pointers.

    snip
    >
    >     Personally, I'd argue that the right answer is to prohibit cycles of
    > strong pointers.  That should be considered a programming error, and
    > detected at run time, at least by debugging tools.  With weak pointers,
    > you don't really need cycles of strong pointers.


    Reference cycles can be detected anyway with debug tools, even prior
    to destruction. My objection is that would complicate control flow
    severely:

    for x in y:
    z.append( x )

    becomes:

    for x in y:
    if cyclic_ref( x ):
    z.append( weakref.ref( x ) )
    else:
    z.append( x )

    And worse, every attribute access has to be wrapped.

    for x in z:
    if isinstance( x, __builtins__.weakref ):
    if x() is not None:
    print( x() )
    else:
    print( x )

    In other words, it interferes with uniform access to attributes and
    container members. However, in the case where you know a structure a
    priori, it's a good technique, as your example showed. I observe that
    my proposal has the same weakness!

    If you make the case that you usually do know the structure your data
    have, I won't be able to disprove it. The example would come from a
    peer-to-peer representation of something, or storage of relational
    data.

    Regardless, the group has responded to most of my original post. I
    don't think I emphasized however that I'm designing an allocation
    system that can contain reference cycles; and I was asking if such
    special methods, '__gc_cycle__( self, attr )' or '__gc_clear__
    ( self )' would be right for me. I'm also interested in feedback
    about the serialization method of ref. counting earlier in this post.

    >     The advantage of this is a clean order of destruction.  This is useful
    > in window widget systems, where you have objects with pointers going in many
    > directions, yet object destruction has substantial side effects.
    >
    >     Python originally had only reference counting, and didn't have weak pointers.
    > If weak pointers had gone in before the garbage collector, Python might have
    > gone in this direction.
    >
    >                                 John Nagle
     
    Aaron Brady, Mar 21, 2009
    #9
  10. Aaron Brady

    Aaron Brady Guest

    On Mar 21, 11:59 am, "andrew cooke" <> wrote:
    > Aaron Brady wrote:
    > > My point is, that garbage collection is able to detect when there are
    > > no program-reachable references to an object.  Why not notify the
    > > programmer (the programmer's objects) when that happens?  If the
    > > object does still have other unreachable references, s/he should be
    > > informed of that too.

    >
    > i think we're mixing python-specific and more general / java details, but,
    > as far as i understand things, state of the art (and particularly
    > generational) garbage collectors don't guarantee that objects will ever be
    > reclaimed.  this is a trade for efficiency, and it's a trade that seems to
    > be worthwhile and popular.


    It's at best worthless, but so is politics. I take it back; you can
    reclaim memory in large numbers with a probabilistic finalizer. The
    expected value of reclaiming a KB with a 90% chance of call is .9 KB.

    The allocation structure I am writing will have a very long up-time.
    I can forcibly reclaim the memory of an object involved in a cycle,
    but lingering references it has will never be detected. Though, if I
    can't guarantee 100% reclamation, I'll have to be anticipating a
    buffer dump eventually anyway, which makes, does it not, 90% about the
    same as 99%.

    > furthermore, you're mixing responsibilities for two logically separate
    > ideas just because a particular implementation happens to associate them,
    > which is not a good idea from a design pov.


    I think a silent omission of finalization is the only alternative. If
    so they're mixed, one way or the other. I argue it is closer to
    associating your class with a hash table: they are logically separate
    ideas. Perhaps implementation is logically separate from design
    altogether.

    > i can remember, way back in the mists of time


    I understand they were having a fog problem there yesterday... not to
    mention a sale on sand. "Today: Scattered showers and thunderstorms
    before 1pm, then a slight chance of showers."

    > using java finalizers for
    > doing this kind of thing.  and then learning that it was a bad idea.  once
    > i got over the initial frustration, it really hasn't been a problem.  i
    > haven't met a situation


    I don't suppose I imagine one. So, you could argue that it's a low
    priority. Washing your hands of the rare, though, disqualifies you
    from the associate's in philosophy. I bet you want to meet my
    customers, too.

    > where i needed to tie resource management and
    > memory management together (except for interfacing with c code that does
    > not use the host language's gc - and i can imagine that for python this is
    > a very strong (perhaps *the*) argument for reference counting).


    I'm using a specialized mapping type to implement the back end of user-
    defined classes. Since I know the implementation, which in particular
    maps strings to objects, I can always just break cycles by hand; that
    is, until someone wants a C extension. Then they will want tp_clear
    and tp_traverse methods.

    > as an bonus example, consider object caching - a very common technique
    > that immediately breaks anything that associates other resources with
    > memory use.


    I assume your other processes are notified of the cache state. From
    what I understand, Windows supports /named/ caching. Collaborators
    can check the named cache, in the process creating it if it doesn't
    exist, and read and write at will there.

    > just because, in one limited case, you can do something, doesn't mean it's
    > a good idea.
    >
    > andrew


    But you're right. I haven't talked this over much on the outside, so
    I might be missing something huge, and serialized two-step
    finalization (tm) is the secret least of my worries.
     
    Aaron Brady, Mar 22, 2009
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. jimjim
    Replies:
    15
    Views:
    580
    Chris Smith
    Nov 9, 2003
  2. Thomas Mailund

    Cyclic garbage collection and segfaults...

    Thomas Mailund, Jan 14, 2004, in forum: Python
    Replies:
    3
    Views:
    321
    Thomas Mailund
    Jan 15, 2004
  3. Øyvind Isaksen
    Replies:
    1
    Views:
    1,001
    Øyvind Isaksen
    May 18, 2007
  4. moerchendiser2k3

    Python leaks in cyclic garbage collection

    moerchendiser2k3, Feb 19, 2011, in forum: Python
    Replies:
    6
    Views:
    245
  5. Chris Angelico
    Replies:
    0
    Views:
    118
    Chris Angelico
    Apr 26, 2013
Loading...

Share This Page