Queue cleanup

Discussion in 'Python' started by EW, Aug 11, 2010.

  1. EW

    EW Guest

    Hi

    I'm writing a multithreaded app that relies on Queues to move data
    between the threads. I'm trying to write my objects in a general way
    so that I can reuse them in the future so I need to write them in such
    a way that I don't know how many producer and how many consumer
    threads I might need. I also might have different consumer threads do
    different tasks (for example one might write to a log and one might
    write to SQL) so that again means I can't plan for a set ratio of
    consumers to producers. So it's unknown.

    So this means that instead of having 1 Queue that all the producers
    put to and that all the consumers get from I actually have 1 Queue per
    producer thread that the main body sends to the correct type of
    consumer thread. So I could get something like this where 3 producer
    threads write to 3 different Queues all of which get read by 1
    consumer thread:

    P1   P2   P3
      \   |   /
       \  |  /
         C1

    So producers 1, 2, and 3 all write to individual Queues and consumer 1
    had a list of those Queues and reads them all. The problem I'm having
    is that those producer threads can come and go pretty quickly and when
    they die I can cleanup the thread with join() but I'm still left with
    the Queue. So I could get something like this:

    P1        P3
      \       /
       \     /
         C1

    So here the P2 thread has ended and gone away but I still have his
    Queue lingering.

    So on a thread I can use is_alive() to check status and use join() to
    clean up but I don't see any analogous functionality for Queues. How
    do I kill them? I thought about putting a suicide message on the
    Queue and then C1 would read it and set the variable to None but I'm
    not sure setting the variable to None actually makes the Queue go
    away. It could just end up sitting in memory unreferenced - and
    that's not good. Additionally, I could have any number of consumer
    threads reading that Queue so once the first one gets the suicide note
    the other consumer threads never would.

    I figure there has to be an elegant way for managing my Queues but so
    far I can't find it. Any suggestions would be appreciated and thanks
    in advance for any help.


    ps Python rocks.
    EW, Aug 11, 2010
    #1

  2. EW

    EW Guest

    On Aug 11, 12:55 pm, EW <> wrote:
    > [snip]

    Whoo..the formatting got torn up! My terrible diagrams are even more
    terrible! Oh well, I think you'll catch my meaning :)
    EW, Aug 11, 2010
    #2

  3. Paul Rubin

    Paul Rubin Guest

    EW <> writes:
    > I also might have different consumer threads do
    > different tasks (for example one might write to a log and one might
    > write to SQL) so that again means I can't plan for a set ratio of
    > consumers to producers.... So it's unknown.
    >
    > So this means that instead of having 1 Queue that all the producers
    > put to and that all the consumers get from I actually have 1 Queue per
    > producer thread


    That doesn't sound appropriate. Queues can have many readers and many
    writers. So use one queue per task (logging, SQL, etc), regardless of
    the number of producer or consumer threads. Any producer with an SQL
    request sends it to the SQL queue, which can have many listeners. The
    different SQL consumer threads listen to the SQL queue and pick up
    requests and handle them.
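    A minimal runnable sketch of that layout (the task names, sentinel convention, and worker logic here are invented for illustration):

```python
import queue
import threading

# One queue per task type, shared by every producer and consumer.
# The task names ("sql", "logging") are only examples.
queues = {"sql": queue.Queue(), "logging": queue.Queue()}

handled = []          # results collected by the workers

def sql_worker(q):
    # Each consumer blocks in get(); Queue does all the locking.
    while True:
        req = q.get()
        if req is None:       # shutdown sentinel
            q.put(None)       # pass it on so sibling workers stop too
            break
        handled.append(("sql", req))

workers = [threading.Thread(target=sql_worker, args=(queues["sql"],))
           for _ in range(2)]
for w in workers:
    w.start()

# Any producer with an SQL request just targets the "sql" queue:
for req in ("INSERT 1", "INSERT 2", "INSERT 3"):
    queues["sql"].put(req)

queues["sql"].put(None)       # one sentinel; each worker re-queues it
for w in workers:
    w.join()

print(sorted(r for _, r in handled))
```

    Note the producers never need to know how many SQL workers exist, and vice versa.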
    Paul Rubin, Aug 11, 2010
    #3
  4. EW

    EW Guest

    On Aug 11, 1:18 pm, Paul Rubin <> wrote:
    > EW <> writes:
    > > I also might have different consumer threads do
    > > different tasks (for example one might write to a log and one might
    > > write to SQL) so that again means I can't plan for a set ratio of
    > > consumers to producers....  So it's unknown.

    >
    > > So this means that instead of having 1 Queue that all the producers
    > > put to and that all the consumers get from I actually have 1 Queue per
    > > producer thread

    >
    > That doesn't sound appropriate.  Queues can have many readers and many
    > writers.  So use one queue per task (logging, SQL, etc), regardless of
    > the number of producer or consumer threads.  Any producer with an SQL
    > request sends it to the SQL queue, which can have many listeners.  The
    > different SQL consumer threads listen to the SQL queue and pick up
    > requests and handle them.


    I thought about doing it that way and I could do it that way but it
    still seems like there should be a way to clean up Queues on my own.
    If I did it this way then I guess I'd be relying on garbage collection
    when the script ended to clean up the Queues for me.

    What if I want to clean up my own Queues? Regardless of the specifics
    of my current design, I'm just generally curious how people manage
    cleanup of their Queues when they don't want them any more.
    EW, Aug 11, 2010
    #4
  5. MRAB

    MRAB Guest

    EW wrote:
    [snip]
    > So here the P2 thread has ended and gone away but I still have his
    > Queue lingering.
    >
    > So on a thread I can use is_alive() to check status and use join() to
    > clean up but I don't see any analogous functionality for Queues. How
    > do I kill them? I thought about putting a suicide message on the
    > Queue and then C1 would read it and set the variable to None but I'm
    > not sure setting the variable to None actually makes the Queue go
    > away. It could just end up sitting in memory unreferenced - and
    > that's not good. Additionally, I could have any number of consumer
    > threads reading that Queue so once the first one gets the suicide note
    > the other consumer threads never would.
    >
    > I figure there has to be an elegant way for managing my Queues but so
    > far I can't find it. Any suggestions would be appreciated and thanks
    > in advance for any help.
    >

    An object will be available for garbage collection when nothing refers
    to it either directly or indirectly. If it's unreferenced then it will
    go away.

    As for the suicide note, if a consumer sees it then it can put it back
    into the queue so other consumers will see it and then forget about the
    queue (set the variable which refers to the queue to None, or, if the
    references are in a list, delete it from the list).
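    That re-queue-and-forget idea can be sketched like this (a single-threaded demo; the SENTINEL marker and drain helper are invented names):

```python
import queue

SENTINEL = object()   # the "suicide note" marker (name invented here)

def drain(queues):
    """Poll each queue; on seeing the sentinel, put it back for any
    other consumers and drop our own reference to that queue."""
    seen = []
    live = list(queues)           # our references to the queues
    while live:
        for q in list(live):
            try:
                item = q.get_nowait()
            except queue.Empty:
                continue
            if item is SENTINEL:
                q.put(SENTINEL)   # other consumers still get the note
                live.remove(q)    # forget the queue; once every consumer
                                  # does this, gc can reap it
            else:
                seen.append(item)
    return seen

q1, q2 = queue.Queue(), queue.Queue()
for item in ("a", "b"):
    q1.put(item)
q1.put(SENTINEL)
q2.put("c")
q2.put(SENTINEL)

items = drain([q1, q2])
print(sorted(items))   # the sentinel stays queued for other consumers
```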
    MRAB, Aug 11, 2010
    #5
  6. EW

    EW Guest

    On Aug 11, 1:55 pm, MRAB <> wrote:
    > [snip]
    >
    > An object will be available for garbage collection when nothing refers
    > to it either directly or indirectly. If it's unreferenced then it will
    > go away.
    >
    > As for the suicide note, if a consumer sees it then it can put it back
    > into the queue so other consumers will see it and then forget about the
    > queue (set the variable which refers to the queue to None, or, if the
    > references are in a list, delete it from the list).


    Ok great. I wasn't sure about the Garbage collection part of it.
    That's actually pretty easy.

    Thanks!
    EW, Aug 11, 2010
    #6
  7. Paul Rubin

    Paul Rubin Guest

    EW <> writes:
    > I thought about doing it that way and I could do it that way but it
    > still seems like there should be a way to clean up Queues on my own.
    > If I did it this way then I guess I'd be relying on garbage collection
    > when the script ended to clean up the Queues for me.


    Oh, I see. As long as it's possible to start new producer or consumer
    threads that touch a queue, obviously that queue has to still be around.
    If the program starts all its threads at the beginning, then runs til
    they exit, then does more stuff, then you could do something like:

    # make dictionary of queues, one queue per task type
    queues = {'sql': Queue(), 'logging': Queue(), ... }

    for i in <whatever threads you want>:
        threading.Thread(target=your_handler, args=[queues]).start()

    del queues

    and then when all the threads exit, there are no remaining references to
    the queues. But why do you care? Queues aren't gigantic structures,
    they're just a list (collections.deque) with an rlock. It's fine to let
    the gc clean them up; that's the whole point of having a gc in the first
    place.
    Paul Rubin, Aug 11, 2010
    #7
  8. EW

    EW Guest

    On Aug 11, 2:16 pm, Paul Rubin <> wrote:
    > [snip]
    >
    > and then when all the threads exit, there are no remaining references to
    > the queues.  But why do you care?  Queues aren't gigantic structures,
    > they're just a list (collections.deque) with an rlock.  It's fine to let
    > the gc clean them up; that's the whole point of having a gc in the first
    > place.


    Well I cared because I thought garbage collection would only happen
    when the script ended - the entire script. Since I plan on running
    this as a service it'll run for months at a time without ending. So I
    thought I was going to have heaps of Queues hanging out in memory,
    unreferenced and unloved. It seemed like bad practice so I wanted to
    get out ahead of it.

    But the GC doesn't work the way I thought it worked so there's really
    no problem I guess. I was just confused on garbage collection it seems.
    EW, Aug 11, 2010
    #8
  9. Paul Rubin

    Paul Rubin Guest

    EW <> writes:
    > Well I cared because I thought garbage collection would only happen
    > when the script ended - the entire script. Since I plan on running
    > this as a service it'll run for months at a time without ending. So I
    > thought I was going to have heaps of Queues hanging out in memory,
    > unreferenced and unloved. It seemed like bad practice so I wanted to
    > get out ahead of it.


    Even if GC worked that way it wouldn't matter, if you use just one queue
    per type of task. That number should be a small constant so the memory
    consumption is small.
    Paul Rubin, Aug 11, 2010
    #9
  10. MRAB

    MRAB Guest

    Paul Rubin wrote:
    > EW <> writes:
    >> Well I cared because I thought garbage collection would only happen
    >> when the script ended - the entire script. Since I plan on running
    >> this as a service it'll run for months at a time without ending. So I
    >> thought I was going to have heaps of Queues hanging out in memory,
    >> unreferenced and unloved. It seemed like bad practice so I wanted to
    >> get out ahead of it.

    >
    > Even if GC worked that way it wouldn't matter, if you use just one queue
    > per type of task. That number should be a small constant so the memory
    > consumption is small.


    That's basically how _non_-garbage-collected languages work! :)
    MRAB, Aug 11, 2010
    #10
  11. EW

    EW Guest

    On Aug 11, 2:52 pm, Paul Rubin <> wrote:
    > EW <> writes:
    > > Well I cared because I thought garbage collection would only happen
    > > when the script ended - the entire script.  Since I plan on running
    > > this as a service it'll run for months at a time without ending.  So I
    > > thought I was going to have heaps of Queues hanging out in memory,
    > > unreferenced and unloved.  It seemed like bad practice so I wanted to
    > > get out ahead of it.

    >
    > Even if GC worked that way it wouldn't matter, if you use just one queue
    > per type of task.  That number should be a small constant so the memory
    > consumption is small.


    Well I can't really explain it but 1 Queue per task for what I'm
    designing just doesn't feel right to me. It feels like it will lack
    future flexibility. I like having 1 Queue per producer thread object
    and the person instantiating that object can do whatever he wants with
    that Queue. I can't prove I'll need that level of flexibility but I
    don't see why it's bad to have. It's still a small number of Queues,
    it's just a small, variable, number of Queues.
    EW, Aug 11, 2010
    #11
  12. Paul Rubin

    Paul Rubin Guest

    EW <> writes:
    > Well I can't really explain it but 1 Queue per task for what I'm
    > designing just doesn't feel right to me. It feels like it will lack
    > future flexibility.


    That makes no sense at all. Multiple readers and writers per queue are
    the way Python queues are designed to work. The normal way to spray a
    bunch of concurrent tasks to worker threads is just have a bunch of
    workers listening to one queue. It's the same way at the producer end.
    Paul Rubin, Aug 11, 2010
    #12
  13. Aahz

    Aahz Guest

    In article <>,
    MRAB <> wrote:
    >
    >An object will be available for garbage collection when nothing refers
    >to it either directly or indirectly. If it's unreferenced then it will
    >go away.


    This isn't actually garbage collection as most people think of it.
    Refcounting semantics mean that objects get reaped as soon as nothing
    points at them. OTOH, CPython does also have garbage collection to back
    up refcounting so that when you have unreferenced object cycles they
    don't stay around.
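    The two mechanisms are easy to observe in CPython with weakrefs (a small demo; `Node` is just a throwaway class):

```python
import gc
import weakref

class Node:
    pass

# Plain object: its refcount hits zero and CPython reaps it at once.
obj = Node()
ref = weakref.ref(obj)
del obj
print(ref() is None)        # True: refcounting freed it immediately

# Reference cycle: the refcounts never reach zero, so only the backup
# cycle collector can reclaim it.
a, b = Node(), Node()
a.partner, b.partner = b, a
ref2 = weakref.ref(a)
del a, b
print(ref2() is None)       # False: the cycle keeps both alive
gc.collect()                # the tracing collector breaks the cycle
print(ref2() is None)       # True
```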
    --
    Aahz () <*> http://www.pythoncraft.com/

    "...if I were on life-support, I'd rather have it run by a Gameboy than a
    Windows box." --Cliff Wells
    Aahz, Aug 27, 2010
    #13
  14. John Nagle

    John Nagle Guest

    On 8/11/2010 1:26 PM, EW wrote:
    > [snip]
    >
    > Well I can't really explain it but 1 Queue per task for what I'm
    > designing just doesn't feel right to me. It feels like it will lack
    > future flexibility. I like having 1 Queue per producer thread object
    > and the person instantiating that object can do whatever he wants with
    > that Queue. I can't prove I'll need that level of flexibility but I
    > don't see why it's bad to have. It's still a small number of Queues,
    > it's just a small, variable, number of Queues.


    That's backwards. Usually, you want one queue per unique consumer.
    That is, if you have a queue that contains one kind of request,
    there's one thread reading the queue, blocked until some other
    thread puts something on the queue. No polling is needed.

    One consumer reading multiple queues is difficult to implement
    well.
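    That one-blocking-reader-per-queue shape, using the logging task as an example, might look like this (a sketch; names invented):

```python
import queue
import threading

log_q = queue.Queue()
lines = []

def log_consumer():
    # The single reader blocks in get(); no polling, no timeouts.
    while True:
        msg = log_q.get()
        if msg is None:       # shutdown sentinel
            break
        lines.append(msg)

t = threading.Thread(target=log_consumer)
t.start()

# Any number of producers can put() onto the same queue.
for msg in ("start", "work", "stop"):
    log_q.put(msg)

log_q.put(None)
t.join()
print(lines)
```

    Because there is exactly one reader, items are handled strictly in FIFO order.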

    Note, by the way, that CPython isn't really concurrent. Only
    one thread runs at a time, due to an archaic implementation. So
    if your threads are compute-bound, even on a multicore CPU threading
    will not help.

    There's a "multiprocessing module" which allows spreading work
    over several processes instead of threads. That can be helpful
    as a workaround.

    John Nagle
    John Nagle, Aug 27, 2010
    #14
  15. On Fri, 27 Aug 2010 09:16:52 -0700, Aahz wrote:

    > In article <>, MRAB
    > <> wrote:
    >>
    >>An object will be available for garbage collection when nothing refers
    >>to it either directly or indirectly. If it's unreferenced then it will
    >>go away.

    >
    > This isn't actually garbage collection as most people think of it.
    > Refcounting semantics mean that objects get reaped as soon as nothing
    > points at them. OTOH, CPython does also have garbage collection to back
    > up refcounting so that when you have unreferenced object cycles they
    > don't stay around.



    I've repeatedly asked, both here and elsewhere, why reference counting
    isn't "real" garbage collection. Nobody has been able to give me a
    satisfactory answer. As far as I can tell, it's a bit of pretentiousness
    with no basis in objective fact.

    http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
    http://en.wikipedia.org/wiki/Reference_counting

    Reference counting is one specific kind of garbage collection, and like
    all gc strategies, it has strengths as well as weaknesses. It is *simple*
    to implement (which may be why a certain class of programmer likes to
    think it isn't "real" gc). When it runs is deterministic, and is
    immediate upon the resource being freed. The overhead is very light (a
    plus) and continuous (which can be both a plus and a minus). It is better
    suited to managing scarce resources like open files than are tracing
    garbage collectors. It avoids the "embarrassing pause" of tracing
    collectors. It doesn't deal well with reference cycles, and (at least
    with Python's implementation of ref counting) it causes performance
    issues with threaded applications.

    So CPython has two garbage collectors -- a reference counting
    implementation, and a tracing implementation. Jython and IronPython use
    the native garbage collectors from Java and .Net. Other Pythons may use
    something else.


    --
    Steven
    Steven D'Aprano, Aug 28, 2010
    #15
  16. Paul Rubin

    Paul Rubin Guest

    Steven D'Aprano <> writes:
    > I've repeatedly asked, both here and elsewhere, why reference counting
    > isn't "real" garbage collection. Nobody has been able to give me a
    > satisfactory answer. As far as I can tell, it's a bit of pretentiousness
    > with no basis in objective fact.


    Well, it's a bit of a subjective matter. I'd say it's not real gc
    because 1) it's unsound (misses reference cycles), and 2) it requires
    constant attention from the mutator to incr and decr the reference
    counts. So developing modules for the CPython API means endlessly
    finding and fixing refcount bugs that lead to either crashes/security
    failures, or memory leaks. If you program the Java JNI or a typical
    Lisp FFI, you'll find that real gc is a lot simpler to use since you
    avoid all the refcount maintenance hassles. You allocate memory and
    shut your eyes, and the gc takes care of freeing it when it figures out
    that you are done. Refcounting is basically a form of manual memory
    management, while gc is automatic.

    Someone said here recently that as a program gets larger, saying "this
    will work as long as we do X every time without fail" becomes equal to
    saying "this won't work". Substitute "properly maintain all ref counts"
    for X and you can see the problem. I've seen released "production"
    "tested" Python C modules with subtle refcount bugs on more than one
    occasion. In gc'd systems there are fewer places for the code to go
    wrong.
    Paul Rubin, Aug 28, 2010
    #16
  17. On 28 Aug 2010 00:24:12 GMT, Steven D'Aprano
    <> declaimed the following in
    gmane.comp.python.general:

    >
    > I've repeatedly asked, both here and elsewhere, why reference counting
    > isn't "real" garbage collection. Nobody has been able to give me a
    > satisfactory answer. As far as I can tell, it's a bit of pretentiousness
    > with no basis in objective fact.
    >
    > http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
    > http://en.wikipedia.org/wiki/Reference_counting
    >
    > Reference counting is one specific kind of garbage collection, and like
    > all gc strategies, it has strengths as well as weaknesses. It is *simple*


    The nice thing about it is that it is sort of deterministic -- one
    can examine code and determine that an object is collected at some point
    in the execution...

    Heap marking, OTOH, tends to run at indeterminate times, which could
    have an impact if one needs predictable response timings (I still recall
    the Supersoft LISP that ran on my TRS-80 -- the first time per session
    that it ran the garbage collector could take minutes, as it basically
    went through and marked all unused memory in the system as "available"
    in order to add it to the allocator...)
    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
    Dennis Lee Bieber, Aug 28, 2010
    #17
  18. Paul Rubin

    Paul Rubin Guest

    Dennis Lee Bieber <> writes:
    > The nice thing about it [reference counting] is that it is sort
    > of deterministic -- one can examine code and determine that an object
    > is collected at some point in the execution...
    > Heap marking, OTOH, tends to run at indeterminate times, which could
    > have an impact if one needs predictable response timings


    Reference counting has the same problem. If you drop the last reference
    to a complex structure, it could take quite a long time to free all the
    components. By contrast there are provably real-time tracing gc
    schemes, including some parallelizeable ones. One reason CPython still
    can't run threads on parallel cores is it would have to lock the
    reference counts every time they're updated, and the slowdown from that
    is terrible.
    Paul Rubin, Aug 28, 2010
    #18
  19. On Sat, 28 Aug 2010 00:33:10 -0700, Paul Rubin wrote:

    > Dennis Lee Bieber <> writes:
    >> The nice thing about it [reference counting] is that it is sort
    >> of deterministic -- one can examine code and determine that an object
    >> is collected at some point in the execution...
    >> Heap marking, OTOH, tends to run at indeterminate times, which could
    >> have an impact if one needs predictable response timings

    >
    > Reference counting has the same problem.


    In theory, yes, but in practice ref counting tends to spread out the
    performance impact more smoothly. There are exceptions, such as the one
    you mention below, but as a general rule ref counting isn't subject to
    the "embarrassing pauses" that tracing garbage collectors tend to be
    subject to.


    > If you drop the last reference
    > to a complex structure, it could take quite a long time to free all the
    > components. By contrast there are provably real-time tracing gc
    > schemes, including some parallelizeable ones.


    I could be wrong, but how can they not be subject to the same performance
    issue? If you have twenty thousand components that all have to be freed,
    they all have to be freed whether you do it when the last reference is
    cleared, or six seconds later when the gc does a sweep.


    > One reason CPython still
    > can't run threads on parallel cores is it would have to lock the
    > reference counts every time they're updated, and the slowdown from that
    > is terrible.


    On the other hand, the reason that CPython still has reference counting
    is that the alternatives tried so far are unacceptably slow for
    non-threaded code.




    --
    Steven
    Steven D'Aprano, Aug 28, 2010
    #19
  20. On Fri, 27 Aug 2010 18:06:19 -0700, Paul Rubin wrote:

    > Steven D'Aprano <> writes:
    >> I've repeatedly asked, both here and elsewhere, why reference counting
    >> isn't "real" garbage collection. Nobody has been able to give me a
    >> satisfactory answer. As far as I can tell, it's a bit of
    >> pretentiousness with no basis in objective fact.

    >
    > Well, it's a bit of a subjective matter. I'd say it's not real gc
    > because 1) it's unsound (misses reference cycles),


    You can add cycle detection to a reference count gc, at the cost of more
    complexity.

    If you read the Wikipedia article I linked to, tracing algorithms can
    also be unsound:

    Some collectors running in a particular environment can
    correctly identify all pointers (references) in an object;
    these are called "precise" (also "exact" or "accurate")
    collectors, the opposite being a "conservative" or "partly
    conservative" collector. Conservative collectors have to
    assume that any bit pattern in memory could be a pointer if
    (when interpreted as a pointer) it would point into any
    allocated object. Thus, conservative collectors may have
    some false negatives, where storage is not released because
    of accidental fake pointers...

    http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)



    > and 2) it requires
    > constant attention from the mutator to incr and decr the reference
    > counts.


    Yes. And?


    > So developing modules for the CPython API means endlessly
    > finding and fixing refcount bugs that lead to either crashes/security
    > failures, or memory leaks. If you program the Java JNI or a typical
    > Lisp FFI, you'll find that real gc is a lot simpler to use since you
    > avoid all the refcount maintenance hassles. You allocate memory and
    > shut your eyes, and the gc takes care of freeing it when it figures out
    > that you are done. Refcounting is basically a form of manual memory
    > management, while gc is automatic.



    That's a problem with the CPython API, not reference counting. The
    problem is that the CPython API is written at too low a level, beneath
    that at which the garbage collector exists, so naturally you have to
    manually manage memory.


    > Someone said here recently that as a program gets larger, saying "this
    > will work as long as we do X every time without fail" becomes equal to
    > saying "this won't work". Substitute "properly maintain all ref counts"
    > for X and you can see the problem. I've seen released "production"
    > "tested" Python C modules with subtle refcount bugs on more than one
    > occasion. In gc'd systems there are fewer places for the code to go
    > wrong.


    On the other hand, tracing gcs have their own set of problems too, mostly
    due to the use of finalizers and attempts to make garbage collection run
    more predictably. See here:

    http://publib.boulder.ibm.com/infoc...doc.diagnostics.142j9/html/coexistwithgc.html

    Quote:

    For tidying Java resources, think about the use of a clean
    up routine. When you have finished with an object, call the
    routine to null out all references, deregister listeners,
    clear out hash tables, and so on. This is far more efficient
    than using a finalizer and has the useful side-benefit of
    speeding up garbage collection. The Garbage Collector does
    not have so many object references to chase in the next
    garbage collection cycle.


    Translated: "Rather than relying on the garbage collector to clean up
    resources after you, do it yourself, manually, so the garbage collector
    has less work to do."

    Tracing garbage collectors aren't a panacea. They're software themselves,
    and complex software, which means they're subject to bugs like the one
    which plagued Flash plugin 9:

    http://gskinner.com/blog/archives/2008/04/failure_to_unlo.html

    The more complicated the garbage collector, the more scope you have for
    some interaction between your high-level code and the gc leading to
    memory not be reclaimed or extreme slowdown. Like this:

    http://tech.puredanger.com/2009/02/11/linkedblockingqueue-garbagecollection/





    --
    Steven
    Steven D'Aprano, Aug 28, 2010
    #20
