Troubleshooting garbage collection issues

Discussion in 'Python' started by davemerkel@gmail.com, Nov 17, 2007.

  1. Guest

    Hi folks - wondering if anyone has any pointers on troubleshooting
    garbage collection. My colleagues and I are running into an
    interesting problem:

    Intermittently, we get into a situation where the garbage collection
    code is running in an infinite loop. The data structures within the
    garbage collector have been corrupted, but it is unclear how or why.
    The problem is unpredictable, which makes it extremely difficult to
    reproduce consistently.

    The infinite loop itself occurs in update_refs in gcmodule.c. After
    hitting this in the debugger a couple of times, it appears that
    one of the nodes in the second or third generation list contains a
    pointer to the first generation head node. The first generation was
    cleared shortly before the call into this function, so its head node's
    prev and next pointers point back to itself. Once the loop reaches that
    node, it spins forever.
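
    To make the failure mode concrete, here is a toy Python model of the
    structure described above. It is not CPython code, and Node, append and
    walk are hypothetical names. It assumes, as in gcmodule.c, that each
    generation is a circular doubly linked list with a sentinel head, and
    that update_refs follows next pointers until it arrives back at the head
    of the list being collected. A node whose next pointer still aims at a
    different, now empty and self-referencing, head never gets there; the
    toy walk has a step limit only so it can report the loop instead of
    hanging.

```python
class Node:
    """One list entry; a fresh node doubles as an empty list's sentinel head."""

    def __init__(self, name):
        self.name = name
        # An empty circular list: the head's next and prev point at itself.
        self.next = self
        self.prev = self


def append(head, node):
    """Link node in at the tail of the circular list that starts at head."""
    tail = head.prev
    tail.next = node
    node.prev = tail
    node.next = head
    head.prev = node


def walk(head, max_steps=100):
    """Follow next pointers from head until arriving back at head.

    Returns (visited_names, finished). The real C loop has no step limit;
    max_steps exists only so this toy can report the infinite loop.
    """
    visited = []
    node = head.next
    while node is not head:
        if len(visited) >= max_steps:
            return visited, False
        visited.append(node.name)
        node = node.next
    return visited, True


gen0 = Node("gen0-head")
gen2 = Node("gen2-head")
for name in ("a", "b", "c"):
    append(gen2, Node(name))

print(walk(gen2))  # (['a', 'b', 'c'], True)

# Simulate the corruption: "b" still points at gen0's head, which has just
# been reset to an empty list whose next pointer refers to itself.
node_b = gen2.next.next
node_b.next = gen0
stuck_names, finished = walk(gen2, max_steps=10)
print(finished, stuck_names[:4])  # False ['a', 'b', 'gen0-head', 'gen0-head']
```

    One stale pointer is enough: the walk never returns to gen2's head, so
    the loop condition can never become false.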

    Chances are another module we're depending on has done something
    hinky with GC. The challenge is tracking that down. If anyone has
    seen something like this before and has either pointers to specific GC
    usage issues that can create this behavior or some additional thoughts
    on tricks to track it down to the offending module, they would be most
    appreciated.

    You can assume we've done some of the "usual" things - hacking up
    gcmodule to spit information when the condition occurs, various
    headstands and gymnastics in an attempt to identify reliable steps to
    reproduce - the challenge is the layers of indirection that we think
    are likely present between the manifestation of the problem and the
    module that produced it.
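
    On an interpreter that has gc.callbacks (Python 3.3 and later, so not
    the 2007-era interpreter in this thread), a cheaper cousin of hacking up
    gcmodule is a Python-level hook that records every collection. The
    sketch below logs the phase, the generation and the triggering thread,
    flushing each line so that a final "start" with no matching "stop"
    survives a hang. make_gc_logger, its parameters and _log_to_stderr are
    hypothetical names.

```python
import gc
import sys
import threading


def _log_to_stderr(line):
    # Flush immediately so the last line is not lost if the process hangs.
    print(line, file=sys.stderr, flush=True)


def make_gc_logger(emit=_log_to_stderr, min_generation=0):
    """Build a gc.callbacks hook that logs collections of min_generation or older."""

    def hook(phase, info):
        # gc calls hook("start", info) before and hook("stop", info) after
        # each collection; info["generation"] is the oldest generation collected.
        generation = info["generation"]
        if generation >= min_generation:
            emit(f"gc {phase} gen={generation} "
                 f"thread={threading.current_thread().name}")

    return hook


# Example: capture the events of one explicit collection in a list.
captured = []
demo_hook = make_gc_logger(emit=captured.append)
gc.callbacks.append(demo_hook)
gc.collect()
gc.callbacks.remove(demo_hook)
print("\n".join(captured))
```

    In a long-running process you would install it once at startup, for
    example gc.callbacks.append(make_gc_logger(min_generation=1)) to skip
    the frequent young-generation collections.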

    Many thanks,

    Dave
    Nov 17, 2007
    #1

  2. <dave..mail.com> (Dave) wrote:

    8<--------- description of horrible problem --------------

    Faced with this, I would:

    1 - identify the modules that import gc to separate the
    sheep from the goats.

    2 - do my best to change gc-importing goats back to sheep.

    3 - amongst the remaining goats, identify the ones that also use
    threads (supergoats), and take a long hard look at them.

    4 - hope I get lucky.

    5 - If no luck, I would change the most complex of the
    supergoats to use more processes and messaging,
    to make sheep out of a supergoat, or failing that,
    a goat and some sheep.

    6 - Repeat from 2 until luck strikes.

    Now the trouble with a simple-minded algorithm such as
    the above is that a sheep could be at the bottom of the
    trouble if it uses threads. So a module is only a lamb if
    it neither uses threads nor makes calls into gc...
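
    Steps 1 and 3 can be partly automated for pure-Python modules. The
    sketch below sorts already-imported modules with a crude heuristic (an
    assumption, not part of the recipe above): a module "imports gc" if a
    module-level name is bound to the gc module, and "uses threads" if one
    is bound to threading or _thread. Following the caveat, a threaded
    module that does not touch gc is reported as "sheep", not "lamb". C
    extension modules that call the C-level thread or GC APIs are invisible
    to this check, so treat its output as a starting list only.
    classify_modules and _fake_module are hypothetical helper names.

```python
import gc
import sys
import threading
import types


def classify_modules(modules=None):
    """Sort modules into lamb / sheep / goat / supergoat buckets.

    lamb: binds neither gc nor a thread module at module level
    sheep: binds a thread module but not gc (still suspect, per the caveat)
    goat: binds gc but no thread module
    supergoat: binds both
    """
    if modules is None:
        modules = dict(sys.modules)  # copy: imports may mutate sys.modules
    gc_id = id(gc)
    thread_ids = {id(sys.modules[name]) for name in ("threading", "_thread")
                  if name in sys.modules}
    buckets = {"lamb": [], "sheep": [], "goat": [], "supergoat": []}
    for name in sorted(modules):
        module = modules[name]
        if not isinstance(module, types.ModuleType):
            continue  # sys.modules can hold placeholders that are not modules
        # Identities of everything bound at module level.
        bound_ids = {id(value) for value in list(vars(module).values())}
        uses_gc = gc_id in bound_ids
        uses_threads = not thread_ids.isdisjoint(bound_ids)
        if uses_gc and uses_threads:
            buckets["supergoat"].append(name)
        elif uses_gc:
            buckets["goat"].append(name)
        elif uses_threads:
            buckets["sheep"].append(name)
        else:
            buckets["lamb"].append(name)
    return buckets


def _fake_module(name, **attrs):
    """Build a throwaway module object with the given globals (demo only)."""
    module = types.ModuleType(name)
    vars(module).update(attrs)
    return module


demo = {
    "plain": _fake_module("plain"),
    "worker": _fake_module("worker", threading=threading),
    "tuner": _fake_module("tuner", gc=gc),
    "pool": _fake_module("pool", gc=gc, threading=threading),
}
print(classify_modules(demo))
# {'lamb': ['plain'], 'sheep': ['worker'], 'goat': ['tuner'], 'supergoat': ['pool']}
```

    Calling classify_modules() with no argument runs the same check over
    sys.modules once the application has finished importing.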

    HTH

    - Hendrik
    Hendrik van Rooyen, Nov 18, 2007
    #2

  3. On Nov 17, 10:34 am, Dave wrote:
    > 8<--------- snip --------------
    >
    > You can assume we've done some of the "usual" things - hacking up
    > gcmodule to spit information when the condition occurs, various
    > headstands and gymnastics in an attempt to identify reliable steps to
    > reproduce - the challenge is the layers of indirection that we think
    > are likely present between the manifestation of the problem and the
    > module that produced it.


    Does "usual things" also include compiling with --with-pydebug?

    You could also try the various memory debuggers. A refcounting error
    is the first thing that comes to mind, although I can't see offhand
    how this specific problem would come about.

    Are you using threading at all?

    Do you see any pattern to the types that have the bogus pointers?
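
    Two of these questions can be partly checked from Python itself. A
    --with-pydebug build exposes sys.gettotalrefcount(), which release
    builds lack, and tallying GC-tracked objects by type before and after
    exercising a suspect code path can show which types are growing. The
    sketch below uses only gc.get_objects(); is_debug_build,
    tracked_type_counts, type_drift and the Suspect class are hypothetical
    names. It cannot see the corrupted list pointers themselves, which
    still need a C-level debugger.

```python
import gc
import sys
from collections import Counter


def is_debug_build():
    """True on a --with-pydebug interpreter, which defines sys.gettotalrefcount."""
    return hasattr(sys, "gettotalrefcount")


def tracked_type_counts():
    """Count the objects the cyclic GC currently tracks, keyed by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def type_drift(before, after, min_delta=1):
    """Return {type name: growth} for types whose count rose by min_delta or more."""
    grown = after - before  # Counter subtraction keeps only positive differences
    return {name: delta for name, delta in grown.items() if delta >= min_delta}


class Suspect:
    """Hypothetical stand-in for objects created by a suspect module."""


print("debug build:", is_debug_build())
before = tracked_type_counts()
kept_alive = [Suspect() for _ in range(50)]  # the "suspect code path"
after = tracked_type_counts()
drift = type_drift(before, after, min_delta=50)
print(drift)
```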

    --
    Adam Olsen, aka Rhamphoryncus
    Rhamphoryncus, Nov 18, 2007
    #3
  4. Guest

    Thanks for the thoughts - much appreciated! The threaded super-goat
    was indeed the offender. A very aggressive QA tester got us enough of
    a pattern to identify the offending module: pyOpenSSL. After looking
    at it closely, we found problems with its thread handling.
    In particular, it does not properly hold the GIL when manipulating
    reference counts and also, in one case, when creating a new Python
    object. Once we cleaned that up, we were unable to reproduce the
    problem.

    We'll post the fixes back to the pyOpenSSL folks shortly.

    Thanks again!

    Dave
    Nov 28, 2007
    #4

Similar Threads
  1. Garbage Collection and Manage Code?
     Laser Lu, Jan 26, 2004, in forum: ASP .Net. Replies: 5, Views: 700.
     Last post: Gaurav Khanna [C# MVP], Jan 27, 2004
  2. Replies: 1, Views: 432.
     Last post: mrstephengross, Jul 25, 2005
  3. Øyvind Isaksen. Replies: 1, Views: 949.
     Last post: Øyvind Isaksen, May 18, 2007
  4. Troubleshooting ASP.Net memory usage issues?
     Daniel Peterson, Nov 14, 2007, in forum: ASP .Net. Replies: 10, Views: 761.
     Last post: Daniel Peterson, Nov 20, 2007
  5. Garbage Collection Issues
     GriffyGriff, Oct 17, 2003, in forum: Javascript. Replies: 0, Views: 79.
     Last post: GriffyGriff, Oct 17, 2003