Debugging a difficult refcount issue.

buck

I'm getting a fatal Python error: "Fatal Python error: GC object already tracked"[1].

Using gdb, I've pinpointed the place where the error is detected. It is an empty dictionary which is marked as in-use. This is somewhat helpful, since I can reliably find the memory address of the dict, but it does not help me pinpoint the issue. I was able to find the piece of code that allocates the problematic dict via a malloc/LD_PRELOAD interposer, but that code was pure Python, so I don't think it was the cause.

I believe that the dict was deallocated, cached, and re-allocated via PyDict_New to a C routine with bad refcount logic. The above error then manifests when the dict is again deallocated, cached, and re-allocated.
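
To make that hypothesis concrete, here is a toy model in plain C. It is only a sketch under stated assumptions: `ToyDict`, `toy_new`, `toy_incref`, `toy_decref` and `run_scenario` are made-up names, the free list is a simplified stand-in for the interpreter's dict cache, and the specific bug (one extra decref followed by continued use of the stale pointer) is just one way such an error could arise, not necessarily the bug in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for a GC-tracked dict. This is NOT CPython's PyDictObject. */
typedef struct ToyDict {
    int refcnt;   /* reference count */
    int tracked;  /* 1 while the "collector" considers the object in use */
} ToyDict;

#define POOL_SIZE 4
static ToyDict pool[POOL_SIZE];        /* backing storage, zero-initialized */
static int pool_used = 0;
static ToyDict *free_list[POOL_SIZE];  /* cache of deallocated objects */
static int num_free = 0;
int already_tracked_errors = 0;        /* counts the toy "fatal errors" */

/* Hand out an object, preferring the cache. Returns NULL if the pool is
 * exhausted or if the object taken from the cache is already tracked. */
ToyDict *toy_new(void)
{
    ToyDict *d;
    if (num_free > 0)
        d = free_list[--num_free];
    else if (pool_used < POOL_SIZE)
        d = &pool[pool_used++];
    else
        return NULL;
    if (d->tracked) {
        fprintf(stderr, "toy: GC object already tracked (%p)\n", (void *)d);
        already_tracked_errors++;
        return NULL;
    }
    d->refcnt = 1;
    d->tracked = 1;
    return d;
}

void toy_incref(ToyDict *d) { d->refcnt++; }

/* Drop a reference; at zero, untrack the object and cache it for reuse. */
void toy_decref(ToyDict *d)
{
    if (--d->refcnt == 0) {
        d->tracked = 0;
        if (num_free < POOL_SIZE)
            free_list[num_free++] = d;
    }
}

/* Replays the hypothesized sequence; returns the number of errors seen. */
int run_scenario(void)
{
    ToyDict *a, *x, *y;

    a = toy_new();       /* legitimate owner holds the only reference */
    assert(a != NULL);
    toy_decref(a);       /* BUG: a routine drops a reference it never owned */

    /* The object is now cached, but its owner keeps using the stale pointer. */
    toy_incref(a);       /* 0 -> 1 */
    toy_decref(a);       /* 1 -> 0: cached a second time */

    x = toy_new();       /* reuses the same memory, tracks it */
    assert(x == a);
    y = toy_new();       /* pops the same object again: already tracked */
    assert(y == NULL);
    return already_tracked_errors;
}
```

In this model the error is reported by the second `toy_new` call, far away from the extra `toy_decref` that caused it, which would be consistent with the allocating code looking innocent.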

I tried to pinpoint this intermediate allocation with a similar PyDict_New/LD_PRELOAD interposer, but that isn't working for me[2].

How should I go about debugging this further? I've been completely stuck on this for two days now :(

[1] http://hg.python.org/cpython/file/99af4b44e7e4/Include/objimpl.h#l267
[2] http://stackoverflow.com/questions/8549671/cant-intercept-pydict-new-with-ld-preload
 
Paul Rubin

buck said:
I tried to pinpoint this intermediate allocation with a similar
PyDict_New/LD_PRELOAD interposer, but that isn't working for me[2].

Did you try a gdb watchpoint?
 
buck

Paul Rubin said:
Did you try a gdb watchpoint?

I didn't try that, since that piece of code is run millions of times, and I don't know the dict-id I'm looking for until after the problem has occurred.
 
Jack Diederich

I don't have any great advice; that kind of issue is hard to pin down.
That said, do try using a Python build configured with --with-pydebug.
With that you can turn your unit tests on and off to pinpoint where
the refcounts are getting messed up. It also causes python to use
plain malloc()s so valgrind becomes useful. Worst case, add assertions
and printf()s in the places you think are most janky.

-Jack
 
buck

Thanks Jack. I think printf is what it will come down to. I plan to put a little code into PyDict_New to print the id and the line at which it was allocated. Hopefully this will show me all the possible suspects and I can figure it out from there.

I hope figuring out the file and line-number from within that code isn't too hard.
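
On the file and line-number question: a C function cannot see its caller's `__FILE__` and `__LINE__` by itself, so a common trick is to wrap it in a macro that expands at the call site. The sketch below shows the idea with a made-up allocator; `traced_alloc`, `TRACED_ALLOC`, `AllocSite` and `find_site` are hypothetical names, not part of CPython. It also records the C call site only, not the Python source line, and it only covers callers that are recompiled with the macro in effect.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Record of where one allocation happened. */
typedef struct AllocSite {
    void *ptr;
    const char *file;
    int line;
} AllocSite;

#define MAX_SITES 1024
static AllocSite sites[MAX_SITES];
static int num_sites = 0;

/* Hypothetical traced allocator: allocates, then logs the location that
 * the caller passed in. */
void *traced_alloc(size_t size, const char *file, int line)
{
    void *p;
    assert(file != NULL);
    p = malloc(size);
    if (p != NULL && num_sites < MAX_SITES) {
        sites[num_sites].ptr = p;
        sites[num_sites].file = file;
        sites[num_sites].line = line;
        num_sites++;
    }
    fprintf(stderr, "alloc %p at %s:%d\n", p, file, line);
    return p;
}

/* The macro expands where it is used, so __FILE__ and __LINE__ name the
 * caller rather than traced_alloc itself. */
#define TRACED_ALLOC(size) traced_alloc((size), __FILE__, __LINE__)

/* Find the most recent recorded site that returned ptr; NULL if unknown. */
const AllocSite *find_site(const void *ptr)
{
    int i;
    for (i = num_sites - 1; i >= 0; i--) {
        if (sites[i].ptr == ptr)
            return &sites[i];
    }
    return NULL;
}
```

Once the crash reveals the address of the bad object, searching the log (or calling `find_site` from the debugger) for that address lists every site that handed it out. Searching newest first matters because a cached object is returned from many sites over its lifetime.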


buck

This is what I came up with:
https://gist.github.com/1496028

We'll see tomorrow whether it helps.

