Python C/API based multithread python program locks


kanji

Hi ALL,

I have written a multithreaded Python program in which each thread calls a C function (via a Python/C extension module) to execute some tasks on a remote node. The number of threads equals the number of nodes specified by the user.


The issue is that it works most of the time, but occasionally (and quite randomly) it hangs without generating any errors. While debugging, sometimes even gdb hangs, but I managed to get a backtrace of a hung thread:

#0 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0xb75d11ee in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2 0x0809bb3f in PyThread_acquire_lock ()
#3 0x0809e45c in _PyObject_GC_Del ()
#4 0x0807cad6 in PyEval_GetFuncDesc ()
#5 0x0807abc4 in PyEval_EvalCode ()
#6 0x0807b65e in PyEval_EvalCodeEx ()
#7 0x0807cbbb in PyEval_GetFuncDesc ()
#8 0x0807ab33 in PyEval_EvalCode ()
#9 0x0807b65e in PyEval_EvalCodeEx ()
#10 0x0807cbbb in PyEval_GetFuncDesc ()
#11 0x0807ab33 in PyEval_EvalCode ()
#12 0x0807b65e in PyEval_EvalCodeEx ()
#13 0x0807cbbb in PyEval_GetFuncDesc ()
#14 0x0807ab33 in PyEval_EvalCode ()
#15 0x0807b65e in PyEval_EvalCodeEx ()
#16 0x08078555 in PyEval_EvalCode ()
#17 0x08098569 in PyRun_FileExFlags ()
#18 0x080974d0 in PyRun_SimpleFileExFlags ()
#19 0x08096e1a in PyRun_AnyFileExFlags ()
#20 0x08053ac9 in Py_Main ()
#21 0x08053519 in main ()


To rule out the possibility that this is caused by an error in my code, I called the same function (which creates, say, 100 threads) in a loop 500 times. It hangs at different iterations, say at #480 or #12, and sometimes it runs through smoothly.
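A minimal sketch of such a stress loop that reports which iteration hung instead of blocking forever, assuming a modern Python 3 (the daemon= argument and time.monotonic do not exist in 2.2). fake_task is a hypothetical placeholder for the extension call. One caveat: a join timeout can only return if the main thread gets the GIL back, so this catches threads stuck inside Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS (for example in blocking I/O), but not a thread that hangs while holding the GIL.

```python
import threading
import time

def fake_task(node):
    # Hypothetical placeholder for the C extension call being stress-tested.
    time.sleep(0.01)

def run_iteration(nodes, timeout, task=fake_task):
    """Run one thread per node; return the nodes still running after `timeout` s."""
    threads = {}
    for node in nodes:
        # daemon=True so a hung thread does not keep the interpreter alive at exit
        t = threading.Thread(target=task, args=(node,), daemon=True)
        threads[node] = t
        t.start()
    deadline = time.monotonic() + timeout  # one shared deadline for the whole batch
    hung = []
    for node, t in threads.items():
        t.join(max(0.0, deadline - time.monotonic()))
        if t.is_alive():
            hung.append(node)
    return hung

def stress(iterations=500, n_nodes=100, timeout=30.0, task=fake_task):
    """Repeat the batch; return (iteration, hung_nodes) for the first hang, else None."""
    nodes = ["node%d" % i for i in range(n_nodes)]
    for i in range(iterations):
        hung = run_iteration(nodes, timeout, task)
        if hung:
            return i, hung
    return None

if __name__ == "__main__":
    print(stress(iterations=5, n_nodes=10, timeout=5.0))
```

Using a single deadline per batch keeps the worst-case wait at roughly `timeout` seconds rather than `timeout` times the number of threads.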


In the Python program, the outputs from all threads are synchronized via thread.join().
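The one-thread-per-node pattern with a final join looks roughly like this, as a sketch in modern Python 3 using the threading module. run_on_node is a hypothetical stand-in for the real extension call, which in the C code would release the GIL around its blocking work.

```python
import threading

def run_on_node(node, results):
    # Hypothetical stand-in for the C extension call for one remote node.
    # Each thread writes only its own key, so no extra locking is needed here.
    results[node] = "done on %s" % node

def run_all(nodes):
    results = {}
    threads = [threading.Thread(target=run_on_node, args=(n, results))
               for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every node's task before using the results
    return results

if __name__ == "__main__":
    print(run_all(["node1", "node2", "node3"]))
```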

In the C extension sources, I have used Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS brackets to handle the GIL. I have tested the C functions separately and they seemed to work fine.

Any ideas what the problem could be? The test system is RHEL 3 with Python 2.2.2.

Please let me know if there are any useful pointers for solving this issue.

Thanks
kanji
 

Anders J. Munch

kanji said:
> The issue is that it works most of the time, but occasionally (and quite
> randomly) it hangs without generating any errors. While debugging, sometimes
> even gdb hangs, but I managed to get a backtrace of a hung thread:
>
> #0 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> #1 0xb75d11ee in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
> #2 0x0809bb3f in PyThread_acquire_lock ()
> #3 0x0809e45c in _PyObject_GC_Del ()

My guess is that the problem is not in this thread: some other thread has hung or crashed while holding the GIL, and this thread is just waiting for the GIL.
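One way to check this guess is to look at every thread, not just one. In gdb, "thread apply all bt" prints a backtrace for each thread; a thread whose stack is inside the extension's C code rather than waiting in PyThread_acquire_lock is a good candidate for the GIL holder. On Python 3.3 and later (not the 2.2.2 used here), the standard faulthandler module can dump the Python-level stack of every thread from a watchdog that does not need the GIL. A minimal sketch; HANG_TIMEOUT and run_with_watchdog are illustrative names, not part of any library:

```python
import faulthandler
import sys

HANG_TIMEOUT = 60.0  # illustrative; choose something well above a normal run

def run_with_watchdog(work, timeout=HANG_TIMEOUT):
    """Call work(); if it is still running after `timeout` seconds, dump the
    Python traceback of every thread to the real stderr (the run continues)."""
    faulthandler.dump_traceback_later(timeout, repeat=False,
                                      file=sys.__stderr__, exit=False)
    try:
        return work()
    finally:
        # Disarm the watchdog once the work has finished normally.
        faulthandler.cancel_dump_traceback_later()

if __name__ == "__main__":
    print(run_with_watchdog(lambda: sum(range(10))))
```

The dump shows only Python frames, so for the stuck thread it points at the Python line that called into the extension; the C-level detail still has to come from gdb.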

> To rule out the possibility that this is caused by an error in my code, I
> called the same function (which creates, say, 100 threads) in a loop 500
> times. It hangs at different iterations, say at #480 or #12, and sometimes
> it runs through smoothly.

You have a race condition. There may be some shared resource that is not
accessed in a thread-safe manner. Perhaps some C global variable used by
your code or a library that your code calls?
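If the shared resource is a C global, the real fix belongs in the C code (for example, a mutex around it). A cheap way to test the race-condition theory first, sketched in Python 3 under the assumption that the extension call can be wrapped from Python, is to serialize all calls with one lock. If the hang disappears, the extension or a library it calls is probably not thread-safe. call_extension is a hypothetical placeholder for the real call.

```python
import threading

_extension_lock = threading.Lock()  # one lock shared by every worker thread

def call_extension(node):
    # Hypothetical placeholder for the real extension call (a function in
    # your C module); here it just returns a marker string.
    return "ok:" + node

def call_extension_serialized(node):
    # Only one thread at a time may be inside the extension.
    with _extension_lock:
        return call_extension(node)

def run_all_serialized(nodes):
    results = {}

    def worker(node):
        results[node] = call_extension_serialized(node)

    threads = [threading.Thread(target=worker, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the lock removes all concurrency inside the extension, this is a diagnostic rather than a fix: it narrows the search to the C side without saying which shared state is at fault.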

Or perhaps you just have a blocking I/O-call waiting for something that
never happens.

Good luck - you're gonna need it :/

- Anders
 
