blocked on futex

M

msoulier

Hello,

I'm using the Python packaged with CentOS 4.7, which is a patched
2.3.4. Yes, ancient but I can't do anything about it.

The problem is that my long-running process, which talks to PostgreSQL
via Django models, does a lot of reading and writing to and from the
disk and writes to a Unix domain socket is randomly locking up. I have
not found a consistent place in my code where this lock-up occurs, but
every time it does, an strace shows that it is sitting at an futex()
call, apparently waiting forever.

My first guess is that futex() is being used by pthread_mutex_lock()
for thread-safety. I'm not using threads but the interpreter was built
with threaded support AFAIK.

A step in the right direction would be appreciated, I'm at a loss on
this one. I'm currently trying to strace the entire run of the program
to gain some context when it locks up to find out what it was doing at
the time.

Cheers and thanks,
MIke
 
A

Aahz

[posted and e-mailed, please respond to newsgroup]

I'm using the Python packaged with CentOS 4.7, which is a patched
2.3.4. Yes, ancient but I can't do anything about it.

That's not ancient -- maybe middle-aged. ;-)
The problem is that my long-running process, which talks to PostgreSQL
via Django models, does a lot of reading and writing to and from the
disk and writes to a Unix domain socket is randomly locking up. I have
not found a consistent place in my code where this lock-up occurs, but
every time it does, an strace shows that it is sitting at an futex()
call, apparently waiting forever.

My first guess is that futex() is being used by pthread_mutex_lock()
for thread-safety. I'm not using threads but the interpreter was built
with threaded support AFAIK.

How many processes do you have running? What kind of guarantee do you
have that there's only one process if you think there should be only one?
What's on the other side of the socket? If there's no consistent place,
is it always the same object type involved or function/method call? I'm
not at all familiar with Django, but it might be using threads
internally.

Have you tried dumping core and using gdb to find out more about the
process state?
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"Programming language design is not a rational science. Most reasoning
about it is at best rationalization of gut feelings, and at worst plain
wrong." --GvR, python-ideas, 2009-3-1
 
M

msoulier

How many processes do you have running?  What kind of guarantee do you

One process. No threads, no forking.
have that there's only one process if you think there should be only one?
What's on the other side of the socket?  If there's no consistent place,

A preforking daemon.
is it always the same object type involved or function/method call?  I'm
not at all familiar with Django, but it might be using threads
internally.

Django doesn't use threads and neither do I. It doesn't deadlock in
any consistent location that I can find.
Have you tried dumping core and using gdb to find out more about the
process state?

Yeah, just did. I need the debuginfo for proper symbols, but here's an
initial backtrace.

#0 0x0084d7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x009dbc1f in sem_wait@GLIBC_2.0 () from /lib/tls/libpthread.so.0
#2 0x00a7ce1e in PyThread_acquire_lock () from /usr/lib/
libpython2.3.so.1.0
#3 0x00a7ff07 in _PyObject_GC_Del () from /usr/lib/libpython2.3.so.
1.0
#4 0x00a29991 in PyCFunction_Call () from /usr/lib/libpython2.3.so.
1.0
#5 0x00a5c51c in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#6 0x00a5d98b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#7 0x00a5d98b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#8 0x00a5d98b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#9 0x00a5d98b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#10 0x00a5e0a6 in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.
1.0
#11 0x00a19e8e in PyFunction_SetClosure () from /usr/lib/
libpython2.3.so.1.0
#12 0x00a06637 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#13 0x00a0ddc8 in PyMethod_New () from /usr/lib/libpython2.3.so.1.0
#14 0x00a06637 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#15 0x00a581b0 in PyEval_CallObjectWithKeywords ()
from /usr/lib/libpython2.3.so.1.0
#16 0x00a50dda in _PyUnicodeUCS4_IsAlpha () from /usr/lib/
libpython2.3.so.1.0
#17 0x00a29991 in PyCFunction_Call () from /usr/lib/libpython2.3.so.
1.0
#18 0x00a5c51c in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#19 0x00a5e0a6 in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.
1.0
#20 0x00a5cd99 in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#21 0x00a5e0a6 in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.
1.0
#22 0x00a19e8e in PyFunction_SetClosure () from /usr/lib/
libpython2.3.so.1.0
#23 0x00a06637 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#24 0x00a581b0 in PyEval_CallObjectWithKeywords ()
from /usr/lib/libpython2.3.so.1.0
#25 0x00a8147f in PyErr_CheckSignals () from /usr/lib/libpython2.3.so.
1.0
#26 0x00a81548 in PyErr_CheckSignals () from /usr/lib/libpython2.3.so.
1.0
#27 0x00a57fa1 in Py_MakePendingCalls () from /usr/lib/libpython2.3.so.
1.0
#28 0x00a58f4d in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#29 0x00a5d98b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#30 0x00a5d98b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#31 0x00a5d98b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#32 0x00a5d98b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#33 0x00a5e0a6 in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.
1.0
#34 0x00a19e8e in PyFunction_SetClosure () from /usr/lib/
libpython2.3.so.1.0
#35 0x00a06637 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#36 0x00a0ddc8 in PyMethod_New () from /usr/lib/libpython2.3.so.1.0
#37 0x00a06637 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#38 0x00a581b0 in PyEval_CallObjectWithKeywords ()
from /usr/lib/libpython2.3.so.1.0
#39 0x00a50dda in _PyUnicodeUCS4_IsAlpha () from /usr/lib/
libpython2.3.so.1.0
#40 0x00a29991 in PyCFunction_Call () from /usr/lib/libpython2.3.so.
1.0
#41 0x00a5c51c in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#42 0x00a5e0a6 in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.
1.0
#43 0x00a5cd99 in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#44 0x00a5e0a6 in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.
1.0
#45 0x00a19e8e in PyFunction_SetClosure () from /usr/lib/
libpython2.3.so.1.0
#46 0x00a06637 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#47 0x00a5b998 in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#48 0x00a5e0a6 in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.
1.0
#49 0x00a5cd99 in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#50 0x00a5d98b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#51 0x00a5d98b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#52 0x00a5d98b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#53 0x00a5d98b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#54 0x00a5e0a6 in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.
1.0
#55 0x00a5e36d in PyEval_EvalCode () from /usr/lib/libpython2.3.so.1.0
#56 0x00a77827 in PyErr_Display () from /usr/lib/libpython2.3.so.1.0
#57 0x00a78952 in PyRun_SimpleFileExFlags () from /usr/lib/
libpython2.3.so.1.0
#58 0x00a799a4 in PyRun_AnyFileExFlags () from /usr/lib/
libpython2.3.so.1.0
#59 0x00a7e69e in Py_Main () from /usr/lib/libpython2.3.so.1.0
#60 0x080485b2 in main ()

It looks to me like it's blocking in pthread_mutex_lock, synchronizing
on a critical section for thread safety. Looks like a bug to me. I
went to report it, but there's no 2.3 version in the python bug
tracker so I'm assuming it's no longer supported.

Mike
 
A

Aahz

Yeah, just did. I need the debuginfo for proper symbols, but here's an
initial backtrace.

#0 0x0084d7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x009dbc1f in sem_wait@GLIBC_2.0 () from /lib/tls/libpthread.so.0
#2 0x00a7ce1e in PyThread_acquire_lock () from /usr/lib/
libpython2.3.so.1.0
#3 0x00a7ff07 in _PyObject_GC_Del () from /usr/lib/libpython2.3.so.
1.0
#4 0x00a29991 in PyCFunction_Call () from /usr/lib/libpython2.3.so.
1.0
#5 0x00a5c51c in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0

[...]
#53 0x00a5d98b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.
1.0
#54 0x00a5e0a6 in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.
1.0
#55 0x00a5e36d in PyEval_EvalCode () from /usr/lib/libpython2.3.so.1.0
#56 0x00a77827 in PyErr_Display () from /usr/lib/libpython2.3.so.1.0
#57 0x00a78952 in PyRun_SimpleFileExFlags () from /usr/lib/
libpython2.3.so.1.0
#58 0x00a799a4 in PyRun_AnyFileExFlags () from /usr/lib/
libpython2.3.so.1.0
#59 0x00a7e69e in Py_Main () from /usr/lib/libpython2.3.so.1.0
#60 0x080485b2 in main ()

It looks to me like it's blocking in pthread_mutex_lock, synchronizing
on a critical section for thread safety. Looks like a bug to me. I
went to report it, but there's no 2.3 version in the python bug
tracker so I'm assuming it's no longer supported.

That's a bit bizarre. You're correct that if this is a Python bug, there
will be no fixes available. However, you said earlier that this is a
patched Python, so I'm wondering whether the applied patch is broken.
Assuming I'm reading this correctly, it's hanging when trying to emit a
top-level exception using PyErr_Display.

At this point, I would continue watching core dumps and seeing whether
_PyObject_GC_Del and/or PyErr_Display are always in the stack trace.
You might also write your own top-level exception handler that logs to a
file.
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"At Resolver we've found it useful to short-circuit any doubt and just
refer to comments in code as 'lies'. :)"
--Michael Foord paraphrases Christian Muirhead on python-dev, 2009-3-22
 
M

msoulier

That's a bit bizarre.  You're correct that if this is a Python bug, there
will be no fixes available.  However, you said earlier that this is a
patched Python, so I'm wondering whether the applied patch is broken.
Assuming I'm reading this correctly, it's hanging when trying to emit a
top-level exception using PyErr_Display.

Actually, I think that I may have found the problem.

I have calls to a standard python logger object in a signal handler,
and I think the logging api is synchronized, so that it calls
pthread_mutex_lock() when on Linux. This is a no-no from a signal
handler. I've removed the offending calls and so far I am unable to
get the process to lock-up again, so I think I may have found it.

As such, I'd say it's my bug and not Python's.

Mike
 
A

Aahz

Actually, I think that I may have found the problem.

I have calls to a standard python logger object in a signal handler,
and I think the logging api is synchronized, so that it calls
pthread_mutex_lock() when on Linux. This is a no-no from a signal
handler. I've removed the offending calls and so far I am unable to
get the process to lock-up again, so I think I may have found it.

As such, I'd say it's my bug and not Python's.

Great! That would also explain the traceback since I think signal
handlers get called from the top level.
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"At Resolver we've found it useful to short-circuit any doubt and just
refer to comments in code as 'lies'. :)"
--Michael Foord paraphrases Christian Muirhead on python-dev, 2009-3-22
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,754
Messages
2,569,528
Members
45,000
Latest member
MurrayKeync

Latest Threads

Top