Bug in threading.Thread.join()?

Peter Hansen

I'm still trying to understand the behaviour that I'm
seeing but I'm already pretty sure that it's either
a bug, or something that would be considered a bug if
it didn't perhaps avoid even worse behaviour.

Inside the join() method of threading.Thread objects,
a Condition named self.__block is acquired, and then
the wait logic is executed. After the wait() finishes,
self.__block is released and the method returns.

If you hit Ctrl-C while the join's wait() is occurring,
you'll raise a KeyboardInterrupt and bypass the
release() call. (I'm observing this on Win XP with
Python 2.4 but have no reason to think it wouldn't
work the same on other platforms, given the docs
on signals and such.) If you do this, the thread
you were waiting for will never be able to complete
its cleanup because __bootstrap() calls __stop()
and that tries to acquire the same Condition object,
which has never been released. (I suspect this will
happen only if it's the MainThread that is doing
the join() call since KeyboardInterrupts only occur
in the main thread.)

A simple try/finally in join() appears to solve the
problem, but I'm unsure that this is a good idea,
partly because I'm a little surprised nobody else has
found this problem before and I lack confidence that
I've really found a bug.
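
To make the failure mode concrete, here is a minimal sketch of the pattern. It is not the actual threading.py source. ToyThread, its method names, and the interrupted_wait helper are all hypothetical, and the model simplifies by holding the lock across the wait, whereas a real Condition.wait() releases and reacquires it. It is written for current Python 3 rather than 2.4.

```python
import threading


class ToyThread:
    """Toy model of the locking pattern described above (not the real Thread)."""

    def __init__(self):
        # A plain Lock, so a second acquire fails even from the same thread.
        self.block = threading.Condition(threading.Lock())
        self.stopped = False

    def join_without_finally(self, wait):
        self.block.acquire()
        while not self.stopped:
            wait()  # stand-in for the timed wait; may raise
        self.block.release()  # skipped if wait() raises

    def join_with_finally(self, wait):
        self.block.acquire()
        try:
            while not self.stopped:
                wait()
        finally:
            self.block.release()  # runs even if wait() raises

    def stop_can_acquire(self):
        # Stand-in for the cleanup step that needs the same Condition.
        got_it = self.block.acquire(False)
        if got_it:
            self.block.release()
        return got_it


def interrupted_wait():
    # Simulates Ctrl-C arriving while join() is waiting.
    raise KeyboardInterrupt


def lock_free_after(join_method_name):
    toy = ToyThread()
    try:
        getattr(toy, join_method_name)(interrupted_wait)
    except KeyboardInterrupt:
        pass
    return toy.stop_can_acquire()
```

Under these assumptions, lock_free_after("join_without_finally") reports that the lock is still held after the interrupt, while lock_free_after("join_with_finally") reports that it was released.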

Anyone have thoughts on this? I'll file a bug
report shortly unless someone can point out the
error in my reasoning or a reason why this must be
the way it is.

-Peter
 
Tim Peters

[Peter Hansen]
I'm still trying to understand the behaviour that I'm
seeing but I'm already pretty sure that it's either
a bug, or something that would be considered a bug if
it didn't perhaps avoid even worse behaviour.

Inside the join() method of threading.Thread objects,
a Condition named self.__block is acquired, and then
the wait logic is executed. After the wait() finishes,
self.__block is released and the method returns.

If you hit Ctrl-C while the join's wait() is occurring,
you'll raise a KeyboardInterrupt and bypass the
release() call.

(I'm observing this on Win XP with
Python 2.4 but have no reason to think it wouldn't
work the same on other platforms, given the docs
on signals and such.)

Then you're doing something other than what you described. Here on
WinXP SP2 w/ Python 2.4c2:
...     def run(self):
...         import time
...         while True:
...             time.sleep(1)
...
I can hit Ctrl+C all day at this point, and nothing (visible) happens.
That's because it's sitting in self.__block.wait(), which is in turn
sitting in waiter.acquire(), and it's simply not possible for Ctrl+C
to interrupt a mutex acquire.

[Peter Hansen]
If you do this, the thread you were waiting for will never be able
to complete its cleanup because __bootstrap() calls __stop()
and that tries to acquire the same Condition object,
which has never been released. (I suspect this will
happen only if it's the MainThread that is doing
the join() call since KeyboardInterrupts only occur
in the main thread.)

A simple try/finally in join() appears to solve the
problem, but I'm unsure that this is a good idea,
partly because I'm a little surprised nobody else has
found this problem before and I lack confidence that
I've really found a bug.

Anyone have thoughts on this?

As above, I don't know what you're doing. Maybe you're doing a join()
with a timeout too? In that case, I doubt anyone gave any thought to
what happens if you muck with KeyboardInterrupt too.
 
Peter Hansen

Tim said:
[Peter Hansen]
If you hit Ctrl-C while the join's wait() is occurring,
you'll raise a KeyboardInterrupt and bypass the
release() call.

Then you're doing something other than what you described. Here on
WinXP SP2 w/ Python 2.4c2: [snip]
I can hit Ctrl+C all day at this point, and nothing (visible) happens.
That's because it's sitting in self.__block.wait(), which is in turn
sitting in waiter.acquire(), and it's simply not possible for Ctrl+C
to interrupt a mutex acquire.

As above, I don't know what you're doing. Maybe you're doing a join()
with a timeout too? In that case, I doubt anyone gave any thought to
what happens if you muck with KeyboardInterrupt too.

Yes, I'm definitely doing this with a timeout value on the join().
Changing your example to do that (join(5), say) pretty
much demonstrates the trouble: in this case a traceback
for KeyboardInterrupt is printed, but the program does not
terminate because the thread is stuck in __stop(), waiting
to acquire self.__block.
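
For anyone who wants to try the timeout variant, here is a rough sketch of the setup, written for current Python 3 rather than 2.4. The Spinner class, its halt event, and join_with_timeout are hypothetical names of mine. The stop event exists only so the demo can exit cleanly, since the example discussed above loops forever. This shows only that join(timeout) returns while the worker is still running; I'm not claiming the hang reproduces on current versions.

```python
import threading
import time


class Spinner(threading.Thread):
    """Hypothetical worker that runs until asked to stop."""

    def __init__(self):
        super().__init__()
        self.daemon = True
        self.halt = threading.Event()

    def run(self):
        while not self.halt.is_set():
            time.sleep(0.01)


def join_with_timeout(timeout):
    worker = Spinner()
    worker.start()
    worker.join(timeout)  # gives up after roughly `timeout` seconds
    still_running = worker.is_alive()
    worker.halt.set()  # let the worker finish so the demo exits
    worker.join()
    return still_running
```

Pressing Ctrl-C during the worker.join(timeout) call is the window in which the problem described above was observed.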

I'll take your last sentence as a form of blessing to go
file a bug report, unless you don't think that's a good idea.

Thanks, Tim!

-Peter
 
Peter Hansen

Peter said:
I'll take your last sentence as a form of blessing to go
file a bug report...

Filed as
http://sourceforge.net/tracker/index.php?func=detail&aid=1171023&group_id=5470&atid=105470

I guess it makes more sense for any further discussion to
go on there...

(Coincidentally, another bug report was filed just a few days
ago for a closely related item. As Tim mentioned, the wait()
without a timeout can never be interrupted by Ctrl-C, and
somebody else reported that as a bug (1167930) only about three
days before I encountered this one.)

-Peter
 
