daemon thread cleanup approach

Carl Banks

OK, so I have an issue with cleaning up threads upon an unexpected exit. I came up with a solution, but I wanted to ask if anyone has any advice or warnings.

Basically, I am writing a Python library to run certain tasks. All of the calls in the library start worker threads to do the actual work; some of the worker threads are persistent, others are not. Most threads have cleanup work to do (such as deleting temporary directories and killing spawned processes).

For better or worse, one of the requirements is that the library can't cause the program to hang no matter what, even if that means forgoing cleanup in the event of an unexpected exit. Therefore all worker threads run as daemons. Nevertheless, I feel like the worker threads should at least be given a fair opportunity to clean up; all threads can be communicated with and asked to exit.

One obvious solution is to ask users to put all library calls inside a with-statement that cleans up on exit, but I don't like it for various reasons.
Using atexit doesn't work because it's called after the daemon threads are killed.
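
For reference, the with-statement option mentioned above might look roughly like the following. This is only a sketch under assumptions: the `Library` class, `start_task`, and the single shared stop event are hypothetical names for illustration, not the actual library's API, and the "work" is a stand-in that just waits to be told to stop.

```python
import shutil
import tempfile
import threading


class Library(object):
    """Hypothetical library handle; every name here is illustrative."""

    def __init__(self):
        self._stop = threading.Event()
        self._workers = []

    def start_task(self):
        # Each worker owns a temporary directory it must remove on exit.
        tmpdir = tempfile.mkdtemp()
        t = threading.Thread(target=self._work, args=(tmpdir,))
        t.daemon = True  # a stuck worker must never block interpreter exit
        t.start()
        self._workers.append(t)
        return tmpdir

    def _work(self, tmpdir):
        try:
            self._stop.wait()  # stand-in for the real work
        finally:
            shutil.rmtree(tmpdir, ignore_errors=True)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # Ask the workers to finish, then wait a bounded time for each one.
        self._stop.set()
        for t in self._workers:
            t.join(1.0)
        return False  # do not swallow the caller's exception
```

The cleanup only runs if the caller actually wraps every use of the library in the `with` block, which is the burden on users that this option implies.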

Here's the solution I came up with: the library's init function will start a non-daemon thread that simply joins the main thread, then asks all existing worker threads to exit gracefully, and finally times out and leaves any remaining workers to be killed. So if an exception ends the main thread, there is still a chance to clean up properly.
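
A minimal sketch of that watcher-thread idea, under some assumptions: `init`, `start_worker`, the module-level registry, and the `caller_thread` and `timeout` parameters are hypothetical names, and each worker is assumed to accept a `threading.Event` that it checks or waits on. The thread to join is captured with `threading.current_thread()` when `init` is called, which is the main thread if the library is initialised from there.

```python
import threading
import time

_workers = []             # (thread, stop_event) pairs registered by the library
_lock = threading.Lock()


def start_worker(target):
    """Start a daemon worker; `target` receives a stop Event it should honour."""
    stop = threading.Event()
    t = threading.Thread(target=target, args=(stop,))
    t.daemon = True       # killed at process exit if it never stops
    with _lock:
        _workers.append((t, stop))
    t.start()
    return t


def init(caller_thread=None, timeout=5.0):
    """Start the non-daemon watcher; call once from the thread using the library."""
    caller = caller_thread or threading.current_thread()

    def watch():
        caller.join()     # returns once the calling thread has ended
        with _lock:
            pending = list(_workers)
        for _, stop in pending:
            stop.set()    # ask every known worker to exit gracefully
        deadline = time.time() + timeout
        for t, _ in pending:
            # Share one overall deadline across all workers.
            t.join(max(0.0, deadline - time.time()))
        # Workers still alive here are daemons and die with the process.

    watcher = threading.Thread(target=watch)
    watcher.daemon = False  # keeps the process alive until cleanup or timeout
    watcher.start()
    return watcher
```

Because the watcher never waits longer than `timeout` in total after the calling thread ends, a worker that ignores its stop event delays exit by at most that long. The watcher itself must not call anything that can block indefinitely.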

Does anyone see a potential problem with this approach? Is it possible that this will cause the program to hang in any case? We can assume that all calls to the library will occur from the main thread, or at least from the same thread. (If that isn't the case, then the caller has taken responsibility for ensuring the program doesn't hang.)

This is Python 2.7, and it's only ever going to run on Windows.

Thanks for any advice/warnings.

Carl Banks
 
Miki Tebeka

Greetings,
> OK, so I have an issue with cleaning up threads upon an unexpected exit.
What do you mean by "unexpected exit"? Uncaught exception? SIGTERM? ...
> Using atexit doesn't work because it's called after the daemon threads are killed.
I don't follow. Who is killing the daemon threads?
...
> Is it possible that this will cause the program to hang in any case?
If the cleanup thread hangs because of a bug, the program will hang as well.

All the best,
 
Cameron Simpson

> Here's the solution I came up with: in the library's init function, it will start a non-daemon thread that simply joins the main thread, and then asks all existing worker threads to exit gracefully before timing out and leaving them to be killed. So if an exception ends the main thread, there is still a chance to clean up properly.
>
> Does anyone see a potential problem with this approach? Is it possible that this will cause the program to hang in any case? We can assume that all calls to the library will occur from the main thread, or at least from the same thread. (If that isn't the case, then the caller has taken responsibility to ensure the program doesn't hang.)

That sounds safe to me, unless any of the subthreads call some C-level library
routine that hangs even in a daemon thread. Which I assume either isn't the
case or isn't a bug addressable this way anyway.

That's probably the best you can do from the sound of it, given that you may
not hang (for longer than your timeout choice) and the calls are all at the
whim of an external caller.

BTW, what were your dislikes of the with statement?

Disclaimer: I'm not a Windows guy.

Cheers,
Cameron Simpson
 
Chris Angelico

> Most threads have cleanup work to do (such as deleting temporary directories and killing spawned processes).
>
> For better or worse, one of the requirements is that the library can't cause the program to hang no matter what...

This may be a fundamental problem. I don't know how Windows goes with
killing processes (can that ever hang?), but certainly you can get
unexpected delays deleting a temp dir, although it would probably
require some deliberate intervention, like putting your %temp% on a
remote drive and then bringing that server down. But believe you me,
if there is a stupid way to do something, someone WILL have done it.
(Have you ever thought what it'd be like to have your
swapfile/pagefile on a network drive? I mean, there's acres of room on
the server, why waste some of your precious local space?)

So you may want to organize this as a separate spin-off process that
does the cleaning up. That way, the main process has completely ended,
but the cleanup daemon is still busy. And if you're going to do that,
then the easiest way, IMO, would be to have your worker threads be
themselves in a separate process; your library passes work across to
this other process via a pipe or socket (this being Windows, that
would have to be a TCP socket, not a Unix domain socket, but a named
pipe would also work), and when the pipe/socket connection is broken,
the other end knows that it should clean up. That way, you get to
clean up perfectly even if the process terminates abruptly (segfault,
system kill, whatever), although possibly delayed until the system
notices that the other end is gone.
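
A rough sketch of the "other end notices the broken connection" idea, with the technique swapped for something simpler than a TCP socket or named pipe: the helper process here watches an anonymous pipe on its stdin and treats end-of-file as the signal to clean up. `HELPER_SOURCE` and `start_helper` are hypothetical names, and the helper only owns a temporary directory rather than doing real work.

```python
import subprocess
import sys

# Code run by the helper process. It owns the temporary directory, reports
# its path on stdout, then blocks until stdin reaches EOF, which happens
# when the parent closes its end of the pipe or terminates for any reason.
HELPER_SOURCE = r"""
import shutil, sys, tempfile
tmpdir = tempfile.mkdtemp()
sys.stdout.write(tmpdir + "\n")
sys.stdout.flush()
try:
    sys.stdin.read()          # returns only at EOF
finally:
    shutil.rmtree(tmpdir, ignore_errors=True)
"""


def start_helper():
    """Spawn the helper and return (process, path of its temp directory)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", HELPER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    tmpdir = proc.stdout.readline().strip()
    return proc, tmpdir
```

A caller would keep `proc.stdin` open for as long as the work is wanted. Closing it, or the parent dying, gives the helper its EOF, and the parent never has to wait for the cleanup to finish. One caveat: if another child process inherits the write end of the pipe, the helper will not see EOF until that copy is closed too.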

ChrisA
(sometimes I feel I suggest TCP/IP sockets the way Grant Imahara
suggests building a robot... enthusiastically and maaaaaybe too often)
 
Carl Banks

> This may be a fundamental problem. I don't know how Windows goes with
> killing processes (can that ever hang?), but certainly you can get
> unexpected delays deleting a temp dir, although it would probably
> require some deliberate intervention, like putting your %temp% on a
> remote drive and then bringing that server down. But believe you me,
> if there is a stupid way to do something, someone WILL have done it.
> (Have you ever thought what it'd be like to have your
> swapfile/pagefile on a network drive? I mean, there's acres of room on
> the server, why waste some of your precious local space?)
>
> So you may want to organize this as a separate spin-off process that
> does the cleaning up.
[snip rest]

Thanks, that's good information. Even if the temp directories do fail to be removed before the join times out (which probably won't happen much), that is still no worse than the daemon thread simply being killed without any chance to clean up.

And subprocesses would be a more reliable way to ensure cleanup and might be the direction I take it in the future.

Carl Banks
 
Devin Jeanpierre

Don't use daemon threads, they are inherently un-thread-safe: any
global access you do anywhere inside a daemon thread can fail, because
daemon threads are still potentially run during interpreter shutdown,
when globals are being deleted from every module. Most functions you
might call are not safe in a daemon thread at shutdown.

-- Devin
 
Ethan Furman

> Don't use daemon threads, they are inherently un-thread-safe: any
> global access you do anywhere inside a daemon thread can fail, because
> daemon threads are still potentially run during interpreter shutdown,
> when globals are being deleted from every module. Most functions you
> might call are not safe in a daemon thread at shutdown.

Given the use-case (must shut down, cannot risk a hung process, orphan files be damned) I don't think having a daemon
thread die because it raised an exception trying to access a missing global is a big deal.
 
Devin Jeanpierre

> Given the use-case (must shut down, cannot risk a hung process, orphan files
> be damned) I don't think having a daemon thread die because it raised an
> exception trying to access a missing global is a big deal.

It's certainly suboptimal. Subprocesses are better in every way.

-- Devin
 
