How to cleanly pause/stop a long running function?

Basilisk96

Suppose I have a function that may run for a long time - perhaps from
several minutes to several hours. An example would be this file
processing function:

import os

def processFiles(startDir):
    for root, dirs, files in os.walk(startDir):
        for fname in files:
            if fname.lower().endswith(".zip"):
                # ... do interesting stuff with the file here ...
                pass

Imagine that there are thousands of files to process. This could take
a while. How can I implement this so that the caller can pause or
interrupt this function, and resume its program flow? Doing a Ctrl+C
interrupt would be a not-so-clean way of performing such a thing, and
it would quit the application altogether. I'd rather have the function
return a status object of what it has accomplished thus far.

I have heard about threads, queues, and asynchronous programming, but
am not sure which is appropriate for this and how to apply it. Perhaps
the above function should be a method of a class that inherits from
the appropriate handler class? Any help will be appreciated.

-Basilisk96
 
Adam Atlas

Consider using generators.
http://docs.python.org/tut/node11.html#SECTION00111000000000000000000

This way, whatever part of your program calls this function can
completely control the iteration. Maybe you can have it yield status
information each time.
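
A minimal sketch of the generator idea, written for Python 3. The status dictionary and the helper name run_until are assumptions made for illustration and are not part of the original post; the real per-file work is still elided.

```python
import os

def processFiles(startDir):
    """Yield a status dict after each .zip file has been handled."""
    processed = 0
    for root, dirs, files in os.walk(startDir):
        for fname in files:
            if fname.lower().endswith(".zip"):
                path = os.path.join(root, fname)
                # ... do interesting stuff with the file here ...
                processed += 1
                # Execution is suspended here until the caller asks for more.
                yield {"processed": processed, "last_file": path}

def run_until(startDir, should_stop):
    """Hypothetical driver: stop as soon as should_stop(status) is true."""
    status = {"processed": 0, "last_file": None}
    for status in processFiles(startDir):
        if should_stop(status):
            break
    return status
```

The caller can also keep the generator object around, call next() on it to process one file at a time, do something else in between, and carry on iterating later from where it left off.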
 
Michael Tobis

Basilisk96 wrote:
> Doing a Ctrl+C
> interrupt would be a not-so-clean way of performing such a thing, and
> it would quit the application altogether. I'd rather have the function
> return a status object of what it has accomplished thus far.

Just in case you are unaware that you can explicitly handle ^C in your
python code, look up the KeyboardInterrupt exception.

mt
 
Steven D'Aprano

Basilisk96 wrote:
> How can I implement this so that the caller can pause or
> interrupt this function, and resume its program flow?

I don't think there really is what I would call a _clean_ way, although
people may disagree about what's clean and what isn't.

Here's a way that uses global variables, with all the disadvantages that
entails:

import os

last_dir_completed = None
restart = object()  # a unique sentinel object

def processFiles(startDir):
    global last_dir_completed
    if startDir is restart:
        # resume from the last directory that was fully processed
        startDir = last_dir_completed
    for root, dirs, files in os.walk(startDir):
        for fname in files:
            if fname.lower().endswith(".zip"):
                # ... do interesting stuff with the file here ...
                pass
        last_dir_completed = root  # every file in root has been handled



Here's another way, using a class. Probably not the best way, but a way.

import os

class DirLooper(object):
    def __init__(self, startdir):
        self.status = "new"
        self.startdir = startdir
        self.root = startdir

    def run(self):
        if self.status == "new":
            self.loop(self.startdir)
        elif self.status == "finished":
            print("nothing to do")
        else:
            # interrupted earlier: pick up at the last directory visited
            self.loop(self.root)

    def loop(self, where):
        self.status = "started"
        for self.root, dirs, files in os.walk(where):
            # blah blah blah...
            pass
        self.status = "finished"  # only reached if the walk ran to the end


Here's another way, catching the interrupt:

def processFiles(startDir):
    try:
        for root, dirs, files in os.walk(startDir):
            # blah blah blah ...
            pass
    except KeyboardInterrupt:
        # Ctrl+C lands here instead of killing the program
        do_something_with_status()  # placeholder, to be written


You can fill in the details :)
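
For what it's worth, one way the details of the KeyboardInterrupt version might be filled in is sketched below (Python 3). The handle parameter and the layout of the status dictionary are assumptions for illustration only; handle stands in for whatever the real per-file work is.

```python
import os

def processFiles(startDir, handle=lambda path: None):
    """Walk startDir; if Ctrl+C arrives, return what was done so far."""
    status = {"processed": [], "interrupted": False}
    try:
        for root, dirs, files in os.walk(startDir):
            for fname in files:
                if fname.lower().endswith(".zip"):
                    path = os.path.join(root, fname)
                    handle(path)  # hypothetical per-file work
                    # Recorded only after handle() has returned normally.
                    status["processed"].append(path)
    except KeyboardInterrupt:
        # Ctrl+C ends the loop here rather than ending the whole program.
        status["interrupted"] = True
    return status
```

The file being handled when the interrupt arrives is not added to the list, so the caller can tell which files were completed, though this sketch makes no attempt to resume.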


As for which is "better", I think the solution using a global variable is
the worst, although it has the advantage of being easy to implement. I
think you may need to try a few different implementations and judge for
yourself.
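
Since threads were mentioned in the question, here is one more candidate to judge: a rough sketch (Python 3) that runs the walk in a worker thread and uses threading.Event flags so the caller can pause, resume or stop it. The class name FileWorker and its methods are made up for illustration, and the flags are only checked between files, so a single slow file cannot be interrupted halfway.

```python
import os
import threading

class FileWorker(object):
    """Hypothetical helper: walks a directory tree in a background thread."""

    def __init__(self, startDir):
        self.startDir = startDir
        self.processed = []
        self.stop_requested = threading.Event()
        self.running = threading.Event()
        self.running.set()  # not paused to begin with
        self.thread = threading.Thread(target=self._work)

    def _work(self):
        for root, dirs, files in os.walk(self.startDir):
            for fname in files:
                self.running.wait()  # blocks here while paused
                if self.stop_requested.is_set():
                    return
                if fname.lower().endswith(".zip"):
                    # ... do interesting stuff with the file here ...
                    self.processed.append(os.path.join(root, fname))

    def start(self):
        self.thread.start()

    def pause(self):
        self.running.clear()

    def resume(self):
        self.running.set()

    def stop(self):
        """Ask the worker to finish, wait for it, return what it did."""
        self.stop_requested.set()
        self.running.set()  # wake the worker if it is paused
        self.thread.join()
        return list(self.processed)
```

The caller would do something like w = FileWorker(path); w.start(), then call w.pause(), w.resume() or w.stop() whenever it likes; stop() hands back the list of files completed so far.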
 
