spawning a process with subprocess

bhunter

Hi,

I've used subprocess with 2.4 several times to execute a process, wait
for it to finish, and then look at its output. Now I want to spawn
the process separately, later check to see if it's finished, and if it
is look at its output. I may want to send a signal at some point to
kill the process. This seems straightforward, but it doesn't seem to
be working.

Here's my test case:

import subprocess, time

cmd = "cat somefile"
thread = subprocess.Popen(args=cmd.split(), shell=True,
                          stdout=subprocess.PIPE, stdin=subprocess.PIPE,
                          stderr=subprocess.STDOUT, close_fds=True)

while(1):
    time.sleep(1)
    if(thread.returncode):
        break
    else:
        print thread.returncode

print "returncode = ", thread.returncode
for line in thread.stdout:
    print "stdout:\t",line


This will just print the returncode of None forever until I Ctrl-C it.

Of course, the program works fine if I call thread.communicate(), but
since this waits for the process to finish, that's not what I want.

Any help would be appreciated.
 
kyosohma

I've read that this sort of thing can be a pain. I'm sure someone will
post and have other views though. I have had some success using
Python's threading module though. There's a pretty good walkthrough
here (it uses wxPython in its example):

http://wiki.wxpython.org/LongRunningTasks

Other places of interest include:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/491281
http://uucode.com/texts/pylongopgui/pyguiapp.html
http://sayspy.blogspot.com/2007/11/idea-for-process-concurrency.html

If I were doing something like this, I would have the process write
its output to a file and periodically check to see if the file has
data.

Hopefully someone with more knowledge will come along soon.

Mike
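
The write-to-a-file idea above can be sketched roughly as follows. This is only an illustration in Python 3 syntax (the thread itself is about Python 2.4), and the names run_to_file, poll_interval, and child are made up for the example. The child's output goes to a temporary file instead of a pipe, so the parent can poll at its leisure and read the file once the child has exited.

```python
import subprocess
import sys
import tempfile
import time

def run_to_file(args, poll_interval=0.1):
    """Run args with stdout/stderr sent to a temporary file; poll until it exits."""
    with tempfile.TemporaryFile() as out:
        proc = subprocess.Popen(args, stdout=out, stderr=subprocess.STDOUT)
        while proc.poll() is None:      # None means "still running"
            time.sleep(poll_interval)   # the parent is free to do other work here
        out.seek(0)                     # rewind before reading what the child wrote
        return proc.returncode, out.read()

# Hypothetical child: prints a long line, far more than a small pipe buffer holds.
child = [sys.executable, "-c", "print('x' * 200000)"]
rc, data = run_to_file(child)
print("returncode =", rc, "bytes =", len(data))
```

Because a regular file does not fill up the way a pipe does, the child is never stalled waiting for the parent to read.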
 
bhunter

Darn. Is threading the only way to do it? I was hoping not to have
to avoid that. Would have thought that there might be a way for
subprocess to handle this automatically.

Thanks for your help,
Brian
 
kyosohma

This is just the way I do it...as I said, there are probably some
other people in the group who will have other opinions. By the way,
your statement "I was hoping not to have to avoid that" means that you
hoped to use threading...which I think is contradictory to what you
meant.

Mike
 
Diez B. Roggisch

I have difficulties understanding what you are after here. To me it
looks as if everything works as expected. I mean you periodically check
on the liveness of the "thread" - which is what you describe above. All
you are missing IMHO is the actual work in this program.

So

while True:
    if do_work():
        if thread.returncode:
            break
    else:
        thread.kill()

This assumes that your do_work() method communicates the wish to end the
sub-process using its return value.

Diez
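
For the "send a signal to kill the process" part of the question, here is a minimal sketch in Python 3 syntax. Note that Popen.terminate() and Popen.kill() were only added in Python 2.6; on 2.4 the rough equivalent would be os.kill(proc.pid, signal.SIGTERM). The names run_with_deadline, deadline, and sleeper are hypothetical, and the negative return code convention shown applies to POSIX systems.

```python
import signal
import subprocess
import sys
import time

def run_with_deadline(args, deadline, poll_interval=0.05):
    """Start args, poll it, and send SIGTERM if it outlives `deadline` seconds."""
    proc = subprocess.Popen(args)
    start = time.time()
    while proc.poll() is None:
        if time.time() - start > deadline:
            proc.terminate()   # sends SIGTERM on POSIX
            proc.wait()        # reap the child so it does not linger as a zombie
            break
        time.sleep(poll_interval)
    return proc.returncode

# Hypothetical long-running child that would sleep for a minute.
sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
rc = run_with_deadline(sleeper, deadline=0.5)
print("returncode =", rc)
print("killed by SIGTERM:", rc == -signal.SIGTERM)
```

On POSIX, a child ended by a signal reports the negated signal number as its return code, which is how the parent can tell a kill from a normal exit.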
 
bhunter

kyosohma said:
This is just the way I do it...as I said, there are probably some
other people in the group who will have other opinions. By the way,
your statement "I was hoping not to have to avoid that" means that you
hoped to use threading...which I think is contradictory to what you
meant.

Mike

That was a typo. "I was hoping to avoid that" is what it should
read. Proof once again: never type while holding a baby. :)

Brian
 
bhunter

If the subprocess had finished, I expect that the returncode will not
be None, and the loop would break. The process hasn't actually
started. I know this because while this simple testcase just cats a
file, the real case submits a simulation job. This job never starts
until after I ctrl-c the program.

Brian
 
Diez B. Roggisch

I don't know what the reason is for that, but I've just today worked
with code that exactly uses subprocess as advertised - spawning a
process which runs while the main process occasionally checks inside the
child's logfiles for certain state changes.

What might be happening, though, is that you need to consume the
subprocess's stdout in your program, because otherwise the output will
buffer up to a certain amount (usually 4 or 16k) and then the child will halt.



Diez
 
bhunter

It works? You mean there are places on earth where it really does
this? Excellent!

Still doesn't work for me, though. I first tried changing bufsize to
-1, then 0, then a very large number greater than the number of bytes
in this file (just to be sure). None seemed to have any effect.

Then I modified the code to this:

cmd = "cat /nfs/dv1/bhunter/o52a/verif/gmx/rgx.cc"
args = cmd.split()
thread = subprocess.Popen(args=args, shell=True,
                          stdout=subprocess.PIPE, stdin=subprocess.PIPE,
                          stderr=subprocess.STDOUT, close_fds=True, bufsize=300000)

lines = []

try:
    while(1):
        time.sleep(1)
        if(thread.returncode):
            break
        else:
            print thread.returncode
            lines.extend(thread.stdout.readlines())
except KeyboardInterrupt:
    print lines

print "returncode = ", thread.returncode
for line in thread.stdout:
    print "stdout:\t",line


This one hangs after the first print of returncode None. Then, I ctrl-
C it and find that the lines array is empty.

It's my guess that the process is waiting for some input, even though
'cat' clearly does not require anything from stdin. But if I put a
communicate() after Popen and a few other tweaks, then everything
works as expected--but of course not as desired.

Still confused.

Brian
 
MonkeeSage

Hi Brian,

Couple of things. You should use poll() on the Popen instance, and
should check it explicitly against None (since a 0 return code,
meaning exit successfully, will be treated as a false condition the
same as None). Also, in your second example, you block the program
when you call readlines on the pipe, since readlines blocks until it
reaches eof (i.e., until pipe closes stdout, i.e., process is
complete). Oh, and you don't have to split the input to the args
option yourself, you can just pass a string. So, putting it all
together, you want something like:

import subprocess, time

cmd = "cat somefile"
proc = subprocess.Popen(args=cmd, shell=True,
                        stdout=subprocess.PIPE, stdin=subprocess.PIPE,
                        stderr=subprocess.STDOUT, close_fds=True)

while 1:
    time.sleep(1)
    if proc.poll() != None:
        break
    else:
        print "waiting on child..."

print "returncode =", proc.returncode

HTH,
Jordan
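
The two points about poll() and None can be seen in isolation with a small sketch like this one, written in Python 3 syntax rather than the 2.4 used in the thread. The child command here is a hypothetical stand-in that simply exits with status 0.

```python
import subprocess
import sys
import time

# Hypothetical child that exits almost immediately with status 0.
proc = subprocess.Popen([sys.executable, "-c", "pass"])

before = proc.returncode          # still None: nothing has checked on the child yet

while proc.poll() is None:        # poll() is what refreshes returncode
    time.sleep(0.05)

after = proc.returncode

print("before:", before)
print("after:", after)
print("truthy?", bool(after))             # a successful exit status is falsy
print("finished?", after is not None)     # the reliable test
```

Reading the returncode attribute alone never updates it, and a plain truth test cannot tell "exited with 0" apart from "still running", which is why the loop compares the result of poll() against None.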
 
Ove Svensson

Reading documentation for subprocess, it mentions that

On UNIX, with shell=False (default): In this case, the Popen class
uses os.execvp() to execute the child program. args should normally
be a sequence. A string will be treated as a sequence with the string
as the only item (the program to execute).

On UNIX, with shell=True: If args is a string, it specifies the
command string to execute through the shell. If args is a sequence,
the first item specifies the command string, and any additional items
will be treated as additional shell arguments.

Since you have specified shell=True and pass a sequence as args, you
will effectively invoke the cat process through the shell and then pass
somefile as an extra argument to the shell (not to the cat command).
That is probably not what you intended.

This can be solved by either
- Not splitting the cmd, in which case you will pass the whole cmd
string to the shell for execution
- Or setting shell to False. This is what I would have done, since
I can't see any reason for going via the shell. Please note that
if setting shell to False, you must then split the cmd.

Please also note that your test for the returncode might not work
since a normal returncode is 0. Your code will only detect non-0
values.

Also, it is good practice to call wait() on the subprocess in order
to avoid zombie-processes.

Finally, I find it somewhat misleading to use the name thread for
the variable used to represent a sub-process. Threads and processes
are not exactly the same.

Hence, the following code should work as expected:

import subprocess, time

cmd = "cat somefile"
proc = subprocess.Popen(
    args = cmd.split(),
    shell = False,
    stdin = None,
    stdout = subprocess.PIPE,
    stderr = subprocess.STDOUT,
    close_fds = True)

while True:
    rc = proc.poll()
    if rc != None: break
    print rc
    time.sleep(1)

lno = 1
for lin in proc.stdout:
    print '%i: %s' % (lno,lin.rstrip('\n'))
    lno += 1

rc = proc.wait()
print "rc = %i" % rc


/Ove
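
The shell=True behaviour quoted from the documentation above can be checked with a short sketch. It is written in Python 3 syntax and is POSIX only; shell_output is a hypothetical helper, and echo stands in for cat so that nothing waits on stdin.

```python
import subprocess

def shell_output(args):
    """Run args through the shell and return whatever it printed (POSIX only)."""
    result = subprocess.run(args, shell=True, stdout=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout

# A single string: the whole command line is handed to the shell.
as_string = shell_output("echo hello world")

# A list: only the first item is the command string; the rest become the
# shell's own positional parameters ($0, $1, ...), not arguments to echo.
as_list = shell_output(["echo", "hello", "world"])

# The extra items show up only if the command string asks for them.
positional = shell_output(["echo $0 $1", "hello", "world"])

print(repr(as_string))
print(repr(as_list))
print(repr(positional))
```

Applied to the original test case, this means the shell ran a bare cat with no file argument, so cat read from its stdin pipe instead of from somefile.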
 
Nick Craig-Wood

MonkeeSage said:
Couple of things. You should use poll() on the Popen instance, and
should check it explicitly against None (since a 0 return code,
meaning exit successfully, will be treated as a false condition the
same as None). Also, in your second example, you block the program
when you call readlines on the pipe, since readlines blocks until it
reaches eof (i.e., until pipe closes stdout, i.e., process is
complete). Oh, and you don't have to split the input to the args
option yourself, you can just pass a string.

Though passing an array is good practice if you want to avoid passing
user data through the shell.

MonkeeSage said:
So, putting it all together, you want something like:

import subprocess, time

cmd = "cat somefile"
proc = subprocess.Popen(args=cmd, shell=True,
                        stdout=subprocess.PIPE, stdin=subprocess.PIPE,
                        stderr=subprocess.STDOUT, close_fds=True)

while 1:
    time.sleep(1)
    if proc.poll() != None:
        break
    else:
        print "waiting on child..."

print "returncode =", proc.returncode

This works fine unless the command generates a lot of output (more
than 64k on linux) when the output pipe will fill up and the process
will block until it is emptied.

If you run the below with `seq 10000` then it works fine but as
written the subprocess will block forever writing its output pipe
(under linux 2.6.23).

#------------------------------------------------------------
import subprocess, time

cmd = """
for i in `seq 20000`; do
  echo $i
done
exit 42
"""

proc = subprocess.Popen(args=cmd, shell=True,
                        stdout=subprocess.PIPE, stdin=subprocess.PIPE,
                        stderr=subprocess.STDOUT, close_fds=True)

while 1:
    time.sleep(1)
    if proc.poll() != None:
        break
    else:
        print "waiting on child..."

print "returncode =", proc.returncode
lines = 0
total = 0
for line in proc.stdout:
    lines += 1
    total += len(line)
print "Received %d lines of %d bytes total" % (lines, total)
#------------------------------------------------------------

So you do need to read stuff from your subprocess, but there isn't a
way in the standard library to do that without potentially blocking.

There are a few solutions

1) use the python expect module (not windows)

http://pexpect.sourceforge.net/

2) set your file descriptors non blocking. The following recipe shows
a cross platform module to do it.

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440554

Or just do it with the fcntl module

3) Use a thread to read stuff from your subprocess and allow it to
block on proc.stdout.read()

Here is an example of 2)

#------------------------------------------------------------
import subprocess, time, os
from fcntl import fcntl, F_GETFL, F_SETFL
from errno import EAGAIN

cmd = """
for i in `seq 100000`; do
  echo $i
done
exit 42
"""

proc = subprocess.Popen(args=cmd, shell=True,
                        stdout=subprocess.PIPE, stdin=subprocess.PIPE,
                        stderr=subprocess.STDOUT, close_fds=True)

# Set non blocking (unix only)
fcntl(proc.stdout, F_SETFL, fcntl(proc.stdout, F_GETFL) | os.O_NONBLOCK)

def read_all(fd):
    out = ""
    while 1:
        try:
            bytes = fd.read(4096)
        except IOError, e:
            if e[0] != EAGAIN:
                raise
            break
        if not bytes:
            break
        out += bytes
    return out

rx = ""
while 1:
    time.sleep(1)
    if proc.poll() != None:
        break
    else:
        print "waiting on child..."
    rx += read_all(proc.stdout)

rx += read_all(proc.stdout)
print "returncode =", proc.returncode
lines = 0
total = 0
for line in rx.split("\n"):
    lines += 1
    total += len(line)
print "Received %d lines of %d bytes total" % (lines, total)
#------------------------------------------------------------

Which runs like this on my machine

$ python subprocess-shell-nb.py
waiting on child...
waiting on child...
waiting on child...
waiting on child...
waiting on child...
waiting on child...
waiting on child...
waiting on child...
returncode = 42
Received 100001 lines of 488895 bytes total
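
Option 3) above, the reader thread, is not shown in the post. A rough sketch of it in Python 3 syntax might look like the following. The names start_with_reader, drain, chunks, and noisy are invented for the example, and the child is a Python one-liner standing in for the seq loop.

```python
import subprocess
import sys
import threading
import time

def start_with_reader(args):
    """Start args and drain its stdout in a background thread."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    chunks = []

    def drain():
        # Blocking here is fine: only this helper thread waits on the pipe.
        for line in proc.stdout:
            chunks.append(line)

    reader = threading.Thread(target=drain)
    reader.daemon = True
    reader.start()
    return proc, reader, chunks

# Hypothetical noisy child: 100000 numbered lines, then exit status 42.
noisy = [sys.executable, "-c",
         "import sys\nfor i in range(1, 100001): print(i)\nsys.exit(42)"]
proc, reader, chunks = start_with_reader(noisy)

while proc.poll() is None:
    time.sleep(0.1)          # the main thread never touches the pipe

reader.join()                # wait for the reader to reach end-of-file
proc.stdout.close()
print("returncode =", proc.returncode)
print("Received %d lines" % len(chunks))
```

Since the helper thread keeps emptying the pipe, the child never stalls on a full buffer, and the main loop stays a plain poll() check.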
 
MonkeeSage

Nick Craig-Wood said:
Though passing an array is good practice if you want to avoid passing
user data through the shell.

Well, he was setting shell=True, but I guess being explicit (about
that) is better than implicit. ;)

Nice. Thanks for the recipe link too.

Regards,
Jordan
 
Thomas Bellman

bhunter said:
* The problem with the testcase, I believe, was the size of the file
and the output pipe filling up, as Nick suggested. When run on a
smaller file, with Jordan's suggestions, it works fine. With a larger
file, it's necessary to do as Nick says. If the size of the file is
unknown, it's best to use this case as the default. This seems
unfortunate to me, because it's quite a bit of code to do something
that should be fairly straightforward--at least, that's what I think.

You may be interrested in the module 'asyncproc', which I wrote
a couple of years ago to make it easier working with processes
that would otherwise block on output. You can download it at
<http://www.lysator.liu.se/~bellman/download/asyncproc.py>.

It probably only works on Unix, but considering your use of "cat"
as a test program, I suppose that isn't a problem for you.
 
bhunter

Wow, everyone. Great comments. Thanks so much!

A few points on all of the above, just so I don't look too stupid:

* The problem with the testcase, I believe, was the size of the file
and the output pipe filling up, as Nick suggested. When run on a
smaller file, with Jordan's suggestions, it works fine. With a larger
file, it's necessary to do as Nick says. If the size of the file is
unknown, it's best to use this case as the default. This seems
unfortunate to me, because it's quite a bit of code to do something
that should be fairly straightforward--at least, that's what I think.

* Using poll() and checking for None and not non-zero: Yes, I had
both of those originally in my testcase, but when I re-wrote and re-
wrote it after it initially didn't work those initial concepts got
dropped. Thanks for reminding me.

* Yes, I should use proc instead of thread as a variable. Good point,
Ove. But your solution works on small files but chokes on larger
files, too.

Thanks again...and just to reiterate, I really think this could be
more straightforward for the rest of us if Popen could do all of this
on its own.

Brian
 
