polling for output from a subprocess module


Thomas Bellman

jakub.hrozek said:
try:
    test = Popen(test_path,
                 stdout=PIPE,
                 stderr=PIPE,
                 close_fds=True,
                 env=test_environ)
    while test.poll() == None:
        ready = select.select([test.stderr], [], [])
        if test.stderr in ready[0]:
            t_stderr_new = test.stderr.readlines()
            if t_stderr_new != []:
                print "STDERR:", "\n".join(t_stderr_new)
                t_stderr.extend(t_stderr_new)
[...]
The problem is that all the output from the subprocess seems to
arrive at once. Do I need to take a different approach?

The readlines() method will read until it reaches end of file (or
an error occurs), not just what is available at the moment. You
can see that for yourself by running:

$ python -c 'import sys; print(sys.stdin.readlines())'

The call to sys.stdin.readlines() will not return until you press
Ctrl-D (or, I think, Ctrl-Z if you are using MS-Windows).

However, the os.read() function will only read what is currently
available. Note, though, that os.read() does not do line-based
I/O, so depending on the timing you can get incomplete lines, or
multiple lines in one read.
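A minimal Python 3 sketch of that difference, using a bare `os.pipe()` as a stand-in for the subprocess's stderr pipe (an assumption that keeps the example self-contained). `os.read()` returns the bytes already in the pipe even though the write end is still open, and the chunk it returns ends in the middle of a line. The `read_available` helper name is hypothetical.

```python
import os


def read_available(fd, bufsize=1024):
    """Return the bytes currently in the pipe, at most bufsize of them.

    os.read() blocks only until *some* data is available; it does not
    wait for end of file the way file.readlines() does.
    """
    return os.read(fd, bufsize)


# A plain pipe stands in for the child's stderr in this sketch.
r, w = os.pipe()
os.write(w, b"first line\nsecond li")  # write end stays open: no EOF yet
chunk = read_available(r)
print(chunk)  # b'first line\nsecond li': one whole line plus part of the next
os.close(w)
os.close(r)
```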
 

jakub.hrozek

Hello,
My program uses the subprocess module to spawn a child and capture its
output. What I'd like to achieve is that stdout is parsed after the
subprocess finishes, but anything that goes to stderr is printed
immediately. The code currently looks like:

try:
    test = Popen(test_path,
                 stdout=PIPE,
                 stderr=PIPE,
                 close_fds=True,
                 env=test_environ)

    while test.poll() == None:
        ready = select.select([test.stderr], [], [])

        if test.stderr in ready[0]:
            t_stderr_new = test.stderr.readlines()
            if t_stderr_new != []:
                print "STDERR:", "\n".join(t_stderr_new)
                t_stderr.extend(t_stderr_new)

except OSError, e:
    print >>sys.stderr, _("Test execution failed"), e
else:
    self.result.return_code = test.returncode
    self.result.process(test.stdout.readlines(), t_stderr)


The problem is that all the output from the subprocess seems to
arrive at once. Do I need to take a different approach?
 

Christian Heimes

Thomas said:
The readlines() method will read until it reaches end of file (or
an error occurs), not just what is available at the moment. You
can see that for yourself by running:

Bad idea ;)

Calling readlines() on one pipe of a subprocess Popen instance will block
when you PIPE more than one stream and the OS buffer of the other stream
fills up.

You can find some insight at http://bugs.python.org/issue1606. I
discussed the matter with Guido a while ago.

Christian
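For the part of the original requirement where stdout is only parsed after the child finishes, a common way to sidestep this deadlock is `Popen.communicate()`, which drains both pipes concurrently until end of file. It buffers everything in memory and returns only when the child exits, so it does not give the live stderr the original poster also wants. A minimal Python 3 sketch; the `run_and_collect` helper and the one-liner child, which writes 200,000 bytes to each stream (more than a typical pipe buffer holds), are hypothetical stand-ins for `test_path`.

```python
import subprocess
import sys

# Hypothetical child: writes 200,000 bytes to stdout and to stderr, more
# than a typical OS pipe buffer holds, so reading only one pipe to EOF
# before touching the other could deadlock.
CHILD = (
    "import sys;"
    "sys.stdout.write('o' * 200000);"
    "sys.stderr.write('e' * 200000)"
)


def run_and_collect(argv):
    """Run argv and return (returncode, stdout_bytes, stderr_bytes)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, close_fds=True)
    # communicate() reads both pipes at once, so neither buffer fills up
    # and blocks the child; it also waits for the child to exit.
    out, err = proc.communicate()
    return proc.returncode, out, err


code, out, err = run_and_collect([sys.executable, "-c", CHILD])
print(code, len(out), len(err))  # 0 200000 200000
```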
 

jakub.hrozek

Thomas said:
However, the os.read() function will only read what is currently
available. Note, though, that os.read() does not do line-based
I/O, so depending on the timing you can get incomplete lines, or
multiple lines in one read.

Right, I didn't realize that. I'll try the os.read() method. Reading
what's available (as opposed to whole lines) shouldn't be an issue in
this specific case. Thanks for the pointer!
 

Thomas Bellman


Why is it a bad idea to see how the readlines() method behaves?

Christian said:
readlines() on a subprocess Popen instance will block when you PIPE more
than one stream and the buffer of the other stream is full.
You can find some insight at http://bugs.python.org/issue1606. I
discussed the matter with Guido a while ago.

Umm... Yes, you are correct that the code in the original post
also has a deadlock problem. I missed that. But saying that it
is the readlines() method that is blocking is a bit misleading,
IMHO. Both processes will be blocking, in a deadly embrace.
It's a problem that has been known since the concept of
inter-process communication was invented, and isn't specific to
the readlines() method in Python.

But the OP *also* has the problem that I described in my reply.
Even if he had piped only one of the output streams from his
subprocess, he would receive its output only when the subprocess
finished (if it ever does), not as it is produced.


(For those who don't see why the OP's code risks a deadly
embrace: if a process (A) writes significant amounts of data to
both its standard output and standard error, but the process that
holds the other end of those streams (process B) only reads data
from one of those streams, process A will eventually fill the
operating system's buffer for the other stream. When that
happens, the OS will block process A from running until process B
reads data from that stream too, freeing up buffer space. If
process B never does that, then process A will never run again.

The OP must therefore do a select() on both the standard output
and standard error of his subprocess, and use os.read() to
retrieve the output from both streams to free up buffer space in
the pipes.)
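A minimal sketch of that approach, written for Python 3 and assuming a POSIX system (on Windows, `select()` only accepts sockets). The helper name `run_streaming` and its `on_stderr` callback are hypothetical. Unlike the loop in the original post, it keeps going until both pipes report end of file rather than stopping as soon as `poll()` returns an exit status, so stderr output still sitting in the pipe when the child exits is not skipped.

```python
import os
import select
import subprocess
import sys


def run_streaming(argv, on_stderr):
    """Run argv, passing stderr chunks to on_stderr as they arrive.

    Returns (returncode, stdout_bytes). POSIX only: select() on pipes.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, close_fds=True)
    # Map each pipe's file descriptor to the stream it belongs to.
    open_fds = {proc.stdout.fileno(): "stdout", proc.stderr.fileno(): "stderr"}
    stdout_chunks = []
    while open_fds:
        # Sleep until at least one pipe has data or has reached EOF.
        ready, _, _ = select.select(list(open_fds), [], [])
        for fd in ready:
            data = os.read(fd, 4096)  # whatever is available, up to 4096 bytes
            if not data:
                del open_fds[fd]            # b"" means EOF: child closed this pipe
            elif open_fds[fd] == "stderr":
                on_stderr(data)             # forward stderr immediately
            else:
                stdout_chunks.append(data)  # keep draining stdout so the child never blocks
    proc.wait()  # both pipes are at EOF; reap the child
    proc.stdout.close()
    proc.stderr.close()
    return proc.returncode, b"".join(stdout_chunks)


# Usage: a hypothetical child that reports on stderr, then floods stdout.
CHILD = ("import sys; sys.stderr.write('warning\\n'); sys.stderr.flush(); "
         "sys.stdout.write('x' * 100000)")
errors = []
code, out = run_streaming([sys.executable, "-c", CHILD], errors.append)
print(code, len(out), b"".join(errors))
```

Passing something like `lambda data: sys.stdout.write("STDERR: " + data.decode())` as `on_stderr` would approximate the original program's behaviour of printing stderr as soon as it arrives, while stdout is still collected for parsing after the child exits.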
 

Ivo

Thomas said:
However, the os.read() function will only read what is currently
available. Note, though, that os.read() does not do line-based
I/O, so depending on the timing you can get incomplete lines, or
multiple lines in one read.

Be careful to specify how much you want to read at a time;
otherwise you may keep on reading.

Specify read(1024) or some such.

For my PPCEncoder, I recompiled the mencoder subprocess so that it
delivers lines ending with \n.

If anyone can tell me how to read a continuous stream, I would be
really interested.

cya
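Short of recompiling the child, one common way to consume a continuous stream is to reassemble lines in the parent: keep the trailing partial line from each `os.read()` chunk until a later chunk completes it. A minimal Python 3 sketch; the `LineBuffer` class is hypothetical, and its `sep` parameter is an assumption added in case the child terminates its records with something other than `\n`.

```python
class LineBuffer:
    """Collect arbitrary byte chunks and return only complete lines."""

    def __init__(self, sep=b"\n"):
        self.sep = sep
        self._pending = b""  # trailing partial line, not yet terminated

    def feed(self, chunk):
        """Add a chunk (e.g. from os.read()) and return its complete lines."""
        self._pending += chunk
        # Everything before the last separator is complete; the last piece
        # (possibly b"") is kept until more data arrives.
        *lines, self._pending = self._pending.split(self.sep)
        return lines

    def flush(self):
        """At EOF, return the unterminated remainder, if there is one."""
        rest, self._pending = self._pending, b""
        return [rest] if rest else []


buf = LineBuffer()
print(buf.feed(b"frame 1\nfra"))     # [b'frame 1']
print(buf.feed(b"me 2\nframe 3\n"))  # [b'frame 2', b'frame 3']
print(buf.flush())                   # []
```

Each chunk from a `select()`/`os.read()` loop would be passed to `feed()`, and `flush()` called once the pipe reports end of file.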
 

Thomas Bellman

Ivo said:
Be careful to specify how much you want to read at a time;
otherwise you may keep on reading.
Specify read(1024) or some such.

Well, of course you need to specify how much you want to read.
Otherwise os.read() throws an exception:
Traceback (most recent call last):

Ivo said:
For my PPCEncoder, I recompiled the mencoder subprocess so that it
delivers lines ending with \n.
If anyone can tell me how to read a continuous stream, I would be
really interested.

I have never had any problem when using the os.read() function,
as long as I understand the effects of output buffering in the
subprocess. The file.read() method is a quite different animal.

(And then there's the problem of getting mplayer/mencoder to
output any *useful* information, but that is out of the scope of
this newsgroup. :)
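The output buffering mentioned above is often the real reason output "comes at once": many programs, Python among them, fully buffer stdout when it is a pipe rather than a terminal. A sketch, assuming a Python 3 child started with `-u` (unbuffered) as a hypothetical stand-in for the real subprocess; with it, the parent's `os.read()` should receive the first line while the child is still sleeping. Dropping `-u` would typically make the first read wait until the child exits and flushes. In Python 3, `proc.stdout.read(1024)` on the buffered file object would instead keep waiting until it had 1024 bytes or reached end of file; `proc.stdout.read1(1024)` behaves more like `os.read()`.

```python
import os
import subprocess
import sys

# Hypothetical child: prints one line, sleeps, then prints another.
CHILD = "import time; print('first'); time.sleep(2); print('second')"


def first_chunk(unbuffered=True):
    """Start the child and return (first os.read() chunk, child_still_running)."""
    argv = [sys.executable] + (["-u"] if unbuffered else []) + ["-c", CHILD]
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    # Blocks only until the child has written *something* to the pipe.
    chunk = os.read(proc.stdout.fileno(), 1024)
    still_running = proc.poll() is None
    proc.communicate()  # drain the remaining output and reap the child
    return chunk, still_running


chunk, running = first_chunk()
print(chunk, running)
```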
 
