Reading Live Output from a Subprocess

bunslow

Okay, I've been trying for days to figure this out, posting on forums, Googling, whatever. I have yet to find a solution that has worked for me. (I'm using Python 3.2.2, Ubuntu 11.04.) Everything I've tried has led to buffered output being spat back all at once after the subprocess terminates. I need this functionality because the script I am currently writing is only glue, and most of the work is done by a subprocess (which could potentially run for a long time).

Here are some StackOverflow questions about this topic:
http://stackoverflow.com/questions/2525263/capture-subprocess-output
http://stackoverflow.com/questions/...-tee-behavior-in-python-when-using-subprocess
http://stackoverflow.com/questions/803265/getting-realtime-output-using-subprocess
http://stackoverflow.com/questions/1183643/unbuffered-read-from-process-using-subprocess-in-python
http://stackoverflow.com/questions/527197/intercepting-stdout-of-a-subprocess-while-it-is-running

And a few others for good measure:
http://www.linuxquestions.org/quest...e-stdout-stream-from-external-program-612998/
http://devlishgenius.blogspot.com/2008/10/logging-in-real-time-in-python.html

Not a single one of the solutions above has worked for me. I've tried

######################################
import subprocess as sub
out = sub.Popen(["pypy", "5"], universal_newlines=True, stdout=sub.PIPE, stderr=sub.STDOUT, bufsize=1)

line = out.stdout.readline()
out.stdout.flush()
while line:
    print(line)
    line = out.stdout.readline()
    out.stdout.flush()
######################################

I've tried

######################################
line = out.stdout.readline()
while line:
    print(line)
    line = out.stdout.readline()
######################################

I've tried

######################################
for line in out.stdout.readline():
    print(line)
######################################

I've tried

######################################
for line in out.communicate():
    print(line)
######################################

etc...

None have worked, and it seems that the while loops worked in Python 2.x (according to those links), but not in Python 3. (I am a two-week-old Python coder, and it's already tied for my strongest language, which is why I decided to start with Python 3.)
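
(For reference, the closest thing to a canonical stdlib pattern I can find is the readline loop below; note that, as the replies explain, even this cannot show output live unless the child actually flushes its stdout.)

######################################
import subprocess as sub

# Same command as in the attempts above; universal_newlines=True gives
# text-mode output, and bufsize=1 requests line buffering on our side.
out = sub.Popen(["pypy", "5"], universal_newlines=True,
                stdout=sub.PIPE, stderr=sub.STDOUT, bufsize=1)

# iter() keeps calling readline() until it returns '' at EOF.
for line in iter(out.stdout.readline, ''):
    print(line, end='')

out.wait()
######################################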

I've heard that the Pexpect module works wonders, but the problem is that it relies on pty, which is available only on Unix. Additionally, because I want this script to be usable by others, any solution should be in the standard library, which means I'd have to copy the Pexpect code into my script to use it.

Is there any such solution in the Python 3 Standard Library, and if not, how much of a thorn is this?

"There should be one-- and preferably only one --obvious way to do it."
Unfortunately, this is one case where the above is true for Perl but not Python. Such an example in Perl is

open(PROG, "command |") or die "Couldn't start prog!";
while (<PROG>) {
    print "$_";
}

(Note that I do not know Perl and do not have any intentions to learn it; the above comes from the script I was previously copying and extending, but I imagine (due to its simplicity) that it's a common Perl idiom. Note, however, that the above does fail if the program re-prints output to the same line, as many long-running C programs do. Preferably this would also be caught in a Python solution.)

If there is a general consensus that this is a problem for lots of people, I might consider writing a PEP.

Of course, my highest priority is solving the blasted problem, which is holding up my script at the moment. (I can work around this by redirecting the program's output to a tmp file and reading that, but that would be such a perilous and ugly kludge that I would like to avoid it if at all possible.)

Thanks,
Bill
 
Nobody

Okay, I've been trying for days to figure this out, posting on forums,
Googling, whatever. I have yet to find a solution that has worked for me.
(I'm using Python 3.2.2, Ubuntu 11.04.) Everything I've tried has led to
buffered output being spat back all at once after the subprocess
terminates.

In all probability, this is because the child process (pypy) is
buffering its stdout, meaning that the data doesn't get passed to the OS
until either the buffer is full or the process terminates. If it doesn't
get passed to the OS, then the OS can't pass it on to whatever is on the
read end of the pipe. In that situation, there is nothing that the parent
process can do about it.
I've heard that the Pexpect module works wonders,

If it does, that would confirm the buffering hypothesis. Pexpect causes
the child process' stdout to be associated with a pty.

The "stdout" stream created by the C library (libc) is initially
line-buffered if it is associated with a tty (according to the isatty()
function) and fully-buffered otherwise. The program can change the
buffering with e.g. setvbuf(), or it can explicitly fflush(stdout) after
each line, but this is up to the program, and is not something which can
be controlled externally (short of "hacking" the child process with
techniques such as ptrace() or LD_PRELOAD).

While the libc behaviour is occasionally inconvenient, it is mandated by
the C standard (7.19.3p7):

As initially opened, the standard error stream is not fully
buffered; the standard input and standard output streams are
fully buffered if and only if the stream can be determined not
to refer to an interactive device.

It's up to individual programs to force line-buffered output where
appropriate (e.g. GNU grep has the --line-buffered switch, GNU sed has the
-u switch, etc).
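
A minimal sketch of that pty approach, using only the standard
library's pty module (Unix-only; the pypy command from the original
post is assumed):

######################################
import os, pty, subprocess

# Create a pseudo-terminal pair; the child writes to the slave end,
# its libc sees isatty() == true, and stdout becomes line-buffered.
master, slave = pty.openpty()
proc = subprocess.Popen(["pypy", "5"], stdout=slave, stderr=slave)
os.close(slave)  # the parent only needs the master end

while True:
    try:
        data = os.read(master, 1024)
    except OSError:   # on Linux, reading a pty raises EIO at EOF
        break
    if not data:
        break
    print(data.decode(), end='')

proc.wait()
os.close(master)
######################################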
 
Dubslow

In all probability, this is because the child process (pypy) is
buffering its stdout, meaning that the data doesn't get passed to the OS
until either the buffer is full or the process terminates. If it doesn't
get passed to the OS, then the OS can't pass it on to whatever is on the
read end of the pipe. In that situation, there is nothing that the parent
process can do about it.
It's just a short test script written in Python, so I have no idea how to even control the buffering (and even if I did, I still can't modify the subprocess I need to use in my script). What confuses me then is why Perl is able to get around this just fine without faking a terminal or similar stuff. (And also, this needs to work on Windows as well.) For the record, here's the test script:
######################################
#!/usr/bin/python

import time, sys
try:
    total = int(sys.argv[1])
except IndexError:
    total = 10

for i in range(total):
    print('This is iteration', i)
    time.sleep(1)

print('Done. Exiting!')
sys.exit(0)
######################################
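
(One external knob worth noting, since the child here happens to be a Python interpreter: the -u flag disables output buffering, so the parent could launch the script unbuffered without modifying it. I'm assuming pypy honours -u the same way CPython does; "test.py" below is a placeholder for the script's filename.)

######################################
import subprocess as sub

# -u turns off the child interpreter's stdout buffering entirely.
proc = sub.Popen(["pypy", "-u", "test.py", "5"],
                 stdout=sub.PIPE, stderr=sub.STDOUT,
                 universal_newlines=True)
for line in iter(proc.stdout.readline, ''):
    print(line, end='')
proc.wait()
######################################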
If it does, that would confirm the buffering hypothesis. Pexpect causes
the child process' stdout to be associated with a pty.

The "stdout" stream created by the C library (libc) is initially
line-buffered if it is associated with a tty (according to the isatty()
function) and fully-buffered otherwise. The program can change the
buffering with e.g. setvbuf(), or it can explicitly fflush(stdout) after
each line, but this is up to the program, and is not something which can
be controlled externally (short of "hacking" the child process with
techniques such as ptrace() or LD_PRELOAD).

While the libc behaviour is occasionally inconvenient, it is mandated by
the C standard (7.19.3p7):

As initially opened, the standard error stream is not fully
buffered; the standard input and standard output streams are
fully buffered if and only if the stream can be determined not
to refer to an interactive device.

It's up to individual programs to force line-buffered output where
appropriate (e.g. GNU grep has the --line-buffered switch, GNU sed has the
-u switch, etc).

Well, they shouldn't assume they can determine what's interactive or not *grumble*. I take it then that setting shell=True will not be fake enough for catching output live?


Maybe, but even if it does work as expected, it's a _lot_ of pain to go through, with separate threads, etc., just to get something so blasted simple to work. It certainly isn't obvious. Thanks for the link, though.
 
Vinay Sajip

I've heard that the Pexpect module works wonders, but the problem is that it relies on pty, which is available only on Unix. Additionally, because I want this script to be usable by others, any solution should be in the standard library, which means I'd have to copy the Pexpect code into my script to use it.

Is there any such solution in the Python 3 Standard Library, and if not, how much of a thorn is this?

"There should be one-- and preferably only one --obvious way to do it."
Unfortunately, this is one case where the above is true for Perl but not Python. Such an example in Perl is

open(PROG, "command |") or die "Couldn't start prog!";
while (<PROG>) {
    print "$_";
}

(Note that I do not know Perl and do not have any intentions to learn it; the above comes from the script I was previously copying and extending, but I imagine (due to its simplicity) that it's a common Perl idiom. Note, however, that the above does fail if the program re-prints output to the same line, as many long-running C programs do. Preferably this would also be caught in a Python solution.)

If there is a general consensus that this is a problem for lots of people, I might consider writing a PEP.

Of course, my highest priority is solving the blasted problem, which is holding up my script at the moment. (I can work around this by redirecting the program's output to a tmp file and reading that, but that would be such a perilous and ugly kludge that I would like to avoid it if at all possible.)


Try the sarge package [1], with documentation at [2] and source code
at [3]. It's intended for your use case, works with both Python 2.x
and 3.x, and is tested on Linux, OS X and Windows. Disclosure: I'm the
maintainer.

Regards,

Vinay Sajip

[1] http://pypi.python.org/pypi/sarge/0.1
[2] http://sarge.readthedocs.org/en/latest/
[3] https://bitbucket.org/vinay.sajip/sarge/
 
Nobody

It's just a short test script written in python, so I have no idea how
to even control the buffering

In Python, you can set the buffering when opening a file via the third
argument to the open() function, but you can't change a stream's buffering
once it has been created; Python's file objects provide no equivalent
to setvbuf().

On Linux, you could use e.g.:

sys.stdout = open('/dev/stdout', 'w', 1)

Other than that, if you want behaviour equivalent to line buffering, call
sys.stdout.flush() after each print statement.
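
Applied to the test script above, that amounts to:

######################################
import sys, time

for i in range(10):
    print('This is iteration', i)
    sys.stdout.flush()   # push the line to the OS immediately
    time.sleep(1)
######################################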
(and even if I did, I still can't modify the subprocess I need to use in
my script).

In which case, discussion of how to make Python scripts use line-buffered
output is beside the point.
What confuses me then is why Perl is able to get around this just fine
without faking a terminal or similar stuff.

It isn't. If a program sends its output to the OS in blocks, anything
which reads that output gets it in blocks. The language doesn't matter;
writing the parent program in assembler still wouldn't help.
I take it then that setting shell=True will not be fake enough for
catching output live?

No. It just invokes the command via /bin/sh or cmd.exe. It doesn't affect
how the process' standard descriptors are set up.

On Unix, the only real use for shell=True is if you have a "canned" shell
command, e.g. from a file, and you need to execute it. In that situation,
args should be a string rather than a list. And you should never try to
construct such a string dynamically in order to pass arguments; that's an
injection attack waiting to happen.
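
A sketch of the two cases (the canned command string here is made up
for illustration):

######################################
import subprocess

# A "canned" command taken from e.g. a file: pass the whole string
# with shell=True so /bin/sh (or cmd.exe on Windows) parses it.
canned = "dmesg | tail -n 5"
subprocess.call(canned, shell=True)

# The normal case: a list of args, shell left at its default of False.
subprocess.call(["ls", "-l"])
######################################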
 
