subprocess.Popen does not close pipe in an error case


Steven K. Wong

Below, I have a Python script that launches 2 child programs, prog1
and prog2, with prog1's stdout connected to prog2's stdin via a pipe.
(It's like executing "prog1 | prog2" in the shell.)

If both child programs exit with 0, then the script runs to
completion. But if prog2 exits with non-0, prog1 does not exit and the
script hangs (i.e. prog1.poll() always returns None) -- unless I
uncomment the 2 lines marked by XXX to close prog1.stdout.

I was expecting that I don't have to explicitly close prog1.stdout,
whether prog2 succeeds or fails. Is the current behavior a bug in the
subprocess module or is it expected? Or am I doing something wrong?

Thanks.

import subprocess
import time

# prog1: a program that writes lots of data to the pipe
cmd = ['zcat', '--force', 'a_large_file']
prog1 = subprocess.Popen(cmd, bufsize=-1, stdout=subprocess.PIPE)

# prog2: a program that fails without reading much data from the pipe
cmd = ['python', '-c', 'import time; time.sleep(10); asdf']
prog2 = subprocess.Popen(cmd, bufsize=-1, stdin=prog1.stdout,
                         stdout=open('popen.out', 'w'))
print 'waiting for a while'

retCodeProg2 = prog2.wait()
print 'prog2 returns', retCodeProg2
# XXX
# if retCodeProg2 != 0:
#     prog1.stdout.close()
while prog1.poll() is None:
    print 'sleep a bit'
    time.sleep(1)
retCodeProg1 = prog1.poll()
print 'prog1 returns', retCodeProg1
 
Nobody

prog2 = subprocess.Popen(cmd, bufsize=-1, stdin=prog1.stdout,
                         stdout=open('popen.out', 'w'))

I think that you should close prog1.stdout here. Otherwise, there will
be two readers on the pipe (the calling process and prog2). Even if one of
them dies, there's always the possibility that the caller might eventually
decide to read prog1.stdout itself. If you close it in the caller, when
prog2 terminates there will be no readers, and prog1 will get SIGPIPE (or
write() will fail with EPIPE if SIGPIPE is ignored or handled).
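
A minimal sketch of this suggestion, written for Python 3 rather than the Python 2 of the original script. The two child commands are hypothetical stand-ins (small Python snippets run through sys.executable) for zcat and the failing prog2, and the exit codes 3 and 7 are arbitrary values chosen for illustration. The timeout on wait() is only a safety net for the example.

```python
import subprocess
import sys

# Hypothetical stand-in for prog1: writes to stdout until the write fails,
# then exits with the arbitrary code 7.
PRODUCER = """
import os, sys
try:
    while True:
        os.write(1, b'x' * 65536)
except OSError:
    sys.exit(7)
"""

# Hypothetical stand-in for prog2: exits with code 3 without reading stdin.
CONSUMER = "import sys; sys.exit(3)"

prog1 = subprocess.Popen([sys.executable, "-c", PRODUCER],
                         stdout=subprocess.PIPE)
prog2 = subprocess.Popen([sys.executable, "-c", CONSUMER],
                         stdin=prog1.stdout)

# Drop the caller's copy of the read end, so prog2 is the only reader.
prog1.stdout.close()

ret_prog2 = prog2.wait()
# Once prog2 is gone there are no readers left, so prog1's next write
# fails and prog1 exits instead of blocking on a full pipe.
ret_prog1 = prog1.wait(timeout=30)

print("prog2 returns", ret_prog2)
print("prog1 returns", ret_prog1)
```

If the prog1.stdout.close() line is removed, the caller still holds a read end, prog1 blocks once the pipe buffer fills, and the final wait() would be expected to hit the timeout instead of returning.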
 
Steven K. Wong

Thanks for raising a great point, that prog1.stdout is also readable
by the calling process, not just by prog2. Therefore, I agree it makes
sense to explicitly call prog1.stdout.close() in the given code (say
right after the creation of prog2).

Suppose now all the prog1.poll() calls/loop are replaced by a single
prog1.wait(). Without the explicit prog1.stdout.close(), prog1.wait()
will not return, so the calling process still hangs. Because calling
prog1.wait() means that the calling process will naturally never read
prog1.stdout, I would argue that prog1.wait() should close the pipe
before actually waiting for prog1 to exit. Makes sense?
 
Steven K. Wong

Well, the example code at http://www.python.org/doc/2.6.2/library/subprocess.html#replacing-shell-pipeline
has the same issue:

output=`dmesg | grep hda`
==>
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]

After communicate() returns, if you wait for p1 to finish (by calling
p1.poll() repeatedly or p1.wait()), you can hang if the conditions
described in the original post are true, i.e. p1 wrote lots of data to
the pipe and p2 failed without reading much data from the pipe.

Perhaps the docs could be improved to remind folks to close p1.stdout
when the calling process doesn't need it, unless wait() is changed to
close it and the caller calls p1.wait().

Am I making any sense here?
 
Nobody

I would argue that prog1.wait() should close the pipe
before actually waiting for prog1 to exit. Makes sense?

prog1.stdout might be being read by a different thread.
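
To illustrate the objection, here is a minimal Python 3 sketch in which one thread drains the pipe while the main thread waits for the child. The child command and the names drain and lines are made up for the example. If wait() closed prog1.stdout on its own, it would pull the pipe out from under the reader thread.

```python
import subprocess
import sys
import threading

lines = []

def drain(stream):
    """Collect every line the child writes until EOF."""
    for line in stream:
        lines.append(line)

# Hypothetical child: prints 1000 short lines and exits normally.
prog1 = subprocess.Popen(
    [sys.executable, "-c", "for i in range(1000): print(i)"],
    stdout=subprocess.PIPE,
)

reader = threading.Thread(target=drain, args=(prog1.stdout,))
reader.start()

# The main thread only waits; the pipe is still in use by the reader thread.
rc = prog1.wait()
reader.join()
prog1.stdout.close()

print(rc, len(lines))
```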
 
Nobody

Well, the example code at
http://www.python.org/ ... /subprocess.html#replacing-shell-pipeline
has the same issue:
Perhaps the doc can be improved to remind folks to close p1.stdout if
the calling process doesn't need it, unless wait() is changed to close
it and p1.wait() is called.

Am I making any sense here?

The docs should include the p1.stdout.close().

It isn't needed in the typical case, where p2 runs until EOF on stdin, but
(as you have noticed) it matters if p2 terminates prematurely.
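
A sketch of the docs example with the extra close() added, in Python 3. The helper name run_pipeline is hypothetical, and the two commands are portable stand-ins for dmesg and grep hda, not the real programs.

```python
import subprocess
import sys

def run_pipeline(cmd1, cmd2):
    """Run the equivalent of `cmd1 | cmd2`; return (output, rc1, rc2)."""
    p1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(cmd2, stdin=p1.stdout, stdout=subprocess.PIPE)
    # The caller does not read p1's output itself, so it closes its copy
    # of the read end. p2 is then the only reader on the pipe.
    p1.stdout.close()
    output = p2.communicate()[0]
    rc1 = p1.wait()
    return output, rc1, p2.returncode

# Stand-in for dmesg: prints three device names.
emit = [sys.executable, "-c", "print('hda1'); print('sdb1'); print('hda2')"]
# Stand-in for grep hda: passes through lines containing 'hda'.
keep_hda = [
    sys.executable, "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'hda' in line:\n"
    "        sys.stdout.write(line)\n",
]

output, rc1, rc2 = run_pipeline(emit, keep_hda)
print(output.decode().split(), rc1, rc2)
```

In this run the second command reads until EOF, so the close() makes no visible difference. It only matters when the second command exits early.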
 
