popen2 with large input

cherico

from popen2 import popen2

r, w = popen2 ( 'tr "[A-Z]" "[a-z]"' )
w.write ( t ) # t is a text file of around 30k bytes
w.close ()
text = r.readlines ()
print text
r.close ()

This simple script hangs at

w.write ( t )

Does anyone know what the problem is?
 
Eric Brunel


Yep: deadlock... Pipes have a limited capacity: a write to a pipe blocks once
the pipe is full and the process at the other end is not reading from it, and a
read blocks while the other end has written nothing. If you try the command
"tr '[A-Z]' '[a-z]'" interactively, you'll see that every time tr receives a
line, it *immediately* outputs the converted line. So if you write a large file
to the pipe, tr will start writing converted text to its output while you are
still writing, and will get stuck once its output pipe is full, since your
program is not reading from it. It then stops reading its input, so your program
gets stuck too because it can't write to the pipe. And they'll wait for each
other until the end of time...

If you really want to use the "tr" command for this, you'd better send your
text line by line and read the result immediately, like this:

text = ''
for line in t.splitlines(1):   # iterate over the input t, keeping line ends
    w.write(line)
    w.flush() # Mandatory because of output buffering - see below
    text += r.readline()
w.close()
r.close()

It *may* work better, but you cannot be sure: in fact, you just can't know
exactly when tr will actually output the converted text. Even worse: since
output is usually buffered, you'll only see the output from tr when its standard
output is flushed, and you can't know when that will be...

(BTW, the script above does not work on my Linux box: the first r.readline()
never returns...)

So the conclusion is: don't use pipes unless you're really forced to. They're
hell to use, since you never know how to synchronize them.

BTW, if the problem you posted is your real problem, why on earth don't you do:
text = t.lower()
???

HTH
 
Jeff Epler

The connection to the child process created by the popen family has
some inherent maximum size for data "in flight". I'm not sure how to
find out what that value is, but it might be anywhere from a few bytes
to a few K.

So tr starts to write its output as it gets input, but you won't read
its output before you've written all your input. If the size of tr's
output is bigger than the size of the buffer for tr's unread output,
you'll deadlock.
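
The "in flight" limit described above can be measured rather than guessed. The following is a minimal sketch, assuming Python 3 on a POSIX system (the thread itself uses Python 2, where os.set_blocking is not available); the function name pipe_capacity is made up for illustration. It fills a pipe that nobody reads and counts the bytes accepted before a write would block. It only measures the kernel's pipe buffer, not any extra buffering done inside the child process.

```python
import os

def pipe_capacity(chunk_size=1024):
    """Count how many bytes fit in an unread pipe before a write would block."""
    read_fd, write_fd = os.pipe()
    try:
        # In non-blocking mode a full pipe raises BlockingIOError instead of
        # hanging the way w.write(t) does in the original script.
        os.set_blocking(write_fd, False)
        chunk = b"x" * chunk_size
        total = 0
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)

print(pipe_capacity())
```

The number printed depends on the operating system and its configuration, so treat it as a property of the machine, not a constant to rely on.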

As an aside, the particular problem you pose can be solved with Python's
str.translate method. If the actual goal is to "work like tr", then use
that instead and forget about popen.
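
As a rough illustration of the str.translate suggestion, here is a sketch in Python 3, where the table is built with str.maketrans (in the Python 2 of this thread the equivalent helper was string.maketrans). The names LOWER_TABLE and tr_lower are invented for the example.

```python
import string

# Translation table mapping A-Z to a-z, like tr "[A-Z]" "[a-z]".
LOWER_TABLE = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def tr_lower(text):
    """Lowercase ASCII letters only, leaving every other character alone."""
    return text.translate(LOWER_TABLE)

print(tr_lower("Hello, World"))  # hello, world
```

Unlike str.lower(), this only touches the characters listed in the table, which is closer to what tr does with explicit ranges.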

Anyway, to solve the popen2 problem, you'll need to write something like this:
[untested]
import os, select
from popen2 import popen2

def getoutput(command, input):
    r, w = popen2(command)
    rr = [r]; ww = [w]
    output = []
    while rr:
        # Wait until the child can take more input or has output ready.
        _r, _w, _ = select.select(rr, ww, [])
        if _w:
            # Instead of non-blocking mode: once select() reports w writable,
            # a write of at most 512 bytes (the POSIX minimum PIPE_BUF)
            # should not block.
            n = os.write(w.fileno(), input[:512])
            input = input[n:]
            if not input:
                w.close(); ww = []
        if _r:
            data = os.read(r.fileno(), 4096)
            if data:
                output.append(data)
            else:
                # EOF: the child closed its output.
                if ww:
                    w.close()  # child quit before taking all input: probably an error
                rr = []
    r.close()
    return "".join(output)
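
For readers on Python 3, where the popen2 module no longer exists, the same interleaving of writes and reads is available through subprocess.Popen.communicate(). The sketch below assumes Python 3 and bytes input; run_filter is a hypothetical helper name, not something from the thread.

```python
import subprocess

def run_filter(argv, data):
    """Feed `data` (bytes) to a child process and return everything it writes."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate() writes stdin and reads stdout concurrently, so neither
    # pipe can fill up while the other side is waiting.
    output, _ = proc.communicate(data)
    if proc.returncode != 0:
        raise RuntimeError("child exited with status %d" % proc.returncode)
    return output
```

With the command from the original question this would be called as run_filter(["tr", "[A-Z]", "[a-z]"], t), with t holding the file contents as bytes.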

You could also write 'input' into a temporary file and use
commands.getoutput() or os.popen(.., "r").
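
The temporary-file idea can be sketched as follows, assuming Python 3 (where the commands module is gone and subprocess takes its place). The helper name run_filter_via_tempfile is made up for this example. Because the child reads from a regular file rather than a pipe, the parent never blocks on a write and only has to collect the output.

```python
import subprocess
import tempfile

def run_filter_via_tempfile(argv, data):
    """Spool `data` (bytes) to a temporary file and use it as the child's stdin."""
    with tempfile.TemporaryFile() as spool:
        spool.write(data)
        spool.flush()
        spool.seek(0)  # the child inherits the file offset, so rewind first
        # The parent does no writing while the child runs; it only reads
        # the child's stdout until the child exits.
        result = subprocess.run(argv, stdin=spool, stdout=subprocess.PIPE, check=True)
    return result.stdout
```

The trade-off is an extra copy of the input on disk, which is usually acceptable for inputs of the size discussed here.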

Jeff
 
