Tuning a select() loop for os.popen3()

  • Thread starter Christopher DeMarco

Christopher DeMarco

Hi all...

I've written a class to provide an interface to popen; I've included
the actual select() loop below. I'm finding that "sometimes" popen'd
processes take "a really long time" to complete and "other times" I
get incomplete stdout.

E.g:

- on boxA ffmpeg returns in ~25s; on boxB (comparable hardware,
identical OS) ~5m.

- ``ls'' on a directory with 15 nodes returns full stdout; ``ls -R''
on that same directory (with ~32K nodes beneath) stops after
4097KB of output.

The code in question is running on Linux 2.6.x; no cross-platform
portability desired. popen'd commands will never be interactive; I
just wanna read stdin/stdout and perhaps feed a one-shot string via
stdin.

Here's the relevant code (stripped of comments and various OO
setup/output stuff):


# # ## ### ##### ######## ############# #####################
# cut here

def run(self):
    import os, select, syslog
    (_stdin, _stdout, _stderr) = os.popen3(self.command)

    stdoutChunks = []; stderrChunks = []
    readList = [_stdout, _stderr]
    if self.stdinString != "": writeList = [_stdin]
    else: writeList = []
    readStderr = False; readStdout = False

    i = 0
    while True:
        i += 1
        (r, w, x) = select.select(readList, writeList, [], 1)
        read = ""

        if self.stdinString != "":
            if w:
                bytesWritten = os.write(_stdin.fileno(), self.stdinString)
                writeList.remove(_stdin)
                _stdin.close()
                continue

        if r:
            if _stderr in r:
                readStderr = True
                read = os.read(_stderr.fileno(), 16384)
                if read: stderrChunks.append(read)
                else: readList.remove(_stderr)
                continue

            elif _stdout in r:
                readStdout = True
                read = os.read(_stdout.fileno(), 16384)
                if read:
                    stdoutChunks.append(read)
                    syslog.syslog("Command instance read %d from stdout" % len(read))
                else: readList.remove(_stdout)
                continue

        else:
            if (readStderr and self.dieOnStderr) or readStdout:
                syslog.syslog("Command instance finished")
                break
    return

# cut here
# # ## ### ##### ######## ############# #####################


Tweaking (a) the os.read() buffer size and (b) the select() timeout
and testing with ``ls -R'' on a directory with ~ 32K nodes beneath, I
find the following trends:

1. With a very small os.read() buffer, I get full stdout, but running
time is rather long. Running time increases as select() timeout
increases.

2. With a very large os.read() buffer, I get incomplete stdout (but
running time is *very* fast). As select() timeout increases, I get
better and better results - with a select() timeout of 0.2 I seem to
get reliably full stdout.


The values used in the code I've pasted above - large buffer, large
select() timeout - seem to perform "well enough"; none of the
previously described problems manifest. However, ``ls -lR /'' (way
more than 32K nodes) "sometimes" gives incomplete stdout.


My first question, then, is paranoid: I've run all these benchmarks
because the application using this code saw a HUGE performance hit
when we started using popen'd commands which generated "lots of"
output.

Is there anything wrong with the logic in my code?!

Will I see severe performance degradation (or worse, incomplete
stdout/stderr) as system variables change (e.g. system load increases,
popen'd program changes, popen'd program increases workload, etc.)?


Next question - how do I tune the select() timeout and the os.read()
buffer correctly? Is it *really* per-command, per-system,
per-phase-of-moon voodoo? Is there a Recommended Setup for such a
select() loop?


Thanks in advance, for insight as well as for tolerating my
long-windedness...


--
Christopher DeMarco <[email protected]>
Alephant Systems (http://alephant.net)
PGP public key at http://pgp.alephant.net
+1-412-708-9660

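[Editor's note: for the non-interactive case described above - feed a one-shot
string on stdin, then read stdout/stderr to completion - the timeout tuning can
be sidestepped entirely with subprocess.Popen.communicate(), which reads both
pipes to EOF for you. A minimal sketch (modern subprocess standing in for
os.popen3; the command is illustrative):]

```python
import subprocess

# Run a command, feed a one-shot string on stdin, and collect stdout
# and stderr in full. communicate() reads both pipes until EOF, so no
# select() timeout or os.read() buffer size needs tuning.
proc = subprocess.Popen(
    ["ls", "-R", "."],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
stdout, stderr = proc.communicate(input=b"")  # one-shot stdin, then EOF
print("read %d bytes of stdout, exit status %d"
      % (len(stdout), proc.returncode))
```

Because termination is driven by EOF on the pipes rather than by a quiet
select(), output is complete regardless of how fast or slow the child produces it.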
 

Donn Cave

Christopher DeMarco said:
I've written a class to provide an interface to popen; I've included
the actual select() loop below. I'm finding that "sometimes" popen'd
processes take "a really long time" to complete and "other times" I
get incomplete stdout.
....

My first question, then, is paranoid: I've run all these benchmarks
because the application using this code saw a HUGE performance hit
when we started using popen'd commands which generated "lots of"
output.

Is there anything wrong with the logic in my code?!

I tried a modified version with 'ls -R .', which yields about
1 Mb of data, and saw no problems on MacOS X. Same data, and
about the same time as 'ls -R .' from the shell, maybe 5% longer.

But I modified it a lot. I removed every "continue", I removed
the "break", and I made readList the condition for the while loop.
With these changes, a 0.1 second timeout is about the same as no
timeout, but at 0.01 second I do see a little slow down. Still
no loss of data.

I suspect there is indeed something wrong with your logic, but
I'm not going to try to figure it out. If you're sure it's
right, I think you should post again with the actual code for
a program that demonstrates your problem(s). Your goal for the
revised logic should be 1) avoid gratuitous branches in the flow
of control, 2) reduce number of state variables that you have to
account for, and 3) express your intentions clearly with respect
to the timeouts -- what do you do when it times out, and why?

Donn Cave, (e-mail address removed)
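[Editor's note: Donn's three suggestions - make readList itself the loop
condition, drop the gratuitous continue/break branches, and terminate on EOF
rather than guessing from timeouts - can be sketched as below. This is a
hypothetical rework of the posted run() method, not Donn's actual code; names
and the use of subprocess in place of os.popen3 are illustrative.]

```python
import os
import select
import subprocess

def run(command, stdin_string=b""):
    """Run command, optionally feeding stdin_string once; return (stdout, stderr)."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_chunks, err_chunks = [], []
    chunks = {proc.stdout: out_chunks, proc.stderr: err_chunks}
    read_list = [proc.stdout, proc.stderr]
    write_list = [proc.stdin] if stdin_string else []
    if not stdin_string:
        proc.stdin.close()          # nothing to send: give the child EOF now

    # The read list itself is the loop condition: we stop only when both
    # pipes have hit EOF, so there is no timeout guesswork at all.
    while read_list:
        r, w, _ = select.select(read_list, write_list, [])
        for f in w:
            # One-shot write; a fuller version would handle partial writes.
            os.write(f.fileno(), stdin_string)
            write_list.remove(f)
            f.close()
        for f in r:
            data = os.read(f.fileno(), 16384)
            if data:
                chunks[f].append(data)
            else:                    # empty read == EOF on this pipe
                read_list.remove(f)
                f.close()
    proc.wait()
    return b"".join(out_chunks), b"".join(err_chunks)
```

A blocking select() with no timeout is fine here precisely because termination
is driven by EOF, not by silence on the pipes; the buffer size then only
affects how many syscalls are made, never whether output is complete.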
 
