Python 2.2.1 and select()

Derek Martin

Hi kids!

I've got some code that uses select.select() to capture all the output
of a subprocess (both stdout and stderr, see below). This code works
as expected on a variety of Fedora systems running Python > 2.4.0, but
on a Debian Sarge system running Python 2.2.1 it's a no-go. I'm
thinking this is a bug in that particular version of Python, but I'd
like to have confirmation if anyone can provide it.

The behavior I see is this: the call to select() returns:
[<file corresponding to sub-proc's STDOUT>] [] []

If and only if the total amount of output is greater than the
specified buffer size, then reading on this file hangs indefinitely.
For what it's worth, the program whose output I need to capture with
this generates about 17k of output to STDERR, and about 1k of output
to STDOUT, at essentially random intervals. But I also ran it with a
test shell script that generates roughly the same amount of output to
each file object, alternating between STDOUT and STDERR, with the same
results.

Yes, I'm aware that this version of Python is quite old, but I don't
have a great deal of control over that (though if this is indeed a
python bug, as opposed to a problem with my implementation, it might
provide some leverage to get it upgraded)... Thanks in advance for
any help you can provide. The code in question (quite short) follows:

import popen2, select

class FailedExitStatus(Exception):
    # The OP's own exception class; its definition isn't shown in the
    # original post, so this minimal stand-in is assumed.
    pass

def capture(cmd):
    buffsize = 8192
    inlist = []
    inbuf = ""
    errbuf = ""

    io = popen2.Popen3(cmd, True, buffsize)
    inlist.append(io.fromchild)
    inlist.append(io.childerr)
    while True:
        ins, outs, excepts = select.select(inlist, [], [])
        for i in ins:
            x = i.read()
            if not x:
                inlist.remove(i)
            else:
                if i == io.fromchild:
                    inbuf += x
                if i == io.childerr:
                    errbuf += x
        if not inlist:
            break
    if io.wait():
        raise FailedExitStatus, errbuf
    return (inbuf, errbuf)

If anyone would like, I could also provide a shell script and a main
program one could use to test this function...
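[For readers who want to try this, a minimal harness along the lines the OP offers might look like the following. File names, the module name, and byte counts are my own illustration, not the OP's actual scripts:

    # noisy.py -- hypothetical child that alternates writes between STDOUT
    # and STDERR, producing roughly 8k on each stream (like the OP's test)
    import sys

    for i in range(100):
        sys.stdout.write("o" * 80 + "\n")
        sys.stderr.write("e" * 80 + "\n")

and a driver:

    # main.py -- hypothetical driver; assumes capture() lives in capture_mod.py
    import sys
    from capture_mod import capture

    out, err = capture(sys.executable + " noisy.py")
    print "captured %d bytes of stdout, %d of stderr" % (len(out), len(err))
]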

--
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D


 
Noah

> If and only if the total amount of output is greater than the
> specified buffer size, then reading on this file hangs indefinitely.
> For what it's worth, the program whose output I need to capture with
> this generates about 17k of output to STDERR, and about 1k of output
> to STDOUT, at essentially random intervals. But I also ran it with a
> test shell script that generates roughly the same amount of output to
> each file object, alternating between STDOUT and STDERR, with the same
> results.

I think this is more of a limitation with the underlying clib.
Subprocess buffering defaults to block buffering instead of
line buffering. You can't change this unless you can recompile
the application you are trying to run in a subprocess or
unless you run your subprocess in a pseudo-tty (pty).

Pexpect takes care of this problem. See http://www.noah.org/wiki/Pexpect
for more info.
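[A minimal sketch of the pty idea Noah describes (my illustration of the approach, not Pexpect's code): give the child a pseudo-tty, so libc sees a terminal on stdout and switches to line buffering. Using the standard pty module:

    # Run a command on a pseudo-tty so libc line-buffers the child's stdout.
    # Note: this yields ONE stream; it does not keep stderr separate.
    import os, pty

    def run_on_pty(argv):
        pid, master_fd = pty.fork()
        if pid == 0:
            # Child: stdin/stdout/stderr now refer to the pty's slave side
            os.execvp(argv[0], argv)
        chunks = []
        while 1:
            try:
                data = os.read(master_fd, 1024)
            except OSError:
                break              # Linux raises EIO once the child has exited
            if not data:
                break
            chunks.append(data)
        os.close(master_fd)
        os.waitpid(pid, 0)
        return "".join(chunks)
]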
 
Derek Martin

> I think this is more of a limitation with the underlying clib.
> Subprocess buffering defaults to block buffering instead of
> line buffering.

That's an interesting thought, but I guess I'd need you to elaborate
on how the buffering mode would affect the operation of select(). I
really don't see how your explanation can cover this, given the
following:

1. The subprocess used to test, in both the case where it worked, and
the case where it did not, was the very same shell script -- not a
compiled program (well, bash technically). As far as I'm aware, there
haven't been any significant changes to the buffering mode defaults in
glibc... But I could easily be mistaken.

2. By default, STDERR is always unbuffered, whether or not STDOUT is a
terminal device.

3. The actual subproc I care about is a perl script.

4. Most importantly, the whole point of using select() is that it
should only return a list of file objects which are ready for reading
or writing. In this case, in both the working case (Python 2.4+ on
Red Hat) and the non-working case (Python 2.2.1 on Debian 3.1),
select() returns the file object corresponding to the subprocess's
STDOUT, which *should* mean that there is data ready to be read on
that file descriptor. However, the actual read blocks, and both the
parent and the child go to sleep.

This should be impossible. That is the very problem select() is
designed to solve...

Moreover, we've set the buffer size to 8k. If your scenario were
correct, then at the very least, as soon as the process wrote 8k to
STDOUT, there should be data ready to read. Assuming full buffering
is enabled for the pipe that connects STDOUT of the subprocess to the
parent, the call to select() should block until one of the following
conditions occur:

- 8k of data is written by the child into the pipe

- any amount of data is written to STDERR

- the child process terminates

The last point is important; if the child process only has 4k of data
to write to STDOUT, and never writes anything to STDERR, then the
buffer will never fill. However, the program will terminate, at which
point (assuming there was no explicit call to close() previously) the
operating system will close all open file descriptors, and flush all
of the child's I/O buffers. At that point, the parent process, which
would be sleeping in select(), will wake up, read the 4k of data, and
(eventually) close its end of the pipe (an additional iteration
through the select() loop will be required, I believe).

Should the program write output to STDERR before the 8k STDOUT buffer
is full, then again, the parent, sleeping in select(), will awaken, and
select will return the file object corresponding to the parent's end
of the pipe connecting to the child's STDERR. Again, all of this is the
essence of what select() does. It is supposed to guarantee that any
file descriptors (or objects) it returns are in fact ready for data to
be read or written.

So, unless I'm missing something, I'm pretty certain that buffering
mode has nothing to do with what's going on here. I think there are
only a few possibilities:

1. My implementation of the select() loop is subtly broken. This
seems like the most likely case to me; however, I've been over it a
bunch of times and I can't find anything wrong with it. It's
undeniable that select is returning a file object, and that reads
on that file object immediately after the call to select block. I
can't see how this could be possible, barring a bug somewhere else.

2. select.select() is broken in the version of Python I'm using.

3. The select() system call is somehow broken in the Linux kernel I'm
using. I tend to rule this out, because I'm reasonably certain
someone would have noticed this before I did. The kernel in
question is being used on thousands of machines (I'm not
exaggerating) which run a variety of network-oriented programs. I
can't imagine that none of them uses select() (though perhaps it's
possible that none use it in quite the manner I'm using it here).
But it may be worth looking at... I could write an implementation
of a select() loop in C and see how that works.

If you can see any flaw in my analysis, by all means point it out!
Thanks for your response.

--
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D


 
Gabriel Genellina

You may try using two worker threads to read the two streams; that way
you don't have to worry about the blocking issues.
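[A minimal sketch of that suggestion (my illustration): one reader thread per stream, so a blocking read-to-EOF on one pipe can never stall the other:

    # One thread drains each pipe; neither blocking read affects the other.
    import popen2, threading

    def capture_threaded(cmd):
        io = popen2.Popen3(cmd, True)
        io.tochild.close()                  # we send the child no input
        results = {}

        def drain(f, key):
            results[key] = f.read()         # blocking until EOF is fine here

        threads = [threading.Thread(target=drain, args=(io.fromchild, "out")),
                   threading.Thread(target=drain, args=(io.childerr, "err"))]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        status = io.wait()
        return results["out"], results["err"], status
]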
 
Francesco Bochicchio

On Mon, 24 Mar 2008 17:58:42 -0400, Derek Martin wrote:

> The behavior I see is this: the call to select() returns: [<file
> corresponding to sub-proc's STDOUT>] [] []
>
> If and only if the total amount of output is greater than the specified
> buffer size, then reading on this file hangs indefinitely.

From your description, it would seem that two things occur:

- there is actual data to read, but in an amount smaller than buffsize.
- the subsequent read waits (for whatever reason) until a full buffer
can be read, and therefore locks up your program.

Try specifying buffsize=1 or doing read(1). If my guess is correct, you
should not see the problem. I'm not sure that either is a good solution
for you, since both have performance issues.
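[A sketch of that diagnostic (my reconstruction of the OP's loop with both suggested changes, an unbuffered pipe plus one-byte reads, so a read can never ask for more than select() has reported ready):

    # Byte-at-a-time variant of capture(); slow, but it can't over-read.
    import popen2, select

    def capture_bytewise(cmd):
        io = popen2.Popen3(cmd, True, 0)    # bufsize 0: unbuffered file objects
        inlist = [io.fromchild, io.childerr]
        inbuf, errbuf = "", ""
        while inlist:
            ins, outs, excepts = select.select(inlist, [], [])
            for i in ins:
                x = i.read(1)
                if not x:                   # EOF on this stream
                    inlist.remove(i)
                elif i is io.fromchild:
                    inbuf += x
                else:
                    errbuf += x
        status = io.wait()
        return inbuf, errbuf, status
]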

Anyway, I doubt that the python library does more than wrapping the
system call, so if there is a bug it is probably in the software layers
under python.

Ciao
 
Derek Martin

> I might be completely off the mark here. I have not tested your code or even
> closely examined it. I don't mean to waste your time. I'm only giving a
> reflex response because your problem seems to exactly match a very common
> situation where someone tries to use select with a pipe to a subprocess
> created with popen and that subprocess uses C stdio.

Yeah, you're right, more or less. I talked to someone much smarter
than I here in the office, who pointed out that the behavior of
Python's read() without a specified size is to attempt to read until
EOF. This will definitely cause the read to block (if there's I/O
waiting from STDERR), if you're allowing I/O to block... :(

The solution is easy though...

    import fcntl, os

    def set_nonblock(fd):
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

Then in the function, after calling popen:

    set_nonblock(io.fromchild.fileno())
    set_nonblock(io.childerr.fileno())

Yay for smart people.
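[Putting the pieces together, the repaired function might look like this (my reconstruction of the fix described above; FailedExitStatus as in the original post):

    # capture() with both child pipes switched to O_NONBLOCK: read() now
    # returns whatever is available instead of blocking until EOF.
    import fcntl, os, popen2, select

    def set_nonblock(fd):
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

    def capture(cmd):
        io = popen2.Popen3(cmd, True, 8192)
        set_nonblock(io.fromchild.fileno())
        set_nonblock(io.childerr.fileno())
        inlist = [io.fromchild, io.childerr]
        inbuf, errbuf = "", ""
        while inlist:
            ins, outs, excepts = select.select(inlist, [], [])
            for i in ins:
                x = i.read()                # non-blocking: returns what's there
                if not x:                   # EOF on this stream
                    inlist.remove(i)
                elif i is io.fromchild:
                    inbuf += x
                else:
                    errbuf += x
        if io.wait():
            raise FailedExitStatus, errbuf  # exception class from the original
        return (inbuf, errbuf)
]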

--
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D


 
Derek Martin

> You should still try Pexpect :) As I recall there are also gotchas
> on the non-blocking trick.

Well, you need to remember to read ALL the file descriptors (objects)
that select() returns, and if you don't, your program will hang and
spin... It might also be the case that if the child is using stdio
functions for output, you'll need to set the buffering mode explicitly
(which you can theoretically do, see below). Aside from that, there
are none, and actually the problem with my program had nothing to do
with stdio buffering modes.

> Pexpect is 100% pure Python. No extra libs to install.

I looked at it, and (what I believe is) the very reason it manages to
solve this particular problem is also the reason it won't work for me:
it combines STDOUT and STDERR to one I/O stream. The reason I'm
bothering with all this is because I need to keep them separate.

Interestingly, were it not for that fact, I suspect pexpect might
still suffer from the same problem that plagued my original
implementation. I had to drag out W. R. Stevens to remind
myself of a few things before I continued with this discussion...
Even though it forces the program to use line buffering, read() would
still try to read until EOF, and if STDOUT and STDERR were separate
files, it seems likely that it would eventually block reading from one
file when the child program was sending its output to the other. The
only way to prevent that problem, aside from non-blocking I/O, is to
do a read(1) (i.e. read one character at a time), which will use
silly amounts of CPU time. But mixing stdio and non-stdio functions
is kind of funky, and I'm not completely sure what the behavior would
be in that case, and couldn't quickly find anything in Stevens to
suggest one way or the other.

Also, you could combine the streams yourself without using pexpect by
having your subproc use the shell to redirect STDERR to STDOUT, or (if
Python has it) using the dup() family of system calls to combine the
two in Python [i.e. dup2(1,2)]. As far as I can tell, the whole pseudo
terminal thing (though it definitely does have its uses) is a red
herring for this particular problem...
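[For completeness, a sketch of that merging route (my illustration, using the popen2 module of that era; Popen4 is the stdlib class that performs the merge for you):

    # Merge the child's STDERR into its STDOUT and read a single stream.
    import popen2

    def capture_merged(cmd):
        io = popen2.Popen4(cmd)     # same effect as dup2(1, 2) in the child
        data = io.fromchild.read()
        status = io.wait()
        return data, status

    # Or let the shell do the redirection instead:
    #   io = popen2.Popen3(cmd + " 2>&1")
]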

I also read (some of) the pexpect FAQ, and there are a number of
incorrect statements in it, particularly in the section "Why not just
use a pipe (popen())?".

- a pipe, if programmed correctly, is perfectly fine for controlling
interactive programs, most of the time. You will almost certainly
need to use non-blocking I/O, unless your communicating programs
are perfectly synchronized, or else you'll have I/O deadlocks. The
only time a pipe isn't OK is where the program tries to use terminal
services (e.g. writing to /dev/tty), in which case you will need a
pseudo-terminal device (as the FAQ correctly points out with regard
to entering passwords in SSH).

- Any application which contains "#include <stdio.h>" does not
necessarily make use of the stdio library (which isn't really a
separate library at all, it's part of the standard C library).
The file stdio.h is just a C header file which contains
declarations of the prototypes for stdio-related functions, and
various constants. It's often included in source files simply
because it's so common to need it, or to make use of some constants
defined there. You're only actually using stdio if you use
stdio functions in your program, which are:

printf, fopen, getc, getchar, putc, scanf, gets, puts, etc.

In particular, open(), read() and write() are *not* stdio
functions, and do *not* buffer I/O. They're Unix system calls, and
the C functions by the same name are simply interfaces to those
system calls. There is a kernel I/O buffer associated with all of
the streams you will use them on, but this is not a stdio buffer.

I have not checked Python's code, but based on its behavior, I
assume that its read() function is a wrapper around the Unix read()
system call, and as such it is not using stdio at all, and thus the
stdio buffers are not relevant (though if the child is using stdio
functions, that could be an issue).

- The FAQ states: "The STDIO lib will use block buffering when
talking to a block file descriptor such as a pipe." This is only
true *by default* and indeed you can change the buffering mode of
any stdio stream using the setbuf() and setvbuf() stdio functions
(though I don't know if Python provides a way to do this, but I
assume it does). Since the python program is the one opening the
pipe, it controls the buffering mode, and you have only to change
it in your program, and the child will honor that.

The way it works is because popen() opens the pipe, and then forks
a child process, which inherits all of the parent's open files.
Changing the properties on the files can be done in the parent, and
will be honored in the child, because *it's the same file*. :)

- It states: "[when writing to a pipe] In this mode the currently
buffered data is flushed when the buffer is full. This causes most
interactive programs to deadlock." That's misleading/false.

Deadlocks can easily occur in this case, but it's definitely
avoidable, and it's not *necessarily* because of buffering, per se.
It's because you're trying to read or write to a file descriptor
which is not ready to be read from or written to, which can be
caused by a few things. STDIO buffers could be the cause, or it
could simply be that the parent is reading from one file
descriptor, but the child is writing to a different one. Or
(perhaps even more likely) it is trying to read from the parent, so
both are reading and no one is writing. Non-blocking I/O allows
you to recover from all such I/O synchronization problems, though
you'll still need (your program) to figure out where you should be
reading and/or writing in order to avoid an infinite loop. But
non-blocking I/O may not help with interactive programs that make
use of the stdio library, unless you also explicitly set the
buffering mode.

All of the above are explained in much more detail in W. Richard
Stevens' "Advanced Programming in the Unix Environment" than I can
possibly go into here. See chapter 3 for the stdio stuff, chapter 12
for non-blocking I/O, and chapter 14 for discussions about pipes,
popen(), and how buffering modes can be controlled by your program and
how that affects the child.

Don't get me wrong... pexpect is useful. But some of the problems
you're trying to solve with it have other solutions, which in some
cases might be more efficiently done in those other ways.

--
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D


 
