Clueless: piping between 2 non-python processes

Michael Lehmeier

Hi, I have a problem.

I have written a big Python script with a simple passage like this:

while 1 :
    if os.system("A | B") == 256 :
        sys.exit(0)

A is a streaming application, B an encoding application.
Neither of them is written in Python.
Occasionally A terminates and the command is executed again.
If it is stopped by a SIGINT, os.system returns 256 and the script ends.

The problem is: B can also terminate by itself.
Whenever this happens, A continues to happily pipe data to a
non-existent B until eternity or A stops.

So I wanted to start A and B with fork(), but all solutions to create a pipe
between them seem to demand that at least one of the two processes is in
python itself.
Or maybe I just understood it wrong.

So the basic problem is:
- create two processes A and B
- pipe from A to B
- terminate A when B ends

Can anybody help?
Thanks!
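For reference, the 256 tested in the loop above is a raw wait status, not an exit code: on POSIX systems `os.system` returns the status in the same encoding `os.waitpid` uses, so exit code 1 arrives as 256. A minimal Python 3 sketch of decoding it with the standard `os.WIFEXITED`, `os.WEXITSTATUS`, `os.WIFSIGNALED` and `os.WTERMSIG` helpers (POSIX assumed; `describe_status` is a hypothetical helper, not part of any library):

```python
import os


def describe_status(status):
    """Decode a POSIX wait status as returned by os.system() or os.waitpid()."""
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("other", status)  # e.g. a stopped child


if __name__ == "__main__":
    # /bin/sh exits with code 1 here; os.system reports that as 256.
    status = os.system("exit 1")
    print(status, describe_status(status))
```

Checking the decoded exit code instead of the literal 256 keeps the intent of the test visible and also distinguishes "exited with 1" from "killed by a signal".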
 
John Roth

You need an intermediary process in Python. Set it up as a
thread so that it receives data from A and sends it to B. Then
you can do whatever you want on exceptional conditions.

John Roth
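John Roth's suggestion can be sketched in modern Python 3 with `subprocess` (which postdates this thread). In this illustrative sketch, not code from the thread, the parent reads A's stdout and copies it to B's stdin in a background thread; a `BrokenPipeError` while writing means B has stopped reading, and the parent then terminates A. `relay` and `run_with_relay` are hypothetical names, and POSIX is assumed.

```python
import subprocess
import threading


def relay(src, dst, chunk_size=65536):
    """Copy bytes from src to dst.

    Returns True when src reaches end of file (A finished writing) and
    False when writing raises BrokenPipeError (B stopped reading, usually
    because it exited). dst is closed either way so B can see EOF.
    """
    try:
        while True:
            data = src.read1(chunk_size)  # whatever is available, at most chunk_size
            if not data:
                return True
            dst.write(data)
            dst.flush()
    except BrokenPipeError:
        return False
    finally:
        try:
            dst.close()
        except BrokenPipeError:
            pass  # buffered bytes could not be delivered; B is gone anyway


def run_with_relay(cmd_a, cmd_b):
    """Run cmd_a | cmd_b with Python in the middle; terminate A if B goes away."""
    a = subprocess.Popen(cmd_a, stdout=subprocess.PIPE)
    b = subprocess.Popen(cmd_b, stdin=subprocess.PIPE)
    outcome = {}
    pump = threading.Thread(
        target=lambda: outcome.update(a_finished=relay(a.stdout, b.stdin)),
        daemon=True)
    pump.start()
    pump.join()  # a real program could do other work here instead
    if not outcome["a_finished"]:
        a.terminate()  # B is gone, so stop A (SIGTERM on POSIX)
    a.stdout.close()
    return a.wait(), b.wait()
```

One limitation of this design: the relay only notices that B is gone on its next write, so a silent A keeps running until it produces more output. It also doubles the I/O, a cost Donn Cave points out later in the thread.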
 
Michael Lehmeier

> You need an intermediary process in Python. Set it up as a
> thread so that it receives data from A and sends it to B. Then
> you can do whatever you want on exceptional conditions.

I am currently trying to do this and am making progress.

However, I still have a rather simple problem that I just can't solve.

I have a file B.py that looks like this:

import os, sys

while 1 :
    i = sys.stdin.read(10)
    if len(i) == 0 :
        break
    sys.stdout.write(i)

It is supposed to print whatever it is fed.
When I use the following commands:

Bclass = popen2.Popen3("./B.py", 't')

while 1 :
    Bclass.tochild.write("Hello World")

the process hangs on the Hello World line.
B.py never gets anything.

What's the problem?
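Nobody in the thread answers this directly. One likely contributor (an assumption, not confirmed here) is that the loop writes to B.py but never reads B.py's output: once the pipe carrying B's stdout fills, B blocks on its write, stops reading, and the parent's write then blocks too. `popen2` no longer exists in Python 3; a minimal sketch with `subprocess.Popen.communicate`, which feeds stdin and drains stdout concurrently so neither pipe can fill up (`ECHO_CHILD` is a hypothetical stand-in for B.py):

```python
import subprocess
import sys

# Stand-in for B.py: copy stdin to stdout in 10-character pieces.
ECHO_CHILD = (
    "import sys\n"
    "while True:\n"
    "    chunk = sys.stdin.read(10)\n"
    "    if not chunk:\n"
    "        break\n"
    "    sys.stdout.write(chunk)\n"
)


def echo_through_child(text):
    """Send text to the echo child and return everything it printed."""
    proc = subprocess.Popen(
        [sys.executable, "-c", ECHO_CHILD],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
    # communicate() writes all input, closes stdin (so the child sees EOF),
    # and reads stdout at the same time, avoiding the full-pipe deadlock.
    out, _ = proc.communicate(text)
    return out
```

For example, `echo_through_child("Hello World" * 100000)` pushes about 1 MB through the child, far more than a pipe buffer holds, without hanging.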
 
Andrew Bennetts

> So I wanted to start A and B as fork, but all solutions to create a pipe
> between them seem to demand that at least one of the two processes is in
> python itself.
> Or maybe I just understood it wrong.
>
> So the basic problem is:
> - create two processes A and B
> - pipe from A to B
> - terminate A when B ends

I think you can do something like (untested):

import os, signal

pathToA = '/usr/bin/A'
pathToB = '/usr/bin/B'

# Create pipes
readA, writeB = os.pipe()
readB, writeA = os.pipe()

# Create process A
pidA = os.fork()
if pidA == 0: # child
    os.dup2(readA.fileno(), 0) # set read pipe to stdin
    os.dup2(writeA.fileno(), 1) # set write pipe to stdout
    os.execl(pathToA)

# Create process B
pidB = os.fork()
if pidB == 0: # child
    os.dup2(readB.fileno(), 0) # set read pipe to stdin
    os.dup2(writeB.fileno(), 1) # set write pipe to stdout
    os.execl(pathToB)

# Close file descriptors in the parent; it doesn't need them anymore
readA.close(); writeB.close(); readB.close(); writeA.close()

# Wait for B to terminate
os.waitpid(pidB, 0)

# Kill A (even if it has finished already... this might need a try/except)
os.kill(pidA, signal.SIGTERM)

# Wait for A (to avoid zombies)
os.waitpid(pidA, 0)

I've probably stuffed up some details, but I'm pretty sure that that's the
basic idea.

-Andrew.
 
Donn Cave

Quoth Andrew Bennetts:
| On Sat, Oct 25, 2003 at 08:16:33PM +0200, Michael Lehmeier wrote:
....
| > So the basic problem is:
| > - create two processes A and B
| > - pipe from A to B
| > - terminate A when B ends
|
| I think you can do something like (untested):

Well, this is really more than he asked for. All we need here is
the same thing as the shell does with 'a | b', but instead of forking
a from b as the shell would normally do it, both processes need to
be children of the Python program so that it can wait for either.
[More comments interleaved.]

| import os, signal
|
| pathToA = '/usr/bin/A'
| pathToB = '/usr/bin/B'
|
| # Create pipes
| readA, writeB = os.pipe()
| readB, writeA = os.pipe()

.... We only need one pipe here, the second one.

| # Create process A
| pidA = os.fork()
| if pidA == 0: # child
|     os.dup2(readA.fileno(), 0) # set read pipe to stdin
.... omit the above line.
|     os.dup2(writeA.fileno(), 1) # set write pipe to stdout
.... readA and writeA will be integer unit numbers, so omit ".fileno()"

|     os.execl(pathToA)
.... os.execl(pathToA, pathToA)

.... Better enclose the whole child fork's Python code in try/finally,
.... with os._exit(113) in the finally block (where 113 is some distinctive
.... number.) Otherwise exceptions will branch back out of this block into
.... code that you intended for the parent.

| # Create process B
| pidB = os.fork()
| if pidB == 0: # child
|     os.dup2(readB.fileno(), 0) # set read pipe to stdin
|     os.dup2(writeB.fileno(), 1) # set write pipe to stdout
.... omit above line, and see above for same comments.

|     os.execl(pathToB)
|
| # Close file descriptors in the parent; it doesn't need them anymore
| readA.close(); writeB.close(); readB.close(); writeA.close()
.... os.close(readB), os.close(writeA)

| # Wait for B to terminate
| os.waitpid(pidB, 0)

OK, this is where the fun starts. I understood the problem to be
that B may fail to exit when A exits, so I think we really want
to wait for A. Or if the converse may also happen, then we need
to wait for either and then dispatch the other. In any case I think
this is not too hard to figure out. The most useful trick here is
the os.WNOHANG flag to waitpid, which will allow one or more waits
without blocking to see if B is really going to exit on its own.
You don't want to kill it unless you're fairly sure it's necessary,
assuming it's doing something useful enough to justify running it
in the first place. kill should also be surrounded with try/except,
because on some platforms it may be an error to kill a process that
has exited even if it still hasn't been reaped.

| # Kill A (even if it has finished already... this might need a try/except)
| os.kill(pidA, signal.SIGTERM)
|
| # Wait for A (to avoid zombies)
| os.waitpid(pidA, 0)
|
| I've probably stuffed up some details, but I'm pretty sure that that's the
| basic idea.

Well, it's not much worse than the one that proposed a thread.

The bi-directional pipes you set up there can be a good thing, in
a case where that's what you need, but even then they're extremely
brittle, because pipes have a fixed, limited buffer size and because
C library I/O (including Python's fileobject) employs process internal
block buffering when writing to pipes. The former means the pipe can
fill up when the reading process is dilatory, the latter means the
pipe may be empty at a point where the writing process has logically
written to it. Often enough you find both conditions together.

Donn Cave
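Putting Donn's corrections together (a single pipe, raw descriptors without `.fileno()`, an explicit argv[0] for exec, and `os._exit` in a `finally` so a failed exec never falls through into parent code) gives roughly the following Python 3 sketch. It swaps `os.execl` for `os.execv` with an argument list; `spawn` and `run_pipeline` are hypothetical names, POSIX only, and the waiting strategy is deliberately the simplest one (wait for A, then for B).

```python
import os


def spawn(argv, stdin_fd=None, stdout_fd=None, close_fds=()):
    """Fork, rewire stdin/stdout, close the listed fds, and exec argv."""
    pid = os.fork()
    if pid == 0:  # child
        try:
            if stdin_fd is not None:
                os.dup2(stdin_fd, 0)
            if stdout_fd is not None:
                os.dup2(stdout_fd, 1)
            for fd in close_fds:
                os.close(fd)  # don't keep extra pipe ends alive
            os.execv(argv[0], argv)  # argv[0] is passed explicitly
        finally:
            # Only reached if exec failed: never return into parent code.
            os._exit(113)
    return pid


def run_pipeline(argv_a, argv_b):
    """Run argv_a | argv_b as two children; return both raw wait statuses."""
    read_fd, write_fd = os.pipe()  # one pipe: A writes, B reads
    pid_a = spawn(argv_a, stdout_fd=write_fd, close_fds=(read_fd, write_fd))
    pid_b = spawn(argv_b, stdin_fd=read_fd, close_fds=(read_fd, write_fd))
    # The parent must close both ends, or B never sees EOF after A exits.
    os.close(read_fd)
    os.close(write_fd)
    _, status_a = os.waitpid(pid_a, 0)
    _, status_b = os.waitpid(pid_b, 0)
    return status_a, status_b
```

The distinctive exit code 113 makes a failed exec easy to spot in the status returned for that child.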
 
Andrew Bennetts

> Quoth Andrew Bennetts:
> | On Sat, Oct 25, 2003 at 08:16:33PM +0200, Michael Lehmeier wrote:
> ...
> | > So the basic problem is:
> | > - create two processes A and B
> | > - pipe from A to B
> | > - terminate A when B ends
> |
> | I think you can do something like (untested):
>
> Well, this is really more than he asked for. All we need here is
> the same thing as the shell does with 'a | b', but instead of forking
> a from b as the shell would normally do it, both processes need to
> be children of the Python program so that it can wait for either.
> [More comments interleaved.]
>
> | import os, signal
> |
> | pathToA = '/usr/bin/A'
> | pathToB = '/usr/bin/B'
> |
> | # Create pipes
> | readA, writeB = os.pipe()
> | readB, writeA = os.pipe()
>
> ... We only need one pipe here, the second one.

For his application, where the data is only flowing one-way, that's true. I
was being unnecessarily general here.

> | # Create process A
> | pidA = os.fork()
> | if pidA == 0: # child
> |     os.dup2(readA.fileno(), 0) # set read pipe to stdin
> ... omit the above line.
> |     os.dup2(writeA.fileno(), 1) # set write pipe to stdout
> ... readA and writeA will be integer unit numbers, so omit ".fileno()"

Oops, yes. In my haste I assumed os.pipe() returned file objects rather
than raw file descriptors.

Also, ideally you'd close all other file descriptors in the child process
apart from these pipes, to avoid allowing the child processes to muck with
files that the parent has open.

> |     os.execl(pathToA)
> ... os.execl(pathToA, pathToA)

Ah, good point.

> ... Better enclose the whole child fork's Python code in try/finally,
> ... with os._exit(113) in the finally block (where 113 is some distinctive
> ... number.) Otherwise exceptions will branch back out of this block into
> ... code that you intended for the parent.

Yeah, that's safer, although I can't think of a reason why an exception
would be raised here (but better safe than sorry).

For that matter, a call to "sys.settrace(None)" in the child is probably a
good idea, in case he ever tries to step through the code with pdb...

> | # Wait for B to terminate
> | os.waitpid(pidB, 0)
>
> OK, this is where the fun starts. I understood the problem to be
> that B may fail to exit when A exits, so I think we really want
> to wait for A. Or if the converse may also happen, then we need
> to wait for either and then dispatch the other. In any case I think
> this is not too hard to figure out. The most useful trick here is
> the os.WNOHANG flag to waitpid, which will allow one or more waits
> without blocking to see if B is really going to exit on its own.
> You don't want to kill it unless you're fairly sure it's necessary,
> assuming it's doing something useful enough to justify running it
> in the first place. kill should also be surrounded with try/except,
> because on some platforms it may be an error to kill a process that
> has exited even if it still hasn't been reaped.

Well, if these are the only child processes his program spawns, he can
afford to just use os.wait, e.g. something like:

pid, status = os.wait()
if pid == pidA:
    otherpid = pidB
elif pid == pidB:
    otherpid = pidA
else:
    assert 0, "This isn't supposed to happen"

try:
    os.kill(otherpid, signal.SIGTERM)
except OSError:
    # Already dead, it seems
    pass

# Make sure to reap both children
os.waitpid(otherpid, 0)

But polling using the os.WNOHANG flag would work too, although polling
always feels less elegant to me.

> | I've probably stuffed up some details, but I'm pretty sure that that's the
> | basic idea.
>
> Well, it's not much worse than the one that proposed a thread.

:)

> The bi-directional pipes you set up there can be a good thing, in
> a case where that's what you need, but even then they're extremely
> brittle, because pipes have a fixed, limited buffer size and because
> C library I/O (including Python's fileobject) employs process internal
> block buffering when writing to pipes. The former means the pipe can
> fill up when the reading process is dilatory, the latter means the
> pipe may be empty at a point where the writing process has logically
> written to it. Often enough you find both conditions together.

If buffering is a problem, the processes communicating via pipes are welcome
to call fflush() or change their I/O library's buffer settings as needed.
This isn't significantly different to the problems you can encounter with
TCP sockets, unless I'm misunderstanding you.

-Andrew.
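To illustrate Andrew's point that the processes themselves can flush: in this Python 3 sketch both sides of a two-way pipe flush after every message, so each request and reply crosses the pipe immediately instead of sitting in a block buffer. It works only because the child (a hypothetical line-upper-casing script, `UPPER_CHILD`) cooperates; a program that block-buffers its output would stall this exchange, which is Donn's objection. `start_child` and `ask` are hypothetical helpers.

```python
import subprocess
import sys

# Hypothetical cooperating child: upper-cases each line it receives and
# flushes after every reply instead of relying on block buffering.
UPPER_CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)


def start_child():
    return subprocess.Popen([sys.executable, "-c", UPPER_CHILD],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            text=True)


def ask(proc, line):
    """Send one line to proc and block until it answers with one line."""
    proc.stdin.write(line + "\n")
    proc.stdin.flush()  # push the request out of our own buffer
    return proc.stdout.readline().rstrip("\n")


if __name__ == "__main__":
    child = start_child()
    print(ask(child, "ping"))
    child.stdin.close()  # EOF ends the child's loop
    child.wait()
    child.stdout.close()
```

Dropping either `flush()` call is enough to make `ask` hang waiting for a reply that is still sitting in a buffer.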
 
Anthony Briggs

The other, quick and dirty way to do it (depending on disk space, and
whether it's time critical) is just to buffer it into a temporary
file. So, execute two commands, os.system("A > tempfile") and
os.system("cat tempfile | B"), and check to see whether either one of
them returns 256.

Anthony
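Anthony's temporary-file variant can be sketched with Python 3's `subprocess` and `tempfile` instead of `os.system`. Redirecting B's stdin from the file replaces the `cat tempfile | B` pipe. Note that B only starts after A has exited, so this fits only if A's stream eventually ends. `run_via_tempfile` is a hypothetical name.

```python
import subprocess
import tempfile


def run_via_tempfile(cmd_a, cmd_b):
    """Spool cmd_a's output to a temporary file, then feed it to cmd_b.

    Returns both exit codes so the caller can check either one.
    """
    with tempfile.TemporaryFile() as spool:
        rc_a = subprocess.call(cmd_a, stdout=spool)
        spool.seek(0)  # rewind so B reads A's output from the beginning
        rc_b = subprocess.call(cmd_b, stdin=spool)
    return rc_a, rc_b
```

Because A and B never run at the same time, B ending early cannot leave A writing into a dead pipe; the cost is disk space and latency, as Anthony notes.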
 
Michael Lehmeier

> Quoth Andrew Bennetts:
>
> | # Wait for B to terminate
> | os.waitpid(pidB, 0)
>
> OK, this is where the fun starts. I understood the problem to be
> that B may fail to exit when A exits, so I think we really want
> to wait for A. Or if the converse may also happen, then we need
> to wait for either and then dispatch the other. In any case I think
> this is not too hard to figure out. The most useful trick here is
> the os.WNOHANG flag to waitpid, which will allow one or more waits
> without blocking to see if B is really going to exit on its own.

This is what I have here now:

# Wait for B to terminate
os.waitpid(pidB, os.WNOHANG)

# Kill A (even if it has finished already... this might need a
# try/except)
print "Terminating A"
os.kill(pidA, signal.SIGKILL)

# Wait for A (to avoid zombies)
os.waitpid(pidA, os.WNOHANG)

print "Terminating B"
os.kill(pidA, signal.SIGKILL)

waitpid doesn't wait for anything here.
The os.kill calls run immediately after startup and end the script.

If I understand it correctly, since B or A seem to take some time
starting up, waitpid ignores them because of WNOHANG.
That makes this waitpid pretty useless, doesn't it?
And even so, if A terminates sooner than B, wouldn't the script hang
then?

What about the Popen3 approach?
Wouldn't it be cleaner (once I get it running)?

Thanks so far.
 
Donn Cave

Quoth Michael Lehmeier:
....
| This is what I have here now:
|
| # Wait for B to terminate
| os.waitpid(pidB, os.WNOHANG)
|
| # Kill A (even if it has finished already... this might need a
| # try/except)
| print "Terminating A"
| os.kill(pidA, signal.SIGKILL)
|
| # Wait for A (to avoid zombies)
| os.waitpid(pidA, os.WNOHANG)
|
| print "Terminating B"
| os.kill(pidA, signal.SIGKILL)

I may be losing track of where we are. The way I remember it, you
expect A to complete on its own, and the objective was to kill B
then, if necessary. (The ideal solution, of course, would be to fix
B so that it doesn't need that.)

But that's not at all what you're doing here. In fact you never do
kill B, though it appears that in the last line you meant to.

I suggested that you wait for B, with WNOHANG, because you can expect
a brief lag between A's exit and B's exit, if B will exit as it should
when it detects end of file on the input unit. There are tricky ways
to use extraneous pipes with select for things like this, but the simple
route is to check (hence WNOHANG), sleep a little, check, sleep, etc.
for as long as you think is appropriate. Then kill him.

But first, wait for A, without WNOHANG.

| If I understand it correctly, since B or A seem to take some time
| starting up, waitpid ignores them because of WNOHANG.
| That makes this waitpid pretty useless, doesn't it?

Yes. This will all be pretty easy to write if you think about what
you want it to do. It may help to draw a diagram of the course of
A and B's lifetimes, as you expect them to work, and then write the
code that follows from that.

| What about the Popen3 approach?
| Wouldn't it be cleaner (once I get it running)?

I'm sorry, I didn't look very hard at your Popen3 post, but I assume
you mean to have the Python parent process shovel data between A and B.
This doubles the I/O, when the parent could be sleeping quietly and
just waiting for the process to exit. By the time you get it working,
I find it hard to imagine that it will be cleaner by any standard, but
then I can't be sure what you mean by clean.

Donn Cave
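Donn's recipe (wait for A without WNOHANG, then poll B with WNOHANG and a short sleep for a bounded time, and only then kill it) might look like this Python 3 sketch. The grace period and poll interval are arbitrary assumptions, `reap_with_grace` and `finish_pipeline` are hypothetical helpers, and both pids are assumed to be children of the calling process.

```python
import os
import signal
import time


def reap_with_grace(pid, grace=5.0, interval=0.1):
    """Poll child `pid` for up to `grace` seconds; SIGTERM it if still running.

    Returns (status, killed): the raw wait status and whether a kill was sent.
    """
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:  # waitpid returns (0, 0) while the child still runs
            return status, False
        time.sleep(interval)
    try:
        os.kill(pid, signal.SIGTERM)
    except OSError:
        pass  # some platforms report an error for an already-exited child
    _, status = os.waitpid(pid, 0)  # reap it so no zombie is left
    return status, True


def finish_pipeline(pid_a, pid_b, grace=5.0):
    """Wait for A to finish, then give B `grace` seconds to exit on its own."""
    _, status_a = os.waitpid(pid_a, 0)  # blocking: no WNOHANG here
    status_b, b_killed = reap_with_grace(pid_b, grace)
    return status_a, status_b, b_killed
```

The sleep between polls is what gives B time to see end of file and exit normally, so it is killed only when it really overstays.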
 
Donn Cave

Quoth Andrew Bennetts:
....
| Also, ideally you'd close all other file descriptors in the child process
| apart from these pipes, to avoid allowing the child processes to muck with
| files that the parent has open.

Sure, popen() does that. For me, that's more to avoid keeping a file
descriptor open past its normal lifetime. UNIX file descriptors are
an interesting parallel to Python's object system, and this is a lot
like a reference leak - you may expect some important finalization to
run on collection of the object, or just reclaim its memory, so you
don't want to see other objects get accidental references to it that
will keep it alive. The resource is file descriptors instead of memory,
the finalization may be end of file seen by another process, deletion
of the file inode where no directory entry remains, etc.

| >
| > | os.execl(pathToA)
| > ... os.execl(pathToA, pathToA)
|
| Ah, good point.
|
| > ... Better enclose the whole child fork's Python code in try/finally,
| > ... with os._exit(113) in the finally block (where 113 is some distinctive
| > ... number.) Otherwise exceptions will branch back out of this block into
| > ... code that you intended for the parent.
|
| Yeah, that's safer, although I can't think of a reason why an exception
| would be raised here (but better safe than sorry).

Eh, you mean because I fixed the one that would have been raised?
I often fail to think in advance of the reasons why my code will
raise an exception.

....
| Well, if these are the only child processes his program spawns, he can
| afford to just use os.wait, e.g. something like:
|
| pid, status = os.wait()
| if pid == pidA:
|     otherpid = pidB
| elif pid == pidB:
|     otherpid = pidA
| else:
|     assert 0, "This isn't supposed to happen"
|
| try:
|     os.kill(otherpid, signal.SIGTERM)
| except OSError:
|     # Already dead, it seems
|     pass
|
| # Make sure to reap both children
| os.waitpid(otherpid, 0)
|
| But polling using the os.WNOHANG flag would work too, although polling
| always feels less elegant to me.

Get over it. Killing a process that's on its way to exiting normally
because you find polling distasteful gives elegance a bad name. The
point I guess I needed to make more explicitly is that along with the
poll, you need a delay to give B time to exit.

[... re deadlock potential in 2 way piping ]

| If buffering is a problem, the processes communicating via pipes are welcome
| to call fflush() or change their I/O library's buffer settings as needed.
| This isn't significantly different to the problems you can encounter with
| TCP sockets, unless I'm misunderstanding you.

The reason it's a common problem is that it normally works through
the standard UNIX in/out/err system, which is predicated on disk files
or a simple producer/consumer pipe line. That assumption is reflected
in how the applications behave, and it's relatively unusual to be able
to do anything about that.

Donn Cave
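A small Python 3 check of whether a child inherits an extra descriptor, illustrating the `popen()`-style cleanup mentioned above. Two assumptions are worth stating: Python 3 creates descriptors non-inheritable by default (PEP 446), so the sketch calls `os.set_inheritable` to mimic the older behaviour, and it relies on `subprocess`'s `close_fds` option (default True on POSIX) to close everything except stdin, stdout and stderr. `child_sees_fd` and the `PROBE` script are hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical probe: exits 0 only if descriptor argv[1] is open in the
# child AND refers to the same inode (argv[2]) the parent had, 1 otherwise.
PROBE = (
    "import os, sys\n"
    "fd, ino = int(sys.argv[1]), int(sys.argv[2])\n"
    "try:\n"
    "    same = os.fstat(fd).st_ino == ino\n"
    "except OSError:\n"
    "    same = False\n"
    "sys.exit(0 if same else 1)\n"
)


def child_sees_fd(fd, close_fds):
    """Return True if a child started with this close_fds setting inherits fd."""
    ino = os.fstat(fd).st_ino
    rc = subprocess.call([sys.executable, "-c", PROBE, str(fd), str(ino)],
                         close_fds=close_fds)
    return rc == 0


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    os.set_inheritable(write_fd, True)  # Python 3 pipes start non-inheritable
    print("close_fds=False:", child_sees_fd(write_fd, close_fds=False))
    print("close_fds=True: ", child_sees_fd(write_fd, close_fds=True))
    os.close(read_fd)
    os.close(write_fd)
```

A leaked write end like this is exactly the "reference leak" Donn describes: as long as any process holds it, the reader of that pipe never sees end of file.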
 
Andrew Bennetts

> [... re deadlock potential in 2 way piping ]
>
> | If buffering is a problem, the processes communicating via pipes are welcome
> | to call fflush() or change their I/O library's buffer settings as needed.
> | This isn't significantly different to the problems you can encounter with
> | TCP sockets, unless I'm misunderstanding you.
>
> The reason it's a common problem is that it normally works through
> the standard UNIX in/out/err system, which is predicated on disk files
> or a simple producer/consumer pipe line. That assumption is reflected
> in how the applications behave, and it's relatively unusual to be able
> to do anything about that.

I'm not sure what you mean... is it "applications that aren't designed for 2
way piping don't handle it reliably"?

-Andrew.
 
Donn Cave

Quoth Andrew Bennetts:
| On Sun, Oct 26, 2003 at 07:44:27PM -0000, Donn Cave wrote:
|> [... re deadlock potential in 2 way piping ]
|>
|>| If buffering is a problem, the processes communicating via pipes are welcome
|>| to call fflush() or change their I/O library's buffer settings as needed.
|>| This isn't significantly different to the problems you can encounter with
|>| TCP sockets, unless I'm misunderstanding you.
|>
|> The reason it's a common problem is that it normally works through
|> the standard UNIX in/out/err system, which is predicated on disk files
|> or a simple producer/consumer pipe line. That assumption is reflected
|> in how the applications behave, and it's relatively unusual to be able
|> to do anything about that.
|
| I'm not sure what you mean... is it "applications that aren't designed for 2
| way piping don't handle it reliably"?

Pretty much. Normally optimal behavior is bad for 2-way pipes.

Donn Cave
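Donn's point that pipes have a fixed, limited buffer is easy to observe. This Python 3 sketch (POSIX only; `pipe_capacity` is a hypothetical helper) makes the write end non-blocking and counts how many bytes an unread pipe accepts before a further write would block; a writer facing a "dilatory" reader stalls at exactly that point.

```python
import os


def pipe_capacity(chunk_size=1024):
    """Count the bytes an unread pipe accepts before a write would block."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # a full pipe now raises instead of hanging
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            total += os.write(write_fd, chunk)
    except BlockingIOError:
        pass  # the pipe is full: a blocking writer would wait here
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


if __name__ == "__main__":
    print(pipe_capacity())
```

On Linux the result is typically 65536 bytes; other systems use different sizes, which is one more reason not to depend on how much a pipe can absorb.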
 
