Using "pickle" for interprocess communication - some notes and things that ought to be documented.


John Nagle

It's possible to use "pickle" for interprocess communication over
pipes, but it's not straightforward.

First, "pickle" output is self-delimiting.
Each dump ends with ".", and, importantly, "load" doesn't read
any characters after the "." So "pickle" can be used repeatedly
on the same pipe, and one can do repeated message-passing this way. This
is a useful, but undocumented, feature.
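
For illustration, here's a minimal sketch (Python 2, as in the rest of
this thread) of several protocol-0 pickles sharing one stream; the
in-memory StringIO merely stands in for the pipe:

import pickle, StringIO

buf = StringIO.StringIO()
pickle.dump({"cmd": "start"}, buf)    # protocol 0; ends with the "." stop opcode
pickle.dump([1, 2, 3], buf)           # a second pickle appended to the same stream

buf.seek(0)
print pickle.load(buf)    # {'cmd': 'start'} -- stops at the first "."
print pickle.load(buf)    # [1, 2, 3]        -- resumes right after it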

It almost works.

Pickle's "dump" function doesn't flush output after dumping, so
there's still some data left to be written. The sender has to
flush the underlying output stream after each call to "dump",
or the receiver will stall. The "dump" function probably ought to flush
its output file.
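
In code, the sending side needs something like this (a sketch; the
output file object is whatever wraps the write end of the pipe):

import pickle

def send(obj, pipe_out):
    pickle.dump(obj, pipe_out)    # "dump" leaves the bytes in pipe_out's buffer
    pipe_out.flush()              # push them out now, or the receiver's "load" stalls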

It's also necessary to call Pickle's "clear_memo" before each "dump"
call, since objects might change between successive "dump" calls.
"Unpickle" doesn't have a "clear_memo" function. It should, because
if you keep reusing the "Unpickle" object, the memo dictionary
fills up with old objects which can't be garbage collected.
This creates a memory leak in long-running programs.
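
Here is a sketch of how the memos might be managed when a single
Pickler and Unpickler are reused for the life of the pipe. The
"reader.memo = {}" trick is the same one shown in a reply further down,
and it assumes the pure-Python "pickle" module, whose Unpickler keeps
its memo in an ordinary dict:

import pickle

class PickleChannel:
    """One end of a pipe carrying a stream of pickled messages."""
    def __init__(self, datain, dataout):
        self.dataout = dataout
        self.writer = pickle.Pickler(dataout)    # protocol 0, reused per message
        self.reader = pickle.Unpickler(datain)

    def send(self, obj):
        self.writer.clear_memo()     # objects may change between dumps
        self.writer.dump(obj)
        self.dataout.flush()         # flush, as noted above

    def receive(self):
        self.reader.memo = {}        # Unpickler has no clear_memo(); reset by hand
        return self.reader.load()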

Then, on Windows, there's a CR LF problem. This can be fixed by
launching the subprocess with

proc = subprocess.Popen(launchargs,
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        universal_newlines=True)

Failure to do this produces the useful error message "Insecure string pickle".
Binary "pickle" protocol modes won't work at all in this situation; "universal
newline" translation keeps text-mode pickles working, but it is not
byte-transparent. On Unix/Linux this just works, but the code isn't portable.

Incidentally, in the subprocess, it's useful to do

sys.stdout = sys.stderr

after setting up the Pickle objects. This prevents any stray print statements
from interfering with the structured Pickle output.
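
In the child process, that setup might look roughly like this (the
names are only illustrative):

import sys

datain  = sys.stdin       # pipe from the parent: pickled requests arrive here
dataout = sys.stdout      # pipe back to the parent: pickled replies go here
sys.stdout = sys.stderr   # stray "print" output now lands on stderr,
                          # not in the middle of the pickle stream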

Then there's end of file detection. When "load" reaches an end of
file, it properly raises EOFError. So it's OK to do "load" after
"load" until EOFError is raised.

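So a receive loop can simply be (a sketch; "datain" is the file object
wrapping the read end of the pipe):

import pickle

def receive_all(datain):
    # Yield messages until the sender closes its end of the pipe.
    while True:
        try:
            yield pickle.load(datain)
        except EOFError:
            return    # pipe closed; no more messages
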
"pickle" and "cPickle" seem to be interchangeable in this application,
so that works.

It's a useful way to talk to a subprocess, but you need to know all the
issues above to make it work.

John Nagle
 

John Nagle

"Processing" is useful, but it uses named pipes and sockets,
not ordinary pipes. Also, it has C code, so all the usual build
and version problems apply.
So does Pyro: http://pyro.sourceforge.net/

However, Pyro uses TCP/IP sockets for communication.

It uses a small header that contains the size of the message and a few
other things, and then the (binary by default) pickle stream.
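
For comparison, a length-prefixed framing layer of that general sort
(just a sketch, not Pyro's actual wire format) would look roughly like
this:

import cPickle, struct

def send_framed(obj, out):
    data = cPickle.dumps(obj, 2)                # binary pickle
    out.write(struct.pack("!I", len(data)))     # 4-byte big-endian length header
    out.write(data)
    out.flush()

def read_exact(inp, n):
    # Pipes and sockets can return short reads; keep going until n bytes arrive.
    chunks = []
    while n > 0:
        chunk = inp.read(n)
        if not chunk:
            raise EOFError("stream closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return "".join(chunks)

def recv_framed(inp):
    (size,) = struct.unpack("!I", read_exact(inp, 4))
    return cPickle.loads(read_exact(inp, size))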

I'd thought I might have to add another layer of encapsulation to
delimit "pickled" sections, but it turns out that's not necessary.
So it doesn't take much code to do this, and it's all Python.
I may release this little module.

John Nagle
 

John Nagle

Another "gotcha". The "pickle" module seems to be OK with the
translations of "universal newlines" on Windows, but the "cPickle" module
is not. If I pickle

Exception("Test")

send it across the Windows pipe to the parent in universal newlines
mode, and read it with cPickle's

load()

function, I get

ImportError: No module named exceptions

If I read it with "pickle"'s "load()", it works. And if I read the input
one character at a time until I see ".", then feed that to cPickle's "loads()",
that works. So cPickle doesn't read the stream the same way the pure-Python
"pickle" module does in "universal newline" mode.

Is there any way within Python to get the pipe from a child process to the
parent to be completely transparent under Windows?

John Nagle
 

Carl Banks

John said:
It's possible to use "pickle" for interprocess communication over
pipes, but it's not straightforward.

First, "pickle" output is self-delimiting.
Each dump ends with ".", and, importantly, "load" doesn't read
any characters after the "." So "pickle" can be used repeatedly
on the same pipe, and one can do repeated message-passing this way. This
is a useful, but undocumented, feature.

It almost works.

Pickle's "dump" function doesn't flush output after dumping, so
there's still some data left to be written. The sender has to
flush the underlying output stream after each call to "dump",
or the receiver will stall. The "dump" function probably ought to flush
its output file.


But... you can also write multiple pickles to the same file.

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import cPickle
>>> f = open('xxx.pkl','wb')
>>> cPickle.dump(1,f)
>>> cPickle.dump('hello, world',f)
>>> cPickle.dump([1,2,3,4],f)
>>> f.close()
>>> f = open('xxx.pkl','rb')
>>> cPickle.load(f)
1
>>> cPickle.load(f)
'hello, world'
>>> cPickle.load(f)
[1, 2, 3, 4]

An automatic flush would be very undesirable there. Best to let those
worrying about IPC flush the output file themselves, which they
ought to be doing regardless (either by explicitly flushing or by using
an unbuffered stream).

John said:
It's also necessary to call Pickle's "clear_memo" before each "dump"
call, since objects might change between successive "dump" calls.
"Unpickle" doesn't have a "clear_memo" function. It should, because
if you keep reusing the "Unpickle" object, the memo dictionary
fills up with old objects which can't be garbage collected.
This creates a memory leak in long-running programs.

This is all good to know. I agree that this is a good use case for a
clear_memo on the Unpickler.

John said:
Then, on Windows, there's a CR LF problem. This can be fixed by
launching the subprocess with

proc = subprocess.Popen(launchargs,
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        universal_newlines=True)

Failure to do this produces the useful error message "Insecure string pickle".
Binary "pickle" protocol modes won't work at all in this situation; "universal
newline" translation keeps text-mode pickles working, but it is not
byte-transparent. On Unix/Linux this just works, but the code isn't portable.

I would think a better solution would be to use the -u switch to
launch the subprocess, or the PYTHONUNBUFFERED environment variable if
you want to invoke the Python script directly. It opens up stdin and
stdout in binary, unbuffered mode.
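
A sketch of that approach; "child.py" stands in for whatever script the
subprocess runs:

import subprocess, sys

# -u gives the child unbuffered, binary stdin/stdout, so binary pickle
# protocols pass through the Windows pipe untranslated.
proc = subprocess.Popen([sys.executable, "-u", "child.py"],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)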

Using "universal newlines" in a non-text format seems like it's not a
good idea.

For text-format pickles it'd be the right thing, of course.

John said:
Incidentally, in the subprocess, it's useful to do

sys.stdout = sys.stderr

after setting up the Pickle objects. This prevents any stray print statements
from interfering with the structured Pickle output.

Nice idea.

John said:
Then there's end of file detection. When "load" reaches an end of
file, it properly raises EOFError. So it's OK to do "load" after
"load" until EOFError is raised.

"pickle" and "cPickle" seem to be interchangeable in this application,
so that works.

It's a useful way to talk to a subprocess, but you need to know all the
issues above to make it work.

Thanks: this was an informative post


Carl Banks
 

John Nagle

Carl said:
This is all good to know. I agree that this is a good use case for a
clear_memo on the Unpickler.

For now I just reset the Unpickler's memo by hand:

reader = pickle.Unpickler(self.datain)   # set up reader
....
reader.memo = {}                         # no memory from cycle to cycle
Carl said:
I would think a better solution would be to use the -u switch to
launch the subprocess, or the PYTHONUNBUFFERED environment variable if
you want to invoke the Python script directly. It opens up stdin and
stdout in binary, unbuffered mode.

Ah. That works. I wasn't aware that "unbuffered" mode also implied
binary transparency. I did that, and now cPickle works in both text (0)
and binary (2) protocol modes. Turned off "Universal Newline" mode.
Carl said:
Thanks: this was an informative post

Thanks. We have this working well now. After a while, I'll publish
the module, which is called "subprocesscall.py".

John Nagle
 

Paul Boddie

John said:
"Processing" is useful, but it uses named pipes and sockets,
not ordinary pipes. Also, it has C code, so all the usual build
and version problems apply.

The pprocess module uses pickles over sockets, mostly because the
asynchronous aspects of the communication only appear to work reliably
with sockets. See here for the code:

http://www.python.org/pypi/pprocess

Unlike your approach, pprocess employs the fork system call. In
another project of mine - jailtools - I use some of the pprocess
functionality with the subprocess module:

http://www.python.org/pypi/jailtools

I seem to recall that a few things are necessary when dealing with
subprocesses, especially those which employ the python executable:
running in unbuffered mode is one of those things.

Paul
 

John Nagle

Paul said:
Unlike your approach, pprocess employs the fork system call.

Unfortunately, that's not portable. Python's "fork()" is
"Availability: Macintosh, Unix." I would have preferred
to use "fork()".

John Nagle
 

Paul Boddie

John said:
Unfortunately, that's not portable. Python's "fork()" is
"Availability: Macintosh, Unix." I would have preferred
to use "fork()".

There was a discussion some time ago about providing a fork
implementation on Windows, since Cygwin attempts/attempted to provide
such support [1] and there's a Perl module which pretends to provide
fork (using threads if I recall correctly), but I'm not sure whether
anyone really believed that it was workable. I believe that on modern
releases of Windows it was the ZwCreateProcess function which was
supposed to be usable for this purpose, but you then apparently have
to add a bunch of other things to initialise the new process
appropriately.

Of course, for the purposes of pprocess - providing a multiprocess
solution which should be as easy to use as spawning threads, whilst
having some shared, immutable state hanging around that you don't want
to think too hard about - having fork is essential. But obviously, if
you're willing to split your program up into different components,
then any of the distributed object technologies would be good enough.

Paul

[1] http://www.cygwin.com/ml/cygwin/2002-01/msg01826.html
 
