Shared memory in Python between two separate shell-launched processes

  • Thread starter: Charles Fox (Sheffield)

Charles Fox (Sheffield)

Hi guys,
I'm working on debugging a large Python simulation which begins by
preloading a huge cache of data. I want to step through the code on
many runs to do the debugging. The problem is that it takes 20 seconds
to load the cache at each launch. (The cache is a dict in a 200 MB
cPickle binary file.)

So to speed up the compile-test cycle I'm thinking about running a
completely separate process (not a fork, but a process launched from
a different terminal) that can load the cache once and then dump it
into an area of shared memory. Each time I debug the main program, it
can start up quickly and read from the shared memory instead of
loading the cache itself.

But when I look at posix_ipc and POSH it looks like you have to fork
the second process from the first one, rather than access the shared
memory through a key ID as in standard C Unix shared memory. Am I
missing something? Are there any other ways to do this?

thanks,
Charles
 

Jean-Paul Calderone

> Hi guys,
> I'm working on debugging a large Python simulation which begins by
> preloading a huge cache of data. I want to step through the code on
> many runs to do the debugging. The problem is that it takes 20 seconds
> to load the cache at each launch. (The cache is a dict in a 200 MB
> cPickle binary file.)
>
> So to speed up the compile-test cycle I'm thinking about running a
> completely separate process (not a fork, but a process launched from
> a different terminal)

Why _not_ fork? Load up your data, then go into a loop forking and
loading/running the rest of your code in the child. This should be
really easy to implement compared to doing something with shared
memory, and solves the problem you're trying to solve of long startup
time just as well. It also protects you from possible bugs where the
data gets corrupted by the code that operates on it, since there's
only one copy shared amongst all your tests. Is there some other
benefit that the shared memory approach gives you?

Of course, adding unit tests that exercise your code on a smaller
data set might also be a way to speed up development.

Jean-Paul
 

Charles Fox (Sheffield)

> Why _not_ fork? Load up your data, then go into a loop forking and
> loading/running the rest of your code in the child. This should be
> really easy to implement compared to doing something with shared
> memory, and solves the problem you're trying to solve of long startup
> time just as well. It also protects you from possible bugs where the
> data gets corrupted by the code that operates on it, since there's
> only one copy shared amongst all your tests. Is there some other
> benefit that the shared memory approach gives you?
>
> Of course, adding unit tests that exercise your code on a smaller
> data set might also be a way to speed up development.
>
> Jean-Paul



Thanks Jean-Paul, I'll have a think about this. I'm not sure it will
get me exactly what I want, though, as I would need to keep unloading
my development module and reloading it, all within the forked process,
and I don't see how my debugger (and emacs pdb tracking) will keep up
with that to let me step through the code. (This debugging is more
about integration issues than single functions; I have a bunch of unit
tests for the little bits, but something is unhappy when I put them
all together...)

(I also had a reply by email, suggesting I use /dev/shm to store the
data instead of the hard disc; this speeds things up a little, but not
much, as the data still has to be transferred in bulk into my process.
Unless I'm missing something and my process can just access the data
in that shm without having to load its own copy?)
 

Jean-Paul Calderone

> Thanks Jean-Paul, I'll have a think about this. I'm not sure it will
> get me exactly what I want, though, as I would need to keep unloading
> my development module and reloading it, all within the forked process,
> and I don't see how my debugger (and emacs pdb tracking) will keep up
> with that to let me step through the code.

Not really. Don't load your code at all in the parent. Then there's
nothing to unload in each child process, just some code to load for
the very first time ever (as far as that process is concerned).

Jean-Paul
 

Charles Fox (Sheffield)

> Not really. Don't load your code at all in the parent. Then there's
> nothing to unload in each child process, just some code to load for
> the very first time ever (as far as that process is concerned).
>
> Jean-Paul


Jean, sorry, I'm still not sure what you mean. Could you give a
couple of lines of pseudocode to illustrate it? And explain how my
emacs pdbtrack would still be able to pick it up?
thanks,
Charles
 

Adam Skutt

> But when I look at posix_ipc and POSH it looks like you have to fork
> the second process from the first one, rather than access the shared
> memory through a key ID as in standard C Unix shared memory. Am I
> missing something? Are there any other ways to do this?

I don't see what would have given you that impression at all, at least
with posix_ipc. It's a straight wrapper on the POSIX shared memory
functions, which can be used across processes when used correctly.
Even if for some reason that implementation lacks the right stuff,
there's always SysV IPC.

Open the same segment in both processes, use mmap, and go. What will
be tricky, of course, is binding to wherever you stuffed the
dictionary. I think ctypes.cast() can do what you need, but I've
never done it before.
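
For instance, something along these lines (a minimal, untested sketch;
the segment name "/my_cache" and the 1 KB size are made up for
illustration):

# writer.py -- launched from one terminal; creates the segment
import mmap
import struct
import posix_ipc

shm = posix_ipc.SharedMemory("/my_cache", posix_ipc.O_CREX, size=1024)
mem = mmap.mmap(shm.fd, shm.size)
shm.close_fd()                     # the mapping stays valid without the fd
mem[0:8] = struct.pack("d", 3.14)  # stash a double at offset 0

# reader.py -- launched separately from another terminal, no fork
import mmap
import struct
import posix_ipc

shm = posix_ipc.SharedMemory("/my_cache")  # attach to the existing segment by name
mem = mmap.mmap(shm.fd, shm.size)
shm.close_fd()
print(struct.unpack("d", mem[0:8])[0])     # -> 3.14

Note this only moves raw bytes between the processes; a Python dict
would still have to be serialized into the segment and deserialized on
the way out, which is exactly the binding problem I mentioned.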

Also, just FYI, there is no such thing as "standard C Unix shared
memory". There are at least three different relatively widely
supported techniques: SysV, (anonymous) mmap, and POSIX Realtime
Shared Memory (which normally involves mmap). All three are
standardized by the Open Group, and none of the three is implemented
with perfect consistency across Unices.

Adam
 

Jean-Paul Calderone

> Jean, sorry, I'm still not sure what you mean. Could you give a
> couple of lines of pseudocode to illustrate it? And explain how my
> emacs pdbtrack would still be able to pick it up?
> thanks,
> Charles

import os
import loader             # placeholder for whatever module loads the cache

data = loader.preload()   # pay the 20-second load cost exactly once

while True:
    pid = os.fork()
    if pid == 0:
        # Child: the code under test is imported fresh each time.
        import program
        program.main(data)
        os._exit(0)       # don't let the child fall back into the fork loop
    else:
        os.waitpid(pid, 0)

But I won't actually try to predict how this is going to interact with
emacs pdbtrack.

Jean-Paul
 

Philip

>> But when I look at posix_ipc and POSH it looks like you have to fork
>> the second process from the first one, rather than access the shared
>> memory through a key ID as in standard C Unix shared memory. Am I
>> missing something? Are there any other ways to do this?
>
> I don't see what would have given you that impression at all, at least
> with posix_ipc. It's a straight wrapper on the POSIX shared memory
> functions, which can be used across processes when used correctly.
> Even if for some reason that implementation lacks the right stuff,
> there's always SysV IPC.
> [some stuff snipped]
> Also, just FYI, there is no such thing as "standard C Unix shared
> memory". There are at least three different relatively widely
> supported techniques: SysV, (anonymous) mmap, and POSIX Realtime
> Shared Memory (which normally involves mmap). All three are
> standardized by the Open Group, and none of the three is implemented
> with perfect consistency across Unices.

Adam is 100% correct. posix_ipc doesn't require fork.

@the OP: Charles, since you refer to "standard" shared memory as being
referred to by a key, it sounds like you're thinking of SysV shared
memory. POSIX IPC objects are referred to by a string that looks like
a filename, e.g. "/my_shared_memory".

Note that there's a module called sysv_ipc which is a close cousin of
posix_ipc. I'm the author of both. IMO POSIX is easier to use.
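
For what it's worth, the key-based style you're thinking of is
available too. A minimal, untested sketch with sysv_ipc (the key 42
and the 1 KB size are arbitrary illustrations):

# process A, launched from one terminal: create a segment under key 42
import sysv_ipc
shm = sysv_ipc.SharedMemory(42, sysv_ipc.IPC_CREX, size=1024)
shm.write("hello")

# process B, launched from another terminal: attach to the same key
import sysv_ipc
shm = sysv_ipc.SharedMemory(42)
print(shm.read(5))   # -> hello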

Cheers
Philip
 

John Nagle

> Thanks Jean-Paul, I'll have a think about this. I'm not sure it will
> get me exactly what I want, though, as I would need to keep unloading
> my development module and reloading it, all within the forked process,
> and I don't see how my debugger (and emacs pdb tracking) will keep up
> with that to let me step through the code. (This debugging is more
> about integration issues than single functions; I have a bunch of unit
> tests for the little bits, but something is unhappy when I put them
> all together...)

If you're having trouble debugging a sequential program,
you do not want to add shared memory to the problem.

John Nagle
 

Adam Skutt

> If you're having trouble debugging a sequential program,
> you do not want to add shared memory to the problem.

No, you don't want to add additional concurrent threads of execution.
But he's not doing that; he's just preloading stuff into RAM. It's no
different from using memcached, or sqlite, or any other database for
that matter.

Adam
 
