Shared memory in Python between two separate shell-launched processes

  • Thread starter: Charles Fox (Sheffield)

Charles Fox (Sheffield)

Hi guys,
I'm working on debugging a large Python simulation which begins by
preloading a huge cache of data. I want to step through the code on
many runs to do the debugging. The problem is that it takes 20 seconds
to load the cache at each launch. (The cache is a dict in a 200 MB
cPickle binary file.)

So to speed up the compile-test cycle I'm thinking about running a
completely separate process (not a fork, but a process launched from
a different terminal) that can load the cache once and then dump it
into an area of shared memory. Each time I debug the main program, it
can start up quickly and read from the shared memory instead of
loading the cache itself.

But when I look at posix_ipc and POSH it looks like you have to fork
the second process from the first one, rather than access the shared
memory through a key ID as in standard C Unix shared memory. Am I
missing something? Are there any other ways to do this?

thanks,
Charles
 

Jean-Paul Calderone

> Hi guys,
> I'm working on debugging a large Python simulation which begins by
> preloading a huge cache of data. I want to step through the code on
> many runs to do the debugging. The problem is that it takes 20 seconds
> to load the cache at each launch. (The cache is a dict in a 200 MB
> cPickle binary file.)
>
> So to speed up the compile-test cycle I'm thinking about running a
> completely separate process (not a fork, but a process launched from
> a different terminal)

Why _not_ fork? Load up your data, then go into a loop forking and
loading/running the rest of your code in the child. This should be
really easy to implement compared to doing something with shared
memory, and solves the problem you're trying to solve of long startup
time just as well. It also protects you from possible bugs where the
data gets corrupted by the code that operates on it, since there's
only one copy shared amongst all your tests. Is there some other
benefit that the shared memory approach gives you?

Of course, adding unit tests that exercise your code on a smaller
data set might also be a way to speed up development.

Jean-Paul
 

Charles Fox (Sheffield)

> Why _not_ fork? Load up your data, then go into a loop forking and
> loading/running the rest of your code in the child. This should be
> really easy to implement compared to doing something with shared
> memory, and solves the problem you're trying to solve of long startup
> time just as well. It also protects you from possible bugs where the
> data gets corrupted by the code that operates on it, since there's
> only one copy shared amongst all your tests. Is there some other
> benefit that the shared memory approach gives you?
>
> Of course, adding unit tests that exercise your code on a smaller
> data set might also be a way to speed up development.
>
> Jean-Paul



Thanks Jean-Paul, I'll have a think about this. I'm not sure it will
get me exactly what I want, though, as I would need to keep unloading
my development module and reloading it, all within the forked process,
and I don't see how my debugger (and emacs pdb tracking) will keep up
with that to let me step through the code. (This debugging is more
about integration issues than single functions; I have a bunch of unit
tests for the little bits, but something is unhappy when I put them
all together...)

(I also had a reply by email, suggesting I use /dev/shm to store the
data instead of the hard disc; this speeds things up a little, but not
much, as the data still has to be transferred in bulk into my process.
Unless I'm missing something and my process can just access the data
in that shm without having to load its own copy?)
 

Jean-Paul Calderone

> Thanks Jean-Paul, I'll have a think about this. I'm not sure it will
> get me exactly what I want, though, as I would need to keep unloading
> my development module and reloading it, all within the forked process,
> and I don't see how my debugger (and emacs pdb tracking) will keep up
> with that to let me step through the code.

Not really. Don't load your code at all in the parent. Then there's
nothing to unload in each child process, just some code to load for
the very first time ever (as far as that process is concerned).

Jean-Paul
 

Charles Fox (Sheffield)

> Not really. Don't load your code at all in the parent. Then there's
> nothing to unload in each child process, just some code to load for
> the very first time ever (as far as that process is concerned).
>
> Jean-Paul


Jean, sorry, I'm still not sure what you mean. Could you give a
couple of lines of pseudocode to illustrate it? And explain how my
emacs pdbtrack would still be able to pick it up?
thanks,
Charles
 

Adam Skutt

> But when I look at posix_ipc and POSH it looks like you have to fork
> the second process from the first one, rather than access the shared
> memory through a key ID as in standard C Unix shared memory. Am I
> missing something? Are there any other ways to do this?

I don't see what would have given you that impression at all, at least
with posix_ipc. It's a straight wrapper on the POSIX shared memory
functions, which can be used across processes when used correctly.
Even if for some reason that implementation lacks the right stuff,
there's always SysV IPC.

Open the same segment in both processes, use mmap, and go. What will
be tricky, of course, is binding to wherever you stuffed the
dictionary. I think ctypes.cast() can do what you need, but I've
never done it before.
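
For instance, something along these lines (a minimal, untested sketch;
the segment name "/my_cache" and the 1 KB size are made up for
illustration):

# writer.py -- launched from one terminal; creates the segment
import mmap
import struct
import posix_ipc

shm = posix_ipc.SharedMemory("/my_cache", posix_ipc.O_CREX, size=1024)
mem = mmap.mmap(shm.fd, shm.size)
shm.close_fd()                     # the mapping stays valid without the fd
mem[0:8] = struct.pack("d", 3.14)  # stash a double at offset 0

# reader.py -- launched separately from another terminal, no fork
import mmap
import struct
import posix_ipc

shm = posix_ipc.SharedMemory("/my_cache")  # attach to the existing segment by name
mem = mmap.mmap(shm.fd, shm.size)
shm.close_fd()
print(struct.unpack("d", mem[0:8])[0])     # -> 3.14

Note this only moves raw bytes between the processes; a Python dict
would still have to be serialized into the segment and deserialized on
the way out, which is exactly the binding problem I mentioned.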

Also, just FYI, there is no such thing as "standard C Unix shared
memory". There are at least three different relatively widely
supported techniques: SysV, (anonymous) mmap, and POSIX Realtime
Shared Memory (which normally involves mmap). All three are
standardized by the Open Group, and none of the three is implemented
with perfect consistency across Unices.

Adam
 

Jean-Paul Calderone

> Jean, sorry, I'm still not sure what you mean. Could you give a
> couple of lines of pseudocode to illustrate it? And explain how my
> emacs pdbtrack would still be able to pick it up?
> thanks,
> Charles

import os
import loader             # placeholder for whatever module loads the cache

data = loader.preload()   # pay the 20-second load cost exactly once

while True:
    pid = os.fork()
    if pid == 0:
        # Child: the code under test is imported fresh each time.
        import program
        program.main(data)
        os._exit(0)       # don't let the child fall back into the fork loop
    else:
        os.waitpid(pid, 0)

But I won't actually try to predict how this is going to interact with
emacs pdbtrack.

Jean-Paul
 

Philip

>> But when I look at posix_ipc and POSH it looks like you have to fork
>> the second process from the first one, rather than access the shared
>> memory through a key ID as in standard C Unix shared memory. Am I
>> missing something? Are there any other ways to do this?
>
> I don't see what would have given you that impression at all, at least
> with posix_ipc. It's a straight wrapper on the POSIX shared memory
> functions, which can be used across processes when used correctly.
> Even if for some reason that implementation lacks the right stuff,
> there's always SysV IPC.
> [some stuff snipped]
> Also, just FYI, there is no such thing as "standard C Unix shared
> memory". There are at least three different relatively widely
> supported techniques: SysV, (anonymous) mmap, and POSIX Realtime
> Shared Memory (which normally involves mmap). All three are
> standardized by the Open Group, and none of the three is implemented
> with perfect consistency across Unices.

Adam is 100% correct. posix_ipc doesn't require fork.

@the OP: Charles, since you refer to "standard" shared memory as being
referred to by a key, it sounds like you're thinking of SysV shared
memory. POSIX IPC objects are referred to by a string that looks like
a filename, e.g. "/my_shared_memory".

Note that there's a module called sysv_ipc which is a close cousin of
posix_ipc. I'm the author of both. IMO POSIX is easier to use.
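
For what it's worth, the key-based style you're thinking of is
available too. A minimal, untested sketch with sysv_ipc (the key 42
and the 1 KB size are arbitrary illustrations):

# process A, launched from one terminal: create a segment under key 42
import sysv_ipc
shm = sysv_ipc.SharedMemory(42, sysv_ipc.IPC_CREX, size=1024)
shm.write("hello")

# process B, launched from another terminal: attach to the same key
import sysv_ipc
shm = sysv_ipc.SharedMemory(42)
print(shm.read(5))   # -> hello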

Cheers
Philip
 

John Nagle

> Thanks Jean-Paul, I'll have a think about this. I'm not sure it will
> get me exactly what I want, though, as I would need to keep unloading
> my development module and reloading it, all within the forked process,
> and I don't see how my debugger (and emacs pdb tracking) will keep up
> with that to let me step through the code. (This debugging is more
> about integration issues than single functions; I have a bunch of unit
> tests for the little bits, but something is unhappy when I put them
> all together...)

If you're having trouble debugging a sequential program,
you do not want to add shared memory to the problem.

John Nagle
 

Adam Skutt

> If you're having trouble debugging a sequential program,
> you do not want to add shared memory to the problem.

No, you don't want to add additional concurrent threads of execution.
But he's not doing that; he's just preloading stuff into RAM. It's no
different from using memcached, or sqlite, or any other database for
that matter.

Adam
 
