Interprocess communication and memory mapping

Discussion in 'Python' started by James Aguilar, Dec 14, 2005.

  1. Oh wise readers of comp.lang.python,

    Lend a newbie your ears. I have read several old articles from this
    group about memory mapping and interprocess communication and have
    Googled the sh** out of the internet, but have not found enough to
    answer my questions.

    Suppose that I am writing a ray tracer in Python. Well, perhaps not a
    ray tracer. Suppose that I am writing a ray tracer that has to update
    sixty times a second (Ignore for now that this is impossible and silly.
    Ignore also that one would probably not choose Python to do such a
    thing.). Ray tracing, as some of you may know, is an inherently
    parallelizable task. Hence, I would love to split the task across my
    quad-core CPU (Ignore also that such things do not exist yet.).
    Because of the GIL, I need all of my work to be done in separate processes.

    My vision for this is that I would create a controller process and read
    in the data (The lights, the matrices describing all of the objects,
    and their colors.), putting it into a memory mapped file. Then I would
    create a single child process for each CPU and assign them each a range
    of pixels to work on. I would say GO and they would return the results
    of their computations by placing them in an array in the memory mapped
    file, which, when completed, the parent process would pump out to the
    frame buffer. Meanwhile, the parent process is collecting changes from
    whatever is controlling the state of the world. As soon as the picture
    is finished, the parent process adjusts the data in the file to reflect
    the new state of the world, and tells the child processes to go again,
    etc.

    So, I have a couple of questions:

    * Is there any way to have Python objects (Such as a light or a color)
    put themselves into a byte array and then pull themselves out of the
    same array without any extra work? If each of the children had to load
    all of the values from the array, we would probably lose much of the
    benefit of doing things this way. What I mean to say is, can I say to
    Python, "Interpret this range of bytes as a Light object, interpret
    this range of bytes as a Matrix, etc." This is roughly equivalent to
    simply static_casting a void * to an object type in C++.

    * Are memory mapped files fast enough to do something like this? The
    whole idea is that I would avoid the cost of having the whole world
    loaded into memory in every single process. With threads, this is not
    a problem -- what I am trying to do is figure out the Pythonic way to
    work around the impossibility of using more than one processor because
    of the GIL.

    * Are pipes a better idea? If so, how do I avoid the problem of
    wasting extra memory by having all of the child processes hold all
    of the data in memory as well?

    * Are there any other shared memory models that would work for this
    task?

    OK, I think that is enough. I look forward eagerly to your replies!

    Yours,

    James Aguilar
    James Aguilar, Dec 14, 2005
    #1

  2. Paul Boddie (Guest)

    James Aguilar wrote:
    > Suppose that I am writing a ray tracer in Python. Well, perhaps not a
    > ray tracer. Suppose that I am writing a ray tracer that has to update
    > sixty times a second (Ignore for now that this is impossible and silly.
    > Ignore also that one would probably not choose Python to do such a
    > thing.).


    Someone doesn't agree with you there... ;-)

    http://www.pawfal.org/index.php?page=PyGmy

    > Ray tracing, as some of you may know, is an inherently parallelizable task.
    > Hence, I would love to split the task across my quad-core CPU (Ignore also that
    > such things do not exist yet.). Because of the GIL, I need all of my work to be done in
    > separate processes.


    Right. I suppose that you could just use the existing parallel
    processing mechanisms for which Python interfaces exist. However, much
    has been said about making multicore parallelism more accessible to the
    average thread programmer, although much of that was said on the
    python-dev mailing list [1], presumably because those doing most of the
    talking didn't think to discuss such issues with the wider
    community (and probably wanted to petition for core language changes as
    well).

    [...]

    > * Is there any way to have Python objects (Such as a light or a color)
    > put themselves into a byte array and then pull themselves out of the
    > same array without any extra work?


    Unless you mean something very special about "extra work", I would have
    thought that the pickle module would cover this need.
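
    A minimal sketch of the pickle approach mentioned above. The Light
    class and its fields are hypothetical stand-ins for the poster's scene
    objects, not anything defined in this thread.

```python
import pickle
from dataclasses import dataclass


@dataclass
class Light:
    # Hypothetical scene object; the field names are assumptions.
    x: float
    y: float
    z: float
    intensity: float


def to_bytes(obj) -> bytes:
    """Serialize any picklable object into a byte string."""
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def from_bytes(data: bytes):
    """Rebuild an object from bytes produced by to_bytes."""
    return pickle.loads(data)


light = Light(1.0, 2.0, 3.0, 0.5)
blob = to_bytes(light)   # these bytes could be written into a mapped file
copy = from_bytes(blob)  # an equal, but separate, object
```

    Note that unpickling builds a fresh object in the receiving process,
    so this is a copy of the data rather than a view onto shared bytes.
    Whether that counts as "extra work" is the question raised above.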

    [Other interesting questions about memory mapped files, pipes, shared
    memory.]

    My idea was to attempt to make use of existing multiprocessing
    mechanisms, putting communications facilities on top. I don't know how
    feasible or interesting that is, but what I wanted to do with the
    pprocess module [2] was to develop an API using the POSIX fork system
    call which resembled existing APIs for threading and communications. My
    reasoning is that, as far as I know/remember, fork in modern POSIX
    systems lets processes share unmodified data (copy-on-write) - so like
    multithreaded programs, each process shares the "context" of a
    computation with the other computation units - whilst any modified data
    is held only by the modifying process. With the supposed process
    migration capabilities of
    certain operating systems, it should be possible to distribute
    processes across CPUs and even computation nodes.

    The only drawback is that one cannot, in a scheme as described above,
    transparently modify global variables in order to communicate with
    other processes. However, I consider it somewhat more desirable to
    provide explicit communications channels for such communications, and
    it is arguably a matter of taste as to how one then uses those
    channels: either by explicitly manipulating channel objects, like
    streams, or by wrapping them in such a way that a distributed
    computation just looks like a normal function invocation.
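
    The scheme described above can be sketched roughly as follows. This is
    not the pprocess API; it is a POSIX-only illustration using os.fork
    and os.pipe directly. WORLD, shade, render_range and render_parallel
    are hypothetical names, and shade is a cheap stand-in for real ray
    tracing.

```python
import os
import pickle

# Hypothetical read-only "context", built once in the parent before forking.
WORLD = {"width": 8, "height": 4}


def shade(x: int, y: int) -> int:
    """Stand-in for tracing one ray: a deterministic value per pixel."""
    return (x * 31 + y * 17) % 256


def render_range(start: int, stop: int) -> list:
    """Compute pixels start..stop-1, reading the inherited WORLD."""
    width = WORLD["width"]
    return [shade(i % width, i // width) for i in range(start, stop)]


def render_parallel(workers: int) -> list:
    """Fork one child per pixel range; collect results over pipes."""
    total = WORLD["width"] * WORLD["height"]
    step = (total + workers - 1) // workers
    children = []
    for w in range(workers):
        start, stop = w * step, min((w + 1) * step, total)
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: WORLD is inherited from the parent, not sent over the pipe.
            os.close(read_fd)
            status = 1
            try:
                with os.fdopen(write_fd, "wb") as out:
                    pickle.dump(render_range(start, stop), out)
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        os.close(write_fd)
        children.append((pid, read_fd))
    pixels = []
    for pid, read_fd in children:
        # The explicit channel: each child's results come back as a pickle.
        with os.fdopen(read_fd, "rb") as src:
            pixels.extend(pickle.load(src))
        os.waitpid(pid, 0)
    return pixels
```

    From the caller's side, render_parallel(4) looks like a normal
    function invocation, which is the wrapping style mentioned above. Only
    the results travel through the pipes; the scene data reaches the
    children through fork.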

    Anyway, I don't have any formal experience in multiprocessing or any
    multiprocessor/multicore environments available to me, so what I've
    written may be somewhat naive, but should anything like it be workable,
    it'd be a gentler path to parallelism than hacking Python's runtime to
    remove the global interpreter lock.

    Paul

    [1]
    http://mail.python.org/pipermail/python-dev/2005-September/056801.html
    [2] http://www.python.org/pypi/parallel
    Paul Boddie, Dec 15, 2005
    #2

  3. Donn Cave (Guest)

    In article <>,
    "James Aguilar" <> wrote:
    ....
    > So, I have a couple of questions:
    >
    > * Is there any way to have Python objects (Such as a light or a color)
    > put themselves into a byte array and then pull themselves out of the
    > same array without any extra work? If each of the children had to load
    > all of the values from the array, we would probably lose much of the
    > benefit of doing things this way. What I mean to say is, can I say to
    > Python, "Interpret this range of bytes as a Light object, interpret
    > this range of bytes as a Matrix, etc." This is roughly equivalent to
    > simply static_casting a void * to an object type in C++.


    Not exactly. One basic issue is that a significant amount of the
    storage associated with a light or a color is going to be "overhead"
    specific to the interpreter process image, and not shareable. A
    Python process would not be able to simply acquire a lot of objects
    by mapping a memory region.

    However, if you're ready to go to the trouble to implement your
    data types in C, then you can do the (void *) thing with their data,
    and then these objects would automatically have the current value
    of the data at that address. I'm not saying this is a really good
    idea, but right off hand it seems technically possible. The simplest
    thing might be to copy the array module and make a new type that
    works just like it but borrows its storage instead of allocating it.
    That would be expedient, though maybe not as fast, because each access
    to the data comes at the expense of creating an object.
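
    A type that borrows its storage can be approximated without writing C
    by using the standard ctypes module, as in the sketch below. This
    swaps in ctypes for the modified array module suggested above. The
    Light layout and the light_at helper are hypothetical, and an
    anonymous mmap stands in for the memory mapped scene file.

```python
import ctypes
import mmap
import struct


class Light(ctypes.Structure):
    # Hypothetical layout; field names and types are assumptions.
    _fields_ = [
        ("position", ctypes.c_double * 3),
        ("intensity", ctypes.c_double),
    ]


LIGHT_SIZE = ctypes.sizeof(Light)  # four doubles

# Anonymous mapping standing in for the memory mapped scene file.
region = mmap.mmap(-1, 4 * LIGHT_SIZE)


def light_at(buf, index: int) -> Light:
    """Return a Light that borrows its storage from buf; no bytes are copied."""
    return Light.from_buffer(buf, index * LIGHT_SIZE)


first = light_at(region, 0)
for i, value in enumerate((1.0, 2.0, 3.0)):
    first.position[i] = value
first.intensity = 0.75

# A second view over the same bytes sees the values written through the first.
alias = light_at(region, 0)

# The raw bytes agree too: intensity sits after the three position doubles.
raw_intensity = struct.unpack_from("d", region, 3 * ctypes.sizeof(ctypes.c_double))[0]
```

    This is close to "interpret this range of bytes as a Light", but the
    cost noted above still applies: every read of alias.intensity creates
    a new Python float.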

    > * Are memory mapped files fast enough to do something like this?


    Shared memory is pretty fast.
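
    As a rough illustration of the design from the original question, the
    sketch below has forked children write their pixel ranges straight
    into a shared anonymous mapping. It is POSIX-only and makes no
    performance claim. The frame size, shade and render_frame are
    hypothetical.

```python
import mmap
import os

WIDTH, HEIGHT = 16, 8         # hypothetical tiny frame
FRAME_BYTES = WIDTH * HEIGHT  # one grey byte per pixel


def shade(index: int) -> int:
    """Stand-in for tracing one ray: a deterministic byte per pixel."""
    return (index * 7) % 256


def render_frame(workers: int) -> bytes:
    """Fork workers that each fill their own slice of a shared mapping."""
    # Anonymous mapping; on POSIX it stays shared with children made by fork.
    frame = mmap.mmap(-1, FRAME_BYTES)
    step = (FRAME_BYTES + workers - 1) // workers
    pids = []
    for w in range(workers):
        start, stop = w * step, min((w + 1) * step, FRAME_BYTES)
        pid = os.fork()
        if pid == 0:
            # Child: write results in place; nothing is sent back explicitly.
            status = 1
            try:
                for i in range(start, stop):
                    frame[i] = shade(i)
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if status != 0:
            raise RuntimeError("worker %d failed" % pid)
    result = frame[:]  # copy out what the parent would send to the frame buffer
    frame.close()
    return result
```

    The ranges do not overlap, so the workers need no locking here. A
    renderer that reuses the workers across frames would also need some
    way to signal "go" and "done", which this sketch leaves out.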

    > * Are pipes a better idea? If so, how do I avoid the problem of
    > wasting extra memory by having all of the child processes hold all
    > of the data in memory as well?


    Pipes may well be a better idea, but a lot depends on the design.

    Donn Cave,
    Donn Cave, Dec 15, 2005
    #3
  4. Paul

    This is pretty useful for me. Appreciate it! My whole point is not
    that I actually want to do this, but that I want to make sure that
    Python is powerful enough to handle this kind of thing before I really
    invest myself deeply into learning and using it. I do believe that
    parallel computing is in my future, one way or another, so I want to
    make sure it's possible to use Python to do that well and efficiently.

    - James Aguilar
    Aguilar, James, Dec 20, 2005
    #4
