Multiprocessing, shared memory vs. pickled copies

Discussion in 'Python' started by John Ladasky, Apr 4, 2011.

  1. John Ladasky

    John Ladasky Guest

    Hi folks,

    I'm developing some custom neural network code. I'm using Python 2.6,
    Numpy 1.5, and Ubuntu Linux 10.10. I have an AMD 1090T six-core CPU,
    and I want to take full advantage of it. I love to hear my CPU fan
    running, and watch my results come back faster.

    When I'm training a neural network, I pass two numpy.ndarray objects
    to a function called evaluate. One array contains the weights for the
    neural network, and the other array contains the input data. The
    evaluate function returns an array of output data.

    I have been playing with multiprocessing for a while now, and I have
    some familiarity with Pool. Apparently, arguments passed to a Pool
    subprocess must be able to be pickled. Pickling is still a pretty
    vague process to me, but I can see that you have to write custom
    __reduce__ and __setstate__ methods for your objects. An example of
    code which creates a pickle-friendly ndarray subclass is here:

    http://www.mail-archive.com//msg02446.html

    Now, I don't know that I actually HAVE to pass my neural network and
    input data as copies -- they're both READ-ONLY objects for the
    duration of an evaluate function (which can go on for quite a while).
    So, I have also started to investigate shared-memory approaches. I
    don't know how a shared-memory object is referenced by a subprocess
    yet, but presumably you pass a reference to the object, rather than
    the whole object. Also, it appears that subprocesses also acquire a
    temporary lock over a shared memory object, and thus one process may
    well spend time waiting for another (individual CPU caches may
    sidestep this problem?) Anyway, an implementation of a shared-memory
    ndarray is here:

    https://bitbucket.org/cleemesser/numpy-sharedmem/src/3fa526d11578/shmarray.py

    I've added a few lines to this code which allows subclassing the
    shared memory array, which I need (because my neural net objects are
    more than just the array, they also contain meta-data). But I've run
    into some trouble doing the actual sharing part. The shmarray class
    CANNOT be pickled. I think that my understanding of multiprocessing
    needs to evolve beyond the use of Pool, but I'm not sure yet. This
    post suggests as much.

    http://mail.scipy.org/pipermail/scipy-user/2009-February/019696.html

    I don't believe that my questions are specific to numpy, which is why
    I'm posting here, in a more general Python forum.

    When should one pickle and copy? When to implement an object in
    shared memory? Why is pickling apparently such a non-trivial process
    anyway? And, given that multi-core CPU's are apparently here to stay,
    should it be so difficult to make use of them?
    John Ladasky, Apr 4, 2011
    #1

  2. On Apr 4, 2011, at 4:20 PM, John Ladasky wrote:

    > I have been playing with multiprocessing for a while now, and I have
    > some familiarity with Pool. Apparently, arguments passed to a Pool
    > subprocess must be able to be pickled.


    Hi John,
    multiprocessing's use of pickle is not limited to Pool. For instance, objects put into a multiprocessing.Queue are also pickled, as are the args to a multiprocessing.Process. So if you're going to use multiprocessing, you're going to use pickle, and you need pickleable objects.
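
    For instance, a minimal sketch of all three paths (the function and data here are made up; the one real requirement is that the target function be defined at module top level, so the pickle can refer to it by name):

    from multiprocessing import Pool, Process, Queue

    def double(x):                 # top-level function: picklable by reference
        return 2 * x

    if __name__ == '__main__':
        pool = Pool(2)
        print pool.map(double, [1, 2, 3])       # arguments and results are pickled

        q = Queue()
        q.put({'weights': [0.1, 0.2]})          # pickled on put(), unpickled on get()
        print q.get()

        p = Process(target=double, args=(21,))  # per the point above, Process args
        p.start()                               # must be picklable as well
        p.join()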


    > Pickling is still a pretty
    > vague process to me, but I can see that you have to write custom
    > __reduce__ and __setstate__ methods for your objects.


    Well, that's only if one's objects don't support pickle by default. A lot of classes do without any need for custom __reduce__ and __setstate__ methods. Since you're apparently not too familiar with pickle, I don't want you to get the false impression that it's a lot of trouble. I've used pickle a number of times and never had to write custom methods for it.



    > Now, I don't know that I actually HAVE to pass my neural network and
    > input data as copies -- they're both READ-ONLY objects for the
    > duration of an evaluate function (which can go on for quite a while).
    > So, I have also started to investigate shared-memory approaches. I
    > don't know how a shared-memory object is referenced by a subprocess
    > yet, but presumably you pass a reference to the object, rather than
    > the whole object. Also, it appears that subprocesses also acquire a
    > temporary lock over a shared memory object, and thus one process may
    > well spend time waiting for another (individual CPU caches may
    > sidestep this problem?) Anyway, an implementation of a shared-memory
    > ndarray is here:


    There's no standard shared memory implementation for Python. The mmap module is as close as you get. I wrote & support the posix_ipc and sysv_ipc modules which give you IPC primitives (shared memory and semaphores) in Python. They work well (IMHO) but they're *nix-only and much lower level than multiprocessing. If multiprocessing is like a kitchen well stocked with appliances, posix_ipc (and sysv_ipc) is like a box of sharp knives.

    Note that mmap and my IPC modules don't expose Python objects. They expose raw bytes in memory. You're still going to have to jump through some hoops (...like pickle) to turn your Python objects into a bytestream and vice versa.
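
    For a taste of what "raw bytes" means here, a rough sketch using the standard library's mmap plus numpy to view the bytes as an array (the sizes are arbitrary):

    import mmap
    import numpy

    buf = mmap.mmap(-1, 10 * 8)                        # anonymous shared map: room for 10 doubles
    arr = numpy.frombuffer(buf, dtype=numpy.float64)   # an ndarray *view* of the raw bytes
    arr[:] = numpy.arange(10)                          # writes land directly in the mapped memory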


    What might be easier than fooling around with boxes of sharp knives is to convert your ndarray objects to Python lists. Lists are pickle-friendly and easy to turn back into ndarray objects once they've crossed the pickle boundary.
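
    A minimal sketch of that round trip, using ndarray.tolist() and numpy.array():

    import numpy

    weights = numpy.arange(6.0).reshape(2, 3)
    as_list = weights.tolist()            # plain nested lists pickle with no fuss
    restored = numpy.array(as_list)       # back to an ndarray on the far side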


    > When should one pickle and copy? When to implement an object in
    > shared memory? Why is pickling apparently such a non-trivial process
    > anyway? And, given that multi-core CPU's are apparently here to stay,
    > should it be so difficult to make use of them?


    My answers to these questions:

    1) Depends
    2) In Python, almost never unless you're using a nice wrapper like shmarray.py
    3) I don't think it's non-trivial =)
    4) No, definitely not. Python will only get better at working with multiple cores/CPUs, but there's plenty of room for improvement on the status quo.

    Hope this helps
    Philip
    Philip Semanchuk, Apr 5, 2011
    #2

  3. Robert Kern

    Robert Kern Guest

    On 4/4/11 3:20 PM, John Ladasky wrote:
    > Hi folks,
    >
    > I'm developing some custom neural network code. I'm using Python 2.6,
    > Numpy 1.5, and Ubuntu Linux 10.10. I have an AMD 1090T six-core CPU,
    > and I want to take full advantage of it. I love to hear my CPU fan
    > running, and watch my results come back faster.


    You will want to ask numpy questions on the numpy mailing list.

    http://www.scipy.org/Mailing_Lists

    > When I'm training a neural network, I pass two numpy.ndarray objects
    > to a function called evaluate. One array contains the weights for the
    > neural network, and the other array contains the input data. The
    > evaluate function returns an array of output data.
    >
    > I have been playing with multiprocessing for a while now, and I have
    > some familiarity with Pool. Apparently, arguments passed to a Pool
    > subprocess must be able to be pickled. Pickling is still a pretty
    > vague process to me, but I can see that you have to write custom
    > __reduce__ and __setstate__ methods for your objects. An example of
    > code which creates a pickle-friendly ndarray subclass is here:
    >
    > http://www.mail-archive.com//msg02446.html


    Note that numpy arrays are already pickle-friendly. This message is telling you
    how, *if* you are already subclassing, to make your subclass pickle the
    extra information it holds.

    > Now, I don't know that I actually HAVE to pass my neural network and
    > input data as copies -- they're both READ-ONLY objects for the
    > duration of an evaluate function (which can go on for quite a while).
    > So, I have also started to investigate shared-memory approaches. I
    > don't know how a shared-memory object is referenced by a subprocess
    > yet, but presumably you pass a reference to the object, rather than
    > the whole object. Also, it appears that subprocesses also acquire a
    > temporary lock over a shared memory object, and thus one process may
    > well spend time waiting for another (individual CPU caches may
    > sidestep this problem?) Anyway, an implementation of a shared-memory
    > ndarray is here:
    >
    > https://bitbucket.org/cleemesser/numpy-sharedmem/src/3fa526d11578/shmarray.py
    >
    > I've added a few lines to this code which allows subclassing the
    > shared memory array, which I need (because my neural net objects are
    > more than just the array, they also contain meta-data).


    Honestly, you should avoid subclassing ndarray just to add metadata. It never
    works well. Make a plain class, and keep the arrays as attributes.
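
    A minimal sketch of that composition approach (the class and attribute names are only illustrative):

    import numpy as np

    class NeuralNet(object):
        """Plain container: the arrays are attributes, not a base class."""
        def __init__(self, weights, description=None):
            self.weights = np.asarray(weights)   # ordinary ndarray, pickles fine
            self.description = description       # meta-data lives alongside it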

    > But I've run
    > into some trouble doing the actual sharing part. The shmarray class
    > CANNOT be pickled.


    Please never just *say* that something doesn't work. Show us what you tried, and
    show us exactly what output you got. I assume you tried something like this:

    [Downloads]$ cat runmp.py
    from multiprocessing import Pool
    import shmarray


    def f(z):
        return z.sum()

    y = shmarray.zeros(10)
    z = shmarray.ones(10)

    p = Pool(2)
    print p.map(f, [y, z])


    And got output like this:

    [Downloads]$ python runmp.py
    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/7.0/lib/python2.7/threading.py", line 530, in __bootstrap_inner
        self.run()
      File "/Library/Frameworks/Python.framework/Versions/7.0/lib/python2.7/threading.py", line 483, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/Library/Frameworks/Python.framework/Versions/7.0/lib/python2.7/multiprocessing/pool.py", line 287, in _handle_tasks
        put(task)
    PicklingError: Can't pickle <class 'multiprocessing.sharedctypes.c_double_Array_10'>: attribute lookup multiprocessing.sharedctypes.c_double_Array_10 failed


    Now, the sharedctypes is supposed to implement shared arrays. Underneath, they
    have some dynamically created types like this c_double_Array_10 type.
    multiprocessing has a custom pickler which has a registry of reduction functions
    for types that do not implement a __reduce_ex__() method. For these dynamically
    created types that cannot be imported from a module, this dynamic registry is
    the only way to do it. At least at one point, the Connection objects which
    communicate between processes would use this custom pickler to serialize objects
    to bytes to transmit them.
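
    The standard library's copy_reg module is the plain-pickle analogue of such a registry; a minimal sketch with a made-up class:

    import copy_reg
    import pickle

    class Legacy(object):
        def __init__(self, x, y):
            self.x, self.y = x, y

    def reduce_legacy(obj):
        # a reduction function: (callable, args) used to rebuild the object
        return (Legacy, (obj.x, obj.y))

    copy_reg.pickle(Legacy, reduce_legacy)   # register it for the type

    clone = pickle.loads(pickle.dumps(Legacy(1, 2)))
    print clone.x, clone.y                   # -> 1 2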

    However, at least in Python 2.7, multiprocessing seems to have a C extension
    module defining the Connection objects. Unfortunately, it looks like this C
    extension just imports the regular pickler that is not aware of these custom
    types. That's why you get this error. I believe this is a bug in Python.

    So what did you try, and what output did you get? What version of Python are you
    using?

    > I think that my understanding of multiprocessing
    > needs to evolve beyond the use of Pool, but I'm not sure yet. This
    > post suggests as much.
    >
    > http://mail.scipy.org/pipermail/scipy-user/2009-February/019696.html


    Maybe. If the __reduce_ex__() method is implemented properly (and
    multiprocessing bugs aren't getting in the way), you ought to be able to pass
    them to a Pool just fine. You just need to make sure that the shared arrays are
    allocated before the Pool is started. And this only works on UNIX machines. The
    shared memory objects that shmarray uses can only be inherited. I believe that's
    what Sturla was getting at.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
    Robert Kern, Apr 5, 2011
    #3
  4. Philip Semanchuk, Apr 5, 2011
    #4
  5. Robert Kern

    Robert Kern Guest

    On 4/4/11 7:05 PM, Robert Kern wrote:
    > On 4/4/11 3:20 PM, John Ladasky wrote:


    > However, at least in Python 2.7, multiprocessing seems to have a C extension
    > module defining the Connection objects. Unfortunately, it looks like this C
    > extension just imports the regular pickler that is not aware of these custom
    > types. That's why you get this error. I believe this is a bug in Python.
    >
    > So what did you try, and what output did you get? What version of Python are you
    > using?
    >
    >> I think that my understanding of multiprocessing
    >> needs to evolve beyond the use of Pool, but I'm not sure yet. This
    >> post suggests as much.
    >>
    >> http://mail.scipy.org/pipermail/scipy-user/2009-February/019696.html

    >
    > Maybe. If the __reduce_ex__() method is implemented properly (and
    > multiprocessing bugs aren't getting in the way), you ought to be able to pass
    > them to a Pool just fine. You just need to make sure that the shared arrays are
    > allocated before the Pool is started. And this only works on UNIX machines. The
    > shared memory objects that shmarray uses can only be inherited. I believe that's
    > what Sturla was getting at.


    I'll take that back a little bit. Since the underlying shared memory types can
    only be shared by inheritance, the ForkingPickler is only used for the arguments
    passed to Process(), and not used for things sent through Queues (which Pool
    uses). Since Pool cannot guarantee that the data exists before the Pool starts
    its subprocesses, it must use the general mechanism.

    So in short, if you pass the shmarrays as arguments to the target function in
    Process(), it should work fine.
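
    A minimal sketch of that arrangement (assuming the shmarray module linked above, on a Unix machine so the shared buffers are inherited across the fork):

    from multiprocessing import Process
    import shmarray

    def evaluate(weights, inputs):
        # both arrays live in shared memory inherited from the parent
        print (weights * inputs).sum()

    if __name__ == '__main__':
        weights = shmarray.ones(10)      # allocate *before* forking
        inputs = shmarray.zeros(10)
        p = Process(target=evaluate, args=(weights, inputs))
        p.start()
        p.join()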

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
    Robert Kern, Apr 5, 2011
    #5
  6. John Ladasky

    John Ladasky Guest

    Hi Philip,

    Thanks for the reply.

    On Apr 4, 4:34 pm, Philip Semanchuk <> wrote:
    > So if you're going to use multiprocessing, you're going to use pickle, and you
    > need pickleable objects.


    OK, that's good to know.

    > > Pickling is still a pretty
    > > vague process to me, but I can see that you have to write custom
    > > __reduce__ and __setstate__ methods for your objects.

    >
    > Well, that's only if one's objects don't support pickle by default. A lot of
    > classes do without any need for custom __reduce__ and __setstate__ methods.
    > Since you're apparently not too familiar with pickle, I don't want you to get
    > the false impression that it's a lot of trouble. I've used pickle a number of
    > times and never had to write custom methods for it.


    I used __reduce__ and __setstate__ once before with success, but I
    just hacked at code I found on the Net, without fully understanding
    it.

    All right, since my last post, I've made an attempt to understand a
    bit more about pickle. In doing so, I'm getting into the guts of the
    Python class model, in a way that I did not expect. I believe that I
    need to understand the following:

    1) What kinds of objects have a __dict__,

    2) Why the default pickling methods do not simply look for a __dict__
    and serialize its contents, if it is present, and

    3) Why objects can exist that do not support pickle by default, and
    cannot be correctly pickled without custom __reduce__ and __setstate__
    methods.

    I can only furnish a (possibly incomplete) answer to question #1.
    User subclasses apparently always have a __dict__. Here:

    >>> class Foo:
    ...     pass  # It doesn't get any simpler
    ...
    >>> x = Foo()
    >>> x
    <__builtin__.Foo instance at 0x9b916cc>
    >>> x.__dict__
    {}
    >>> x.spam = "eggs"
    >>> x.__dict__
    {'spam': 'eggs'}
    >>> setattr(x, "parrot", "dead")
    >>> dir(x)
    ['__doc__', '__module__', 'parrot', 'spam']
    >>> x.__dict__
    {'parrot': 'dead', 'spam': 'eggs'}
    >>> x.__dict__ = {"a":"b", "c":"d"}
    >>> dir(x)
    ['__doc__', '__module__', 'a', 'c']
    >>> x.__dict__
    {'a': 'b', 'c': 'd'}

    Aside: this finding makes me wonder what exactly dir() is showing me
    about an object (all objects in the namespace, including methods, but
    not __dict__ itself?), and why it is distinct from __dict__.

    In contrast, it appears that BUILT-IN classes (numbers, lists,
    dictionaries, etc.) do not have a __dict__. And that you can't force
    a __dict__ into built-in objects, no matter how you may try.

    >>> n = 1
    >>> dir(n)
    ['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__',
     '__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__',
     '__format__', '__getattribute__', '__getnewargs__', '__hash__', '__hex__',
     '__index__', '__init__', '__int__', '__invert__', '__long__', '__lshift__',
     '__mod__', '__mul__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__',
     '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__',
     '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__',
     '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__',
     '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__',
     '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__',
     'conjugate', 'denominator', 'imag', 'numerator', 'real']
    >>> n.__dict__
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
    AttributeError: 'int' object has no attribute '__dict__'
    >>> n.__dict__ = {"a":"b", "c":"d"}
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
    AttributeError: 'int' object has no attribute '__dict__'
    >>> setattr(n, "__dict__", {"a":"b", "c":"d"})
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
    AttributeError: 'int' object has no attribute '__dict__'


    This leads straight into my second question. I THINK, without knowing
    for sure, that most user classes would pickle correctly by simply
    iterating through __dict__. So, why isn't this the default behavior
    for Python? Was the assumption that programmers would only want to
    pickle built-in classes? Can anyone show me an example of a common
    object where my suggested approach would fail?

    > What might be easier than fooling around with boxes of sharp knives is to convert
    > your ndarray objects


    .... with their attendant meta-data, of course...

    > to Python lists. Lists are pickle-friendly and easy to turn
    > back into ndarray objects once they've crossed the pickle boundary.


    That's a cute trick. I may try it, but I would rather do it the way
    that Python recommends. I always prefer to understand what I'm doing,
    and why.

    > Python will only get better at working with multiple cores/CPUs, but there's plenty of room for improvement on the status quo.


    Thank you for agreeing with me. I'm not formally trained in computer
    science, and there was always the chance that my opinion as a non-
    professional might come from simply not understanding the issues well
    enough. But I chose Python as a programming language (after Applesoft
    Basic, 6502 assembler, Pascal, and C -- yeah, I've been at this a
    while) because of its ease of entry. And while Python has carried me
    far, I don't think that an issue that is as central to modern
    computing as using multiple CPU's concurrently is so arcane that we
    should declare it black magic -- and therefore, it's OK if Python's
    philosophical goal of achieving clarity and simplicity is not met in
    this case.
    John Ladasky, Apr 5, 2011
    #6
  7. On Apr 5, 2011, at 12:58 PM, John Ladasky wrote:

    > Hi Philip,
    >
    > Thanks for the reply.
    >
    > On Apr 4, 4:34 pm, Philip Semanchuk <> wrote:
    >> So if you're going to use multiprocessing, you're going to use pickle, and you
    >> need pickleable objects.

    >
    > OK, that's good to know.


    But as Dan Stromberg pointed out, there are some pickle-free ways to communicate between processes using multiprocessing.

    > This leads straight into my second question. I THINK, without knowing
    > for sure, that most user classes would pickle correctly by simply
    > iterating through __dict__. So, why isn't this the default behavior
    > for Python? Was the assumption that programmers would only want to
    > pickle built-in classes?


    One can pickle user-defined classes:

    >>> class Foo(object):
    ...     pass
    ...
    >>> import pickle
    >>> foo_instance = Foo()
    >>> pickle.dumps(foo_instance)
    'ccopy_reg\n_reconstructor\np0\n(c__main__\nFoo\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n.'


    And as Robert Kern pointed out, numpy arrays are also pickle-able.

    >>> import numpy
    >>> pickle.dumps(numpy.zeros(3))
    "cnumpy.core.multiarray\n_reconstruct\np0\n(cnumpy\nndarray\np1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I3\ntp6\ncnumpy\ndtype\np7\n(S'f8'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\np13\ntp14\nb."

    As a side note, you should always use "new style" classes, particularly since you're exploring the details of Python class construction. "New" is a bit of a misnomer now, as "new" style classes were introduced in Python 2.2. They have been the status quo in Python 2.x for a while now and are the only choice in Python 3.x.

    Subclassing object gives you a new style class:
    class Foo(object):

    Not subclassing object (as you did in your example) gives you an old style class:
    class Foo:
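
    The difference is easy to see at a Python 2 prompt:

    >>> class Old: pass
    ...
    >>> class New(object): pass
    ...
    >>> type(Old()), type(New())
    (<type 'instance'>, <class '__main__.New'>)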



    Cheers
    Philip
    Philip Semanchuk, Apr 5, 2011
    #7
  8. John Ladasky

    John Ladasky Guest

    Hello again, Philip,

    I really appreciate you sticking with me. Hopefully this will help
    someone else, too. I've done some more reading, and will offer some
    minimal code below.

    I've known about this page for a while, and it describes some of the
    unconventional things one needs to consider when subclassing
    numpy.ndarray:

    http://www.scipy.org/Subclasses

    Now, section 11.1 of the Python documentation says the following
    concerning pickling: "Classes can further influence how their
    instances are pickled; if the class defines the method __getstate__(),
    it is called and the return state is pickled as the contents for the
    instance, instead of the contents of the instance’s dictionary. If
    there is no __getstate__() method, the instance’s __dict__ is
    pickled."

    http://docs.python.org/release/2.6.6/library/pickle.html

    That being said, I'm having problems there! I've found this minimal
    example of pickling with overridden __getstate__ and __setstate__
    methods:

    http://stackoverflow.com/questions/...le-example-about-setstate-and-getstate-thanks

    I'll put all of the thoughts generated by these links together after
    responding to a few things you wrote.

    On Apr 5, 10:43 am, Philip Semanchuk <> wrote:
    > But as Dan Stromberg pointed out, there are some pickle-free ways to communicate between processes using multiprocessing.


    I only see your reply to Dan Stromberg in this thread, but not Dan
    Stromberg's original post. I am reading this through Google Groups.
    Perhaps Dan's post failed to make it through a gateway for some
    reason?

    > As a side note, you should always use "new style" classes, particularly since you're exploring the details of Python class construction. "New" is a bit of a misnomer now, as "new" style classes were introduced in Python 2.2. They have been the status quo in Python 2.x for a while now and are the only choice in Python 3.x.


    Sorry, that was an oversight on my part. Normally I do remember that,
    but I've been doing a lot of subclassing rather than defining top-
    level classes, and therefore it slipped my mind.

    > One can pickle user-defined classes:


    OK, I got that working, I'll show an example farther down. (I never
    tried to pickle a minimal class until now, I went straight for a hard
    one.)

    > And as Robert Kern pointed out, numpy arrays are also pickle-able.


    OK, but SUBCLASSES of numpy.ndarray are not, in my hands, pickling as
    I would expect. I already have lots of code that is based on such
    subclasses, and they do everything I want EXCEPT pickle correctly. I
    may have to direct this line of questions to the numpy discussion
    group after all.

    This is going to be a longer exposition. So let me thank you in
    advance for your patience, and ask you to get comfortable.


    ========== begin code ==========

    from numpy import array, ndarray
    from pickle import dumps, loads

    ##===

    class Foo(object):

        def __init__(self, contents):
            self.contents = contents

        def __str__(self):
            return str(type(self)) + "\n" + str(self.__dict__)

    ##===

    class SuperFoo(Foo):

        def __getstate__(self):
            print "__getstate__"
            duplicate = dict(self.__dict__)
            duplicate["bonus"] = "added during pickling"
            return duplicate

        def __setstate__(self, d):
            print "__setstate__"
            self.__dict__ = d
            self.finale = "added during unpickling"

    ##===

    class ArraySubclass(ndarray):
        """
        See http://www.scipy.org/Subclasses, this class is very similar.
        """
        def __new__(subclass, data, extra=None, dtype=None, copy=False):
            print " __new__"
            arr = array(data, dtype=dtype, copy=copy)
            arr = arr.view(subclass)
            if extra is not None:
                arr.extra = extra
            elif hasattr(data, "extra"):
                arr.extra = data.extra
            return arr

        def __array_finalize__(self, other):
            print " __array_finalize__"
            self.__dict__ = getattr(other, "__dict__", {})

        def __str__(self):
            return str(type(self)) + "\n" + ndarray.__str__(self) + \
                "\n__dict__ : " + str(self.__dict__)

    ##===

    class PicklableArraySubclass(ArraySubclass):

        def __getstate__(self):
            print "__getstate__"
            return self.__dict__

        def __setstate__(self, d):
            print "__setstate__"
            self.__dict__ = d

    ##===

    print "\n\n** Create a Foo object, then create a copy via pickling."
    original = Foo("pickle me please")
    print original
    clone = loads(dumps(original))
    print clone

    print "\n\n** Create a SuperFoo object, just to watch __setstate__ and __getstate__."
    original = SuperFoo("pickle me too, please")
    print original
    clone = loads(dumps(original))
    print clone

    print "\n\n** Create a numpy ndarray, then create a copy via pickling."
    original = array(((1,2,3),(4,5,6)))
    print original
    clone = loads(dumps(original))
    print clone

    print "\n\n** Create an ArraySubclass object, with meta-data..."
    original = ArraySubclass(((9,8,7),(6,5,4)), extra="pickle me PLEASE!")
    print original
    print "\n...now attempt to create a copy via pickling."
    clone = loads(dumps(original))
    print clone

    print "\n\n** That failed, try a PicklableArraySubclass..."
    original = PicklableArraySubclass(((1,2),(3,4)), extra="pickle, dangit!!!")
    print original
    print "\n...now try to create a copy of the PicklableArraySubclass via pickling."
    clone = loads(dumps(original))
    print clone


    ========== end code, begin output ==========


    ** Create a Foo object, then create a copy via pickling.
    <class '__main__.Foo'>
    {'contents': 'pickle me please'}
    <class '__main__.Foo'>
    {'contents': 'pickle me please'}


    ** Create a SuperFoo object, just to watch __setstate__ and
    __getstate__.
    <class '__main__.SuperFoo'>
    {'contents': 'pickle me too, please'}
    __getstate__
    __setstate__
    <class '__main__.SuperFoo'>
    {'bonus': 'added during pickling', 'finale': 'added during
    unpickling', 'contents': 'pickle me too, please'}


    ** Create a numpy ndarray, then create a copy via pickling.
    [[1 2 3]
    [4 5 6]]
    [[1 2 3]
    [4 5 6]]


    ** Create an ArraySubclass object, with meta-data...
    __new__
    __array_finalize__
    __array_finalize__
    __array_finalize__
    <class '__main__.ArraySubclass'>
    [[9 8 7]
    [6 5 4]]
    __dict__ : {'extra': 'pickle me PLEASE!'}

    ...now attempt to create a copy via pickling.
    __array_finalize__
    __array_finalize__
    __array_finalize__
    <class '__main__.ArraySubclass'>
    [[9 8 7]
    [6 5 4]]
    __dict__ : {}


    ** That failed, try a PicklableArraySubclass...
    __new__
    __array_finalize__
    __array_finalize__
    __array_finalize__
    <class '__main__.PicklableArraySubclass'>
    [[1 2]
    [3 4]]
    __dict__ : {'extra': 'pickle, dangit!!!'}

    ...now try to create a copy of the PicklableArraySubclass via
    pickling.
    __array_finalize__
    __setstate__
    Traceback (most recent call last):
      File "minimal ndarray subclass example for posting.py", line 109, in <module>
        clone = loads(dumps(original))
      File "/usr/lib/python2.6/pickle.py", line 1374, in loads
        return Unpickler(file).load()
      File "/usr/lib/python2.6/pickle.py", line 858, in load
        dispatch[key](self)
      File "/usr/lib/python2.6/pickle.py", line 1217, in load_build
        setstate(state)
      File "minimal ndarray subclass example for posting.py", line 75, in __setstate__
        self.__dict__ = d
    TypeError: __dict__ must be set to a dictionary, not a 'tuple'
    >Exit code: 1


    ========== end output ==========

    Observations:

    My first three examples work fine. The second example, my SuperFoo
    class, shows that I can intercept pickling attempts and add data to
    __dict__.

    Problems appear in the fourth example, when I try to pickle an
    ArraySubclass object. It gets the array itself, but __dict__ is not
    copied, despite the fact that that is supposed to be Python's default
    behavior, and my first example did indeed show this default behavior.

    So in my fifth and final example, the PicklableArraySubclass object,
    I tried to override __getstate__ and __setstate__ just as I did with
    SuperFoo. Here's what I see: 1) __getstate__ is NEVER called. 2)
    __setstate__ is called, but it's expecting a dictionary as an
    argument, and that dictionary is absent.

    What's up with that?
    John Ladasky, Apr 7, 2011
    #8
  9. John Ladasky

    John Ladasky Guest

    Following up to my own post...

    On Apr 6, 11:40 pm, John Ladasky <> wrote:

    > What's up with that?


    Apparently, "what's up" is that I will need to implement a third
    method in my ndarray subclass -- namely, __reduce__.

    http://www.mail-archive.com//msg02446.html

    I'm burned out for tonight, I'll attempt to grasp what __reduce__ does
    tomorrow.

    Again, I'm going to point out that, given the extent that
    multiprocessing depends upon pickling, pickling should be made
    easier. This is Python, for goodness' sake! I'm still surprised at
    the hoops I've got to jump through.
    John Ladasky, Apr 7, 2011
    #9
  10. On Apr 7, 2011, at 3:41 AM, John Ladasky wrote:

    > Following up to my own post...
    >
    > On Apr 6, 11:40 pm, John Ladasky <> wrote:
    >
    >> What's up with that?

    >
    > Apparently, "what's up" is that I will need to implement a third
    > method in my ndarray subclass -- namely, __reduce__.
    >
    > http://www.mail-archive.com//msg02446.html
    >
    > I'm burned out for tonight, I'll attempt to grasp what __reduce__ does
    > tomorrow.
    >
    > Again, I'm going to point out that, given the extent that
    > multiprocessing depends upon pickling, pickling should be made
    > easier. This is Python, for goodness' sake! I'm still surprised at
    > the hoops I've got to jump through.


    Hi John,
    My own experience has been that when I reach a surprising level of hoop jumping, it usually means there's an easier path somewhere else that I'm neglecting.

    But if pickling subclasses of numpy.ndarray objects is what you really feel you need to do, then yes, I think asking on the numpy list is the best idea.


    Good luck
    Philip
    Philip Semanchuk, Apr 7, 2011
    #10
  11. Robert Kern

    Robert Kern Guest

    On 4/7/11 1:40 AM, John Ladasky wrote:

    > On Apr 5, 10:43 am, Philip Semanchuk<> wrote:


    >> And as Robert Kern pointed out, numpy arrays are also pickle-able.

    >
    > OK, but SUBCLASSES of numpy.ndarray are not, in my hands, pickling as
    > I would expect. I already have lots of code that is based on such
    > subclasses, and they do everything I want EXCEPT pickle correctly. I
    > may have to direct this line of questions to the numpy discussion
    > group after all.


    Define the __reduce_ex__() method, not __getstate__(), __setstate__().

    http://docs.python.org/release/2.6.6/library/pickle.html#pickling-and-unpickling-extension-types

    ndarrays are extension types, so they use that mechanism.
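
    A sketch of the usual recipe (the class name is made up; it overrides __reduce__, which object.__reduce_ex__ falls back to, and appends the subclass's state to ndarray's own reduce tuple, much as numpy's MaskedArray does):

    import pickle
    import numpy as np

    class MetaArray(np.ndarray):
        def __new__(cls, data, extra=None):
            obj = np.asarray(data).view(cls)
            obj.extra = extra
            return obj

        def __array_finalize__(self, obj):
            self.extra = getattr(obj, 'extra', None)

        def __reduce__(self):
            # extend ndarray's (constructor, args, state) tuple with our own state
            constructor, args, state = super(MetaArray, self).__reduce__()
            return (constructor, args, state + (self.extra,))

        def __setstate__(self, state):
            self.extra = state[-1]                              # our part
            super(MetaArray, self).__setstate__(state[:-1])     # ndarray's part

    clone = pickle.loads(pickle.dumps(MetaArray([1, 2, 3], extra="meta-data")))
    print clone, clone.extra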

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
    Robert Kern, Apr 7, 2011
    #11
  12. John Ladasky

    John Ladasky Guest

    On Apr 7, 10:44 am, Robert Kern <> wrote:
    > On 4/7/11 1:40 AM, John Ladasky wrote:
    >
    > > On Apr 5, 10:43 am, Philip Semanchuk<>  wrote:
    > >> And as Robert Kern pointed out, numpy arrays are also pickle-able.

    >
    > > OK, but SUBCLASSES of numpy.ndarray are not, in my hands, pickling as
    > > I would expect.  I already have lots of code that is based on such
    > > subclasses, and they do everything I want EXCEPT pickle correctly.  I
    > > may have to direct this line of questions to the numpy discussion
    > > group after all.

    >
    > Define the __reduce_ex__() method, not __getstate__(), __setstate__().
    >
    > http://docs.python.org/release/2.6.6/library/pickle.html#pickling-and...
    >
    > ndarrays are extension types, so they use that mechanism.


    Thanks, Robert, as you can see, I got on that track shortly after I
    posted my code example. This is apparently NOT a numpy issue, it's an
    issue for pickling all C extension types.

    Is there a way to examine a Python object, and determine whether it's
    a C extension type or not? Or do you have to deduce that from the
    documentation and/or the source code?

    I started hunting through the numpy source code last night. It's
    complicated. I haven't found the ndarray object definition yet.
    Perhaps because I was looking through .py files, when I actually
    should have been looking through .c files?
    John Ladasky, Apr 7, 2011
    #12
  13. Robert Kern

    Robert Kern Guest

    On 4/7/11 1:39 PM, John Ladasky wrote:
    > On Apr 7, 10:44 am, Robert Kern<> wrote:
    >> On 4/7/11 1:40 AM, John Ladasky wrote:
    >>
    >>> On Apr 5, 10:43 am, Philip Semanchuk<> wrote:
    >>>> And as Robert Kern pointed out, numpy arrays are also pickle-able.

    >>
    >>> OK, but SUBCLASSES of numpy.ndarray are not, in my hands, pickling as
    >>> I would expect. I already have lots of code that is based on such
    >>> subclasses, and they do everything I want EXCEPT pickle correctly. I
    >>> may have to direct this line of questions to the numpy discussion
    >>> group after all.

    >>
    >> Define the __reduce_ex__() method, not __getstate__(), __setstate__().
    >>
    >> http://docs.python.org/release/2.6.6/library/pickle.html#pickling-and...
    >>
    >> ndarrays are extension types, so they use that mechanism.

    >
    > Thanks, Robert, as you can see, I got on that track shortly after I
    > posted my code example. This is apparently NOT a numpy issue, it's an
    > issue for pickling all C extension types.


    Yes, but seriously, you should ask on the numpy mailing list. You will probably
    run into more numpy-specific issues. At least, we'd have been able to tell you
    things like "ndarray is an extension type, so look at that part of the
    documentation" quicker.

    > Is there a way to examine a Python object, and determine whether it's
    > a C extension type or not?


    For sure? No, not really. Not at the Python level, at least. You may be able to
    do something at the C level, I don't know.

    > Or do you have to deduce that from the
    > documentation and/or the source code?
    >
    > I started hunting through the numpy source code last night. It's
    > complicated. I haven't found the ndarray object definition yet.
    > Perhaps because I was looking through .py files, when I actually
    > should have been looking through .c files?


    Yes. The implementation for __reduce__ is in numpy/core/src/multiarray/methods.c
    as array_reduce(). You may want to look in numpy/ma/core.py for the definition
    of MaskedArray. It shows how you would define __reduce__, __getstate__ and
    __setstate__ for a subclass of ndarray.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
    Robert Kern, Apr 7, 2011
    #13
  14. sturlamolden

    sturlamolden Guest

    On 5 apr, 02:05, Robert Kern <> wrote:

    > PicklingError: Can't pickle <class
    > 'multiprocessing.sharedctypes.c_double_Array_10'>: attribute lookup
    > multiprocessing.sharedctypes.c_double_Array_10 failed


    Hehe :D

    That is why programmers should not mess with code they don't
    understand!

    Gaël and I wrote shmem to avoid multiprocessing.sharedctypes, because
    they cannot be pickled (they are shared by handle inheritance)! To do
    this we used raw Windows API and Unix System V IPC instead of
    multiprocessing.Array, and the buffer is pickled by giving it a name
    in the file system. Please be informed that the code on bitbucket has
    been "fixed" by someone who doesn't understand my code. "If it ain't
    broke don't fix it."

    http://folk.uio.no/sturlamo/python/sharedmem-feb13-2009.zip

    Known issues/bugs: 64-bit support is lacking, and os._exit in
    multiprocessing causes a memory leak on Linux.

    > Maybe. If the __reduce_ex__() method is implemented properly (and
    > multiprocessing bugs aren't getting in the way), you ought to be able to pass
    > them to a Pool just fine. You just need to make sure that the shared arrays are
    > allocated before the Pool is started. And this only works on UNIX machines. The
    > shared memory objects that shmarray uses can only be inherited. I believethat's
    > what Sturla was getting at.


    It's a C extension that gives a buffer to NumPy. Then YOU changed how
    NumPy pickles arrays referencing these buffers, using
    pickle.copy_reg :)

    Sturla
    sturlamolden, Apr 8, 2011
    #14
  15. sturlamolden

    sturlamolden Guest

    On 4 apr, 22:20, John Ladasky <> wrote:

    > https://bitbucket.org/cleemesser/numpy-sharedmem/src/3fa526d11578/shm...
    >
    > I've added a few lines to this code which allows subclassing the
    > shared memory array, which I need (because my neural net objects are
    > more than just the array, they also contain meta-data).  But I've run
    > into some trouble doing the actual sharing part.  The shmarray class
    > CANNOT be pickled.


    That is hilarious :)

    I see that the bitbucket page has my and Gaël's names on it, but the
    code is changed and broken beyond repair! I don't want to be
    associated with that crap!

    Their "shmarray.py" will not work -- ever. It fails in two ways:

    1. multiprocessing.Array cannot be pickled (as you noticed). It is
    shared by handle inheritance. Thus we (that is Gaël and I) made a
    shmem buffer object that could be pickled by giving it a name in the
    file system, instead of sharing it anonymously by inheriting the
    handle. Obviously those behind the bitbucket page don't understand the
    difference between named and anonymous shared memory (that is, System
    V IPC and BSD mmap, respectively.)

    2. By subclassing numpy.ndarray a pickle dump would encode a copy of
    the buffer. But that is what we want to avoid! We want to share the
    buffer itself, not make a copy of it! So we changed how numpy pickles
    arrays pointing to shared memory, instead of subclassing ndarray. I
    did that by slightly modifying some code written by Robert Kern.

    http://folk.uio.no/sturlamo/python/sharedmem-feb13-2009.zip
    Known issues/bugs: 64-bit support is lacking, and os._exit in
    multiprocessing causes a memory leak on Linux.


    Sturla
    sturlamolden, Apr 8, 2011
    #15
  16. sturlamolden

    sturlamolden Guest

    On 8 apr, 02:03, sturlamolden <> wrote:

    > http://folk.uio.no/sturlamo/python/sharedmem-feb13-2009.zip
    > Known issues/bugs: 64-bit support is lacking, and os._exit in
    > multiprocessing causes a memory leak on Linux.


    I should probably fix it for 64-bit now. Just recompiling with 64-bit
    integers will not work, because I intentionally hardcoded the higher
    32 bits to 0. It doesn't help to represent the lower 32 bits with a 64
    bit integer (which it seems someone actually has tried :-D) The
    memory leak on Linux is pesky. os._exit prevents clean-up code from
    executing, but unlike Windows the Linux kernel does no reference
    counting. I am worried we actually need to make a small kernel driver
    to make this work properly for Linux, since os._exit makes the current
    user-space reference counting fail on child process exit.


    Sturla
    sturlamolden, Apr 8, 2011
    #16
  17. sturlamolden

    sturlamolden Guest

    On 8 apr, 02:38, sturlamolden <> wrote:

    > I should probably fix it for 64-bit now. Just recompiling with 64-bit
    > integers will not work, because I intentionally hardcoded the higher
    > 32 bits to 0.


    That was easy, 64-bit support for Windows is done :)

    Now I'll just have to fix the Linux code, and figure out what to do
    with os._exit preventing clean-up on exit... :-(

    Sturla
    sturlamolden, Apr 8, 2011
    #17
  18. John Ladasky

    John Ladasky Guest

    On Apr 7, 6:10 pm, sturlamolden <> wrote:
    > On 8 apr, 02:38, sturlamolden <> wrote:
    >
    > > I should probably fix it for 64-bit now. Just recompiling with 64-bit
    > > integers will not work, because I intentionally hardcoded the higher
    > > 32 bits to 0.

    >
    > That was easy, 64-bit support for Windows is done :)
    >
    > Now I'll just have to fix the Linux code, and figure out what to do
    > with os._exit preventing clean-up on exit... :-(
    >
    > Sturla


    Hi Sturla,

    Thanks for finding my discussion! Yes, it's about passing numpy
    arrays to multiple processors. I'll accomplish that any way that I
    can.

    AND thanks to the discussion provided here by Philip and Robert, I've
    become a bit less confused about pickling. I have working code which
    subclasses ndarray, pickles using __reduce_ex__, and unpickles using
    __setstate__. Just seven lines of additional code do what I want.

    I almost understand it, too. :^) Seriously, I'm not altogether sure
    about the purpose and structure of the tuple generated by
    __reduce_ex__. It looks very easy to abuse and break. Referencing
    tuple elements by number seems rather unPythonic. Has Python 3 made
    any improvements to pickle, I wonder?

    At the moment, my arrays are small enough that pickling them should
    not be a very expensive process -- provided that I don't have to keep
    doing it over and over again! I'm returning my attention to the
    multiprocessing module now. Processes created by Pool do not appear
    to persist. They seem to disappear after they are called. So I can't
    call them once with the neural net array, and then once again (or even
    repeatedly) with input data. Perhaps I need to look at Queue.
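
    A minimal sketch of that idea, a long-lived worker process that receives the (pickled) weights once and then only the input batches; the toy arithmetic here just stands in for the real evaluate function:

    from multiprocessing import Process, Queue

    def worker(weights, tasks, results):
        # weights arrive (pickled) once, at startup; the process then persists
        while True:
            inputs = tasks.get()
            if inputs is None:                   # sentinel: shut down
                break
            results.put([w * x for w, x in zip(weights, inputs)])

    if __name__ == '__main__':
        tasks, results = Queue(), Queue()
        p = Process(target=worker, args=([0.5, 1.5, 2.5], tasks, results))
        p.start()
        for batch in ([1, 2, 3], [4, 5, 6]):
            tasks.put(batch)
            print results.get()
        tasks.put(None)
        p.join()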

    I will retain a copy of YOUR shmarray code (not the Bitbucket code)
    for some time in the future. I anticipate that my arrays might get
    really large, and then copying them might not be practical in terms of
    time and memory usage.
    John Ladasky, Apr 9, 2011
    #18
  19. sturlamolden

    sturlamolden Guest

    On 9 apr, 09:36, John Ladasky <> wrote:

    > Thanks for finding my discussion!  Yes, it's about passing numpy
    > arrays to multiple processors.  I'll accomplish that any way that I
    > can.


    My preferred ways of doing this are:

    1. Most cases for parallel processing are covered by libraries, even
    for neural nets. This particularly involves linear algebra solvers and
    FFTs, or calling certain expensive functions (sin, cos, exp) over and
    over again. The solution here is optimised LAPACK and BLAS (Intel MKL,
    AMD ACML, GotoBLAS, ATLAS, Cray libsci), optimised FFTs (FFTW, Intel
    MKL, ACML), and fast vector math libraries (Intel VML, ACML). For
    example, if you want to make multiple calls to the function "exp",
    there is a good chance you want to use a vector math library. Despite
    this, most Python programmers' instinct seems to be to use multiple
    processes with numpy.exp or math.exp, or use multiple threads in C
    with exp from libm (cf. math.h). Why go through this pain when a
    single function call to Intel VML or AMD ACML (acml-vm) will be much
    better? It is common to see scholars argue that "yes but my needs are
    so special that I need to customise everything myself." Usually this
    translates to "I don't know these libraries (not even that they exist)
    and are happy to reinvent the wheel." Thus, if you think you need to
    use manually managed threads or processes for parallel technical
    computing, and even contemplate that the GIL might get in your way,
    there is a 99% chance you are wrong. You will almost ALWAYS want to
    use a fast library, either directly in Python or linked to your own
    serial C or Fortran code. You have probably heard that "premature
    optimisation is the root of all evil in computer programming." It
    particularly applies here. Learn to use the available performance
    libraries, and it does not matter from which language they are called
    (C or Fortran will not be faster than Python!) This is one of the
    major reasons Python can be used for HPC (high-performance computing)
    even though the Python part itself is "slow". Most of these libraries
    are available for free (GotoBLAS, ATLAS, FFTW, ACML), but Intel MKL
    and VML require a license fee.

    Also note that there are comprehensive numerical libraries you can
    use, which can be linked with multi-threaded performance libraries
    under the hood. Of particular interest are the Fortran libraries from
    NAG and IMSL, which are the two gold standards of technical computing.
    Also note that the linear algebra solvers of NumPy and SciPy in
    Enthought Python Distribution are linked with Intel MKL. Enthought's
    license is cheaper than Intel's, and you don't need to build NumPy or
    SciPy against MKL yourself. Using scipy.linalg from EPD is likely to
    cover your need for parallel computing with neural nets.


    2. Use Cython, threading.Thread, and release the GIL. Perhaps we
    should have a cookbook example in scipy.org on this. In the "nogil"
    block you can call a library or do certain things that Cython allows
    without holding the GIL.


    3. Use C, C++ or Fortran with OpenMP, and call these using Cython,
    ctypes or f2py. (I prefer Cython and Fortran 95, but any combination
    will do.)


    4. Use mpi4py with any MPI implementation.


    Note that 1-4 are not mutually exclusive, you can always use a certain
    combination.


    > I will retain a copy of YOUR shmarray code (not the Bitbucket code)
    > for some time in the future.  I anticipate that my arrays might get
    > really large, and then copying them might not be practical in terms of
    > time and memory usage.


    The expensive overhead in passing a NymPy array to
    multiprocessing.Queue is related to pickle/cPickle, not IPC or making
    a copy of the buffer.

    For any NumPy array you can afford to copy in terms of memory, just
    work with copies.
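
    For that ordinary-copy case, the sketch is just plain pickle; nothing is shared and the data simply travels inside the byte string:

    import pickle
    import numpy

    a = numpy.arange(1e6)                    # an ordinary array, cheap enough to copy
    payload = pickle.dumps(a, protocol=2)    # the buffer's contents travel in the pickle
    b = pickle.loads(payload)
    assert (a == b).all() and b is not a     # the receiver gets an independent copy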

    The shared memory arrays I made are only useful for large arrays. They
    are just as expensive to pickle in terms of time, but can be
    inexpensive in terms of memory.

    Also beware that the buffer is not copied to the pickle, so you need
    to call .copy() to pickle the contents of the buffer.

    But again, I'd urge you to consider a library or threads
    (threading.Thread in Cython or OpenMP) before you consider multiple
    processes. The reason I have not updated the sharedmem arrays for two
    years is that I have come to the conclusion that there are better ways
    to do this (particularly vendor-tuned libraries). But since they are
    mostly useful with 64-bit (i.e. large arrays), I'll post an update
    soon.

    If you decide to use a multithreaded solution (or shared memory as
    IPC), beware of "false sharing". If multiple processors write to the
    same cache line (they can be up to 512 bytes depending on hardware), you'll
    create an invisible "GIL" that will kill any scalability. That is
    because dirty cache lines need to be synchronized with RAM. "False
    sharing" is one of the major reasons that "home-brewed" compute-
    intensive code will not scale.

    It is not uncommon to see Java programmers complain about Python's
    GIL, and then they go on to write i/o bound or false shared code. Rest
    assured that multi-threaded Java will not scale better than Python in
    these cases :)


    Regards,
    Sturla
    sturlamolden, Apr 9, 2011
    #19
  20. John Ladasky

    John Ladasky Guest

    On Apr 9, 10:15 am, sturlamolden <> wrote:
    > On 9 apr, 09:36, John Ladasky <> wrote:
    >
    > > Thanks for finding my discussion!  Yes, it's about passing numpy
    > > arrays to multiple processors.  I'll accomplish that any way that I
    > > can.

    >
    > My preferred ways of doing this are:
    >
    > 1. Most cases for parallel processing are covered by libraries, even
    > for neural nets. This particularly involves linear algebra solvers and
    > FFTs, or calling certain expensive functions (sin, cos, exp) over and
    > over again. The solution here is optimised LAPACK and BLAS (Intel MKL,
    > AMD ACML, GotoBLAS, ATLAS, Cray libsci), optimised FFTs (FFTW, Intel
    > MKL, ACML), and fast vector math libraries (Intel VML, ACML). For
    > example, if you want to make multiple calls to the function "exp",
    > there is a good chance you want to use a vector math library. Despite
    > of this, most Python programmers' instinct seems to be to use multiple
    > processes with numpy.exp or math.exp, or use multiple threads in C
    > with exp from libm (cf. math.h). Why go through this pain when a
    > single function call to Intel VML or AMD ACML (acml-vm) will be much
    > better? It is common to see scholars argue that "yes but my needs are
    > so special that I need to customise everything myself." Usually this
    > translates to "I don't know these libraries (not even that they exist)
    > and are happy to reinvent the wheel."  


    Whoa, Sturla. That was a proper core dump!

    You're right, I'm unfamiliar with the VAST array of libraries that you
    have just described. I will have to look at them. It's true, I
    probably only know of the largest and most widely-used Python
    libraries. There are so many, who can keep track?

    Now, I do have a special need. I've implemented a modified version of
    the Fahlman Cascade Correlation algorithm that will not be found in
    any existing libraries, and which I think should be superior for
    certain types of problems. (I might even be able to publish this
    algorithm, if I can get it working and show some examples?)

    That doesn't mean that I can't use the vector math libraries that
    you've recommended. As long as those libraries can take advantage of
    my extra computing power, I'm interested. Note, however, that the
    cascade evaluation does have a strong sequential requirement. It's
    not a traditional three-layer network. In fact, describing a cascade
    network according to the number of "layers" it has is not very
    meaningful, because each hidden node is essentially its own layer.

    So, there are limited advantages to trying to parallelize the
    evaluation of ONE cascade network's weights against ONE input vector.
    However, evaluating several copies of one cascade network's output,
    against several different test inputs simultaneously, should scale up
    nicely. Evaluating many possible test inputs is exactly what you do
    when training a network to a data set, and so this is how my program
    is being designed.

    > Thus, if you think you need to
    > use manually managed threads or processes for parallel technical
    > computing, and even contemplate that the GIL might get in your way,
    > there is a 99% chance you are wrong. You will almost ALWAYS want ot
    > use a fast library, either directly in Python or linked to your own
    > serial C or Fortran code. You have probably heard that "premature
    > optimisation is the root of all evil in computer programming." It
    > particularly applies here.


    Well, I thought that NUMPY was that fast library...

    Funny how this works, though -- I built my neural net class in Python,
    rather than avoiding numpy and going straight to wrapping code in C,
    precisely because I wanted to AVOID premature optimization (for
    unknown, and questionable gains in performance). I started on this
    project when I had only a single-core CPU, though. Now that multi-
    core CPU's are apparently here to stay, and I've seen just how long my
    program takes to run, I want to make full use of multiple cores. I've
    even looked at MPI. I'm considering networking to another multi-CPU
    machine down the hall, once I have my program working.

    > But again, I'd urge you to consider a library or threads
    > (threading.Thread in Cython or OpenMP) before you consider multiple
    > processes.


    My single-CPU neural net training program had two threads, one for the
    GUI and one for the neural network computations. Correct me if I'm
    wrong here, but -- since the two threads share a single Python
    interpreter, this means that only a single CPU is used, right? I'm
    looking at multiprocessing for this reason.

    > The reason I have not updated the sharedmem arrays for two
    > years is that I have come to the conclusion that there are better ways
    > to do this (paricularly vendor tuned libraries). But since they are
    > mostly useful with 64-bit (i.e. large arrays), I'll post an update
    > soon.
    >
    > If you decide to use a multithreaded solution (or shared memory as
    > IPC), beware of "false sharing". If multiple processors write to the
    > same cache line (they can be up to 512 bytes depending on hardware), you'll
    > create an invisible "GIL" that will kill any scalability. That is
    > because dirty cache lines need to be synchronized with RAM. "False
    > sharing" is one of the major reasons that "home-brewed" compute-
    > intensive code will not scale.


    Even though I'm not formally trained in computer science, I am very
    conscious of the fact that WRITING to shared memory is a problem,
    cache or otherwise. At the very top of this thread, I pointed out
    that my neural network training function would need READ-ONLY access
    to two items -- the network weights, and the input data. Given that,
    and my (temporary) struggles with pickling, I considered the shared-
    memory approach as an alternative.

    > It is not uncommon to see Java programmers complain about Python's
    > GIL, and then they go on to write i/o bound or false shared code. Rest
    > assured that multi-threaded Java will not scale better than Python in
    > these cases :)


    I've never been a Java programmer, and I hope it stays that way!
    John Ladasky, Apr 9, 2011
    #20
