problem with simple multiprocessing script on OS X

Darren Dale

The following script runs without problems on Ubuntu and Windows 7.
h5py is a package wrapping the hdf5 library (http://code.google.com/p/h5py/):

from multiprocessing import Pool
import h5py

def update(i):
    print i

def f(i):
    "hello foo"
    return i*i

if __name__ == '__main__':
    pool = Pool()
    for i in range(10):
        pool.apply_async(f, [i], callback=update)
    pool.close()
    pool.join()


On OS X 10.6 (tested using python-2.6.5 from MacPorts), I have to
comment out the as-yet unused h5py import, otherwise I get a
traceback:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/pool.py", line 226, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed


I've searched for that pickle error and found some references to
pickling a lambda, but I don't think that is the issue. There are no
lambdas in the h5py module, and the script runs fine on Windows and
Linux. I need access to both multiprocessing and h5py objects in the
same module, so I can register a callback that saves the results to
an hdf5 file.

Are there any suggestions as to what could be the problem, or
suggestions on how I can track it down?

Thanks,
Darren
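
For background on the lambda references mentioned above: pickle stores a function by reference (its module and name), so anything that cannot be looked up by name fails at dump time. The snippet below is only an illustrative sketch, written for Python 3, where the error text differs from the Python 2.6 traceback in this post. `json.dumps` is just a convenient stand-in for any importable module-level function.

```python
import json
import pickle

# A module-level function is pickled by reference: the payload records the
# module and attribute name, not the function's code.
payload = pickle.dumps(json.dumps)
restored = pickle.loads(payload)

# A lambda has no importable name, so the lookup by name fails at dump time.
try:
    pickle.dumps(lambda x: x * x)
    lambda_error = None
except (pickle.PicklingError, AttributeError) as exc:
    lambda_error = exc

print(restored is json.dumps)
print(type(lambda_error).__name__)
```

multiprocessing pickles each task it sends to a worker, so the same by-name lookup applies to the function passed to `apply_async`.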
 
Darren Dale

This is a really critical bug for me, but I'm not sure how to proceed.
Can I file a bug report on the python bugtracker if the only code I
can come up with to illustrate the problem requires a lame import of a
third party module?
 
Thomas Jollans

What on earth is h5py doing there? If what you're telling us is actually
happening, and the code works 1:1 on Linux and Windows, but fails on OSX, and
you're using the same versions of h5py and Python, then the h5py
initialization code is not only enticing multiprocessing to try to pickle
something other than usual, but it is also doing that due to some platform-
dependent witchcraft, and I doubt there's very much separating the OSX
versions from the Linux versions of anything involved.

I doubt this is an issue with Python. File a bug on the h5py tracker and see
what they say. The people there might at least have some vague inkling of what
may be going on.
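
One way to check what an import pulls in as a side effect is to compare `sys.modules` before and after it. This is a rough sketch rather than anything posted in the thread: `modules_added_by_import` is a hypothetical helper, `colorsys` is only a stand-in for the package under suspicion, and `importlib.import_module` needs Python 2.7 or later.

```python
import importlib
import sys


def modules_added_by_import(name):
    """Import `name` and return the set of module names it newly loaded."""
    before = set(sys.modules)
    importlib.import_module(name)
    return set(sys.modules) - before


if __name__ == "__main__":
    # "colorsys" is only a stand-in; substitute the package under suspicion.
    added = modules_added_by_import("colorsys")
    print(sorted(added))
```

An unexpected entry in the result (an interactive shell package, for instance) would be a hint about where to look next.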
 
Benjamin Kaplan

It's working fine for me, OS X 10.6.4, Python 2.6 and h5py from Macports.
 
Darren Dale

Really? With the h5py import uncommented? I just uninstalled and
reinstalled my entire macports python26/py26-numpy/hdf5-18/py26-h5py
stack, and I still see the same error.
 
Darren Dale

I can't find anything in the source to suggest that h5py is doing any
platform-specific magic. Do you have an idea of how it would be
possible for initialization code to cause multiprocessing to try to
pickle something it normally would not?

Thanks for the suggestion. I was in touch with the h5py maintainer
before my original post. We don't have any leads.
 
Darren Dale

Thomas, your analysis was spot on.

About a year ago, I contributed a patch to h5py which checks to see if
h5py is being imported into an active IPython session. If so, then a
custom tab completer is loaded to make it easier to navigate hdf5
files. In the development version of IPython, a function that used to
return None if there was no instance of an IPython interactive shell
now creates and returns a new instance. This was the cause of the
error I was reporting. If one were to install ipython from the master
branch at github or from http://ipython.scipy.org/dist/testing/ipython-dev-nightly.tgz,
then the following script will reproduce the problem. I'm not sure why
this causes an error, but I'll discuss it with the IPython devs.

Thank you Thomas and Benjamin for helping me understand the problem.

Darren


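from multiprocessing import Pool

import IPython.core.ipapi as ip

# In the IPython development version, get() creates a new shell
# instance when none exists instead of returning None.
ip.get()

def update(i):
    print i

def f(i):
    return i*i

if __name__ == '__main__':
    pool = Pool()
    for i in range(10):
        pool.apply_async(f, [i], callback=update)
    pool.close()
    pool.join()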
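
A guard along the following lines might avoid the side effect described in this post. It is only a sketch under stated assumptions: `active_ipython_shell` is a hypothetical helper, not h5py's actual code, and it relies on the `IPython.get_ipython()` function found in modern IPython releases, which returns None when no shell is running. That is not the `IPython.core.ipapi` interface of the development version discussed here.

```python
import sys


def active_ipython_shell():
    """Return the running IPython shell, or None, without creating one."""
    # Only look at IPython if something else has already imported it;
    # importing it here just to ask the question is what we want to avoid.
    ipython = sys.modules.get("IPython")
    if ipython is None:
        return None
    # Assumption: modern IPython exposes get_ipython() at package level.
    get_ipython = getattr(ipython, "get_ipython", None)
    if get_ipython is None:
        return None
    return get_ipython()


if __name__ == "__main__":
    print(active_ipython_shell())
```

A library could then register its tab completer only when this returns something other than None, so a plain `python script.py` run never touches IPython.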
 
Aahz

Hoist on your own petard, eh? ;-) Thanks for reporting the solution.
 
