Using Pool map with a method of a class and a list

Luca Cerone

Hi guys,
I would like to apply the Pool.map method to a member of a class.

Here is a small example that shows what I would like to do:

from multiprocessing import Pool

class A(object):
    def __init__(self, x):
        self.value = x
    def fun(self, x):
        return self.value**x


l = range(10)

p = Pool(4)

op = p.map(A.fun, l)

# using this with the normal map doesn't cause any problem

This fails because it says that the methods can't be pickled.
(I assume it has something to do with the note in the documentation: "functionality within this package requires that the __main__ module be importable by the children.", which is obscure to me).

I would like to understand two things: why does my code fail (and when can I expect it to fail), and what is a possible workaround?

Thanks a lot in advance to everybody for the help!

Cheers,
Luca
 
Chris Angelico

from multiprocessing import Pool

class A(object):
    def __init__(self, x):
        self.value = x
    def fun(self, x):
        return self.value**x


l = range(10)

p = Pool(4)

op = p.map(A.fun, l)

Do you ever instantiate any A() objects? You're attempting to call an
unbound method without passing it a 'self'.

You may find the results completely different in Python 2 vs Python 3,
and between bound and unbound methods. In Python 3, an unbound method
is simply a function. In both versions, a bound method carries its
first argument around, so it has to be something different. Play
around with it a bit.
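
To make the bound/unbound distinction concrete, here is a minimal sketch written for Python 3 (the thread itself targets 2.7, where `A.fun` would instead be an unbound method object). The variable names `a` and `bound` are only illustrative.

```python
import types

class A(object):
    def __init__(self, x):
        self.value = x

    def fun(self, x):
        return self.value ** x

a = A(3)
bound = a.fun

# A bound method remembers the instance it was looked up on ...
print(bound.__self__ is a)
# ... and the plain function it wraps.
print(bound.__func__ is A.__dict__["fun"])
print(bound(2))

# On Python 3, looking the method up on the class gives a plain function,
# so the instance has to be passed explicitly.
print(isinstance(A.fun, types.FunctionType))
print(A.fun(a, 2))
```

Calling `A.fun(2)` without an instance is what the original example effectively asks `map` to do for every element.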

ChrisA
 
Luca Cerone

Hi Chris, thanks
Do you ever instantiate any A() objects? You're attempting to call an
unbound method without passing it a 'self'.

I have tried a lot of variations: instantiating the object, creating lambda functions that use the unbound version of fun (A.fun.__func__), and so on.
I played around with it quite a bit before posting.

As far as I understand, the problem is that Pool pickles the function and copies it to the worker processes.
But since methods cannot be pickled, this fails.

The example I posted won't run in Python 3.2 either (I am mostly interested in a solution for Python 2.7, sorry I forgot to mention that).

Thanks in any case for the help, hopefully there will be some other advice on the ML :)

Cheers,
Luca
 
Joshua Landau

I think you might not understand what Chris said.

Currently this does *not* work with Python 2.7 as you suggested it would.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unbound method fun() must be called with A instance as first argument (got int instance instead)

This, however, does:
[1, 3, 9, 27, 81, 243, 729, 2187, 6561, 19683]
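
The call that produced this list did not survive in the post. A plausible reconstruction, assuming the built-in `map` was applied to the bound method `A(3).fun` (an assumption, not a quote), would look like this on Python 3:

```python
class A(object):
    def __init__(self, x):
        self.value = x

    def fun(self, x):
        return self.value ** x

# Built-in map runs in the current process, so nothing is pickled.
# list() is needed on Python 3, where map returns an iterator.
result = list(map(A(3).fun, range(10)))
print(result)
```

The values are the powers 3**0 through 3**9, which matches the list shown above.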


Chris might have also been confused because once you fix that it works
in Python 3.

You will find that
http://stackoverflow.com/questions/...od-when-using-pythons-multiprocessing-pool-ma
explains the problem in more detail than I understand. I suggest
reading it and relaying further questions back to us. Or use Python 3
;).
 
Luca Cerone

Hi Joshua thanks!
I think you might not understand what Chris said.
Currently this does *not* work with Python 2.7 as you suggested it would.

Yeah actually that wouldn't work even in Python 3, since value attribute used by fun has not been set.
It was my mistake in the example, but it is not the source of the problem..
This, however, does:
[1, 3, 9, 27, 81, 243, 729, 2187, 6561, 19683]

This works fine (and I knew that).. but is not what I want...

You are using the map() function that comes with Python. I want
to use the map() method of the Pool class (available in the multiprocessing module).

And there are differences between map() and Pool.map() apparently, so that if something works fine with map() it may not work with Pool.map() (as in my case).
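
One way to see that difference without starting any worker processes is to ask pickle directly, since a pool has to pickle the callable before sending it to a worker. The sketch below is for Python 3; `can_pickle` and `power_of_three` are hypothetical names used only for illustration.

```python
import pickle

def can_pickle(obj):
    """Return True if pickle.dumps accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True

# Hypothetical stand-in for a callable that is fine for built-in map.
power_of_three = lambda x: 3 ** x

# Built-in map calls the function directly in this process.
print(list(map(power_of_three, range(4))))

# The standard pickle module refuses lambdas, so a pool could not
# ship this callable to its workers.
print(can_pickle(power_of_three))
```

So "works with map()" only shows that the callable can be called, not that it can be serialized.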

To correct my example:

from multiprocessing import Pool

class A(object):
    def __init__(self, x):
        self.value = x
    def fun(self, x):
        return self.value**x

l = range(100)
p = Pool(4)
op = p.map(A(3).fun, l)

works in neither Python 2.7 nor 3.2 (by the way, I can't use Python 3 for my application).
You will find that
http://stackoverflow.com/questions/1816958/cant-pickle-type-instancemethod-when-using-pythons-multiprocessing-pool-ma
explains the problem in more detail than I understand. I suggest
reading it and relaying further questions back to us. Or use Python 3

:) Thanks, but of course I googled and found this link before posting. I don't understand much of the details either, that's why I posted here.

Anyway, thanks for the attempt :)

Luca
 
Joshua Landau

To correct my example:

from multiprocessing import Pool

class A(object):
    def __init__(self, x):
        self.value = x
    def fun(self, x):
        return self.value**x

l = range(100)
p = Pool(4)
op = p.map(A(3).fun, l)

works in neither Python 2.7 nor 3.2 (by the way, I can't use Python 3 for my application).

Are you using Windows? Over here on 3.3 on Linux it does. Not on 2.7 though.
:) Thanks, but of course I googled and found this link before posting. I don't understand much of the details as well, that's why I posted here.

Anyway, thanks for the attempt :)

Reading there, the simplest method seems to be, in effect:

from multiprocessing import Pool
from functools import partial

class A(object):
    def __init__(self, x):
        self.value = x
    def fun(self, x):
        return self.value**x

def _getattr_proxy_partialable(instance, name, arg):
    return getattr(instance, name)(arg)

def getattr_proxy(instance, name):
    """
    A version of getattr that returns a proxy function that can
    be pickled. Only function calls will work on the proxy.
    """
    return partial(_getattr_proxy_partialable, instance, name)

l = range(100)
p = Pool(4)
op = p.map(getattr_proxy(A(3), "fun"), l)
print(op)
 
Luca Cerone

works in neither Python 2.7 nor 3.2 (by the way, I can't use Python 3 for my application).
Are you using Windows? Over here on 3.3 on Linux it does. Not on 2.7 though.

No I am using Ubuntu (12.04, 64 bit).. maybe things changed from 3.2 to 3.3?

I can't try it now, I'll let you know later if it works!
(Though just by reading I can't really understand what the code does).

Thanks for the help,
Luca
 
Joshua Landau

I can't try it now, I'll let you know later if it works!
(Though just by reading I can't really understand what the code does).
Well,

The imports and the class definition are all the same, as are

l = range(100)
p = Pool(4)

You then wanted to do:

op = p.map(A(3).fun, l)

but bound methods can't be pickled, it seems.

However, A(3) *can* be pickled. So what we want is a function:

def proxy(arg):
    return A(3).fun(arg)

so we can write:

op = p.map(proxy, l)

To generalise you might be tempted to write:

def generic_proxy(instance, name):
    def proxy(arg):
        # Equiv. of instance.name(arg)
        return getattr(instance, name)(arg)
    return proxy

but the inner function won't work, as functions-in-functions can't be
pickled either.

So we use:

def _getattr_proxy_partialable(instance, name, arg):
    return getattr(instance, name)(arg)

which takes all of instance, name and arg. Of course we only want our
function to take arg, so we partial it:

def getattr_proxy(instance, name):
    return partial(_getattr_proxy_partialable, instance, name)

partial objects are picklable, btw.

:)
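
The two claims above (nested functions don't pickle, partial objects do) can be checked with pickle alone. This is only a sketch for Python 3: `make_power_closure` and `power_of_three` are hypothetical names, and `operator.pow` stands in for the module-level helper so the example does not depend on where it is run from.

```python
import operator
import pickle
from functools import partial

def make_power_closure(base):
    # Nested function: pickle cannot find it by name at module level.
    def power(exponent):
        return base ** exponent
    return power

closure = make_power_closure(3)
try:
    pickle.dumps(closure)
    closure_pickles = True
except Exception:
    # The exact exception type differs between Python versions.
    closure_pickles = False
print(closure_pickles)

# A partial stores the wrapped function and its arguments as data,
# so it survives a pickle round trip as long as the function does.
power_of_three = partial(operator.pow, 3)
restored = pickle.loads(pickle.dumps(power_of_three))
print(restored(4))
```

The closure and the partial compute the same thing; only the partial can be shipped to another process.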
 
Peter Otten

There is also the copy_reg module. Adapting

<http://mail.python.org/pipermail/python-list/2008-July/469164.html>

you get:

import copy_reg
import multiprocessing
import new

def make_instancemethod(inst, methodname):
    return getattr(inst, methodname)

def pickle_instancemethod(method):
    return make_instancemethod, (method.im_self, method.im_func.__name__)

copy_reg.pickle(
    new.instancemethod, pickle_instancemethod, make_instancemethod)

class A(object):
    def __init__(self, a):
        self.a = a
    def fun(self, b):
        return self.a**b

if __name__ == "__main__":
    items = range(10)
    pool = multiprocessing.Pool(4)
    print pool.map(A(3).fun, items)
 
Joshua Landau

import copy_reg
import multiprocessing
import new

"new" is deprecated from 2.6+; use types.MethodType instead of
new.instancemethod.

def make_instancemethod(inst, methodname):
    return getattr(inst, methodname)

This is just getattr -- you can replace the two uses of
make_instancemethod with getattr and delete this ;).

def pickle_instancemethod(method):
    return make_instancemethod, (method.im_self, method.im_func.__name__)

copy_reg.pickle(
    new.instancemethod, pickle_instancemethod, make_instancemethod)

class A(object):
    def __init__(self, a):
        self.a = a
    def fun(self, b):
        return self.a**b

if __name__ == "__main__":
    items = range(10)
    pool = multiprocessing.Pool(4)
    print pool.map(A(3).fun, items)

Well that was easy. The Stack Overflow link made that look *hard*. -1
to my hack, +1 to this.

You can do this in one statement (plus the imports):

import copy_reg
import types

copy_reg.pickle(
    types.MethodType,
    lambda method: (getattr, (method.im_self, method.im_func.__name__)),
    getattr
)
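
For comparison, CPython 3.4 and later appear to build the same `(getattr, (instance, name))` recipe into bound methods themselves, which would explain why no registration is needed there. The sketch below inspects that recipe through `__reduce__`; it is Python 3 only and does not apply to 2.7, and the names `a`, `bound`, `rebuild` and `again` are illustrative.

```python
class A(object):
    def __init__(self, a):
        self.a = a

    def fun(self, b):
        return self.a ** b

a = A(3)
bound = a.fun

# __reduce__ describes how pickle would rebuild the object:
# a callable plus the arguments to call it with.
rebuild, args = bound.__reduce__()
print(rebuild is getattr)
print(args[0] is a, args[1])

# Rebuilding by hand gives an equivalent bound method.
again = rebuild(*args)
print(again(2))
```

This is the same reduction the lambda above registers by hand for Python 2.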
 
Luca Cerone

Thanks for the post.
I actually don't know exactly what can and can't be pickled,
nor what partialing a function means.
Maybe you can link me to some resources?

I still can't understand all the details in your code :)
 
Joshua Landau

Thanks for the post.
I actually don't know exactly what can and can't be pickled,

I just try it and see what works ;).

The general idea is that if it is module-level it can be pickled and
if it is defined inside of something else it cannot. It depends
though.
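
The reason module-level things tend to work is that pickle stores functions by reference (module plus name) rather than by value. A small sketch for Python 3, using the built-in `len` so it does not depend on how the script is run:

```python
import pickle

data = pickle.dumps(len)

# The pickled bytes hold the function's name, not its code.
print(b"len" in data)

# Unpickling looks the name up again and returns the very same object.
print(pickle.loads(data) is len)
```

Anything that cannot be found again by name from the top of its module is therefore a likely candidate for failure.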
nor what partialing a function means.

"partial" takes a function and returns it with arguments "filled in":

from functools import partial

def add(a, b):
    return a + b

add5 = partial(add, 5)

print(add5(10))  # Prints 15 == 5 + 10
Maybe you can link me to some resources?
http://docs.python.org/2/library/functools.html#functools.partial


I still can't understand all the details in your code :)

Never mind that, though, as Peter Otten's code (with my very minor
suggested modifications) is by far the cleaner method of the two and
is arguably more correct too.
 
