questions about multiprocessing


Vincent Ren

Hello everyone. I've recently been trying to learn Python's
multiprocessing module, but as a beginner I got confused.

If I run the code below:

from multiprocessing import Pool
import urllib2
otasks = [
    'http://www.php.net'
    'http://www.python.org'
    'http://www.perl.org'
    'http://www.gnu.org'
    ]

def f(url):
    return urllib2.urlopen(url).read()

pool = Pool(processes = 2)
print pool.map(f, tasks)


I receive this message:

Traceback (most recent call last):
  File "<stdin>", line 14, in <module>
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 148, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 422, in get
    raise self._value
httplib.InvalidURL: nonnumeric port: ''



I run Python 2.6 on Ubuntu 10.10


Regards
Vincent
 

Philip Semanchuk

Hi Vincent,
I don't think that's the code you're running, because that code won't run. Here's what I get when I run the code you gave us:

Traceback (most recent call last):
  File "x.py", line 14, in <module>
    print pool.map(f, tasks)
NameError: name 'tasks' is not defined


When I change the name of "otasks" to "tasks", I get the nonnumeric port error that you reported.

Me, I would debug it by adding a print statement to f():
def f(url):
    print url
    return urllib2.urlopen(url).read()


Your problem isn't related to multiprocessing.

Good luck
Philip
 

Dennis Lee Bieber

from multiprocessing import Pool
import urllib2
otasks = [
    'http://www.php.net'
    'http://www.python.org'
    'http://www.perl.org'
    'http://www.gnu.org'
    ]

You've just defined a list with ONE element, a string of:

"http://www.php.nethttp://www.python.orghttp://www.perl.orghttp://www.gnu.org"

Python concatenates adjacent string literals, including literals on
separate lines inside an open (, [ or { bracket.

You need to put commas after the closing quotes on those lines.
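A quick offline check of the point above, written to run on Python 2.6+ and 3 (the names `broken` and `fixed` are only illustrative): without commas the literals are joined at compile time, so the "list" holds a single string.

```python
from __future__ import print_function

# Missing commas: adjacent string literals are joined at compile time,
# so this list holds exactly one long string.
broken = [
    'http://www.php.net'
    'http://www.python.org'
    'http://www.perl.org'
    'http://www.gnu.org'
]

# A comma after each literal gives four separate elements.
fixed = [
    'http://www.php.net',
    'http://www.python.org',
    'http://www.perl.org',
    'http://www.gnu.org',
]

print(len(broken), len(fixed))       # 1 4
print(broken[0] == ''.join(fixed))   # True
```

Putting a trailing comma after the last element too is harmless and makes it easier to add URLs later without repeating the mistake.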
def f(url):
    return urllib2.urlopen(url).read()

pool = Pool(processes = 2)
print pool.map(f, tasks)

And I'm presuming the others are correct, and that should be

pool.map(f, otasks)
httplib.InvalidURL: nonnumeric port: ''

No surprise... URL syntax expects a port number after the
second : in a URL, and with the concatenation you've got four : in a
single URL.
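To see where the empty port comes from, here is a small sketch that splits the concatenated string with the standard library's `urlsplit` (from `urlparse` on Python 2, `urllib.parse` on Python 3). The host part runs from just after the first `//` to the next `/`, so it ends in a bare `:` with nothing after it, i.e. the empty port `''` in the traceback. This only illustrates the parsing; later versions of httplib (`http.client` in Python 3) may treat an empty port as the default port instead of raising, so the exact failure can differ by version.

```python
from __future__ import print_function

try:
    from urllib.parse import urlsplit   # Python 3
except ImportError:
    from urlparse import urlsplit       # Python 2, as used in this thread

# The single string produced by the comma-less list.
joined = ('http://www.php.net'
          'http://www.python.org'
          'http://www.perl.org'
          'http://www.gnu.org')

# One ':' for each 'http:' prefix.
print(joined.count(':'))      # 4

# The network location stops at the first '/' after the leading '//',
# so it ends with a ':' followed by an empty port string.
parts = urlsplit(joined)
print(parts.netloc)           # www.php.nethttp:

host, _, port = parts.netloc.rpartition(':')
print(repr(host), repr(port))  # 'www.php.nethttp' ''
```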
 

Vincent Ren

Got it.
After adding the commas, it works. (The 'o' was a typo when I posted;
sorry about that.)

Thanks to all of you :)



Vincent Ren

I've run into a new problem. I searched on Google but found no useful
information.

I want to download some images with multiprocessing.Pool.
In my class named Renren, I defined two methods:

def getPotrait(self, url):
    # get the current portraits of a friend on Renren.com
    try:
        r = urllib2.urlopen(url)
    except urllib2.URLError:
        print "Time out"
        return  # nothing to save if the request failed

    tmp = re.search('large_[\d\D]*.jpg', url)
    image_name = tmp.group()

    img = r.read()
    output = open(image_name, 'wb')
    output.write(img)
    output.close()

def getLargePotraits(self):
    tasks = self.makeTaskList()
    pool = Pool(processes = 3)
    pool.map(self.getPotrait, tasks)


tasks is a list of image URLs; I want to download these images and
save them locally.

In another python file, I wrote this:

from renren import Renren

# get username and password for RenRen.com
username = raw_input('Email: ')
password = raw_input('Password: ')
print


a = Renren(username, password)
a.login()
a.getLargePotraits()



However, when I run this file, I get this error message:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 225, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
 

Jean-Michel Pichavant

Vincent said:
[SNIP]
httplib.InvalidURL: nonnumeric port: ''

It's a mistake many beginners make; I don't understand why, but it's
very common. RTFM should stand for "Read The Formidable (error)
Message" as well.
Your URL is invalid; check your URL definition.

JM
 

Vincent Ren

I've fixed that problem, but I've got a new one:

PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

The details are in my last post in this thread.
Thanks for your reply :)
 

Robert Kern

I'm afraid his response applies to this as well: you can't pass methods to
pool.map() or any other such communication channel to your subprocesses.
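A minimal sketch of the usual workaround, assuming the per-URL work only needs the URL: move it out of the class into a plain module-level function, which pickle can send to the worker processes by name. The cut-down `Renren` class, the `image_name()` helper and the example.com URLs below are hypothetical; the worker only computes the file name so the sketch runs offline, and the download-and-save body of `getPotrait()` would go into the same module-level function.

```python
from __future__ import print_function
import re
from multiprocessing import Pool


def image_name(url):
    """Hypothetical module-level worker: map a portrait URL to a file name.

    As a plain top-level function it can be pickled by name and sent to
    the worker processes, unlike a bound method on Python 2.
    """
    match = re.search(r'large_[^/]*\.jpg', url)
    return match.group() if match else None


class Renren(object):
    """Cut-down, hypothetical stand-in for the class in the thread."""

    def __init__(self, tasks):
        self.tasks = tasks

    def getLargePotraits(self):
        pool = Pool(processes=3)
        try:
            # Pass the module-level function, not self.getPotrait.
            return pool.map(image_name, self.tasks)
        finally:
            pool.close()
            pool.join()


if __name__ == '__main__':
    urls = ['http://example.com/photos/large_abc123.jpg',
            'http://example.com/photos/large_def456.jpg']
    print(Renren(urls).getLargePotraits())  # ['large_abc123.jpg', 'large_def456.jpg']
```

On newer Python 3 versions bound methods can be pickled as long as the instance itself can be, but keeping the worker a plain function avoids shipping `self` (and whatever login state it holds) to every worker.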

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 

Vincent Ren

Got it, thanks.
But what should I do if I want to improve the efficiency of my
program?
 

Benjamin Kaplan

Is there any particular reason you're using processes and not threads?
Functions that wait for stuff to happen in C land, such as I/O calls,
release the GIL so threads can be run in parallel. It's only stuff
that happens in Python land (i.e. manipulating Python objects) that
can't be run concurrently.
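A minimal sketch of the thread-based approach, assuming the program spends most of its time waiting on the network: `multiprocessing.dummy` offers the same `Pool` interface backed by threads, so a bound method can be passed to `map()` unchanged because nothing has to be pickled. The `Downloader` class and its `fetch()` method are hypothetical, and `time.sleep()` stands in for `urllib2.urlopen(url).read()` so the sketch runs offline.

```python
from __future__ import print_function
import time
from multiprocessing.dummy import Pool as ThreadPool  # threads, same API as Pool


class Downloader(object):
    """Hypothetical stand-in for the Renren class from the thread."""

    def __init__(self, delay=0.2):
        self.delay = delay

    def fetch(self, url):
        # Stand-in for urllib2.urlopen(url).read(): sleep() releases the
        # GIL while it blocks, much like waiting on a socket does.
        time.sleep(self.delay)
        return 'contents of ' + url

    def fetch_all(self, urls, workers=4):
        pool = ThreadPool(workers)
        try:
            # A bound method is fine here: the threads share memory,
            # so nothing is pickled.
            return pool.map(self.fetch, urls)
        finally:
            pool.close()
            pool.join()


if __name__ == '__main__':
    urls = ['http://www.php.net', 'http://www.python.org',
            'http://www.perl.org', 'http://www.gnu.org']
    start = time.time()
    results = Downloader().fetch_all(urls)
    # The four simulated downloads overlap, so this should take roughly
    # one delay rather than four.
    print(len(results), round(time.time() - start, 1))
```

On Python 3, `concurrent.futures.ThreadPoolExecutor` is another standard option for the same pattern.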
 

Vincent Ren

I'm just learning Python. After changing it to a non-OOP program, it
works.
Thank you all for your suggestions :)
 
