Gib Bogle
Hi,
I'm learning Python, jumping in the deep end with a threading application. I
came across an authoritative-looking site that recommends using queues for
threading in Python.
http://www.ibm.com/developerworks/aix/library/au-threadingpython/index.html
The author provides example code that fetches data from several web sites, using
threads. I have modified his code slightly, just adding a couple of print
statements and passing an ID number to the thread.
#!/usr/bin/env python
import Queue
import threading
import urllib2
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue, i):
        threading.Thread.__init__(self)
        self.queue = queue
        self.num = i
        print "Thread: ", self.num

    def run(self):
        while True:
            # grabs host from queue
            host = self.queue.get()
            print "num, host: ", self.num, host
            # grabs urls of hosts and prints first 1024 bytes of page
            url = urllib2.urlopen(host)
            print url.read(1024)
            # signals to queue job is done
            self.queue.task_done()

start = time.time()

def main():
    # spawn a pool of threads, and pass them the queue instance
    for i in range(5):
        t = ThreadUrl(queue, i)
        t.setDaemon(True)
        t.start()
    # populate queue with data
    for host in hosts:
        queue.put(host)
    # wait on the queue until everything has been processed
    queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)
Executed on Windows with Python 2.5, this program doesn't do what I want, which
is to fetch data from each site once. Instead, it appears to process the first
host in the list 5 times, the next 4 times, and so on, with the last just once.
I don't know whether it is a case of the code simply being wrong (which seems
unlikely), or the behaviour on my system being different from AIX (also seems
unlikely). Naively, I would have expected the queue to deliver each of its
members to exactly one thread. Is there a simple change that will make this
code execute as required? Or is this author out to lunch?
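To check my expectation that a queue hands each item to exactly one consumer, I tried a stripped-down version with placeholder strings instead of URLs (this uses the Python 3 spellings `queue`/`print()`, and `get_nowait` so the workers can exit when the queue is empty):

```python
import queue
import threading

q = queue.Queue()
for host in ["a", "b", "c", "d", "e"]:
    q.put(host)

seen = []
lock = threading.Lock()

def worker():
    while True:
        try:
            # each successful get() removes the item, so no two
            # threads should ever receive the same one
            item = q.get_nowait()
        except queue.Empty:
            return
        with lock:
            seen.append(item)
        q.task_done()

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(seen))  # each item appears exactly once
```

In this toy version every item does come out exactly once, which makes the repeated fetches in the real program all the more puzzling to me.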
Cheers
Gib