Threading with queues

Gib Bogle

Hi,
I'm learning Python, jumping in the deep end with a threading application. I
came across an authoritative-looking site that recommends using queues for
threading in Python.
http://www.ibm.com/developerworks/aix/library/au-threadingpython/index.html
The author provides example code that fetches data from several web sites, using
threads. I have modified his code slightly, just adding a couple of print
statements and passing an ID number to the thread.

#!/usr/bin/env python
import Queue
import threading
import urllib2
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
  #"""Threaded Url Grab"""
  def __init__(self, queue, i):
    threading.Thread.__init__(self)
    self.queue = queue
    self.num = i
    print "Thread: ", self.num

  def run(self):
    while True:
      #grabs host from queue
      host = self.queue.get()
      print "num, host: ", self.num, host
      #grabs urls of hosts and prints first 1024 bytes of page
      url = urllib2.urlopen(host)
      print url.read(1024)

      #signals to queue job is done
      self.queue.task_done()

start = time.time()
def main():

  #spawn a pool of threads, and pass them queue instance
  for i in range(5):
    t = ThreadUrl(queue, i)
    t.setDaemon(True)
    t.start()

  #populate queue with data
    for host in hosts:
      queue.put(host)

  #wait on the queue until everything has been processed
  queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)

Executed on Windows with Python 2.5, this program doesn't do what you'd want, which
is to fetch data from each site once. Instead, it processes the first host in
the list 5 times, the next 4 times, and so on, and the last just once. I don't know
whether the code is simply wrong (which seems unlikely), or whether
the behaviour on my system differs from AIX (also seems unlikely).

Naively, I would have expected the queue to enforce processing of its members
once only. Is there a simple change that will make this code execute as
required? Or is this author out to lunch?

Cheers
Gib
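Gib's intuition about the queue itself is right: get() removes the item it returns, so an item that is put once is handed to exactly one worker. What the queue does not do is deduplicate, so a host that is put five times is processed five times. A minimal sketch, assuming Python 3 (where the module is named `queue`); `process_all` is a hypothetical helper, not part of the article's code:

```python
import queue
import threading

def process_all(items, num_workers=3):
    """Feed `items` through a Queue to `num_workers` daemon threads and
    return the list of items the workers actually processed."""
    q = queue.Queue()
    processed = []
    lock = threading.Lock()  # protects `processed` across worker threads

    def worker():
        while True:
            item = q.get()       # removes the item; no other worker receives it
            with lock:
                processed.append(item)
            q.task_done()        # exactly one task_done() per successful get()

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()

    for item in items:
        q.put(item)
    q.join()                     # blocks until every put() has a matching task_done()
    return processed

# Each item put once is processed once...
print(sorted(process_all(["a", "b", "c"])))  # ['a', 'b', 'c']
# ...but duplicates are not filtered: put "a" twice and it is processed twice.
print(sorted(process_all(["a", "a", "b"])))  # ['a', 'a', 'b']
```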
 
Stephen Hansen

 #spawn a pool of threads, and pass them queue instance
 for i in range(5):
   t = ThreadUrl(queue,i)
   t.setDaemon(True)
   t.start()

 #populate queue with data
   for host in hosts:
     queue.put(host)

This is indented one indentation level too far. You want it to
be at the same level as the for above. Here, it's at the same level
as "t" -- meaning this entire loop gets repeated five times.

I sorta really recommend a tab width of 4 spaces, not 2 :) At 2, it's
_really_ hard (especially if you're newer to Python) to see these
kinds of issues, and since indentation is program logic and structure
in Python, that's bad... especially since your comment is indented to
the right level, but the code isn't :)

--S
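Putting Stephen's fix together with the 4-space suggestion, a corrected program might look roughly like the sketch below. It assumes Python 3 (`queue` instead of `Queue`), uses the `daemon=True` argument in place of `t.setDaemon(True)`, and replaces the network call with a hypothetical `fake_fetch` so it runs offline. The change that matters is that `for host in hosts:` now sits at the same level as the thread-spawning loop.

```python
import queue
import threading

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

def fake_fetch(host):
    # Hypothetical stand-in for urllib.request.urlopen(host).read(1024),
    # so this sketch runs without network access.
    return "<contents of %s>" % host

class ThreadUrl(threading.Thread):
    """Threaded URL grab: pulls hosts off the shared queue until the program exits."""

    def __init__(self, work_queue, num, results):
        super().__init__(daemon=True)  # same effect as setDaemon(True)
        self.work_queue = work_queue
        self.num = num
        self.results = results

    def run(self):
        while True:
            host = self.work_queue.get()                       # grab a host
            self.results.append((self.num, fake_fetch(host)))  # "fetch" it
            self.work_queue.task_done()                        # signal job done

def main(num_threads=5):
    work_queue = queue.Queue()
    results = []  # list.append is atomic in CPython, so no explicit lock here

    # Spawn a pool of threads and pass them the queue instance.
    for i in range(num_threads):
        ThreadUrl(work_queue, i, results).start()

    # Populate the queue. This loop is at the SAME level as the loop above,
    # so each host is queued exactly once rather than once per thread.
    for host in hosts:
        work_queue.put(host)

    # Wait until every queued host has been marked task_done().
    work_queue.join()
    return results

print(len(main()))  # 5
```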
 
Stephen Hansen

This is indented one indentation level too far. You want it to
be at the same level as the for above. Here, it's at the same level
as "t" -- meaning this entire loop gets repeated five times.

Err, "this" in this context meant the second for loop, if that wasn't
obvious. Sorry.

--S
 
Gib Bogle

Stephen said:
This is indented one indentation level too far. You want it to
be at the same level as the for above. Here, it's at the same level
as "t" -- meaning this entire loop gets repeated five times.

I sorta really recommend a tab width of 4 spaces, not 2 :) At 2, it's
_really_ hard (especially if you're newer to Python) to see these
kinds of issues, and since indentation is program logic and structure
in Python, that's bad... especially since your comment is indented to
the right level, but the code isn't :)

--S

Aarrh! Caught by the obvious Python trap that everyone knows about! In my
defense, it's wrong on the web site. I agree, 4 spaces is the best plan.
Thanks very much!
 
Gib Bogle

It turns out that this code isn't a great demo of the advantages of threading,
on my system anyway. The time taken to execute doesn't vary much when the
number of threads is set anywhere from 1 to 6.
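One way to check whether the thread pool itself helps is to take the network out of the picture. The sketch below assumes Python 3, and its `timed_run` helper and 0.2 s `time.sleep` are hypothetical stand-ins for a slow download. Because the workers spend their time waiting rather than computing, their waits can overlap even under the GIL, so five workers should finish in roughly one delay instead of five. If real downloads don't show a similar gap, something other than the thread count is probably dominating the elapsed time.

```python
import queue
import threading
import time

def timed_run(num_threads, num_tasks=5, delay=0.2):
    """Return elapsed seconds to process `num_tasks` simulated downloads
    with a pool of `num_threads` daemon worker threads."""
    q = queue.Queue()

    def worker():
        while True:
            q.get()
            time.sleep(delay)  # simulated network wait; releases the GIL
            q.task_done()

    for _ in range(num_threads):
        threading.Thread(target=worker, daemon=True).start()

    start = time.perf_counter()
    for n in range(num_tasks):
        q.put(n)
    q.join()  # wait for all simulated downloads to finish
    return time.perf_counter() - start

for n in (1, 5):
    print("%d thread(s): %.2f s" % (n, timed_run(n)))
```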
 
Lie Ryan

It turns out that this code isn't a great demo of the advantages of
threading, on my system anyway. The time taken to execute doesn't vary
much when the number of threads is set anywhere from 1 to 6.

It does on mine:

Elapsed Time: 7.47399997711 (with one thread)
Elapsed Time: 1.90199995041 (with five threads)

What sort of weird machine are you on?
 
Gib Bogle

Lie said:
It does on mine:

Elapsed Time: 7.47399997711 (with one thread)
Elapsed Time: 1.90199995041 (with five threads)

What sort of weird machine are you on?

Hmmm, interesting. I am running Windows XP on Intel quad-core hardware (Q6600).
And you?
 
