using queue

Tim Arnold

Hi, I've been using the threading module, keeping each thread in a
dictionary keyed by chapter name. I've been reading about Queues, though,
and it looks like that's what I should be using instead. Just checking
here to see if I'm on the right path.
The code I have currently compiles a bunch of chapters in a book (no more
than 80 jobs at a time) and just waits for them all to finish:

max_running = 80
threads = dict()
current = 1
chaps = [x.config['name'] for x in self.document.chapter_objects]
while current <= len(chaps):
    running = len([x for x in threads.keys() if threads[x].isAlive()])
    if running == max_running:
        time.sleep(10)
    else:
        chap = chaps[current - 1]
        c = self.compiler(self.document.config['name'], chap)
        threads[chap] = threading.Thread(target=c.compile)
        threads[chap].start()
        current += 1

for thread in threads.keys():
    threads[thread].join(3600.0)
---------------------------------
but I think Queue could do a lot of the above work for me. Here is
pseudocode for what I'm thinking:

q = Queue(maxsize=80)
for chap in [x.config['name'] for x in self.document.chapter_objects]:
    c = self.compiler(self.document.config['name'], chap)
    t = threading.Thread(target=c.compile)
    t.start()
    q.put(t)
q.join()

is that the right idea?
thanks,
--Tim Arnold
 

MRAB

Tim said:
[Tim's original post and code snipped]
is that the right idea?
You don't need that many threads; just create a few to do the work and
let each do multiple chapters, something like this:

import threading
from Queue import Queue

class CompilerTask(object):
    def __init__(self, chapter_queue):
        self.chapter_queue = chapter_queue
    def __call__(self):
        while True:
            chapter = self.chapter_queue.get()
            if chapter is None:
                # A None indicates that there are no more chapters.
                break
            chapter.compile()
        # Put back the None so that the next thread will also see it.
        self.chapter_queue.put(None)

MAX_RUNNING = 10

# Put the chapters into a queue, ending with a None.
chapter_queue = Queue()
for c in self.document.chapter_objects:
    chapter_queue.put(self.compiler(self.document.config['name'],
                                    c.config['name']))
chapter_queue.put(None)

# Start the threads to do the work.
thread_list = []
for i in range(MAX_RUNNING):
    t = threading.Thread(target=CompilerTask(chapter_queue))
    t.start()
    thread_list.append(t)

# The threads will finish when they see the None in the queue.
for t in thread_list:
    t.join()
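
(Note that a single None is enough: each worker puts it back on the queue
before exiting, so every worker eventually sees it. Putting one None per
thread would work too.)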
 

Tim Arnold

MRAB said:
[quoted text and suggested code snipped]

Hi, thanks for that code. It took me a bit to understand what's going on,
but I think I see it now.
Still, I have two questions about it:
(1) what's wrong with having each chapter in a separate thread? Too much
going on for a single processor? I guess that probably doesn't matter at
all, but some chapters run in minutes and some in seconds.
(2) the None at the end of the queue... I thought t.join() would just
work. Why do we need the None?

thanks for thinking about this,
--Tim Arnold
 

Tim Arnold

Jan Kaliszewski said:
Scott David Daniels said:
Tim Arnold wrote:
(1) what's wrong with having each chapter in a separate thread? Too
much going on for a single processor?
Many more threads than cores and you spend a lot of your CPU switching
tasks.

In fact, Python threads work best on a powerful single core; with more
cores they become surprisingly inefficient.

The culprit is the Python GIL and the way it [mis]cooperates with OS
scheduling.

See: http://www.dabeaz.com/python/GIL.pdf

*j

I've read about the GIL (I think I understand the problem there)--thanks.
In my example, the actual job for each chapter ended up being a call to
subprocess (which ran a different Python program). I figured that would
save me from the GIL problems, since each subprocess would have its own
GIL.
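Something like this minimal sketch of the setup (the script name, book
and chapter names here are just stand-ins for the real ones):

import subprocess
import threading

def compile_chapter(book, chapter):
    # The thread just blocks on a child process; the GIL is released
    # while waiting, so the real work runs outside this interpreter.
    subprocess.check_call(['python', 'compile_chapter.py', book, chapter])

t = threading.Thread(target=compile_chapter, args=('mybook', 'chap1'))
t.start()
t.join()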

In the words of Tom Waits, "the world just keeps getting bigger when you
get out on your own". So I'm re-reading now, and maybe what I've been
doing would have been better served by the multiprocessing package.

I'm running Python 2.6 on FreeBSD with dual quad-core CPUs. Now my
questions are:
(1) what the heck should I be doing to get concurrent builds of the
chapters, wait for them all to finish, and then pick up processing the
main job again? The separate chapter builds have no need for
communication--they're autonomous.
(2) is using threads whose target function calls subprocess a bad idea?
(3) should I study up on the multiprocessing package and/or pprocessing?

thanks for your inputs,
--Tim
 

Dennis Lee Bieber

Jan Kaliszewski said:
In fact, Python threads work best on a powerful single core; with more
cores they become surprisingly inefficient.

The culprit is the Python GIL and the way it [mis]cooperates with OS
scheduling.
On a heavily I/O-bound system there may not be much of an impact
whether you have multiple cores or a single one... Either way, only one
thread will be actively processing at a time. If that thread then blocks
on an I/O request (releasing the GIL), the question becomes how much
overhead there is in setting up the single core for the next unblocked
thread vs. setting up a different core for it.

However, WRT the thought of having LOTS of threads... that could mean
(I've not looked at the Python implementation) having long lists of
blocked-waiting, blocked-runnable, etc. threads which need to be
traversed whenever a thread swap takes place. (Blocked-runnable may be
treated as a FIFO, so not much time is lost there, but the search of
blocked-waiting to determine which threads can be moved to runnable
could consume some time.)
 

MRAB

Tim said:
[snip]
(1) what the heck should I be doing to get concurrent builds of the
chapters, wait for them all to finish, and then pick up processing the
main job again? The separate chapter builds have no need for
communication--they're autonomous.
(2) is using threads whose target function calls subprocess a bad idea?
(3) should I study up on the multiprocessing package and/or pprocessing?

thanks for your inputs,
You could adapt the threading solution I gave to multiprocessing; just
use the multiprocessing queue class instead of the threading queue
class, etc.
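For example, here's a rough sketch of that adaptation (chapter names are
passed instead of compiler objects, since items on a multiprocessing
queue must be picklable; the script name and chapter list are stand-ins):

import multiprocessing
import subprocess

def compile_chapter(name):
    # Stand-in for the real per-chapter compile step; here it just
    # shells out, as in the threading version.
    subprocess.check_call(['python', 'compile_chapter.py', name])

def worker(chapter_queue):
    while True:
        chapter = chapter_queue.get()
        if chapter is None:
            # No more chapters: put the None back for the other workers.
            chapter_queue.put(None)
            break
        compile_chapter(chapter)

MAX_RUNNING = 10

if __name__ == '__main__':
    chapter_names = ['chap1', 'chap2', 'chap3']  # stand-in chapter list
    chapter_queue = multiprocessing.Queue()
    for name in chapter_names:
        chapter_queue.put(name)
    chapter_queue.put(None)

    workers = [multiprocessing.Process(target=worker,
                                       args=(chapter_queue,))
               for _ in range(MAX_RUNNING)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()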
 
