using queue

Tim Arnold

Hi, I've been using the threading module, keeping each thread in a
dictionary keyed by chapter name. I've been reading about Queues, though,
and it looks like that's what I should be using instead. Just checking
here to see if I'm on the right path.
The code I have currently compiles a bunch of chapters in a book (no more
than 80 jobs at a time) and just waits for them all to finish:

max_running = 80
threads = dict()
current = 1
chaps = [x.config['name'] for x in self.document.chapter_objects]
while current <= len(chaps):
    running = len([x for x in threads.keys() if threads[x].isAlive()])
    if running == max_running:
        time.sleep(10)
    else:
        chap = chaps[current - 1]
        c = self.compiler(self.document.config['name'], chap)
        threads[chap] = threading.Thread(target=c.compile)
        threads[chap].start()
        current += 1

for thread in threads.keys():
    threads[thread].join(3600.0)
---------------------------------
but I think Queue could do a lot of the above work for me. Here is
pseudocode for what I'm thinking:

q = Queue(maxsize=80)
for chap in [x.config['name'] for x in self.document.chapter_objects]:
    c = self.compiler(self.document.config['name'], chap)
    t = threading.Thread(target=c.compile)
    t.start()
    q.put(t)
q.join()

is that the right idea?
thanks,
--Tim Arnold
 

MRAB

Tim said:
[Tim's original post and code snipped]
is that the right idea?
You don't need that many threads; just create a few to do the work and
let each do multiple chapters, something like this:

import threading
from Queue import Queue

class CompilerTask(object):
    def __init__(self, chapter_queue):
        self.chapter_queue = chapter_queue
    def __call__(self):
        while True:
            chapter = self.chapter_queue.get()
            if chapter is None:
                # A None indicates that there are no more chapters.
                break
            chapter.compile()
        # Put back the None so that the next thread will also see it.
        self.chapter_queue.put(None)

MAX_RUNNING = 10

# Put the chapters into a queue, ending with a None.
chapter_queue = Queue()
for c in self.document.chapter_objects:
    chapter_queue.put(self.compiler(self.document.config['name'],
                                    c.config['name']))
chapter_queue.put(None)

# Start the threads to do the work.
thread_list = []
for i in range(MAX_RUNNING):
    t = threading.Thread(target=CompilerTask(chapter_queue))
    t.start()
    thread_list.append(t)

# The threads will finish when they see the None in the queue.
for t in thread_list:
    t.join()
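
(Note that a single None is enough: each worker puts it back on the queue
before exiting, so every worker eventually sees it. Putting one None per
thread would work too.)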
 

Tim Arnold

MRAB said:
[quoted text and suggested code snipped]

Hi, thanks for that code. It took me a bit to understand what's going on,
but I think I see it now.
Still, I have two questions about it:
(1) what's wrong with having each chapter in a separate thread? Too much
going on for a single processor? I guess that probably doesn't matter at
all, but some chapters run in minutes and some in seconds.
(2) the None at the end of the queue... I thought t.join() would just
work. Why do we need the None?

thanks for thinking about this,
--Tim Arnold
 

Tim Arnold

Jan Kaliszewski said:
Scott David Daniels said:
Tim Arnold wrote:
(1) what's wrong with having each chapter in a separate thread? Too
much going on for a single processor?
Many more threads than cores and you spend a lot of your CPU switching
tasks.

In fact, Python threads work best on a powerful single core; with more
cores they become surprisingly inefficient.

The culprit is the Python GIL and the way it [mis]cooperates with OS
scheduling.

See: http://www.dabeaz.com/python/GIL.pdf

*j

I've read about the GIL (I think I understand the problem there)--thanks.
In my example, the actual job for each chapter ended up being a call to
subprocess (which ran a different Python program). I figured that would
save me from the GIL problems, since each subprocess would have its own
GIL.
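Something like this minimal sketch of the setup (the script name, book
and chapter names here are just stand-ins for the real ones):

import subprocess
import threading

def compile_chapter(book, chapter):
    # The thread just blocks on a child process; the GIL is released
    # while waiting, so the real work runs outside this interpreter.
    subprocess.check_call(['python', 'compile_chapter.py', book, chapter])

t = threading.Thread(target=compile_chapter, args=('mybook', 'chap1'))
t.start()
t.join()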

In the words of Tom Waits, "the world just keeps getting bigger when you
get out on your own". So I'm re-reading now, and maybe what I've been
doing would have been better served by the multiprocessing package.

I'm running Python 2.6 on FreeBSD with dual quad-core CPUs. Now my
questions are:
(1) what the heck should I be doing to get concurrent builds of the
chapters, wait for them all to finish, and then pick up processing the
main job again? The separate chapter builds have no need for
communication--they're autonomous.
(2) is using threads whose target function calls subprocess a bad idea?
(3) should I study up on the multiprocessing package and/or pprocessing?

thanks for your inputs,
--Tim
 

Dennis Lee Bieber

Jan Kaliszewski said:
In fact, Python threads work best on a powerful single core; with more
cores they become surprisingly inefficient.

The culprit is the Python GIL and the way it [mis]cooperates with OS
scheduling.
On a heavily I/O-bound system there may not be much of an impact
whether you have multiple cores or a single one... Either way, only one
thread will be actively processing at a time. If that thread then blocks
on an I/O request (releasing the GIL), the question becomes how much
overhead there is in setting up the single core for the next unblocked
thread vs. setting up a different core for it.

However, WRT the thought of having LOTS of threads... that could mean
(I've not looked at the Python implementation) having long lists of
blocked-waiting, blocked-runnable, etc. threads which need to be
traversed whenever a thread swap takes place. (Blocked-runnable may be
treated as a FIFO, so not much time is lost there, but the search of
blocked-waiting to determine which threads can be moved to runnable
could consume some time.)
 

MRAB

Tim said:
[snip]
(1) what the heck should I be doing to get concurrent builds of the
chapters, wait for them all to finish, and then pick up processing the
main job again? The separate chapter builds have no need for
communication--they're autonomous.
(2) is using threads whose target function calls subprocess a bad idea?
(3) should I study up on the multiprocessing package and/or pprocessing?

thanks for your inputs,
You could adapt the threading solution I gave to multiprocessing; just
use the multiprocessing queue class instead of the threading queue
class, etc.
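For example, here's a rough sketch of that adaptation (chapter names are
passed instead of compiler objects, since items on a multiprocessing
queue must be picklable; the script name and chapter list are stand-ins):

import multiprocessing
import subprocess

def compile_chapter(name):
    # Stand-in for the real per-chapter compile step; here it just
    # shells out, as in the threading version.
    subprocess.check_call(['python', 'compile_chapter.py', name])

def worker(chapter_queue):
    while True:
        chapter = chapter_queue.get()
        if chapter is None:
            # No more chapters: put the None back for the other workers.
            chapter_queue.put(None)
            break
        compile_chapter(chapter)

MAX_RUNNING = 10

if __name__ == '__main__':
    chapter_names = ['chap1', 'chap2', 'chap3']  # stand-in chapter list
    chapter_queue = multiprocessing.Queue()
    for name in chapter_names:
        chapter_queue.put(name)
    chapter_queue.put(None)

    workers = [multiprocessing.Process(target=worker,
                                       args=(chapter_queue,))
               for _ in range(MAX_RUNNING)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()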
 
