Queue peek?

Veloz

Hi all,
I'm looking for a queue that I can use with multiprocessing, which has
a peek method.

I've seen some discussion about queue.peek but don't see anything in
the docs about it.

Does Python have a queue class with peek semantics?

Michael
 
Raymond Hettinger

Veloz said:
Hi all,
I'm looking for a queue that I can use with multiprocessing, which has
a peek method.

I've seen some discussion about queue.peek but don't see anything in
the docs about it.

Does Python have a queue class with peek semantics?

I'm curious about your use case. Why peek at something
that could be gone by the time you want to use it?

  val = q.peek()
  if something_i_want(val):
       v2 = q.get()         # this could be different than val

Wouldn't it be better to just get() the value and return if you don't
need it?

  val = q.peek()
  if not something_i_want(val):
      q.put(val)


Raymond
 
Veloz

Raymond Hettinger said:
I'm curious about your use case. Why peek at something
that could be gone by the time you want to use it?

  val = q.peek()
  if something_i_want(val):
       v2 = q.get()         # this could be different than val

Wouldn't it be better to just get() the value and return if you don't
need it?

  val = q.peek()
  if not something_i_want(val):
      q.put(val)

Raymond

Yeah, I hear you. Perhaps a queue is not the best solution. My highest
level use case is this: The user visits a web page (my app is a
Pylons app) and requests a "report" be created. The report takes too
long to create and display on the spot, so the user expects to visit
some url "later" and see if the specific report has completed, and if
so, have it returned to them.

At a lower level, I'm thinking of using some process workers to create
these reports in the background; there'd be a request queue (into
which requests for reports would go, each with an ID) and a completion
queue, into which the workers would write an entry when a report was
created, along with an ID matching the original request.

The "peek" part comes in when the user comes back later to see if
their report is done. That is, in my page controller logic, I'd like
to look through the completed queue and see if the specific report has
been finished (I could tell by matching up the ID of the original
request to the ID in the completed queue). If there was an item in the
queue matching the ID, it would be removed.

It's since occurred to me that perhaps a queue is not the best way to
handle the completions. (We're ignoring the file system as a solution
for the time being, and focusing on in-memory structures). I'm
wondering now if a simple array of completed items wouldn't be better.
Of course, all the access to the array would have to be thread/process-
proof. As you pointed out, for example, multi-part operations such as
"is such-and-such an ID in the list? If so, remove it and return it"
would have to be treated atomically to avoid concurrency issues.

Any thoughts on this design approach are welcomed :)
Michael
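The two-queue design described above can be sketched with the stdlib multiprocessing module. This is a minimal sketch, not a finished implementation; the names make_report and report_worker are invented stand-ins for the real report logic:

```python
import multiprocessing

def make_report(request_id):
    # Stand-in for the slow report-generation step.
    return "report for %s" % request_id

def report_worker(requests, completions):
    # Pull request IDs until the None sentinel arrives, then exit.
    for request_id in iter(requests.get, None):
        completions.put((request_id, make_report(request_id)))

if __name__ == "__main__":
    requests = multiprocessing.Queue()
    completions = multiprocessing.Queue()
    worker = multiprocessing.Process(target=report_worker,
                                     args=(requests, completions))
    worker.start()
    requests.put("report-42")    # the web request enqueues an ID
    requests.put(None)           # tell the worker to shut down
    print(completions.get(timeout=10))
    worker.join()
```

A None sentinel on the request queue is one common way to shut workers down cleanly; the "peek" problem only appears on the completion side, as discussed below.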
 
MRAB

Veloz said:
Yeah, I hear you. Perhaps a queue is not the best solution. My highest
level use case is this: The user visits a web page (my app is a
Pylons app) and requests a "report" be created. The report takes too
long to create and display on the spot, so the user expects to visit
some url "later" and see if the specific report has completed, and if
so, have it returned to them.

At a lower level, I'm thinking of using some process workers to create
these reports in the background; there'd be a request queue (into
which requests for reports would go, each with an ID) and a completion
queue, into which the workers would write an entry when a report was
created, along with an ID matching the original request.

The "peek" part comes in when the user comes back later to see if
their report is done. That is, in my page controller logic, I'd like
to look through the completed queue and see if the specific report has
been finished (I could tell by matching up the ID of the original
request to the ID in the completed queue). If there was an item in the
queue matching the ID, it would be removed.

It's since occurred to me that perhaps a queue is not the best way to
handle the completions. (We're ignoring the file system as a solution
for the time being, and focusing on in-memory structures). I'm
wondering now if a simple array of completed items wouldn't be better.
Of course, all the access to the array would have to be thread/process-
proof. As you pointed out, for example, multi-part operations such as
"is such-and-such an ID in the list? If so, remove it and return it"
would have to be treated atomically to avoid concurrency issues.

Any thoughts on this design approach are welcomed :)

A set of completed reports, or a dict with the ID as the key? The
advantage of a dict is that the value could contain several bits of
information, such as when it was completed, the status (OK or failed),
etc. You might want to wrap it in a class with locks (mutexes) to ensure
it's threadsafe.
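MRAB's suggestion might look like the following minimal sketch; the class and method names (CompletedReports, mark_done, pop_if_done) are invented for illustration:

```python
import threading

class CompletedReports:
    """A dict of finished reports, wrapped with a lock for thread safety."""

    def __init__(self):
        self._lock = threading.Lock()
        self._reports = {}          # report ID -> result details

    def mark_done(self, report_id, details):
        with self._lock:
            self._reports[report_id] = details

    def pop_if_done(self, report_id):
        # Check-and-remove happens under one lock, so it is atomic.
        with self._lock:
            return self._reports.pop(report_id, None)
```

The value stored per ID can be any object, e.g. a dict holding the completion time and status, as suggested above.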
 
Martin P. Hellwig

On 03/02/10 19:44, MRAB wrote:
information, such as when it was completed, the status (OK or failed),
etc. You might want to wrap it in a class with locks (mutexes) to ensure
it's threadsafe.
What actually happens if multiple threads write to a shared dictionary
at the same time (not using the same key)?

I would think that if the hashing part of the dictionary has some sort
of serialization (please forgive me if I misuse a term), it should 'just
work'(tm)?
 
mk

Daniel said:
On Tue, Mar 2, 2010 at 1:58 PM, Martin P. Hellwig wrote:

What actually happens if multiple threads write to a shared dictionary
at the same time (not using the same key)?
All of Python's built-in types are thread safe. Both updates will happen.

No need to use synchro primitives like locks?

I know that it may work, but that strikes me as somehow wrong... I'm
used to using things like Lock().acquire() and Lock().release() when
accessing shared data structures, whatever they are.

Although trying to do the "right thing" may indeed get one in trouble in
case of deadlock caused by a bug in one's own program.

Regards,
mk
 
John Krukoff

On Tue, 2010-03-02 at 22:54 +0100, mk wrote:
No need to use synchro primitives like locks?

I know that it may work, but that strikes me as somehow wrong... I'm
used to using things like Lock().acquire() and Lock().release() when
accessing shared data structures, whatever they are.
<snip>

This is one of those places where the GIL is a good thing and makes
your life simpler: you could think of it as the interpreter doing the
locking for you for such primitive operations, if you like.
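A small demonstration of the point. This is CPython-specific: each `d[key] = value` runs while holding the GIL, so none of the writes below are lost even though no lock is used:

```python
import threading

shared = {}

def writer(thread_id):
    # Each thread writes to its own distinct keys.
    for i in range(1000):
        shared[(thread_id, i)] = i

threads = [threading.Thread(target=writer, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert len(shared) == 4000   # every write landed, no lock needed
```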
 
MRAB

John said:
On Tue, 2010-03-02 at 22:54 +0100, mk wrote:

<snip>

This is one of those places where the GIL is a good thing and makes
your life simpler: you could think of it as the interpreter doing the
locking for you for such primitive operations, if you like.

I suppose it depends on the complexity of the data structure. A dict's
methods are threadsafe, for example, but if you have a data structure
where access leads to multiple method calls then collectively they need
a lock.
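A sketch of the hazard: "is the key present? if so, remove it" is two separate method calls, and another thread can slip in between them unless a lock covers both (or unless you collapse them into the single call `dict.pop`):

```python
import threading

completed = {"r1": "the report"}
completed_lock = threading.Lock()

def take(report_id):
    # The membership test and the removal happen under one lock,
    # so no other thread can remove the key in between.
    with completed_lock:
        if report_id in completed:
            return completed.pop(report_id)
        return None
```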
 
Gregory Ewing

MRAB said:
I suppose it depends on the complexity of the data structure. A dict's
methods are threadsafe, for example, but if you have a data structure
where access leads to multiple method calls then collectively they need
a lock.

It also depends on the nature of the objects being used
as dict keys. Dict operations invoke the hashing and
comparison methods of the keys, which in general can
execute arbitrary Python code.

If the keys are elementary built-in types such as
strings or ints, then dict operations will *probably*
be atomic. But I think I'd use a lock anyway, to be
on the safe side.
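A contrived key type showing why: the dict operation itself calls back into Python-level `__hash__` and `__eq__`, so the "single operation" is no longer one opaque step:

```python
class NoisyKey:
    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Arbitrary Python code runs here, inside the dict operation.
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, NoisyKey) and self.name == other.name

d = {}
d[NoisyKey("a")] = 1          # invokes NoisyKey.__hash__ during the store
value = d[NoisyKey("a")]      # invokes __hash__ and __eq__ during lookup
```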
 
Veloz

Gregory Ewing said:
It also depends on the nature of the objects being used
as dict keys. Dict operations invoke the hashing and
comparison methods of the keys, which in general can
execute arbitrary Python code.

If the keys are elementary built-in types such as
strings or ints, then dict operations will *probably*
be atomic. But I think I'd use a lock anyway, to be
on the safe side.

Unless I missed where you guys were going, I think we got off the main
point. The main question at hand was this: what's the best way (heck,
any way) to implement a sort of "peek" whereby a number of processes
can write results to some common "object" and some other process can
"peek" into this object, looking for specific items they're interested
in?

I've tried doing this with a queue, as follows: children all write
results to queue, each result has an identifier. Another interested
party, which wants to know if identifier XXX has been placed in the
queue, removes all the items, one by one from the queue, "keeps" the
one matching the identifier (if found) and puts the rest of the items
back on the queue, so other interested parties can also look through
it.

This is not a good solution, but it illustrates what I'm trying to
achieve.

I'm looking at multiprocessing.Manager, ctypes, etc, but nothing's
really jumped out.

I also tried creating my own list class which uses locks to provide a
"peek and remove" method, but I don't have a good way to share an
instance of this object across processes.

Any thoughts would be welcomed!
Michael
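One possible answer, offered as a sketch: multiprocessing.Manager provides a dict proxy that can be shared across processes, and its pop() is serviced as a single request by the manager process, which gives an atomic check-and-remove without the drain-and-refill dance:

```python
import multiprocessing

def worker(completed):
    # A worker process records its finished report under its request ID.
    completed["report-7"] = "the report body"

if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        completed = manager.dict()
        p = multiprocessing.Process(target=worker, args=(completed,))
        p.start()
        p.join()
        # Page controller side: remove and return the report if present.
        print(completed.pop("report-7", None))
```

Each proxy method call is one round trip to the manager, so a single call like pop() is safe; a sequence of calls would still need a manager.Lock() held around it.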
 
Steve Holden

Veloz said:
Unless I missed where you guys were going, I think we got off the main
point. The main question at hand was this: what's the best way (heck,
any way) to implement a sort of "peek" whereby a number of processes
can write results to some common "object" and some other process can
"peek" into this object, looking for specific items they're interested
in?

I've tried doing this with a queue, as follows: children all write
results to queue, each result has an identifier. Another interested
party, which wants to know if identifier XXX has been placed in the
queue, removes all the items, one by one from the queue, "keeps" the
one matching the identifier (if found) and puts the rest of the items
back on the queue, so other interested parties can also look through
it.

This is not a good solution, but it illustrates what I'm trying to
achieve.

I'm looking at multiprocessing.Manager, ctypes, etc, but nothing's
really jumped out.

I also tried creating my own list class which uses locks to provide a
"peek and remove" method, but I don't have a good way to share an
instance of this object across processes.

Any thoughts would be welcomed!
Michael

Sounds to me like you are using one queue when you should be using
several. If a single process wants to distribute information to multiple
clients, have a queue for each client. This makes it easy for each
client to find out if it has work (if it does, its queue is non-empty).

Is this helpful, or a red herring?

regards
Steve
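A sketch of this per-request-queue idea; the names here (run_report, queues) are hypothetical:

```python
import multiprocessing

def run_report(request_id, done_queue):
    # The worker writes only to the queue belonging to this request.
    done_queue.put("report for %s" % request_id)

if __name__ == "__main__":
    queues = {}                       # request ID -> its private queue
    rid = "report-3"
    queues[rid] = multiprocessing.Queue()
    p = multiprocessing.Process(target=run_report,
                                args=(rid, queues[rid]))
    p.start()
    p.join()
    # The controller looks only at this request's queue; a non-empty
    # queue means the report is ready, so no peeking is required.
    print(queues[rid].get(timeout=10))
```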
 
mk

Veloz said:
Unless I missed where you guys were going, I think we got off the main
point. The main question at hand was this: what's the best way (heck,
any way) to implement a sort of "peek" whereby a number of processes
can write results to some common "object" and some other process can
"peek" into this object, looking for specific items they're interested
in?

I have used threads so far and not multiprocessing (much), so I'll
describe it using threads:

If I were you, I'd do it the simplest possible way, with dictionary and
lock, in the following way:

1. Have shared "queue" of tasks, say, a dictionary.

2. Each worker thread and central "queue manager" access the queue in
the same manner:

....do your stuff, prepare yourself

acquire a lock

update or read value from queue

....don't do much here, so you can release the lock quickly

release the lock

....process value

If you kept your dictionary simple, it could work the way Daniel
Stutzbach wrote: updates are thread-safe, so perhaps you could even
dispense with locks and just read the dictionary. If you did want to
use locks, for "peek"ing you could write a function that e.g. acquires
the lock, makes a copy of the queue, releases the lock, and returns the
copy to the caller wanting to examine the queue.

I have used this approach (although with a list, not a dictionary -- I
haven't had the need to do extensive searching by keys) and so far it
has worked for me.
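The "copy under the lock" peek described above, sketched with a dict:

```python
import threading

queue_lock = threading.Lock()
task_queue = {"r1": "pending", "r2": "done"}   # shared between threads

def peek_snapshot():
    # Hold the lock only long enough to copy; callers then examine
    # the snapshot at leisure without blocking the worker threads.
    with queue_lock:
        return dict(task_queue)

snapshot = peek_snapshot()
```

Mutating the snapshot afterwards does not affect the shared structure, which is exactly what makes it a safe "peek".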

Regards,
mk
 
Floris Bruynooghe

Raymond Hettinger said:
I'm curious about your use case. Why peek at something
that could be gone by the time you want to use it?

  val = q.peek()
  if something_i_want(val):
       v2 = q.get()         # this could be different than val

Wouldn't it be better to just get() the value and return if you don't
need it?

  val = q.peek()
  if not something_i_want(val):
      q.put(val)

What I have found myself wanting when thinking of this pattern is a
"q.put_at_front_of_queue(val)" method. I've never actually been able
to use this, because no such method exists. Not that it's much of an
issue: I've never been completely stuck, and I've usually found a way
to solve whatever I was trying to do without peeking, which could be
argued to be better design in the first place. I was just wondering
whether other people ever missed the "q.put_at_front_of_queue()"
method, or whether it's just me.

Regards
Floris

PS: assuming "val = q.get()" on the first line
 
Gregory Ewing

Floris said:
I was just wondering if
other people ever missed the "q.put_at_front_of_queue()" method or if
it is just me.

Sounds like you don't want a queue, but a stack. Or
maybe a double-ended queue.
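For completeness, collections.deque has exactly the missing operation (appendleft), though a plain deque can only be shared between threads, not across processes:

```python
from collections import deque

q = deque(["first", "second"])

val = q.popleft()        # the q.get() step
q.appendleft(val)        # the missing "put_at_front_of_queue"

assert list(q) == ["first", "second"]   # order is preserved
```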
 
Aahz

Veloz said:
The "peek" part comes in when the user comes back later to see if
their report is done. That is, in my page controller logic, I'd like
to look through the completed queue and see if the specific report has
been finished (I could tell by matching up the ID of the original
request to the ID in the completed queue). If there was an item in the
queue matching the ID, it would be removed.

Here's the question: what happens when the user refreshes the "report
done" page? The problem with the way you're doing it is that checking
for report done is a one-shot operation. You probably want to use a
database to cache this and have some kind of cache expiration.
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"Many customs in this life persist because they ease friction and promote
productivity as a result of universal agreement, and whether they are
precisely the optimal choices is much less important." --Henry Spencer
 
