Re: Multiprocessing.Queue - I want to end.

Discussion in 'Python' started by Hendrik van Rooyen, May 1, 2009.

  1. "Luis Zarrabeitia" <> wrote:

    8< -------explanation and example of one producer, --------
    8< -------more consumers and one queue --------------------

    >As you can see, I'm sending one 'None' per consumer, and hoping that no
    >consumer will read more than one None. While this particular implementation


    You don't have to hope. You can write the consumers that way to guarantee it.

    >ensures that, it is very fragile. Is there any way to signal the consumers?


    Signalling is not easy - you can signal a process, but I doubt if it is
    possible to signal a thread in a process.

    >(or better yet, the queue itself, as it is shared by all consumers?)
    >Should "close" work for this? (raise the exception when the queue is
    >exhausted, not when it is closed by the producer).


    I haven't the foggiest if this will work, and it seems to me to be kind
    of involved compared to passing a sentinel or sentinels.

    And while we are on the subject - Passing None as a sentinel is IMO as
    good as or better than passing "XXZulu This is the End uluZXX",
    or any other imaginative string that is not likely to occur naturally
    in the input.

    I have always wondered why people do the one queue many getters thing.

    Given that the stuff you pass is homogenous in that it will require a
    similar amount of effort to process, is there not a case to be made
    to have as many queues as consumers, and to round robin the work?

    And if the stuff you pass around needs disparate effort to consume,
    it seems to me that you can more easily balance the load by having
    specialised consumers, instead of instances of one humungous
    "I can eat anything" consumer.

    I also think that having a queue per consumer thread makes it easier
    to replace the threads with processes and the queues with pipes or
    sockets if you need to do serious scaling later.

    In fact I happen to believe that anything that does any work needs
    one and only one input queue and nothing else, but I am peculiar
    that way.
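
A rough sketch of the queue-per-consumer round robin described above (illustrative only; `itertools.cycle` deals the work units out in turn, and squaring stands in for real work):

```python
import itertools
import multiprocessing as mp

def consumer(q, results):
    while True:
        item = q.get()
        if item is None:  # each consumer owns its queue, so one sentinel each
            break
        results.put(item * item)  # stand-in for real work

if __name__ == "__main__":
    results = mp.Queue()
    queues = [mp.Queue() for _ in range(3)]
    workers = [mp.Process(target=consumer, args=(q, results)) for q in queues]
    for w in workers:
        w.start()
    # round-robin the work units across the per-consumer queues
    for item, q in zip(range(12), itertools.cycle(queues)):
        q.put(item)
    for q in queues:
        q.put(None)
    for w in workers:
        w.join()
```

The scaling point made above shows here: each consumer's loop only sees its own queue, so swapping the queue for a pipe or socket later would not touch the consumer code.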

    - Hendrik
    Hendrik van Rooyen, May 1, 2009
    #1

  2. Hendrik van Rooyen schreef:
    > I have always wondered why people do the one queue many getters thing.


    Because IMO it's the simplest and most elegant solution.
    >
    > Given that the stuff you pass is homogenous in that it will require a
    > similar amount of effort to process, is there not a case to be made
    > to have as many queues as consumers, and to round robin the work?


    Could work if the processing time for each work unit is exactly the same
    (otherwise one or more consumers will be idle part of the time), but in
    most cases that is not guaranteed. A simple example is fetching data
    over the network: even if the data size is always the same, there will
    be differences because of network load variations.

    If you use one queue, each consumer fetches a new work unit as soon as it
    has consumed the previous one. All consumers will be working as long as
    there is work to do, without having to write any code to do the load
    balancing.

    With one queue for each consumer, you either have to assume that the
    average processing time is the same (otherwise some consumers will be
    idle at the end, while others are still busy processing work units), or
    you need some clever code in the producer(s) or the driving code to
    balance the loads. That's extra complexity for little or no benefit.

    I like the simplicity of having one queue: the producer(s) put work
    units on the queue with no concern which consumer will process them or
    how many consumers there even are; likewise the consumer(s) don't know
    and don't need to know where their work units come from. And the work
    gets automatically distributed to whichever consumer has first finished
    its previous work unit.
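
The self-balancing behaviour described here can be seen in a small sketch (illustrative only; sleeps stand in for variable processing cost, and each result is tagged with the worker that produced it):

```python
import multiprocessing as mp
import time

def consumer(name, q, results):
    # A free consumer grabs the next unit immediately, so a slow unit
    # delays only the worker that happened to pick it up.
    while True:
        item = q.get()
        if item is None:
            break
        time.sleep(item * 0.01)  # simulate variable processing cost
        results.put((name, item))

if __name__ == "__main__":
    q, results = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=consumer, args=(i, q, results))
               for i in range(3)]
    for w in workers:
        w.start()
    for item in [5, 1, 1, 1, 1, 1]:  # one slow unit among cheap ones
        q.put(item)
    for _ in workers:
        q.put(None)
    for w in workers:
        w.join()
    done = [results.get() for _ in range(6)]
    # every unit gets processed; the cheap units are shared out among
    # whichever workers are not stuck on the slow one
    print(sorted(item for _, item in done))
```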

    > And if the stuff you pass around needs disparate effort to consume,
    > it seems to me that you can more easily balance the load by having
    > specialised consumers, instead of instances of one humungous
    > "I can eat anything" consumer.


    If there is a semantic difference, maybe yes; but I think it makes no
    sense to differentiate purely on the expected execution times.

    > I also think that having a queue per consumer thread makes it easier
    > to replace the threads with processes and the queues with pipes or
    > sockets if you need to do serious scaling later.


    Perhaps, but isn't that a case of YAGNI and/or premature optimization?


    --
    The saddest aspect of life right now is that science gathers knowledge
    faster than society gathers wisdom.
    -- Isaac Asimov

    Roel Schroeven
    Roel Schroeven, May 1, 2009
    #2

  3. : "Roel Schroeven" <> wrote:


    > Hendrik van Rooyen schreef:
    > > I have always wondered why people do the one queue many getters thing.

    >
    > Because IMO it's the simplest and most elegant solution.


    That is fair enough...

    > >
    > > Given that the stuff you pass is homogenous in that it will require a
    > > similar amount of effort to process, is there not a case to be made
    > > to have as many queues as consumers, and to round robin the work?

    >
    > Could work if the processing time for each work unit is exactly the same
    > (otherwise one or more consumers will be idle part of the time), but in
    > most cases that is not guaranteed. A simple example is fetching data
    > over the network: even if the data size is always the same, there will
    > be differences because of network load variations.
    >
    > If you use one queue, each consumer fetches a new work unit as soon as it
    > has consumed the previous one. All consumers will be working as long as
    > there is work to do, without having to write any code to do the load
    > balancing.
    >
    > With one queue for each consumer, you either have to assume that the
    > average processing time is the same (otherwise some consumers will be
    > idle at the end, while others are still busy processing work units), or
    > you need some clever code in the producer(s) or the driving code to
    > balance the loads. That's extra complexity for little or no benefit.
    >
    > I like the simplicity of having one queue: the producer(s) put work
    > units on the queue with no concern which consumer will process them or
    > how many consumers there even are; likewise the consumer(s) don't know
    > and don't need to know where their work units come from. And the work
    > gets automatically distributed to whichever consumer has first finished
    > its previous work unit.


    This is all true in the case of a job that starts, runs and finishes.
    I am not so sure it applies to something that has a long life.

    >
    > > And if the stuff you pass around needs disparate effort to consume,
    > > it seems to me that you can more easily balance the load by having
    > > specialised consumers, instead of instances of one humungous
    > > "I can eat anything" consumer.

    >
    > If there is a semantic difference, maybe yes; but I think it makes no
    > sense to differentiate purely on the expected execution times.


    The idea is basically that you have the code that classifies in one
    place only, instead of running in all the instances of the consumer.
    Feels better to me, somehow.

    >
    > > I also think that having a queue per consumer thread makes it easier
    > > to replace the threads with processes and the queues with pipes or
    > > sockets if you need to do serious scaling later.

    >
    > Perhaps, but isn't that a case of YAGNI and/or premature optimization?


    Yes and no:

    Yes - You Are Gonna Need It.
    and
    No it is never premature to use a decent structure.

    :)

    - Hendrik
    Hendrik van Rooyen, May 2, 2009
    #3
  4. Hendrik van Rooyen schreef:
    > : "Roel Schroeven" <> wrote:


    >> ...


    > This is all true in the case of a job that starts, runs and finishes.
    > I am not so sure it applies to something that has a long life.


    It's true that I'm talking about work units with relatively short
    lifetimes, mostly a few seconds but perhaps maximum about ten minutes. I
    assumed that queues are mostly used for that kind of stuff. I've never
    really thought about cases where that assumption doesn't hold, so it's
    very well possible that all I've said is invalid in other cases.

    >>> And if the stuff you pass around needs disparate effort to consume,
    >>> it seems to me that you can more easily balance the load by having
    >>> specialised consumers, instead of instances of one humungous
    >>> "I can eat anything" consumer.

    >> If there is a semantic difference, maybe yes; but I think it makes no
    >> sense to differentiate purely on the expected execution times.

    >
    > The idea is basically that you have the code that classifies in one
    > place only, instead of running in all the instances of the consumer.
    > Feels better to me, somehow.


    In most cases that I can imagine (and certainly in all cases I've used),
    no classification whatsoever is even needed.

    --
    The saddest aspect of life right now is that science gathers knowledge
    faster than society gathers wisdom.
    -- Isaac Asimov

    Roel Schroeven
    Roel Schroeven, May 2, 2009
    #4
  5. Dave Angel wrote:

    Hendrik van Rooyen wrote:
    > : "Roel Schroeven" <> wrote:
    >
    >
    >
    >> Hendrik van Rooyen schreef:
    >>
    >>> I have always wondered why people do the one queue many getters thing.
    >>>

    >> Because IMO it's the simplest and most elegant solution.
    >>

    >
    > That is fair enough...
    >
    >
    >>> Given that the stuff you pass is homogenous in that it will require a
    >>> similar amount of effort to process, is there not a case to be made
    >>> to have as many queues as consumers, and to round robin the work?
    >>>

    >> Could work if the processing time for each work unit is exactly the same
    >> (otherwise one or more consumers will be idle part of the time), but in
    >> most cases that is not guaranteed. A simple example is fetching data
    >> over the network: even if the data size is always the same, there will
    >> be differences because of network load variations.
    >>
    >> If you use one queue, each consumer fetches a new work unit as soon as it
    >> has consumed the previous one. All consumers will be working as long as
    >> there is work to do, without having to write any code to do the load
    >> balancing.
    >>
    >> With one queue for each consumer, you either have to assume that the
    >> average processing time is the same (otherwise some consumers will be
    >> idle at the end, while others are still busy processing work units), or
    >> you need some clever code in the producer(s) or the driving code to
    >> balance the loads. That's extra complexity for little or no benefit.
    >>
    >> I like the simplicity of having one queue: the producer(s) put work
    >> units on the queue with no concern which consumer will process them or
    >> how many consumers there even are; likewise the consumer(s) don't know
    >> and don't need to know where their work units come from. And the work
    >> gets automatically distributed to whichever consumer has first finished
    >> its previous work unit.
    >>

    >
    > This is all true in the case of a job that starts, runs and finishes.
    > I am not so sure it applies to something that has a long life.
    >
    >
    >>> And if the stuff you pass around needs disparate effort to consume,
    >>> it seems to me that you can more easily balance the load by having
    >>> specialised consumers, instead of instances of one humungous "I can
    >>> eat anything" consumer.
    >>>

    >> If there is a semantic difference, maybe yes; but I think it makes no
    >> sense to differentiate purely on the expected execution times.
    >>

    >
    > The idea is basically that you have the code that classifies in one
    > place only, instead of running in all the instances of the consumer.
    > Feels better to me, somehow.
    > <snip>

    If the classifying you're doing is just based on expected time to
    consume the item, then I think your plan to use separate queues is
    misguided.

    If the consumers are interchangeable in their abilities, then feeding
    them from a single queue is more efficient in average wait time,
    worst-case wait time, and consumer utilization, in nearly all
    non-pathological scenarios. Think of the line at the bank. There's a good
    reason they now have a single line for multiple tellers. If you have
    five tellers, and one of your transactions is really slow, the rest of
    the line is only slowed down by 20%, rather than a few people being
    slowed down by a substantial amount because they happen to be behind the
    slowpoke. 30 years ago, they'd have individual lines, and I tried in
    vain to explain queuing theory to the bank manager.

    Having said that, notice that sometimes the consumers in a computer are
    not independent. If you're running 20 threads with this model, on a
    processor with only two cores, and if the tasks are CPU bound, you're
    wasting lots of thread-management time without gaining anything.
    Similarly, if there are other shared resources that all the threads trip
    over, it may not pay to do as many of them in parallel. But for
    homogeneous consumers, do them with a single queue, and do benchmarks to
    determine the optimum number of consumers to start.
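
The closing advice, benchmark to find the right number of consumers, can be sketched with multiprocessing.Pool (the task and item sizes here are made up; the right worker count is whatever the timings show on your machine, typically near the core count for CPU-bound work):

```python
import multiprocessing as mp
import time

def work(item):
    # stand-in CPU-bound task
    return sum(i * i for i in range(item))

if __name__ == "__main__":
    items = [50_000] * 20
    # Time a few pool sizes to find the sweet spot for this machine.
    for n in (1, 2, mp.cpu_count()):
        start = time.perf_counter()
        with mp.Pool(n) as pool:
            pool.map(work, items)
        print(n, "workers:", round(time.perf_counter() - start, 3), "s")
```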
    Dave Angel, May 2, 2009
    #5
