Pipelined Processing

Discussion in 'Ruby' started by Robert Klemme, Nov 20, 2005.

  1. Hi,

    this came up recently on IRC: the question was how to chain processing so
    that each step runs concurrently with the other steps. While I don't see a
    real benefit as long as there are no native threads in Ruby, I played
    around a bit and this is the result (attached). There's certainly room for
    improvement. Do with this whatever you like.

    Kind regards

    robert
     
    Robert Klemme, Nov 20, 2005
    #1
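
    The attachment is not reproduced in this thread. Purely as an
    illustration, and not the attached code, chaining steps so that each one
    runs in its own thread could be sketched as below. The names pipeline and
    :done are made up for this example, and the sketch assumes that no item
    is itself equal to :done.

```ruby
# Sketch only: `pipeline` and the :done marker are illustrative names,
# not taken from the code attached to the original post.

# Runs each stage (a callable) in its own thread. Neighbouring stages are
# connected by a Queue, so stage N+1 can work on one item while stage N
# is already busy with the next one.
def pipeline(input, *stages)
  first = Queue.new
  last = stages.inject(first) do |inbox, stage|
    outbox = Queue.new
    Thread.new do
      # Queue#pop blocks until an item is available.
      while (item = inbox.pop) != :done
        outbox << stage.call(item)
      end
      outbox << :done # pass the end marker downstream
    end
    outbox
  end

  input.each { |item| first << item }
  first << :done

  # Collect whatever falls out of the last stage.
  results = []
  while (item = last.pop) != :done
    results << item
  end
  results
end

p pipeline(1..5, ->(x) { x * 2 }, ->(x) { x + 1 }) # => [3, 5, 7, 9, 11]
```

    With one thread per stage and FIFO queues in between, the results come
    out in input order. A stage that raises would leave its successors
    waiting, so real code would need error handling.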

  2. Paulus Esterhazy wrote:
    >>> this came up recently on IRC: the question was how to chain
    >>> processing so that each step runs concurrently with the other steps.
    >>> While I don't see a real benefit as long as there are no native
    >>> threads in Ruby, I played around a bit and this is the result
    >>> (attached). There's certainly room for improvement. Do with this
    >>> whatever you like.

    >
    > Hello,
    >
    > it's funny you should bring this up because I just cooked up something
    > similar (albeit less sophisticated and robust). What I needed was
    > several tasks being carried out in parallel.


    It's not really similar. My program executes several different stages of
    processing in parallel, whereas you are actually doing similar things in
    parallel. At least that's what I understood from your explanation and code.

    > In this particular
    > instance I download several pages using open-uri and output a chunk
    > to webrick as soon as its processing is finished (as responsiveness
    > is critical). For network I/O, ruby's pseudo-threads seem to work
    > well, though I agree native threads would be much better.


    True, as soon as slow IO is involved ruby threads do ok as long as the
    processing doesn't use too many resources.

    > Attached the ParallelEnumerate class along with a trivial test case.
    > Bear with me, it's my first ruby script.


    You try to tackle multithreading with your first script? Wow! I don't
    exactly understand why you use a thread per collection just to fill a queue.
    Maybe I'm missing something here but it looks a bit strange. Are you sure
    this actually downloads in parallel?

    If I had done this, I'd have taken a different approach (but maybe I'm
    missing some of your requirements): I'd create a queue which receives URLs
    (or whatever tasks you have). Then I'd set up n threads (n>0, probably
    depending on user input), and each thread reads elements from the queue and
    processes them in parallel. You might as well combine both approaches, i.e.
    once a URL has been downloaded, the content is pushed onto another queue
    from which another number of threads (possibly just 1) reads and processes.

    Kind regards

    robert
     
    Robert Klemme, Nov 20, 2005
    #2
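
    The queue-plus-n-threads approach from post #2 could look roughly like
    the sketch below. It is one possible reading of the description, not
    code from the thread: process_all and the workers: keyword are
    hypothetical names, the block stands in for the real task (such as
    downloading a URL), and Queue#close needs Ruby 2.3 or later.

```ruby
# Sketch only: `process_all` and `workers:` are illustrative names.
# The block passed in stands in for the real work on one task.
def process_all(tasks, workers: 3)
  queue = Queue.new
  tasks.each { |task| queue << task }
  queue.close # once closed and drained, pop returns nil instead of blocking

  results = Queue.new
  pool = Array.new(workers) do
    Thread.new do
      # Each worker keeps taking tasks until the queue is used up.
      # (A nil or false task would also end the loop in this sketch.)
      while (task = queue.pop)
        results << [task, yield(task)]
      end
    end
  end
  pool.each(&:join)

  # Results arrive in completion order, not in input order.
  Array.new(results.size) { results.pop }
end

lengths = process_all(%w[a bb ccc dddd], workers: 2) { |word| word.length }
p lengths.sort # => [["a", 1], ["bb", 2], ["ccc", 3], ["dddd", 4]]
```

    To combine both approaches as described, the [task, result] pairs would
    go onto a second queue that another pool (possibly a single thread)
    reads from.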

  3. Robert Klemme wrote:
    >> Attached the ParallelEnumerate class along with a trivial test case.
    >> Bear with me, it's my first ruby script.

    > You try to tackle multithreading with your first script? Wow! I don't
    > exactly understand why you use a thread per collection just to fill a
    > queue. Maybe I'm missing something here but it looks a bit strange. Are
    > you sure this actually downloads in parallel?


    Yes, it works for the purpose. The "collections" look like this:

    require 'open-uri'

    class Source
      include Enumerable

      def each
        @data = URI.open("http://...").read
        # Assumes next_element returns nil once @data is used up.
        while (element = next_element)
          yield element
        end
      end

      def next_element
        # process @data and return the next element
      end
    end

    It would probably have been cleaner to do this using a thread pool for
    downloading the pages and processing the data in the main thread, as you
    suggest, separating the stages. I used an enumerator because it's
    convenient: I wrap the enumerator in a pseudo IO object which I return
    to webrick (which expects an object that supports the method "read").
    That way, I get a kind of simple asynchronous data processing.

    >
    > If I had done this, I'd have taken a different approach (but maybe I'm
    > missing some of your requirements): I'd create a queue which receives
    > URLs (or whatever tasks you have). Then I'd set up n threads (n>0,
    > probably depending on user input), and each thread reads elements from
    > the queue and processes them in parallel. You might as well combine
    > both approaches, i.e. once a URL has been downloaded, the content is
    > pushed onto another queue from which another number of threads (possibly
    > just 1) reads and processes.


    Thanks for the comment,
    Paulus
     
    Paulus Esterhazy, Nov 20, 2005
    #3
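
    The "pseudo IO object" from post #3 is not shown in the thread. A
    minimal sketch of the idea, assuming the consumer only ever calls read,
    might look like this. EnumeratorIO is a hypothetical name, the read
    semantics are modelled loosely on IO#read, and lengths are counted in
    characters rather than bytes to keep the example short.

```ruby
# Sketch only: EnumeratorIO is a hypothetical name. It wraps anything
# enumerable and hands its elements out through an IO-like `read`.
class EnumeratorIO
  def initialize(enumerable)
    @source = enumerable.to_enum # external iteration via Enumerator#next
    @buffer = +""
    @eof = false
  end

  # read(length) returns up to `length` characters, or nil at the end.
  # read without an argument returns everything that is left.
  def read(length = nil)
    fill(length)
    if length.nil?
      rest, @buffer = @buffer, +""
      return rest
    end
    return nil if @buffer.empty? && length > 0
    @buffer.slice!(0, length)
  end

  private

  # Pulls elements only until the request can be answered, so the source
  # is consumed lazily.
  def fill(length)
    until @eof || (length && @buffer.size >= length)
      @buffer << @source.next.to_s
    end
  rescue StopIteration
    @eof = true
  end
end

io = EnumeratorIO.new(%w[alpha beta gamma])
p io.read(7) # => "alphabe"
p io.read    # => "tagamma"
p io.read(1) # => nil
```

    Because fill stops as soon as enough data is buffered, a slow source
    (such as a download) is only advanced when the reader asks for more.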
  4. On 11/20/05, Robert Klemme wrote:
    >
    > Hi,
    >
    > this came up recently on IRC: the question was how to chain processing so
    > that each step runs concurrently with the other steps. While I don't see
    > a real benefit as long as there are no native threads in Ruby


    One idea that springs to mind is to keep a monitor thread (e.g. GUI,
    console, remote) responsive while performing a batch process, by
    splitting up an intensive computation into chunks that are friendly to
    the threading environment, represented by the blocks.

    > Do
    > with this whatever you like.
    >
    > Kind regards
    >
    > robert


    Thanks! I will :)

    Regards,

    Sean
     
    Sean O'Halpin, Nov 20, 2005
    #4
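
    The idea from post #4, a long computation cut into chunks so that a
    monitor stays responsive, might be sketched as follows. This is an
    illustration under assumptions: sum_in_chunks, the chunk size, and the
    summing workload are all invented for the example, and the calling
    thread stands in for the GUI or console monitor.

```ruby
# Sketch only: `sum_in_chunks` and the chunk size are illustrative names
# and values, not something from the thread.
def sum_in_chunks(numbers, chunk_size: 1_000)
  progress = Queue.new

  worker = Thread.new do
    begin
      total = 0
      numbers.each_slice(chunk_size) do |chunk|
        total += chunk.sum      # one small unit of the big computation
        progress << chunk.size  # tell the monitor how much was done
        Thread.pass             # give other threads a chance to run
      end
      total
    ensure
      progress.close            # lets the monitor loop below finish
    end
  end

  # The calling thread plays the monitor: it wakes up after every chunk
  # and could just as well redraw a GUI or answer a console command here.
  done = 0
  while (count = progress.pop)
    done += count
    yield done if block_given?
  end
  worker.value # the worker block's return value, i.e. the total
end

reports = []
total = sum_in_chunks(1..10_000, chunk_size: 2_500) { |done| reports << done }
p total   # => 50005000
p reports # => [2500, 5000, 7500, 10000]
```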
  5. Robert Klemme wrote:
    >Hi,
    >
    >this came up recently on IRC: the question was how to chain processing so
    >that each step runs concurrently with the other steps. While I don't see a
    >real benefit as long as there are no native threads in Ruby, I played
    >around a bit and this is the result (attached). There's certainly room for
    >improvement. Do with this whatever you like.
    >


    True, without native threads you won't really gain any performance, but what
    if (to improve performance) you were to either:
    1) launch new processes instead of threads, or
    2) set things up so that different stages of the pipeline can run on different
    machines (maybe using DRb)?

    Of course, the amount of information passed between stages of the pipeline
    would need to be small so that the communication overhead stays low.


    Phil
     
    Phil Tomson, Nov 21, 2005
    #5
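
    Option 1 from post #5, processes instead of threads, could be sketched
    with fork and pipes as below. This is an assumption-laden illustration,
    not a tested design from the thread: fork_stage and process_pipeline are
    hypothetical names, it needs a platform that supports fork (so not
    Windows), and items travel between stages as lines of text, which is why
    each stage converts its input back from a string.

```ruby
# Sketch only: fork_stage and process_pipeline are illustrative names.

# Forks a child that runs the given block with (upstream, writer) and
# returns the read end of the child's output pipe to the caller.
def fork_stage(upstream)
  reader, writer = IO.pipe
  fork do
    reader.close
    yield upstream, writer
    writer.close # flushes buffered output and signals EOF downstream
    exit!(0)     # leave without running the parent's at_exit handlers
  end
  # The parent keeps only the new read end. Closing its other copies
  # matters: a pipe reports EOF only once every write end is closed.
  writer.close
  upstream&.close
  reader
end

def process_pipeline(items, *stages)
  # First child: feeds the items into the pipeline, one per line.
  out = fork_stage(nil) { |_, w| items.each { |item| w.puts(item) } }
  # One more child per stage, each reading its predecessor's pipe.
  stages.each do |stage|
    out = fork_stage(out) do |r, w|
      r.each_line { |line| w.puts(stage.call(line.chomp)) }
    end
  end
  results = out.each_line.map(&:chomp)
  out.close
  Process.waitall # reap the child processes
  results
end

p process_pipeline(1..3, ->(s) { Integer(s) * 2 }, ->(s) { Integer(s) + 1 })
# => ["3", "5", "7"]
```

    Every item is serialized and copied through a pipe at each hop, which is
    the communication overhead the post warns about.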