Pipelined Processing

Robert Klemme · Nov 20, 2005

Hi,

this came up recently on IRC: question was how to chain processing so that
each step runs concurrently to other steps. While I don't see real benefit
as long as there are no native threads in Ruby I played around a bit and
this is the result (attached). There's certainly room for improvement. Do
with this whatever you like.
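
The attachment is not reproduced in this archive. As a rough illustration of the idea only (not the attached code), here is a minimal sketch that chains stages through queues, one thread per stage. The names run_pipeline and PIPELINE_DONE are made up for this example, and it assumes no stage raises an exception.

```ruby
# Hypothetical sketch, not the attached script.
# Each stage runs in its own thread; stages are linked by queues.
PIPELINE_DONE = Object.new # unique end-of-input marker

def run_pipeline(input, *stages)
  first = Queue.new
  # Build the chain: every stage reads from the previous queue
  # and writes to a fresh one, which becomes the next stage's input.
  last = stages.inject(first) do |inq, stage|
    outq = Queue.new
    Thread.new do
      until (item = inq.pop).equal?(PIPELINE_DONE)
        outq << stage.call(item)
      end
      outq << PIPELINE_DONE # pass the marker downstream
    end
    outq
  end

  input.each { |x| first << x }
  first << PIPELINE_DONE

  # Collect results from the final queue until the marker arrives.
  results = []
  until (item = last.pop).equal?(PIPELINE_DONE)
    results << item
  end
  results
end
```

With one thread per stage the order of items is preserved, e.g. run_pipeline(1..5, ->(x) { x + 1 }, ->(x) { x * 2 }) first adds one and then doubles each element.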

Kind regards

robert

Robert Klemme · Nov 20, 2005

Paulus Esterhazy said:
Hello,

it's funny you should bring this up because I just cooked up something
similar (albeit less sophisticated and robust). What I needed was
several tasks being carried out in parallel.

It's not really similar. While my program executes several stages of
processing in parallel you are actually doing similar things in parallel.
At least that's what I understood from your explanation and code.

In this particular
instance I download several pages using open-uri and output a chunk
to webrick as soon as its processing is finished (as responsiveness
is critical). For network I/O, ruby's pseudo-threads seem to work
well, though I agree native threads would be much better.

True, as soon as slow IO is involved ruby threads do ok as long as the
processing doesn't use too many resources.

Attached the ParallelEnumerate class along with a trivial test case.
Bear with me, it's my first ruby script.

You try to tackle multithreading with your first script? Wow! I don't
exactly understand why you use a thread per collection just to fill a queue.
Maybe I'm missing something here but it looks a bit strange. Are you sure
this actually downloads in parallel?

If I had done this I would have taken a different approach (but maybe I'm
missing some of your requirements): I'd create a queue which receives URLs
(or whatever tasks you have). Then I'd set up n threads (n>0, probably
depending on user input) and each thread reads elements from the queue and
processes them in parallel. You might as well combine both approaches, i.e.
if a URL has been downloaded, the content is pushed onto another queue from
which another number of threads (possibly just 1) reads and processes.
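
A minimal sketch of the queue-plus-n-threads approach described above. The helper name process_in_pool and the :stop marker are assumptions for this example; the block stands in for whatever work is done per task (e.g. a download), and the results land on a second queue as suggested.

```ruby
# Hypothetical helper: n worker threads pull tasks from a shared queue
# and push [task, result] pairs onto a second queue.
def process_in_pool(tasks, n = 4, &work)
  raise ArgumentError, "n must be > 0" unless n > 0

  todo = Queue.new
  done = Queue.new
  tasks.each { |t| todo << t }
  n.times { todo << :stop } # one stop marker per worker

  workers = Array.new(n) do
    Thread.new do
      # Each worker keeps taking tasks until it sees its stop marker.
      while (task = todo.pop) != :stop
        done << [task, work.call(task)]
      end
    end
  end
  workers.each(&:join)

  # Drain the result queue; the order depends on thread scheduling.
  Array.new(done.size) { done.pop }
end
```

Usage could look like process_in_pool(urls, 3) { |url| fetch(url) }, where urls and fetch are placeholders for your own task list and download code. Note that a task equal to :stop would end a worker early in this sketch.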

Kind regards

robert

Paulus Esterhazy · Nov 20, 2005

Robert said:
You try to tackle multithreading with your first script? Wow! I don't
exactly understand why you use a thread per collection just to fill a
queue. Maybe I'm missing something here but it looks a bit strange. Are
you sure this actually downloads in parallel?

Yes it works for the purpose. The "collections" look like this:

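class Source
  include Enumerable

  def each
    # open_url is the poster's own helper (not shown), presumably built on open-uri
    @data = open_url("http://...").read
    # next_element is expected to return nil once @data is exhausted
    while (element = next_element)
      yield element
    end
  end

  def next_element
    # process @data
  end
end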

It would probably have been cleaner to do this using a thread pool for
downloading the pages and processing the data in the main thread, as you
suggest - separating the stages. I used an enumerator because it's
convenient - I wrap the enumerator in a pseudo IO object which I return
to webrick (which expects an object that supports the method "read").
That way, I get a kind of simple asynchronous data processing.
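
To illustrate the pseudo IO idea, here is a minimal sketch of an object that exposes any Enumerable of string chunks through a "read" method. The class name EnumReader is invented for this example and this is not the original ParallelEnumerate code; it only approximates IO#read and counts characters rather than bytes.

```ruby
# Hypothetical wrapper: serves chunks from an Enumerable via #read.
class EnumReader
  def initialize(enumerable)
    @enum = enumerable.to_enum # external enumerator, chunks produced on demand
    @buffer = +""
  end

  # Roughly like IO#read: without a length, return everything that is left;
  # with a length, return up to that many characters, or nil at the end.
  def read(length = nil)
    if length.nil?
      loop { @buffer << @enum.next } # loop stops quietly on StopIteration
      out = @buffer
      @buffer = +""
      return out
    end

    while @buffer.size < length
      begin
        @buffer << @enum.next
      rescue StopIteration
        break # source exhausted, serve what is buffered
      end
    end
    return nil if @buffer.empty?

    @buffer.slice!(0, length)
  end
end
```

Because chunks are only pulled when "read" is called, the consumer can start sending data before the source has produced everything.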

If I had done this I would have taken a different approach (but maybe I'm
missing some of your requirements): I'd create a queue which receives
URLs (or whatever tasks you have). Then I'd set up n threads (n>0,
probably depending on user input) and each thread reads elements from
the queue and processes them in parallel. You might as well combine
both approaches, i.e. if a URL has been downloaded, the content is
pushed onto another queue from which another number of threads (possibly
just 1) reads and processes.

Thanks for the comment,
Paulus

Sean O'Halpin · Nov 20, 2005

Hi,

this came up recently on IRC: question was how to chain processing so that
each step runs concurrently to other steps. While I don't see real benefit
as long as there are no native threads in Ruby

One idea that springs to mind is to keep a responsive monitor thread
(e.g. GUI, console, remote) while performing a batch process by
splitting up an intensive computation into environmentally
thread-friendly chunks, represented by the blocks.
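
A minimal sketch of that idea, assuming a toy computation (sum of squares) split into slices, with a second thread that samples progress while the work runs. The helper name chunked_sum_with_monitor is hypothetical, and a real monitor would update a GUI or console instead of collecting samples.

```ruby
# Hypothetical helper: do the work chunk by chunk in one thread while a
# monitor thread stays responsive and records how far the work has come.
def chunked_sum_with_monitor(numbers, chunk_size = 1000)
  progress = 0
  samples = []

  worker = Thread.new do
    total = 0
    numbers.each_slice(chunk_size) do |chunk|
      total += chunk.sum { |x| x * x }
      progress += chunk.size
      Thread.pass # let other threads run between chunks
    end
    total
  end

  monitor = Thread.new do
    while worker.alive?
      samples << progress # stand-in for updating a display
      sleep 0.01
    end
  end

  total = worker.value # waits for the worker and returns its result
  monitor.join
  [total, samples]
end
```

The smaller the chunks, the more often the monitor gets a chance to run, at the price of some switching overhead.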

Do
with this whatever you like.

Kind regards

robert

Thanks! I will.

Regards,

Sean

Phil Tomson · Nov 21, 2005

Hi,

this came up recently on IRC: question was how to chain processing so that
each step runs concurrently to other steps. While I don't see real benefit
as long as there are no native threads in Ruby I played around a bit and
this is the result (attached). There's certainly room for improvement. Do
with this whatever you like.

True, without native threads you won't really gain any performance, but what
if (to improve performance) you were to either:
1) launch new processes instead of threads?
or
2) set things up so that different stages of the pipeline can run on different
machines? (maybe using DRb?)

...of course the amount of information passed between stages of the pipeline
would need to be small so that the communication overhead would stay low.
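
A minimal sketch of option 1, assuming a platform where fork is available (so not Windows): one stage runs in a child process and sends its results back over a pipe with Marshal. The helper name map_in_child_process is made up for this example; every result is serialized and copied between processes, which is exactly the communication overhead mentioned above.

```ruby
# Hypothetical helper: run one stage in a separate process and read its
# marshalled results back through a pipe.
def map_in_child_process(items, &stage)
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    # Child: apply the stage and write each result to the pipe.
    items.each { |item| Marshal.dump(stage.call(item), writer) }
    writer.close
    exit!(0) # leave without running the parent's at_exit handlers
  end

  writer.close # parent only reads
  results = []
  results << Marshal.load(reader) until reader.eof?
  reader.close
  Process.wait(pid)
  results
end
```

Only values that Marshal can serialize can cross the process boundary this way, so procs, IO objects and the like cannot be passed as results.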

Phil
