Pipelined Processing

Robert Klemme

Hi,

this came up recently on IRC: the question was how to chain processing so that
each step runs concurrently with the other steps. While I don't see a real benefit
as long as there are no native threads in Ruby, I played around a bit and
this is the result (attached). There's certainly room for improvement. Do
with this whatever you like.
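
The attachment itself is not reproduced in this archive. As a rough illustration of the idea only (not Robert's code), here is a minimal sketch in which each stage is a callable running in its own thread and neighbouring stages are connected by queues. The names `run_pipeline` and `DONE` are made up for this example.

```ruby
# Hypothetical sketch, not the attached code: one thread per stage,
# with a Queue between each pair of neighbouring stages.
DONE = Object.new # sentinel that marks the end of the stream

def run_pipeline(input, *stages)
  first = Queue.new
  # Chain the stages: each one reads from the previous queue and
  # writes its results to a fresh queue.
  last = stages.inject(first) do |inq, stage|
    outq = Queue.new
    Thread.new do
      until (item = inq.pop).equal?(DONE)
        outq << stage.call(item)
      end
      outq << DONE # pass the end marker downstream
    end
    outq
  end

  input.each { |x| first << x }
  first << DONE

  # Drain the final queue in the calling thread.
  results = []
  until (item = last.pop).equal?(DONE)
    results << item
  end
  results
end

run_pipeline([1, 2, 3], ->(x) { x + 1 }, ->(x) { x * 2 }) # => [4, 6, 8]
```

Because every stage is a single thread reading from a FIFO queue, the output order matches the input order.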

Kind regards

robert
 
Robert Klemme

Paulus Esterhazy said:
Hello,

it's funny you should bring this up because I just cooked up something
similar (albeit less sophisticated and robust). What I needed was
several tasks being carried out in parallel.

It's not really similar. While my program executes several stages of
processing in parallel you are actually doing similar things in parallel.
At least that's what I understood from your explanation and code.

In this particular
instance I download several pages using open-uri and output a chunk
to webrick as soon as its processing is finished (as responsiveness
is critical). For network I/O, ruby's pseudo-threads seem to work
well, though I agree native threads would be much better.

True, as soon as slow I/O is involved, Ruby threads do OK as long as the
processing doesn't use too many resources.

Attached the ParallelEnumerate class along with a trivial test case.
Bear with me, it's my first ruby script.

You try to tackle multithreading with your first script? Wow! I don't
exactly understand why you use a thread per collection just to fill a queue.
Maybe I'm missing something here but it looks a bit strange. Are you sure
this actually downloads in parallel?

If I had done this, I'd have taken a different approach (but maybe I'm
missing some of your requirements): I'd create a queue which receives URLs
(or whatever tasks you have). Then I'd set up n threads (n>0, probably
depending on user input) and each thread reads elements from the queue and
processes them in parallel. You might as well combine both approaches, i.e.
once a URL has been downloaded, the content is pushed onto another queue from
which another number of threads (possibly just 1) reads and processes.
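
The two-queue arrangement described above could be sketched roughly as follows. This is an illustration under assumptions, not code from the thread: `fetch_and_process` and its `fetch:` and `process:` parameters are hypothetical names, and `Queue#close` needs Ruby 2.3 or later.

```ruby
# Hypothetical sketch: n downloader threads feed a second queue that a
# single consumer thread drains.
def fetch_and_process(urls, workers: 3, fetch:, process:)
  todo    = Queue.new
  fetched = Queue.new
  urls.each { |u| todo << u }
  todo.close # pop returns nil once the queue is closed and empty

  downloaders = Array.new(workers) do
    Thread.new do
      while (url = todo.pop)
        fetched << [url, fetch.call(url)]
      end
    end
  end

  results = {}
  consumer = Thread.new do
    while (pair = fetched.pop)
      url, body = pair
      results[url] = process.call(body)
    end
  end

  downloaders.each(&:join)
  fetched.close # no more downloads will arrive
  consumer.join
  results
end

# Stand-in for a real download, so the sketch runs without a network.
fake_fetch = ->(url) { "<#{url}>" }
fetch_and_process(%w[a bb], fetch: fake_fetch, process: ->(body) { body.length })
# => {"a"=>3, "bb"=>4}
```

In real use `fetch` would wrap the actual download, for example via open-uri.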

Kind regards

robert
 
Paulus Esterhazy

Robert said:
You try to tackle multithreading with your first script? Wow! I don't
exactly understand why you use a thread per collection just to fill a
queue. Maybe I'm missing something here but it looks a bit strange. Are
you sure this actually downloads in parallel?

Yes it works for the purpose. The "collections" look like this:

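require 'open-uri'

class Source
  include Enumerable

  def each
    # Download the page once, then yield elements until none are left.
    # (URI.open is the open-uri call in current Ruby; older versions used plain open.)
    @data = URI.open("http://...").read
    while (element = next_element)
      yield element
    end
  end

  def next_element
    # process @data; return the next element, or nil when exhausted
  end
end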

It would probably have been cleaner to do this using a thread pool for
downloading the pages and processing the data in the main thread, as you
suggest - separating the stages. I used an enumerator because it's
convenient - I wrap the enumerator in a pseudo IO object which I return
to webrick (which expects an object that supports the method "read").
That way, I get a kind of simple asynchronous data processing.
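
Paulus's wrapper is not shown in the thread. As a rough sketch of what such a pseudo IO object might look like, the hypothetical `EnumReader` class below pulls chunks from any Enumerable only when `read` is called. It is simplified (it counts characters rather than bytes), and how WEBrick actually consumes such an object is not covered here.

```ruby
# Hypothetical sketch: an object with an IO-like #read that lazily pulls
# chunks from an Enumerable.
class EnumReader
  def initialize(source)
    @chunks = source.to_enum # external enumerator over source.each
    @buffer = +""
  end

  # Returns up to `length` characters, everything that is left when
  # `length` is nil, or nil at end of data when a length was given.
  def read(length = nil)
    loop do
      break if length && @buffer.size >= length
      @buffer << @chunks.next.to_s # StopIteration ends the loop
    end
    return @buffer.slice!(0..-1) if length.nil?
    return nil if @buffer.empty?
    @buffer.slice!(0, length)
  end
end

reader = EnumReader.new(%w[ab cd e])
reader.read(3) # => "abc"
reader.read(3) # => "de"
reader.read(3) # => nil
```

The source is only advanced as far as the requested length requires, which is what gives the incremental behaviour described above.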
If I had done this, I'd have taken a different approach (but maybe I'm
missing some of your requirements): I'd create a queue which receives
URLs (or whatever tasks you have). Then I'd set up n threads (n>0,
probably depending on user input) and each thread reads elements from
the queue and processes them in parallel. You might as well combine
both approaches, i.e. once a URL has been downloaded, the content is
pushed onto another queue from which another number of threads (possibly
just 1) reads and processes.

Thanks for the comment,
Paulus
 
Sean O'Halpin

Hi,

this came up recently on IRC: question was how to chain processing so that
each step runs concurrently to other steps. While I don't see real benefit
as long as there are no native threads in Ruby

One idea that springs to mind is to keep a responsive monitor thread
(e.g. GUI, console, remote) while performing a batch process by
splitting up an intensive computation into environmentally
thread-friendly chunks, represented by the blocks.
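
One possible reading of this idea, sketched under assumptions: the batch job is split into small callables that run in a worker thread, and after each chunk the worker reports progress so that the caller (playing the monitor role) can react. `run_with_progress` is a made-up name, and `Queue#close` needs Ruby 2.3 or later.

```ruby
# Hypothetical sketch: run a computation chunk by chunk in a worker
# thread and hand progress updates to the caller's block.
def run_with_progress(chunks)
  progress = Queue.new
  worker = Thread.new do
    total = 0
    chunks.each_with_index do |chunk, i|
      total += chunk.call
      progress << (i + 1) # report that chunk i + 1 is finished
      Thread.pass         # give other threads a chance to run
    end
    progress.close
    total
  end

  # The caller acts as the monitor and sees every update.
  while (n = progress.pop)
    yield n, chunks.size
  end
  worker.value # joins the worker and returns its result
end

chunks = [1..100, 101..200, 201..300].map { |r| -> { r.sum } }
seen = []
total = run_with_progress(chunks) { |n, of| seen << "#{n}/#{of}" }
# total => 45150, seen => ["1/3", "2/3", "3/3"]
```

Keeping each chunk short is what lets the monitor stay responsive between updates.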
Do with this whatever you like.

Kind regards

robert

Thanks! I will :)

Regards,

Sean
 
Phil Tomson

Hi,

this came up recently on IRC: question was how to chain processing so that
each step runs concurrently to other steps. While I don't see real benefit
as long as there are no native threads in Ruby I played around a bit and
this is the result (attached). There's certainly room for improvement. Do
with this whatever you like.

True, without native threads you won't really gain any performance, but what
if (to improve performance) you were to either:
1) launch new processes instead of threads?
or
2) set things up so that different stages of the pipeline can run on different
machines? (maybe using DRb?)

Of course, the amount of information passed between stages of the pipeline
would need to be small so that the communication overhead stays low.
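
Option 1 could look roughly like the sketch below, which assumes a platform where `fork` is available (not Windows). `parallel_map` is a hypothetical helper: each item is handled in its own child process and the result is sent back over a pipe with Marshal, which is exactly the communication cost mentioned above. The DRb variant is not shown.

```ruby
# Hypothetical sketch: one child process per item, results marshalled
# back to the parent through a pipe.
def parallel_map(items)
  jobs = items.map do |item|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      writer.write(Marshal.dump(yield(item)))
      writer.close
      exit!(0) # leave the child without running at_exit handlers
    end
    writer.close # the parent only reads
    [pid, reader]
  end

  # All children are already running; now collect results in order.
  jobs.map do |pid, reader|
    result = Marshal.load(reader.read)
    reader.close
    Process.wait(pid)
    result
  end
end

parallel_map([1, 2, 3]) { |x| x * x } # => [1, 4, 9]
```

Everything a block returns has to be serialised and copied, so this only pays off when the work per item outweighs the size of its result.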


Phil
 
