Parallelizing a Ruby task

Mark Thomas

I have a long-running batch job that I would like to speed up.
Currently it uses only one CPU and the server I have in mind for this
has 16 cores, and I want to take advantage of them.

I'm thinking one of three possibilities:

1. JRuby, where the threads are native OS threads
2. A message queue (e.g. ActiveMQ + Stomp), where worker threads run
as separate processes, thus using all cores.
3. A MapReduce implementation (e.g. Hadoop)

I would like to see if anyone has gone down this road and can weigh in
on these options.

-- Mark.
 
Lee Hinman

How difficult is your task? If you were able to reduce (heh) it to a
MapReduce problem, you could use something like Skynet or Starfish.
For even simpler forking, check out Ara Howard's threadify or my
forkify for simple parallel processing.

- Lee
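
For readers who want to see what "simple forking" can look like without any gem, here is a minimal sketch. It is not the API of forkify or threadify; parallel_map is a hypothetical helper name, and the sketch assumes a Unix-like system where fork is available and results that Marshal can serialize. Each child process handles one slice of the input and sends its results back through a pipe.

```ruby
# Hypothetical helper: map a block over items using forked child processes.
# Assumes fork is available (Unix) and results can be serialized by Marshal.
def parallel_map(items, workers: 4)
  return [] if items.empty?

  slice_size = (items.size.to_f / workers).ceil
  jobs = items.each_slice(slice_size).map do |slice|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    pid = fork do
      reader.close
      # Child: compute this slice and send the serialized results back.
      writer.write(Marshal.dump(slice.map { |item| yield item }))
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close # parent keeps only the read end
    [pid, reader]
  end

  # Parent: read each child's results in slice order, then reap the child.
  jobs.flat_map do |pid, reader|
    data = reader.read
    reader.close
    Process.wait(pid)
    Marshal.load(data)
  end
end

squares = parallel_map((1..10).to_a, workers: 4) { |n| n * n }
```

Because the parent reads the pipes in the same order the slices were created, the results come back in input order even though the children run concurrently.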
 
Mark Thomas

How difficult is your task? If you were able to reduce (heh) it to a
MapReduce problem, you could use something like Skynet or Starfish.
For even simpler forking, check out Ara Howard's threadify or my
forkify for simple parallel processing.

Yes, it fits a MapReduce problem but most MapReduce implementations I
came across seemed like overkill. I wasn't aware of Skynet or
Starfish--they look promising, thanks. The file interface of Starfish
may in fact be just what I'm looking for.

I'll check out threadify and forkify too.

Thanks again.
-- Mark.
 
Michael Linfield

Just throwing my 2 cents out here:

What if you just created a daemon controller that threaded each process
on a different core o_O?

Would speed things up greatly whilst keeping control over each process.

- Mac
 
Eleanor McHugh

Just throwing my 2 cents out here:

What if you just created a daemon controller that threaded each process
on a different core o_O?

Would speed things up greatly whilst keeping control over each process.

Yep, multiple processes are your friend - especially on Unix :)


Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net
 
Mark Thomas

Just throwing my 2 cents out here:

What if you just created a daemon controller that threaded each process
on a different core o_O?

Would speed things up greatly whilst keeping control over each process.

That's the idea behind using a message queue -- it does that kind of
stuff for you. Workers are processes that will be distributed among
cores. The only thing I'm unsure about in an MQ architecture is the
collating of answers from all the worker threads, i.e. the Reduce part
of MapReduce.
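
One common way to handle the collating step is a second queue that carries results back to a single collector. The sketch below illustrates that pattern in one process with threads and the standard Queue class; it is an illustration of the idea, not how any particular message broker works. The word-count task, the :stop marker, and all variable names are assumptions made for the example. Note that on MRI these threads will not run Ruby code on several cores at once, whereas on JRuby they can; with a message queue the same shape applies, with workers as processes and the result queue as a second destination.

```ruby
require 'thread' # harmless on modern Rubies; needed for Queue on older ones

tasks   = Queue.new # work items going out to the workers
results = Queue.new # partial answers coming back to the collector

chunks = ["a b a", "b c", "a c c"]
chunks.each { |chunk| tasks << chunk }

worker_count = 2
worker_count.times { tasks << :stop } # one stop marker per worker

workers = Array.new(worker_count) do
  Thread.new do
    # "Map": turn each chunk into a partial word count.
    while (chunk = tasks.pop) != :stop
      counts = Hash.new(0)
      chunk.split.each { |word| counts[word] += 1 }
      results << counts
    end
  end
end

# "Reduce": exactly one partial result arrives per chunk, so the collector
# knows when it has seen everything without any extra signalling.
totals = Hash.new(0)
chunks.size.times do
  results.pop.each { |word, n| totals[word] += n }
end
workers.each(&:join)
```

Counting expected results (one per task) is the simplest way for the collector to know it is finished; an alternative is to have each worker push its own end marker onto the result queue.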
 
Brian Candler

Mark Thomas wrote:
...
2. A message queue (e.g. ActiveMQ + Stomp), where worker threads run
as separate processes, thus using all cores. ...
I would like to see if anyone has gone down this road and can weigh in
on these options.

I have gone down option 2, it works well.

Depending on your application, you may not need the sophistication of a
"real" queue manager. You could just create a Queue object (from
thread.rb), running in its own process, and share it using DRb. Multiple
reader processes can pop messages from the queue, and will block until a
message is available. Writers can push messages into the queue as
required. There is also SizedQueue which will block the writers if the
queue gets too full.

A "real" queue manager like RabbitMQ may make sense if you need your
subtasks to persist in the queue in the event of a system crash. But for
a simple worker-farm type of application, this usually isn't necessary.

Regards,

Brian.
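
The DRb approach described above can be sketched roughly as follows. This is an illustration under several assumptions, not Brian's actual code: WorkFarm is a hypothetical wrapper that holds a task queue and a result queue (a bare Queue could be served as the front object instead), the workers are started as separate Ruby processes from an inline script, the squaring task and the :stop marker are placeholders, and the drb library is assumed to be installed (it ships with Ruby, as a bundled gem in recent versions).

```ruby
require 'drb/drb'
require 'rbconfig'

# Hypothetical front object: two in-memory queues served over DRb.
class WorkFarm
  def initialize
    @tasks = Queue.new
    @results = Queue.new
  end

  def push_task(task)
    @tasks.push(task)
  end

  def pop_task # blocks the caller until a task is available
    @tasks.pop
  end

  def push_result(result)
    @results.push(result)
  end

  def pop_result
    @results.pop
  end
end

farm = WorkFarm.new
DRb.start_service('druby://127.0.0.1:0', farm) # port 0: let the OS choose
uri = DRb.uri

# Worker script, run in separate Ruby processes that connect back via DRb.
WORKER = <<~'RUBY'
  require 'drb/drb'
  farm = DRbObject.new_with_uri(ARGV.fetch(0))
  while (n = farm.pop_task) != :stop
    farm.push_result([n, n * n])
  end
RUBY

worker_count = 4
pids = Array.new(worker_count) do
  Process.spawn(RbConfig.ruby, '-e', WORKER, uri)
end

(1..8).each { |n| farm.push_task(n) }
worker_count.times { farm.push_task(:stop) } # one stop marker per worker

# Collect one result per task, then wait for the workers to exit.
squares = Array.new(8) { farm.pop_result }.sort.to_h
pids.each { |pid| Process.wait(pid) }
DRb.stop_service
```

Swapping Queue for SizedQueue in the wrapper would make push_task block when the backlog grows too large, as mentioned above. Nothing here survives a crash of the serving process, which is the trade-off against a persistent queue manager.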
 
