basic threading question: can ruby use real threads?

Kyle Schmitt · May 8, 2007

I've read somewhere, and would love for it to be wrong, that ruby
doesn't use real threads, that it handles its threads internally. Is
that true? If it is true, will that still be true when ruby 1.9, or
2.0 comes out?

For many systems this isn't a big deal one way or the other, since
they only have one physical processor. Luckily(?) pretty much all my
systems have two procs. (Two real processors, not HT, but that's a
debate for another day.) I'd like to write some threaded ruby code,
and have it spread across my cpus, share data structures etc.

I'm used to pthreads in UNIX systems, so I'd _really_ like it if I
could do the same type of things I've done before, just in a rubyish
sort of way. Setting up a shared memory area and all that jazz that
you had to do for forking really doesn't sound like fun, especially
when the point of the code I wanna write _is_ for fun.

Thanks,
Kyle

James Edward Gray II · May 8, 2007

I've read somewhere, and would love for it to be wrong, that ruby
doesn't use real threads, that it handles its threads internally. Is
that true? If it is true, will that still be true when ruby 1.9, or
2.0 comes out?

This was recently discussed in detail by the creators:

http://blog.grayproductions.net/articles/2007/04/27/the-ruby-vm-episode-iii

James Edward Gray II

Kyle Schmitt · May 8, 2007

Sweet, thanks for the link!

Kyle Schmitt · May 8, 2007

OK, so I'm reading that article, and I'm getting three things from it:
YARV uses native threads.
YARV doesn't run them simultaneously.
YARV will eventually run them simultaneously.

Good enough for me, I'll just hope that writing threaded code doesn't
change too much with ruby2.0/YARV.

--Kyle

Marcin Raczkowski · May 9, 2007

OK, so I'm reading that article, and I'm getting three things from it:
YARV uses native threads.
YARV doesn't run them simultaneously.
YARV will eventually run them simultaneously.

Good enough for me, I'll just hope that writing threaded code doesn't
change too much with ruby2.0/YARV.

--Kyle

Well, you can use the fastthread gem (part of Mongrel).
You can also fork your script ^^ Threads usually execute on the same processor,
AFAIK; that's why, if you want to use 2 processors, you have to fork your
scripts, and if you need communication between them, consider using DRb.

A very good gem is slave - it makes creating new processes super easy and
provides an easy way to communicate, so you can create 4-6 new processes; each
will get data to compute from the mother process, and they'll use both processors.

Sorry for lots of randomness and strange grammar - too much caffeine.
To summarize - read the rdoc for these gems:
- fastthread(s)
- slave(s)
(I never remember if they are plural or singular)

MenTaLguY · May 9, 2007

well you can use fastthreads gem (part of mongrel)

fastthread just makes the locking primitives from thread.rb a little faster; it doesn't otherwise affect the operation of Ruby threads. Additionally, it is applicable only to Ruby 1.8, not YARV/1.9.

-mental

Marcin Raczkowski · May 9, 2007

fastthread just makes the locking primitives from thread.rb a little
faster; it doesn't otherwise affect the operation of Ruby threads.
Additionally, it is applicable only to Ruby 1.8, not YARV/1.9.

-mental

I didn't say it makes use of POSIX threads - I just recommended it because they
are, well... faster.

The only thing right now that'll let you use both processors is fork.

Kyle Schmitt · May 9, 2007

If you fork, is there even a way to create objects that are shared
between the two forks? Or do you have to rely on rpc/ipc stuff
instead?

If someone were to... write a C extension whose objects were threaded,
via pthreads, would it be a nightmare?

Even just typing that line almost scares me... but I can think of some
clean(ish?) ways of doing it. I'm just worried I'd lose the rubyness
of the thing if I did it that way.

Thanks,
Kyle

Gary Wright · May 9, 2007

I didn't say it makes use of POSIX threads - I just recommended it
because they are, well... faster.

The only thing right now that'll let you use both processors is fork.

Just my opinion but my default choice would be fork when I need
concurrency rather than threads. The main reason is that it forces you
to be explicit in how you structure the communication between processes.
One process can't inadvertently change the state of another.
On a multi-processor box you'll get IO multiplexing and real CPU
concurrency automatically with fork.
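
As a rough sketch of that approach, assuming a Unix-like system where fork is available: each child computes part of a sum and hands its result back over its own pipe, so the communication is explicit. The ranges and variable names are made up for illustration.

```ruby
# Split a sum across two child processes; each child reports back over a pipe.
ranges = [(1..500), (501..1000)]   # illustrative work split

readers = ranges.map do |range|
  reader, writer = IO.pipe
  fork do
    reader.close                   # the child only writes
    writer.puts range.reduce(:+)   # send the partial sum to the parent
    writer.close
    exit!(0)                       # leave the child without running at_exit hooks
  end
  writer.close                     # the parent only reads
  reader
end

# Read one number per child, then reap the children.
partials = readers.map do |reader|
  value = reader.read.to_i
  reader.close
  value
end
Process.waitall

total = partials.reduce(:+)
puts total
```

Nothing is shared after the fork: the children cannot touch the parent's variables, and the only data that crosses the process boundary is what is written to the pipes.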

Some problems can't be partitioned easily into separate address spaces,
in which case threads are a better choice. Even then I might consider
using shared memory among cooperating processes first.

I realize that the Unix fork/exec model of processes doesn't quite apply
in the Windows environment. Anecdotal evidence makes me think that
Windows programmers tend to reach for threads as a multi-tasking solution
more often than Unix programmers.

One more observation. The desire for real concurrency using multiple
processors is great for problems that can be cleanly partitioned, but if
you have a problem that requires concurrent access to shared data then
you'll have to keep in mind the memory/cache contention that will be
created when processing is distributed across multiple processors (via
processes or threads).

Gary Wright

MenTaLguY · May 9, 2007

If you fork, is there even a way to create objects that are shared
between the two forks? Or do you have to rely on rpc/ipc stuff
instead?

Totally RPC. You could use DRb to do this in a Rubyesque fashion.
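
A minimal sketch of what that could look like, assuming a Unix-like system with fork and the drb library available. The Counter class is hypothetical, and port 0 simply asks the OS for a free port. The object lives only in the child process; the parent talks to it through a DRb proxy.

```ruby
require 'drb/drb'

# Hypothetical object that exists only in the server (child) process.
class Counter
  def initialize
    @count = 0
  end

  def increment
    @count += 1
  end
end

reader, writer = IO.pipe

server_pid = fork do
  reader.close
  trap('TERM') { exit!(0) }                               # quit quietly when told to
  DRb.start_service('druby://127.0.0.1:0', Counter.new)   # port 0: any free port
  writer.puts DRb.uri                                     # tell the parent where we listen
  writer.close
  DRb.thread.join                                         # serve until terminated
end

writer.close
uri = reader.gets.chomp
reader.close

counter = DRbObject.new_with_uri(uri)   # proxy; each call is a remote call
first  = counter.increment
second = counter.increment
puts "remote counter returned #{first}, then #{second}"

Process.kill('TERM', server_pid)
Process.wait(server_pid)
```

The state (@count) never exists in the parent, which is why the second call sees the effect of the first.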

It's worth noting that no matter _what_ threading approach you use, it's
absolutely best to minimize the number of objects shared between threads.
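
One way to sketch that advice with plain Ruby threads is to share only a pair of queues and pass everything else as messages. The :done marker and the squaring job are invented for the example.

```ruby
require 'thread'

jobs    = Queue.new   # the only two objects the threads share
results = Queue.new

workers = 2.times.map do
  Thread.new do
    # Pull jobs until the :done marker arrives.
    while (n = jobs.pop) != :done
      results << n * n
    end
  end
end

(1..5).each { |n| jobs << n }
workers.size.times { jobs << :done }   # one stop marker per worker
workers.each(&:join)

squares = []
squares << results.pop until results.empty?
puts squares.sort.inspect
```

Because Queue does its own locking, the worker code needs no explicit mutex.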

If someone were to... write a C extension whose objects were threaded,
via pthreads, would it be a nightmare?

Yes, somewhere between nightmare and flesh-rending terror. At least if you're
planning on manipulating Ruby objects from each thread.

You might want to consider using JRuby instead. It's compatible enough with MRI
that it runs Rails, and it uses "real" threads for multi-CPU goodness.

-mental

Kyle Schmitt · May 9, 2007

Manipulating ruby objects from inside the threads would be the idea in
some cases I'm thinking of... so it looks like JRuby until YARV gets
concurrent threads... and ooh do I hope it does.

Will the threading interface be drastically different between
MRI/JRuby/YARV? I.e., does anyone know: if I code on MRI, will it
automatically use real threads on JRuby, or will I have to re-code
some parts to get that?

Thanks again,
Kyle

MenTaLguY · May 9, 2007

does anyone know if I code on MRI will it automatically use real threads on JRuby,

Yes.

The APIs are the same between MRI and JRuby, though JRuby deliberately hedges
on the implementation of certain unsafe features like Thread#kill, Thread#raise,
and Thread.critical=.
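
For illustration, a small sketch that sticks to the basic Thread and Mutex calls and avoids the unsafe features listed above. The workload is arbitrary; whether the threads actually run in parallel depends on the implementation underneath, not on this code.

```ruby
require 'thread'

total = 0
lock  = Mutex.new

threads = (1..4).map do |i|
  Thread.new(i) do |n|
    # Thread-local work first, then one short critical section.
    partial = (1..1000).reduce(0) { |sum, x| sum + x * n }
    lock.synchronize { total += partial }
  end
end

threads.each(&:join)
puts total
```

Keeping the shared update inside a single synchronize block is what makes the result the same regardless of how the threads are scheduled.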

-mental

Sylvain Joyeux · May 10, 2007

The APIs are the same between MRI and JRuby, though JRuby deliberately
hedges on the implementation of certain unsafe features like
Thread#kill, Thread#raise, and Thread.critical=.

Thread#raise, "unsafe"? It is the most useful thread-related functionality
I've seen since I started using threads! It allows, for instance, handling
failing rendezvous the proper way (by using exceptions).

Could you tell us why you think it is "unsafe"?

Bill Kelly · May 10, 2007

Sylvain Joyeux said:
Thread#raise, "unsafe" ? It is the most useful thread-related functionality
I've seen since I'm using threads ! It allows for instance to handle
failing rendezvous the proper way (by using exceptions).

Could you tell us why you think it is "unsafe" ?

Hi,

I'm not sure if this is what MenTaLGuY meant, but one way that
Thread#raise is unsafe, is that it can raise an exception in the
specified thread while that thread is executing an 'ensure' block.

This can cause a failure of critical resources to be cleaned up
correctly, such as locks on mutexes, etc., as some or all of the
code in the ensure block is skipped.
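
A small sketch that makes this reproducible rather than a matter of luck. The queue handshake and the bare sleep are stand-ins so that the exception is guaranteed to land inside the ensure block; real code would hit this only at an unlucky moment.

```ruby
require 'thread'

cleaned_up = false
in_ensure  = Queue.new

worker = Thread.new do
  begin
    begin
      # ... do stuff ...
    ensure
      in_ensure << true   # tell the main thread we are inside ensure
      sleep               # stand-in for a slow cleanup step
      cleaned_up = true   # the cleanup that ends up being skipped
    end
  rescue RuntimeError
    # swallow the injected exception so the thread ends quietly
  end
end

in_ensure.pop                                      # wait until the worker is in ensure
worker.raise(RuntimeError, 'arrived mid-ensure')   # inject the exception
worker.join
puts "cleanup finished? #{cleaned_up}"
```

This prints "cleanup finished? false": the last line of the ensure block never runs.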

I first ran into this when I tried to use timeout{} to implement
a ConditionVariable#timed_wait, like:

    require 'thread'
    require 'timeout'

    class ConditionVariable
      def timed_wait(mutex, timeout_secs)
        Timeout.timeout(timeout_secs) { wait(mutex) } # THIS IS UNSAFE
      end
    end

Note that 'timeout' functions by creating a temporary new thread
which sleeps for the duration, then raises an exception in the
'current' thread that invoked timeout.

If the timeout raises its exception at an unlucky moment, the
various internals of ConditionVariable#wait and Mutex#synchronize
that depend on ensure blocks to restore their class invariants are
skipped, resulting in nasty things like a permanently locked mutex.
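
For comparison, later Ruby versions accept an optional timeout as a second argument to ConditionVariable#wait, so no second thread has to inject an exception. A minimal sketch, assuming such a Ruby; the 0.1 second value is arbitrary and nobody signals the condition here.

```ruby
require 'thread'

mutex = Mutex.new
cond  = ConditionVariable.new

started = Time.now
mutex.synchronize do
  cond.wait(mutex, 0.1)   # gives up after roughly 0.1s; the mutex is re-acquired on return
end
elapsed = Time.now - started

# The mutex was released normally, so it can be taken again.
reusable = mutex.try_lock
mutex.unlock if reusable
puts "waited about #{elapsed.round(2)}s, mutex reusable: #{reusable}"
```

Since the wait ends by returning rather than by an exception raised from another thread, the internal cleanup always gets to run.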

Not fun...

Regards,

Bill

Charles Oliver Nutter · May 10, 2007

Bill said:
Hi,

I'm not sure if this is what MenTaLGuY meant, but one way that
Thread#raise is unsafe, is that it can raise an exception in the
specified thread while that thread is executing an 'ensure' block.

And to make it clear, we do implement kill, raise, and critical=, with
the following limitations:

- There are no guarantees all other threads will have stopped before
critical= allows the current thread to continue executing.
- Kill and raise require the target thread to eventually reach a
checkpoint where it is willing to "listen" to the kill or raise
event. If it doesn't, the calling thread will wait forever.

I even made these operations a bit cleaner and faster in 0.9.9, but
there's no way to do them perfectly with real concurrent threads.

- Charlie

Marcin Raczkowski · May 10, 2007

Just my opinion but my default choice would be fork when I need
concurrency rather than threads. The main reason is that it forces you
to be explicit in how you structure the communication between processes.
One process can't inadvertently change the state of another.
On a multi-processor box you'll get IO multiplexing and real CPU
concurrency automatically with fork.

Some problems can't be partitioned easily into separate address spaces,
in which case threads are a better choice. Even then I might consider
using shared memory among cooperating processes first.

I realize that the Unix fork/exec model of processes doesn't quite apply
in the Windows environment. Anecdotal evidence makes me think that
Windows programmers tend to reach for threads as a multi-tasking solution
more often than Unix programmers.

One more observation. The desire for real concurrency using multiple
processors is great for problems that can be cleanly partitioned, but if
you have a problem that requires concurrent access to shared data then
you'll have to keep in mind the memory/cache contention that will be
created when processing is distributed across multiple processors (via
processes or threads).

Gary Wright

As I mentioned earlier, the easiest way to get REAL concurrency (the Java VM will
NOT use both processors - for a few reasons the Java VM ALWAYS uses one processor -
scaling, for example, Tomcat in a production environment requires running 2-4 Java
VMs) is to use the slave gem. I'm using it in my project for concurrent
parsing of logs. The overhead of DRb is not big, and what's more, you can use it
across a few machines if you want to scale further.

http://www.codeforpeople.com/lib/ruby/slave/slave-1.2.1/

Creating new forks is really easy, and you can create just one class for
the data processing that can be concurrent; everything else can be done in
the main program.

MenTaLguY · May 10, 2007

Could you tell us why you think [Thread#raise] is "unsafe" ?

Because you have no control over when the exception is delivered, which may be at the worst possible moment. Even ensure does not provide adequate protection.

Consider what happens with this code if an exception happens to arrive just before the begin block is processed:

    @counter += 1
    begin
      # ... do stuff ...
    ensure
      @counter -= 1
    end

Lest you think there's an easy fix, consider what happens with this second example if an exception arrives after the begin block is entered, but before the counter has been incremented:

    begin
      @counter += 1
      # ... do stuff ...
    ensure
      @counter -= 1
    end
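
For readers on a newer Ruby: Thread.handle_interrupt, added in Ruby 2.0 and so well after the versions discussed here, lets a thread defer asynchronous exceptions over a region. A hedged sketch of the first pattern with the increment and the begin/ensure covered together; the queue handshake and the 0.2 second sleep are invented for the demonstration.

```ruby
require 'thread'

counter = 0
ready   = Queue.new

worker = Thread.new do
  begin
    # Defer RuntimeError from Thread#raise until this block has finished.
    Thread.handle_interrupt(RuntimeError => :never) do
      counter += 1
      begin
        ready << true   # let the main thread know we are mid-section
        sleep 0.2       # stand-in for "do stuff"
      ensure
        counter -= 1
      end
    end
  rescue RuntimeError
    # the deferred exception is delivered here, after the counter is balanced
  end
end

ready.pop
worker.raise(RuntimeError, 'arrives while the counter is incremented')
worker.join
puts counter
```

The exception is still delivered, just not in the middle of the increment/decrement pair, so the counter ends up balanced.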

-mental

MenTaLguY · May 10, 2007

As I mentioned earlier, the easiest way to get REAL concurrency (the Java VM will
NOT use both processors - for a few reasons the Java VM ALWAYS uses one processor -

Have you got evidence for this? I do not believe it to be the case for a
non-green-threaded JVM.

-mental
