basic threading question: can ruby use real threads?

Kyle Schmitt

I've read somewhere, and would love for it to be wrong, that ruby
doesn't use real threads, that it handles its threads internally. Is
that true? If it is true, will that still be true when ruby 1.9, or
2.0 comes out?

For many systems this isn't a big deal one way or the other, since
they only have one physical processor. Luckily(?) pretty much all my
systems have two procs. (Two real processors, not HT, but that's a
debate for another day.) I'd like to write some threaded ruby code,
and have it spread across my cpus, share data structures etc.

I'm used to pthreads in UNIX systems :) so I'd _really_ like it if I
could do the same type of things I've done before, just in a rubyish
sort of way. Setting up a shared memory area and all that jazz that
you had to do for forking really doesn't sound like fun, especially
when the point of the code I wanna write _is_ for fun.

Thanks,
Kyle
 
Kyle Schmitt

OK, so I'm reading that article, and I'm getting three things from it:
YARV uses native threads.
YARV doesn't run them simultaneously.
YARV will eventually run them simultaneously.

Good enough for me, I'll just hope that writing threaded code doesn't
change too much with ruby2.0/YARV.

--Kyle
 
Marcin Raczkowski

Well, you can use the fastthread gem (part of Mongrel).
You can also fork your script ^^ Threads usually execute on the same
processor, AFAIK; that's why, if you want to use 2 processors, you have to
fork your scripts, and if you need communication between them, consider
using DRb.

A very good gem is slave. It makes creating new processes super easy and
provides an easy way to communicate, so you can create 4-6 new processes;
each will get data to compute from the mother process, and they'll use
both processors.
Sorry for lots of randomness and strange grammar - too much caffeine.
To summarize - read the rdoc for these gems:
- fastthread(s)
- slave(s)
(I never remember if they are plural or singular)
 
MenTaLguY

Well, you can use the fastthread gem (part of Mongrel).

fastthread just makes the locking primitives from thread.rb a little faster; it doesn't otherwise affect the operation of Ruby threads. Additionally, it is applicable only to Ruby 1.8, not YARV/1.9.

-mental
 
Marcin Raczkowski

I didn't say it makes use of POSIX threads - I just recommended it because
they are, well ... faster.

The only thing right now that'll let you use both processors is fork.
 
Kyle Schmitt

If you fork, is there even a way to create objects that are shared
between the two forks? Or do you have to rely on rpc/ipc stuff
instead?


If someone were to... write a c extension whose objects were threaded,
via pthreads, would it be a nightmare?

Even just typing that line almost scares me....but I can think of some
clean(ish?) ways of doing it. I'm just worried I'd lose the rubyness
of the thing if I did it that way.

Thanks,
Kyle
 
Gary Wright

I didn't say it makes use of POSIX threads - I just recommended it
because they are, well ... faster.

The only thing right now that'll let you use both processors is fork.

Just my opinion but my default choice would be fork when I need
concurrency rather than threads. The main reason is that it forces you
to be explicit in how you structure the communication between processes.
One process can't inadvertently change the state of another.
On a multi-processor box you'll get IO multiplexing and real CPU
concurrency automatically with fork.
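
A minimal sketch of that fork-based style, assuming a Unix-like system where Kernel#fork is available. The helper names (sum_of_squares, parallel_sum_of_squares) are made up for illustration. Each child computes one slice and sends its result back over a pipe with Marshal, so nothing is shared except what is written to the pipe explicitly.

```ruby
# Hypothetical CPU-bound job: sum the squares of a range of integers.
def sum_of_squares(range)
  range.inject(0) { |acc, n| acc + n * n }
end

# Run each slice in its own forked child and collect the partial results
# over pipes. Assumes a platform that supports fork.
def parallel_sum_of_squares(slices)
  workers = slices.map do |slice|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      Marshal.dump(sum_of_squares(slice), writer)
      writer.close
      exit!(0) # leave immediately, skipping at_exit handlers from the parent
    end
    writer.close # the parent only reads from this pipe
    [pid, reader]
  end

  partials = workers.map do |pid, reader|
    partial = Marshal.load(reader)
    reader.close
    Process.wait(pid)
    partial
  end
  partials.inject(0) { |acc, n| acc + n }
end

total = parallel_sum_of_squares([1..50_000, 50_001..100_000])
puts total
```

With one child per CPU, the operating system is free to schedule the children on different processors. The cost is that every result has to be serialized and copied back to the parent.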

Some problems can't be partitioned easily into separate address spaces,
in which case threads are a better choice. Even then I might consider
using shared memory among cooperating processes first.

I realize that the Unix fork/exec model of processes doesn't quite apply
in the Windows environment. Anecdotal evidence makes me think that
Windows programmers tend to reach for threads as a multi-tasking
solution more often than Unix programmers.

One more observation. The desire for real concurrency using multiple
processors is great for problems that can be cleanly partitioned, but if
you have a problem that requires concurrent access to shared data then
you'll have to keep in mind the memory/cache contention that will be
created when processing is distributed across multiple processors (via
processes or threads).

Gary Wright
 
MenTaLguY

If you fork, is there even a way to create objects that are shared
between the two forks? Or do you have to rely on rpc/ipc stuff
instead?

Totally RPC. You could use DRb to do this in a Rubyesque fashion.

It's worth noting that no matter _what_ threading approach you use, it's
absolutely best to minimize the number of objects shared between threads.

If someone were to... write a c extension whose objects were threaded,
via pthreads, would it be a nightmare?

Yes, somewhere between nightmare and flesh-rending terror. At least if you're
planning on manipulating Ruby objects from each thread.

You might want to consider using JRuby instead. It's compatible enough with MRI
that it runs Rails, and it uses "real" threads for multi-CPU goodness.

-mental
 
Kyle Schmitt

Manipulating ruby objects from inside the threads would be the idea in
some cases I'm thinking of... so it looks like JRuby until YARV gets
concurrent threads... and ooh do I hope it does.

Will the threading interface be drastically different between
MRI/JRuby/YARV? I.e., does anyone know whether code I write on MRI will
automatically use real threads on JRuby, or will I have to re-code
some parts to get that?

Thanks again,
Kyle
 
MenTaLguY

does anyone know if I code on MRI will it automatically use real threads on JRuby,

Yes.

The APIs are the same between MRI and JRuby, though JRuby deliberately hedges
on the implementation of certain unsafe features like Thread#kill, Thread#raise,
and Thread.critical=.
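
For illustration, a small sketch that sticks to the core Thread and Mutex classes, which is the part of the API that should carry over unchanged. The method name count_in_threads is hypothetical. On MRI the threads take turns; on an implementation with real concurrent threads they may run on separate CPUs, which is why the shared counter is guarded by a Mutex.

```ruby
require 'thread' # Mutex; a no-op on current Rubies

# Hypothetical helper: start several threads that all bump one shared
# counter, and return the final count once every thread has finished.
def count_in_threads(thread_count, increments_per_thread)
  counter = 0
  lock = Mutex.new

  threads = (1..thread_count).map do
    Thread.new do
      increments_per_thread.times do
        # Without the lock, "read, add, write" from two threads can interleave.
        lock.synchronize { counter += 1 }
      end
    end
  end

  threads.each { |t| t.join }
  counter
end

puts count_in_threads(4, 1000)
```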

-mental
 
Sylvain Joyeux

The APIs are the same between MRI and JRuby, though JRuby deliberately
hedges on the implementation of certain unsafe features like
Thread#kill, Thread#raise, and Thread.critical=.

Thread#raise, "unsafe"? It is the most useful thread-related functionality
I've seen since I started using threads! It allows, for instance, handling
failing rendezvous the proper way (by using exceptions).

Could you tell us why you think it is "unsafe"?
 
Bill Kelly

Sylvain Joyeux said:
Thread#raise, "unsafe"? It is the most useful thread-related functionality
I've seen since I started using threads! It allows, for instance, handling
failing rendezvous the proper way (by using exceptions).

Could you tell us why you think it is "unsafe"?

Hi,

I'm not sure if this is what MenTaLGuY meant, but one way that
Thread#raise is unsafe, is that it can raise an exception in the
specified thread while that thread is executing an 'ensure' block.

This can cause a failure of critical resources to be cleaned up
correctly, such as locks on mutexes, etc., as some or all of the
code in the ensure block is skipped.
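
A contrived sketch of that failure, with the timing forced so it happens every time. The names Interrupted and cleanup_completed? are invented for the demonstration, and the sleep merely stands in for slow cleanup code.

```ruby
require 'thread' # Queue; a no-op on current Rubies

# Hypothetical exception class used only for this demonstration.
class Interrupted < StandardError; end

# Returns true if the ensure block ran to completion,
# false if it was cut short by Thread#raise.
def cleanup_completed?
  cleaned_up = false
  in_ensure = Queue.new

  worker = Thread.new do
    begin
      begin
        # ... work that holds some resource ...
      ensure
        in_ensure << true # tell the main thread we are inside ensure
        sleep 0.5         # stand-in for slow cleanup code
        cleaned_up = true # skipped if an exception arrives during the sleep
      end
    rescue Interrupted
      # swallowed so the thread ends quietly
    end
  end

  in_ensure.pop             # wait until the worker is inside its ensure block
  worker.raise(Interrupted) # the asynchronous exception lands mid-cleanup
  worker.join
  cleaned_up
end
```

Calling cleanup_completed? should return false here: the exception arrives while the ensure block is still running, so its last statement is never executed.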

I first ran into this when I tried to use timeout{} to implement
a ConditionVariable#timed_wait, like:

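  require 'thread'
  require 'timeout'

  class ConditionVariable
    def timed_wait(mutex, timeout_secs)
      # THIS IS UNSAFE: the timeout exception can arrive while wait is
      # still running its own ensure-based cleanup.
      Timeout.timeout(timeout_secs) { wait(mutex) }
    end
  end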

Note that 'timeout' functions by creating a temporary new thread
which sleeps for the duration, then raises an exception in the
'current' thread that invoked timeout.
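
Roughly, and leaving out the details of the real library, that mechanism looks like the sketch below. naive_timeout and SketchTimeout are invented names, not part of timeout.rb, and this is an illustration of the idea rather than the library's actual code.

```ruby
# Hypothetical exception class for this sketch only.
class SketchTimeout < StandardError; end

# Simplified illustration of a thread-based timeout: a watchdog thread
# sleeps, then raises into the thread that called naive_timeout.
def naive_timeout(secs)
  target = Thread.current
  watchdog = Thread.new do
    sleep secs
    # Delivered asynchronously: the target may be anywhere in its code,
    # including inside an ensure block.
    target.raise(SketchTimeout, 'execution expired')
  end
  yield
ensure
  watchdog.kill if watchdog # the block finished or failed; stop the watchdog
end

begin
  naive_timeout(0.05) { sleep 1 }
rescue SketchTimeout => e
  puts "timed out: #{e.message}"
end
```

Nothing in this design lets the block say "not now", which is the root of the problem described here.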

If the timeout raises its exception at an unlucky moment, the
various internals of ConditionVariable#wait and Mutex#synchronize
that depend on ensure blocks to restore their class invariants are
skipped, resulting in nasty things like a permanently locked mutex.

Not fun... :(


Regards,

Bill
 
Charles Oliver Nutter

Bill said:
Hi,

I'm not sure if this is what MenTaLGuY meant, but one way that
Thread#raise is unsafe, is that it can raise an exception in the
specified thread while that thread is executing an 'ensure' block.

And to make it clear, we do implement kill, raise, and critical=, with
the following limitations:

- There are no guarantees all other threads will have stopped before
critical= allows the current thread to continue executing.
- Kill and raise require the target thread to eventually reach a
checkpoint where it is willing to "listen" to the kill or raise
event. If it doesn't, the calling thread will wait forever.

I even made these operations a bit cleaner and faster in 0.9.9, but
there's no way to do them perfectly with real concurrent threads.

- Charlie
 
Marcin Raczkowski

As I mentioned earlier, the easiest way to get REAL concurrency is to use
the Slave gem. (The Java VM will NOT use both processors - for a few reasons
the Java VM ALWAYS uses one processor - so scaling, for example, Tomcat in a
production environment requires running 2-4 Java VMs.) I'm using Slave in my
project for concurrent parsing of logs. The overhead of DRb is not big, and
what's more, you can use it across a few machines if you want to scale further.

http://www.codeforpeople.com/lib/ruby/slave/slave-1.2.1/

Creating new forks is really easy: you can create just one class for the
data processing that can run concurrently, and everything else can be done
in the main program.
 
MenTaLguY

Could you tell us why you think [Thread#raise] is "unsafe" ?

Because you have no control over when the exception is delivered, which may be at the worst possible moment. Even ensure does not provide adequate protection.

Consider what happens with this code if an exception happens to arrive just before the begin block is processed:

  @counter += 1
  begin
    # ... do stuff ...
  ensure
    @counter -= 1
  end

Lest you think there's an easy fix, consider what happens with this second example if an exception arrives after the begin block is entered, but before the counter has been incremented:

  begin
    @counter += 1
    # ... do stuff ...
  ensure
    @counter -= 1
  end

-mental
 
MenTaLguY

As I mentioned earlier - easiest way to get REAL concurrency (Java VM will
NOT use both processors - for a few reasons Java VM ALWAYS uses one processor -

Have you got evidence for this? I do not believe it to be the case for a
non-green-threaded JVM.

-mental
 
