Threads and Ruby

barjunk · Jun 30, 2008

I've been hunting around for information regarding threads, and to me,
it seems confusing and conflicting.

What I'm trying to find out is...if I was going to start using threads
in Ruby, which version of Ruby should I be using.

I've seen folks say that I should use Ruby 1.9 and others say that it
is possible to use earlier versions. Nothing that I found seemed
definitive.

I'm new to all this, so this may be part of the problem.

What I'd like to accomplish is starting a main ruby instance, then
launch threads from that instance that run in their own sandbox.

At this point, I don't believe the threads need to talk with each
other, but it seems I could use some form of message passing to
accomplish this.

Any ideas and direction would be helpful. Thanks.

Mike B.

ara.t.howard · Jun 30, 2008

for any situation you want processes. use fork or systemu if you want
it portable. threads are not the way to get a sandbox.

cheers.

a @ http://codeforpeople.com/
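
A minimal sketch of the fork approach, assuming a Unix-like system where `fork` is available. The `counter` variable and the use of the exit status to report a value are illustrative choices, not something from the posts above. Each child works on its own copy of the parent's memory, which is the "sandbox" property being discussed.

```ruby
# Each forked child gets a private copy of the parent's memory,
# so nothing a child changes can leak back into the parent.
counter = 0

pids = 3.times.map do |i|
  fork do
    counter += i + 1   # changes the child's copy only
    exit!(counter)     # report the value through the exit status
  end
end

# Collect each child's exit status in launch order.
statuses = pids.map do |pid|
  _, status = Process.wait2(pid)
  status.exitstatus
end

puts "parent counter: #{counter}"
puts "child results:  #{statuses.inspect}"
```

An exit status only carries a small integer, so real work would normally hand results back through a pipe, a file, or DRb instead.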

ara.t.howard · Jun 30, 2008

for any situation you want processes. use fork or systemu if you
want it portable. threads are not the way to get a sandbox.

*or* checkout the slave lib - it may be quite appropriate.

a @ http://codeforpeople.com/

Charles Oliver Nutter · Jun 30, 2008

ara.t.howard said:
for any situation you want processes. use fork or systemu if you want
it portable. threads are not the way to get a sandbox.

I hope you mean "for this situation". Processes are definitely not the
solution to all problems.

- Charlie

Zhukov Pavel · Jun 30, 2008

I haven't used 1.9 much, but the impression I get is:

- use 1.9 if you need _native_ threads (e.g. to take advantage of multiple
processors, or blocking system calls)

- use 1.8 if you want in-process threads, which are lighter and pretty good
for multiplexing io calls (using select()).

If the threads don't need shared state, why not use fork instead of threads?
You can use DRb for IPC.

Is there really an advantage on multiple processors? Doesn't Ruby 1.9 use a GIL?

Zhukov Pavel · Jun 30, 2008

I hope you mean "for this situation". Processes are definitely not the
solution to all problems.

- Charlie

Ruby doesn't have _real_ threads AFAIK, so DRb+fork is the only way
to get truly parallel work.
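
For readers who have not seen the DRb+fork combination, here is a rough sketch, assuming a Unix-like system and that the `drb` library is installed (it ships with Ruby, as a bundled gem in recent versions). The `Adder` class, the socket path, and the retry loop are all made up for illustration. The forked child acts as the server and the parent calls into it over a Unix domain socket.

```ruby
require 'drb/drb'
require 'drb/unix'
require 'tmpdir'
require 'fileutils'

# Hypothetical service object; it lives only in the server process.
class Adder
  def add(a, b)
    a + b
  end
end

dir = Dir.mktmpdir('drb-demo')
uri = "drbunix:#{File.join(dir, 'adder.sock')}"

# The forked child becomes the DRb server and serves until it is killed.
server_pid = fork do
  DRb.start_service(uri, Adder.new)
  DRb.thread.join
end

# The parent is a plain client holding a proxy to the remote object.
remote = DRbObject.new_with_uri(uri)

sum = nil
attempts = 0
begin
  sum = remote.add(2, 3)
rescue DRb::DRbConnError
  # The server may not be listening yet; retry for a few seconds.
  attempts += 1
  raise if attempts > 50
  sleep 0.1
  retry
end

Process.kill('KILL', server_pid)
Process.wait(server_pid)
FileUtils.remove_entry(dir)

puts sum
```

The retry loop is only there because the parent can reach the call before the child has finished starting the service.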

Robert Klemme · Jun 30, 2008

2008/6/30 barjunk said:
I've been hunting around for information regarding threads, and to me,
it seems confusing and conflicting.

What I'm trying to find out is...if I was going to start using threads
in Ruby, which version of Ruby should I be using.

I've seen folks say that I should use Ruby 1.9 and others say that it
is possible to use earlier versions. Nothing that I found seemed
definitive.

I'm new to all this, so this may be part of the problem.

What I'd like to accomplish is starting a main ruby instance, then
launch threads from that instance that run in their own sandbox.

At this point, I don't believe the threads need to talk with each
other, but it seems I could use some form of message passing to
accomplish this.

As Ara said, in this case processes are the better choice for several
reasons: they have separation out of the box and they can make use of
multiple cores (which Ruby threads can't unless you use JRuby - this
may change with future 1.9 versions AFAIK).

Kind regards

robert
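
One way the "separate processes plus message passing" idea could look, assuming a Unix-like system with `fork`. The squaring job is a stand-in for real work, and `Marshal` over a pipe is just one possible message format. All workers are started first and the results are collected afterwards, so the operating system is free to schedule them on different cores.

```ruby
jobs = [1, 2, 3, 4]

# Start one child per job; each child sends its result back over a pipe.
workers = jobs.map do |n|
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    Marshal.dump(n * n, writer)  # the "message" sent to the parent
    writer.close                 # flushes the pipe before exiting
    exit!(0)
  end
  writer.close                   # the parent only reads from this pipe
  [pid, reader]
end

# Collect the results in the order the jobs were started.
results = workers.map do |pid, reader|
  value = Marshal.load(reader)
  reader.close
  Process.wait(pid)
  value
end

puts results.inspect
```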

Joel VanderWerf · Jun 30, 2008

Zhukov said:
Is there really an advantage on multiple processors? Doesn't Ruby 1.9 use a GIL?

You're quite right.

ara.t.howard · Jun 30, 2008

I hope you mean "for this situation". Processes are definitely not
the solution to all problems.

well, given that the difference between processes and threads is an
incredibly small one for any modern os, and given that threads are *at
least* 100x harder to write deterministic code for (as your bug reports
regarding exception handling and ruby illustrate) i'd hazard a guess
that processes are *almost* always the correct solution when robust
code is desired. in other words i'd take the position that one should
always use processes unless the reason becomes clear to use threads
and, of course, there are indeed reasons. this is mostly a comment on
the limitations of programmers and not on platforms or languages,
nevertheless the incredible ease of IPC with ruby makes it even more
true imho.

cheers.

a @ http://codeforpeople.com/

Iñaki Baz Castillo · Jun 30, 2008

On Monday, 30 June 2008, Joel VanderWerf wrote:

You're quite right.

A good article about it:
http://www.infoq.com/news/2007/05/ruby-threading-futures

--
Iñaki Baz Castillo

Charles Oliver Nutter · Jul 1, 2008

ara.t.howard said:
well, given that the difference between processes and threads is an
incredibly small one for any modern os, and given that threads are *at
least* 100x harder to write deterministic code for (as your bug reports
regarding exception handling and ruby illustrate) i'd hazard a guess
that processes are *almost* always the correct solution when robust code
is desired. in other words i'd take the position that one should always
use processes unless the reason becomes clear to use threads and, of
course, there are indeed reasons. this is mostly a comment on the
limitations of programmers and not on platforms or languages,
nevertheless the incredible ease of IPC with ruby makes it even more
true imho.

The fact that Ruby's threading has many breakages and pitfalls does not
mean threading in general is the wrong way to fix things. Java threading
works extremely well, with the only real requirement that you must
either synchronize or avoid access to shared resources.
Power...responsibility...etc. You can't damn threading because the
standard implementation of Ruby doesn't do it well.

Perhaps you're right that when you only have access to green threads,
processes are the right way to go, since green threads don't really
gain you anything other than simulated asynchrony. But native threads
done right are as good as separate processes, with the bonus that you
can share fast in-memory access to resources if you're willing to accept
the synchronization cost and complexity.

- Charlie
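
To make the "synchronize or avoid access to shared resources" requirement concrete, here is a small Ruby sketch. The counter and the thread and iteration counts are arbitrary. Every update to the shared variable goes through a `Mutex`, so the read-modify-write cannot interleave between threads.

```ruby
counter = 0
lock = Mutex.new

threads = 4.times.map do
  Thread.new do
    1000.times do
      # Only one thread at a time may run this block.
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
puts counter
```

The price is the one mentioned above: every access pays for the lock, and forgetting it in a single place reintroduces the race.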

Charles Oliver Nutter · Jul 1, 2008

Zhukov said:
Ruby doesn't have _real_ threads AFAIK, so DRb+fork is the only way
to get truly parallel work.

JRuby has native threads that are _really_ parallel.

- Charlie

Charles Oliver Nutter · Jul 1, 2008

Robert said:
As Ara said, in this case processes are the better choice for several
reasons: they have separation out of the box and they can make use of
multiple cores (which Ruby threads can't unless you use JRuby - this
may change with future 1.9 versions AFAIK).

This is an eventual goal, but I asked ko1 about it and such work has not
started yet. It will be hard.

Processes are probably better under Ruby, but it's most definitely worth
trying threads under JRuby first.

- Charlie

ara.t.howard · Jul 1, 2008

The fact that Ruby's threading has many breakages and pitfalls does
not mean threading in general is the wrong way to fix things. Java
threading works extremely well, with the only real requirement that
you must either synchronize or avoid access to shared resources.
Power...responsibility...etc. You can't damn threading because the
standard implementation of Ruby doesn't do it well.

Perhaps you're right that when you only have access to green threads,
processes are the right way to go, since green threads don't
really gain you anything other than simulated asynchrony. But native
threads done right are as good as separate processes, with the bonus
that you can share fast in-memory access to resources if you're
willing to accept the synchronization cost and complexity.

yeah i agree 100% in principle. however i was programming java when
stopping threads suddenly became deprecated, which i know you know
all about, but for others

http://java.sun.com/j2se/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html

so doing something as simple as stopping a thread can be complicated.
i can kill a process and all resources will be returned to the
system. the fact that sun took quite a few years to figure this out,
and that matz ruby had the bugs you recently found beg the question:
if matz cannot do exceptions + threads right, if sun cannot get
stopping a thread right for years, what chance do i have of writing
code for, say, a web server that's supposed to run 24x7? i think
modern languages are caving to the reality that most (aka average)
programmers simply cannot program threads safely and are increasingly
moving towards the message passing paradigm ousterhout has been raving
about for years.

now having said that, i very often use ruby threads but often do so in
a message passing fashion and even more often use those threads to
spawn processes and achieve parallelism so i definitely am glad they
are there (Thread.new{ curl } is ultra powerful). still, i can't help
but feel they are destined to become relics - at least in the direct
fashion we use them now.

kind regards.

a @ http://codeforpeople.com/
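
A guess at what the "threads in a message passing fashion, spawning processes" pattern might look like. The job list, the `:done` sentinel, and the child command (a second Ruby interpreter that upcases its argument, standing in for something like curl) are all invented for this sketch. The threads never touch shared data directly; they only exchange messages through `Queue` objects, and the real work happens in child processes.

```ruby
require 'rbconfig'

jobs = Queue.new
results = Queue.new
words = %w[alpha beta gamma]
worker_count = 2

words.each { |word| jobs << word }
worker_count.times { jobs << :done }   # one stop message per worker

workers = worker_count.times.map do
  Thread.new do
    while (word = jobs.pop) != :done
      # The thread only waits on a child process; the child does the work.
      output = IO.popen([RbConfig.ruby, '-e', 'print ARGV[0].upcase', word], &:read)
      results << [word, output]
    end
  end
end

workers.each(&:join)

collected = {}
until results.empty?
  word, output = results.pop
  collected[word] = output
end

puts collected.inspect
```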

Charles Oliver Nutter · Jul 1, 2008

ara.t.howard said:
yeah i agree 100% in principle. however i was programming java when
stopping threads suddenly became deprecated, which i know you know all
about, but for others

The deprecation of thread stop, suspend, and exception raising was
implemented precisely because of the shared resource requirements. If
you can stop a thread in an environment where it may have been using
resources other threads will use, it's impossible to know if those
resources have been cleaned up or released safely. Sure, you can stop a
process. The Java deprecations were done because it's provably
impossible to share in-process resources and safely terminate threads at
will.
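
The usual alternative to killing a thread from outside is to ask it to stop and let it clean up after itself. A bare-bones Ruby sketch of that idea follows; the flag names and the polling interval are made up, and a real program might signal through a `Queue` instead.

```ruby
stop_requested = false
cleaned_up = false

worker = Thread.new do
  begin
    # Stand-in for real work, checking the stop flag between steps.
    sleep 0.01 until stop_requested
  ensure
    # The thread releases its own resources before it exits.
    cleaned_up = true
  end
end

stop_requested = true   # ask the thread to stop instead of killing it
worker.join

puts "cleaned up: #{cleaned_up}"
```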

The same goes for shared out-of-process resources, but since it's harder
to share out-of-process resources it's harder to do serious damage. You
can still corrupt files, orphan processes, or leave sibling processes
waiting for data that will never arrive. You can even introduce exactly
the same race conditions common to threading if you want multiple
processes to perform atomic mutations of shared files or memory. If you
have a large interconnected app with lots of processes communicating or
using shared resources, arbitrarily nuking one of them can cause exactly
the same headaches. It's a factor of resource sharing and
interdependency, rather than anything specific to threading over processes.
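
As an illustration of processes needing the same care as threads once they share something, here is a sketch in which several forked children increment a number stored in one file. It assumes a Unix-like system with `fork` and advisory `flock` locking; the file layout and the counts are arbitrary. Without the exclusive lock, the read-modify-write would race exactly as it would between threads.

```ruby
require 'tempfile'

file = Tempfile.new('counter')
file.write('0')
file.flush
path = file.path

pids = 4.times.map do
  fork do
    25.times do
      File.open(path, 'r+') do |f|
        f.flock(File::LOCK_EX)   # wait for exclusive access
        n = f.read.to_i
        f.rewind
        f.truncate(0)
        f.write((n + 1).to_s)
        f.flush
      end                        # closing the file releases the lock
    end
    exit!(0)
  end
end

pids.each { |pid| Process.wait(pid) }
total = File.read(path).to_i
file.close!

puts total
```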

now having said that, i very often use ruby threads but often do so in a
message passing fashion and even more often use those threads to spawn
processes and achieve parallelism so i definitely am glad they are there
(Thread.new{ curl } is ultra powerful). still, i can't help but feel
they are destined to become relics - at least in the direct fashion we
use them now.

Probably not, but hopefully neither will typical IPC mechanisms, which
are almost as painful to get right and make reliable. Threads are a
low-level API, perhaps lower-level than day-to-day programmers should
generally have to deal with. But it's absurd to say that processes can
do everything threads can, otherwise we'd have a massive process bloat
for almost every nontrivial application we use. Threads have a place,
though the ease with which resources can be shared often makes it a
dangerous place to go. Let's not throw the threading baby out with the
shared resource bathwater.

- Charlie

ara.t.howard · Jul 1, 2008

The Java deprecations were done because it's provably impossible to
share in-process resources and safely terminate threads at will.

<snip>

i think java is correct to have done so, precisely because people have
found it too hard to write safe code using those mechanisms but as you
point out, you can do the same with processes while no OS has limited
us yet. why? i think it's because sharing data between threads, or
processes, is both dangerous and powerful. when someone builds an OS
that supports message passing we'll see those operations limited on
processes too i bet but, for now, they are just too useful despite the
danger and do work 'much' of the time which, for better or worse,
seems to be the MO for many programming tasks.

Probably not, but hopefully neither will typical IPC mechanisms,
which are almost as painful to get right and make reliable.

in fairness we're talking about ruby here where that is definitely not
true. it's extremely painless to have reliable ipc with ruby using
drb or com with sqlite as a message store.

But it's absurd to say that processes can do everything threads
can, otherwise we'd have a massive process bloat for almost every
nontrivial application we use.

for the record i'm not saying that - i'm saying that processes are a
better starting point for most people wanting to gain parallelism in
ruby for most problems. threads are appropriate at times too but, in
ruby, the disadvantages like lack of cpu migration, blocking the
entire process (in win), etc limit their usefulness - jruby excepted
of course (no jab, it's a huge advantage jruby has over the mri)

kind regards.

a @ http://codeforpeople.com/

Charles Oliver Nutter · Jul 1, 2008

ara.t.howard said:
in fairness we're talking about ruby here where that is definitely not
true. it's extremely painless to have reliable ipc with ruby using drb
or com with sqlite as a message store.

Since DRb operates over a network it's not reliable by definition; you
have to deal with the other end going away, etc. With COM, you're either
going over a network or using same-machine IPC mechanisms that are only
a bit more reliable (or loading things in-process, which is then back to
threads). And with sqlite, you need to synchronize writes and possibly
reads or you need to hope sqlite will do that for you (I don't know if
it does). And then you're into locking, atomicity, etc.

So for IPC or cross-"process" data comm or sharing, I think processes:

- give you fewer ways to shoot yourself in the foot
- the remaining ways are somewhat less likely to be dangerous
- but they mostly leave the options that are by and large the most
complicated and the most prone to complete failure (e.g. external
process goes away completely).

Meanwhile, threads

- give you many, many ways to shoot yourself in the foot
- sometimes with catastrophic consequences
- but you can turn the complexity knob down much lower

Choose wisely.

- Charlie

Joel VanderWerf · Jul 1, 2008

Charles said:
Since DRb operates over a network it's not reliable by definition; you
have to deal with the other end going away, etc. With COM, you're either
going over a network or using same-machine IPC mechanisms that are only
a bit more reliable (or loading things in-process, which is then back to
threads). And with sqlite, you need to synchronize writes and possibly
reads or you need to hope sqlite will do that for you (I don't know if
it does). And then you're into locking, atomicity, etc.

Yes, sqlite does synchronize, but a potential problem is granularity: a
writer gets an exclusive lock on the entire db.

Joel VanderWerf · Jul 1, 2008

Joel said:
Yes, sqlite does synchronize, but a potential problem is granularity: a
writer gets an exclusive lock on the entire db.

Clarification: "a potential problem _for IPC_". We're using it for IPC
between programs written in C and ruby, and haven't had major problems
with this yet.

ara.t.howard · Jul 1, 2008

Yes, sqlite does synchronize, but a potential problem is
granularity: a writer gets an exclusive lock on the entire db.

but only very briefly, in practice the throughput is close to what you
can achieve with mutexes combined with the mri thread scheduler

http://www.sqlite.org/lockingv3.html

one of the reasons this is true is that for a heavily threaded ruby
program (green threads) you end up with the entire process sometimes
blocked on io and the threads end up getting into a pattern where all
of them need to write at once - a kind of rhythm - with processes the
ability for the OS to schedule access to resources ends up staggering
the phase of execution so access is generally faster than it 'ought'
to be taking only TPS into account.

this is a *wild* generalization based only on the kinds of parallel
processing i've done, but i've seen the pattern where a heavily
threaded program ends up being effectively serial enough times to
mention it...

a @ http://codeforpeople.com/
