Fault Tolerant DRb?

Kirk Haines

Just pondering different things this morning, and my mind came back to
something I've thought about now and again.

Assume you are using a DRb service for....something. It doesn't matter what.
The case is the same whether one is accessing an array via DRb or a Rinda
Ring. Is there some reasonably easy way of making a service work in a fault
tolerant way? That is, one could have two processes on two different
machines both offering the same service. If one process dies, the data is
still present on the other, and the clients of that service can continue
operating without data loss?


Kirk Haines
 
Ara.T.Howard

> Just pondering different things this morning, and my mind came back to
> something I've thought about now and again.
>
> Assume you are using a DRb service for....something. It doesn't matter what.
> The case is the same whether one is accessing an array via DRb or a Rinda
> Ring. Is there some reasonably easy way of making a service work in a fault
> tolerant way? That is, one could have two processes on two different
> machines both offering the same service. If one process dies, the data is
> still present on the other, and the clients of that service can continue
> operating without data loss?
>
> Kirk Haines

i've done tons of ha (high availability) setups before for stateful and
stateless machines. suffice it to say it is almost un-imaginably complex.
consider:

* how do you tell if one machine is down vs. the network just being slow?
for instance on our machines monthly backups might make any machine seem
dead (can't ping) for 20 minutes or more. typically this is solved by
pinging over a dedicated serial cable between nodes, using real-time
priorities.

* if you have the data on both machines and it can EVER be written to
(modified), how do you bring the data back in sync when a machine has died
but is now back up?
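
The first question above can be illustrated with a small heartbeat monitor. This is only a sketch: `HeartbeatMonitor`, its parameters, and the `:alive`/`:suspect`/`:dead` states are hypothetical names, not part of DRb or linux-ha. It assumes each node reports in roughly every `interval` seconds and that a node is only declared dead after `missed_limit` intervals pass in silence.

```ruby
# Hypothetical helper (not part of DRb or linux-ha): tracks the last
# heartbeat seen from each node and classifies nodes by how many
# heartbeat intervals have been missed.
class HeartbeatMonitor
  def initialize(interval:, missed_limit:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @interval = interval          # expected seconds between heartbeats
    @missed_limit = missed_limit  # missed intervals before a node is :dead
    @clock = clock                # injectable so the logic can be tested
    @last_seen = {}
  end

  # Call whenever a heartbeat arrives from +node+.
  def beat(node)
    @last_seen[node] = @clock.call
  end

  # Returns :unknown, :alive, :suspect, or :dead.
  def status(node)
    last = @last_seen[node]
    return :unknown unless last

    missed = ((@clock.call - last) / @interval).floor
    if missed < 1
      :alive
    elsif missed < @missed_limit
      :suspect
    else
      :dead
    end
  end
end
```

No timeout can truly distinguish a dead node from a slow one, so a node that stalls for 20 minutes during backups would still be reported `:dead` unless `missed_limit` is set very high. That is why a separate, dedicated heartbeat link is the usual answer.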

these problems are solved - but it's still amazingly hard to get right. check
out the linux-ha project (google it).
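
For the resync question, one common idea (among several) is to number every write and let a returning node ask for the writes it missed. The sketch below is an in-memory illustration only; `LoggedStore` and its methods are hypothetical, and it assumes a single node accepts writes while the other is down.

```ruby
# Hypothetical in-memory sketch: a key/value store that logs every write
# so a replica can replay what it missed. A real system would have to
# persist the log and cope with both nodes accepting writes.
class LoggedStore
  attr_reader :data, :applied

  def initialize
    @data = {}
    @log = []      # writes in order; index + 1 is the sequence number
    @applied = 0   # sequence number of the last write applied here
  end

  # Apply a write locally and record it in the log.
  def write(key, value)
    @log << [key, value]
    @data[key] = value
    @applied = @log.size
  end

  # Writes with a sequence number greater than +seq+.
  def entries_after(seq)
    @log.drop(seq)
  end

  # Replay missed writes, in order, on a node that has come back up.
  def catch_up(entries)
    entries.each { |key, value| write(key, value) }
  end
end

primary = LoggedStore.new
replica = LoggedStore.new
primary.write(:a, 1)
replica.catch_up(primary.entries_after(replica.applied))
primary.write(:b, 2)   # replica is "down" for these two writes
primary.write(:a, 3)
replica.catch_up(primary.entries_after(replica.applied))
```

If both machines took writes while they could not see each other, replaying a log is not enough and some conflict resolution is needed, which is where most of the difficulty lies.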

depending on your needs you may be able to code something simple that's 'good
enough' but you'll need some sort of distributed transaction capability and
the easiest way to get that is via a real rdbms like postgresql. however, once
you have that setup it's silly to use drb unless your data is terrible to
model within the relational model.

feel free to contact me offline if you want to setup an ha box(es).

hth.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================
 
Shashank Date

Hi Kirk,

I have written something like this a long time back to build fault tolerant
database clusters. It became pretty messy pretty quick (of course I was not
as proficient in Ruby back then ;-)).
So I have some questions:

--- Kirk Haines said:
> Assume you are using a DRb service for....something. It doesn't matter what.
> The case is the same whether one is accessing an array via DRb or a Rinda
> Ring. Is there some reasonably easy way of making a service work in a fault
> tolerant way? That is, one could have two processes on two different
> machines both offering the same service. If one process dies, the data is
> still present on the other,
  ^^^^^^^^^^^^^^^^^^^^^^^^^^
How do you propose to ensure that? Is it on a shared file system (like NFS)?
If true, then take a look at Ara's rq package:

http://www.codeforpeople.com/lib/ruby/rq/rq-2.3.0/TUTORIAL

If false, then think of some "easy" way of replication.

> and the clients of that service can continue
> operating without data loss?

I had to worry about how the clients who were in the middle of a request
would know that the service is no longer available.
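
On the client side, one simple approach is to wrap the remote object so that a connection failure moves on to the next server. The sketch below assumes the same front object is served at each URI; `FailoverProxy` is a hypothetical wrapper, not something DRb provides. It relies on `DRb::DRbObject.new_with_uri` and on DRb raising `DRb::DRbConnError` when a server cannot be reached.

```ruby
require 'drb'

# Hypothetical wrapper (not part of DRb): forwards each call to the first
# reachable server in +uris+.
class FailoverProxy
  def initialize(uris)
    @uris = uris
  end

  def method_missing(name, *args, &block)
    last_error = nil
    @uris.each do |uri|
      begin
        remote = DRb::DRbObject.new_with_uri(uri)
        return remote.__send__(name, *args, &block)
      rescue DRb::DRbConnError => e
        last_error = e   # this server is unreachable; try the next one
      end
    end
    raise(last_error || DRb::DRbConnError.new('no servers configured'))
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end
```

A client would build it with a list of URIs for its own servers and call methods on it as if it were the remote object. A call that fails midway may already have run on the first server, so retrying elsewhere is only safe for idempotent operations, and nothing here keeps the data on the servers in sync.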

-- shanko



 
Kirk Haines

> depending on your needs you may be able to code something simple that's
> 'good enough' but you'll need some sort of distributed transaction
> capability and the easiest way to get that is via a real rdbms like
> postgresql. however, once you have that setup it's silly to use drb unless
> your data is terrible to model within the relational model.

LOL. All valid points. You never know, though. Sometimes when one asks for
something magical and unlikely, someone else pipes up and delivers. It was
worth a shot. Thanks Ara (and Shashank) for the comments.


Kirk Haines
 
gwtmp01

> Assume you are using a DRb service for....something. It doesn't matter what.
> The case is the same whether one is accessing an array via DRb or a Rinda
> Ring. Is there some reasonably easy way of making a service work in a fault
> tolerant way?

You might want to take a look at some of the software and ideas at
http://www.cse.cuhk.edu.hk/~xychen/GroupCS/gcs.htm

This page has a great summary of toolkits that implement
"process group communication" or "virtual synchrony". A variety of toolkits
have evolved and been released in various forms. While I don't know of any
ruby implementation or wrapper for these ideas/software it would be a great
project.

The goal of process group communication is to send a series of messages to a
named group of recipients and ensure that every member of the group receives
the messages in a globally consistent order in the presence of communication
and/or hardware failures. From this foundation you can build a variety of
fault tolerant systems.
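
The ordering guarantee can be pictured with a toy in-process model. This sketch uses a single fixed sequencer, which is only one way to get a total order and is not claimed to be how any particular toolkit works; `Sequencer` and `Member` are hypothetical names, and failure handling and group membership are left out entirely.

```ruby
# Hypothetical in-process model of sequencer-based total ordering.
class Sequencer
  def initialize
    @next_seq = 0
  end

  # Stamp a message with the next global sequence number.
  def stamp(payload)
    seq = @next_seq
    @next_seq += 1
    [seq, payload]
  end
end

class Member
  attr_reader :delivered

  def initialize
    @expected = 0    # next sequence number to deliver
    @pending = {}    # messages that arrived early, keyed by sequence number
    @delivered = []
  end

  # Messages may arrive in any order; deliver strictly by sequence number.
  def receive(seq, payload)
    @pending[seq] = payload
    while @pending.key?(@expected)
      @delivered << @pending.delete(@expected)
      @expected += 1
    end
  end
end

sequencer = Sequencer.new
stamped = %w[a b c].map { |msg| sequencer.stamp(msg) }

in_order = Member.new
reordered = Member.new
stamped.each { |seq, msg| in_order.receive(seq, msg) }
stamped.reverse.each { |seq, msg| reordered.receive(seq, msg) }
```

Both members end up with the same `delivered` list even though the network handed them the messages in different orders. If every replica applies the same writes in the same order, the replicas stay identical.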


Gary Wright
 
Ara.T.Howard

> You might want to take a look at some of the software and ideas at
> http://www.cse.cuhk.edu.hk/~xychen/GroupCS/gcs.htm
>
> This page has a great summary of toolkits that implement
> "process group communication" or "virtual synchrony". A variety of toolkits
> have evolved and been released in various forms. While I don't know of any
> ruby implementation or wrapper for these ideas/software it would be a great
> project.
>
> The goal of process group communication is to send a series of messages to a
> named group of recipients and ensure that every member of the group receives
> the messages in a globally consistent order in the presence of communication
> and/or hardware failures. From this foundation you can build a variety of
> fault tolerant systems.
>
> Gary Wright

http://raa.ruby-lang.org/project/rb_spread/

cheers.

-a
 
Ara.T.Howard

> Cool! After I posted my link I found the main Spread
> site and have been reading about it for the last hour or so.
>
> Now I have something to play with!

i think i may have a patched version of this around... seems like there was a
little buggette or two in it... let me know if you can't get it working and
i'll look for it.

cheers.

-a
 
