Web services and Ruby

Luke Kanies

Hi all,

I'm in the process of writing a kind of distributed application, where one or
more central servers do some initial processing of a set of files, and a
bunch of clients then connect and get an appropriate subset of the processed
information. In addition, each of the clients needs to be queryable, so I can
always figure out their status and get metrics and such.

Obviously there are many ways to do this, but given the industry I'm targeting
with this and the applications with which I expect to need to integrate, it
seems like some kind of semi-standardized web service makes the most sense.

So, using some examples online, I hacked up a quick webrick/soap4r server on
both my client and server, and I'm successfully passing information around.

Well, kind of. The problem is that webrick seems to require that my process be
entirely reactive -- both my client and server want to sit there waiting for
someone to connect, when obviously that won't work. I need to get separate
actions going on each process. So, I'm now in the situation where the
server works entirely reactively, and the client can contact it fine
before I start the client's webrick server, but once that server starts I
lose control of the process.

What I'm really looking for is something like Perl's POE: something that
allows me to set up multiple sub-processes, none of which are blocking, and all
of which run based on callbacks. On the server side, I want to respond to
requests, and periodically reprocess files as necessary (as they change or
whatever). On the client side, I want to periodically connect to the server
and get new data, and every piece of data I have has a period on which it is
reassessed -- e.g., every hour verify X is still true. The client also needs to
respond to requests for metrics and such when they come in.
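Threads are not the only conceivable answer here. A minimal sketch of the POE-like alternative, assuming a single-threaded, callback-driven loop built on Ruby's `IO.select`: timers run periodic jobs (the "every hour verify X" case) and readable IO objects trigger request handlers. `Reactor`, `every`, `on_readable`, `remove` and `stop` are hypothetical names invented for this illustration, not an existing library's API.

```ruby
# A single-threaded, callback-driven event loop in the spirit of Perl's POE.
# Reactor and all of its method names are hypothetical.
class Reactor
  Timer = Struct.new(:interval, :next_at, :callback)

  def initialize
    @timers  = []
    @readers = {} # IO => callback
    @running = false
  end

  # Run the block every +interval+ seconds.
  def every(interval, &callback)
    @timers << Timer.new(interval, now + interval, callback)
  end

  # Run the block (with the IO) whenever the IO has data to read.
  def on_readable(io, &callback)
    @readers[io] = callback
  end

  # Stop watching an IO, e.g. after it reaches EOF.
  def remove(io)
    @readers.delete(io)
  end

  def stop
    @running = false
  end

  def run
    @running = true
    while @running
      # Wait in select() no longer than the time until the next timer is due.
      timeout = @timers.map { |t| t.next_at - now }.min
      timeout = [timeout, 0].max if timeout
      ready, = IO.select(@readers.keys, nil, nil, timeout)
      (ready || []).each { |io| @readers[io]&.call(io) }
      fire_due_timers
    end
  end

  private

  def fire_due_timers
    @timers.each do |timer|
      next if timer.next_at > now
      timer.next_at = now + timer.interval
      timer.callback.call
    end
  end

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# Usage: answer one request arriving on a pipe, and tick three times.
reactor = Reactor.new
reader, writer = IO.pipe
ticks = 0
received = nil
reactor.every(0.01) do
  ticks += 1
  reactor.stop if ticks == 3
end
reactor.on_readable(reader) do |io|
  received = io.gets.chomp
  reactor.remove(io)
end
writer.puts 'status?'
reactor.run
puts received # status?
puts ticks    # 3
```

Every callback runs to completion on the one thread, so no locking is needed; the trade-off is that a slow callback (say, reprocessing a large file) delays every other timer and request until it returns.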

I've been considering setting up the server as a Rails server, although that is
certainly overkill at this point in the game and might be overkill in the long
term. I think that's too heavyweight for the client, though, and I'm not sure
I would get the features I want out of Rails anyway.

Can anyone recommend anything I can use to get this kind of behaviour?
Are threads the only answer? (Please say they aren't.)
 
Sam Roberts

> Can anyone recommend anything I can use to get this kind of behaviour?
> Are threads the only answer? (Please say they aren't.)

What's wrong with threads? It's a good answer.

Taking a guess, in case you are worried: they aren't real threads, it's
just a very nice wrapper around select().
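A minimal sketch of the threaded answer, under stated assumptions: the blocking server loop runs in its own `Thread`, so the main thread keeps control of the process. The standard library's `TCPServer` stands in for the webrick/soap4r server, and the one-line `status` request is a made-up protocol; with WEBrick the analogous move would be calling its blocking `start` inside `Thread.new`. Sam's description fits the green threads of Ruby 1.8; later Rubies use native threads under a global lock, but this code is the same either way.

```ruby
require 'socket'

# Server side: answer one-line requests in a background thread so the
# main thread stays free. The "status" protocol is hypothetical.
server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS pick a free port
port   = server.addr[1]

server_thread = Thread.new do
  loop do
    client  = server.accept
    request = client.gets.to_s.chomp
    client.puts(request == 'status' ? 'ok' : 'unknown request')
    client.close
  end
end

# Main thread: still in control. Here it acts as a client of the server
# running in the other thread; it could equally run periodic jobs.
reply = TCPSocket.open('127.0.0.1', port) do |sock|
  sock.puts 'status'
  sock.gets.chomp
end
puts reply # ok

server_thread.kill
server.close
```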

Cheers,
Sam
 
Luke Kanies

> What's wrong with threads? It's a good answer.

Well, there are at least two problems with threads: first, I've never done them
before and I hope not to have to learn them just to do this relatively
simple piece, and second, they add another dimension of complexity, one
of which I must always be at least somewhat aware. Given that, as I
mentioned, I'm a newbie to threads, this does not fill me with confidence.

> Taking a guess, in case you are worried: they aren't real threads, it's
> just a very nice wrapper around select().

Yah, I know that they aren't real threads, but I believe they still come
with some of the problems that I would have with real threads.
 
Robert Klemme

Luke Kanies said:
> Well, there are at least two problems with threads: first, I've never done them
> before and I hope not to have to learn them just to do this relatively
> simple piece, and second, they add another dimension of complexity, one
> of which I must always be at least somewhat aware. Given that, as I
> mentioned, I'm a newbie to threads, this does not fill me with confidence.

Yeah, but

- there are quite a few resources out there to learn from

- you will have to at some point in time

- but most importantly: your application won't work without concurrency on
your nodes; it's an application requirement. It doesn't matter whether you
do that with threads or processes, you need the concurrency. And
concurrency always needs some form of synchronization. I'd say Ruby threads
and synchronization are easier to learn than what different OSes provide in
terms of semaphores, locks, mutexes etc. And once you get the basic
concepts it's probably not that difficult to transfer that to some other
implementation / technology.
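Robert's point that concurrency always needs some form of synchronization can be seen in a small sketch. Ruby's built-in `Queue` (`Thread::Queue`) is probably the gentlest starting point because it does the locking internally: `push` and `pop` are thread-safe, and `pop` blocks until an item arrives. The file names and the `:done` shutdown sentinel are hypothetical.

```ruby
# Thread::Queue handles the locking: push/pop are thread-safe and pop
# blocks until an item is available.
jobs    = Queue.new
results = Queue.new

worker = Thread.new do
  while (file = jobs.pop) != :done # :done is a sentinel meaning "shut down"
    results << "processed #{file}"
  end
end

%w[a.conf b.conf].each { |f| jobs << f } # hypothetical file names
jobs << :done
worker.join

processed = []
processed << results.pop until results.empty?
puts processed # processed a.conf, then processed b.conf
```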
> Yah, I know that they aren't real threads, but I believe they still come
> with some of the problems that I would have with real threads.

Like you having to learn them?

Kind regards

robert
 
Luke Kanies

> Yeah, but
>
> - there are quite a few resources out there to learn from
>
> - you will have to at some point in time
>
> - but most importantly: your application won't work without concurrency on
> your nodes; it's an application requirement. It doesn't matter whether you
> do that with threads or processes, you need the concurrency. And concurrency
> always needs some form of synchronization. I'd say Ruby threads and
> synchronization are easier to learn than what different OSes provide in terms
> of semaphores, locks, mutexes etc. And once you get the basic concepts it's
> probably not that difficult to transfer that to some other implementation /
> technology.

Yeah, after sending the previous email, I kind of resigned myself to
threads.

The real problem is that I'm hoping eventually to hire someone who
actually knows what they're doing, and I expect my forays into this area
will be replaced when I succeed in doing so.

In case anyone's interested, I'm writing a (hopefully sophisticated)
project entirely in Ruby:

http://madstop.com/svn/blink

There's a small tutorial:

http://madstop.com/svn/blink/language/trunk/doc/intro.rst

It's kind of a next-generation cfengine, I guess, although it has a far
wider scope.
> Like you having to learn them?

Well, yeah, that would be the first problem. :)
 
Ara.T.Howard

if you are in *nix and have a central nfs filesystem all nodes can see, check
out rq (ruby queue):

http://raa.ruby-lang.org/project/rq/
http://www.codeforpeople.com/lib/ruby/rq/
http://www.linuxjournal.com/article/7922

here's a snapshot of our system

jib:~ > cfq status
---
jobs:
  pending: 243
  holding: 0
  running: 36
  finished: 501
  dead: 0
  total: 780
temporal:
  pending:
    earliest: { jid: 619, metric: submitted, time: 2005-05-12 11:31:42.919905 }
    latest: { jid: 1275, metric: submitted, time: 2005-05-20 14:20:15.163355 }
    shortest:
    longest:
  holding:
    earliest:
    latest:
    shortest:
    longest:
  running:
    earliest: { jid: 613, metric: started, time: 2005-05-19 19:46:09.532144 }
    latest: { jid: 1197, metric: started, time: 2005-05-20 15:26:14.373168 }
    shortest: { jid: 1197, duration: 00:01:1.258993 }
    longest: { jid: 613, duration: 19:41:41.339677 }
  finished:
    earliest: { jid: 781, metric: finished, time: 2005-05-12 13:35:31.757662 }
    latest: { jid: 723, metric: finished, time: 2005-05-20 15:26:13.962584 }
    shortest: { jid: 546, duration: 00:11:11.688514 }
    longest: { jid: 976, duration: 30:18:18.852480 }
  dead:
    earliest:
    latest:
    shortest:
    longest:
performance:
  avg_time_per_job: 13:02:2.998790
  n_jobs_in_last_1_hrs: 3
  n_jobs_in_last_2_hrs: 6
  n_jobs_in_last_4_hrs: 10
  n_jobs_in_last_8_hrs: 23
  n_jobs_in_last_16_hrs: 44
  n_jobs_in_last_32_hrs: 91
exit_status:
  successes: 501
  failures: 0

we've run about half a million jobs through our system now with zero failures
or bugs. if your nfs server/clients are set up right you can install it in about 5
minutes without root privileges.

basically the concept would be to have each client/server have a queue that it
pulls jobs from, where all queues are located in a central nfs location.
so every node can submit jobs to every other node and all nodes can run jobs.
this is a servant architecture.
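To make the servant idea concrete, here is a toy sketch of it: one queue directory per node on shared storage, any node can `submit` to any queue, and a feeder claims the oldest pending job, runs it, and records the result. This is not rq's on-disk format or API (rq keeps a `db`, `schema` and `lock` file per queue, as its `create` output shows); `ToyQueue`, `submit` and `feed_once` are hypothetical. Claiming a job relies on `File.rename` being atomic, which holds on a local POSIX filesystem but is weaker over NFS.

```ruby
require 'fileutils'
require 'socket'
require 'tmpdir'
require 'yaml'

# Toy directory-based job queue: one queue directory per node on shared
# storage. NOT rq's format or API; ToyQueue and its methods are hypothetical.
class ToyQueue
  STATES = %w[pending running finished].freeze

  def initialize(dir)
    @dir = dir
    STATES.each { |state| FileUtils.mkdir_p(File.join(dir, state)) }
  end

  # Any node may submit to any node's queue. Job ids start with a timestamp,
  # so sorting file names gives oldest-first order.
  def submit(command)
    jid = format('%s-%d-%06d', Time.now.strftime('%Y%m%d%H%M%S%N'), Process.pid, rand(1_000_000))
    job = { 'jid' => jid, 'command' => command, 'submitter' => Socket.gethostname }
    tmp = File.join(@dir, "#{jid}.tmp")
    File.write(tmp, job.to_yaml)
    File.rename(tmp, path('pending', jid)) # the job appears in pending/ all at once
    jid
  end

  # Claim the oldest pending job, run it, and record the result.
  # Returns the finished job as a Hash, or nil when nothing is pending.
  def feed_once
    Dir.children(File.join(@dir, 'pending')).sort.each do |name|
      jid = File.basename(name, '.yml')
      running = path('running', jid)
      begin
        File.rename(path('pending', jid), running) # claim it
      rescue Errno::ENOENT
        next # another feeder claimed it first
      end
      job = YAML.safe_load(File.read(running))
      output = IO.popen(job['command'], &:read)
      job.merge!('runner' => Socket.gethostname,
                 'exit_status' => $?.exitstatus,
                 'stdout' => output)
      File.write(path('finished', jid), job.to_yaml)
      File.delete(running)
      return job
    end
    nil
  end

  private

  def path(state, jid)
    File.join(@dir, state, "#{jid}.yml")
  end
end

# Usage: submit "echo 42" to jib's queue, then let a feeder run it.
Dir.mktmpdir do |shared|
  jib_queue = ToyQueue.new(File.join(shared, 'jib.q'))
  jib_queue.submit('echo 42')
  job = jib_queue.feed_once
  puts job['stdout']      # 42
  puts job['exit_status'] # 0
end
```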

so, for example, working on an nfs mount, on two nodes of mine - jib and carp -
we can set up a queue for each node:


jib:~/shared > rq `hostname`.q create
---
q: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q
db: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q/db
schema: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q/db.schema
lock: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q/lock

carp:~/shared > rq `hostname`.q create
---
q: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q
db: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q/db
schema: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q/db.schema
lock: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q/lock

so now each node has a queue located on a central nfs mount

carp submits a job to jib:

carp:~/shared > rq jib.ngdc.noaa.gov.q/ submit echo 42
---
-
  jid: 1
  priority: 0
  state: pending
  submitted: 2005-05-20 15:32:54.664324
  started:
  finished:
  elapsed:
  submitter: carp.ngdc.noaa.gov
  runner:
  pid:
  exit_status:
  tag:
  restartable:
  command: echo 42

jib submits a job to carp:

jib:~/shared > rq carp.ngdc.noaa.gov.q/ submit echo 42
---
-
  jid: 1
  priority: 0
  state: pending
  submitted: 2005-05-20 15:33:31.209160
  started:
  finished:
  elapsed:
  submitter: jib.ngdc.noaa.gov
  runner:
  pid:
  exit_status:
  tag:
  restartable:
  command: echo 42


a 'feeder' (a process that takes jobs from the queue, runs them, and returns
them to the queue) is started on each node. (normally these are daemons and
are cron'd to be made 'immortal' - they restart if they die)
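One common way to make a daemon "immortal" with cron, sketched under assumptions (not necessarily how rq does it; `alive?`, `ensure_running` and the pidfile location are hypothetical): a small script run from cron every few minutes checks a pidfile and starts a new feeder only when the recorded process is gone.

```ruby
# Is a process with this pid still alive? Signal 0 performs the existence
# and permission checks without actually delivering a signal.
def alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # the process exists but belongs to another user
end

# Start +command+ unless the pid recorded in +pidfile+ is still running.
# Returns the pid of the existing or newly started process.
# ensure_running and the pidfile convention are hypothetical.
def ensure_running(pidfile, command)
  if File.exist?(pidfile)
    pid = File.read(pidfile).to_i
    return pid if pid.positive? && alive?(pid)
  end
  pid = Process.spawn(command, in: File::NULL, out: File::NULL, err: File::NULL)
  Process.detach(pid) # reap the child if it exits while this script still runs
  File.write(pidfile, pid.to_s)
  pid
end

# Example body for the cron job (the pidfile path is made up):
#   ensure_running('/var/tmp/carp-feeder.pid',
#                  'rq carp.ngdc.noaa.gov.q/ feed --log=/dev/null')
```

A stale pidfile whose pid has since been reused by an unrelated process would fool this check; dedicated daemon tooling usually guards against that with a lock held by the running daemon.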

carp:~/shared > rq carp.ngdc.noaa.gov.q/ feed --log=/dev/null
42

jib:~/shared > rq jib.ngdc.noaa.gov.q/ feed --log=/dev/null
42

so carp ran jib's job and jib ran carp's job. we can see this by:

carp:~/shared > rq jib.ngdc.noaa.gov.q/ query jid=1
---
-
  jid: 1
  priority: 0
  state: finished
  submitted: 2005-05-20 15:32:54.664324
  started: 2005-05-20 15:39:33.309159
  finished: 2005-05-20 15:39:33.438110
  elapsed: 0.128951
  submitter: carp.ngdc.noaa.gov
  runner: jib.ngdc.noaa.gov
  pid: 26632
  exit_status: 0
  tag:
  restartable:
  command: echo 42

jib:~/shared > rq carp.ngdc.noaa.gov.q/ query jid=1
---
-
  jid: 1
  priority: 0
  state: finished
  submitted: 2005-05-20 15:33:31.209160
  started: 2005-05-20 15:38:43.503715
  finished: 2005-05-20 15:38:43.779134
  elapsed: 0.275419
  submitter: jib.ngdc.noaa.gov
  runner: carp.ngdc.noaa.gov
  pid: 20500
  exit_status: 0
  tag:
  restartable:
  command: echo 42


all the output is available as yaml and much of it can be input to other
commands. in addition the queue is easily available directly via an api, so
it's pretty easy to code decision making based on some other node's queue
contents.
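Because the output is YAML, another program can load it directly. A minimal sketch, assuming the `query` output shown earlier has been captured into a string (here it is pasted in verbatim); the pass/fail rule is a made-up example of decision making, not something rq provides. The timestamps load as `Time` objects, so `YAML.safe_load` has to be told to permit that class.

```ruby
require 'yaml'

# Output of `rq jib.ngdc.noaa.gov.q/ query jid=1`, copied from above.
QUERY_OUTPUT = <<~YAML
  ---
  -
    jid: 1
    priority: 0
    state: finished
    submitted: 2005-05-20 15:32:54.664324
    started: 2005-05-20 15:39:33.309159
    finished: 2005-05-20 15:39:33.438110
    elapsed: 0.128951
    submitter: carp.ngdc.noaa.gov
    runner: jib.ngdc.noaa.gov
    pid: 26632
    exit_status: 0
    tag:
    restartable:
    command: echo 42
YAML

# Timestamps parse as Time objects, so that class must be permitted.
jobs = YAML.safe_load(QUERY_OUTPUT, permitted_classes: [Time])

# Hypothetical decision rule: a job is fine if it finished with status 0.
jobs.each do |job|
  ok = job['state'] == 'finished' && job['exit_status'] == 0
  puts "job #{job['jid']} on #{job['runner']}: #{ok ? 'succeeded' : 'needs attention'}"
end
# prints: job 1 on jib.ngdc.noaa.gov: succeeded
```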

i also have a piece of software called 'dirwatch' (on raa too) that makes it
trivial to set up 'watches' on directories to trigger actions when files are
created, modified, deleted, etc. it's under revision as we speak and is
undergoing a major internal overhaul - but the basic functionality and user
interface won't change much.

hth.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================
 
Luke Kanies

> if you are in *nix and have a central nfs filesystem all nodes can see, check
> out rq (ruby queue)

I'm planning on having this system be cross-platform, and it'll be used to
do things like set up NFS, so I can't depend on NFS's existence,
unfortunately.

I'm always interested in more cluster software, though, especially if it's
written in Ruby; a lot of the people working on the same problem I'm
attacking (building software to maintain the computers for us) come from
the cluster world, because the problem is generally both exacerbated there
(because of node count) and easier to solve (because of node consistency).

When I have something that's somewhat more functional, I'd love to hear
whether you thought it would be something you'd be interested in using on
your compute clusters.
> i also have a piece of software called 'dirwatch' (on raa too) that makes it
> trivial to set up 'watches' on directories to trigger actions when files are
> created, modified, deleted, etc. it's under revision as we speak and is
> undergoing a major internal overhaul - but the basic functionality and user
> interface won't change much.

Does dirwatch use FAM, or is it its own distinct implementation? I am
planning on using something FAM-like eventually, and I notice there's a
very early release of a Ruby FAM library, but, well, 0.1.4 does not
inspire confidence.

--
On Bureaucracy....
The Pythagorean theorem contains 24 words. Archimedes
Principle, 67. The Ten Commandments, 179. The American Declaration of
Independence, 300. And recent legislation in Europe concerning when
and where to smoke, 23,942. -- The European, June 23-29, 1995
 
Sam Roberts

Quoting Luke Kanies, on Sat, May 21, 2005 at 04:57:29AM +0900:
> Well, there are at least two problems with threads: first, I've never done them
> before and I hope not to have to learn them just to do this relatively
> simple piece, and second, they add another dimension of complexity, one
> of which I must always be at least somewhat aware. Given that, as I
> mentioned, I'm a newbie to threads, this does not fill me with confidence.

I was worried about threads, too, then I found they are trivial. You'll
just need two, I guess, and the data that's shared between them should
all be in one place (good design), and you just wrap that with a Mutex.

It really should be easy.
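Sam's recipe (keep the shared data in one place and wrap it with a `Mutex`) might look like this minimal sketch. `ClientState` and its methods are hypothetical names; in the scenario from this thread, the periodic checker thread would call `record` and the request-handling thread would call `snapshot`.

```ruby
# All state shared between threads lives in one object, and every access
# goes through the same Mutex. ClientState is a hypothetical name.
class ClientState
  def initialize
    @lock    = Mutex.new
    @metrics = Hash.new(0)
  end

  # Called by the periodic worker thread.
  def record(name, amount = 1)
    @lock.synchronize { @metrics[name] += amount }
  end

  # Called by the request-handling thread. Returns a copy, so callers never
  # touch the shared hash outside the lock.
  def snapshot
    @lock.synchronize { @metrics.dup }
  end
end

state   = ClientState.new
workers = Array.new(4) { Thread.new { 1000.times { state.record(:checks_run) } } }
workers.each(&:join)
puts state.snapshot[:checks_run] # 4000
```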
> Yah, I know that they aren't real threads, but I believe they still come
> with some of the problems that I would have with real threads.

Yeah, but doing things other ways has problems, too, and this way is
well supported in ruby.

Have fun,
Sam
 
Ara.T.Howard

> I'm planning on having this system be cross-platform, and it'll be used to
> do things like set up NFS, so I can't depend on NFS's existence,
> unfortunately.

ah. oh well.
> I'm always interested in more cluster software, though, especially if it's
> written in Ruby; a lot of the people working on the same problem I'm
> attacking (building software to maintain the computers for us) come from the
> cluster world, because the problem is generally both exacerbated there
> (because of node count) and easier to solve (because of node consistency).

we manage this by mounting all binaries via nfs - make && make install then
just goes cluster wide ;-)
> When I have something that's somewhat more functional, I'd love to hear
> whether you thought it would be something you'd be interested in using on
> your compute clusters.

please do.
> Does dirwatch use FAM, or is it its own distinct implementation? I am
> planning on using something FAM-like eventually, and I notice there's a very
> early release of a Ruby FAM library, but, well, 0.1.4 does not inspire
> confidence.

nope. it uses an sqlite db to maintain the state of the filesystem. every so
often (5 min by default) it scans the dir and compares it to its db - spawning
actions accordingly. actions can be normal unix filter programs and are
passed the list of files (deleted files, created files, etc) on stdin - or can
also be given more complete info in yaml format - also on stdin. the programs
are run in a synchronous fashion by default - but can also be run async. the
issue with async is that nothing prevents 40,000 processes being spawned at once if
40,000 files arrive in a directory at once and you have a job configured to be
launched when a file is created... say - that reminds me - let me tell you
about this time i took down our entire cluster and three raids ;-)
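The scan-and-compare design described here can be sketched in a few lines. This is an illustration under assumptions, not dirwatch's code: an in-memory `Hash` of path to modification time stands in for dirwatch's sqlite db, only created files are handed to the action, and `snapshot`, `diff`, `run_action` and `watch` are hypothetical helpers. Actions run synchronously, one at a time, which is what avoids spawning 40,000 processes at once.

```ruby
require 'tmpdir'

# Map every regular file directly under +dir+ to its modification time.
# (An in-memory Hash stands in for dirwatch's sqlite db.)
def snapshot(dir)
  Dir.glob(File.join(dir, '*'))
     .select { |path| File.file?(path) }
     .to_h { |path| [path, File.mtime(path)] }
end

# Classify the differences between two snapshots.
def diff(before, after)
  {
    created:  after.keys - before.keys,
    deleted:  before.keys - after.keys,
    modified: (after.keys & before.keys).reject { |path| after[path] == before[path] }
  }
end

# Run a filter-style action synchronously with one path per line on its
# stdin, and return whatever it prints.
def run_action(command, paths)
  return '' if paths.empty?

  IO.popen(command, 'r+') do |io|
    io.puts(paths)
    io.close_write
    io.read
  end
end

# Poll forever: rescan every +interval+ seconds (dirwatch's default is
# 5 minutes) and hand newly created files to +command+.
def watch(dir, command, interval: 300)
  state = snapshot(dir)
  loop do
    sleep interval
    current = snapshot(dir)
    run_action(command, diff(state, current)[:created])
    state = current
  end
end

# Usage: detect one newly created file and pass it to `cat`.
Dir.mktmpdir do |dir|
  before = snapshot(dir)
  File.write(File.join(dir, 'new.dat'), 'data')
  changes = diff(before, snapshot(dir))
  print run_action('cat', changes[:created]) # prints the new file's path
end
```

Comparing modification times can miss a change made within the filesystem's timestamp resolution of the previous scan; comparing sizes or checksums as well would be more robust.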

the update of dirwatch should be out next week sometime.

cheers.

-a
 
Ara.T.Howard

20/05/2005 23:11:47
> This is very interesting... The code is pure Ruby - would this work on a
> Windows environment too?

i think it could be made to quite easily since i avoid fork (using IO::popen
instead) to run processes... i would be happy to make it work on windows, as it
really should be able to - but i don't have access to any windows machine. if
someone wants to iterate with me after the next release we can 'window-ify' it.
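The portability argument in a nutshell: `fork` is not available on Windows, while `IO.popen` works on both Unix and Windows. A minimal sketch (the `puts 6 * 7` child program is only a placeholder) that runs a child process, captures its output, and checks its exit status without forking explicitly.

```ruby
require 'rbconfig'

# fork is unavailable on Windows (Process.respond_to?(:fork) returns false
# there), but IO.popen works on both Unix and Windows. Passing an Array runs
# the program directly instead of through a shell.
ruby   = RbConfig.ruby # path of the running interpreter, used as a placeholder child
output = IO.popen([ruby, '-e', 'puts 6 * 7'], &:read)
status = $?

puts output          # 42
puts status.success? # true
puts "fork available: #{Process.respond_to?(:fork)}"
```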

cheers.

-a
 
Robert Klemme

Luke Kanies said:
> I'm planning on having this system be cross-platform, and it'll be used to
> do things like set up NFS, so I can't depend on NFS's existence,
> unfortunately.
>
> I'm always interested in more cluster software, though, especially if it's
> written in Ruby; a lot of the people working on the same problem I'm
> attacking (building software to maintain the computers for us) come from
> the cluster world, because the problem is generally both exacerbated there
> (because of node count) and easier to solve (because of node consistency).

Maybe have a look at grid computing. I think there is a standard out there
and maybe even an implementation in ruby. Could be that it uses CORBA for
communication - I don't quite remember.

Cheers

robert
 
Bill Guindon

On Sat, 21 May 2005, Graham Foster wrote:
> i think it could be made to quite easily since i avoid fork (using IO::popen
> instead) to run processes... i would be happy to make it work on windows, as it
> really should be able to - but i don't have access to any windows machine. if
> someone wants to iterate with me after the next release we can 'window-ify' it.

You can add me to the list of windows beta testers. I had just run
across this, and had wondered the same thing, but didn't have the time
to dig into it then. Time's freeing up a bit, and I'm very interested
in it.

--
Bill Guindon (aka aGorilla)
 
Booker C. Bense

> Maybe have a look at grid computing.

Yikes, only someone that hasn't looked at it would suggest
that.
> I think there is a standard out there
> and maybe even an implementation in ruby. Could be that it uses CORBA for
> communication - I don't quite remember.

Grid as in Globus wouldn't work for this since it requires a
pretty hefty infrastructure in place already. The problem isn't
a lack of standards; there are currently three of them.
Grid computing is a complete and utter mess at the moment.
The most viable long-term solution seems to be something that
looks like web services with enough added sugar to provide
state-like capabilities.

Booker C. Bense

 
Robert Klemme

Booker said:
> Yikes, only someone that hasn't looked at it would suggest
> that.
>
> Grid as in Globus wouldn't work for this since it requires a
> pretty hefty infrastructure in place already. The problem isn't
> a lack of standards; there are currently three of them.
> Grid computing is a complete and utter mess at the moment.
> The most viable long-term solution seems to be something that
> looks like web services with enough added sugar to provide
> state-like capabilities.

Thanks for that update! In fact it's been quite some time since I took a look
at grid computing. It could have developed in a better direction since
then - but apparently it took the road of many academic / committee
inventions...

Kind regards

robert
 
