[ANN] ruby queue : rq-0.1.2


ara howard

(very sorry if this is posted multiple times - for some reason i have not seen
my original post)

rubyists-

rq (ruby queue) is a project aimed at filling the void between roll-your-own
distributed processing using ssh/rsh and full blown clustering software like
sun grid engine. it is a tool designed to throw a bunch of nodes at a list of
tasks in a hurry. it is highly fault tolerant due to its decentralized design
and simple to use, requiring only a few minutes to set up and the use of
three or four simple commands. at this point documentation is scant and this
release carries an experimental status; however, our site has run nearly a
million jobs through rq over the last few months with no problems and i am
excited to gather opinions about the initial design before starting in earnest
on an alpha release. please feel free to contact me either on or offline with
any questions or for assistance getting set up as i am eager to find some
willing testers.

for now the project lives at

http://raa.ruby-lang.org/project/rq/

though a rubyforge/gem dist will accompany the alpha release

cheers.

-a


from 'rq -help'

NAME
rq v0.1.2

SYNOPSIS
rq [queue] mode [mode_args]* [options]*

DESCRIPTION
rq is an __experimental__ tool used to manage nfs mounted work queues.
multiple instances of rq on multiple hosts can work from these queues to
distribute processing load to 'n' nodes - bringing many dozens of otherwise
powerful cpus to their knees with a single blow. clearly this software should
be kept out of the hands of radicals, SETI enthusiasts, and one mr. jeff
safran.

rq operates in one of the modes create, submit, feed, list, delete, query, or
help. depending on the mode of operation and the options used the meaning of
mode_args may change, sometimes wildly and unpredictably (i jest, of course).


MODES

modes may be abbreviated to uniqueness, therefore the following shortcuts
apply :

c => create
s => submit
f => feed
l => list
d => delete
q => query
h => help
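
the "abbreviated to uniqueness" rule can be pictured with a small ruby sketch. this is not rq's actual code: the helper name resolve_mode and its error handling are assumptions made for illustration, only the list of modes comes from the text above.

```ruby
# hypothetical helper (not part of rq): resolve a mode abbreviation
MODES = %w[create submit feed list delete query help].freeze

def resolve_mode(arg, modes = MODES)
  # an exact match always wins
  return arg if modes.include?(arg)

  # otherwise the prefix must identify exactly one mode
  matches = modes.select { |mode| mode.start_with?(arg) }
  unless matches.size == 1
    raise ArgumentError, "unknown or ambiguous mode: #{arg.inspect}"
  end
  matches.first
end

puts resolve_mode("c")    # create
puts resolve_mode("sub")  # submit
```

with this set of modes every single letter is already unique, which is why the one letter shortcuts work.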

create, c :

creates a queue. the queue MUST be located on an nfs mounted file system
visible from all nodes intended to run jobs from it.

examples :

0) to create a queue
~ > rq q create
or simply
~ > rq q c

list, l :

show combinations of pending, running, dead, or finished jobs. for this
command mode_args must be one of pending, running, dead, finished, or all.
the default is all.

mode_args may be abbreviated to uniqueness, therefore the following
shortcuts apply :

p => pending
r => running
f => finished
d => dead
a => all

examples :

0) show everything in q
~ > rq q list all
or
~ > rq q l all
or
~ > export RQ_Q=q
~ > rq l

1) show q's pending jobs
~ > rq q list pending

2) show q's running jobs
~ > rq q list running

3) show q's finished jobs
~ > rq q list finished


submit, s :

submit jobs to a queue to be processed by any feeding node. any mode_args
are taken as the command to run. note that mode_args are subject to shell
expansion - if you don't understand what this means do not use this feature.

when running in submit mode a file may be specified as a list of commands to
run using the '--infile, -i' option. this file is taken to be a newline
separated list of commands to submit, blank lines and comments (#) are
allowed. if submitting a large number of jobs the input file method is MUCH
more efficient. if no commands are specified on the command line rq
automatically reads them from STDIN. yaml formatted files are also allowed
as input (http://www.yaml.org/) - note that output of nearly all rq
commands is valid yaml and may, therefore, be piped as input into the submit
command.
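
to make the plain-text command file format concrete, here is a minimal ruby sketch of how such a file could be read. it is an illustration only, not rq's parser: the name parse_commands is hypothetical, and it assumes that a comment is a whole line starting with '#'.

```ruby
# hypothetical parser (not rq's own): one command per line,
# blank lines and whole-line '#' comments are skipped
def parse_commands(text)
  text.each_line
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?("#") }
end

sample = <<~CMDS
  # nightly batch
  /full/path/to/job_a input_1

  /full/path/to/job_b input_2
CMDS

parse_commands(sample).each { |cmd| puts cmd }
```

a file like the sample would then yield two jobs.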

the '--priority, -p' option can be used here to determine the priority of
jobs. priorities may be any number in [0, 10); therefore 9 is the maximum
priority. submitting a high priority job will NOT supplant currently
running low priority jobs, but higher priority jobs will always migrate
above lower priority jobs in the queue in order that they be run sooner.
note that constant submission of high priority jobs may create a starvation
situation whereby low priority jobs are never allowed to run. avoiding this
situation is the responsibility of the user.

examples :

0) submit the job ls to run on some feeding host

~ > rq q s ls

1) submit the job ls to run on some feeding host, at priority 9

~ > rq -p9 q s ls

2) submit 42000 jobs (quietly) to run from a command file.

~ > wc -l cmdfile
42000
~ > rq q s -q < cmdfile

3) submit 42 jobs to run at priority 9 from a command file.

~ > wc -l cmdfile
42
~ > rq -p9 q s < cmdfile

4) re-submit all finished jobs

~ > rq q l f | rq q s


feed, f :

take jobs from the queue and run them on behalf of the submitter. jobs are
taken from the queue in an 'oldest highest priority' order.
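
one way to read 'oldest highest priority' is: highest priority first, and within the same priority the job submitted earliest. the ruby sketch below shows that ordering under this assumption. the Job struct and its field names (jid, priority, submitted) are hypothetical and do not describe rq's real schema.

```ruby
# hypothetical job record, field names are assumptions
Job = Struct.new(:jid, :priority, :submitted)

# highest priority first, oldest submission first within a priority,
# job id as a final tie breaker
def feed_order(jobs)
  jobs.sort_by { |job| [-job.priority, job.submitted, job.jid] }
end

t = Time.at(0)
jobs = [
  Job.new(1, 0, t),
  Job.new(2, 9, t + 20),
  Job.new(3, 9, t + 10),
  Job.new(4, 5, t + 5),
]

p feed_order(jobs).map(&:jid)  # => [3, 2, 4, 1]
```

job 1 is the oldest but runs last, which is the starvation risk described for the submit mode.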

feeders can be run from any number of nodes allowing you to harness the CPU
power of many nodes simultaneously in order to more effectively clobber
your network.

the most useful method of feeding from a queue is to do so in daemon mode so
that the process loses its controlling terminal and will not exit when
you exit your terminal session. use the '--daemon, -d' option to accomplish
this. by default only one feeding process per host per queue is allowed to
run at any given moment. because of this it is acceptable to start a feeder
at some regular interval from a cron entry since, if a feeder is already
running, the process will simply exit and otherwise a new feeder will be
started. in this way you may keep feeder processes running even across
machine reboots.
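
the "exit if a feeder is already running" behaviour can be sketched with an exclusive, non-blocking lock on a per-host file. this is only an illustration of the idea: the helper name acquire_feeder_lock and the lock file path are hypothetical, and the use of ruby's File#flock on a local file is an assumption, not a description of how rq does it.

```ruby
require "tmpdir"

# hypothetical guard (not rq's code): returns the open lock file if this
# process is the only feeder, or nil if another holder already exists
def acquire_feeder_lock(path)
  file = File.open(path, File::RDWR | File::CREAT, 0o644)
  if file.flock(File::LOCK_EX | File::LOCK_NB)
    file        # keep it open, the lock lives as long as the file is open
  else
    file.close
    nil
  end
end

lock_path = File.join(Dir.tmpdir, "rq_feeder_example.lock")
if (lock = acquire_feeder_lock(lock_path))
  puts "no feeder running, starting one"
  # ... the feed loop would run here, the lock is dropped on process exit
else
  puts "feeder already running, exiting"
end
```

because the lock disappears when the process dies, a cron entry that retries on a schedule will restart a feeder after a crash or reboot without ever running two at once.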


examples :

0) feed from a queue verbosely for debugging purposes, using a minimum and
maximum polling time of 2 and 4 respectively

~ > rq q feed -v4 -m2 -M4

1) feed from a queue in daemon mode logging into /home/ahoward/rq.log

~ > rq q feed -d -l/home/ahoward/rq.log

2) use something like this sample crontab entry to keep a feeder running
forever (it attempts to (re)start every fifteen minutes)

#
# your crontab file
#

*/15 * * * * /full/path/to/bin/rq /full/path/to/nfs/mounted/q f -d -l/home/user/rq.log

log rolling while running in daemon mode is automatic.


delete, d :

delete combinations of pending, running, finished, dead, or specific jobs.
the delete mode is capable of parsing the output of list mode, making it
possible to create filters to delete jobs meeting very specific conditions.

mode_args are the same as for 'list', including 'running'. note that it is
possible to 'delete' a running job, but there is no way to actually STOP it
mid execution since the node doing the deleting has no way to communicate
this information to the (possibly) remote execution host. therefore you
should use the 'delete running' feature with care and only for housekeeping
purposes or to prevent future jobs from being scheduled.

examples :

0) delete all pending, running, and finished jobs from a queue

~ > rq q d all

1) delete all pending jobs from a queue

~ > rq q d p

2) delete all finished jobs from a queue

~ > rq q d f

3) delete jobs via hand crafted filter program

~ > rq q list | filter_prog | rq q d
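
a filter_prog for the last example might look like the ruby sketch below. it is a guess at the shape, not a documented interface: it assumes the list output is a yaml sequence of mappings and that each mapping has a 'command' key. both the key names and the helper name filter_jobs are hypothetical, so check the real output of 'rq q list' before relying on this.

```ruby
require "yaml"

# hypothetical filter: keep only jobs whose command matches a pattern.
# assumes a yaml sequence of mappings with a 'command' key.
def filter_jobs(yaml_text, pattern)
  jobs = YAML.safe_load(yaml_text) || []
  jobs.select { |job| job["command"].to_s =~ pattern }
end

sample = <<~YAML
  - jid: 1
    command: /full/path/to/job_a
  - jid: 2
    command: /full/path/to/job_b
YAML

puts filter_jobs(sample, /job_b/).to_yaml
```

in a real pipeline the program would read $stdin.read instead of the sample and print the yaml it keeps, so that the delete mode can parse it.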

query, q :

query exposes the database more directly to the user, evaluating the where
clause specified on the command line (or from STDIN). this feature can be
used to make a fine grained selection of jobs for reporting or as input into
the delete command. you must have a basic understanding of SQL syntax to
use this feature, but it is fairly intuitive in this capacity.

examples:

0) show all jobs started within a specific 10 second range

~ > rq q query "started >= '2004-06-29 22:51:00' and
started < '2004-06-29 22:51:10'"

1) shell quoting can be tricky here so input on STDIN is also allowed

~ > cat constraints
started >= '2004-06-29 22:51:00' and
started < '2004-06-29 22:51:10'

~ > rq q query < constraints
or (same thing)

~ > cat constraints | rq q query

2) this query output may then be used to delete specific jobs

~ > cat constraints | rq q query | rq q d

3) show all jobs which are either finished or dead

~ > rq q q state=finished or state=dead
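
since shell quoting is the tricky part, a where clause like the ones above can also be generated and then piped to the query mode on STDIN. the ruby sketch below builds a time window clause. the helper name started_between is hypothetical, the 'started' field and the timestamp layout are taken from the examples above.

```ruby
# hypothetical helper: build a where clause for a half-open time window
def started_between(from, to)
  fmt = "%Y-%m-%d %H:%M:%S"
  "started >= '#{from.strftime(fmt)}' and started < '#{to.strftime(fmt)}'"
end

from = Time.new(2004, 6, 29, 22, 51, 0)
puts started_between(from, from + 10)
```

the printed clause can be saved to a file or piped straight into 'rq q query'.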


NOTES
- realize that your job is going to be running on a remote host and this has
implications. paths, for example, should be absolute, not relative.
specifically the submitted job must be visible from all hosts currently
feeding from a q.

- you need to consider __CAREFULLY__ what the ramifications of having multiple
instances of your program all running at the same time will be. it is
beyond the scope of rq to ensure multiple instances of a program
will not overwrite each other's output files, for instance. coordination of
programs is left entirely to the user.
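
one common way for a job to avoid clobbering another instance's output is to put the host name and process id into the file name. the ruby sketch below shows the idea. it is a suggestion for job authors, not something rq provides, and the helper name unique_output_path is hypothetical.

```ruby
require "socket"

# hypothetical helper: build an output path unique to this host and process
def unique_output_path(dir, basename)
  File.join(dir, "#{basename}.#{Socket.gethostname}.#{Process.pid}.out")
end

puts unique_output_path("/full/path/to/output", "job_a")
```

process ids are eventually reused on a host, so long lived output directories may also want a timestamp in the name.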

- the list of finished jobs will grow without bound unless you sometimes
delete some (all) of them. the reason for this is that rq cannot
know when the user has collected the exit_status, etc. from a job and so
keeps this information in the queue until instructed to delete it.

- if you are using the crontab feature to maintain an immortal feeder on a
host then that feeder will be running in the environment provided by cron.
this is NOT the same environment found in a login shell and you may be
surprised at the range of commands which do not function. if you want
submitted jobs to behave as closely as possible to their behaviour when
typed interactively you'll need to wrap each job in a shell script that
looks like the following:

#!/bin/bash --login
commands_for_your_job

and submit that script


ENVIRONMENT
RQ_Q: full path to queue

the queue argument to all commands may be omitted if, and only if, the
environment variable 'RQ_Q' contains the full path to the q. eg.

~ > export RQ_Q=/full/path/to/my/q

this feature can save a considerable amount of typing for those weak of wrist


DIAGNOSTICS
success => $? == 0
failure => $? != 0
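
the same zero / non-zero convention can be checked from a ruby script that drives a command. in the sketch below a child ruby process stands in for the command being run, so that the example is self-contained. the helper name run_ok? is hypothetical.

```ruby
require "rbconfig"

# hypothetical wrapper: true only if the command ran and exited with 0
def run_ok?(*cmd)
  system(*cmd) == true
end

ruby = RbConfig.ruby
puts run_ok?(ruby, "-e", "exit 0")  # true
puts run_ok?(ruby, "-e", "exit 3")  # false
```

Kernel#system returns nil when the command cannot be started at all, which this wrapper also treats as failure.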


AUTHOR
(e-mail address removed)


BUGS
1 < bugno && bugno <= 42


OPTIONS


-f, --feed=appetite
-p, --priority=priority
--name
-d, --daemon
-q, --quiet
-e, --select
-i, --infile=infile
-M, --max_sleep=seconds
-m, --min_sleep=seconds
-l, --log=path
-v=0-4|debug|info|warn|error|fatal
--verbosity
--log_age=log_age
--log_size=log_size
-c, --config=path
--template=template
-h, --help
 
Jamis Buck

Congrats, Ara! You've been working on this one for a while now, I
believe. I'm excited to play with it, though I regret that there's
probably not much I could use it on at work... I'll definitely add it to
my toolbox, though.

Thanks!

- Jamis

--
Jamis Buck
(e-mail address removed)
http://www.jamisbuck.org/jamis

"I use octal until I get to 8, and then I switch to decimal."
 
Shashank Date

Hi Ara,

--- ara howard said:
DESCRIPTION
rq is an __experimental__ tool used to manage
nfs mounted work queues.
^^^^^^^^^^^^^
Has this been tested/tried on a mix of *ix and Windows
platform, and if yes, which NFS (free or commercial)
package was used on the Windows side?

Thanks,
-- shanko



 
Ara.T.Howard

Shashank Date said:
Has this been tested/tried on a mix of *ix and Windows
platform, and if yes, which NFS (free or commercial)
package was used on the Windows side?

this has not been tested on any windows machines.

our environment is 100% linux - latest enterprise kernel on all boxes,
including nfs servers. the code SHOULD work on any box for which nfs locking
works (both my code and sqlite depend on fcntl locks working). if you are
interested in trying it out please let me know and i'll help you get a test
system up.

cheers.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| A flower falls, even though we love it;
| and a weed grows, even though we do not love it.
| --Dogen
===============================================================================
 
Shashank Date

Hi Ara,

Ara.T.Howard said:
this has not been tested on any window machines.

our environment is 100% linux - latest enterprise kernel on all boxes,
including nfs servers. the code SHOULD work on any box for which nfs locking
works (both my code and sqlite depend on fcntl locks working). if you are
interested in trying it out please let me know and i'll help you get a test
system up.

Definitely interested, but unfortunately won't be able to devote testing
time until early November (if at all). I am looking at a project in the pipe
which may have a mix of Linux and Windows for which this would be just perfect.

If and when we start using it, I will be able to contribute towards:

0. GUI Console,
1. dynamic load-balancer
2. rudimentary work-flow engine

in that order....if it does not have these by then, that is.

Wish you the very best.
-- shanko
 
Ara.T.Howard

Shashank Date said:
Definitely interested, but unfortunately won't be able to devote testing
time until early November (if at all). I am looking at a project in the pipe
which may have a mix of Linux and Windows for which this would be just
perfect.

you'll probably want to test your nfs setup first - i can give a little script
that should determine if mixed windows/linux nfs access works with locking -
it's pure ruby so it should just run. do you have a few nodes you can test
on?

Shashank Date said:
If and when we start using it, I will be able to contribute towards:
0. GUI Console,

that'd be great - it's on the TODO list...

Shashank Date said:
1. dynamic load-balancer

it doesn't need one! ;-) all nodes access the queue taking jobs in a 'highest
priority oldest out first' fashion. in other words, all nodes bail water as
fast as possible - if the boat is sinking the only answers are:

- write faster jobs
- make network faster (vsftp is great!)
- buy faster nodes
- buy more nodes

this is one of the great things about the totally decentralized architecture
- there is no need for load balancing or scheduling!

Shashank Date said:
2. rudimentary work-flow engine

you mean dependencies? yes that would be nice...

Shashank Date said:
in that order....if it does not have these by then, that is.

Wish you the very best.
-- shanko

it all sounds great - i'll try to get some more docs out in the next few weeks
so you can read about it.

cheers.

-a

Ara.T.Howard

Forgot to add a link which lists some W/F engines out there:
http://jbpm.org/article.html#wftk

i did something like this once:

http://raa.ruby-lang.org/project/flow/

thanks for the link.

the biggest issue in cluster flows, i think, is HOW to collect the exit status
w/o polling... this is especially true w/my design since there is no central
brain (controlling process). this is one of the disadvantages of a
decentralized system - but, i think, the advantages outweigh this and other
negatives.

cheers.

-a

Shashank Date

Hello Ara,

Ara.T.Howard said:
you'll probably want to test your nfs setup first - i can give a little script
that should determine if mixed windows/linux nfs access works with locking -
it's pure ruby so it should just run. do you have a few nodes you can test
on?

Not right away ... but I will try and get a 2-node configuration working
over the (already crowded ;-) week-end. No promises though.

If I succeed, I will have Win XP (Home) on one node and SuSE 8.2 on the
other. But please go ahead and email the script which determines
if it works or not.

Ara.T.Howard said:
that'd be great - it's on the TODO list...

Good ... I will be using wxRuby if it comes to me.

Ara.T.Howard said:
it doesn't need one! ;-) all nodes access the queue taking jobs in a 'highest
priority oldest out first' fashion. in other words, all nodes bail water as
fast as possible - if the boat is sinking the only answers are:

- write faster jobs
- make network faster (vsftp is great!)
- buy faster nodes
- buy more nodes

this is one of the great things about the totally decentralized architecture
- there is no need for load balancing or scheduling!

Hmmm... then maybe I have misunderstood the nature of your project.
I was thinking of using it for CPU intensive (eg. image analysis) jobs over
a cluster of heterogeneous (in terms of CPU power, O.S. and in terms
of functionality) nodes and wanted the ability to "farm" (push) work
requests to least busy CPUs (provided there is a way to determine that, of
course). Phil Tomson's TaskMaster comes to mind. See:
http://raa.ruby-lang.org/project/taskmaster/

[Phil: if you are reading this, will love some feedback from you:
specifically are you planning to work on it in near future? Or is the
project closed?]

From your description above it appears to me that work will be
"fetched" (pulled) by least busy CPUs. Am I correct? (We can take this
discussion off line if it starts becoming [OT]).

Ara.T.Howard said:
you mean dependencies? yes that would be nice...

Well, a little more than that. Right now, this is still "pie in the sky"
kind of an idea... but I can see some of the patterns being implemented.

See: http://tmitwww.tm.tue.nl/research/patterns/patterns.htm
for some details.

This is very likely to be extremely specific to the problem we are trying
to solve and may also be proprietary. I hope your licensing terms
will permit me that.

Ara.T.Howard said:
it all sounds great - i'll try to get some more docs out in the next few weeks
so you can read about it.

Fantastic ... I am all ears (ummm eyes ;-)!

Thanks,
-- shanko
 
Ara.T.Howard

Shashank Date said:
Not right away ... but I will try and get a 2-node configuration working
over the (already crowded ;-) week-end. No promises though.

If I succeed, I will have Win XP (Home) on one node and SuSE 8.2 on the
other. But please go ahead and email the script which determines if it works
or not.

i'll tar it up and send it your way. any chance you could try compiling this
on windows?

http://raa.ruby-lang.org/project/posixlock/

i don't have access to a windows machine with a compiler toolchain and don't
even know if windows offers a posix fcntl - but i'm hoping it does. sqlite
compiles on windows so it must - but i may have to add some #ifdefs to that
code to get it working... i should also add that you can simply pack a struct
to get fcntl working (thanks matz for this pure ruby solution) so a c
extension is not strictly needed for access to fcntl locking. however, some
form of it is required so we should probably look into that pronto. i have an
RCR out there (i think) for posixlocking in ruby but haven't had time to pursue
it - the lack of it is a problem IMHO....

Shashank Date said:
Hmmm... then maybe I have misunderstood the nature of your project. I was
thinking of using it for CPU intensive (eg. image analysis) jobs over a
cluster of heterogeneous (in terms of CPU power, O.S. and in terms of
functionality) nodes and wanted the ability to "farm" (push) work requests
to least busy CPUs (provided there is a way to determine that, of course).
Phil Tomson's TaskMaster comes to mind. See:
http://raa.ruby-lang.org/project/taskmaster/

[Phil: if you are reading this, will love some feedback from you:
specifically are you planning to work on it in near future? Or is the
project closed?]

From your description above it appears to me that work will be "fetched"
(pulled) by least busy CPUs. Am I correct? (We can take this discussion off
line if it starts becoming [OT]).

exactly. the flow is something like

def feed
  daemon do
    loop do
      start_jobs until busy?
      reap_jobs
    end
  end
end

but obviously a bit more complicated. the 'busy?' method only returns true if
a predefined number of jobs are already running (we set it to two for dual CPU
nodes) but i've got plans for this method to hook into resource monitoring so
a node may become 'busy' if some critical resource is maxed and, of course, so
that resources may be requested.
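
as a rough, runnable sketch of that loop (an illustration, not rq's actual code): the Feeder class, its in-memory array standing in for the shared queue, and the max_jobs parameter are all assumptions. unlike a real daemon it stops once the queue is drained, so that the example terminates.

```ruby
require "rbconfig"

# hypothetical feeder: runs at most max_jobs commands at once
class Feeder
  attr_reader :finished

  def initialize(queue, max_jobs: 2)
    @queue = queue        # array of argv arrays, stand-in for the shared queue
    @max_jobs = max_jobs
    @running = {}         # pid => command
    @finished = []        # [command, exit status] pairs
  end

  # busy once the predefined number of jobs is running
  def busy?
    @running.size >= @max_jobs
  end

  def start_jobs
    until busy? || @queue.empty?
      cmd = @queue.shift
      @running[Process.spawn(*cmd)] = cmd
    end
  end

  # block until any one child exits, then record its exit status
  def reap_jobs
    return if @running.empty?
    pid, status = Process.wait2
    @finished << [@running.delete(pid), status.exitstatus]
  end

  def feed
    until @queue.empty? && @running.empty?
      start_jobs
      reap_jobs
    end
  end
end

ruby = RbConfig.ruby
queue = [0, 1, 2].map { |code| [ruby, "-e", "exit #{code}"] }
feeder = Feeder.new(queue)
feeder.feed
p feeder.finished.map(&:last).sort  # => [0, 1, 2]
```

each node running such a loop pulls work only when it has a free slot, so faster nodes come back for more jobs sooner without any central scheduler.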

this approach totally avoids needing a scheduler since, as you correctly state,
jobs are 'fetched' from the queue and the strongest fastest nodes simply get
more work done. it is working __really__ well for us - we see our node
performance line up exactly as we would have predicted it as the number of
jobs grows.

i think this approach should work very well for the scenario you describe.
i'm assuming you've also checked out technologies like condor, sge, etc? all
the software i looked at was extremely bloated, full of complexity bugs, and
didn't support one of the main features i'll eventually need (full boolean
resource requests) which is what led me down this path. if you get a chance
send a message offline (or online i suppose if it's relevant) detailing your
planned processing flow if you don't mind. it's handy to know how people
might use your software since, sometimes, choices can be arbitrary and this
knowledge might help me make better ones.

cheers.

Shashank Date said:
This is very likely to be extremely specific to the problem we are trying to
solve and may also be proprietary. I hope your licensing terms will permit
me that.

ruby's license - so it should.


kind regards.

-a
 
