Forking job scheduler


Krishna Dole

Hi all,
If anyone is willing, I'd be grateful for some advice on the forking
job scheduler I've written. It works fine in simple tests, but does
not feel elegant. On IRC kbrooks recommended an asynchronous main
loop, but I don't understand how to implement that in this situation.
The first version I wrote used threads, but several sources
recommended fork instead. I have also considered just using the shell
command 'ps' to see how many jobs are running, launching more as
needed.

The basic requirements:
- Each job is a long-running external process (taking a day or more)
and all jobs require a different amount of time to run (so
asynchronous launching will be needed).
- I want to keep N jobs running at all times (N = 4 in the example below)

Thanks,
Krishna

##############################################

@jobs = (1..10).to_a # an imaginary list of jobs

# any time a job finishes, launch another
Signal.trap("CLD") { start_job unless @jobs.empty? }

def start_job
  my_job = @jobs.pop
  puts "starting job #{my_job}"
  # launch a job. in reality it would run for a day or more
  exec("sleep 2") if fork == nil
end

for num in 1..4 # i want to keep 4 jobs running at all times
  start_job
end

# this doesn't wait for the last jobs to finish
while @jobs.size > 0
  Process.wait
end

# this waits for the last jobs, but if I only had this line,
# it wouldn't wait for all the jobs to start!
Process.wait
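
One way to read kbrooks' suggestion of a main loop is the sketch below. It is only a guess at what was meant, not anything from the thread: it keeps the same imaginary job list, still uses "sleep 2" as a stand-in for the real day-long command, and avoids the signal trap entirely by blocking in Process.wait and topping the pool back up each time a child exits.

MAX_JOBS = 4
jobs     = (1..10).to_a                # the same imaginary list of jobs
running  = 0

until jobs.empty? && running.zero?
  # top the pool back up to MAX_JOBS
  while running < MAX_JOBS && !jobs.empty?
    job = jobs.shift
    puts "starting job #{job}"
    fork { exec("sleep 2") }           # stand-in for the real day-long command
    running += 1
  end
  Process.wait                         # block until any child exits
  running -= 1
end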
 

Krishna Dole

Hi Francis,

I actually had considered the cron approach, but wasn't sure if it was
the best way to do things. What you say makes a lot of sense (I was
already nervous about the watchdog running into trouble, and there are
no coordination requirements), so I will go with your suggestion.

Thanks!
Krishna
 

ara.t.howard

> Hi all,
> If anyone is willing, I'd be grateful for some advice on the forking
> job scheduler I've written. It works fine in simple tests, but does
> not feel elegant. On IRC kbrooks recommended an asynchronous main
> loop, but I don't understand how to implement that in this situation.
> The first version I wrote used threads, but several sources
> recommended fork instead. I have also considered just using the shell
> command 'ps' to see how many jobs are running, launching more as
> needed.
>
> The basic requirements:
> - Each job is a long-running external process (taking a day or more)
> and all jobs require a different amount of time to run (so
> asynchronous launching will be needed).
> - I want to keep N jobs running at all times (N = 4 in the example below)

no need to reinvent the wheel! ;-)

http://www.linuxjournal.com/article/7922
http://codeforpeople.com/lib/ruby/rq/
http://codeforpeople.com/lib/ruby/rq/rq-2.3.4/README

download ruby queue (rq) and run it locally. it does this and much, much more.

# setup a work q

harp:~ > rq ./q create
---
q: /home/ahoward/q
db: /home/ahoward/q/db
schema: /home/ahoward/q/db.schema
lock: /home/ahoward/q/lock
bin: /home/ahoward/q/bin
stdin: /home/ahoward/q/stdin
stdout: /home/ahoward/q/stdout
stderr: /home/ahoward/q/stderr


# start a daemon process that will run 4 jobs at a time

harp:~ > rq ./q start --max_feed=4


# submit a job

harp:~ > rq ./q submit echo foobar
---
-
jid: 1
priority: 0
state: pending
submitted: 2006-09-29 08:49:46.814603
started:
finished:
elapsed:
submitter: jib.ngdc.noaa.gov
runner:
stdin: stdin/1
stdout:
stderr:
pid:
exit_status:
tag:
restartable:
command: echo foobar

# wait a bit

# check the status

jib:~ > rq ./q list 2
---
-
jid: 2
priority: 0
state: finished
submitted: 2006-09-29 08:49:50.839391
started: 2006-09-29 08:50:09.282754
finished: 2006-09-29 08:50:09.798060
elapsed: 0.515306
submitter: jib.ngdc.noaa.gov
runner: jib.ngdc.noaa.gov
stdin: stdin/2
stdout: stdout/2
stderr: stderr/2
pid: 721
exit_status: 0
tag:
restartable:
command: echo barfoo


# view the stdout

jib:~ > rq ./q stdout 2
barfoo


there is a command-line interface plus programming api - so you can almost
certainly accomplish whatever it is you need to do with zero or very little
coding on your part.


kind regards.

-a
 

ara.t.howard

> You say nothing about the coordination requirements of the external
> processes with the "watchdog" process. Is your requirement really just to
> ensure that four jobs are running at all times? If so, I would avoid using a
> long-running watchdog process, because you're making an assumption that it
> will never crash, catch a signal, etc. Why not run a cron job every five
> minutes or so that checks the running processes (via pgrep or ps as you
> suggested), starts more if necessary, writes status to syslog, and then
> quits? Much, much easier.
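
The cron-driven check described above might look roughly like the sketch below. The job command, the pgrep pattern, and the syslog tag are all made-up placeholders rather than anything from this thread; it just counts the running copies, starts replacements up to N, logs, and exits.

#!/usr/bin/env ruby
require 'syslog'

N       = 4
JOB_CMD = "/usr/local/bin/run_job"          # made-up long-running job

# how many copies are running right now?
running = `pgrep -c -f #{JOB_CMD}`.to_i

Syslog.open("job_watchdog") do |log|
  log.info("%d of %d jobs running", running, N)
  (N - running).times do
    pid = fork { exec(JOB_CMD) }
    Process.detach(pid)                     # reap the child in the background
    log.info("started #{JOB_CMD} as pid #{pid}")
  end
end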

this is exactly how rq works - except it does both: the feeder process is a
daemon, but one which refuses to start two copies of itself. therefore a
crontab entry can be used to make it 'immortal'. basically, the crontab
simply starts it if it's not running; otherwise it does nothing.
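
the "refuses to start two copies of itself" part is commonly done with an exclusive, non-blocking flock on a lock file. the sketch below shows only that general pattern, with a made-up lock path; it is not rq's actual code.

#!/usr/bin/env ruby
# run from cron every few minutes; only the copy that wins the lock
# goes on to act as the daemon, every other copy exits immediately.
lockfile = File.join(ENV['HOME'], '.myqueue.lock')   # made-up path

lock = File.open(lockfile, File::RDWR | File::CREAT, 0644)
exit 0 unless lock.flock(File::LOCK_EX | File::LOCK_NB)

# ... long-lived feeder work goes here, holding the lock until exit ...
sleep                                                # placeholder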

cheers.

-a
 

ara.t.howard

> Sounds cool, Ara. How does it keep two copies of itself from running? Does
> it flock a file in /var/run or something like that?

yeah - basically. it's under the user's home dir though, named after the
queue. the effect is 'one feeder per host per user' by default. it really
works nicely because you can have a daemon process totally independent of
system space and without root privs. dirwatch works the same way. here's my
crontab on our nrt system:

mussel:~ > crontab -l
leader = /dmsp/reference/bin/leader
worker = /dmsp/reference/bin/worker
env = /dmsp/reference/bin/bashenv
shush = /dmsp/reference/bin/shush
dirwatch = /dmsp/reference/bin/dirwatch
nrt = /dmsp/reference/bin/nrt
nrtq = /dmsp/reference/bin/nrtq
nrtw = /dmsp/reference/bin/nrtw
nrts = /dmsp/reference/bin/nrts
beveldevil = /dmsp/reference/bin/beveldevil
sfctmp1p0 = /dmsp/reference/bin/sfctmp1p0
afwa_watch = /dmsp/nrt/dirwatches/data/incoming/afwa/dirwatch
subscriptions_watch = /dmsp/nrt/dirwatches/subscriptions/dirwatch
dmsp_watch = /dmsp/nrt/dirwatches/data/incoming/dmsp/dirwatch
night_files_watch = /dmsp/nrt/dirwatches/data/incoming/night_files/dirwatch
mosaic_watch = /dmsp/nrt/dirwatches/data/incoming/mosaic/dirwatch
www = /dmsp/nrt/www/root/
qdb = /dmsp/nrt/queues/q/db
show_failed = /dmsp/reference/bin/show_failed


#
# mussel is the current leader
#

*/15 * * * * $leader $env $shush $afwa_watch start
*/15 * * * * $leader $env $shush $subscriptions_watch start
*/15 * * * * $leader $env $shush $dmsp_watch start
*/15 * * * * $leader $env $shush $night_files_watch start
*/15 * * * * $leader $env $shush $mosaic_watch start
*/15 * * * * $leader $env $shush $beveldevil
59 23 * * * $leader $env $shush $nrtq rotate

#
# clam, oyster, bismarck, scallop, shrimp are current workers
#

*/15 * * * * $worker $env $shush $nrtq start


this same crontab is installed across our nrt cluster. basically one node
runs a bunch of dirwatches which trigger submits to the master queue. the
workers, for their part, are completely stupid: all they have is a user account
and the '$worker' crontab entry that keeps a feeding process running at all
times, even after reboot. it's a simple way to set up durable userland
daemons.

($leader and $worker are xargs style programs - $leader obviously only
executes its command line if run on the leader, vice versa for worker)
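
a wrapper like $leader might look roughly like the sketch below. how leadership is actually determined isn't shown in the thread, so the file naming the current leader host is purely an assumption here; this is not the real /dmsp/reference/bin/leader.

#!/usr/bin/env ruby
require 'socket'

LEADER_FILE = '/etc/cluster-leader'            # made-up: file naming the leader host

leader = File.read(LEADER_FILE).strip rescue ''
exit 0 unless Socket.gethostname == leader     # do nothing on non-leader nodes

exec(*ARGV)                                    # run the wrapped command line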

regards.

-a
 
