Ara.T.Howard
===============================================================================
URLS
===============================================================================
http://codeforpeople.com/lib/ruby/dirwatch/
http://raa.ruby-lang.org/project/dirwatch/
===============================================================================
README (also see TUTORIAL below)
===============================================================================
NAME
dirwatch v0.9.0
SYNOPSIS
dirwatch [ options ]+ [ directory = ./ ] [ mode = watch ]
DESCRIPTION
dirwatch is a tool used to rapidly build event driven processing systems.
dirwatch manages an sqlite database that mirrors the state of a directory and
then triggers user definable event handlers for certain filesystem activities
such as file creation, modification, deletion, etc. dirwatch can also implement
a tmpwatch like behaviour to ensure files of a certain age are removed from
the directory being watched. dirwatch normally runs as a daemon process by
first synchronizing the database inventory with that of the directory and then
firing appropriate triggers as they occur.
-----------------------------------------------------------------------------
the following actions may have triggers configured for them
-----------------------------------------------------------------------------
created -> a file was created
modified -> a file has had its mtime updated
updated -> the union of created and modified
deleted -> a file was deleted
existing -> a file has not changed but still exists
-----------------------------------------------------------------------------
the command line 'mode' must be one of the following
-----------------------------------------------------------------------------
create (c) -> initialize the database and supporting files
watch (w) -> monitor directory and trigger actions in the foreground
start (S) -> spawn a daemon watcher in the background
restart (R) -> (re)spawn a daemon watcher in the background
stop (H) -> stop/halt any currently running watcher
status (T) -> determine if any watcher is currently running
truncate (D) -> truncate/delete all entries from the database
archive (a) -> create a hot-backup of a watch's database contents
list (l) -> dump database to stdout in silky smooth yaml format
the default mode is to 'watch'.
for all modes the command line argument must be the name of the directory to
which to apply the operation - this defaults to the current directory.
-----------------------------------------------------------------------------
mode: create (c)
-----------------------------------------------------------------------------
initializes a storage directory with all required database files, logs,
command directories, sample configuration, sample programs, etc.
examples:
0) initialize the directory incoming_data/ to be dirwatched using all
defaults
~ > dirwatch create incoming_data/
-----------------------------------------------------------------------------
mode: start (S)
-----------------------------------------------------------------------------
dirwatch is normally run in daemon mode. the start mode is equivalent to
running in 'watch' mode with the '--daemon' and '--quiet' flags.
examples:
0) start a background daemon process watching incoming_data/
~ > dirwatch start incoming_data/
-----------------------------------------------------------------------------
mode: restart (R)
-----------------------------------------------------------------------------
'restart' mode checks a watcher's pidfile and either restarts the currently
running watcher or starts a new one as in 'start' mode. this is equivalent to
sending SIGHUP to the watcher daemon process.
examples:
0) re-start a background daemon process watching incoming_data/
~ > dirwatch restart incoming_data/
-----------------------------------------------------------------------------
mode: stop (H)
-----------------------------------------------------------------------------
'stop' mode checks for any process watching the specified directory and kills
this process if it exists. this is equivalent to sending TERM to the watcher
daemon process. the process will not exit immediately but will do so at the
first possible safe opportunity. do __not__ kill -9 the daemon process.
examples:
0) stop the daemon process watching incoming_data/
~ > dirwatch stop incoming_data/
-----------------------------------------------------------------------------
mode: status (T)
-----------------------------------------------------------------------------
'status' mode reports whether or not a watcher is running for the given
directory.
examples:
0) report on the watcher, if any, watching incoming_data/
~ > dirwatch status incoming_data/
-----------------------------------------------------------------------------
mode: truncate (D)
-----------------------------------------------------------------------------
'truncate' mode empties the database of all state in an atomic fashion.
examples:
0) empty the database in a safe way
~ > dirwatch truncate incoming_data/
-----------------------------------------------------------------------------
mode: archive (a)
-----------------------------------------------------------------------------
archive mode is used to atomically create a hot-backup tgz file of the
storage directory for a given directory while respecting the locking
subsystem.
examples:
0) make a hot-backup of the database and all supporting files in
incoming_data/
~ > dirwatch archive incoming_data/
-----------------------------------------------------------------------------
mode: watch (w)
-----------------------------------------------------------------------------
this is the meat of dirwatch.
dirwatch is designed to run as a daemon, updating a database inventory at the
interval specified by the '--interval' option (5 minutes by default) and
firing appropriate trigger commands. two watchers may not watch the same dir
simultaneously and attempting to start a second watcher will fail when the
second watcher is unable to obtain a lockfile. it is a non-fatal error to
attempt to start another watcher when one is running and this failure can be
made silent by using the '--quiet' option. the reason for this is to allow a
crontab entry to be used to make the daemon 'immortal'. for example, the
following crontab entry
*/15 * * * * dirwatch directory --daemon
will __attempt__ to start a daemon watching 'directory' every fifteen minutes.
if the daemon is not already running one will be started, otherwise dirwatch will
simply fail silently (no cron email sent due to stderr).
this feature allows a normal user to setup daemon processes that will not only
run after machine reboot, but which will continue to run after other unforeseen
terminal program behaviour. such a daemon is known as an 'immortal' daemon.
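the 'only one watcher per directory' rule described above can be illustrated
with a short ruby sketch. note the assumptions: dirwatch itself relies on its
lockfile library, whereas this sketch swaps in File#flock for brevity, and the
method name try_lock and the lock path are hypothetical. it only shows why a
second start attempt can fail harmlessly while the first watcher holds the lock.

```ruby
require 'tmpdir'

# hypothetical helper, not part of dirwatch: try to take an exclusive,
# non-blocking lock on +path+. returns the open File on success, nil otherwise.
def try_lock(path)
  file = File.open(path, File::RDWR | File::CREAT, 0644)
  return file if file.flock(File::LOCK_EX | File::LOCK_NB)
  file.close
  nil
end

# hypothetical lock location used only for this illustration
lock_path = File.join(Dir.tmpdir, "dirwatch-sketch-#{Process.pid}.lock")

first  = try_lock(lock_path)   # the first "watcher" obtains the lock
second = try_lock(lock_path)   # a second attempt is refused while it is held

puts(first  ? 'first watcher: running'  : 'first watcher: could not lock')
puts(second ? 'second watcher: running' : 'second watcher: one is already running, exiting')

# clean up the illustration's lock file
second.close if second
first.close if first
File.delete(lock_path) if File.exist?(lock_path)
```

because the refused attempt is cheap and leaves nothing behind, repeating it
from cron every few minutes is harmless, which is the idea behind the
'immortal' daemon.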
as the watcher runs and maintains the database inventory it is noted when
files/directories (entries) have been created, modified, updated, deleted, or
are existing. these entries are then handled by user definable triggers as
specified in the config file. the config file is of the format
...
actions :
  created :
    commands :
      ...
  updated :
    commands :
      ...
  ...
...
where the commands to be run for each trigger type are enumerated. each
command entry is of the following format:
...
-
  command : the command to run
  type    : calling convention, how info is passed to the program
  pattern : filter files by this regex
  timing  : synchronous or asynchronous execution
...
further explanation of each field:
command: this is the program to run. the search path for the program is
modified to first include the commands/ dir underneath the
.dirwatch/ dir in the directory being watched.
type: there are four types of commands. the type merely indicates the
calling convention of the program. when commands are run there are
two pieces of information which are passed to the program, the file
in question and the mtime of that file. the mtime is less
important but programs may use it to know if the file has been
changed since they were last spawned or other bookkeeping. mtime
will probably be ignored for most commands. the four types of
commands fall into two categories: those commands called once for
each file and those types of commands called once with __all__
files
file at a time:
simple: the command will be called with two arguments: the file
in question and the mtime datetime, eg:
command foobar.txt '2002-11-04 01:01:01.1234'
expanded: the command will have the strings '@file' and
'@mtime' replaced with appropriate values. eg:
command '@file' '@mtime'
expands to (and is called as)
command 'somefile' '2002-11-04 01:01:01.1234'
files at once:
filter: the stdin of the program will be given a list where each
line contains two items, the file and the datetime.
yaml: the stdin of the program will be given a list where each
entry contains two items, the file and the mtime. the
format of the list is valid yaml and the schema is an
array of hashes where each hash has the keys 'path' and
'mtime'.
pattern: all the files for a given action are filtered by this pattern,
and only those files matching pattern will have triggers fired.
timing: if timing is asynchronous the command will be run and not waited
for before starting the next command. asynchronous commands may
yield better performance but may also result in many commands being
run at once. asynchronous commands should not be programs that load
the system heavily unless one is looking to freeze a machine.
synchronous commands are spawned and waited for before the next
command is started. a side effect of synchronous commands is that
the time spent waiting may sum to an amount of time greater than
the interval ('--interval' option) specified - if the amount of
time spent running commands exceeds the interval the next inventory
simply begins immediately with no pause. because of this one
should think of the interval used as a minimum bound only,
especially when synchronous commands are used.
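the 'interval as a minimum bound' behaviour can be sketched in a few lines of
ruby. this is an illustration under stated assumptions, not dirwatch's actual
loop: the method name pause_for is hypothetical and both arguments are taken to
be in seconds.

```ruby
# hypothetical helper: given the configured interval and the time already
# spent taking the inventory and running synchronous commands, return how
# long to pause before the next inventory. never negative.
def pause_for(interval, elapsed)
  remaining = interval - elapsed
  remaining > 0 ? remaining : 0
end

# commands finished quickly: pause for the remainder of the interval
puts pause_for(300, 12.5)

# commands overran the interval: the next inventory begins with no pause
puts pause_for(300, 450)
```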
note that sample commands of each type are auto-generated in the
dbdir/commands directory. reading these should answer any questions regarding
the calling conventions of any of the four types. for other questions consult
the sample config, which is also auto-generated.
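as a small illustration of the 'expanded' calling convention, the ruby sketch
below substitutes the '@file' and '@mtime' placeholders in a command string.
the method name expand_command is hypothetical and this is not taken from the
dirwatch source, which may quote or escape values differently.

```ruby
# hypothetical helper: replace the '@file' and '@mtime' placeholders in a
# command template. the block form of gsub keeps backslashes in file names
# from being treated as back-references.
def expand_command(template, file, mtime)
  template.gsub(/@file|@mtime/) { |token| token == '@file' ? file : mtime }
end

puts expand_command("command '@file' '@mtime'",
                    'somefile', '2002-11-04 01:01:01.1234')
```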
examples:
0) run a watch from this terminal (non daemon)
~ > dirwatch directory watch
-----------------------------------------------------------------------------
mode: list (l)
-----------------------------------------------------------------------------
dump the contents of the database in yaml format for easy viewing/parsing
examples:
0) dump database as yaml
~ > dirwatch directory list
ENVIRONMENT
for dirwatch itself:
export SLDB_DEBUG=1 -> cause sldb lib actions (sql) to be logged
export LOCKFILE_DEBUG=1 -> cause lockfile lib actions to be logged
for programs run by dirwatch the following environment variables will be set:
DIRWATCH_DIR -> the directory being watched
DIRWATCH_ACTION -> action type, one of 'instance', 'created', 'modified',
'updated', 'deleted', or 'existing'
DIRWATCH_TYPE -> command type, one of 'simple', 'expanded', 'filter', or
'yaml'
DIRWATCH_N_PATHS -> the total number of paths for this action. the paths
themselves will be passed to the program in a different
way depending on DIRWATCH_TYPE, for instance on the
command line or on stdin, but this number will always
be the total number of paths the program should expect.
DIRWATCH_PATH_IDX -> for some command types, like 'simple', the program will
be run more than once to handle all paths since calling
convention only allows the program to be called with
one path at a time. this number is the index of the
current path in such cases. for instance, a 'simple'
program may only be called with one path at a time so
if 10 files were created in the directory that would
result in the program being called 10 times. in each
case DIRWATCH_N_PATHS would be 10 and DIRWATCH_PATH_IDX
would range from 0 to 9 for each of the 10 calls to the
program. in the case of 'filter' and 'yaml' command
types, where every path is given at once on stdin this
value will be equal to DIRWATCH_N_PATHS
DIRWATCH_PATH -> for 'simple' and 'expanded' command types, which are
called once for each path, this will contain the path
the program is being called with. in the case of
'filter' or 'yaml' command types the variable contains
the string 'stdin' implying that all paths are
available on stdin.
DIRWATCH_MTIME -> for 'simple' and 'expanded' command types, which are
called once for each path, this will contain the mtime
the program is being called with. in the case of
'filter' or 'yaml' command types the variable contains
the string 'stdin' implying that all mtimes are
available on stdin.
DIRWATCH_PID -> the pid of dirwatch watcher process
DIRWATCH_ID -> an identifier for this action that will be unique for
any given run of a dirwatch watcher process.
restarting the watcher resets the generator. this
identifier is logged in the dirwatch watcher logs, so it is
useful for matching program logs with dirwatch logs
PATH -> the normal shell path. for each program run the PATH
is modified to contain the commands dir of the dirwatch
watcher process. normally this will be
$DIRWATCH_DIR/.dirwatch/commands/:$PATH
note that all the sample programs generated show how to access these
environment vars.
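a minimal ruby sketch of how a triggered program might collect these variables
follows. the method name dirwatch_context and the sample values are
hypothetical; only the variable names come from the list above.

```ruby
# hypothetical helper: gather the DIRWATCH_* variables into a hash.
# +env+ defaults to the real environment but any hash-like object works,
# which makes the helper easy to try outside of dirwatch.
def dirwatch_context(env = ENV)
  {
    dir:            env['DIRWATCH_DIR'],
    action:         env['DIRWATCH_ACTION'],
    type:           env['DIRWATCH_TYPE'],
    n_paths:        Integer(env['DIRWATCH_N_PATHS'] || 0),
    path_idx:       Integer(env['DIRWATCH_PATH_IDX'] || 0),
    # 'filter' and 'yaml' commands see the literal string 'stdin' here
    paths_on_stdin: env['DIRWATCH_PATH'] == 'stdin',
    id:             env['DIRWATCH_ID']
  }
end

# made-up values standing in for what a 'simple' command might see
sample = {
  'DIRWATCH_DIR'      => './tmp',
  'DIRWATCH_ACTION'   => 'created',
  'DIRWATCH_TYPE'     => 'simple',
  'DIRWATCH_N_PATHS'  => '10',
  'DIRWATCH_PATH_IDX' => '3',
  'DIRWATCH_PATH'     => './tmp/a/n'
}

ctx = dirwatch_context(sample)
puts "#{ctx[:action]} (#{ctx[:type]}): path #{ctx[:path_idx] + 1} of #{ctx[:n_paths]}"
```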
FILES
directory/.dirwatch/ -> dirwatch data files
directory/.dirwatch/dirwatch.conf -> default configuration file
directory/.dirwatch/commands/ -> default location for triggers
directory/.dirwatch/db -> sldb/sqlite database
directory/.dirwatch/dirwatch.pid -> default pidfile
directory/.dirwatch/logs/ -> automatically rolled log files
DIAGNOSTICS
success -> $? == 0
failure -> $? != 0
AUTHOR
(e-mail address removed)
BUGS
1 < bugno && bugno < 42
OPTIONS
--help, -h
this message
--log=path, -l
set log file - (default stderr)
--verbosity=verbosity, -v
0|fatal < 1|error < 2|warn < 3|info < 4|debug - (default info)
--config=path
valid path - specify config file (default nil)
--template=[path]
valid path - generate a template config file in path (default stdout)
--recursive, -r
recurse into subdirectories (default do not recurse)
--all, -a
consider all filesystem entries, including directories (default files
only)
--follow, -f
follow links (default does not follow links)
--pattern=pattern, -p
consider only filesystem entries that match pattern (default all
entries)
--daemon, -D
specify daemon mode (default not daemon)
--quiet, -Q
be wery wery quiet (default not quiet)
--dirwatch_dir=dirwatch_dir, -S
specify dirwatch storage dir (default .dirwatch/ in dir being watched)
--n_loops=n_loops, -N
loop only this many times before exiting (default infinite)
--interval=seconds, -I
sleep at least this long between loops (default 300sec (5min))
--lockfile, -L
create a lockfile in dir while running (default no lockfile)
===============================================================================
TUTORIAL
===============================================================================
INTRODUCTION
the following shows how to setup a simple file processing system using
dirwatch. it assumes a successful install of dirwatch. eg. the command
~> dirwatch --help
should operate
STEP 0
make a temporary directory, if using sh/bash do something like
~ > export tmp=./tmp
~ > mkdir $tmp
from here on we use the $tmp variable to refer to our directory
STEP 1
initialize the directory for dirwatch
~ > dirwatch $tmp create
---
./tmp:
dirwatch_dir : ./tmp/.dirwatch
db : ./tmp/.dirwatch/db
logs_dir : ./tmp/.dirwatch/logs
config : ./tmp/.dirwatch/dirwatch.conf
commands_dir : ./tmp/.dirwatch/commands
STEP 2
create three subdirectories in $tmp, a, b, and c
~ > for d in a b c;do mkdir $tmp/$d;done
STEP 3
edit the dirwatch.conf
~ > vi $tmp/.dirwatch/dirwatch.conf
change the section which reads
actions:
  updated :
    -
      command: simple.sh
      type: simple
      pattern: ^.*$
      timing: sync
to
actions:
  updated :
    -
      command: yaml.rb
      type: yaml
      pattern: ^.*$
      timing: sync
here we are telling dirwatch to run the command 'yaml.rb' (which will be
looked for in $tmp/.dirwatch/commands and then the normal $PATH) whenever a
file is 'updated.' updated means that a file has been created or modified.
run
~ > dirwatch --help
for more info
STEP 4
edit yaml.rb
~ > vi $tmp/.dirwatch/commands/yaml.rb
we want a program that looks very close to this, you may have to adjust your
shebang line:
#!/usr/bin/env ruby
require 'yaml'
#
# the dir being watched
#
dirwatch_dir = ENV['DIRWATCH_DIR']
#
# load entries from stdin. this is a yaml document.
#
entries = YAML::load STDIN
#
# process each entry
#
entries.each do |entry|
  #
  # get the path and mtime of the updated file
  #
  path, mtime = entry['path'], entry['mtime']
  #
  # split into directory and filename components
  #
  dirname, basename = File::split path
  #
  # get the last directory component
  #
  dir = File::basename dirname
  #
  # perform actions based on dir - files contain numbers:
  #
  # - new files in dir 'a' get doubled and the result written to dir 'b'
  # - new files in dir 'b' get two added and the result written to dir 'c'
  # - new files in dir 'c' are displayed as the result
  #
  case dir
  when 'a'
    n = Integer(IO::read(path))
    n *= 2
    output = File::join dirwatch_dir, 'b', basename
    open(output, 'w'){|f| f.write n}
  when 'b'
    n = Integer(IO::read(path))
    n += 2
    output = File::join dirwatch_dir, 'c', basename
    open(output, 'w'){|f| f.write n}
  when 'c'
    n = Integer(IO::read(path))
    puts "result <#{ basename }> => <#{ n }>"
  end
end
the comments should make it obvious what this program, which dirwatch will
spawn as files are created or modified, does: it loads the updated (because we
configured it that way) file and assumes a number is contained in it. when
the file was updated in directory $tmp/a we double the number and write the
output into a file of the same basename in $tmp/b. the number in $tmp/b then
has two added to it and this result is written to a file of the same basename
in $tmp/c. files updated in $tmp/c are simply displayed as the result.
be sure you've edited $tmp/.dirwatch/commands/yaml.rb and
$tmp/.dirwatch/dirwatch.conf before continuing.
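to get a feel for the input a 'yaml' type command receives, the ruby sketch
below builds and re-reads a document with the schema described in the README
(an array of hashes with the keys 'path' and 'mtime'). the method name
load_entries and the sample values are hypothetical, and representing the
mtime as a plain string is an assumption: the exact serialization dirwatch
uses is not shown here.

```ruby
require 'yaml'

# hypothetical helper: parse a yaml document shaped like the stdin of a
# 'yaml' type command and return the array of entries.
def load_entries(document)
  YAML.load(document)
end

# made-up sample entry; the mtime is kept as a string for simplicity
sample_entries = [
  { 'path' => './tmp/a/n', 'mtime' => '2005-07-01 16:33:47.855151' }
]

document = sample_entries.to_yaml
puts document

load_entries(document).each do |entry|
  puts "#{entry['path']} was updated at #{entry['mtime']}"
end
```

saving such a document to a file and feeding it to yaml.rb on stdin, with
DIRWATCH_DIR set by hand, is one way to try the program before dirwatch ever
spawns it.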
STEP 5
start dirwatch. normally dirwatch runs as a daemon that checks the dir every
five minutes, but here we will run from the console so we can see its logging
information. note the '--recursive' flag is given so that dirwatch will
descend into the subdirectories of $tmp. this is important! also, we use
the '--interval' option to specify a polling interval of 5 seconds. we would
not use such a short period for a production system but this interval is
alright for illustration. we start a watch:
~ > dirwatch $tmp --interval=5 --recursive
I, [2005-07-01T16:33:37.821687 #9146] INFO -- : ** STARTED **
I, [2005-07-01T16:33:37.822853 #9146] INFO -- : config <./tmp/.dirwatch/dirwatch.conf>
I, [2005-07-01T16:33:37.823136 #9146] INFO -- : recursive <true>
I, [2005-07-01T16:33:37.823309 #9146] INFO -- : all <false>
I, [2005-07-01T16:33:37.823423 #9146] INFO -- : follow <false>
I, [2005-07-01T16:33:37.823549 #9146] INFO -- : pattern <>
I, [2005-07-01T16:33:37.823680 #9146] INFO -- : n_loops <>
I, [2005-07-01T16:33:37.823887 #9146] INFO -- : interval <00:00:05>
I, [2005-07-01T16:33:37.824170 #9146] INFO -- : lockfile <./tmp/.dirwatch.lock>
I, [2005-07-01T16:33:37.824335 #9146] INFO -- : tmpwatch[all] <false>
I, [2005-07-01T16:33:37.824432 #9146] INFO -- : tmpwatch[nodirs] <false>
I, [2005-07-01T16:33:37.824551 #9146] INFO -- : tmpwatch[force] <true>
I, [2005-07-01T16:33:37.824745 #9146] INFO -- : tmpwatch[age] <30 days> == <2592000.0s>
I, [2005-07-01T16:33:37.824859 #9146] INFO -- : tmpwatch[rm] <rm_rf>
...
...
...
STEP 6
now, from another terminal drop a file containing a number into $tmp/a.
something like
~ > echo 10 > $tmp/a/n
within a few seconds you'll see, in the dirwatch terminal something like
I, [2005-07-01T16:33:47.855151 #9146] INFO -- : ACTION.UPDATED.0.0 - cmd : yaml.rb
I, [2005-07-01T16:33:47.928216 #9146] INFO -- : ACTION.UPDATED.0.0 - exit_status : 0
I, [2005-07-01T16:33:52.880694 #9146] INFO -- : ACTION.UPDATED.1.1 - cmd : yaml.rb
I, [2005-07-01T16:33:52.948847 #9146] INFO -- : ACTION.UPDATED.1.1 - exit_status : 0
I, [2005-07-01T16:33:57.856376 #9146] INFO -- : ACTION.UPDATED.2.2 - cmd : yaml.rb
result <n> => <22>
I, [2005-07-01T16:33:57.928320 #9146] INFO -- : ACTION.UPDATED.2.2 - exit_status : 0
so we have produced a result of 22 by doubling 10 and adding two to it merely
by dropping a file in a directory!
notice that both the output and the logging are going to the terminal here.
actually the logging goes to stderr by default and any program output/errput
is mingled here. in actual use the logging goes into a log file in $tmp/.dirwatch/logs/
that automatically rolls (you never need to truncate it) and any output/errput
from the programs run is simply discarded. note that you can certainly keep
output by using something like
command: myprogram >> myprogram.log 2>&1
in the dirwatch.conf file.
STEP 7
now, remember that we configured yaml.rb to fire for any file that was
updated where the meaning of updated is that a file was created or modified.
if we were to open up $tmp/a/n in vi and change the 10 to a 20 we'd soon see
result <n> => <42>
appear in the console running the watch.
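the arithmetic behind both results can be checked with a tiny ruby sketch that
mirrors the two stages of yaml.rb. the method names are hypothetical and no
files or watcher are involved.

```ruby
# stage applied to files landing in $tmp/a: double the number
def stage_a(n)
  n * 2
end

# stage applied to files landing in $tmp/b: add two
def stage_b(n)
  n + 2
end

# the value that finally shows up in $tmp/c
def pipeline(n)
  stage_b(stage_a(n))
end

[10, 20].each do |n|
  puts "input <#{n}> => result <#{pipeline(n)}>"
end
```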
STEP 8
after getting a system configured and the triggers working properly you
definitely don't want to have to start dirwatch by hand each time. dirwatch
will refuse to start two watches on a given directory and can be enabled to
run as a daemon. because of this it's quite acceptable to cron a dirwatch to
start every so often. something like
*/15 * * * * dirwatch /full/path/to/directory --daemon
will maintain a dirwatch process at all times, even after machine reboot.
note that this does not start a new watch each time - if the watch fails to
start because another is already running dirwatch simply exits with 1 but
nothing is printed to stderr so cron won't mail you tons of stuff. using this
technique a normal user can configure a daemon process to run at all times.
of course a watch could also be started at machine boot using a simple
script.
STEP 9
we now have set up a simple processing system using dirwatch. it can be used
to configure quite complex processing flows via the configuration file and the
programs run - hopefully you'll find a useful way of using it yourself. if so
please contact me at (e-mail address removed) and let me know the details.
enjoy.
-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================
URLS
===============================================================================
http://codeforpeople.com/lib/ruby/dirwatch/
http://raa.ruby-lang.org/project/dirwatch/
===============================================================================
README (also see TUTORIAL below)
===============================================================================
NAME
dirwatch v0.9.0
SYNOPSIS
dirwatch [ options ]+ [ directory = ./ ] [ mode = watch ]
DESCRIPTTION
dirwatch is a tool used to rapidly build event driven processing systems.
dirwatch manages an sqlite database that mirrors the state of a directory and
then triggers user definable event handlers for certain filesystem activities
such file creation, modification, deletion, etc. dirwatch can also implement
a tmpwatch like behaviour to ensure files of a certain age are removed from
the directory being watched. dirwatch normally runs as a daemon process by
first sychronizing the database inventory with that of the directory and then
firing appropriate triggers as they occur.
-----------------------------------------------------------------------------
the following actions may have triggers configured for them
-----------------------------------------------------------------------------
created -> a file was created
modified -> a file has had it's mtime updated
updated -> the union of created and modified
deleted -> a file was deleted
existing -> a file has not changed but is still exists
-----------------------------------------------------------------------------
the command line 'mode' must be one of the following
-----------------------------------------------------------------------------
create (c) -> initialize the database and supporting files
watch (w) -> monitor directory and trigger actions in the foreground
start (S) -> spawn a daemon watcher in the background
restart (R) -> (re)spawn a daemon watcher in the background
stop (H) -> stop/halt any currently running watcher
status (T) -> determine if any watcher is currently running
truncate (D) -> truncate/delete all entries from the database
archive (a) -> create a hot-backup of a watch's database contents
list (l) -> dump database to stdout in silky smooth yaml format
the default mode is to 'watch'.
for all modes the command line argument must be the name of the directory to
which to apply the operation - this defaults to the current directory.
-----------------------------------------------------------------------------
mode: create (c)
-----------------------------------------------------------------------------
initializes a storage directory with all required database files, logs,
command directories, sample configuration, sample programs, etc.
examples:
0) initialize the directory incoming_data/ to be dirwatched using all
defaults
~ > dirwatch create incoming_data/
-----------------------------------------------------------------------------
mode: start (S)
-----------------------------------------------------------------------------
dirwatch is normally run in daemon mode. the start mode is equivalent to
running in 'watch' mode with the '--daemon' and '--quiet' flags.
examples:
0) start a background daemon process watching incoming_data/
~ > dirwatch start incoming_data/
-----------------------------------------------------------------------------
mode: restart (R)
-----------------------------------------------------------------------------
'restart' mode checks a watcher's pidfile and either restarts the currently
running watcher or starts a new one as in 'start' mode. this is equivalent to
sending SIGHUP to the watcher daemon process.
examples:
0) re-start a background daemon process watching incoming_data/
~ > dirwatch restart incoming_data/
-----------------------------------------------------------------------------
mode: stop (H)
-----------------------------------------------------------------------------
'stop' mode checks for any process watching the specified directory and kills
this process if it exists. this is equivalent to sending TERM to the watcher
daemon process. the process will not exit immediately but will do at the
first possible safe opportunity. do __not__ kill -9 the daemon process.
examples:
0) stop the daemon process watching incoming_data/
~ > dirwatch stop incoming_data/
-----------------------------------------------------------------------------
mode: status (T)
-----------------------------------------------------------------------------
'status' mode reports whether or not a watcher is running for the given
directory.
examples:
0) report on the watcher, iff any, watching incoming_data/
~ > dirwatch status incoming_data/
-----------------------------------------------------------------------------
mode: truncate (D)
-----------------------------------------------------------------------------
'truncate' mode empties the database of all state in an atomic fashion.
examples:
0) empty the database in a safe way
~ > dirwatch truncate incoming_data/
-----------------------------------------------------------------------------
mode: archive (a)
-----------------------------------------------------------------------------
archive mode is used to atomically create a hot-backup tgz file of a the
storage directory for a given directory while respecting the locking
subsystem.
examples:
0) make a hot-backup of the database and all supporting files in
incoming_data/
~ > dirwatch archive incoming_data/
-----------------------------------------------------------------------------
mode: watch (w)
-----------------------------------------------------------------------------
this is the meat of dirwatch.
dirwatch is designed to run as a daemon, updating a database inventory at the
interval specified by the '--interval' option (5 minutes by default) and
firing appropriate trigger commands. two watchers may not watch the same dir
simoultaneously and attempting the start a second watcher will fail when the
second watcher is unable to obtain a lockfile. it is a non-fatal error to
attempt to start another watcher when one is running and this failure can be
made silent by using the '--quiet' option. the reason for this is to allow a
crontab entry to be used to make the daemon 'immortal'. for example, the
following crontab entry
*/15 * * * * dirwatch directory --daemon
will __attempt__ to start a daemon watching 'directory' every fifteen minutes.
if the daemon is not already running one will started, otherwise dirwatch will
simply fail silently (no cron email sent due to stderr).
this feature allows a normal user to setup daemon processes that will not only
run after machine reboot, but which will continue to run after other unforseen
terminal program behaviour. such a daemon is known as an 'immortal' daemon.
as the watcher runs and maintains the database inventory it is noted when
files/directories (entries) have been created, modified, updated, deleted, or
are existing. these entries are then handled by user definable triggers as
specified in the config file. the config file is of the format
...
actions :
created :
commands :
...
updated :
commands :
...
...
...
where the commands to be run for each trigger type are enumerated. each
command entry is of the following format:
...
-
command : the command to run
type : calling convention, how info is passed to the program
pattern : filter files by this regex
timing : synchronous or asynchronous execution
...
further explanation of each field:
command: this is the program to run. the search path for the program is
modified to first include the commands/ dir underneath the
.dirwatch/ dir in the directory being watched.
type: there are four types of commands. the type merely indicates the
calling convention of the program. when commands are run there are
two peices of information which are passed to the program, the file
in question and the mtime of that file. the mtime is less
important but programs may use it to know if the file has been
changed since they were last spawned or other bookkeeping. mtime
will probably be ignored for most commands. the four types of
commands fall into two catagories: those commands called once for
each file and those types of commands called once with __all__
files
file at a time:
simple: the command will be called with two arguments: the file
in question and the mtime datetime, eg:
command foobar.txt '2002-11-04 01:01:01.1234'
expanded: the command will be have the strings '@file' and
'@mtime' replaced with appropriate values. eg:
command '@file' '@mtime'
expands to (and is called as)
command 'somefile' '2002-11-04 01:01:01.1234'
files at once:
filter: the stdin of the program will be given a list where each
line contains two items, the file and the datetime.
yaml: the stdin of the program will be given a list where each
entry contains two items, the file and the mtime. the
format of the list is valid yaml and the schema is an
array of hashes where each hash has the keys 'path' and
'mtime'.
pattern: all the files for a given action are filtered by this pattern,
and only those files matching pattern will have triggers fired.
timing: if timing is asynchronous the command will be run and not waited
for before starting the next command. asynchronous commands may
yield better performance but may also result in many commands being
run at once. asynchronous commands should not be programs that load
the system heavily unless one is looking to freeze a machine.
synchronous commands are spawned and waited for before the next
command is started. a side effect of synchronous commands is that
the time spent waiting may sum to an amount of time greater than
the interval ('--interval' option) specified - if the amount of
time spent running commands exceeds the interval the next inventory
simply begins immediately with no pause. because of this one
should think of the interval used as a minimum bound only,
especially when synchronous commands are used.
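one way to read the note on timing: the pause after a loop is whatever is left
of the interval once the commands have finished, and never less than zero. the
sketch below only illustrates that reading. it is not dirwatch's own code, and
the method name pause_after_loop is made up for this example.

```ruby
# hypothetical helper, not part of dirwatch: how long to pause after one
# inventory loop, given the configured interval and the time already spent
# running synchronous commands (both in seconds).
def pause_after_loop(interval, elapsed)
  remaining = interval - elapsed
  remaining > 0 ? remaining : 0
end

puts pause_after_loop(300, 20)    # commands finished early  => 280
puts pause_after_loop(300, 450)   # commands overran interval => 0
```

under this reading the time between the start of two inventories is never
shorter than the interval, but it can be much longer.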
note that sample commands of each type are auto-generated in the
dbdir/commands directory. reading these should answer any questions regarding
the calling conventions of any of the four types. for other questions consult
the sample config, which is also auto-generated.
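the auto-generated samples are the authoritative reference. as a rough
illustration only, a program could normalize the per-file and the all-at-once
conventions into one list of entries. the method name entries_for is made up
here, 'expanded' is assumed to be configured as command '@file' '@mtime', and
'filter' is left out because the exact line format is not spelled out above.

```ruby
require 'yaml'
require 'stringio'

# hypothetical helper: turn the different calling conventions into one
# array of { 'path' => ..., 'mtime' => ... } hashes.
def entries_for(type, argv, input)
  case type
    when 'simple', 'expanded'
      # called once per file: path and mtime arrive as two arguments
      path, mtime = argv
      [{ 'path' => path, 'mtime' => mtime }]
    when 'yaml'
      # called once for all files: stdin holds a yaml array of hashes
      YAML.load(input.read) || []
    else
      raise ArgumentError, "unhandled command type <#{ type }>"
  end
end

# a 'simple' call, as in: command foobar.txt '2002-11-04 01:01:01.1234'
p entries_for('simple', ['foobar.txt', '2002-11-04 01:01:01.1234'], nil)

# a 'yaml' call, with a StringIO standing in for STDIN. the mtime is quoted
# so that it is read back as a plain string in this example.
doc = "- path: foobar.txt\n  mtime: '2002-11-04 01:01:01.1234'\n"
p entries_for('yaml', [], StringIO.new(doc))
```

in a real trigger the call would look like
entries_for(ENV['DIRWATCH_TYPE'], ARGV, STDIN).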
examples:
0) run a watch from this terminal (non daemon)
~ > dirwatch directory watch
-----------------------------------------------------------------------------
mode: list (l)
-----------------------------------------------------------------------------
dump the contents of the database in yaml format for easy viewing/parsing
examples:
0) dump database as yaml
~ > dirwatch directory list
ENVIRONMENT
for dirwatch itself:
export SLDB_DEBUG=1 -> cause sldb lib actions (sql) to be logged
export LOCKFILE_DEBUG=1 -> cause lockfile lib actions to be logged
for programs run by dirwatch the following environment variables will be set:
DIRWATCH_DIR -> the directory being watched
DIRWATCH_ACTION -> action type, one of 'instance', 'created', 'modified',
'updated', 'deleted', or 'existing'
DIRWATCH_TYPE -> command type, one of 'simple', 'expanded', 'filter', or
'yaml'
DIRWATCH_N_PATHS -> the total number of paths for this action. the paths
themselves will be passed to the program in a different
way depending on DIRWATCH_TYPE, for instance on the
command line or on stdin, but this number will always
be the total number of paths the program should expect.
DIRWATCH_PATH_IDX -> for some command types, like 'simple', the program will
be run more than once to handle all paths since calling
convention only allows the program to be called with
one path at a time. this number is the index of the
current path in such cases. for instance, a 'simple'
program may only be called with one path at a time so
if 10 files were created in the directory that would
result in the program being called 10 times. in each
case DIRWATCH_N_PATHS would be 10 and DIRWATCH_PATH_IDX
would range from 0 to 9 for each of the 10 calls to the
program. in the case of 'filter' and 'yaml' command
types, where every path is given at once on stdin this
value will be equal to DIRWATCH_N_PATHS
DIRWATCH_PATH -> for 'simple' and 'expanded' command types, which are
called once for each path, this will contain the path
the program is being called with. in the case of
'filter' or 'yaml' command types the variable contains
the string 'stdin' implying that all paths are
available on stdin.
DIRWATCH_MTIME -> for 'simple' and 'expanded' command types, which are
called once for each path, this will contain the mtime
the program is being called with. in the case of
'filter' or 'yaml' command types the variable contains
the string 'stdin' implying that all mtimes are
available on stdin.
DIRWATCH_PID -> the pid of dirwatch watcher process
DIRWATCH_ID -> an identifier for this action that will be unique for
any given run of a dirwatch watcher process.
restarting the watcher resets the generator. this
identifier is logged in the dirwatch watcher logs, so it is
useful for matching program logs with dirwatch logs
PATH -> the normal shell path. for each program run the PATH
is modified to contain the commands dir of the dirwatch
watcher process. normally this will be
$DIRWATCH_DIR/.dirwatch/commands/:$PATH
note that all the sample programs generated show how to access these
environment vars.
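the generated sample programs remain the reference for this. as a small
sketch, a trigger might gather the variables described above into one hash.
the method names dirwatch_context, batch? and last_call? are made up for this
example, and the counters are assumed to arrive as decimal strings.

```ruby
# hypothetical helper: collect the documented variables from an
# environment-like hash (pass ENV in a real trigger).
def dirwatch_context(env)
  {
    :dir     => env['DIRWATCH_DIR'],
    :action  => env['DIRWATCH_ACTION'],
    :type    => env['DIRWATCH_TYPE'],
    :n_paths => Integer(env['DIRWATCH_N_PATHS'] || 0),
    :idx     => Integer(env['DIRWATCH_PATH_IDX'] || 0),
    :path    => env['DIRWATCH_PATH'],
    :mtime   => env['DIRWATCH_MTIME']
  }
end

# 'filter' and 'yaml' commands get every path at once on stdin
def batch?(ctx)
  %w[ filter yaml ].include? ctx[:type]
end

# for per-file commands the index runs from 0 to n_paths - 1
def last_call?(ctx)
  !batch?(ctx) && ctx[:idx] == ctx[:n_paths] - 1
end

# the tenth of ten calls to a 'simple' command
env = {
  'DIRWATCH_DIR'      => './tmp',
  'DIRWATCH_ACTION'   => 'created',
  'DIRWATCH_TYPE'     => 'simple',
  'DIRWATCH_N_PATHS'  => '10',
  'DIRWATCH_PATH_IDX' => '9',
  'DIRWATCH_PATH'     => './tmp/a/n'
}
ctx = dirwatch_context env
puts "last call for this action: #{ last_call? ctx }"
```

a per-file program could use last_call? to do some cleanup once, after the
final file of an action has been handled.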
FILES
directory/.dirwatch/ -> dirwatch data files
directory/.dirwatch/dirwatch.conf -> default configuration file
directory/.dirwatch/commands/ -> default location for triggers
directory/.dirwatch/db -> sldb/sqlite database
directory/.dirwatch/dirwatch.pid -> default pidfile
directory/.dirwatch/logs/ -> automatically rolled log files
DIAGNOSTICS
success -> $? == 0
failure -> $? != 0
AUTHOR
(e-mail address removed)
BUGS
1 < bugno && bugno < 42
OPTIONS
--help, -h
this message
--log=path, -l
set log file - (default stderr)
--verbosity=verbosity, -v
0|fatal < 1|error < 2|warn < 3|info < 4|debug - (default info)
--config=path
valid path - specify config file (default nil)
--template=[path]
valid path - generate a template config file in path (default stdout)
--recursive, -r
recurse into subdirectories (default do not recurse)
--all, -a
consider all filesystem entries, including directories (default files
only)
--follow, -f
follow links (default does not follow links)
--pattern=pattern, -p
consider only filesystem entries that match pattern (default all
entries)
--daemon, -D
specify daemon mode (default not daemon)
--quiet, -Q
be wery wery quiet (default not quiet)
--dirwatch_dir=dirwatch_dir, -S
specify dirwatch storage dir (default .dirwatch/ in dir being watched)
--n_loops=n_loops, -N
loop only this many times before exiting (default infinite)
--interval=seconds, -I
sleep at least this long between loops (default 300sec (5min))
--lockfile, -L
create a lockfile in dir while running (default no lockfile)
===============================================================================
TUTORIAL
===============================================================================
INTRODUCTION
the following shows how to setup a simple file processing system using
dirwatch. it assumes a successful install of dirwatch. eg. the command
~> dirwatch --help
should operate
STEP 0
make a temporary directory, if using sh/bash do something like
~ > export tmp=./tmp
~ > mkdir $tmp
from here on we use the $tmp variable to refer to our directory
STEP 1
initialize the directory for dirwatch
~ > dirwatch $tmp create
---
./tmp:
dirwatch_dir : ./tmp/.dirwatch
db : ./tmp/.dirwatch/db
logs_dir : ./tmp/.dirwatch/logs
config : ./tmp/.dirwatch/dirwatch.conf
commands_dir : ./tmp/.dirwatch/commands
STEP 2
create three subdirectories in $tmp, a, b, and c
~ > for d in a b c;do mkdir $tmp/$d;done
STEP 3
edit the dirwatch.conf
~ > vi $tmp/.dirwatch/dirwatch.conf
change the section which reads
actions:
updated :
-
command: simple.sh
type: simple
pattern: ^.*$
timing: sync
to
actions:
updated :
-
command: yaml.rb
type: yaml
pattern: ^.*$
timing: sync
here we are telling dirwatch to run the command 'yaml.rb' (which will be
looked for in $tmp/.dirwatch/commands and then the normal $PATH) whenever a
file is 'updated.' updated means that a file has been created or modified.
run
~ > dirwatch --help
for more info
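to get a feel for the 'pattern' field, the fragment can be loaded and tried
against a few paths. this is a sketch under assumptions: the indentation of
the fragment is guessed (the listing above shows none), the method names are
made up, and whether dirwatch matches the pattern against the full path or
only part of it is not stated here.

```ruby
require 'yaml'

# the edited section of dirwatch.conf, with assumed indentation
CONFIG = <<~CONF
  actions:
    updated:
      -
        command: yaml.rb
        type: yaml
        pattern: ^.*$
        timing: sync
CONF

# hypothetical helper: the triggers configured for the 'updated' action
def updated_triggers(config_text)
  YAML.load(config_text)['actions']['updated']
end

# hypothetical helper: keep only the paths a pattern would let through
def matching(paths, pattern_source)
  pattern = Regexp.new pattern_source
  paths.select{|path| path =~ pattern}
end

trigger = updated_triggers(CONFIG).first
paths   = ['./tmp/a/n', './tmp/b/n.bak']

# '^.*$' lets every path through
p matching(paths, trigger['pattern'])

# a narrower, made up pattern only lets the .bak file through
p matching(paths, '\.bak$')
```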
STEP 4
edit yaml.rb
~ > vi $tmp/.dirwatch/commands/yaml.rb
we want a program that looks very close to this, you may have to adjust your
shebang line:
#!/usr/bin/env ruby
require 'yaml'
#
# the dir being watched
#
dirwatch_dir = ENV['DIRWATCH_DIR']
#
# load entries from stdin. this is a yaml document.
#
entries = YAML::load STDIN
#
# process each entry
#
entries.each do |entry|
  #
  # get the path and mtime of the updated file
  #
  path, mtime = entry['path'], entry['mtime']
  #
  # split into directory and filename components
  #
  dirname, basename = File::split path
  #
  # get the last directory component
  #
  dir = File::basename dirname
  #
  # perform actions based on dir - files contain numbers:
  #
  # - new files in dir 'a' get doubled and the result written to dir 'b'
  # - new files in dir 'b' get two added and the result written to dir 'c'
  # - new files in dir 'c' are displayed as the result
  #
  case dir
    when 'a'
      n = Integer(IO::read(path))
      n *= 2
      output = File::join dirwatch_dir, 'b', basename
      open(output, 'w'){|f| f.write n}
    when 'b'
      n = Integer(IO::read(path))
      n += 2
      output = File::join dirwatch_dir, 'c', basename
      open(output, 'w'){|f| f.write n}
    when 'c'
      n = Integer(IO::read(path))
      puts "result <#{ basename }> => <#{ n }>"
  end
end
the comments should make it obvious that this program, which dirwatch will
spawn as files are created or modified, loads each updated file (because we
configured it that way) and assumes a number is contained in it. when
a file is updated in directory $tmp/a we double the number and write the
output into a file of the same basename in $tmp/b. the number in $tmp/b then
has two added to it and this result is written to a file of the same basename
in $tmp/c.
be sure you've edited $tmp/.dirwatch/commands/yaml.rb and
$tmp/.dirwatch/dirwatch.conf before continuing.
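before starting a watch it can help to run the trigger by hand. the sketch
below builds the kind of input a 'yaml' type command reads on stdin: an array
of hashes with the keys 'path' and 'mtime'. the method name yaml_input_for and
the file name make_input.rb are made up, and the exact textual form of the
mtime dirwatch writes is an assumption.

```ruby
require 'yaml'

# hypothetical helper: build a yaml document shaped like the stdin of a
# 'yaml' type command. every path gets the same mtime string here.
def yaml_input_for(paths, mtime)
  entries = paths.map{|path| { 'path' => path, 'mtime' => mtime }}
  entries.to_yaml
end

puts yaml_input_for(['./tmp/a/n'], '2005-07-01 16:33:47.000000')
```

saved as make_input.rb, and with a file $tmp/a/n holding a number already in
place, something like

~ > ruby make_input.rb | DIRWATCH_DIR=$tmp ruby $tmp/.dirwatch/commands/yaml.rb

should leave the doubled number in $tmp/b/n.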
STEP 5
start dirwatch. normally dirwatch runs as a daemon that checks the dir every
five minutes, but here we will run from the console so we can see its logging
information. note the '--recursive' flag is given so that dirwatch will
descend into the subdirectories of $tmp. this is important! also, we use
the '--interval' option to specify a polling interval of 5 seconds. we would
not use such a short period for a production system but this interval is
alright for illustration. we start a watch:
~ > dirwatch $tmp --interval=5 --recursive
I, [2005-07-01T16:33:37.821687 #9146] INFO -- : ** STARTED **
I, [2005-07-01T16:33:37.822853 #9146] INFO -- : config <./tmp/.dirwatch/dirwatch.conf>
I, [2005-07-01T16:33:37.823136 #9146] INFO -- : recursive <true>
I, [2005-07-01T16:33:37.823309 #9146] INFO -- : all <false>
I, [2005-07-01T16:33:37.823423 #9146] INFO -- : follow <false>
I, [2005-07-01T16:33:37.823549 #9146] INFO -- : pattern <>
I, [2005-07-01T16:33:37.823680 #9146] INFO -- : n_loops <>
I, [2005-07-01T16:33:37.823887 #9146] INFO -- : interval <00:00:05>
I, [2005-07-01T16:33:37.824170 #9146] INFO -- : lockfile <./tmp/.dirwatch.lock>
I, [2005-07-01T16:33:37.824335 #9146] INFO -- : tmpwatch[all] <false>
I, [2005-07-01T16:33:37.824432 #9146] INFO -- : tmpwatch[nodirs] <false>
I, [2005-07-01T16:33:37.824551 #9146] INFO -- : tmpwatch[force] <true>
I, [2005-07-01T16:33:37.824745 #9146] INFO -- : tmpwatch[age] <30 days> == <2592000.0s>
I, [2005-07-01T16:33:37.824859 #9146] INFO -- : tmpwatch[rm] <rm_rf>
...
...
...
STEP 6
now, from another terminal drop a file containing a number into $tmp/a.
something like
~ > echo 10 > $tmp/a/n
within a few seconds you'll see, in the dirwatch terminal something like
I, [2005-07-01T16:33:47.855151 #9146] INFO -- : ACTION.UPDATED.0.0 - cmd : yaml.rb
I, [2005-07-01T16:33:47.928216 #9146] INFO -- : ACTION.UPDATED.0.0 - exit_status : 0
I, [2005-07-01T16:33:52.880694 #9146] INFO -- : ACTION.UPDATED.1.1 - cmd : yaml.rb
I, [2005-07-01T16:33:52.948847 #9146] INFO -- : ACTION.UPDATED.1.1 - exit_status : 0
I, [2005-07-01T16:33:57.856376 #9146] INFO -- : ACTION.UPDATED.2.2 - cmd : yaml.rb
result <n> => <22>
I, [2005-07-01T16:33:57.928320 #9146] INFO -- : ACTION.UPDATED.2.2 - exit_status : 0
so we have produced a result of 22 by doubling 10 and adding two to it merely
by dropping a file in a directory!
notice that both the output and the logging are going to the terminal here.
actually the logging goes to stderr by default and any program output/errput
is mingled here. in actual use the logging goes into a log file in $tmp/.dirwatch/logs/
that automatically rolls (you never need to truncate it) and any output/errput
from the programs run is simply discarded. note that you can certainly keep
output by using something like
command: myprogram >> myprogram.log 2>&1
in the dirwatch.conf file.
STEP 7
now, remember that we configured yaml.rb to fire for any file that was
updated where the meaning of updated is that a file was created or modified.
if we were to open up $tmp/a/n in vi and change the 10 to a 20 we'd soon see
result <n> => <42>
appear in the console running the watch.
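the numbers seen in STEP 6 and STEP 7 can be checked without dirwatch at all.
the sketch below restates the two transformations from yaml.rb as plain
methods. the method names are made up for this example.

```ruby
# files landing in dir 'a' are doubled and written to dir 'b'
def from_a_to_b(n)
  n * 2
end

# files landing in dir 'b' have two added and are written to dir 'c'
def from_b_to_c(n)
  n + 2
end

# the value finally displayed for a number dropped into dir 'a'
def result_for(n)
  from_b_to_c(from_a_to_b(n))
end

puts result_for(10)   # => 22
puts result_for(20)   # => 42
```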
STEP 8
after getting a system configured and the triggers working properly you
definitely don't want to have to start dirwatch by hand each time. dirwatch
will refuse to start two watches on a given directory and can be enabled to
run as a daemon. because of this it's quite acceptable to cron a dirwatch to
start every so often. something like
*/15 * * * * dirwatch /full/path/to/directory --daemon
will maintain a dirwatch process at all times, even after machine reboot.
note that this does not start a new watch each time - if the watch fails to
start because another is already running dirwatch simply exits with 1 but
nothing is printed to stderr so cron won't mail you tons of stuff. using this
technique a normal user can configure a daemon process to run at all times.
of course a watch could be started at machine boot too using a simple
script.
STEP 9
we now have set up a simple processing system using dirwatch. it can be used
to configure quite complex processing flows via the configuration file and the
programs run - hopefully you'll find a useful way of using it yourself. if so
please contact me at (e-mail address removed) and let me know the details.
enjoy.
-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================