[ANN] dirwatch-0.9.0


Ara.T.Howard

===============================================================================
URLS
===============================================================================

http://codeforpeople.com/lib/ruby/dirwatch/
http://raa.ruby-lang.org/project/dirwatch/


===============================================================================
README (also see TUTORIAL below)
===============================================================================

NAME
dirwatch v0.9.0

SYNOPSIS
dirwatch [ options ]+ [ directory = ./ ] [ mode = watch ]

DESCRIPTION
dirwatch is a tool used to rapidly build event driven processing systems.

dirwatch manages an sqlite database that mirrors the state of a directory and
then triggers user definable event handlers for certain filesystem activities
such as file creation, modification, deletion, etc. dirwatch can also implement
a tmpwatch like behaviour to ensure files of a certain age are removed from
the directory being watched. dirwatch normally runs as a daemon process by
first synchronizing the database inventory with that of the directory and then
firing appropriate triggers as events occur.

-----------------------------------------------------------------------------
the following actions may have triggers configured for them
-----------------------------------------------------------------------------

created -> a file was created
modified -> a file has had its mtime updated
updated -> the union of created and modified
deleted -> a file was deleted
existing -> a file has not changed but still exists

-----------------------------------------------------------------------------
the command line 'mode' must be one of the following
-----------------------------------------------------------------------------

create (c) -> initialize the database and supporting files
watch (w) -> monitor directory and trigger actions in the foreground
start (S) -> spawn a daemon watcher in the background
restart (R) -> (re)spawn a daemon watcher in the background
stop (H) -> stop/halt any currently running watcher
status (T) -> determine if any watcher is currently running
truncate (D) -> truncate/delete all entries from the database
archive (a) -> create a hot-backup of a watch's database contents
list (l) -> dump database to stdout in silky smooth yaml format

the default mode is to 'watch'.

for all modes the command line argument must be the name of the directory to
which to apply the operation - this defaults to the current directory.

-----------------------------------------------------------------------------
mode: create (c)
-----------------------------------------------------------------------------

initializes a storage directory with all required database files, logs,
command directories, sample configuration, sample programs, etc.

examples:

0) initialize the directory incoming_data/ to be dirwatched using all
defaults

~ > dirwatch create incoming_data/

-----------------------------------------------------------------------------
mode: start (S)
-----------------------------------------------------------------------------

dirwatch is normally run in daemon mode. the start mode is equivalent to
running in 'watch' mode with the '--daemon' and '--quiet' flags.

examples:

0) start a background daemon process watching incoming_data/

~ > dirwatch start incoming_data/

-----------------------------------------------------------------------------
mode: restart (R)
-----------------------------------------------------------------------------

'restart' mode checks a watcher's pidfile and either restarts the currently
running watcher or starts a new one as in 'start' mode. this is equivalent to
sending SIGHUP to the watcher daemon process.

examples:

0) re-start a background daemon process watching incoming_data/

~ > dirwatch restart incoming_data/

-----------------------------------------------------------------------------
mode: stop (H)
-----------------------------------------------------------------------------

'stop' mode checks for any process watching the specified directory and kills
this process if it exists. this is equivalent to sending TERM to the watcher
daemon process. the process will not exit immediately but will do so at the
first possible safe opportunity. do __not__ kill -9 the daemon process.

examples:

0) stop the daemon process watching incoming_data/

~ > dirwatch stop incoming_data/

-----------------------------------------------------------------------------
mode: status (T)
-----------------------------------------------------------------------------

'status' mode reports whether or not a watcher is running for the given
directory.

examples:

0) report on the watcher, if any, watching incoming_data/

~ > dirwatch status incoming_data/

-----------------------------------------------------------------------------
mode: truncate (D)
-----------------------------------------------------------------------------

'truncate' mode empties the database of all state in an atomic fashion.

examples:

0) empty the database in a safe way

~ > dirwatch truncate incoming_data/

-----------------------------------------------------------------------------
mode: archive (a)
-----------------------------------------------------------------------------

archive mode is used to atomically create a hot-backup tgz file of the
storage directory for a given directory while respecting the locking
subsystem.

examples:

0) make a hot-backup of the database and all supporting files in
incoming_data/

~ > dirwatch archive incoming_data/

-----------------------------------------------------------------------------
mode: watch (w)
-----------------------------------------------------------------------------

this is the meat of dirwatch.

dirwatch is designed to run as a daemon, updating a database inventory at the
interval specified by the '--interval' option (5 minutes by default) and
firing appropriate trigger commands. two watchers may not watch the same dir
simultaneously and attempting to start a second watcher will fail when the
second watcher is unable to obtain a lockfile. it is a non-fatal error to
attempt to start another watcher when one is running and this failure can be
made silent by using the '--quiet' option. the reason for this is to allow a
crontab entry to be used to make the daemon 'immortal'. for example, the
following crontab entry

*/15 * * * * dirwatch directory --daemon

will __attempt__ to start a daemon watching 'directory' every fifteen minutes.
if the daemon is not already running one will be started, otherwise dirwatch
will simply fail silently (no cron email sent due to stderr).

this feature allows a normal user to set up daemon processes that will not only
run after machine reboot, but which will continue to run after other unforeseen
terminal program behaviour. such a daemon is known as an 'immortal' daemon.
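
the 'only one watcher per directory' rule can be illustrated with an advisory
lock. the sketch below is not dirwatch's implementation (dirwatch relies on its
own lockfile lib); it only shows the general technique using ruby's File#flock.
the helper name acquire_watch_lock and the lock path are hypothetical.

```ruby
require 'tmpdir'

# hypothetical helper: returns the open lock handle on success, or nil when
# another open handle already holds the lock (i.e. a watcher is running).
def acquire_watch_lock(path)
  handle = File.open(path, File::RDWR | File::CREAT, 0644)
  if handle.flock(File::LOCK_EX | File::LOCK_NB)
    handle            # keep this open for as long as the watcher runs
  else
    handle.close
    nil
  end
end

Dir.mktmpdir do |dir|
  lock_path = File.join(dir, 'watch.lock')   # assumed name, for illustration
  first  = acquire_watch_lock(lock_path)
  second = acquire_watch_lock(lock_path)
  puts "first watcher holds lock: #{ !first.nil? }"
  puts "second watcher refused:   #{ second.nil? }"
  first.close
end
```

a second process started from cron would get nil here and could exit quietly,
which is the behaviour the crontab entry above depends on.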

as the watcher runs and maintains the database inventory it is noted when
files/directories (entries) have been created, modified, updated, deleted, or
are existing. these entries are then handled by user definable triggers as
specified in the config file. the config file is of the format

...
actions :
  created :
    commands :
      ...
  updated :
    commands :
      ...
  ...
...

where the commands to be run for each trigger type are enumerated. each
command entry is of the following format:
...
-
  command : the command to run
  type : calling convention, how info is passed to the program
  pattern : filter files by this regex
  timing : synchronous or asynchronous execution
...

further explanation of each field:

command: this is the program to run. the search path for the program is
modified to first include the commands/ dir underneath the
.dirwatch/ dir in the directory being watched.

type: there are four types of commands. the type merely indicates the
calling convention of the program. when commands are run there are
two pieces of information which are passed to the program, the file
in question and the mtime of that file. the mtime is less
important but programs may use it to know if the file has been
changed since they were last spawned or for other bookkeeping. mtime
will probably be ignored for most commands. the four types of
commands fall into two categories: those commands called once for
each file and those types of commands called once with __all__
files

file at a time:

simple: the command will be called with two arguments: the file
in question and the mtime datetime, eg:

command foobar.txt '2002-11-04 01:01:01.1234'

expanded: the command will have the strings '@file' and
'@mtime' replaced with appropriate values. eg:

command '@file' '@mtime'

expands to (and is called as)

command 'somefile' '2002-11-04 01:01:01.1234'

files at once:

filter: the stdin of the program will be given a list where each
line contains two items, the file and the datetime.

yaml: the stdin of the program will be given a list where each
entry contains two items, the file and the mtime. the
format of the list is valid yaml and the schema is an
array of hashes where each hash has the keys 'path' and
'mtime'.

pattern: all the files for a given action are filtered by this pattern,
and only those files matching pattern will have triggers fired.

timing: if timing is asynchronous the command will be run and not waited
for before starting the next command. asynchronous commands may
yield better performance but may also result in many commands being
run at once. asynchronous commands should not be programs that load
the system heavily unless one is looking to freeze a machine.
synchronous commands are spawned and waited for before the next
command is started. a side effect of synchronous commands is that
the time spent waiting may sum to an amount of time greater than
the interval ('--interval' option) specified - if the amount of
time spent running commands exceeds the interval the next inventory
simply begins immediately with no pause. because of this one
should think of the interval used as a minimum bound only,
especially when synchronous commands are used.
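
the difference between the two timings can be sketched in plain ruby. this is
an illustration of the general idea only, not dirwatch's code: run_sync and
run_async are hypothetical names, and whether dirwatch ever collects the exit
status of asynchronous commands is not stated above, so the reaping step in
run_async is an assumption.

```ruby
require 'rbconfig'

# 'sync' style: each command finishes before the next one starts.
def run_sync(commands)
  commands.map do |argv|
    system(*argv)
    $?.exitstatus
  end
end

# 'async' style: start every command immediately, then reap them afterwards.
def run_async(commands)
  pids = commands.map { |argv| Process.spawn(*argv) }
  pids.map do |pid|
    _, status = Process.wait2(pid)
    status.exitstatus
  end
end

ruby = RbConfig.ruby   # use the running interpreter so the demo is portable
commands = [[ruby, '-e', 'exit 0'], [ruby, '-e', 'exit 3']]
p run_sync(commands)
p run_async(commands)
```

with many files the async form has every command running at once, which is
the load concern described above.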


note that sample commands of each type are auto-generated in the
dbdir/commands directory. reading these should answer any questions regarding
the calling conventions of any of the four types. for other questions see
the sample config, which is also auto-generated.
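
the substitution performed for 'expanded' commands can be pictured with a few
lines of ruby. this is a sketch of the idea only: expand_command is a
hypothetical helper, and how dirwatch itself quotes or escapes the values is
not specified here.

```ruby
# hypothetical helper, not dirwatch's own code: replace the '@file' and
# '@mtime' placeholders in a command template with concrete values.
def expand_command(template, file, mtime)
  template.gsub(/@file|@mtime/, '@file' => file, '@mtime' => mtime)
end

puts expand_command("command '@file' '@mtime'",
                    'somefile', '2002-11-04 01:01:01.1234')
```

this prints the same command line shown in the 'expanded' example above. no
shell escaping is attempted, so a real implementation would need more care
with unusual file names.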

examples:

0) run a watch from this terminal (non daemon)

~ > dirwatch directory watch


-----------------------------------------------------------------------------
mode: list (l)
-----------------------------------------------------------------------------

dump the contents of the database in yaml format for easy viewing/parsing

examples:

0) dump database as yaml

~ > dirwatch directory list


ENVIRONMENT

for dirwatch itself:

export SLDB_DEBUG=1 -> cause sldb lib actions (sql) to be logged
export LOCKFILE_DEBUG=1 -> cause lockfile lib actions to be logged

for programs run by dirwatch the following environment variables will be set:

DIRWATCH_DIR -> the directory being watched
DIRWATCH_ACTION -> action type, one of 'instance', 'created', 'modified',
'updated', 'deleted', or 'existing'
DIRWATCH_TYPE -> command type, one of 'simple', 'expanded', 'filter', or
'yaml'
DIRWATCH_N_PATHS -> the total number of paths for this action. the paths
themselves will be passed to the program in a different
way depending on DIRWATCH_TYPE, for instance on the
command line or on stdin, but this number will always
be the total number of paths the program should expect.
DIRWATCH_PATH_IDX -> for some command types, like 'simple', the program will
be run more than once to handle all paths since calling
convention only allows the program to be called with
one path at a time. this number is the index of the
current path in such cases. for instance, a 'simple'
program may only be called with one path at a time so
if 10 files were created in the directory that would
result in the program being called 10 times. in each
case DIRWATCH_N_PATHS would be 10 and DIRWATCH_PATH_IDX
would range from 0 to 9 for each of the 10 calls to the
program. in the case of 'filter' and 'yaml' command
types, where every path is given at once on stdin this
value will be equal to DIRWATCH_N_PATHS
DIRWATCH_PATH -> for 'simple' and 'expanded' command types, which are
called once for each path, this will contain the path
the program is being called with. in the case of
'filter' or 'yaml' command types the variable contains
the string 'stdin' implying that all paths are
available on stdin.
DIRWATCH_MTIME -> for 'simple' and 'expanded' command types, which are
called once for each path, this will contain the mtime
the program is being called with. in the case of
'filter' or 'yaml' command types the variable contains
the string 'stdin' implying that all mtimes are
available on stdin.
DIRWATCH_PID -> the pid of dirwatch watcher process
DIRWATCH_ID -> an identifier for this action that will be unique for
any given run of a dirwatch watcher process.
restarting the watcher resets the generator. this
identifier is logged in the dirwatch watcher logs so it
is useful for matching program logs with dirwatch logs
PATH -> the normal shell path. for each program run the PATH
is modified to contain the commands dir of the dirwatch
watcher process. normally this will be
$DIRWATCH_DIR/.dirwatch/commands/:$PATH


note that all the sample programs generated show how to access these
environment vars.
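
as a rough illustration of how a triggered program might read these variables,
here is a minimal ruby sketch. describe_invocation is a hypothetical helper,
and the values passed in the example are made up; only the variable names come
from the list above.

```ruby
# hypothetical helper: summarize one invocation from the DIRWATCH_* variables.
# takes the environment as an argument so it can be tried without dirwatch.
def describe_invocation(env = ENV)
  action  = env['DIRWATCH_ACTION']
  type    = env['DIRWATCH_TYPE']
  n_paths = Integer(env['DIRWATCH_N_PATHS'])

  if %w[simple expanded].include?(type)
    # called once per path: DIRWATCH_PATH_IDX counts from 0
    idx = Integer(env['DIRWATCH_PATH_IDX'])
    "#{ action }: path #{ idx + 1 } of #{ n_paths } (#{ env['DIRWATCH_PATH'] })"
  else
    # 'filter' and 'yaml': every path arrives on stdin in one call
    "#{ action }: #{ n_paths } paths on stdin"
  end
end

# made-up values, for illustration only
puts describe_invocation(
  'DIRWATCH_ACTION'   => 'created',
  'DIRWATCH_TYPE'     => 'simple',
  'DIRWATCH_N_PATHS'  => '10',
  'DIRWATCH_PATH_IDX' => '0',
  'DIRWATCH_PATH'     => 'incoming_data/a.txt'
)
```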


FILES
directory/.dirwatch/ -> dirwatch data files
directory/.dirwatch/dirwatch.conf -> default configuration file
directory/.dirwatch/commands/ -> default location for triggers
directory/.dirwatch/db -> sldb/sqlite database
directory/.dirwatch/dirwatch.pid -> default pidfile
directory/.dirwatch/logs/ -> automatically rolled log files

DIAGNOSTICS
success -> $? == 0
failure -> $? != 0


AUTHOR
(e-mail address removed)


BUGS
1 < bugno && bugno < 42

OPTIONS
--help, -h
this message
--log=path, -l
set log file - (default stderr)
--verbosity=verbosity, -v
0|fatal < 1|error < 2|warn < 3|info < 4|debug - (default info)
--config=path
valid path - specify config file (default nil)
--template=[path]
valid path - generate a template config file in path (default stdout)
--recursive, -r
recurse into subdirectories (default do not recurse)
--all, -a
consider all filesystem entries, including directories (default files
only)
--follow, -f
follow links (default does not follow links)
--pattern=pattern, -p
consider only filesystem entries that match pattern (default all
entries)
--daemon, -D
specify daemon mode (default not daemon)
--quiet, -Q
be wery wery quiet (default not quiet)
--dirwatch_dir=dirwatch_dir, -S
specify dirwatch storage dir (default .dirwatch/ in dir being watched)
--n_loops=n_loops, -N
loop only this many times before exiting (default infinite)
--interval=seconds, -I
sleep at least this long between loops (default 300sec (5min))
--lockfile, -L
create a lockfile in dir while running (default no lockfile)
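
one way to picture how '--interval' and '--n_loops' interact is the loop below.
it is a sketch under assumptions, not dirwatch's source: pause_for and
watch_loop are hypothetical names, and the rule 'sleep for whatever part of
the interval is left, or not at all' is one reading of the timing notes in the
README.

```ruby
# hypothetical helper: seconds left to sleep after a loop took `elapsed`.
def pause_for(interval, elapsed)
  [interval - elapsed, 0].max
end

# hypothetical loop: yields once per inventory pass, n_loops times
# (forever when n_loops is nil), and returns the number of passes made.
def watch_loop(interval, n_loops = nil)
  count = 0
  until n_loops && count >= n_loops
    started = Time.now
    yield count
    count += 1
    sleep pause_for(interval, Time.now - started)
  end
  count
end

puts pause_for(300, 20)   # commands took 20s of a 300s interval
puts pause_for(5, 12)     # commands overran the interval: no pause
watch_loop(0, 3) { |i| puts "inventory pass #{ i }" }
```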



===============================================================================
TUTORIAL
===============================================================================

INTRODUCTION

the following shows how to setup a simple file processing system using
dirwatch. it assumes a successful install of dirwatch. eg. the command

~> dirwatch --help

should operate


STEP 0

make a temporary directory, if using sh/bash do something like

~ > export tmp=./tmp
~ > mkdir $tmp

from here on we use the $tmp variable to refer to our directory


STEP 1

initialize the directory for dirwatch

~ > dirwatch $tmp create
---
./tmp:
dirwatch_dir : ./tmp/.dirwatch
db : ./tmp/.dirwatch/db
logs_dir : ./tmp/.dirwatch/logs
config : ./tmp/.dirwatch/dirwatch.conf
commands_dir : ./tmp/.dirwatch/commands



STEP 2

create three subdirectories in $tmp, a, b, and c

~ > for d in a b c;do mkdir $tmp/$d;done


STEP 3

edit the dirwatch.conf

~ > vi $tmp/.dirwatch/dirwatch.conf

change the section which reads

actions:
  updated :
    -
      command: simple.sh
      type: simple
      pattern: ^.*$
      timing: sync

to

actions:
  updated :
    -
      command: yaml.rb
      type: yaml
      pattern: ^.*$
      timing: sync

here we are telling dirwatch to run the command 'yaml.rb' (which will be
looked for in $tmp/.dirwatch/commands and then the normal $PATH) whenever a
file is 'updated.' updated means that a file has been created or modified.
run

~ > dirwatch --help

for more info
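
the 'pattern' field in the config is a regex, and '^.*$' matches everything.
the filtering it implies can be sketched as below. matching_paths is a
hypothetical helper, and matching against the whole path (rather than only the
file name) is an assumption, since the README does not say which is used.

```ruby
# hypothetical helper: keep only the paths that match the configured pattern.
def matching_paths(paths, pattern)
  regex = Regexp.new(pattern)
  paths.select { |path| path =~ regex }
end

paths = %w[a/n a/notes.txt b/n]
p matching_paths(paths, '^.*$')     # the tutorial pattern keeps every path
p matching_paths(paths, '\.txt$')   # a narrower pattern keeps only a/notes.txt
```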


STEP 4

edit yaml.rb

~ > vi $tmp/.dirwatch/commands/yaml.rb

we want a program that looks very close to this, you may have to adjust your
shebang line:

#!/usr/bin/env ruby
require 'yaml'
#
# the dir being watched
#
dirwatch_dir = ENV['DIRWATCH_DIR']
#
# load entries from stdin. this is a yaml document.
#
entries = YAML::load STDIN
#
# process each entry
#
entries.each do |entry|
  #
  # get the path and mtime of the updated file
  #
  path, mtime = entry['path'], entry['mtime']
  #
  # split into directory and filename components
  #
  dirname, basename = File::split path
  #
  # get the last directory component
  #
  dir = File::basename dirname
  #
  # perform actions based on dir - files contain numbers:
  #
  # - new files in dir 'a' get doubled and the result written to dir 'b'
  # - new files in dir 'b' get two added and the result written to dir 'c'
  # - new files in dir 'c' are displayed as the result
  #
  case dir
    when 'a'
      n = Integer(IO::read(path))
      n *= 2
      output = File::join dirwatch_dir, 'b', basename
      open(output, 'w'){|f| f.write n}
    when 'b'
      n = Integer(IO::read(path))
      n += 2
      output = File::join dirwatch_dir, 'c', basename
      open(output, 'w'){|f| f.write n}
    when 'c'
      n = Integer(IO::read(path))
      puts "result <#{ basename }> => <#{ n }>"
  end
end

the comments should make it obvious that this program, which dirwatch will
spawn as new files are created or modified, loads each updated file (because
we configured it that way) and assumes a number is contained in it. when the
file was updated in directory $tmp/a we double the number and write the
output into a file of the same basename in $tmp/b. when that file shows up as
updated in $tmp/b the number has two added to it and this result is written to
a file of the same basename in $tmp/c, where it is displayed.

be sure you've edited $tmp/.dirwatch/commands/yaml.rb and
$tmp/.dirwatch/dirwatch.conf before continuing.
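
to get a feel for what yaml.rb receives on stdin, the sketch below parses a
hand-written document with the documented shape (an array of hashes with the
keys 'path' and 'mtime'). the entries are made up for illustration, and the
mtime values are quoted so they load as plain strings; the exact formatting
dirwatch emits is an assumption here.

```ruby
require 'yaml'

# hypothetical input; real entries are produced by dirwatch
sample = <<~DOC
  - path: ./tmp/a/n
    mtime: '2005-07-01 16:33:47.855151'
  - path: ./tmp/a/m
    mtime: '2005-07-01 16:33:48.000000'
DOC

entries = YAML.load(sample)
entries.each do |entry|
  puts "#{ entry['path'] } (#{ entry['mtime'] })"
end
```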


STEP 5

start dirwatch. normally dirwatch runs as a daemon that checks the dir every
five minutes, but here we will run from the console so we can see its logging
information. note the '--recursive' flag is given so that dirwatch will
descend into the subdirectories of $tmp. this is important! also, we use
the '--interval' option to specify a polling interval of 5 seconds. we would
not use such a short period for a production system but this interval is
alright for illustration. we start a watch:


~ > dirwatch $tmp --interval=5 --recursive
I, [2005-07-01T16:33:37.821687 #9146] INFO -- : ** STARTED **
I, [2005-07-01T16:33:37.822853 #9146] INFO -- : config <./tmp/.dirwatch/dirwatch.conf>
I, [2005-07-01T16:33:37.823136 #9146] INFO -- : recursive <true>
I, [2005-07-01T16:33:37.823309 #9146] INFO -- : all <false>
I, [2005-07-01T16:33:37.823423 #9146] INFO -- : follow <false>
I, [2005-07-01T16:33:37.823549 #9146] INFO -- : pattern <>
I, [2005-07-01T16:33:37.823680 #9146] INFO -- : n_loops <>
I, [2005-07-01T16:33:37.823887 #9146] INFO -- : interval <00:00:05>
I, [2005-07-01T16:33:37.824170 #9146] INFO -- : lockfile <./tmp/.dirwatch.lock>
I, [2005-07-01T16:33:37.824335 #9146] INFO -- : tmpwatch[all] <false>
I, [2005-07-01T16:33:37.824432 #9146] INFO -- : tmpwatch[nodirs] <false>
I, [2005-07-01T16:33:37.824551 #9146] INFO -- : tmpwatch[force] <true>
I, [2005-07-01T16:33:37.824745 #9146] INFO -- : tmpwatch[age] <30 days> == <2592000.0s>
I, [2005-07-01T16:33:37.824859 #9146] INFO -- : tmpwatch[rm] <rm_rf>
...
...
...



STEP 6

now, from another terminal drop a file containing a number into $tmp/a.
something like

~ > echo 10 > $tmp/a/n

within a few seconds you'll see, in the dirwatch terminal something like

I, [2005-07-01T16:33:47.855151 #9146] INFO -- : ACTION.UPDATED.0.0 - cmd : yaml.rb
I, [2005-07-01T16:33:47.928216 #9146] INFO -- : ACTION.UPDATED.0.0 - exit_status : 0
I, [2005-07-01T16:33:52.880694 #9146] INFO -- : ACTION.UPDATED.1.1 - cmd : yaml.rb
I, [2005-07-01T16:33:52.948847 #9146] INFO -- : ACTION.UPDATED.1.1 - exit_status : 0
I, [2005-07-01T16:33:57.856376 #9146] INFO -- : ACTION.UPDATED.2.2 - cmd : yaml.rb
result <n> => <22>
I, [2005-07-01T16:33:57.928320 #9146] INFO -- : ACTION.UPDATED.2.2 - exit_status : 0

so we have produced a result of 22 by doubling 10 and adding two to it merely
by dropping a file in a directory!

notice that both the output and the logging are going to the terminal here.
actually the logging goes to stderr by default and any program output/errput
is mingled here. in actual use the logging goes into a log file in
$tmp/.dirwatch/logs/ that automatically rolls (you never need to truncate it)
and any output/errput from the programs run is simply discarded. note that you
can certainly keep output by using something like

command: myprogram >> myprogram.log 2>&1

in the dirwatch.conf file.


STEP 7

now, remember that we configured yaml.rb to fire for any file that was
updated where the meaning of updated is that a file was created or modified.
if we were to open up $tmp/a/n in vi and change the 10 to a 20 we'd soon see

result <n> => <42>

appear in the console running the watch.
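
the arithmetic of steps 6 and 7 can be checked without running dirwatch at
all. the sketch below replays the a -> b -> c hand-off in a scratch directory;
process and run_pipeline are hypothetical helpers that mimic what yaml.rb does
for one path, and no dirwatch code is involved.

```ruby
require 'fileutils'
require 'tmpdir'

# hypothetical stand-in for one yaml.rb invocation on a single path.
# returns the path it wrote, or the final number once dir 'c' is reached.
def process(root, path)
  dirname, basename = File.split(path)
  n = Integer(File.read(path))
  case File.basename(dirname)
  when 'a'
    target, value = 'b', n * 2
  when 'b'
    target, value = 'c', n + 2
  when 'c'
    return n
  else
    raise ArgumentError, "unexpected directory for #{ path }"
  end
  output = File.join(root, target, basename)
  File.write(output, value.to_s)
  output
end

# drop a number into a/ and follow it until it comes out of c/
def run_pipeline(start)
  Dir.mktmpdir do |root|
    %w[a b c].each { |d| FileUtils.mkdir_p(File.join(root, d)) }
    current = File.join(root, 'a', 'n')
    File.write(current, "#{ start }\n")
    current = process(root, current) until current.is_a?(Integer)
    current
  end
end

puts run_pipeline(10)   # 22, as in step 6
puts run_pipeline(20)   # 42, as in step 7
```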


STEP 8

after getting a system configured and the triggers working properly you
definitely don't want to have to start dirwatch by hand each time. dirwatch
will refuse to start two watches on a given directory and can be enabled to
run as a daemon. because of this it's quite acceptable to cron a dirwatch to
start every so often. something like

*/15 * * * * dirwatch /full/path/to/directory --daemon

will maintain a dirwatch process at all times, even after machine reboot.
note that this does not start a new watch each time - if the watch fails to
start because another is already running dirwatch simply exits with 1 but
nothing is printed to stderr so cron won't mail you tons of stuff. using this
technique a normal user can configure a daemon process to run at all times.
of course a watch could be started at machine boot too using a simple
script.


STEP 9

we now have set up a simple processing system using dirwatch. it can be used
to configure quite complex processing flows via the configuration file and the
programs run - hopefully you'll find a useful way of using it yourself. if so
please contact me at (e-mail address removed) and let me know the details.


enjoy.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================
 

James Britt

Ara.T.Howard said:
===============================================================================

URLS
===============================================================================


http://codeforpeople.com/lib/ruby/dirwatch/
http://raa.ruby-lang.org/project/dirwatch/


===============================================================================

README (also see TUTORIAL below)
===============================================================================


NAME
dirwatch v0.9.0

Very, very nice.

Are there any docs for it?


:)



James

--

http://www.ruby-doc.org - The Ruby Documentation Site
http://www.rubyxml.com - News, Articles, and Listings for Ruby & XML
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys
 

Ara.T.Howard

02/07/2005 12:58:11

http://codeforpeople.com/lib/ruby/dirwatch/
http://raa.ruby-lang.org/project/dirwatch/
To pick up an earlier thread - "will it run on Windows?" - I guess
the answer is "no" as the startup process seems to require shell
script? (i.e. Unix only). I've no clue about Unix.. how can I at
least start it under Windows?

yeah... i lost (accidentally deleted) the email from you.. sorry. the
start-up process does not require a shell script - it's all one ruby program.

in any case there are a few things that would make it tough to run on windows.

* posixlocking - windows doesn't support it. you can fix this by making a
posixlock.rb file that has this in it

class File; alias posixlock flock; end

* running as a daemon requires a fork. you don't have to do this though.

if you do the posixlock thing you could probably then do

~ > dirwatch directory/ create
~ > dirwatch directory/ watch

and see what happens. if that works there is a good chance you could use it
under windows. i don't know how to run a service under windows but that's
what you'd want to do. try the posixlock thing, install all the depends
(included in the tar ball) and see where you can get. there's nothing about
it that precludes windows, it's just that i don't have a windows machine at home
or anywhere at work - just thousands of linux boxes ;-)

-a
 

Ara.T.Howard

03/07/2005 08:34:16

WOW - a complete treasure trove of useful functionality that I didn't
know existed. The ChangeNotify seems to be deprecated and
ChangeJournal its replacement. (I hope that ChangeJournal doesn't
require you to watch an entire drive).
These methods seem very much smaller and less complex than dirwatch,
as they don't appear to use a database behind them. (I assume they
trust the filesystem information and events?) Anyone any experience
with either toolset?
Thx
Graham

dirwatch basically emulates (and adds to) this functionality, which is
built-in to the windows file systems. the normal unix file systems do not
provide hooks to do this sort of thing - that's why i wrote dirwatch. of
course a big difference is that dirwatch is durable across machine reboots.

cheers.

-a
 
