[ANN] dirwatch-0.9.0

Discussion in 'Ruby' started by Ara.T.Howard, Jul 2, 2005.

  1. Ara.T.Howard

    Ara.T.Howard Guest



    README (also see TUTORIAL below)

    dirwatch v0.9.0

    dirwatch [ options ]+ [ directory = ./ ] [ mode = watch ]

    dirwatch is a tool used to rapidly build event driven processing systems.

    dirwatch manages an sqlite database that mirrors the state of a directory and
    then triggers user definable event handlers for certain filesystem activities
    such as file creation, modification, deletion, etc. dirwatch can also implement
    a tmpwatch like behaviour to ensure files of a certain age are removed from
    the directory being watched. dirwatch normally runs as a daemon process by
    first synchronizing the database inventory with that of the directory and then
    firing appropriate triggers as they occur.

    the following actions may have triggers configured for them

    created -> a file was created
    modified -> a file has had its mtime updated
    updated -> the union of created and modified
    deleted -> a file was deleted
    existing -> a file has not changed but still exists
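
as a rough illustration of how these five actions relate to one another, here is a minimal ruby sketch that compares two inventories (hashes of path => mtime). the `classify` method and the hash layout are made up for this example; this is not dirwatch's actual implementation, which keeps its inventory in an sqlite database.

```ruby
# Classify entries by comparing two inventories (path => mtime).
# `previous` and `current` are hypothetical stand-ins for the database
# inventory and a fresh scan of the directory.
def classify(previous, current)
  actions = { 'created' => [], 'modified' => [], 'deleted' => [], 'existing' => [] }

  current.each do |path, mtime|
    if !previous.key?(path)
      actions['created'] << path      # not seen before
    elsif previous[path] != mtime
      actions['modified'] << path     # seen before, mtime differs
    else
      actions['existing'] << path     # seen before, unchanged
    end
  end

  # anything in the old inventory that is missing from the scan was deleted
  previous.each_key do |path|
    actions['deleted'] << path unless current.key?(path)
  end

  # 'updated' is the union of created and modified
  actions['updated'] = actions['created'] + actions['modified']
  actions
end

previous = { 'a.txt' => 100, 'b.txt' => 200, 'c.txt' => 300 }
current  = { 'a.txt' => 100, 'b.txt' => 250, 'd.txt' => 400 }
classify(previous, current).each do |action, paths|
  puts "#{action}: #{paths.inspect}"
end
```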

    the command line 'mode' must be one of the following

    create (c) -> initialize the database and supporting files
    watch (w) -> monitor directory and trigger actions in the foreground
    start (S) -> spawn a daemon watcher in the background
    restart (R) -> (re)spawn a daemon watcher in the background
    stop (H) -> stop/halt any currently running watcher
    status (T) -> determine if any watcher is currently running
    truncate (D) -> truncate/delete all entries from the database
    archive (a) -> create a hot-backup of a watch's database contents
    list (l) -> dump database to stdout in silky smooth yaml format

    the default mode is to 'watch'.

    for all modes the command line argument must be the name of the directory to
    which to apply the operation - this defaults to the current directory.

    mode: create (c)

    initializes a storage directory with all required database files, logs,
    command directories, sample configuration, sample programs, etc.


    0) initialize the directory incoming_data/ to be dirwatched using all

    ~ > dirwatch create incoming_data/

    mode: start (S)

    dirwatch is normally run in daemon mode. the start mode is equivalent to
    running in 'watch' mode with the '--daemon' and '--quiet' flags.


    0) start a background daemon process watching incoming_data/

    ~ > dirwatch start incoming_data/

    mode: restart (R)

    'restart' mode checks a watcher's pidfile and either restarts the currently
    running watcher or starts a new one as in 'start' mode. this is equivalent to
    sending SIGHUP to the watcher daemon process.


    0) re-start a background daemon process watching incoming_data/

    ~ > dirwatch restart incoming_data/

    mode: stop (H)

    'stop' mode checks for any process watching the specified directory and kills
    this process if it exists. this is equivalent to sending TERM to the watcher
    daemon process. the process will not exit immediately but will do at the
    first possible safe opportunity. do __not__ kill -9 the daemon process.


    0) stop the daemon process watching incoming_data/

    ~ > dirwatch stop incoming_data/
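
the "exit at the first safe opportunity" behaviour can be pictured with a small ruby sketch. this is only an assumption about the general technique (a TERM handler that sets a flag which is checked between units of work), and `GracefulLoop` is a made-up name, not part of dirwatch.

```ruby
# A loop that notes TERM but only exits between passes, never mid-pass.
class GracefulLoop
  attr_reader :passes

  def initialize
    @stop_requested = false
    @passes = 0
  end

  # Runs the given block repeatedly until TERM has been received.
  def run
    previous = Signal.trap('TERM') { @stop_requested = true }
    until @stop_requested
      yield            # one full unit of work, never cut short
      @passes += 1
    end
  ensure
    # put back whatever handler was installed before we started
    Signal.trap('TERM', previous || 'DEFAULT')
  end
end

watcher = GracefulLoop.new
watcher.run do
  # ask ourselves to stop in the middle of a pass, then finish the pass
  Process.kill('TERM', Process.pid)
  sleep 0.2
end
puts "finished #{watcher.passes} pass(es) before exiting"
```

a kill -9 cannot be trapped at all, so a loop like this would never get the chance to finish its current pass cleanly.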

    mode: status (T)

    'status' mode reports whether or not a watcher is running for the given


    0) report on the watcher, if any, watching incoming_data/

    ~ > dirwatch status incoming_data/
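
for the curious, a pidfile based liveness check can be sketched in ruby as below. it assumes the pidfile holds nothing but the process id as an integer, which the README does not state, and `watcher_running?` is a hypothetical helper rather than dirwatch code.

```ruby
require 'tmpdir'

# Returns true if the pid recorded in `pidfile` belongs to a live process.
# Assumes the pidfile contains just an integer pid.
def watcher_running?(pidfile)
  pid = Integer(File.read(pidfile).strip)
  Process.kill(0, pid)   # signal 0 sends nothing, it only tests for existence
  true
rescue Errno::ENOENT, Errno::ESRCH, ArgumentError
  false                  # no pidfile, no such process, or unparsable pid
rescue Errno::EPERM
  true                   # process exists but belongs to another user
end

Dir.mktmpdir do |dir|
  pidfile = File.join(dir, 'dirwatch.pid')
  puts watcher_running?(pidfile)          # no pidfile written yet
  File.write(pidfile, Process.pid.to_s)
  puts watcher_running?(pidfile)          # our own pid is alive
end
```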

    mode: truncate (D)

    'truncate' mode empties the database of all state in an atomic fashion.


    0) empty the database in a safe way

    ~ > dirwatch truncate incoming_data/

    mode: archive (a)

    archive mode is used to atomically create a hot-backup tgz file of the
    storage directory for a given directory while respecting the locking


    0) make a hot-backup of the database and all supporting files in

    ~ > dirwatch archive incoming_data/

    mode: watch (w)

    this is the meat of dirwatch.

    dirwatch is designed to run as a daemon, updating a database inventory at the
    interval specified by the '--interval' option (5 minutes by default) and
    firing appropriate trigger commands. two watchers may not watch the same dir
    simultaneously and attempting to start a second watcher will fail when the
    second watcher is unable to obtain a lockfile. it is a non-fatal error to
    attempt to start another watcher when one is running and this failure can be
    made silent by using the '--quiet' option. the reason for this is to allow a
    crontab entry to be used to make the daemon 'immortal'. for example, the
    following crontab entry

    */15 * * * * dirwatch directory --daemon

    will __attempt__ to start a daemon watching 'directory' every fifteen minutes.
    if the daemon is not already running one will be started, otherwise dirwatch will
    simply fail silently (no cron email sent due to stderr).

    this feature allows a normal user to setup daemon processes that will not only
    run after machine reboot, but which will continue to run after other unforeseen
    terminal program behaviour. such a daemon is known as an 'immortal' daemon.
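
the "only one watcher per directory" rule that makes the cron trick safe can be sketched with a non-blocking exclusive lock. note the assumptions: this uses ruby's built-in File#flock as a stand-in, whereas dirwatch relies on its own lockfile library, and both `acquire_watch_lock` and the lock path are invented for the example.

```ruby
require 'tmpdir'

# Try to become the only watcher by taking an exclusive, non-blocking lock.
# Returns the open lock handle on success and nil if someone else holds it.
def acquire_watch_lock(lock_path)
  handle = File.open(lock_path, File::RDWR | File::CREAT, 0644)
  if handle.flock(File::LOCK_EX | File::LOCK_NB)
    handle
  else
    handle.close
    nil
  end
end

Dir.mktmpdir do |dir|
  lock_path = File.join(dir, 'watch.lock')   # hypothetical lock location
  first  = acquire_watch_lock(lock_path)
  second = acquire_watch_lock(lock_path)
  puts "first watcher got the lock:  #{!first.nil?}"
  puts "second watcher got the lock: #{!second.nil?}"
  first.close if first
end
```

a second invocation that finds the lock taken would simply exit non-zero without printing anything, which is what keeps cron from sending mail.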

    as the watcher runs and maintains the database inventory it is noted when
    files/directories (entries) have been created, modified, updated, deleted, or
    are existing. these entries are then handled by user definable triggers as
    specified in the config file. the config file is of the format

    actions :
      created :
        commands :
      updated :
        commands :

    where the commands to be run for each trigger type are enumerated. each
    command entry is of the following format:
    command : the command to run
    type : calling convention, how info is passed to the program
    pattern : filter files by this regex
    timing : synchronous or asynchronous execution

    further explanation of each field:

    command: this is the program to run. the search path for the program is
    modified to first include the commands/ dir underneath the
    .dirwatch/ dir in the directory being watched.

    type: there are four types of commands. the type merely indicates the
    calling convention of the program. when commands are run there are
    two pieces of information which are passed to the program, the file
    in question and the mtime of that file. the mtime is less
    important but programs may use it to know if the file has been
    changed since they were last spawned or other bookkeeping. mtime
    will probably be ignored for most commands. the four types of
    commands fall into two categories: those commands called once for
    each file and those types of commands called once with __all__

    file at a time:

    simple: the command will be called with two arguments: the file
    in question and the mtime datetime, eg:

    command foobar.txt '2002-11-04 01:01:01.1234'

    expanded: the command will have the strings '@file' and
    '@mtime' replaced with appropriate values. eg:

    command '@file' '@mtime'

    expands to (and is called as)

    command 'somefile' '2002-11-04 01:01:01.1234'

    files at once:

    filter: the stdin of the program will be given a list where each
    line contains two items, the file and the datetime.

    yaml: the stdin of the program will be given a list where each
    entry contains two items, the file and the mtime. the
    format of the list is valid yaml and the schema is an
    array of hashes where each hash has the keys 'path' and

    pattern: all the files for a given action are filtered by this pattern,
    and only those files matching pattern will have triggers fired.

    timing: if timing is asynchronous the command will be run and not waited
    for before starting the next command. asynchronous commands may
    yield better performance but may also result in many commands being
    run at once. asynchronous commands should not be programs that load
    the system heavily unless one is looking to freeze a machine.
    synchronous commands are spawned and waited for before the next
    command is started. a side effect of synchronous commands is that
    the time spent waiting may sum to an amount of time greater than
    the interval ('--interval' option) specified - if the amount of
    time spent running commands exceeds the interval the next inventory
    simply begins immediately with no pause. because of this one
    should think of the interval used as a minimum bound only,
    especially when synchronous commands are used.
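
one way to read "the interval is a minimum bound" is sketched below in ruby. it assumes the pause after each inventory is the interval minus the time already spent running commands, floored at zero; `pause_after_pass` is a made-up helper and the real scheduling code may differ.

```ruby
# How long to pause after one inventory pass, given the configured interval
# and the time already spent running synchronous commands (both in seconds).
def pause_after_pass(interval, elapsed)
  remaining = interval - elapsed
  remaining > 0 ? remaining : 0
end

interval = 300  # the documented default of 5 minutes
[12.5, 299, 300, 450].each do |elapsed|
  puts "commands took #{elapsed}s -> pause #{pause_after_pass(interval, elapsed)}s"
end
```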

    note that sample commands of each type are auto-generated in the
    dbdir/commands directory. reading these should answer any questions regarding
    the calling conventions of any of the four types. for other questions regard
    the sample config, which is also auto-generated.
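
to make the 'expanded' calling convention described above concrete, here is a small ruby sketch of the placeholder substitution. `expand_command` is a hypothetical helper; it does plain text replacement only, and whether dirwatch does any shell quoting of its own is not stated in this README.

```ruby
# Replace the '@file' and '@mtime' placeholders in a command template.
# Both placeholders are substituted in a single pass so that a file name
# containing the text '@mtime' is left alone.
def expand_command(template, file, mtime)
  template.gsub(/@file|@mtime/, { '@file' => file, '@mtime' => mtime })
end

puts expand_command("command '@file' '@mtime'", 'somefile', '2002-11-04 01:01:01.1234')
```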


    0) run a watch from this terminal (non daemon)

    ~ > dirwatch directory watch

    mode: list (l)

    dump the contents of the database in yaml format for easy viewing/parsing


    0) dump database as yaml

    ~ > dirwatch directory list


    for dirwatch itself:

    export SLDB_DEBUG=1 -> cause sldb lib actions (sql) to be logged
    export LOCKFILE_DEBUG=1 -> cause lockfile lib actions to be logged

    for programs run by dirwatch the following environment variables will be set:

    DIRWATCH_DIR -> the directory being watched
    DIRWATCH_ACTION -> action type, one of 'instance', 'created', 'modified',
    'updated', 'deleted', or 'existing'
    DIRWATCH_TYPE -> command type, one of 'simple', 'expanded', 'filter', or
    DIRWATCH_N_PATHS -> the total number of paths for this action. the paths
    themselves will be passed to the program in a different
    way depending on DIRWATCH_TYPE, for instance on the
    command line or on stdin, but this number will always
    be the total number of paths the program should expect.
    DIRWATCH_PATH_IDX -> for some command types, like 'simple', the program will
    be run more than once to handle all paths since calling
    convention only allows the program to be called with
    one path at a time. this number is the index of the
    current path in such cases. for instance, a 'simple'
    program may only be called with one path at a time so
    if 10 files were created in the directory that would
    result in the program being called 10 times. in each
    case DIRWATCH_N_PATHS would be 10 and DIRWATCH_PATH_IDX
    would range from 0 to 9 for each of the 10 calls to the
    program. in the case of 'filter' and 'yaml' command
    types, where every path is given at once on stdin this
    value will be equal to DIRWATCH_N_PATHS
    DIRWATCH_PATH -> for 'simple' and 'expanded' command types, which are
    called once for each path, this will contain the path
    the program is being called with. in the case of
    'filter' or 'yaml' command types the variable contains
    the string 'stdin' implying that all paths are
    available on stdin.
    DIRWATCH_MTIME -> for 'simple' and 'expanded' command types, which are
    called once for each path, this will contain the mtime
    the program is being called with. in the case of
    'filter' or 'yaml' command types the variable contains
    the string 'stdin' implying that all mtimes are
    available on stdin.
    DIRWATCH_PID -> the pid of dirwatch watcher process
    DIRWATCH_ID -> an identifier for this action that will be unique for
    any given run of a dirwatch watcher process.
    restarting the watcher resets the generator. this
    identifier is logged in the dirwatch watcher logs so it is
    useful to match program logs with dirwatch logs
    PATH -> the normal shell path. for each program run the PATH
    is modified to contain the commands dir of the dirwatch
    watcher process. normally this will be

    note that all the sample programs generated show how to access these
    environment vars.
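
as a quick sketch of what reading these variables might look like from a ruby trigger program: the `dirwatch_context` helper below is invented for illustration, and the sample values are made up rather than captured from a real run.

```ruby
# Collect some of the DIRWATCH_* variables described above into a hash.
# `env` defaults to the real environment; passing a plain hash makes the
# helper easy to try outside of dirwatch.
def dirwatch_context(env = ENV)
  {
    dir:      env['DIRWATCH_DIR'],
    action:   env['DIRWATCH_ACTION'],
    type:     env['DIRWATCH_TYPE'],
    n_paths:  env['DIRWATCH_N_PATHS'].to_i,
    path_idx: env['DIRWATCH_PATH_IDX'].to_i,
    path:     env['DIRWATCH_PATH'],
    id:       env['DIRWATCH_ID']
  }
end

# made-up values of the kind a 'simple' command might see
sample = {
  'DIRWATCH_DIR'      => './tmp',
  'DIRWATCH_ACTION'   => 'created',
  'DIRWATCH_TYPE'     => 'simple',
  'DIRWATCH_N_PATHS'  => '10',
  'DIRWATCH_PATH_IDX' => '3',
  'DIRWATCH_PATH'     => './tmp/a/n'
}
ctx = dirwatch_context(sample)
puts "#{ctx[:action]} #{ctx[:path]} (#{ctx[:path_idx] + 1} of #{ctx[:n_paths]})"
```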

    directory/.dirwatch/ -> dirwatch data files
    directory/.dirwatch/dirwatch.conf -> default configuration file
    directory/.dirwatch/commands/ -> default location for triggers
    directory/.dirwatch/db -> sldb/sqlite database
    directory/.dirwatch/dirwatch.pid -> default pidfile
    directory/.dirwatch/logs/ -> automatically rolled log files

    success -> $? == 0
    failure -> $? != 0


    1 < bugno && bugno < 42

    --help, -h
    this message
    --log=path, -l
    set log file - (default stderr)
    --verbosity=verbosity, -v
    0|fatal < 1|error < 2|warn < 3|info < 4|debug - (default info)
    valid path - specify config file (default nil)
    valid path - generate a template config file in path (default stdout)
    --recursive, -r
    recurse into subdirectories (default do not recurse)
    --all, -a
    consider all filesystem entries, including directories (default files
    --follow, -f
    follow links (default does not follow links)
    --pattern=pattern, -p
    consider only filesystem entries that match pattern (default all
    --daemon, -D
    specify daemon mode (default not daemon)
    --quiet, -Q
    be wery wery quiet (default not quiet)
    --dirwatch_dir=dirwatch_dir, -S
    specify dirwatch storage dir (default .dirwatch/ in dir being watched)
    --n_loops=n_loops, -N
    loop only this many times before exiting (default infinite)
    --interval=seconds, -I
    sleep at least this long between loops (default 300sec (5min))
    --lockfile, -L
    create a lockfile in dir while running (default no lockfile)



    the following shows how to setup a simple file processing system using
    dirwatch. it assumes a successful install of dirwatch. eg. the command

    ~> dirwatch --help

    should operate

    STEP 0

    make a temporary directory, if using sh/bash do something like

    ~ > export tmp=./tmp
    ~ > mkdir $tmp

    from here on we use the $tmp variable to refer to our directory

    STEP 1

    initialize the directory for dirwatch

    ~ > dirwatch $tmp create
    dirwatch_dir : ./tmp/.dirwatch
    db : ./tmp/.dirwatch/db
    logs_dir : ./tmp/.dirwatch/logs
    config : ./tmp/.dirwatch/dirwatch.conf
    commands_dir : ./tmp/.dirwatch/commands

    STEP 2

    create three subdirectories in $tmp, a, b, and c

    ~ > for d in a b c;do mkdir $tmp/$d;done

    STEP 3

    edit the dirwatch.conf

    ~ > vi $tmp/.dirwatch/dirwatch.conf

    change the section which reads

    updated :
      command: simple.sh
      type: simple
      pattern: ^.*$
      timing: sync

    to

    updated :
      command: yaml.rb
      type: yaml
      pattern: ^.*$
      timing: sync

    here we are telling dirwatch to run the command 'yaml.rb' (which will be
    looked for in $tmp/.dirwatch/commands and then the normal $PATH) whenever a
    file is 'updated'. updated means that a file has been created or modified. see

    ~ > dirwatch --help

    for more info

    STEP 4

    edit yaml.rb

    ~ > vi $tmp/.dirwatch/commands/yaml.rb

    we want a program that looks very close to this, you may have to adjust your
    shebang line:

    #!/usr/bin/env ruby
    require 'yaml'
    # the dir being watched
    dirwatch_dir = ENV['DIRWATCH_DIR']
    # load entries from stdin. this is a yaml document.
    entries = YAML::load STDIN
    # process each entry
    entries.each do |entry|
      # get the path and mtime of the updated file
      path, mtime = entry['path'], entry['mtime']
      # split into directory and filename components
      dirname, basename = File::split path
      # get the last directory component
      dir = File::basename dirname
      # perform actions based on dir - files contain numbers:
      # - new files in dir 'a' get doubled and the result written to dir 'b'
      # - new files in dir 'b' get two added and the result written to dir 'c'
      # - new files in dir 'c' are displayed as the result
      case dir
      when 'a'
        n = Integer(IO::read(path))
        n *= 2
        output = File::join dirwatch_dir, 'b', basename
        open(output, 'w'){|f| f.write n}
      when 'b'
        n = Integer(IO::read(path))
        n += 2
        output = File::join dirwatch_dir, 'c', basename
        open(output, 'w'){|f| f.write n}
      when 'c'
        n = Integer(IO::read(path))
        puts "result <#{ basename }> => <#{ n }>"
      end
    end

    the comments should make it obvious that this program, which dirwatch will
    spawn as new files are created or modified, loads the updated (because we
    configured it that way) file and assumes a number is contained in it. when
    the file was updated in directory $tmp/a we double the number and write the
    output into a file of the same basename in $tmp/b. here the number in $tmp/b
    has two added to it and this result is written to a file of the same basename
    in $tmp/c.
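
stripped of the file handling, the arithmetic of this little pipeline is just two steps. the ruby sketch below uses made-up method names (`stage_a`, `stage_b`) purely to show what number should come out the far end for a given input.

```ruby
# dir 'a': the number is doubled and handed to dir 'b'
def stage_a(n)
  n * 2
end

# dir 'b': two is added and the result handed to dir 'c'
def stage_b(n)
  n + 2
end

[10, 20].each do |n|
  puts "#{n} => #{stage_b(stage_a(n))}"
end
```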

    be sure you've edited $tmp/.dirwatch/commands/yaml.rb and
    $tmp/.dirwatch/dirwatch.conf before continuing.

    STEP 5

    start dirwatch. normally dirwatch runs as a daemon that checks the dir every
    five minutes, but here we will run from the console so we can see its logging
    information. note the '--recursive' flag is given so that dirwatch will
    descend into the subdirectories of $tmp. this is important! also, we use
    the '--interval' option to specify a polling interval of 5 seconds. we would
    not use such a short period for a production system but this interval is
    alright for illustration. we start a watch:

    ~ > dirwatch $tmp --interval=5 --recursive
    I, [2005-07-01T16:33:37.821687 #9146] INFO -- : ** STARTED **
    I, [2005-07-01T16:33:37.822853 #9146] INFO -- : config <./tmp/.dirwatch/dirwatch.conf>
    I, [2005-07-01T16:33:37.823136 #9146] INFO -- : recursive <true>
    I, [2005-07-01T16:33:37.823309 #9146] INFO -- : all <false>
    I, [2005-07-01T16:33:37.823423 #9146] INFO -- : follow <false>
    I, [2005-07-01T16:33:37.823549 #9146] INFO -- : pattern <>
    I, [2005-07-01T16:33:37.823680 #9146] INFO -- : n_loops <>
    I, [2005-07-01T16:33:37.823887 #9146] INFO -- : interval <00:00:05>
    I, [2005-07-01T16:33:37.824170 #9146] INFO -- : lockfile <./tmp/.dirwatch.lock>
    I, [2005-07-01T16:33:37.824335 #9146] INFO -- : tmpwatch[all] <false>
    I, [2005-07-01T16:33:37.824432 #9146] INFO -- : tmpwatch[nodirs] <false>
    I, [2005-07-01T16:33:37.824551 #9146] INFO -- : tmpwatch[force] <true>
    I, [2005-07-01T16:33:37.824745 #9146] INFO -- : tmpwatch[age] <30 days> == <2592000.0s>
    I, [2005-07-01T16:33:37.824859 #9146] INFO -- : tmpwatch[rm] <rm_rf>

    STEP 6

    now, from another terminal drop a file containing a number into $tmp/a.
    something like

    ~ > echo 10 > $tmp/a/n

    within a few seconds you'll see, in the dirwatch terminal something like

    I, [2005-07-01T16:33:47.855151 #9146] INFO -- : ACTION.UPDATED.0.0 - cmd : yaml.rb
    I, [2005-07-01T16:33:47.928216 #9146] INFO -- : ACTION.UPDATED.0.0 - exit_status : 0
    I, [2005-07-01T16:33:52.880694 #9146] INFO -- : ACTION.UPDATED.1.1 - cmd : yaml.rb
    I, [2005-07-01T16:33:52.948847 #9146] INFO -- : ACTION.UPDATED.1.1 - exit_status : 0
    I, [2005-07-01T16:33:57.856376 #9146] INFO -- : ACTION.UPDATED.2.2 - cmd : yaml.rb
    result <n> => <22>
    I, [2005-07-01T16:33:57.928320 #9146] INFO -- : ACTION.UPDATED.2.2 - exit_status : 0

    so we have produced a result of 22 by doubling 10 and adding two to it merely
    by dropping a file in a directory!

    notice that both the output and the logging are going to the terminal here.
    actually the logging goes to stderr by default and any program output/errput
    is mingled here. in actual use the logging goes into a log file in $tmp/.dirwatch/logs/
    that automatically rolls (you never need to truncate it) and any output/errput
    from the programs run is simply discarded. note that you can certainly keep
    output by using something like

    command: myprogram >> myprogram.log 2>&1

    in the dirwatch.conf file.

    STEP 7

    now, remember that we configured yaml.rb to fire for any file that was
    updated where the meaning of updated is that a file was created or modified.
    if we were to open up $tmp/a/n in vi and change the 10 to a 20 we'd soon see

    result <n> => <42>

    appear in the console running the watch.

    STEP 8

    after getting a system configured and the triggers working properly you
    definitely don't want to have to start dirwatch by hand each time. dirwatch
    will refuse to start two watches on a given directory and can be enabled to
    run as a daemon. because of this it's quite acceptable to cron a dirwatch to
    start every so often. something like

    */15 * * * * dirwatch /full/path/to/directory --daemon

    will maintain a dirwatch process at all times, even after machine reboot.
    note that this does not start a new watch each time - if the watch fails to
    start because another is already running dirwatch simply exits with 1 but
    nothing is printed to stderr so cron won't mail you tons of stuff. using this
    technique a normal user can configure a daemon process to run at all times.
    of course a feature could be started at machine boot too using a simply

    STEP 9

    we now have set up a simple processing system using dirwatch. it can be used
    to configure quite complex processing flows via the configuration file and the
    programs run - hopefully you'll find a useful way of using it yourself. if so
    please contact me at and let me know the details.


    | email :: ara [dot] t [dot] howard [at] noaa [dot] gov
    | phone :: 303.497.6469
    | My religion is very simple. My religion is kindness.
    | --Tenzin Gyatso
    Ara.T.Howard, Jul 2, 2005

  2. Ara.T.Howard

    James Britt Guest

    Ara.T.Howard wrote:
    > ===============================================================================
    > URLS
    > ===============================================================================
    > http://codeforpeople.com/lib/ruby/dirwatch/
    > http://raa.ruby-lang.org/project/dirwatch/
    > ===============================================================================
    > README (also see TUTORIAL below)
    > ===============================================================================
    > NAME
    > dirwatch v0.9.0

    Very, very nice.

    Are there any docs for it?




    http://www.ruby-doc.org - The Ruby Documentation Site
    http://www.rubyxml.com - News, Articles, and Listings for Ruby & XML
    http://www.rubystuff.com - The Ruby Store for Ruby Stuff
    http://www.jamesbritt.com - Playing with Better Toys
    James Britt, Jul 2, 2005

  3. Ara.T.Howard

    Ara.T.Howard Guest

    On Sat, 2 Jul 2005, Graham Foster wrote:

    > 02/07/2005 12:58:11
    > "Ara.T.Howard" <> wrote in message
    > http://codeforpeople.com/lib/ruby/dirwatch/
    > http://raa.ruby-lang.org/project/dirwatch/
    > To pick up an earlier thread - "will it run on Windows?" - I guess
    > the answer is "no" as the startup process seems to require shell
    > script? (i.e. Unix only). I've no clue about Unix.. how can I at
    > least start it under Windows?

    yeah... i lost (accidentally deleted) the email from you.. sorry. the
    start-up process does not require a shell script - it's all one ruby program.

    in any case there are a few things that would make it tough to run on windows.

    * posixlocking - windows doesn't support it. you can fix this by making a
    posixlock.rb file that has this in it

    class File; alias posixlock flock; end

    * running as a daemon requires a fork. you don't have to do this though.

    if you do the posixlock thing you could probably then do

    ~ > dirwatch directory/ create
    ~ > dirwatch directory/ watch

    and see what happens. if that works there is a good chance you could use it
    under windows. i don't know how to run a service under windows but that's
    what you'd want to do. try the posixlock thing, install all the depends
    (included in the tar ball) and see where you can get. there's nothing about
    it that requires windows it's just that i don't have a windows machine at home
    or anywhere at work - just thousands of linux boxes ;-)

    | email :: ara [dot] t [dot] howard [at] noaa [dot] gov
    | phone :: 303.497.6469
    | My religion is very simple. My religion is kindness.
    | --Tenzin Gyatso
    Ara.T.Howard, Jul 2, 2005
  4. Patrick Hurley, Jul 3, 2005
  5. Ara.T.Howard


    On Sun, 3 Jul 2005, Graham Foster wrote:

    > 03/07/2005 08:34:16
    > Patrick Hurley <> wrote in message
    > <>
    >> You might want to check out change notify from Win32 Util stuff at
    >> ruby forge (http://rubyforge.org/projects/win32utils/).

    > WOW - a complete treasure trove of useful functionality that I didn't
    > know existed. The ChangeNotify seems to be deprecated and
    > ChangeJournal its replacement. (I hope that ChangeJournal doesn't
    > require you to watch an entire drive).
    > These methods seem very much smaller and less complex than dirwatch,
    > as they don't appear to use a database behind them. (I assume they
    > trust the filesystem information and events?) Anyone any experience
    > with either toolset?
    > Thx
    > Graham

    dirwatch basically emulates (and adds to) this functionality, which is
    built-in to the windows file systems. the normal unix file systems do not
    provide hooks to do this sort of thing - that's why i wrote dirwatch. of
    course a big difference is that dirwatch is durable across machine reboots.


    | email :: ara [dot] t [dot] howard [at] noaa [dot] gov
    | phone :: 303.497.6469
    | My religion is very simple. My religion is kindness.
    | --Tenzin Gyatso
    , Jul 3, 2005