Ruby fs watcher?

Jean-Etienne Durand · Oct 26, 2006

Hi,

I wrote a script that processes files in a directory: each time a new file
appears in a given directory, I do something with it. So my problem is
knowing when a new file has arrived in the directory.
What I do now is try to open the file in exclusive mode and skip the file
if that fails, but polling just kills the CPU.
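The exclusive-open check described above might look roughly like the Ruby sketch below. It assumes that "exclusive mode" means a non-blocking advisory flock, and with_exclusive_lock is a hypothetical helper name. Because flock is advisory, this only detects writers that also take the lock.

```ruby
# Hypothetical helper: run the block only if an exclusive, non-blocking
# advisory lock can be taken on +path+. Returns true if the block ran,
# false if another open file description currently holds a conflicting lock.
def with_exclusive_lock(path)
  File.open(path, "r") do |f|
    # LOCK_NB makes flock return false instead of waiting for the lock.
    return false unless f.flock(File::LOCK_EX | File::LOCK_NB)

    begin
      yield f
    ensure
      f.flock(File::LOCK_UN)
    end
    true
  end
end
```

Run in a tight loop with no pause between attempts, even a cheap check like this keeps a CPU core busy.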

I am wondering if somebody could explain an elegant notification-based
solution to me?

Thank you,
Jean-Etienne

Wilson Bilkovich · Oct 26, 2006

This is fairly operating-system specific. Which one are you using?

Tim Pease · Oct 26, 2006

Take a look at Ara's dirwatch solution. It does exactly what you want.

http://raa.ruby-lang.org/project/dirwatch/

Blessings,
TwP

Thomas Adam · Oct 26, 2006

If you're using Linux, you can use dnotify (and the newer 'inotify'
where available). I have no idea how Windows would handle such a
thing, if at all.

-- Thomas Adam

Sam Smoot · Oct 26, 2006

Thomas said:
If you're using Linux, you can use dnotify (and the newer 'inotify'
where available). I have no idea how Windows would handle such a
thing, if at all.

-- Thomas Adam

You'd use WMI in Windows, potentially through the WIN32OLE library, to
monitor for file events. The FileSystemWatcher class in .NET is just a
wrapper around this functionality.

ara.t.howard · Oct 26, 2006

http://codeforpeople.com/lib/ruby/dirwatch/
http://codeforpeople.com/lib/ruby/dirwatch/dirwatch-0.9.0/README

-a

ara.t.howard · Oct 26, 2006

> This problem is easily solved, and in a portable way. You create a list of
> the files and their modification times, then sleep for some interval, then
> wake up and compare the stored modification times with the new ones, also
> test for any new files. Take action on any new or modified files. Maybe 25
> lines of Ruby code.

you'd think so, until your script stops, restarts, and re-fires actions for
all previously processed files. if your action happens to have been something like

system "something_which_should_only_happen_for_new_files #{ file }"

you're screwed. that approach is simply not much more durable than
cron'ing a script to process every file every minute since, logically, the
system can degrade to that.
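For comparison, the mtime-comparison approach suggested earlier might be sketched as below; snapshot, changed_paths and watch are hypothetical names. Because the snapshot lives only in memory, a restart loses it: starting from an empty snapshot re-fires every existing file, while starting from a fresh snapshot silently skips files that arrived during the downtime.

```ruby
# Hypothetical helpers for the in-memory approach: remember each file's
# mtime and report files that are new or whose mtime changed.
def snapshot(dir)
  Dir.children(dir).each_with_object({}) do |name, acc|
    path = File.join(dir, name)
    begin
      acc[path] = File.mtime(path) if File.file?(path)
    rescue Errno::ENOENT
      # the file vanished between listing and stat; ignore it
    end
  end
end

# Paths that are absent from the old snapshot or have a different mtime.
def changed_paths(old_snap, new_snap)
  new_snap.select { |path, mtime| old_snap[path] != mtime }.keys.sort
end

# Naive polling loop: state is lost whenever the process restarts.
def watch(dir, interval: 1)
  previous = snapshot(dir)
  loop do
    sleep interval
    current = snapshot(dir)
    changed_paths(previous, current).each { |path| yield path }
    previous = current
  end
end
```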

i think a transactional db is an absolute requirement of such a system.
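As one possible illustration of that requirement, the PStore library that ships with Ruby offers transactional, file-backed storage. The sketch below uses it as a stand-in for a real transactional database; ProcessedLog and process_once are hypothetical names, and dirwatch's own storage may work quite differently. A file is recorded as done only after its action succeeds, so a restarted watcher skips it instead of re-firing.

```ruby
require 'pstore'

# Hypothetical durable record of processed files, using PStore as a
# stand-in for a transactional database.
class ProcessedLog
  def initialize(path)
    @store = PStore.new(path)
  end

  def processed?(file)
    @store.transaction(true) { @store.fetch(file, false) }
  end

  # Runs the block at most once per file. The file is recorded only if the
  # block returns normally; if it raises, the transaction is abandoned and
  # nothing is saved. Returns true if the block ran, false if skipped.
  def process_once(file)
    @store.transaction do
      next false if @store.fetch(file, false)

      yield file
      @store[file] = true
    end
  end
end
```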

another absolute must for such a system is the ability to deal with
batches of updated files. the reason is that this:

while true
  get_new_files
  process_new_files
end

is terrifically flawed if 100,000 new files arrive at once, since it requires
you to spawn 100,000 new processes. ideally the processing can be batched in
chunks. dirwatch allows this via a config option that passes all the
files to be processed to the script on stdin.
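The batching idea might be sketched as follows: instead of one process per file, each chunk of file names is written, one per line, to the stdin of a single command. run_in_batches and its default chunk size are hypothetical, and this is not how dirwatch itself is configured. Only ceil(files.size / batch_size) processes are spawned.

```ruby
# Hypothetical helper: hand new files to an external command in chunks,
# writing the names to the command's stdin, one per line. The command's
# stdout is inherited from this process. Returns one success flag per batch.
def run_in_batches(files, command, batch_size: 1000)
  files.each_slice(batch_size).map do |batch|
    IO.popen(command, "w") do |io|
      io.puts(batch) # IO#puts writes each array element on its own line
    rescue Errno::EPIPE
      # the command exited before reading all of its input
    end
    $?.success? # popen sets $? once the block ends and the pipe is closed
  end
end
```

For example, run_in_batches(new_files, ["wc", "-l"], batch_size: 500) would start one wc per 500 names rather than one process per file.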

> Be sure to sleep for an interval between tests, otherwise your script will
> hog the processor.

also a flaw. if you simply sleep, say, 200s between loops, you waste time when
the actions you just took already required more than that. basically you want to
ensure at least n seconds elapse between scans of the directory, but if the
system is very busy you will not need to sleep at all, since the processing
alone may take that long.
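That pacing rule (scans start at least n seconds apart, with no extra sleep when processing already took that long) might be sketched like this. paced_loop and remaining_sleep are hypothetical names; a monotonic clock is used so wall-clock adjustments do not distort the interval.

```ruby
# Seconds still to wait so that scans start at least +interval+ seconds apart.
def remaining_sleep(interval, elapsed)
  [interval - elapsed, 0].max
end

# Hypothetical scan loop: runs the block (scan + process) repeatedly and
# sleeps only for whatever part of +interval+ the block did not use up.
# +max_runs+ exists only so the loop can be stopped, e.g. in tests.
def paced_loop(interval, max_runs: nil)
  runs = 0
  loop do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    runs += 1
    break if max_runs && runs >= max_runs

    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    sleep remaining_sleep(interval, elapsed)
  end
end
```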

in summary, having written three or four such systems and ultimately arriving
at the code for dirwatch, which we've used in production for 24x7 satellite
ingest systems for several years, i can assure you the task isn't quite that
trivial.

regards.

-a

Robert Klemme · Oct 26, 2006

Paul said:
This problem is easily solved, and in a portable way. You create a list of
the files and their modification times, then sleep for some interval, then
wake up and compare the stored modification times with the new ones, also
test for any new files. Take action on any new or modified files. Maybe 25
lines of Ruby code.

Be sure to sleep for an interval between tests, otherwise your script will
hog the processor.

Rather, you need the mtime of the directory and only that, at least if
you are interested in /new/ files only:

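require 'set'

last = nil      # directory mtime at the previous listing
set = Set.new   # file names seen at the previous listing

loop do
  # A directory's mtime changes when entries are added, removed or renamed,
  # so re-list it only when that has happened.
  current = File.mtime "."
  if last.nil? || last < current
    s = Dir["*"].to_set
    p s - set   # names that appeared since the last listing
    set = s
    last = current
  end
  sleep 1
end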

Test run:

$ !ru
ruby /cygdrive/c/Temp/watch.rb &
[1] 1432

robert@fussel ~
$ #<Set: {"xx", "x", "bin", "a.1234"}>
touch foo

robert@fussel ~
$ #<Set: {"foo"}>
touch bar

robert@fussel ~
$ #<Set: {"bar"}>
touch bar

robert@fussel ~
$ touch baz

robert@fussel ~
$ #<Set: {"baz"}>

Kind regards

robert

ara.t.howard · Oct 26, 2006

(e-mail address removed) wrote:

/ ...

Yes, but the OP wants to know how to do it, not produce a mature, robust
version. He may want to learn the programming aspects on his own, make his
own mistakes. From the content of his post, he didn't bother to yield any
time during execution, therefore his script ate up the CPU and consequently
failed.

At that level, a simple solution really is simple.

i guess you're right. i get defensive at the mere suggestion of simple event-
or cron-based processing systems without mutual exclusion; i've debugged way
too many of them!

cheers.

-a
