Ruby fs watcher?

  • Thread starter Jean-Etienne Durand

Jean-Etienne Durand

Hi,

I wrote a script that processes files in a directory: each time a new
file appears in a given directory, I do something with it. So my problem
is knowing when a new file has arrived in the directory.
What I do now is try to open the file in exclusive mode and skip the
file if that fails, but polling just kills the CPU.

I am wondering if somebody could explain an elegant notification
solution to me?

Thank you,
Jean-Etienne
 
Wilson Bilkovich

This is fairly operating-system specific. Which one are you using?
 
Tim Pease

Take a look at Ara's dirwatch solution. Does exactly what you want.

http://raa.ruby-lang.org/project/dirwatch/

Blessings,
TwP
 
Thomas Adam

If you're using Linux, you can use dnotify (and the newer 'inotify'
where applicable). I have no idea how Windows would handle such a
thing, if at all.

-- Thomas Adam
 
Sam Smoot

Thomas said:
If you're using Linux, you can use dnotify (and the newer 'inotify'
where applicable). I have no idea how Windows would handle such a
thing, if at all.

-- Thomas Adam

You'd use WMI in Windows, potentially through the WIN32OLE library, to
monitor for file events. The FileSystemWatcher class in .NET exposes
similar functionality.
 
ara.t.howard

http://codeforpeople.com/lib/ruby/dirwatch/
http://codeforpeople.com/lib/ruby/dirwatch/dirwatch-0.9.0/README

-a
 
ara.t.howard

Paul said:
This problem is easily solved, and in a portable way. You create a list of
the files and their modification times, then sleep for some interval, then
wake up and compare the stored modification times with the new ones, also
test for any new files. Take action on any new or modified files. Maybe 25
lines of Ruby code.

you'd think - until your script stops, restarts, and you re-fire actions for
all previous actions. if your action happens to have been something like

system "something_which_should_only_happen_for_new_files #{ file }"

you're screwed. that approach is simply not that much more durable than
cron'ing a script to process every file every minute since, logically, the
system can degrade to that.

i think a transactional db is an absolute requirement of such a system.
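
A minimal sketch of the idea, not dirwatch's actual implementation: the set of already-processed files is kept on disk so that a restart does not re-fire actions. The names `ProcessedLedger` and `process_new` are hypothetical, and a JSON file replaced via rename stands in for a real transactional database.

```ruby
require 'json'
require 'set'

# Hypothetical helper: remembers which files were already handled and
# persists that list to disk, so a restart does not re-fire actions.
class ProcessedLedger
  def initialize(path)
    @path = path
    @seen = File.exist?(path) ? JSON.parse(File.read(path)).to_set : Set.new
  end

  def seen?(file)
    @seen.include?(file)
  end

  # Record the file, write the ledger to a temp file, then rename it over
  # the old ledger so a crash mid-write never leaves a half-written ledger.
  def mark(file)
    @seen << file
    tmp = "#{@path}.tmp"
    File.write(tmp, JSON.generate(@seen.to_a))
    File.rename(tmp, @path)
  end
end

# Runs the block once per file that is not yet in the ledger and returns
# the files that were processed in this call.
def process_new(files, ledger)
  files.reject { |f| ledger.seen?(f) }.each do |f|
    yield f
    ledger.mark(f) # marked only after the action returned
  end
end
```

Because a file is marked only after its action returns, a crash between the two steps re-fires that one action on restart. Ruling that out as well is where a real transactional store earns its keep.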

another absolute must for such a system is the ability to deal with
batches of updated files. the reason is that this:

while(true)
  get_new_files
  process_new_files
end

is terrifically flawed if 100,000 new files arrive at once - since it requires
you to spawn 100,000 new processes. ideally the processing can be batched in
chunks. dirwatch allows this by providing a config option to pass all
files to be processed to the script on stdin.
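
As a rough illustration of batching (a sketch only, not how dirwatch itself is configured or implemented), new files can be handed to one child process per chunk, with the file names written to its stdin. The helper name `process_in_batches` and the default batch size are assumptions made for the example.

```ruby
require 'open3'

# Hypothetical helper: feeds file names to `command` in chunks, one name
# per line on the command's stdin, and returns the stdout of each run.
# `command` is an argv-style array, e.g. ["my_script", "--some-flag"].
def process_in_batches(files, command, batch_size = 1000)
  files.each_slice(batch_size).map do |batch|
    out, status = Open3.capture2(*command, stdin_data: batch.join("\n") + "\n")
    raise "#{command.first} failed on a batch" unless status.success?
    out
  end
end
```

With this shape, 100,000 new files and a batch size of 1,000 cost 100 child processes rather than 100,000.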

Paul said:
Be sure to sleep for an interval between tests, otherwise your script will
hog the processor.

also a flaw. if you simply sleep, say 200s, between loops you waste time when
the actions you just took required more than that time. basically you want to
ensure at least n seconds elapses between scans of the directory, but if the
system is very busy you will not need to sleep since simply processing may
require this amount of time.
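
One way to express that rule, as a sketch with hypothetical helper names (`remaining_sleep`, `scan_every`): time each pass with a monotonic clock and sleep only for whatever is left of the minimum interval.

```ruby
# Hypothetical helper: seconds still to wait so that at least
# `min_interval` seconds separate the starts of two scans.
def remaining_sleep(min_interval, elapsed)
  [min_interval - elapsed, 0].max
end

# Runs the block `rounds` times, starting each run no sooner than
# `min_interval` seconds after the previous one started.
def scan_every(min_interval, rounds)
  rounds.times do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    pause = remaining_sleep(min_interval, elapsed)
    sleep(pause) if pause > 0 # a busy pass skips the sleep entirely
  end
end
```

For a 200s interval, a pass that took 50s is followed by a 150s sleep, while a pass that took 350s is followed by the next scan immediately.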

in summary, having written three or four such systems and ultimately arriving
at the code for dirwatch, which we've used in production for 24x7 satellite
ingest systems for several years, i can assure you the task isn't quite that
trivial.

regards.

-a
 
Robert Klemme

Paul said:
This problem is easily solved, and in a portable way. You create a list of
the files and their modification times, then sleep for some interval, then
wake up and compare the stored modification times with the new ones, also
test for any new files. Take action on any new or modified files. Maybe 25
lines of Ruby code.

Be sure to sleep for an interval between tests, otherwise your script will
hog the processor.

You rather need the mtime of the directory and only that - at least if
you are interested in /new/ files only:

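require 'set'

last = nil     # directory mtime seen at the previous scan
set = Set.new  # file names seen at the previous scan

loop do
  current = File.mtime "."
  # Rescan only when the directory itself changed (entry added or removed).
  if last.nil? || last < current
    s = Dir["*"].to_set
    p s - set  # names that are new since the previous scan
    set = s
    last = current
  end
  sleep 1
end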


Test run:

$ !ru
ruby /cygdrive/c/Temp/watch.rb &
[1] 1432

robert@fussel ~
$ #<Set: {"xx", "x", "bin", "a.1234"}>
touch foo

robert@fussel ~
$ #<Set: {"foo"}>
touch bar

robert@fussel ~
$ #<Set: {"bar"}>
touch bar

robert@fussel ~
$ touch baz

robert@fussel ~
$ #<Set: {"baz"}>

Kind regards

robert
 
ara.t.howard

(e-mail address removed) wrote:

/ ...


Yes, but the OP wants to know how to do it, not produce a mature, robust
version. He may want to learn the programming aspects on his own, make his
own mistakes. From the content of his post, he didn't bother to yield any
time during execution, therefore his script ate up the CPU and consequently
failed.

At that level, a simple solution really is simple.

i guess you're right. i get defensive at the mere suggestion of simple event-
or cron-based processing systems without mutual exclusion - i've debugged way
too many of them!

cheers.

-a
 
