Chris
I need to be able to have multiple processes add and delete files in a
directory in a completely atomic way. I can't quite figure out how to do it.
I have three independent processes. One adds small files to a directory
(the "Adder"). Another merges the small files into big ones, and then
deletes the small ones (the "Merger"). A third simply reads the files in
a random way (the "Reader"). The Reader is heavily multi-threaded --
lots of reading might be going on simultaneously.
The files are read-only. There will be a maximum of a few hundred files
at any given time.
These different functions may or may not be running in the same JVM. It
is possible that multiple JVMs will be hitting the same directory,
possibly ones running on different machines all hitting a shared drive.
When a Reader thread starts to read files, the list of files must not
change until it's done. The file-reading process takes at most a second
or two. If the Merger wants to delete a file during that time, it must wait.
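To make the "Merger must wait" requirement concrete, here is a rough sketch of one way it might be done with java.nio file locks. It is not a tested design. The class name DirectoryGuard, its method names, and the idea of a dedicated lock file are all made up for illustration. It assumes one guard instance per JVM per directory, because Java file locks are held on behalf of the whole JVM and overlapping requests from the same JVM throw OverlappingFileLockException. It also assumes the shared drive honors file locks at all, which some network filesystems do not.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: one instance per JVM per directory (assumption). */
public class DirectoryGuard {
    private final Path lockFile;   // hypothetical dedicated lock file
    private int readers = 0;       // Reader threads active in this JVM
    private FileChannel channel;
    private FileLock shared;

    public DirectoryGuard(Path lockFile) {
        this.lockFile = lockFile;
    }

    /** Called by a Reader thread before it starts reading files. */
    public synchronized void beginRead() throws IOException {
        if (readers == 0) {
            // The first Reader thread in this JVM takes the shared OS-level lock.
            channel = FileChannel.open(lockFile, StandardOpenOption.CREATE,
                    StandardOpenOption.READ, StandardOpenOption.WRITE);
            shared = channel.lock(0L, Long.MAX_VALUE, true); // true = shared
        }
        readers++;
    }

    /** Called by a Reader thread when it has finished with the files. */
    public synchronized void endRead() throws IOException {
        if (--readers == 0) {
            // The last Reader thread in this JVM releases the shared lock.
            shared.release();
            channel.close();
        }
    }

    /** Called by the Merger. Returns false if any Reader is still active. */
    public synchronized boolean tryDelete(Path file) throws IOException {
        if (readers > 0) {
            return false; // Readers in this JVM are still busy
        }
        try (FileChannel ch = FileChannel.open(lockFile, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            FileLock exclusive = ch.tryLock(); // null if another process holds a lock
            if (exclusive == null) {
                return false;
            }
            try {
                Files.deleteIfExists(file);
                return true;
            } finally {
                exclusive.release();
            }
        }
    }
}
```

In this sketch the Merger would call tryDelete in a retry loop rather than block. One obvious weakness: if Reader threads overlap continuously, the shared lock is never released and the Merger could starve.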
The Adder process must be able to notify the Reader and Merger processes
that a new file has been added. The Merger must be able to notify the
Reader that the current list of files has changed, so that the next time
the Reader starts a new thread, it uses the most current list.
I'm guessing that I might be able to do all this by having a plain text
file in the directory that lists the "current" files, and just have the
Adder and Merger processes put an exclusive file lock on it whenever the
list needs to change. The Adder and Merger can create any new files with
a .tmp extension, and then rename them in a very fast operation to make
them live.
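Roughly what I have in mind for the .tmp-then-rename step and the locked list file, as a sketch only. The class name ManifestPublisher and the list file name "current.list" are placeholders I made up. It assumes the filesystem supports an atomic rename (Files.move with ATOMIC_MOVE throws AtomicMoveNotSupportedException where it does not) and that file locks work on the shared drive, which I have not verified.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class ManifestPublisher {
    // Hypothetical name for the plain text file listing the "current" files.
    static final String MANIFEST = "current.list";

    /** Writes data under a .tmp name, renames it live, then records it in the list. */
    public static void publish(Path dir, String name, byte[] data) throws IOException {
        Path tmp = dir.resolve(name + ".tmp");
        Path live = dir.resolve(name);

        // 1. Write the full contents under the temporary name.
        Files.write(tmp, data);

        // 2. Rename in one step so no one ever sees a half-written live file.
        Files.move(tmp, live, StandardCopyOption.ATOMIC_MOVE);

        // 3. Append the name to the list file under an exclusive lock.
        try (FileChannel ch = FileChannel.open(dir.resolve(MANIFEST),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = ch.lock()) {
            ch.position(ch.size());
            ByteBuffer buf = ByteBuffer.wrap((name + "\n").getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // flush to disk before the lock is released
        }
    }
}
```

Note the gap between steps 2 and 3: a crash there leaves a live file that is not in the list, which is exactly the kind of case I am unsure how to handle.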
I haven't figured out how to handle it, though, if the system crashes
while the Merger is renaming or deleting files, or how to prevent files
from being deleted while the Reader is using them (how do we know when
the various Reader threads have finished with a file?). I'm hoping that
I won't need to implement some kind of transaction log with commit/rollback.
Any thoughts appreciated.