File Transfer and Multiple Disk Partition Support

jeffotn

I have a scenario I would like to discuss.

I have a set of servers that receive a constant stream of files via FTP,
and a subset of those files needs to be transferred to a central file
storage system. The files are also processed internally on those servers
into their own local file storage for review. A Java batch job runs to
determine which files meet the criteria to be sent to the central storage.

What is the best way to transfer all of these files over the network to
the central file storage server, control directory size, and support
multiple disk partitions? I currently use a network share to move the
files over to the central share rather than FTP. I have clocked around
530 files per minute transferred to date.

There is no problem when the central server processes them and makes a
file visible online; however, as the base directory that the regional
servers transfer files into on the central file server grows, I expect
the processing engine to slow down.

Also, how should multiple disk partitions be handled? Simply via a NAS
implementation, or should this be considered in the code?

Expected volume of files daily is around 48,000 with about 30-50% of
those files being sent to the central file storage server. Currently a
directory is created for each day of processing and processing occurs
every hour, so daily you are looking at around 24 new directories.

I don't want to reinvent the wheel if something exists that I could
purchase or leverage via open source, etc. I did some digging and
understand that network latency will be an issue, that it is recommended
to keep to about 10K files per directory, and that you should use
numerous subdirectories to keep the listing at the parent directory
level small.
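One way to stay under that per-directory limit without any third-party tool is to bucket files by day, hour, and a running index. This is only a sketch of such a throttle (the class and method names here are illustrative, not an existing utility):

```java
/**
 * Sketch of a per-directory throttle: files are bucketed under
 * day/hour/bucket-NNN subdirectories so no single directory ever
 * holds more than MAX_PER_DIR entries.
 */
public class DirectoryThrottle {
    // Keep each directory at or under the ~10K-files guideline.
    static final int MAX_PER_DIR = 10000;

    // Build a relative path like "2005-06-01/13/bucket-002" for the
    // n-th file processed in a given hour (dates are illustrative).
    static String bucketPath(String day, int hour, long fileIndex) {
        long bucket = fileIndex / MAX_PER_DIR;
        return String.format("%s/%02d/bucket-%03d", day, hour, bucket);
    }

    public static void main(String[] args) {
        System.out.println(bucketPath("2005-06-01", 13, 0));
        System.out.println(bucketPath("2005-06-01", 13, 25000));
    }
}
```

With roughly 48,000 files a day this yields a handful of buckets per hour, and the parent directory listing stays small because each day adds only one top-level entry.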
 
Chris


I asked about transferring files earlier today, and got suggestions to
use FTP, SCP, RSync, etc. Not sure this is the best solution.

I *will* say, though, that maintaining that many separate files in a
directory structure will lead to madness. If you send 50% of 48,000
files every day, you're talking 6 million files a year. (Assuming files
sent on business days only). Have you ever tried to copy or back up a
million files? It's hellishly slow. File systems just aren't designed
for this.

If it's possible, I would zip the files together and send one a day. Or
even a hundred a day. Your life will be much easier. java.util.zip works
great for this, and if you need individual, random access to the files,
it's not hard to write code that will extract the file of interest.

Maybe rsync is the way to go. Put your zipped files in a local directory
structure, and then just replicate them out.
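The java.util.zip approach above might look something like this (a minimal sketch; file names and contents are made up for illustration). Note this applies to plain zips only, not password-protected ones:

```java
import java.io.*;
import java.util.zip.*;

// Bundle many small files into one archive with java.util.zip, then
// pull a single entry back out on demand without extracting the rest.
public class ZipBundle {

    // Write each {name, content} pair as an entry in one zip file.
    static void bundle(File zip, String[][] entries) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zip))) {
            for (String[] e : entries) {
                out.putNextEntry(new ZipEntry(e[0]));
                out.write(e[1].getBytes("UTF-8"));
                out.closeEntry();
            }
        }
    }

    // Random access: read just the file of interest from the archive.
    static String extract(File zip, String name) throws IOException {
        try (ZipFile zf = new ZipFile(zip)) {
            ZipEntry entry = zf.getEntry(name);
            if (entry == null) return null;
            StringBuilder sb = new StringBuilder();
            try (Reader r = new InputStreamReader(zf.getInputStream(entry), "UTF-8")) {
                int c;
                while ((c = r.read()) != -1) sb.append((char) c);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        File zip = File.createTempFile("bundle", ".zip");
        bundle(zip, new String[][] {{"a.txt", "alpha"}, {"b.txt", "beta"}});
        System.out.println(extract(zip, "b.txt")); // prints "beta"
        zip.delete();
    }
}
```

Bundling thousands of small files into a few archives also turns the transfer into a handful of large sequential writes, which network shares handle far better than thousands of small ones.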
 
jeffotn

Thanks Chris,

Unfortunately RSync is out of the question. The expected load per day is
about 8,000 zip files, each containing about 6-8 files. The file names
are cryptic, and based on the file name I weed them out and send about
50%-75% (the goal over time) to the central processing server as good
files we want.

The central file server then unzips the files using Info-ZIP (open
source; the zips are password protected, which the java.util.zip package
does not support). Don't ask; I am just one piece of the large puzzle.
Since Java does not have a move utility, I have to do buffered file I/O.
When the network is good we do about 136 files per minute, and the
process runs every 2 minutes (it picks up the zip files that were FTPed,
weeds the good from the bad, and then sends them to the central server).

The FTPs are 24x7. I know that the files will continue to grow, but I
want to make them manageable in a directory/subdirectory convention, and
I did not know if there was an open-source utility that did this.
Otherwise I have to program my own algorithm with a throttle per
directory.
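For what it's worth, the buffered-I/O "move" described above can be made cheaper when source and target happen to be on the same filesystem: try File.renameTo first (an instant rename in that case) and only fall back to copy-then-delete across volumes. A rough sketch, assuming pre-NIO Java (on Java 7+, java.nio.file.Files.move does this fallback for you):

```java
import java.io.*;

// Move a file: cheap rename when possible, buffered copy-then-delete
// when source and target sit on different volumes or network shares.
public class FileMover {

    static void move(File src, File dst) throws IOException {
        if (src.renameTo(dst)) return; // same filesystem: instant rename
        // Cross-filesystem (e.g. onto the network share): buffered copy.
        try (InputStream in = new BufferedInputStream(new FileInputStream(src));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(dst))) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
        }
        if (!src.delete()) throw new IOException("copied but could not delete " + src);
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("move-src", ".zip");
        try (FileWriter w = new FileWriter(src)) { w.write("payload"); }
        File dst = new File(src.getParentFile(), "move-dst.zip");
        move(src, dst);
        System.out.println(dst.exists() && !src.exists()); // prints "true"
        dst.delete();
    }
}
```

A larger copy buffer (64K here rather than the common 4K) also tends to help throughput over a network share.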
 
