Chris Berg said:
I am writing an application that maintains a lot of binary files. Each
file has to be accessed randomly, that is, the file pointer slides
back and forth to get data, and occasionally data is appended to the
end of the file or data is overwritten in the middle. There can be
several hundred files open at the same time, accessed by concurrent
threads. (Q: "What on earth is he up to now?" A: Well, it is a sort of
server application, but it is too complicated to go into detail. Bear
with me).
Now, I wonder if I can replace all the data files with file entries in
a database. I am not very experienced in programming databases, and I
am certainly a novice in evaluating in advance what the performance
implications may be.
Yes, you could put your files into databases, although the exact technique
may vary depending on the size of the files. If they are small enough, they
might be stored as normal values within the table; if they are very large,
you might store the file name in the database and provide a reference to the
file so that it can be easily obtained.
But here goes:
1) Can I make random access to the data in a database file the same
way as in a disk file?
You can access each value directly but the coding would be quite different
from the code you use to access a random access file. Not necessarily harder
but definitely different. You normally use the SQL language via either ODBC
or JDBC to access the data that you want.
2) - or do I have to read the whole file entry into ram before
accessing it?
Typically, the entire file is stored in a single specific row and column of
a table and you would read the whole file in one gulp. However, if the data
is text and you store it in a CLOB (Character Large OBject) datatype, you
can use "locators" to find and remove specific sections of the file. For
instance, if the file was a resume and each resume followed a consistent
pattern, you could search for just the Education information on the resume
and return only that to your program.
3) Can I do it concurrently with several hundred files?
I'm not sure what you mean by that. Do you mean can you store several
hundred files in the database at the same time? If so, no problem: you can
store MILLIONS of files in a database. If you are asking whether hundreds of
programs/users can be connected to the database to look for data at the same
time, the answer is also yes. That's what the big corporations do; think
about all the people using ATMs for a specific bank at any given moment:
they are probably all accessing the same copy of the same database.
Now, depending on how many concurrent users you have and the resources they
are going after, you could get performance bottlenecks but most should be
tunable: if you throw enough extra disk space or memory at the server
running the database, it should be able to handle almost any load.
4) Is MySQL a good choice?
MySQL is a pretty decent relational database and seems to be professional
quality. (I've been working with databases professionally for 20+ years.) I
know DB2 better and think it is perhaps better than MySQL but MySQL is
catching up fairly quickly in terms of functionality.
5) I reckon it will be somewhat slower, but how much?
Don't count on it being slower! Remember those ATMs! Look at how quickly you
can get cash from your account, even though the computer may be hundreds of
kilometers from where you are and the database may have millions of rows in
it and hundreds of concurrent users.
Quantifying any performance improvement or degradation is virtually
impossible without a lot more information on the specifics of your hardware,
software, data requirements, user load and database design. Remember,
performance on professional quality databases like MySQL and DB2 can be
tuned considerably so even if your initial performance is not great, you can
probably improve it a lot by careful tuning.
Now, before I leave you with the impression that entering the world of
databases is all peaches and cream, be warned. Databases are really neat and
very useful but if you've never used them before, expect some learning curve
as you learn how to use them and used them effectively. Maybe a LOT of
learning curve if you have no ideas about the concepts involved, let alone
any previous experience with them. Prepare to invest time and/or money in
learning about databases or hiring people to do it for you. There is also,
of course, the expense of acquiring a licence for your database, the
database software itself, the computer it will run on, and the extra
hardware that computer may need (more disks and memory are a possibility).
Also, you will need to set up communications to enable your programs to talk
to your database which could be trivial or a real can of worms, depending on
what you're doing. That doesn't mean it will necessarily be expensive to
implement a database; one of my friends downloaded MySQL and installed it on
a Linux server that is housed on an old PC he had hanging around; we use
TCP/IP to communicate between the programs running on my computer and the
database on his server so we didn't need any additional network.
Any answer is appreciated.
I hope this helps you. Feel free to ask followup questions.
Rhino