Random Access Files in databases

C

Chris Berg

I am writing an application that maintains a lot of binary files. Each
file has to be accessed randomly, that is, the file pointer slides
back and forth to get data, and occasionally data is appended to the
end of the file or data is overwritten in the middle. There can be
several hundred files open at the same time, accessed by concurrent
threads. (Q: "What on earth is he up to now?" A: Well, it is a sort of
server application, but it is too complicated to go into detail. Bear
with me).

Now, I wonder if I can replace all the data files with file entries in
a database. I am not very experienced in programming databases, and I
am certainly a novice in evaluating in advance what the performance
implications may be.

But here goes:

1) Can I make random access to the data in a database file the same
way as in a disk file?

2) - or do I have to read the whole file entry into ram before
accessing it?

3) Can I do it concurrently with several hundred files?

4) Is MySQL a good choice?

5) I reckon it will be somewhat slower, but how much?

Any answer is appreciated.

Chris
 
C

Chris Smith

Chris Berg said:
1) Can I make random access to the data in a database file the same
way as in a disk file?

Not the same way, but you can do it. See the java.sql.Blob class for
details.
2) - or do I have to read the whole file entry into ram before
accessing it?
No.

3) Can I do it concurrently with several hundred files?

Sure, as long as your database is up to the task.
4) Is MySQL a good choice?

No, MySQL is never a good choice, for two reasons:

1. They abuse the GPL and thus contribute to creating false fears among
companies using GPLed software. They try to push the viral nature of
the GPL to extremes by redefining common words like "distribute", to the
point that they claim you can violate their copyright without ever
having heard of their product.

2. They don't care about data integrity. A simple typo can cause the
database to forget about all the reliability techniques that are common
to all other major database products; things like transactions, foreign
keys, etc. MySQL tries to look good by avoiding ever giving you warning
or error messages, even when it's almost certain that something is
wrong.

Depending on your budget and goals, there are a number of choices. On
the open-source side, PostgreSQL is just as easy to install and use as
MySQL, and solves both of the two problems above.
5) I reckon it will be somewhat slower, but how much?

Depends heavily on the choice of database server.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
O

Oliver Wong

Chris Berg said:
I am writing an application that maintains a lot of binary files. Each
file has to be accessed randomly, that is, the file pointer slides
back and forth to get data, and occasionally data is appended to the
end of the file or data is overwritten in the middle. There can be
several hundred files open at the same time, accessed by concurrent
threads. (Q: "What on earth is he up to now?" A: Well, it is a sort of
server application, but it is too complicated to go into detail. Bear
with me).

Now, I wonder if I can replace all the data files with file entries in
a database. I am not very experienced in programming databases, and I
am certainly a novice in evaluating in advance what the performance
implications may be.

I am not a Database design guru either, but it might be wise to break
the files into individual records and store each record as a row in a table,
rather than making the whole file be a row in a table.

In other words, are you really moving around the file on a byte by byte
basis, or is it more like you want to get a String, so you move to a
location and read in in the string, and then you want to get an integer, so
you move to a different location and read in the integer, etc.?

Aren't there natural boundaries that you can break up the files into?

Anyway, here's the answers to your question (to the best of my
knowledge) but I think you might be approaching this the wrong way (though
it's hard to say without knowing more about what the application does).
1) Can I make random access to the data in a database file the same
way as in a disk file?

I'm pretty sure the SQL language does have ways to seek around in BLOB
data values. They might be treating the BLOBs like strings though, and using
"substring-like syntax" to do this seeking.
2) - or do I have to read the whole file entry into ram before
accessing it?

I think some underlying SQL implementations may do this as part of the
query described above.
3) Can I do it concurrently with several hundred files?

Quite probably, some servers have limits on number of connections that
can be open at a time.
4) Is MySQL a good choice?

The documentation is relatively good and the community is decent, though
some SQL purist are quick to point out that MySQL doesn't implement the SQL
standard properly.
5) I reckon it will be somewhat slower, but how much?

You'll have to benchmark it yourself. Anything you read here will
probably be wild guesses, especially since we have no idea what your program
does.

- Oliver
 
R

Rhino

Chris Berg said:
I am writing an application that maintains a lot of binary files. Each
file has to be accessed randomly, that is, the file pointer slides
back and forth to get data, and occasionally data is appended to the
end of the file or data is overwritten in the middle. There can be
several hundred files open at the same time, accessed by concurrent
threads. (Q: "What on earth is he up to now?" A: Well, it is a sort of
server application, but it is too complicated to go into detail. Bear
with me).

Now, I wonder if I can replace all the data files with file entries in
a database. I am not very experienced in programming databases, and I
am certainly a novice in evaluating in advance what the performance
implications may be.
Yes, you could put your files into databases, although the exact technique
may vary depending on the size of the files. If they are small enough, they
might be stored as normal values within the table; if they are very large,
you might store the file name in the database and provide a reference to the
file so that it can be easily obtained.
But here goes:

1) Can I make random access to the data in a database file the same
way as in a disk file?
You can access each value directly but the coding would be quite different
from the code you use to access a random access file. Not necessarily harder
but definitely different. You normally use the SQL language via either ODBC
or JDBC to access the data that you want.
2) - or do I have to read the whole file entry into ram before
accessing it?
Typically, the entire file is stored in a single specific row and column of
a table and you would read the whole file in one gulp. However, if the data
is text and you store it in a CLOB (Character Large OBject) datatype, you
can use "locators" to find and remove specific sections of the file. For
instance, if the file was a resume and each resume followed a consistent
pattern, you could search for just the Education information on the resume
and return only that to your program.
3) Can I do it concurrently with several hundred files?
I'm not sure what you mean by that. Do you mean can you store several
hundred files in the database at the same time? If so, no problem: you can
store MILLIONS of files in a database. If you are asking whether hundreds of
programs/users can be connected to the database to look for data at the same
time, the answer is also yes. That's what the big corporations do; think
about all the people using ATMs for a specific bank at any given moment:
they are probably all accessing the same copy of the same database.

Now, depending on how many concurrent users you have and the resources they
are going after, you could get performance bottlenecks but most should be
tunable: if you throw enough extra disk space or memory at the server
running the database, it should be able to handle almost any load.
4) Is MySQL a good choice?
MySQL is a pretty decent relational database and seems to be professional
quality. (I've been working with databases professionally for 20+ years.) I
know DB2 better and think it is perhaps better than MySQL but MySQL is
catching up fairly quickly in terms of functionality.
5) I reckon it will be somewhat slower, but how much?
Don't count on it being slower! Remember those ATMs! Look at how quickly you
can get cash from your account, even though the computer may be hundreds of
kilometers from where you are and the database may have millions of rows in
it and hundreds of concurrent users.

Quantifying any performance improvement or degradation is virtually
impossible without a lot more information on the specifics of your hardware,
software, data requirements, user load and database design. Remember,
performance on professional quality databases like MySQL and DB2 can be
tuned considerably so even if your initial performance is not great, you can
probably improve it a lot by careful tuning.

Now, before I leave you with the impression that entering the world of
databases is all peaches and cream, be warned. Databases are really neat and
very useful but if you've never used them before, expect some learning curve
as you learn how to use them and used them effectively. Maybe a LOT of
learning curve if you have no ideas about the concepts involved, let alone
any previous experience with them. Prepare to invest time and/or money in
learning about databases or hiring people to do it for you. There is also,
of course, the expense of acquiring a licence for your database, the
database software itself, the computer it will run on, and the extra
hardware that computer may need (more disks and memory are a possibility).
Also, you will need to set up communications to enable your programs to talk
to your database which could be trivial or a real can of worms, depending on
what you're doing. That doesn't mean it will necessarily be expensive to
implement a database; one of my friends downloaded MySQL and installed it on
a Linux server that is housed on an old PC he had hanging around; we use
TCP/IP to communicate between the programs running on my computer and the
database on his server so we didn't need any additional network.
Any answer is appreciated.
I hope this helps you. Feel free to ask followup questions.

Rhino
 
R

Roedy Green

1) Can I make random access to the data in a database file the same
way as in a disk file?

If you put entire files in a database, then you read/write entire
files at a pop. You would presumably be dragging a lot of data around
you did not change. However, the database might be clever about
caching.

If your files can be thought of as variable length records, then you
can put the individual records it the database. The database then
handles records growing and shrinking, inserting, deleting etc in
clever ways.

You could put all 100 files into one big database pot, saving file
handles and buffers for inactive files.

The database also deals with problems of simultaneous update for you.

If your records are all fixed length then using memory mapping is
likely to work well. Just place everything in one big memory mapped
file and read and write with abandoning nio. Let the OS cache and
delay writes.

To decide which approach is best would need you to reveal more detail.
If you don't want to do that publicly, you might consider hiring
someone to consult with you on a confidential basis. Hint.
 
A

Andrey Kuznetsov

file has to be accessed randomly, that is, the file pointer slides
back and forth to get data, and occasionally data is appended to the
end of the file or data is overwritten in the middle. There can be
several hundred files open at the same time, accessed by concurrent
threads. (Q: "What on earth is he up to now?" A: Well, it is a sort of
server application, but it is too complicated to go into detail. Bear
with me).

Now, I wonder if I can replace all the data files with file entries in
a database. I am not very experienced in programming databases, and I
am certainly a novice in evaluating in advance what the performance
implications may be.

But here goes:

1) Can I make random access to the data in a database file the same
way as in a disk file?
seems that this anno is exactly for you:
http://www.javalobby.org/java/forums/t53408.html
 
S

steve

I am writing an application that maintains a lot of binary files. Each
file has to be accessed randomly, that is, the file pointer slides
back and forth to get data, and occasionally data is appended to the
end of the file or data is overwritten in the middle. There can be
several hundred files open at the same time, accessed by concurrent
threads. (Q: "What on earth is he up to now?" A: Well, it is a sort of
server application, but it is too complicated to go into detail. Bear
with me).

Now, I wonder if I can replace all the data files with file entries in
a database. I am not very experienced in programming databases, and I
am certainly a novice in evaluating in advance what the performance
implications may be.

But here goes:

1) Can I make random access to the data in a database file the same
way as in a disk file?

2) - or do I have to read the whole file entry into ram before
accessing it?

3) Can I do it concurrently with several hundred files?

4) Is MySQL a good choice?

5) I reckon it will be somewhat slower, but how much?

Any answer is appreciated.

Chris

forget random access files, do it all in the database , it is WAY more
scalable.

postgreSQL, or oracle.

depending if you have no money , or you do have money.

if the app is designed correctly , it will not be that much more slower.


Steve
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top