Hey Thomas (or anybody else)
(e-mail address removed) (Thomas Weidenfeller), Wed, 09 Jul 2003
07:06:15 +0000:
I know. I just didn't want to tell the original poster that his idea of
first reading everything into memory is a !@#$%^&* idea. Instead I
wanted to provide an off-the-shelf solution. Of course, writing your
own little special-purpose "database" and search code is also an
option.
http://forum.java.sun.com/thread.jsp?forum=4&thread=420695&start=0&range=30#1868195
I was talking to somebody on JDC about multiple instances of serialised
data in a single file, and was wondering if anybody could provide some
input:
into the previously allocated space. And a small
buffer could be used to copy the rest of the file backward.
I see that deletion does not have to be much of an issue per se.
But the thing is: we are talking about serialised data, and thus
variable-length 'records'.
Another problem is modification. There is no way to guarantee that a
modified object will fit back into its own 'bucket'. That is, when using
the approach of storing multiple serialised objects in a single file,
every modification of a serialised object implies a deletion plus an addition.
The only thing that is easy to do with a single file holding multiple
objects is adding. And even that requires a hack: a deletion flag on
each record.
Searching this file for a specific instance will also be awkward. For one
thing, if you want fast searching you would have to resort to pattern
matching, as the data is not stored in any particular order and the
records have variable length.
The more I think of it, the nastier it gets. Deletion is only the start of
the problems. I am not saying it won't work. But all the extra work to get
it working, just to have an aesthetically nice solution (a single file),
makes me think it is the result of not thinking clearly.
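To make the record-format problem concrete, here is a minimal sketch of the single-file scheme under discussion: variable-length records with a deletion flag, stored as [flag byte][length int][payload]. The class and record layout are my own illustration, not an existing library. Note how an update that grows a record cannot reuse its slot, so the general case really is delete + append:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch: one file, many variable-length records,
// each stored as [deleted-flag byte][length int][payload bytes].
public class RecordFile {
    private final RandomAccessFile raf;

    public RecordFile(File f) throws IOException {
        raf = new RandomAccessFile(f, "rw");
    }

    // Append a record at the end of the file; returns its offset.
    public long append(byte[] payload) throws IOException {
        long offset = raf.length();
        raf.seek(offset);
        raf.writeByte(0);             // 0 = live, 1 = deleted
        raf.writeInt(payload.length);
        raf.write(payload);
        return offset;
    }

    // "Delete" by flipping the flag; the space is only reclaimed
    // by a later compaction pass over the whole file.
    public void delete(long offset) throws IOException {
        raf.seek(offset);
        raf.writeByte(1);
    }

    // A modified record may be longer than its old slot, so the
    // general case of an update is delete + append.
    public long update(long offset, byte[] newPayload) throws IOException {
        delete(offset);
        return append(newPayload);
    }

    public void close() throws IOException {
        raf.close();
    }
}
```

Even this toy version shows the bookkeeping cost: every update leaves a dead record behind, and nothing here helps with searching.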
Maybe somebody has done it like this already, I really don't know. I
haven't seen an implementation like this yet. I thought about what dub
said. I'd like to know what anybody has to say about the following:
Code a class that will handle the save/delete/find methods.
<code>
import java.io.Serializable;
import java.util.Comparator;

// NotLocked, AlreadyLocked and ObjectNotFound are application-defined exceptions.
public abstract class PersistenceManager {
    // Returns the manager responsible for this object's class.
    public static PersistenceManager getInstance(Object object) {
        throw new UnsupportedOperationException("sketch: look up the concrete manager here");
    }

    public abstract void save(Serializable object) throws NotLocked;
    public abstract void delete(Serializable object) throws NotLocked;
    public abstract void lock(Serializable object) throws AlreadyLocked, ObjectNotFound;
    public abstract void unlock(Serializable object) throws NotLocked;
    public abstract Serializable find(Class serializableClass, Comparator query);
    public abstract Serializable find(Class serializableClass, String md5hash);
}
</code>
It's a singleton so it should be able to handle concurrent access to the
backing store. This way the PersistenceManager can do some form of
rudimentary caching (LRU, FIFO, whatever) and row locking in a single VM.
The basic idea is clear, I hope. I'm not totally sure about the lock and
unlock methods (deadlocks/starvation).
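For the single-VM "row" locking part, one possible sketch (the class name and the wait/notify scheme are my own assumptions, not a fixed design) keeps a set of locked keys and blocks competing callers. As noted above, starvation is a real concern here, because wakeup order after notifyAll() is not guaranteed:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of per-key locking within a single VM. Callers block until
// the key is free; this avoids throwing AlreadyLocked for waiting
// callers, but starvation is possible since wakeup order is undefined.
public class KeyLocks {
    private final Set locked = new HashSet();

    public synchronized void lock(Object key) throws InterruptedException {
        while (locked.contains(key)) {
            wait(); // released and re-acquired atomically with the monitor
        }
        locked.add(key);
    }

    public synchronized void unlock(Object key) {
        if (!locked.remove(key)) {
            throw new IllegalStateException("not locked: " + key);
        }
        notifyAll(); // wake all waiters; they re-check the condition
    }
}
```

Deadlock is still possible if one caller locks two keys in one order and another caller locks them in the opposite order, so a locking discipline (e.g. always lock keys in a fixed order) would be needed on top of this.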
A simple scenario would be something like this:
<code>
File baseDirectory = ...; // like the namespaces in SQL

public void save(Serializable object)
    throws NotLocked
{
    File classDirectory = new File(baseDirectory, object.getClass().getName());
    if (!classDirectory.exists()) classDirectory.mkdirs();
    // .. calculate key ..
    // .. check if this object has been locked, lock the key ..
    // .. serialise and save the object ..
    // .. unlock ..
}
</code>
To get back on-topic. The major benefit of doing it like this, with a
single file per object instance is:
- deletion is a sequence like lock/delete file/unlock.
- saving is lock/create file/unlock.
- sequential search should not incur extra overhead (but is not synchronized).
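Putting the file-per-object idea together, here is a small runnable sketch. The FileStore name and the explicit key parameter are my own placeholders, and locking is left out to keep the sketch short; it only shows the save = create-file and delete = delete-file sequences from the list above:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch: one file per object instance, grouped in one directory per class.
public class FileStore {
    private final File baseDirectory;

    public FileStore(File baseDirectory) {
        this.baseDirectory = baseDirectory;
    }

    // baseDirectory/<fully-qualified class name>/<key>
    private File fileFor(Serializable object, String key) {
        File classDirectory = new File(baseDirectory, object.getClass().getName());
        classDirectory.mkdirs();
        return new File(classDirectory, key);
    }

    // Save = serialise the object into its own file (lock/unlock omitted).
    public void save(Serializable object, String key) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(
                new FileOutputStream(fileFor(object, key)));
        try {
            out.writeObject(object);
        } finally {
            out.close();
        }
    }

    // Delete = remove the object's file (lock/unlock omitted).
    public boolean delete(Serializable object, String key) {
        return fileFor(object, key).delete();
    }
}
```

With this layout, deletion really is just a file delete, and modification is simply overwriting the file, with no compaction or tombstones needed.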
One catch I can see now is that it requires the Serializable class to
have some kind of unique id, meaning all objects must be uniquely
identifiable.
This way everything is still accessible using random access methods, but
with the filesystem acting as 'one big file'. On the other hand, if I am
correct the OS itself usually caches file access, so you don't need to go
to great lengths to implement caching yourself (hopefully).
And what I find most important: memory usage of this approach should be
really low, since it only needs to hold a single object at a time, plus
some serialisation overhead.
I am just not sure what to do about indexing the serialised objects. I
am planning to use the filenames as the keys to the entries. Should I
extend the Serializable contract to enforce the presence of a key value?
I can't rely on the hashCode() method. Or should I do something like an
MD5/SHA hash of the byte representation of the serialised object? That
should provide unique key values, shouldn't it? And these would be
consistent across several executions of the application.
I am planning on implementing this; can anybody tell me beforehand whether
this is doomed to fail?
Greets
Bhun.