Issues with unique object IDs in persistence

Jarrick Chagma · May 14, 2009

I am working on implementing a persistence mechanism for Java objects but I
cannot work out how to assign each object a unique ID of some kind. This ID
must have the following properties:

1. Must be of type long.
2. Every distinct object must have a distinct ID.
3. Objects which are the same according to equals() must have the same ID.
4. The ID must be able to be determined entirely from the object by itself.
5. If an object is stored in the persistence layer and then retrieved later,
the ID must be the same.

I thought about using hashCode() but it fails (1) and (5).

Can anyone think of another way to do this?

Stefan Ram · May 14, 2009

Jarrick Chagma said:
I am working on implementing a persistence mechanism for Java
objects but I cannot work out how to assign each object a
unique ID of some kind. This ID must have the following
properties:

1. Must be of type long.
2. Every distinct object must have a distinct ID.
3. Objects which are the same according to equals() must have the same ID.

»2.« and »3.« seem to contradict each other.

Can anyone think of another way to do this?

java.util.Map<java.lang.Object,java.lang.Long>

Lew · May 14, 2009

Stefan said:
»2.« and »3.« seem to contradict each other.

Stefan said:
java.util.Map<java.lang.Object,java.lang.Long>

I suggest using JPA and an RDBMS. The work's already done and it
leverages the best of both worlds.

Jarrick Chagma · May 14, 2009

Stefan Ram said:
»2.« and »3.« seem to contradict each other.

What I meant by "distinct" was in terms of their unique state.

java.util.Map<java.lang.Object,java.lang.Long>

I am not sure how this would solve the problem. Are you saying that I would
need to provide the IDs myself (using some arbitrary algorithm) and then
store a map between each object and its ID in the persistence layer?
Wouldn't the use of Object in a Map still rely on hashCode() anyway?

Stefan Ram · May 14, 2009

Jarrick Chagma said:
Are you saying that I would need to provide the IDs myself
(using some arbitrary algorithm) and then store a map between
each object and its ID in the persistence layer?

Yes.

And whenever the state of an object is changed,
it needs to be removed from the map and inserted again.
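
A minimal sketch of why the remove-and-reinsert step matters, assuming the map is a java.util.HashMap. The Point class, the IdRegistryDemo class and the movePoint method are hypothetical names invented for illustration. Whether a changed object keeps its old ID or receives a new one is not settled in this thread; the sketch simply keeps it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical mutable key type: equals() and hashCode() depend on its state.
final class Point {
    int x;
    int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

// Hypothetical registry wrapping a Map<Object, Long>.
final class IdRegistryDemo {
    private final Map<Object, Long> ids = new HashMap<>();

    void put(Object o, long id) {
        ids.put(o, id);
    }

    Long idOf(Object o) {
        return ids.get(o);
    }

    // Safe mutation: take the entry out, change the state, put it back.
    // Changing p.x while p is still a key would leave the entry filed
    // under the old hash code, so later lookups would generally miss it.
    void movePoint(Point p, int newX) {
        Long id = ids.remove(p);
        p.x = newX;
        if (id != null) {
            ids.put(p, id);
        }
    }
}
```

After movePoint, a lookup with the moved object (or any Point equal to its new state) finds the ID again, and a lookup with the old state finds nothing.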

Wouldn't the use of Object in a Map still rely on hashCode() anyway?

This might depend on the implementation of the Map.

Seamus MacRae · May 14, 2009

Jarrick said:
I am working on implementing a persistence mechanism for Java objects
but I cannot work out how to assign each object a unique ID of some
kind. This ID must have the following properties:

1. Must be of type long.
2. Every distinct object must have a distinct ID.
3. Objects which are the same according to equals() must have the same ID.
4. The ID must be able to be determined entirely from the object by itself.
5. If an object is stored in the persistence layer and then retrieved
later, the ID must be the same.

I thought about using hashCode() but it fails (1) and (5).

And (2).

Can anyone think of another way to do this?

When assigning an ID to an object:
1. Look the object up in a HashMap to see if an equal object already
exists. If so, assign it that ID.
2. Otherwise, assign it the first unused ID.

This involves keeping a global counter that is incremented, call it
nextID. Trickier, it involves maintaining a HashMap of persisted
objects. You won't be able to use an actual Java HashMap; you'll need to
make a disk-based hash map. You already need a way to retrieve objects
by ID number; extend that to enable hash based retrieval as well.
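
The lookup-then-assign logic can be sketched as follows. This is an in-memory stand-in only: it uses an ordinary java.util.HashMap where the text above calls for a disk-based one, and the class name IdAssigner and method name idFor are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the persistent table described above.
final class IdAssigner {
    private final Map<Object, Long> idsByValue = new HashMap<>();
    private long nextId = 0;

    // Returns the ID of an equal, already-registered object,
    // or assigns the next unused ID and remembers it.
    long idFor(Object o) {
        Long existing = idsByValue.get(o);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        idsByValue.put(o, id);
        return id;
    }
}
```

Two equal strings get the same ID, and the next unequal object gets the next counter value. The IDs last only as long as the JVM does, which is why a persistent table is needed for requirement 5.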

The main trickiness is the actual hashing. Java objects can have either
of two types of hash code, as a rule: determined by the object's fields
(String, Integer) or the "identity hash code" (Object, ActionListener).
The former should work as hash codes in your persistence database,
modulo code changes or rt.jar updates that change some hashCode()
methods. (Those will require rehashing the objects stored in the database.)

The identity hash code is the problem here. However, objects with the
identity hash code actually are very easy to handle, because
nonidentical objects of this sort are (supposed to be) unequal ones. So
you can avoid the issue entirely. The procedure becomes:

1. Maintain a nextID in the persistence DB. Start it at zero.
2. When you get a new object, serialize it to a file.
3. Read it back in, and compare with the original. If they compare
.equals(), go to 4. If not, skip to 5.
4. The object has value semantics. Calculate its hash and look up
that hash in the persistent hash table. If the bucket is non-empty,
walk the linked list in it reading in objects and comparing them
to the target. If one is identical, assign the target that ID and
erase the serialized file and we're done. If none are identical,
or the bucket was empty, put the object into that hash bucket and
proceed.
5. Assign the target nextID, point its entry in the by-ID lookup table
at the serialized file (perhaps by simply renaming the file to the
ID), and bump nextID.

A slight trickiness remains: if 5 is implemented by renaming the file,
step 4 has to take that into account in the "put the object into that
hash bucket" step.

Another slight trickiness: step 3 will also catch singletons that
readResolve() to themselves, even if they use Object's hashCode(),
assuming standard Java serialization is used. This can be caught by
seeing if the deserialized object and the original compare ==. If they
do, treat them as if they had the Object hashCode() (even though they
might not). A persisted one will always deserialize to the singleton anyway.
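
Step 3 and the singleton check can be sketched like this, assuming standard Java serialization and using an in-memory byte array instead of a file. ValueSemanticsProbe and Handle are hypothetical names; checked exceptions are wrapped in IllegalStateException just to keep the sketch short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class ValueSemanticsProbe {
    // Serializes the object and reads it back (in memory rather than via a file).
    static Object roundTrip(Serializable original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    // True if the copy is a different instance that is still equals() to the original.
    static boolean hasValueSemantics(Serializable original) {
        Object copy = roundTrip(original);
        if (copy == original) {
            // Same instance came back, e.g. a singleton or an enum constant:
            // treat it like an identity-based object.
            return false;
        }
        return original.equals(copy);
    }
}

// Hypothetical class that inherits Object's identity-based equals().
final class Handle implements Serializable {
    private static final long serialVersionUID = 1L;
}
```

A Long or a String reports value semantics, while Handle and an enum constant such as java.util.concurrent.TimeUnit.SECONDS do not.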

This scheme will fail if:
* Disk space runs out.
* Objects are persisted that can't be serialized using the serialization
mechanism chosen.
* More than 2^64 distinct objects are persisted.
* Objects are persisted whose hashCode() is inconsistent with equals().
Particularly ones that don't override Object.hashCode() and do
override equals().

Of course, java.util.HashMap fails if the third or fourth condition is
met, as well. (In particular, 2^64 objects won't fit in RAM in
present-day or near-future computers, so you'll get OOME if you try to
store that many in a HashMap.) And any persistence scheme will fail if
the first or second condition is met. So this scheme apparently has only
the bare minimum requirements to succeed, instead of more stringent ones.

Jarrick Chagma · May 14, 2009

Thanks Seamus for such a comprehensive and helpful reply. That's a lot of
information for me to digest. I will report back once I have gone through
it all in detail if I have any further questions.

Lew · May 14, 2009

Jarrick said:
I am not sure how this would solve the problem. Are you saying that I would
need to provide the IDs myself (using some arbitrary algorithm) and then

The algorithm need not be arbitrary. There are a number of standard
ones available. If you are willing to use a 128-bit ID instead of a
'long', you could use java.util.UUID, for example.
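
A short sketch of the two java.util.UUID factory methods that seem relevant here. The class name UuidIds and the parameter canonicalState (some string the caller derives from the object's fields) are assumptions for illustration, not something specified in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class UuidIds {
    // Random (version 4) UUID: unrelated to the object's state,
    // so it has to be stored alongside the object.
    static UUID randomId() {
        return UUID.randomUUID();
    }

    // Name-based (version 3) UUID: the same bytes always produce the same UUID,
    // so it can be recomputed from the object's state.
    static UUID idFromState(String canonicalState) {
        return UUID.nameUUIDFromBytes(canonicalState.getBytes(StandardCharsets.UTF_8));
    }
}
```

The name-based variant is a hash, so distinct states could in principle collide; it trades the hard uniqueness guarantee for being computable from the object alone. The two 64-bit halves are available through getMostSignificantBits() and getLeastSignificantBits().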

store a map between each object and its ID in the persistence layer?
Wouldn't the use of Object in a Map still rely on hashCode() anyway?

Even if it did use hashCode(), which, as Stefan Ram pointed out, it
might not, that wouldn't be the only thing it would rely on.
java.util.HashMap, for example, relies on hashCode() and equals().

hashCode() is not guaranteed to be unique for distinct objects. In
your case, since you want your ID to be a long and hashCode() returns
int, you have a prima facie inability to ensure such a guarantee.
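
A concrete illustration of the non-uniqueness, using the well-known pair of strings "Aa" and "BB", which are unequal but share a hash code. The class and method names are made up for the example.

```java
final class HashCollisionDemo {
    // True if the two strings are unequal yet have the same hashCode().
    static boolean collides(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // String.hashCode() is specified as s[0]*31^(n-1) + ... + s[n-1],
        // so "Aa" is 65*31 + 97 and "BB" is 66*31 + 66; both come to 2112.
        System.out.println(collides("Aa", "BB"));
    }
}
```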

Lew · May 14, 2009

Seamus said:
This involves keeping a global counter that is incremented, call it
nextID. Trickier, it involves maintaining a HashMap of persisted
objects. You won't be able to use an actual Java HashMap; you'll need to
make a disk-based hash map.

Seems like Derby (a.k.a., Java DB) or Postgres (among others) will
serve the need for a disk-based persistence mechanism with the ability
to maintain simultaneously a 'long' surrogate key and a unique
combination of values as a natural key. They have added advantages,
such as having already optimized access for single- or multi-user use,
having worked out the various algorithms such as unique ID generation
that one might need, coming with Java already in the case of Derby or
freely available in the case of Postgres and others, ready integration
with Java, scalability, and a host of other advantages.

Seamus MacRae · May 15, 2009

Lew said:
Seems like Derby (a.k.a., Java DB) or Postgres (among others) will
serve the need for a disk-based persistence mechanism with the ability
to maintain simultaneously a 'long' surrogate key and a unique
combination of values as a natural key. They have added advantages,
such as having already optimized access for single- or multi-user use,
having worked out the various algorithms such as unique ID generation
that one might need, coming with Java already in the case of Derby or
freely available in the case of Postgres and others, ready integration
with Java, scalability, and a host of other advantages.

If heavyweight persistence is needed, or the application has other needs
that can be met by a database, a heavyweight database might be a good
back-end for the implementation, yes.

It might still be useful for the OP to have some clue how to implement
something like this on his own, for instance, for general advancement of
computer science skills, for specific application when a heavyweight DB
would be overkill, or for specific application when no existing DB
implementation quite fits a particular project's needs for whatever reason.

Of course, implementing one's own full-blown heavyweight DB is rather
more involved than just persistence. You need transactions and
atomicity, some kind of consistency checking, and probably some
structured records capability beyond just "ID number, hash, and
serialized Java Object". And you need a query engine, joins, and the like.

Lew · May 15, 2009

Seamus said:
If heavyweight persistence is needed, or the application has other needs
that can be met by a database, a heavyweight database might be a good
back-end for the implementation, yes.

Derby hardly qualifies as "heavyweight" - learning enough to use it is
probably about as much effort as figuring out all the wrinkles of the
disk-based solution you outlined upthread. For that effort, you get a
well-thought-out and heavily tested solution and a skill set that increases
your marketability.

It might still be useful for the OP to have some clue how to implement
something like this on his own, for instance, for general advancement of
computer science skills, for specific application when a heavyweight DB
would be overkill, or for specific application when no existing DB
implementation quite fits a particular project's needs for whatever reason.

Of course, implementing one's own full-blown heavyweight DB is rather
more involved than just persistence. You need transactions and
atomicity, some kind of consistency checking, and probably some
structured records capability beyond just "ID number, hash, and
serialized Java Object". And you need a query engine, joins, and the like.

All of which come with Derby, Postgres and the rest. Derby is embeddable with
only about a 2MB memory footprint. It comes with (Sun) Java already. It's
very straightforward to use. So it has more power than you think you need at
first - the effort to use it is so small that all that extra power and
flexibility is not a disadvantage.

Seamus MacRae · May 15, 2009

Lew said:
Derby hardly qualifies as "heavyweight" - learning enough to use it is
probably about as much effort as figuring out all the wrinkles of the
disk-based solution you outlined upthread.

Heavyweight also involves such factors as code and data size,
configuration headache-inducingness, and complications to deployment.
For example, if the project is a desktop application, can you ask your
users to install a database server? Can they be expected to know how to
fix it if something gets corrupted that persists across reboots?

(Consider how much trouble typical computer users have if a problem
requires fiddling with the Windows registry; that's probably a good
indication of how they'd cope with any DB issue without a simple
push-button fix. Most likely they'd have to uninstall and reinstall to
recover, realistically. If the application is targeted at users with a
higher level of computer science education/technical skill than the lay
public, then it might not be a problem. If the application is targeted
at server admins, dealing with databases is already part of their job
description.)

To be clear, I'm not knocking using a full-blown database. I'm noting
that there will be situations where it's the best solution, and there
will be situations where it isn't.

Roedy Green · May 15, 2009

Can anyone think of another way to do this?

see http://mindprod.com/jgloss/unique.html

Tom Anderson · May 15, 2009

What I meant by "distinct" was in terms of their unique state.

Your definition still isn't coherent, i'm afraid. The problem is that two
objects with different state can be equal - for instance:

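public class Thing {
    private int x;
    private String s;

    public Thing(int x, String s) {
        this.x = x;
        this.s = s;
    }

    // Equality deliberately looks only at x and ignores s.
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Thing)) return false; // also false for null
        Thing that = (Thing) obj;
        return this.x == that.x;
    }

    // Kept consistent with equals(): depends only on x.
    @Override
    public int hashCode() {
        return x;
    }
}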

new Thing(1, "red").equals(new Thing(1, "blue")); // == true

If you're willing to drop condition 3, then the definition makes sense.

If you're interested in object identity rather than object state, then the
solution is fairly easy: set up a counter, and allocate numbers from it to
every object you see, using an IdentityHashMap to keep track of them.
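
A minimal sketch of the counter-plus-IdentityHashMap idea, with the hypothetical names IdentityIds and idOf:

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class IdentityIds {
    // IdentityHashMap compares keys with == instead of equals().
    private final Map<Object, Long> ids = new IdentityHashMap<>();
    private long counter = 0;

    // Same instance -> same number; equal but distinct instances -> different numbers.
    long idOf(Object o) {
        Long id = ids.get(o);
        if (id == null) {
            id = counter++;
            ids.put(o, id);
        }
        return id;
    }
}
```

Note that the map holds strong references, so registered objects are never garbage collected in this sketch, and the numbers do not survive a restart of the JVM.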

Distinction by state is much harder, though. And still not well-defined -
presumably, you'd consider these objects:

List a = Arrays.asList(Arrays.asList("foo"));
List b = Arrays.asList(a.get(0));

The same. But do you consider these objects:

List a = Arrays.asList(Arrays.asList("foo"));
List b = Arrays.asList(Arrays.asList("foo"));

The same or not? They're equal, they look very similar, but if you do
this:

a.set(0, "bar");

Or this:

((List)a.get(0)).set(0, "bar");

Then they will no longer be equal, which means that they shouldn't
necessarily be considered 'the same'.
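
The second pair of lists can be tried out with a small runnable version. The class and method names are invented for the example, and it uses typed lists rather than raw ones.

```java
import java.util.Arrays;
import java.util.List;

final class NestedListDemo {
    // Builds a fixed-size list containing one fixed-size list containing "foo".
    static List<List<String>> nested() {
        return Arrays.asList(Arrays.asList("foo"));
    }

    // Returns { equal before mutation, equal after mutating a's inner list }.
    static boolean[] equalBeforeAndAfter() {
        List<List<String>> a = nested();
        List<List<String>> b = nested();
        boolean before = a.equals(b);
        a.get(0).set(0, "bar"); // Arrays.asList lists support set(), not add()
        boolean after = a.equals(b);
        return new boolean[] { before, after };
    }
}
```

The two lists start out equal and stop being equal once one inner list is changed, because they never shared that inner list.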

If you don't consider the latter pair of lists the same, you can build an
ID out of the class of the object in question, the values of any primitive
fields in the object and the identity numbers (worked out as above!) of
any reference fields. It won't fit in a long, though. You could hash it
down to a long, but then you lose the guarantee of different objects
having different identities. Or you could use the counter mechanism again,
but this time with these complete-state IDs as a key, used to look up the
counter-issued ID. Note that since you need both the identity and state
mechanisms running in parallel, it probably makes sense to include the
counter-issued state ID in the record keyed by identity, since every
object has exactly one state ID, and the lookups in the identity map are
likely to be much faster than those in the state map, since you don't have
to construct a state ID to do them.
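
One way the shallow scheme might look, as a sketch under several assumptions: it uses reflection, it is meant for application classes only (reflective access to JDK internals is restricted on recent Java versions), and it relies on getDeclaredFields() returning fields in a stable order within one JVM run. ShallowStateIds and Pair are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class ShallowStateIds {
    private final Map<Object, Long> identityIds = new IdentityHashMap<>();
    private final Map<List<Object>, Long> stateIds = new HashMap<>();
    private long nextIdentity = 0;
    private long nextState = 0;

    // Counter-issued identity number, as in the identity scheme.
    long identityId(Object o) {
        Long id = identityIds.get(o);
        if (id == null) {
            id = nextIdentity++;
            identityIds.put(o, id);
        }
        return id;
    }

    // Complete-state key: class name, then for each instance field its name and
    // either its value (primitive) or the identity number of its target (reference).
    List<Object> stateKey(Object o) {
        List<Object> key = new ArrayList<>();
        key.add(o.getClass().getName());
        for (Class<?> c = o.getClass(); c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                f.setAccessible(true);
                try {
                    Object v = f.get(o);
                    key.add(f.getName());
                    if (f.getType().isPrimitive()) {
                        key.add(v); // boxed primitive value
                    } else {
                        key.add(v == null ? null : "ref#" + identityId(v));
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return key;
    }

    // Counter-issued state ID, looked up by the complete-state key.
    long stateId(Object o) {
        List<Object> key = stateKey(o);
        Long id = stateIds.get(key);
        if (id == null) {
            id = nextState++;
            stateIds.put(key, id);
        }
        return id;
    }
}

// Hypothetical application class with one primitive and one reference field.
final class Pair {
    int n;
    Object ref;

    Pair(int n, Object ref) {
        this.n = n;
        this.ref = ref;
    }
}
```

Two Pair objects with the same n that point at the same object share a state ID; changing n, or pointing at a different (even if equal) object, gives a different one.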

If you do consider the latter pair of lists the same, you have a rather
harder problem. You can apply the same strategy of constructing a mega-ID
containing the complete state of the object, but this time it has to
include the state of any pointed-to objects as well. You can do this by
descending into those objects, constructing their state ID, and then
combining those to make the state ID for the root object, applying this
recursively as necessary. Note that you'll have to detect and deal with
cycles in the reference graph, or else you'll end up in an infinite loop.
You might be able to get clever here, and use child objects' state IDs
instead of their actual values in constructing the root's state ID, but
then you end up with a different version of the infinite loop problem.
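
A sketch of the recursive descent with cycle detection, simplified to a hypothetical Node class (one int plus references to other nodes) instead of arbitrary objects. Writing a back-reference for an already-visited node is one possible way of dealing with cycles; it is an assumption of this sketch, not something prescribed above.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical object-graph node: one primitive field plus references to other nodes.
final class Node {
    int value;
    final List<Node> children = new ArrayList<>();

    Node(int value) {
        this.value = value;
    }
}

final class DeepState {
    // Canonical text for the graph reachable from root. Each node is numbered in
    // first-visit order; a node seen again is written as a back-reference "#n",
    // which is what stops the recursion on cycles.
    static String describe(Node root) {
        StringBuilder out = new StringBuilder();
        describe(root, new IdentityHashMap<>(), out);
        return out.toString();
    }

    private static void describe(Node node, Map<Node, Integer> seen, StringBuilder out) {
        if (node == null) {
            out.append("null");
            return;
        }
        Integer index = seen.get(node);
        if (index != null) {
            out.append('#').append(index);
            return;
        }
        seen.put(node, seen.size());
        out.append('(').append(node.value);
        for (Node child : node.children) {
            out.append(' ');
            describe(child, seen, out);
        }
        out.append(')');
    }
}
```

Two separately built graphs with the same shape and values produce the same text, which could then serve as the key for a counter-issued ID. Very deep graphs would need an explicit stack instead of recursion.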

tom

Lew · May 15, 2009

Seamus said:
Heavyweight also involves such factors as code and data size,
configuration headache-inducingness, and complications to deployment.
For example, if the project is a desktop application, can you ask your
users to install a database server? Can they be expected to know how to
fix it if something gets corrupted that persists across reboots?

You don't have to ask the user to install an embedded database; that's
what "embedded" means. You install Derby with the application.

(Consider how much trouble typical computer users have if a problem
requires fiddling with the Windows registry; that's probably a good
indication of how they'd cope with any DB issue without a simple

No registry fiddling required. Red herring. Derby is started by the
application within the same JVM. No DB administration required. No
pressure on the user at all. All the responsibility is on the
programmer.

push-button fix. Most likely they'd have to uninstall and reinstall to
recover, realistically. If the application is targeted at users with a
higher level of computer science education/technical skill than the lay
public, then it might not be a problem. If the application is targeted
at server admins, dealing with databases is already part of their job
description.)

Why don't you read up on Derby and rethink your arguments?

To be clear, I'm not knocking using a full-blown database. I'm noting
that there will be situations where it's the best solution, and there
will be situations where it isn't.

I think it likely that Derby or another embedded database (though
Derby is the one that already comes with Java) will be simpler than
the disk-based solution recommended upthread, by a wide margin, though
not necessarily simpler than a Map-based solution.

Seamus MacRae · May 15, 2009

Lew said:
You don't have to ask the user to install an embedded database; that's
what "embedded" means. You install Derby with the application.

That has its own problems, namely, the user using several such
applications ends up with several copies of the DBMS chewing up disk
space (not just several databases, which was not avoidable, but several
database SERVERS too).

No registry fiddling required. Red herring.

I was using the Windows registry as an analogy. Fixing a problem in the
application's database and fixing a Windows registry problem would
probably be equally hard. The registry is probably easier: at least
Windows comes with regedit, which is a GUI app, and the Windows registry
is so ubiquitous lots of web sites provide technical information about
monkeying with it. The application's database will probably not be
widely documented online, not have a regedit-like GUI to directly fiddle
with it, and won't even be easy for the user to locate on disk. Once the
user found it, to manually fix it would require feeding the database
server some SQL queries. How many Windows users do you know who you'd
credit with being able to use SQL queries?

The database will in effect have no user-serviceable parts inside. If
anything gets wacko in it, the typical user's only realistic recourse
will be to uninstall and reinstall the affected application. And then
they lose whatever the database is used to store.

It may have its uses, but it's probably overkill for just storing window
positions and other state like that, and for anything it's not overkill
for, using it means users can easily lose years of work if something
goes wrong.

System administrators can be expected to be able to recover from such a
thing, however.

Ultimately, it depends on what level of technical skill your application
is targeted at. Typical Windows users: uh-oh. Developers (say it's a
compiler or IDE): maybe OK. Sysadmins running network servers and the
like: should be fine.

Derby is started by the application within the same JVM. No DB
administration required.

Until something goes wrong.

I think it likely that Derby or another embedded database (though
Derby is the one that already comes with Java) will be simpler than

Implementation simplicity isn't the issue above.

Lew · May 16, 2009

Seamus said:
That has its own problems, namely, the user using several such
applications ends up with several copies of the DBMS chewing up disk
space (not just several databases, which was not avoidable, but several
database SERVERS too).

The data stored is the largest part of the disk overhead, which would be true
for the other disk-based solution, too, of course. Derby itself only adds 2
MB to the footprint - not much at all.

Seamus said:
I was using the Windows registry as an analogy. Fixing a problem in the
application's database and fixing a Windows registry problem would
probably be equally hard. The registry is probably easier: at least
Windows comes with regedit, which is a GUI app, and the Windows registry
is so ubiquitous lots of web sites provide technical information about

I would not tell a user to use regedit.

monkeying with it. The application's database will probably not be
widely documented online, not have a regedit-like GUI to directly fiddle
with it, and won't even be easy for the user to locate on disk. Once the
user found it, to manually fix it would require feeding the database
server some SQL queries. How many Windows users do you know who you'd
credit with being able to use SQL queries?

The users should not be doing SQL queries on an embedded application. Another
red herring.

Also, the comparison is between Derby, a simple database solution, and a
custom-written disk-based solution that has to deal with many of the same
troubles that Derby has already handled for you. In this comparison, I'd lay
odds that Derby wins.

The database will in effect have no user-serviceable parts inside. If
anything gets wacko in it, the typical user's only realistic recourse
will be to uninstall and reinstall the affected application. And then
they lose whatever the database is used to store.

Same for the disk-based solution.

It may have its uses, but it's probably overkill for just storing window
positions and other state like that, and for anything it's not overkill
for, using it means users can easily lose years of work if something
goes wrong.

Same for the disk-based solution. Only the disk-based solution has more risk
of trouble.

System administrators can be expected to be able to recover from such a
thing, however.

System administrators are not the target market here.

Ultimately, it depends on what level of technical skill your application
is targeted at. Typical Windows users: uh-oh. Developers (say it's a
compiler or IDE): maybe OK. Sysadmins running network servers and the
like: should be fine.

I'm saying "typical users".

Lew said:

Seamus said:
Until something goes wrong.

And this is superior to the custom disk-based solution how?

The problem is that a custom solution carries much risk. Bugs happen. A
tested, mature product with a small footprint that has already weathered those
storms reduces the risk.

Seamus said:
Implementation simplicity isn't the issue above.

I am saying that it is, along with maintenance simplicity and reduced risks of
bugs.

All the concerns you mentioned are valid; it's just that Derby represents a
lower risk with respect to those problems than the custom disk-based solution.
Plus, the programmer has a clean, well-documented model for data storage and
correlation with the Derby approach, and doesn't risk getting entangled in the
low-level details that a custom file-based solution would entail.

Calling Derby "full-blown" or "heavyweight" is misleading. It's actually
lighter weight and lower risk than trying to manage files oneself and still
get the functionality needed for the application. System administrators do
not get involved - everything is handled by the application. SQL queries are
not an issue for the users - everything is handled by the application.
Registry editing (yecch) is not an issue - everything is handled by the
application. The programmer's life is easier because Derby gives the
necessary functionality. Compared to low-level file manipulation that the
programmer has to re-invent, Derby is a lighter, safer, more powerful solution.

Seamus MacRae · May 16, 2009

Lew said:
The data stored is the largest part of the disk overhead, which would be
true for the other disk-based solution, too, of course. Derby itself
only adds 2 MB to the footprint - not much at all.

A two-megabyte DBMS? That'll be the day.

I would not tell a user to use regedit.

Well, you're telling application developers to design their applications
so that their support organization will have to tell customers to do
something like "Get to a C:\ prompt. Fire up DBFixulator and type in the
following SQL queries..." (expected response: "What's a sea prompt?")

The users should not be doing SQL queries on an embedded application.

Oh, now you're recommending a DBMS be used in an *embedded* application!
I don't know about that, either. Set-top boxes already suffer quite
heavily from "no user-serviceable parts inside", so it might not
actually make things any worse there.

But I thought we were discussing desktop applications, which will get
screwed up and users will want to be able to get them working again,
preferably without losing any of their work and without the mess and
time-wastage of an uninstall/reinstall loop.

Also, the comparison is between Derby, a simple database solution, and a
custom-written disk-based solution that has to deal with many of the
same troubles that Derby has already handled for you. In this
comparison, I'd lay odds that Derby wins.

Well, except that there are a few differences between the two that you
neglected to address, owing to the differing storage formats.

My idea was basically to use serialized Java objects, probably in
individual files. Likely a problem could be solved, if not by fixing,
then by deleting a particular such file and the app recreating it.

A database, on the other hand, typically takes the form of a B-tree
represented who-knows-how and living on its own dedicated disk
partition. It won't be mountable as NTFS or VFAT or whatever, and
probably won't even be visible in Explorer. The installer has to do the
semi-dangerous job of repartitioning the customer's hard drive -- hope
they keep backups. If anything goes wrong, there's no apparent way for
anyone short of an expert to get into there and make changes. The
typical user that even learns of the partition might try reformatting
it, probably with catastrophic consequences (even if they don't reformat
the wrong partition by mistake).

Same for the disk-based solution.

Somewhat, but maybe not as severely. See above.

Same for the disk-based solution. Only the disk-based solution has more
risk of trouble.

I don't know. It does have more scope for non-geek end-user
intervention, at least in that they can just click named-by-support or
somehow-suspect files and press "delete" instead of find some obscure
widget somewhere and type in "DELETE FROM PROFILES1" or whatever.

System administrators are not the target market here.

That's what I was afraid of.

And this is superior to the custom disk-based solution how?

See above.

The problem is that a custom solution carries much risk. Bugs happen.
A tested, mature product with a small footprint that has already
weathered those storms reduces the risk.

As a side effect, you can't delete a corrupted "file" from it without
knowing "DELETE FROM X" or similar, and also knowing where to put it
in. It's bad enough most users can't find the Documents and
Settings/Application Data directory where the files will probably be
that need deleting in the normal-disk-files case.

I am saying that it is, along with maintenance simplicity and reduced
risks of bugs.

And I'm saying that that is but one issue among several, and there's a
tradeoff among them.

Calling Derby "full-blown" or "heavyweight" is misleading. It's
actually lighter weight and lower risk than trying to manage files
oneself and still get the functionality needed for the application.

I'll believe that when I see it.

SQL queries are not an issue for the users - everything is
handled by the application.

Except fixing things the application's bugs borked up but the
application provides no UI to fix.

The programmer's life is easier because Derby gives the necessary
functionality.

Making the programmer's life easier isn't always good. In particular,
whenever it means making the end-users' lives harder, it almost always
isn't.

Compared to low-level file manipulation that the programmer has to
re-invent, Derby is a lighter, safer, more powerful solution.

Maybe. Probably at least part of the time. Probably not 100% of the time.

Lew · May 16, 2009

Seamus said:
A two-megabyte DBMS? That'll be the day.

That day is today. From the website:

Derby has a small footprint -- about 2 megabytes
for the base engine and embedded JDBC driver.

What, you thought I'd make such a specific claim out of the blue? Come on.

Well, you're telling application developers to design their applications
so that their support organization will have to tell customers to do
something like "Get to a C:\ prompt. Fire up DBFixulator and type in the
following SQL queries..." (expected response: "What's a sea prompt?")

Nonsense. I'm not telling anyone anything of the sort. Don't misstate my
point in an attempt to refute it; that just shows weakness in your points.

Oh, now you're recommending a DBMS be used in an *embedded* application!
I don't know about that, either. Set-top boxes already suffer quite
heavily from "no user-serviceable parts inside", so it might not
actually make things any worse there.

Not *that* kind of "embedded". Sheesh. Is your reasoning so weak, then?

If you'd been paying attention, you'd've seen that I'm speaking of Derby being
embedded in the Java application. Stop misstating my points; it just shows
the weakness in yours.

But I thought we were discussing desktop applications, which will get
screwed up and users will want to be able to get them working again,
preferably without losing any of their work and without the mess and
time-wastage of an uninstall/reinstall loop.

I've already discussed how that is not a problem with Derby.

Well, except that there are a few differences between the two that you
neglected to address, owing to the differing storage formats.

My idea was basically to use serialized Java objects, probably in
individual files. Likely a problem could be solved, if not by fixing,
then by deleting a particular such file and the app recreating it.
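
A minimal sketch of the one-file-per-object idea, for illustration only. The `Profile` class and the `ProfileStore` helper are hypothetical names, not anything from the thread. It shows the recovery path described above: a file that cannot be read is deleted and recreated with default contents.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

public class ProfileStore {
    /** Hypothetical settings object; each instance lives in its own file. */
    public static class Profile implements Serializable {
        private static final long serialVersionUID = 1L;
        public String name;
        public Profile(String name) { this.name = name; }
    }

    /** Writes one profile to one file using standard Java serialization. */
    public static void save(Path file, Profile profile) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(profile);
        }
    }

    /** Loads the profile, or replaces a missing or unreadable file with a default one. */
    public static Profile loadOrRecreate(Path file, String defaultName) throws IOException {
        if (Files.exists(file)) {
            try (InputStream raw = Files.newInputStream(file);
                 ObjectInputStream in = new ObjectInputStream(raw)) {
                return (Profile) in.readObject();
            } catch (IOException | ClassNotFoundException | ClassCastException e) {
                // Corrupt or incompatible file: discard it, much as a user
                // could do by hand in a file manager. Streams are already closed here.
                Files.deleteIfExists(file);
            }
        }
        Profile fresh = new Profile(defaultName);
        save(file, fresh);
        return fresh;
    }
}
```

Note that the recovery here loses whatever was in the damaged file; it only gets the application running again.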

Serialization has noted weaknesses, including not fulfilling the OP's need for
association between objects, something that an RDBMS does innately.

A database, on the other hand, typically takes the form of a B-tree
represented who-knows-how and living on its own dedicated disk
partition. It won't be mountable as NTFS or VFAT or whatever, and

Now you're ranting. Have you even bothered to learn what Derby is?

There's no separate installation. There's no separate partition. There's
none of the nonsense that you're spouting.
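
For readers unfamiliar with the term, here is a rough sketch of what "embedded" means in JDBC terms. It assumes the Derby jar is on the classpath; the database name `profilesDb` and the table are made up for illustration. With the embedded driver, the database is a directory of ordinary files, created under the `derby.system.home` directory or, if that property is unset, the current working directory.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class EmbeddedDerbySketch {
    /** Builds an embedded-Derby JDBC URL; ";create=true" creates the database if absent. */
    public static String embeddedUrl(String dbName) {
        return "jdbc:derby:" + dbName + ";create=true";
    }

    public static void main(String[] args) {
        // "profilesDb" is a hypothetical name. The engine runs inside this JVM;
        // no server process and no separate installation are involved.
        try (Connection conn = DriverManager.getConnection(embeddedUrl("profilesDb"));
             Statement st = conn.createStatement()) {
            // On a second run this CREATE fails because the table already exists;
            // the catch block below simply reports it.
            st.executeUpdate(
                "CREATE TABLE profiles (id BIGINT NOT NULL PRIMARY KEY, name VARCHAR(64))");
            st.executeUpdate("INSERT INTO profiles VALUES (1, 'default')");
        } catch (SQLException e) {
            // Also reached when the Derby jar is missing ("No suitable driver").
            System.err.println("Database error: " + e.getMessage());
        }
    }
}
```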

probably won't even be visible in Explorer. The installer has to do the
semi-dangerous job of repartitioning the customer's hard drive -- hope

Nope. It's all contained in the application.

they keep backups. If anything goes wrong, there's no apparent way for
anyone short of an expert to get into there and make changes. The
typical user that even learns of the partition might try reformatting
it, probably with catastrophic consequences (even if they don't reformat
the wrong partition by mistake).

What are you on about?

Somewhat, but maybe not as severely. See above.

The above was incoherent and irrelevant rambling.

I don't know. It does have more scope for non-geek end-user
intervention, at least in that they can just click named-by-support or
somehow-suspect files and press "delete" instead of finding some obscure
widget somewhere and typing in "DELETE * FROM PROFILES1" or whatever.

Again, as stated, this is only something that the application deals with. The
user, in an embedded-database application (NOT a set-top box, silly), never
deals with this.

That's what I was afraid of.

Wha...? You were the one who said it was bad that an application would need a
sysadmin. I'm only pointing out that your concern is addressed with the
solution I propose.

See above.

The "above" didn't address how embedded Derby works, and therefore does not
address the point.

As a side effect, you can't delete a corrupted "file" from it without
knowing "DELETE * FROM X" or similar, and also knowing where to put it
in. It's bad enough most users can't find the Documents and
Settings/Application Data directory where the files will probably be
that need deleting in the normal-disk-files case.

And I'm saying that that is but one issue among several, and there's a
tradeoff among them.

Tradeoffs that you have yet to mention.

I'll believe that when I see it.

So why don't you take a look, then?

Except fixing things the application's bugs borked up but the
application provides no UI to fix.

Again, a disk-based solution is not necessarily superior in this respect, and
arguably it's easier to clean up with a robust, tested, proven product like
Derby than with a custom hack.

Making the programmer's life easier isn't always good. In particular,
whenever it means making the end-users' lives harder, it almost always
isn't.

Except that it doesn't make the end user's life harder, it makes it easier, by
reducing the risk of bugs.

Maybe. Probably at least part of the time. Probably not 100% of the time.

At last we agree.

Seamus MacRae · May 16, 2009

Lew said:
That day is today. From the website:

Didn't your mother tell you not to believe everything you read on the
web?

Nonsense. I'm not telling anyone anything of the sort.

Well, not in so many words, but it's an implication that follows
naturally from the predictable sequence of events:

1. Programmer uses Derby.
2. Program winds up containing a bug.
3. At a customer deployment, program triggers bug.
4. Database gets b0rked.
5. Customer finds program stopped working properly.
6. Customer finds quitting and restarting it doesn't fix it.
7. Customer calls support...

I'm not sure which of the above you'd argue is implausible. 1 is your
own advice. 2 is pretty much inevitable, like it or not, as is 3 given
that 2 occurred. 4 is dependent on the nature of the bug, but it doesn't
seem implausible. 5 follows from 2. 6 follows from 5, 4, and the
database being nonvolatile. 7 is inevitable given 5 and 6.

Not *that* kind of "embedded". Sheesh. Is your reasoning so weak, then?

There's no call for personal attacks here, especially when you just
admonished a bunch of comp.lang.lispers for exactly the same behavior.

You said, and I quote, "an embedded application". That pretty
unambiguously means the application runs in a dedicated appliance like a
set-top box, to most programmers.

If you'd been paying attention, you'd've seen that I'm speaking of Derby
being embedded in the Java application.

That would be an "embedded database" in an application that may, or may
not, itself be embedded, rather than an "embedded application".

I've already discussed how that is not a problem with Derby.

You've asserted, not discussed, this implausible claim. An actual
argument to support it would be far more interesting than another random
personal attack.

So tell me: Why do you think the DB would be bulletproof, incorruptible
even by bugs in the client code?

Serialization has noted weaknesses, including not fulfilling the OP's
need for association between objects, something that an RDBMS does innately.

What sort of association? If he just meant objects referencing other
objects would go together, Java serialization does that already.
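
A small sketch of that behaviour, using made-up `Person` and `Address` classes. It illustrates one reading of "association", not necessarily what the OP needs: two objects written to the same stream still share their common referent after being read back, while objects written to separate streams (for example, one file each) each get their own copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class GraphRoundTrip {
    /** Hypothetical class referenced by more than one owner. */
    public static class Address implements Serializable {
        private static final long serialVersionUID = 1L;
        final String city;
        Address(String city) { this.city = city; }
    }

    /** Hypothetical owner holding a reference to an Address. */
    public static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final Address home;
        Person(String name, Address home) { this.name = name; this.home = home; }
    }

    /** Serializes to memory and reads the result back as a deep copy. */
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    /** Both persons go through one stream, so the shared Address stays shared. */
    public static boolean sharedWithinOneStream() throws IOException, ClassNotFoundException {
        Address shared = new Address("Springfield");
        Person[] copy = (Person[]) roundTrip(
            new Person[] { new Person("a", shared), new Person("b", shared) });
        return copy[0].home == copy[1].home;
    }

    /** Each person goes through its own stream, so each copy gets its own Address. */
    public static boolean sharedAcrossSeparateStreams() throws IOException, ClassNotFoundException {
        Address shared = new Address("Springfield");
        Person a = (Person) roundTrip(new Person("a", shared));
        Person b = (Person) roundTrip(new Person("b", shared));
        return a.home == b.home;
    }
}
```

The first method returns true and the second false, so whether serialization "does that already" depends on how the objects are grouped into streams or files.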

Now you're ranting.

No, I'm calmly stating common-place and well-known facts about databases.

Ranting would be:

ARE YOU OUT OF YOUR COTTON-PICKING MIND?! HOW IS THE END USER SUPPOSED
TO FIX ANYTHING NOW? OH, NOW YOU'VE ***REALLY*** DONE IT!!!!1!1!one

There's no separate installation. There's no separate partition.

Apparently there's no actual database in this DBMS, then. How interesting.

Nope. It's all contained in the application.

Well, if the developer wants to go that route and have a truly
standalone binary (that repartitions the host computer's hard drive when
first run), I suppose they could do that instead.

What are you on about?

The risks and headaches involved in repartitioning a hard disk. Haven't
you been paying attention?

The above was incoherent and irrelevant rambling.

Yes, it was; please try to be a bit more focused the next time you post.
Heavier on the rational argumentation and lighter on the personal jabs,
particularly, if you please.

Again, as stated, this is only something that the application deals
with.

Until something goes wrong.

Unless you honestly intend to make the outlandish claim that nothing
will ever, ever go wrong.

But try telling that to anyone who's ever had to muck about with the
Windows registry, Firefox profiles, or pretty much anything else of that
nature, based on instructions off some web site or read to them over the
phone by tech support.

Wha...? You were the one who said it was bad that an application would
need a sysadmin.

No, I said that if an application's target audience is sysadmins, it can
get away with having a much more cryptic user interface and
harder-to-fix problems that need more technical monkeying to correct
than if its target audience includes Uncle Bob and Aunt Mathilda.

(Examples: vi's user interface; .htaccess files for Apache)

Intermediate is the case where some features are only of interest to
power users, or the whole program is.

(Examples: quake config files for power-gamers to tweak; POV-Ray scene
files)

When the software is widely used by ordinary folk, there are problems
when it isn't easy to fix things that go wrong.

(Example without problems: Notepad. It just works. And if it ever
somehow fucks up one of your text files, you can try opening it in
Wordpad or Editpad or whatever and they will read it; text is text.)

(Example with problems: The Windows registry. If that gets scrozzled, it
typically means a long phone call to Microsoft tech support.)

The "above" didn't address how embedded Derby works, and therefore does
not address the point.

Sure it did. You are the one who didn't address the point. When you
aren't changing the subject to ease of implementation or my lack of
specific knowledge about Derby, you're claiming the application
magically takes care of everything.

If that were possible, wouldn't Microsoft have made Windows able to
magically take care of the registry so that users never had to deal with
it, ever?

"Nothing will go wrong", or "It will all work out somehow -- trust me",
when asked how the user is to fix things when they go wrong, is not
addressing the point. It is a cop-out.

Tradeoffs that you have yet to mention.

I've mentioned it quite plainly. But I'll spell it out once again:

Database: easier implementation, harder end-user servicing if it gets
corrupted or otherwise screwed up.
Normal disk files: harder implementation, easier end-user servicing.

Your only response to that, besides lying by saying I never mentioned
it, has been the incredibly dubious assertion that there will never be a
need for end-user servicing. Maybe if it's a "hello, world" program. In
which case a database, however "embedded", is way overkill.

So why don't you take a look, then?

Sorry -- don't have the time tonight to download fifty megs or so of
whosits and whatsits. Too many plates to keep spinning. Maybe tomorrow.

Again, a disk-based solution is not necessarily superior in this
respect, and arguably it's easier to clean up with a robust, tested,
proven product like Derby than with a custom hack.

Easier for a technician to clean up, I'll grant you.

Easy for a user who can maybe find and delete a file by name in Explorer
but wouldn't know SQL from the things harvesting nuts outside his window
-- maybe not so much.

(Although the hardest hit will probably be the poor, the downtrodden,
the rank-and-file technical support peons in their awful little
call-center cubicles.)

Except that it doesn't make the end user's life harder, it makes it
easier, by reducing the risk of bugs.

If we accept your claim, there'd be fewer bugs, but ones that are harder
for the user to recover from. It all boils down to how many fewer, and how
much harder, and which ends up outweighing which, doesn't it? Which probably
depends on the particular application, its nature and its user-base's
technical sophistication particularly. Which was my contention all along.

At last we agree.

That was my opinion from the outset. Are you now telling me you've been
violently *agreeing* with me the entire time?
