See if you can find the thread where we dug up and interpreted the
implementation of identityHashCode - it's almost exactly as you've
described. Except that numbers are only assigned when an object's identity
hash is first asked for, not at allocation, thus avoiding the work of
assigning numbers to objects which will never need them.
This would be compatible with the above, as well, so long as
synchronizing on an object together with other objects counted as
asking for it, and a 64-bit globally-unique integer was used under the
hood (while the System.identityHashCode() method truncated it to 32
bits).
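A minimal sketch of that lazy assignment, done in library Java rather
than in the VM (the class, map, and method names here are just
illustrative):

    import java.util.Collections;
    import java.util.Map;
    import java.util.WeakHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    // Sketch only: a real VM would stash the number in the object header
    // on first request; a WeakHashMap also keys on equals()/hashCode(),
    // so this only illustrates the laziness and the 32-bit truncation.
    final class LazyIdentityNumbers {
        private static final AtomicLong NEXT = new AtomicLong(1);
        private static final Map<Object, Long> IDS =
                Collections.synchronizedMap(new WeakHashMap<>());

        static long identityNumber(Object o) {      // 64-bit, assigned lazily
            return IDS.computeIfAbsent(o, k -> NEXT.getAndIncrement());
        }

        static int identityHash(Object o) {         // truncated to 32 bits
            return (int) identityNumber(o);
        }
    }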
The trouble with that is when you're dealing with mutable objects: a
change in state may mean a change in hashCode order, which means that two
attempts to lock the same pair of objects at different times may go in
different orders, which could lead to deadlock.
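A small sketch (not from the thread) of the hashCode-ordered pairwise
lock being discussed, to make the hazard concrete: if either object's
hashCode() changes between two such calls, the calls can nest the
monitors in opposite orders.

    final class PairLock {
        // Illustrative only; equal hashCodes would need a tie-breaker.
        static void lockPairInHashOrder(Object a, Object b, Runnable body) {
            Object first  = a.hashCode() <= b.hashCode() ? a : b;
            Object second = (first == a) ? b : a;
            synchronized (first) {
                synchronized (second) {
                    body.run();
                }
            }
        }
    }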
These are objects being synchronized on. If one's being mutated while
a synchronized (x, y, z...) has a lock on it and is still trying to
get another lock, there's already a thread-safety bug somewhere else
and fixing that fixes the potential deadlock. If on the other hand the
multi-synch already has all of its locks, that multi-synch can't cause
a deadlock, and if it has none, the order rearrangement has no effect.
It can only cause a deadlock if one thread locks one of some pair of
objects, their order changes, and then the other thread locks the
other of the pair, and then they each try to get the remaining locks.
There may still be a risk, if one object isn't yet locked by either
but its hashCode changes.
On the other hand, there are already problems with having mutable
objects whose hashCode changes (for example, if it's actually used as
a key in a hash table anywhere, it gets lost). IMO hashCode,
compareTo, and equals should only be overridden on immutable objects.
(The JRE classes violate that, notably the collection classes,
unfortunately; it would have been interesting if they used Object's
equals and hashCode except for a set of unmodifiable versions,
distinct from the current wrappers in that they were immutable copies
rather than read-only views of a possibly-still-changing collection.)
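For instance, the "gets lost" failure mode mentioned above looks
roughly like this (class and field names invented for the example):

    import java.util.HashMap;
    import java.util.Map;

    final class MutableKey {
        int value;
        MutableKey(int value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).value == value;
        }
        @Override public int hashCode() { return Integer.hashCode(value); }
    }

    class LostKeyDemo {
        public static void main(String[] args) {
            Map<MutableKey, String> map = new HashMap<>();
            MutableKey k = new MutableKey(1);
            map.put(k, "hello");
            k.value = 2;                          // hashCode changes in place
            System.out.println(map.get(k));       // likely null: wrong bucket
            System.out.println(map.get(new MutableKey(1))); // null: equals fails
        }
    }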
There's also a consideration with long-running systems eventually
going through the entire pool of numbers (though it might take an
astronomically long time in the 64-bit case). Two solutions to *that*:
* Use 64 bits no matter what, and pray to one of your cultures'
deities.
No - if we're going to come up with [suspected insult deleted] schemes
None of the nasty things that you have said or implied about me are at
all true.
Another option might be to have some kind of monotonically increasing
number - something like time, although it only has to increase when things
get moved - and use that in the high bits, so that objects allocated in
the same place at different times don't get the same number.
That'll work. At least until January 19 2038, 03:14:07 UTC.
(Actually, it suffices that the bits used don't recycle for a very
long time, rather than that they don't at all. If the program runs
across one wrap-around, but never until the number represented by the
used bits actually returns to a value that previously occurred DURING
THE RUN, the wrap-around is not a problem. The used bits are likely to
be fewer than 32, so wrapping around sooner than 2038.)
In fact, if you just use the offset of the object in the block in the low
bits, then this works as long as every block into which things are
allocated gets a unique prefix for the high bits. So, you have a global
counter, and every time you designate a block as a nursery, you use that
counter to issue it with a prefix.
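Roughly, with assumed sizes (the 1 MiB block and hence 20 offset bits
below are placeholders):

    import java.util.concurrent.atomic.AtomicLong;

    // Sketch: the high bits come from a global counter bumped whenever a
    // block is designated as a nursery; the low bits are the object's
    // offset within its block.
    final class BlockPrefixIds {
        static final int OFFSET_BITS = 20;             // assumes 1 MiB blocks
        private static final AtomicLong NEXT_PREFIX = new AtomicLong();

        static long newBlockPrefix() {      // call when a block becomes a nursery
            return NEXT_PREFIX.getAndIncrement();
        }

        static long identityNumber(long blockPrefix, long offsetInBlock) {
            return (blockPrefix << OFFSET_BITS) | offsetInBlock;
        }
    }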
This is similar to the notion I had of assigning a range to each
nursery.
As long as pages are bigger than objects, this is going to lead to
fragmentation - you end up keeping a whole page in core just for one
object.
Objects that are likely to have similar lifespans might be allocated
within a single page. Simply allocating objects created sequentially
by the same thread adjacent to one another may have that effect. But
actually I was hypothesizing an OS that had some more sophisticated
notion of memory structure than pages. I doubt such exists yet on your
planet.
Normal speed efficiency can be restored by using integers of machine-
native size up until -1 (all Fs in hex; the "maximum" before it wraps
to numbers already used) and then layering on an alternative scheme
for all subsequent objects (which are marked by all having -1 in the
"object number" field). Ternary strings is one such alternative
scheme. So is a second integer field, used until it becomes -1, then a
third, and so forth, or a zero-terminated string of bytes or words
used as a bignum, or something. (A bignum that only has to support
order-comparison and addition, simplifying things somewhat.)
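A rough shape for that layering, with BigInteger standing in for the
compare-and-add-only bignum:

    import java.math.BigInteger;

    // Sketch: a machine-word id, with -1 reserved to mean "the real
    // number is in the extension field"; extended ids sort after all
    // plain ones.
    final class OverflowId implements Comparable<OverflowId> {
        final long word;               // -1 => see extension
        final BigInteger extension;    // null unless word == -1

        OverflowId(long word, BigInteger extension) {
            this.word = word;
            this.extension = extension;
        }

        @Override public int compareTo(OverflowId other) {
            boolean thisExt = this.word == -1, otherExt = other.word == -1;
            if (thisExt != otherExt) return thisExt ? 1 : -1;
            return thisExt ? this.extension.compareTo(other.extension)
                           : Long.compareUnsigned(this.word, other.word);
        }
    }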
I am filing this under very clever but [implied insult deleted] idea.
None of the nasty things that you have said or implied about me are at
all true.
Exactly. I think allocating persistent, unique-for-all-time identity
numbers to objects isn't necessary - as long as you can get the
GC and locking subsystems to cooperate. Bear in mind that it doesn't
matter in the slightest if we lock a pair of objects in a different order
at different times, as long as we never do it at overlapping times. If we
do lock a-then-b, unlock both, then lock b-then-a, we're fine.
That was what I was thinking.
Risky - a contended lock could hold up movement, which sounds like bad
news.
A frequently-locked object that lives a long time is a good candidate
for tenuring. G1 can work around this by simply using blocks
containing such objects for tenuring *other* objects, and emptying
*other* blocks to recycle into nurseries. It won't play as nicely with
the current GC of course.
I thought so too.
This reminds me of Richard Gabriel's PC losering story, and the
winning solution adopted by unix:
http://www.jwz.org/doc/worse-is-better.html
My solution would be to be a bit more MIT, and say that (a) multi-locks
are acquired in allocation order and (b) if objects which are currently
multi-locked (or perhaps just locked at all, for simplicity) are moved,
they must maintain their allocation order with respect to other locked
objects. If we arrange things so that memory is divided into zones with
known non-overlapping ranges of the allocation order (which corresponds
nicely to generations), then that just means taking care when moving
objects from one zone to another, or within a zone, to keep locked ones in
the same order in memory. This avoids the need for identity numbers at
all.
It might require some tricky juggling by the GC though.
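Something of this shape, say, where the Comparator stands in for
allocation order (which ordinary Java code cannot actually observe, so
this is only a sketch of the contract):

    import java.util.Arrays;
    import java.util.Comparator;

    // Sketch of a multi-lock that acquires every monitor in one agreed
    // total order; the comparator is a stand-in for allocation order.
    final class MultiSynch {
        static void multiSynch(Comparator<Object> allocationOrder,
                               Runnable body, Object... locks) {
            Object[] sorted = locks.clone();
            Arrays.sort(sorted, allocationOrder);
            lockFrom(sorted, 0, body);
        }

        private static void lockFrom(Object[] sorted, int i, Runnable body) {
            if (i == sorted.length) {
                body.run();
                return;
            }
            synchronized (sorted[i]) {
                lockFrom(sorted, i + 1, body);
            }
        }
    }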
Although i don't know what happens if a multi-lock is underway during a
move; i worry that two threads could end up with different ideas about the
order, but i *think* that as long as they pay attention to when moves
occur, and adjust their locking sequence accordingly, it's okay.
Making all moves, regardless of locking, order-preserving would avoid any
problems. There's even a fighting chance that this is already what
happens.
I doubt it. What happens if two objects are allocated in the order A,
then B, and then B (the slightly-newer) is tenured by the GC? Tenured
objects are presumably assumed to be older than eden objects, so B now looks
older than A, and their order has reversed. Oops.
This could happen; B might be reached first in a traversal of the
object graph and so tenured first, though A is bound to be tenured
soon as well in this case. A narrow window of opportunity for a
deadlock nonetheless arises.
Elsewhere in this thread, a different problem with these schemes was
pointed out: the risk that a lock was already acquired earlier, and
out of order. But it will always be the case that separate, nested
synchronized statements create a risk of deadlock. One thought is to
try to use multisynch around such sites, rather than inside them, to
remove deadlocks. On the other hand, that doesn't work for cases like
List.equals(). If the equals is in a bigger synchronized block that
has to do some reads and modifies of the list atomically, there's a
problem.
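Concretely, the shape of the problem (the synchronizedList wrappers and
surrounding blocks are only for illustration):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Each thread holds "its" list's monitor for an atomic read-and-modify
    // and also takes the other list's so the comparison can iterate it
    // safely; the monitors nest in opposite orders, so this can deadlock.
    class EqualsDeadlockDemo {
        static final List<Integer> a =
                Collections.synchronizedList(new ArrayList<>(List.of(1, 2, 3)));
        static final List<Integer> b =
                Collections.synchronizedList(new ArrayList<>(List.of(1, 2, 3)));

        public static void main(String[] args) {
            new Thread(() -> {
                synchronized (a) {           // atomic work on a...
                    synchronized (b) {       // ...plus b's lock for a safe equals
                        if (a.equals(b)) a.add(4);
                    }
                }
            }).start();
            new Thread(() -> {
                synchronized (b) {           // same pattern, opposite order
                    synchronized (a) {
                        if (b.equals(a)) b.add(4);
                    }
                }
            }).start();
        }
    }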
Perhaps the long-term solution is to make a collections framework that
supports a notion of "transactions". Some of the java.util.collections
classes seem to be steps in that direction.
Regardless, it goes to show that there is no magic bullet for
concurrency correctness. Slapping synchronizeds on everything
certainly doesn't work except in the most trivial cases, and slapping
multi-synchronizeds willy-nilly is not much better. Thinking about the
objects that are shared and how they are used and in what order is
important. I worked a year or two ago on a network app that had large
pools of threads that had to access a handful of shared collections --
some unique sets and maps and a priority queue, I believe, and some
non-unique (lots of same-role) objects -- and the solution there was
to lock these separately and in a fixed order, the non-unique objects
last, and only one non-unique object at a time. Fortunately, the non-
unique objects never needed to be locked other than individually and
separately; no operations were performed atomically on pairs or larger
sets of them. Eclipse's thread monitor proved invaluable for
identifying the causes of the deadlocks that did occur during
development, when developers got the locking order wrong. (Sometimes,
so did searching for the word "synchronized" in each commit, going
backwards in time. Invariably the deadlocks occurred when the locks
were acquired in separate methods, but one got called from another, so
the fact that the critical sections were nested was not immediately
obvious. Eventually we started using method javadocs to record
what locks were typically held, and what locks must not be held, going
into each method. Sometimes significant redesign and/or refactoring
was needed to make a lock allowable going into a method that formerly
wasn't. The order of locking for some pairs of resources got reversed
repeatedly during redesigns until eventually everything worked.)
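The fixed order was roughly of this shape (the resource names and the
particular sequence below are invented, not the actual application's):

    import java.util.Map;
    import java.util.PriorityQueue;
    import java.util.Set;

    // Sketch of the discipline: shared collections first, in one agreed
    // sequence, then at most one non-unique object, always last.
    final class FixedOrderLocking {
        final Set<String> uniqueSet;
        final Map<String, Object> sharedMap;
        final PriorityQueue<Runnable> workQueue;

        FixedOrderLocking(Set<String> s, Map<String, Object> m,
                          PriorityQueue<Runnable> q) {
            this.uniqueSet = s;
            this.sharedMap = m;
            this.workQueue = q;
        }

        void updateEverything(Object nonUnique, Runnable work) {
            synchronized (uniqueSet) {             // 1st in the fixed order
                synchronized (sharedMap) {         // 2nd
                    synchronized (workQueue) {     // 3rd
                        synchronized (nonUnique) { // one non-unique object, last
                            work.run();
                        }
                    }
                }
            }
        }
    }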
In general, I think a design like the above will usually be
applicable. If groups of non-unique-role objects sometimes need atomic
operations performed on them, but always PARTICULAR groups, those
groups can have locks of their own that are acquired before locking
more than one of their individual members (the "napkin" metaphor from
elsethread). Another option is to go the Swing route of having some
complex set of objects handled on a dedicated thread, with atomic jobs
(similar to Swing event handlers) handed to this thread to be run in
some sequence or another (similarly to SwingUtilities.invokeLater()).
Multisynch would be a handy additional tool in the kit for cases where
some sets of objects needing to be acted on atomically were
essentially arbitrary, but it could never be a substitute for having a
sound design, though it could simplify such a design.
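As a sketch of that last route, with a single-threaded executor in
place of Swing's event thread (the names are invented for the example):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // All access to the group of objects is funnelled through one owner
    // thread, so the group itself never needs locking at all.
    final class ConfinedGroup {
        private final ExecutorService owner = Executors.newSingleThreadExecutor();
        private final Map<String, Integer> state = new HashMap<>(); // owner thread only

        void invokeLater(Runnable job) {   // analogous to SwingUtilities.invokeLater
            owner.execute(job);
        }

        void increment(String key) {
            invokeLater(() -> state.merge(key, 1, Integer::sum));
        }
    }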